Issue 436: clean up hash tables and functions

Title	clean up hash tables and functions
Priority	wish	Status	resolved
Superseder		Nosy List	florian, jendrik, malte, pvonreth, silvan
Assigned To	pvonreth	Keywords
Optional summary

Created on 2014-05-21.19:07:56 by malte, last changed by malte.

Files
File name	Uploaded	Type	Edit	Remove
testme.cc	malte, 2015-03-31.08:25:38	text/x-c++src

Messages
msg4304 (view)	Author: malte	Date: 2015-07-02.01:02:20
> I assume that for other configurations, you did not get such large > differences, did you? If you look at the old discussion, there's a suggestion that hash_set/hash_map might be slightly more memory-efficient than unordered_set/unordered_map. I looked at all the cases in your experiment data where we lose coverage (across all configurations and domains: 13 cases). In all cases, the old code was very close to the memory limit when it solved the problem. The limit is 2 GiB, or 2097152 KiB. Here is the peak memory of the old code for the 13 cases we lose, in sorted order: 1890552 1937408 1939648 1997052 1997100 2039932 2040848 2047912 2051104 2055864 2078196 2078196 2080324 So we're always close to the memory limit: always within 10%, and in most cases within much less than 5%. My suggestion: live with the loss in coverage, and take this as an indication that we could improve M&S by being a bit more careful about memory. If and when we find time to work on the code some more, memory could be one of the main focus areas.
msg4303 (view)	Author: jendrik	Date: 2015-07-01.17:24:53
I don't remember any numbers, but I think none of the tested configs was affected really badly.
msg4302 (view)	Author: silvan	Date: 2015-07-01.15:35:24
For the record, I obtained the following results when comparing a revision before merging issue436 (074175a9b206) and after merging it (26be536ac799): http://ai.cs.unibas.ch/_tmp_files/sieverss/2015-06-26-fetch-regression-0-0.5-comp.html Given the small changes in code related to merge-and-shrink with these two revisions, I really wonder if the performance difference of unordered_set vs hash_set (used in state_registry) or unordered_map vs hash_map (used in shrink_bisimulation9 is really that large. I assume that for other configurations, you did not get such large differences, did you?
msg4292 (view)	Author: pvonreth	Date: 2015-06-30.14:48:24
I'm sorry but the winkde.org server went offline without anouncement and will p probably not come back. The results are not the only thing I lost there.
msg4291 (view)	Author: silvan	Date: 2015-06-30.09:47:37
Patrick, do you still have the tables of the experiments linked below? I'd be interested to have a look at them. This issue decreased the performance of standard merge-and-shrink configurations, and I wonder why this has not been noticed during the experiments. I assume that either the configs I tested have not been tested in this issue, or that due to the many "merge from defaults" included in this issue, the experiments compared against a too old version of the default branch. Anyway, I'd just be interested to have a look at the experiments of this issue.
msg4127 (view)	Author: jendrik	Date: 2015-03-31.12:33:42
Merged and pushed. Thanks for your work on this, Patrick!
msg4126 (view)	Author: malte	Date: 2015-03-31.08:25:38
I've attached some test code, which on my machine shows that a test program that uses a hash_map<int, int> can grow this map further than an equivalent program using an unordered_map<int, int>. If you want to try this out, run it with a ulimit like so: g++ -Wno-deprecated -O3 -std=c++11 testme.cc -o testme # for unordered map: ( ulimit -v 1048576 && ./testme unordered_map ) # for hash_map ( ulimit -v 1048576 && ./testme hash_map ) That unordered_map is less memory-efficient than hash_map is consistent with informal experiments we ran previously. (This is the main reason why we hadn't changed this before.) I don't think the effect is large enough to keep us from merging the code, though. Maybe at some point we can look into special-purpose hash implementations for the few large hashes we use (mainly the one used for state registries, possibly others inside merge and shrink?).
msg4125 (view)	Author: malte	Date: 2015-03-31.08:01:07
These are the regular scatter plots, not the special ones I mentioned in msg4108 that more clearly show slight differences. (Florian or Jendrik, we discussed this point offline, I think?) But even so, the picture seems clear enough for the optimal configurations, and I don't think we need the other kind of scatter plots -- just a note for future experiments where the smaller differences might be more important. It seems we get slight improvements here and there (e.g. w.r.t. time in the landmark configurations), but an overall loss of coverage that is perhaps a bit too consistent to be random. Looking at the scatter plots for memory usage shows that the cases we lose are all ones where we were extremely close to the edge of exhausting the memory limit before. My hypothesis is that the hash_set/hash_map classes are either a little bit more memory-efficient and/or implemented to deal more gracefully with out-of-memory conditions compared to unordered_set/unordered_map. (For example, vectors tend to double in capacity when they need to be grown. If there isn't enough memory for that, an implementation might either give up completely or grow by smaller factor instead. There may be something similar inside the hash tables.)
msg4124 (view)	Author: pvonreth	Date: 2015-03-30.21:44:41
here are the results for the second run of the experiments including scatter plots http://winkde.org/~pvonreth/other/tmp/fd/issue436-sat-v2-eval/ http://winkde.org/~pvonreth/other/tmp/fd/issue436-opt-v2-eval/
msg4111 (view)	Author: malte	Date: 2015-03-24.18:54:46
No further comments on the code at this point.
msg4110 (view)	Author: pvonreth	Date: 2015-03-24.13:32:20
I've merged the default branch into my branch and fixed the conflicts.
msg4109 (view)	Author: malte	Date: 2015-03-24.13:05:09
I followed the link for the pull request, but it doesn't look reviewable -- it shows merge conflicts in two files.
msg4108 (view)	Author: malte	Date: 2015-03-24.13:03:55
There seems to be a slight negative trend overall. Do we know why? A while ago we devised some plots that can be used for comparison plots of attributes where differences tend to be slight (say, typically less than a factor of 2). Can we generate some such plots for total time and memory?
msg4107 (view)	Author: jendrik	Date: 2015-03-24.11:21:27
Patrick has prepared a pull request at https://bitbucket.org/TheOneRing/hg.fast-downward.org/pull-request/9/ and I have reviewed the code. He has also made an experiment for the relevant satisficing and optimal configurations. The results are here: http://winkde.org/~pvonreth/other/tmp/fd/
msg3729 (view)	Author: jendrik	Date: 2014-10-08.21:30:21
After this is done, we should revisit issue213.
msg3458 (view)	Author: malte	Date: 2014-09-19.16:08:03
Any volunteers to work on this?
msg3181 (view)	Author: malte	Date: 2014-05-21.19:07:56
We should consider switching from GNU's hash tables (hash_map, hash_set) to the C++ standard hash tables (unordered_map, unordered_set). We stuck with the GNU versions in the past because they used less memory when we last compared, but now we have better ways of measuring time and memory impact properly and could make a better informed decision than back in the day. Also, we should clean up the hash functions used. Currently, more or less the same hash function is implemented in several places. The basic hashing facilities should live in only one place. Also, some old code uses a different hash function (multiplicative congruential with factor 17) for no good reason. (See relaxation_heuristic and domain_transition_graph -- there may be other examples. Best to grep for 'hash'.)

History
Date	User	Action	Args
2015-07-02 01:02:20	malte	set	messages: + msg4304
2015-07-01 17:24:53	jendrik	set	messages: + msg4303
2015-07-01 15:35:24	silvan	set	messages: + msg4302
2015-06-30 14:48:24	pvonreth	set	messages: + msg4292
2015-06-30 09:47:37	silvan	set	nosy: + silvan messages: + msg4291
2015-03-31 12:33:42	jendrik	set	status: chatting -> resolved messages: + msg4127
2015-03-31 08:25:38	malte	set	files: + testme.cc messages: + msg4126
2015-03-31 08:01:08	malte	set	messages: + msg4125
2015-03-30 21:44:41	pvonreth	set	messages: + msg4124
2015-03-24 18:54:46	malte	set	messages: + msg4111
2015-03-24 13:32:20	pvonreth	set	messages: + msg4110
2015-03-24 13:05:09	malte	set	messages: + msg4109
2015-03-24 13:03:55	malte	set	messages: + msg4108
2015-03-24 11:21:27	jendrik	set	messages: + msg4107
2015-03-18 13:59:34	pvonreth	set	assignedto: pvonreth nosy: + pvonreth
2014-10-08 21:30:21	jendrik	set	messages: + msg3729
2014-09-19 16:08:03	malte	set	messages: + msg3458
2014-09-19 15:42:25	florian	set	nosy: + florian
2014-08-20 13:34:37	jendrik	set	nosy: + jendrik
2014-05-21 19:07:56	malte	create

Issue436