msg6425 (view) |
Author: malte |
Date: 2017-07-09.12:11:50 |
|
Thanks!
|
msg6424 (view) |
Author: jendrik |
Date: 2017-07-09.12:05:05 |
|
Here are the results; I don't see a big effect:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v8-blind-issue693-v7-issue693-v8-compare.html
For completeness, here is the latest branch version (v8) vs. the default branch (v7-base):
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v8-blind-issue693-v7-base-issue693-v8-compare.html
Merged and pushed.
|
msg6423 (view) |
Author: malte |
Date: 2017-07-08.15:38:25 |
|
Looks good. I'd be interested in seeing another set of experiments (blind search
only is enough) to see if the change without the wrapper class led to a speedup.
But you can merge either way.
|
msg6419 (view) |
Author: jendrik |
Date: 2017-07-07.19:45:28 |
|
I'm done handling your comments. Can I merge this?
|
msg6418 (view) |
Author: malte |
Date: 2017-07-07.18:44:59 |
|
Experimental results look good. I'm done with my bitbucket comments.
|
msg6412 (view) |
Author: jendrik |
Date: 2017-07-06.09:18:50 |
|
The results are in:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v7-opt-issue693-v7-base-issue693-v7-compare.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v7-sat-issue693-v7-base-issue693-v7-compare.html
|
msg6410 (view) |
Author: jendrik |
Date: 2017-07-06.00:08:56 |
|
I have made the changes we discussed offline and will run experiments now:
https://bitbucket.org/jendrikseipp/downward/pull-requests/66
|
msg6349 (view) |
Author: malte |
Date: 2017-05-07.17:54:51 |
|
I'd like to have a closer look at the code and the experimental results, and also see
how they differ from machine to machine. There are a few other things ahead of
this in the review queue, though.
|
msg6347 (view) |
Author: jendrik |
Date: 2017-05-07.17:48:21 |
|
How do you propose to continue here?
|
msg6295 (view) |
Author: jendrik |
Date: 2017-04-29.09:23:41 |
|
You can find the code under
experiments/issue693/hash-microbenchmark
in the repo linked below.
|
msg6286 (view) |
Author: malte |
Date: 2017-04-28.21:35:00 |
|
Some very good-looking numbers mixed with some I find hard to understand.
Can I get the code for the microbenchmarks somewhere? I wonder how much it
changes from computer to computer.
|
msg6284 (view) |
Author: jendrik |
Date: 2017-04-28.21:09:42 |
|
I implemented a version of HashState using the Boost hash function. Here are the
results (for each bitwidth I report the second iteration to account for
cache effects):
std::unordered_set: hash_combine + Boost hash
FastHashSet: HashState + Boost hash
BurtleBurtleHashSet: hashword() from lookup3
utils::HashSet: HashState + feed()
HashWordHash: HashState + feed_ints()
SpookyV2HashSet: SpookyHashV2
32-bit:
Running nothing 100000 times: 0.000201s
Running insert int into std::unordered_set 100000 times: 0.794704s
Running insert int into FastHashSet 100000 times: 0.788237s
Running insert int into utils::HashSet 100000 times: 0.886164s
Running insert int into SpookyHash 100000 times: 1.15538s
Running insert pair<int, int> into std::unordered_set 100000 times: 0.806538s
Running insert pair<int, int> into FastHashSet 100000 times: 0.807989s
Running insert pair<int, int> into utils::HashSet 100000 times: 0.849372s
Running insert vector<int> of size 1 into std::unordered_set 100000 times: 0.416037s
Running insert vector<int> of size 1 into FastHashSet 100000 times: 0.421279s
Running insert vector<int> of size 1 into BurtleBurtleHashSet 100000 times: 0.445918s
Running insert vector<int> of size 1 into utils::HashSet 100000 times: 0.450099s
Running insert vector<int> of size 1 into HashWordHash 100000 times: 0.445355s
Running insert vector<int> of size 1 into SpookyV2HashSet 100000 times: 0.735016s
Running insert vector<int> of size 10 into std::unordered_set 100000 times: 0.712605s
Running insert vector<int> of size 10 into FastHashSet 100000 times: 0.672364s
Running insert vector<int> of size 10 into BurtleBurtleHashSet 100000 times: 0.695451s
Running insert vector<int> of size 10 into utils::HashSet 100000 times: 0.735414s
Running insert vector<int> of size 10 into HashWordHash 100000 times: 0.695138s
Running insert vector<int> of size 10 into SpookyV2HashSet 100000 times: 1.23303s
Running insert vector<int> of size 100 into std::unordered_set 100000 times: 3.91018s
Running insert vector<int> of size 100 into FastHashSet 100000 times: 3.89545s
Running insert vector<int> of size 100 into BurtleBurtleHashSet 100000 times: 3.72564s
Running insert vector<int> of size 100 into utils::HashSet 100000 times: 3.92713s
Running insert vector<int> of size 100 into HashWordHash 100000 times: 3.75994s
Running insert vector<int> of size 100 into SpookyV2HashSet 100000 times: 4.67119s
64-bit:
Running nothing 100000 times: 0.000199s
Running insert int into std::unordered_set 100000 times: 0.767982s
Running insert int into FastHashSet 100000 times: 0.75542s
Running insert int into utils::HashSet 100000 times: 0.796044s
Running insert int into SpookyHash 100000 times: 0.837655s
Running insert pair<int, int> into std::unordered_set 100000 times: 0.774631s
Running insert pair<int, int> into FastHashSet 100000 times: 0.798033s
Running insert pair<int, int> into utils::HashSet 100000 times: 0.808774s
Running insert vector<int> of size 1 into std::unordered_set 100000 times: 0.440595s
Running insert vector<int> of size 1 into FastHashSet 100000 times: 0.418915s
Running insert vector<int> of size 1 into BurtleBurtleHashSet 100000 times: 0.445275s
Running insert vector<int> of size 1 into utils::HashSet 100000 times: 0.462981s
Running insert vector<int> of size 1 into HashWordHash 100000 times: 0.472938s
Running insert vector<int> of size 1 into SpookyV2HashSet 100000 times: 0.510124s
Running insert vector<int> of size 10 into std::unordered_set 100000 times: 0.662004s
Running insert vector<int> of size 10 into FastHashSet 100000 times: 0.728206s
Running insert vector<int> of size 10 into BurtleBurtleHashSet 100000 times: 0.686017s
Running insert vector<int> of size 10 into utils::HashSet 100000 times: 0.827219s
Running insert vector<int> of size 10 into HashWordHash 100000 times: 0.718663s
Running insert vector<int> of size 10 into SpookyV2HashSet 100000 times: 0.739045s
Running insert vector<int> of size 100 into std::unordered_set 100000 times: 3.78722s
Running insert vector<int> of size 100 into FastHashSet 100000 times: 4.2276s
Running insert vector<int> of size 100 into BurtleBurtleHashSet 100000 times: 3.89874s
Running insert vector<int> of size 100 into utils::HashSet 100000 times: 4.2328s
Running insert vector<int> of size 100 into HashWordHash 100000 times: 4.02751s
Running insert vector<int> of size 100 into SpookyV2HashSet 100000 times: 2.97786s
|
msg6131 (view) |
Author: malte |
Date: 2017-01-17.17:00:21 |
|
Thanks! For the next such experiment, I'd recommend running separate rounds of
the whole benchmark rather than repeating each individual test back to back. So instead of:
for each test:
    run the test 10 times
do:
10 times:
    run each test
Otherwise we still run the risk of CPU frequency scaling kicking in too late for
the first tests.
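A minimal sketch of that structure (the type and helper names are made up for illustration):
    #include <chrono>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    using Benchmark = std::pair<std::string, std::function<void()>>;

    // Outer loop over rounds: after the first round, every test runs on a
    // warmed-up CPU, so the later rounds can be compared fairly.
    void run_benchmarks(const std::vector<Benchmark> &tests, int num_rounds) {
        for (int round = 0; round < num_rounds; ++round) {
            for (const Benchmark &test : tests) {
                auto start = std::chrono::steady_clock::now();
                test.second();
                std::chrono::duration<double> seconds =
                    std::chrono::steady_clock::now() - start;
                std::cout << test.first << ": " << seconds.count() << "s" << std::endl;
            }
        }
    }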
But this is not necessary here, I think, as the data looks fine to me.
Agreed on SpookyHash. I think the next thing we could try is to speed up our
32-bit Jenkins hash by reducing its overhead, evaluate it within the
context of this mini-benchmark, and, if we're successful, put it back into the
planner.
Discussing detailed suggestions on how to try to reduce the overhead would take
a lot of time, I think, so you can either go ahead with some ideas you have, or
we can discuss it next week when I'm back from Dagstuhl.
|
msg6130 (view) |
Author: jendrik |
Date: 2017-01-17.15:50:05 |
|
I addressed your comments and repeated the experiment:
$ export DOWNWARD_BITWIDTH=32 && make && ./benchmark32
Running nothing 10000 times: 2.7e-05s 1.6e-05s 1.6e-05s 1.7e-05s 1.6e-05s 1.8e-05s 1.8e-05s 1.7e-05s 1.7e-05s 1.7e-05s
Running insert int into std::unordered_set 10000 times: 0.081036s 0.079106s 0.079247s 0.084152s 0.08487s 0.08151s 0.083751s 0.080348s 0.079144s 0.07914s
Running insert int into utils::HashSet 10000 times: 0.084762s 0.085265s 0.084493s 0.0838s 0.08397s 0.085129s 0.085078s 0.086782s 0.104818s 0.087344s
Running insert pair<int, int> into std::unordered_set 10000 times: 0.08351s 0.082174s 0.081757s 0.083469s 0.085102s 0.084095s 0.08828s 0.086784s 0.087166s 0.08912s
Running insert pair<int, int> into utils::HashSet 10000 times: 0.094726s 0.097081s 0.093484s 0.09684s 0.105848s 0.088336s 0.085695s 0.087439s 0.092227s 0.098264s
Running insert vector<int> of size 1 into std::unordered_set 10000 times: 0.04596s 0.056363s 0.048646s 0.047186s 0.04645s 0.0438s 0.042544s 0.042643s 0.042583s 0.042839s
Running insert vector<int> of size 1 into utils::HashSet 10000 times: 0.048968s 0.04695s 0.054449s 0.048542s 0.047915s 0.047623s 0.056844s 0.063033s 0.057115s 0.057683s
Running insert vector<int> of size 10 into std::unordered_set 10000 times: 0.078425s 0.094143s 0.091946s 0.101197s 0.068526s 0.069803s 0.069481s 0.070252s 0.07039s 0.069613s
Running insert vector<int> of size 10 into utils::HashSet 10000 times: 0.079197s 0.078231s 0.078407s 0.081361s 0.077901s 0.075849s 0.075296s 0.074961s 0.075143s 0.076078s
Running insert vector<int> of size 100 into std::unordered_set 10000 times: 0.389071s 0.389709s 0.39034s 0.392658s 0.388745s 0.411826s 0.389423s 0.406438s 0.395687s 0.389923s
Running insert vector<int> of size 100 into utils::HashSet 10000 times: 0.41582s 0.417067s 0.414103s 0.413918s 0.411115s 0.415822s 0.413745s 0.411823s 0.41271s 0.438373s
Running insert vector<int> of size 101 into std::unordered_set 10000 times: 0.400385s 0.392617s 0.394046s 0.39738s 0.39478s 0.391783s 0.393181s 0.399569s 0.393019s 0.392476s
Running insert vector<int> of size 101 into utils::HashSet 10000 times: 0.419522s 0.418543s 0.423371s 0.423999s 0.427279s 0.433636s 0.423573s 0.423132s 0.445995s 0.431476s
$ export DOWNWARD_BITWIDTH=64 && make && ./benchmark64
Running nothing 10000 times: 3.2e-05s 3.5e-05s 2.5e-05s 2.5e-05s 2e-05s 2.5e-05s 2.2e-05s 2.4e-05s 2.5e-05s 2.5e-05s
Running insert int into std::unordered_set 10000 times: 0.077637s 0.075893s 0.075891s 0.076898s 0.075936s 0.076199s 0.075817s 0.075498s 0.076068s 0.082677s
Running insert int into utils::HashSet 10000 times: 0.079573s 0.079173s 0.078947s 0.078562s 0.078708s 0.079013s 0.079192s 0.078822s 0.079377s 0.078879s
Running insert pair<int, int> into std::unordered_set 10000 times: 0.077939s 0.077992s 0.077781s 0.079007s 0.077732s 0.077526s 0.077522s 0.077508s 0.078121s 0.078726s
Running insert pair<int, int> into utils::HashSet 10000 times: 0.083359s 0.081937s 0.081923s 0.081819s 0.090081s 0.08283s 0.081909s 0.082625s 0.081634s 0.082201s
Running insert vector<int> of size 1 into std::unordered_set 10000 times: 0.043533s 0.042524s 0.042537s 0.042486s 0.041758s 0.041832s 0.042071s 0.042023s 0.041996s 0.042037s
Running insert vector<int> of size 1 into utils::HashSet 10000 times: 0.052521s 0.049945s 0.050969s 0.05069s 0.05067s 0.050303s 0.050282s 0.050177s 0.050549s 0.05069s
Running insert vector<int> of size 10 into std::unordered_set 10000 times: 0.06806s 0.067649s 0.067984s 0.068276s 0.070085s 0.067969s 0.067595s 0.06761s 0.068054s 0.067954s
Running insert vector<int> of size 10 into utils::HashSet 10000 times: 0.085893s 0.084267s 0.084279s 0.084266s 0.084104s 0.084591s 0.084925s 0.084917s 0.084119s 0.084324s
Running insert vector<int> of size 100 into std::unordered_set 10000 times: 0.390046s 0.389598s 0.389963s 0.390719s 0.391212s 0.391607s 0.389507s 0.389408s 0.388558s 0.393362s
Running insert vector<int> of size 100 into utils::HashSet 10000 times: 0.429928s 0.429662s 0.431475s 0.428931s 0.430573s 0.431571s 0.430258s 0.432028s 0.429097s 0.431196s
Running insert vector<int> of size 101 into std::unordered_set 10000 times: 0.39213s 0.395047s 0.392967s 0.392531s 0.393602s 0.413045s 0.407416s 0.395272s 0.397204s 0.396622s
Running insert vector<int> of size 101 into utils::HashSet 10000 times: 0.435027s 0.435036s 0.434441s 0.438296s 0.435236s 0.436028s 0.442919s 0.434608s 0.435859s 0.43567s
I had a brief look at SpookyHash and think that due to its complexity it should only be a last resort. How do you propose to continue here?
|
msg6129 (view) |
Author: malte |
Date: 2017-01-17.14:42:29 |
|
Oops, I seem to have accidentally deleted my message. Trying again:
Very useful, thanks!
1) I'm surprised that HashSet is faster for pair<int,int> than for int (in
32-bit mode). I think this is because the processor isn't warmed up at the start
of execution (CPU frequency scaling, cold caches). I suggest running 5-10 rounds
of everything (in one process) and checking that the numbers have stabilized
towards the end. It's also important to keep the background load the same
throughout execution.
2) I think the "bad" hash function benefits from the fact that the inserted
numbers are sequential, in terms of cache locality. Instead of using "i", I
suggest using something scrambled like (0xdeadbeef * i) ^ 0xfeedcafe (see the sketch at the end of this message).
3) It's alright that the new hash function is slower than the old one; it's not
unexpected, as it scrambles the input better. We should of course still check if
there's overhead we can reduce.
4) Regarding 32-bit vs. 64-bit: if we want to be faster on 64-bit compiles, an
obvious avenue is to use Jenkins's 64-bit hash function (SpookyHash), which he
writes is ~3 times faster than lookup3. It's much more complex, though, and I
expect it will only be useful for longer hash keys, not for int, pair<int,int>
and similar types. (How useful it will be for vector<int> will depend on the
size of the vector -- the longer, the better.) I'd expect that we'd be able to
get much better performance for these by using special-purpose functions for
small keys that avoid much of the constant-time overhead of our generic functions.
> Interestingly, the time for hashing int vectors with utils::HashSet
> decreases from 3.3 seconds to 2.9 seconds if we cast
> the vector size to int before feeding it to the HashState.
This might simply be because it throws 4 bytes of information away, but for
something to work with arbitrarily large vectors, this isn't the right thing to
do. What are the runtimes if the vector is always one element longer?
For our use cases, I'd also be interested in seeing how hashing vectors of fixed
length behaves for some sample lengths, say 10, 100, 1000.
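For illustration, a minimal sketch of the scrambling suggested in point 2 (the container and function names are made up):
    #include <unordered_set>

    void fill_set(std::unordered_set<unsigned int> &s, unsigned int num_insertions) {
        for (unsigned int i = 0; i < num_insertions; ++i) {
            // 0xdeadbeef * i wraps around (well-defined for unsigned), so the
            // inserted keys are scattered instead of sequential.
            s.insert((0xdeadbeefu * i) ^ 0xfeedcafeu);
        }
    }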
|
msg6128 (view) |
Author: jendrik |
Date: 2017-01-17.12:18:23 |
|
I suspected that our Jenkins hashing code might be optimized for 32-bit compiles. To test
this, I wrote a microbenchmark (added to the repo):
$ export DOWNWARD_BITWIDTH=32 && make && ./benchmark32
Running nothing 100000 times: 0.000481 seconds
Running insert int into std::unordered_set 100000 times: 0.716415 seconds
Running insert int into utils::HashSet 100000 times: 0.836888 seconds
Running insert pair<int, int> into std::unordered_set 100000 times: 0.818349 seconds
Running insert pair<int, int> into utils::HashSet 100000 times: 0.831098 seconds
Running insert vector<int> into std::unordered_set 100000 times: 3.2735 seconds
Running insert vector<int> into utils::HashSet 100000 times: 3.46468 seconds
$ export DOWNWARD_BITWIDTH=64 && make && ./benchmark64
Running nothing 100000 times: 0.000596 seconds
Running insert int into std::unordered_set 100000 times: 0.746506 seconds
Running insert int into utils::HashSet 100000 times: 0.791947 seconds
Running insert pair<int, int> into std::unordered_set 100000 times: 0.778041 seconds
Running insert pair<int, int> into utils::HashSet 100000 times: 0.80977 seconds
Running insert vector<int> into std::unordered_set 100000 times: 2.70406 seconds
Running insert vector<int> into utils::HashSet 100000 times: 3.31512 seconds
Our new hash functions are slower than the old hash functions in all test cases on both 32-bit and 64-bit. Focusing on
the last test case (inserting vector<int>, which is probably comparable to inserting PackedStateWrapper) we see that the
runtimes of both hash functions are almost equal in 32-bit mode, but the old hash functions benefit much more from
switching to 64-bit than the new ones. Do you think we can do something about this?
Interestingly, the time for hashing int vectors with utils::HashSet decreases from 3.3 seconds to 2.9 seconds if we cast
the vector size to int before feeding it to the HashState.
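For context, a rough sketch (not the actual hash.h code) of how a vector is fed to the HashState; the cast mentioned above affects the first feed call:
    // Sketch only: feeding a vector means feeding its size and then each element.
    // vec.size() is a size_t (8 bytes in 64-bit mode); the observed speedup came
    // from feeding static_cast<int>(vec.size()) instead.
    template<typename T>
    void feed(HashState &hash_state, const std::vector<T> &vec) {
        feed(hash_state, vec.size());
        for (const T &item : vec)
            feed(hash_state, item);
    }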
|
msg6127 (view) |
Author: jendrik |
Date: 2017-01-16.11:43:56 |
|
All data comes from a single experiment.
|
msg6126 (view) |
Author: malte |
Date: 2017-01-16.11:34:50 |
|
Did you generate all data that you mention in your previous message in the same
experiment, or are they combined from different experiments? I'm asking because,
looking at several experiments that test the same revision, it looks like we're
seeing huge grid noise effects in some domains.
|
msg6125 (view) |
Author: jendrik |
Date: 2017-01-16.11:12:00 |
|
I think the tables comparing v5 and v6 show
* peak memory is almost unaffected by the change
* runtime is more or less unaffected in 32-bit mode: 28 domains have higher total_time score and 17 have lower total_time score.
* runtime increases in 64-bit mode: 11 domains have higher total_time score and 34 have lower total_time score.
Here are some relative scatter plots comparing "total_time" for v5 and v6:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-relative-scatter-blind-release32-issue693-v5-vs-issue693-v6-total_time.png
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-relative-scatter-blind-release64-issue693-v5-vs-issue693-v6-total_time.png
I'm surprised that only 64-bit is adversely affected, since we use utils::get_hash64() in both versions. The only difference I can see
is that we need to copy a bigger pointer when constructing PackedStateWrapper in 64-bit mode. However, I doubt that this can explain
the timing effects.
|
msg6111 (view) |
Author: jendrik |
Date: 2017-01-13.11:10:49 |
|
Here are the results for blind search using a 1 minute timeout. The only difference to "default" is
that the state registry uses the new hash function:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v5-vs-issue693-v6-release32.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v5-vs-issue693-v6-release64.html
|
msg6109 (view) |
Author: malte |
Date: 2017-01-13.00:09:48 |
|
In fact, even a 1-minute timeout would be enough.
|
msg6108 (view) |
Author: malte |
Date: 2017-01-13.00:09:25 |
|
A short timeout is enough. 32-bit and 64-bit sounds good, though.
|
msg6107 (view) |
Author: jendrik |
Date: 2017-01-13.00:06:47 |
|
For the following experiments, should we test both 32-bit and 64-bit or only one of
the two? Should I use a 5-minute or a 30-minute timeout? I'm asking because it
sounds like a lot of experiments.
|
msg6101 (view) |
Author: malte |
Date: 2017-01-12.22:31:13 |
|
Sounds good! I had a look at the experiment and at the pull request for v5; both
look good.
|
msg6099 (view) |
Author: jendrik |
Date: 2017-01-12.22:18:46 |
|
I chose a slightly different path, but ended up with the desired result. I have added the old hash
function to v4 and kept the new hash functions. The new hash functions are unused for now. This code
version is tagged "v5".
I have run an experiment with a search timeout of 5 minutes checking that nothing changes and I think
the results confirm this:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v5-base-vs-issue693-v5-release32.html
I'll change the hash function for the state registry next, since it is one of the most important
places where we hash.
|
msg6087 (view) |
Author: jendrik |
Date: 2017-01-12.00:06:58 |
|
Your plan sounds good, I'll try it tomorrow.
|
msg6086 (view) |
Author: malte |
Date: 2017-01-11.20:58:40 |
|
OK, I've now stared at the profiles for an hour or so; time to come up with a
plan. (I'm not sure if the kcachegrind numbers will be much more insightful; I
think any profile will be hard to analyze because we've changed so many places.)
To the extent that we can trust instruction counts to accurately reflect
runtime, we slow down from 18.8 billion units to 20.4 billion units.
Of the 1.6 billion difference, about 0.47 billion (so around 30% of the
additional time) seem to be attributable to hashing states in the state
registry, and 0.3 billion (around 20%) to hashing ScalarEvaluator pointers in
the evaluation context. But I wouldn't really trust these numbers because they
ignore cache locality, and also they account for only half of the penalty.
I'd really like to study this a bit further, and for this I think it would be
good not to change so many things at the same time. Looking at the diff between
v4 and v4-base, there are three main changes:
1. getting rid of the old hash functions
2. introducing the new hash functions and HashSet/HashMap aliases
3. using the new hash tables everywhere
I would like to split this up a bit so that we can test the performance impact
in smaller increments. My suggestion is to have both hash table implementations
available in this branch simultaneously, so that we can switch them out
individually more easily.
One possibility is to split off a new experimental branch from issue693-v4-base
where we proceed as follows:
1. Introduce the new hash functions, not by overwriting the existing hash.h
but by adding to it. Let's call the new aliases utils::JenkinsHashSet and
utils::JenkinsHashMap, and also introduce aliases in hash.h for our old hash
maps, say utils::OldHashSet and utils::OldHashMap or whatever (a rough sketch is at the end of this message).
2. Use OldHashSet and OldHashMap everywhere where we currently use hash_set and
hash_map.
3. Do a short experiment/profile to verify that this has no performance impact.
It shouldn't, but better safe than sorry because the above is quite a large code
change.
We're then in a position to test the impact of the new hash tables with
individual changes (one line in most cases) for each major point where we use
hash tables.
What do you think? Going from what we currently have to the state after step 2
shouldn't be too difficult, I hope. We'd basically need to globally search and
replace utils::HashMap and utils::HashSet by utils::OldHashMap and
utils::OldHashSet everywhere outside hash.h, and in hash.h we'd need to re-add
the old code and add the OldHashMap and OldHashSet aliases.
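A rough sketch of what the step-1 additions to hash.h could look like; the functor name JenkinsHash is a placeholder for whatever we end up defining:
    #include <unordered_map>
    #include <unordered_set>

    namespace utils {
    // New tables: explicitly pick a functor wrapping the Jenkins-based hashing
    // code ("JenkinsHash" is a placeholder name).
    template<typename T>
    using JenkinsHashSet = std::unordered_set<T, JenkinsHash<T>>;
    template<typename Key, typename Value>
    using JenkinsHashMap = std::unordered_map<Key, Value, JenkinsHash<Key>>;

    // Old tables: plain unordered containers that fall back on the re-added
    // std::hash specializations from the old hash.h.
    template<typename T>
    using OldHashSet = std::unordered_set<T>;
    template<typename Key, typename Value>
    using OldHashMap = std::unordered_map<Key, Value>;
    }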
|
msg6083 (view) |
Author: malte |
Date: 2017-01-11.19:49:38 |
|
I had a look. We're optimizing rather low-level code here, so I'm worried that
the simulation options used for these profiles were too coarse. I assume you
used callgrind? (qcachegrind showed me that it estimated the number of clock
cycles as the number of instructions read. I think it would only do that if
callgrind was used, not if cachegrind was used.) Can you create updated profiles
with cachegrind?
|
msg6082 (view) |
Author: malte |
Date: 2017-01-11.19:35:00 |
|
I'm looking at the profiles at the moment -- don't know yet.
|
msg6081 (view) |
Author: jendrik |
Date: 2017-01-11.19:34:27 |
|
What's your suggestion for how we continue here?
|
msg6080 (view) |
Author: malte |
Date: 2017-01-11.19:16:37 |
|
One factor that influences whether a function is inlined is how large it is. I
assume the old hash functions were much smaller than the new ones.
|
msg6069 (view) |
Author: jendrik |
Date: 2017-01-11.11:49:04 |
|
I have profiled miconic:s8-4.pddl for both v4 and v4-base for blind
search. I'll attach the resulting profiles. Looking at the children of
EagerSearch::step(), which uses almost all of the time, we find that
only the following functions use more relative time in v4 compared to
v4-base:
1) get_successor_state: 24.15% -> 24.55%
hash packed states
2) OpenList::insert: 4.27% -> 5.02%
hash ScalarEvaluator pointers for HeuristicCache
3) utils::HashSet::count(): ?% -> 3.39%
hash GlobalOperator pointers for testing preferredness
It seems for 3) the function is inlined in the old code version, but
not in the new one. I have no idea why that happens though. Since the
hash set of preferred operators is always empty, it shouldn't make a
(big) difference anyway.
|
msg6062 (view) |
Author: malte |
Date: 2017-01-10.18:34:29 |
|
Thanks for the update! I think this confirms the suspicion that for very simple
types and hash functions used in inner loops, with little going on apart
from adding entries to the hash tables, the overhead of the new hash functions can
hurt us, which makes sense. We hash everything as if it were a generic sequence,
and for very small keys, the relative overhead for this is high.
At the end, we may have to make a decision between hash quality and hashing
speed. But let's gather more data first.
|
msg6060 (view) |
Author: jendrik |
Date: 2017-01-10.18:29:12 |
|
Before reading your last comment, I tried the following:
I had a closer look at the results. The slowdown is very pronounced for the
systematic patterns. E.g., the total_time for parcprinter-opt11-strips tasks gets
doubled in 32-bit mode. I profiled the systematic pattern generation on p05.pddl
from that domain. It turns out that almost all of the time before the search starts
is spent doing dominance pruning, i.e., in the function collection_dominates() in
dominance_pruning.cc. The only hashing that is done there is to hash pairs of
pointers. If it had been a primitive type with an obvious hash value (identity)
that was hashed, I would have brought up adding template specializations for
primitive types for utils::get_hash(), as sketched at the end of this message (but I wouldn't have liked that solution).
Since it's hashing pointer pairs that's causing a slowdown, I'm not sure how to
proceed.
I'll profile a miconic task for blind search now.
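For clarity, the rejected specialization idea would look roughly like this, assuming utils::get_hash() is a plain function template taking a const reference:
    #include <cstddef>

    namespace utils {
    // Hypothetical sketch of the rejected idea: make get_hash the identity for
    // primitive types to skip the generic feed/combine machinery.
    template<>
    inline std::size_t get_hash(const int &value) {
        return static_cast<std::size_t>(value);
    }
    }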
|
msg6059 (view) |
Author: malte |
Date: 2017-01-10.18:15:21 |
|
The performance is a bit worse than I'd hope for. I'd rather investigate this a
bit more closely before merging. I suggest focusing on blind search first, since
it should be the configuration where it should be easiest to pinpoint where the
performance change comes from (as it only uses few planner components).
For blind search, a good domain to focus on is miconic because it is the domain
with the strongest negative effect on score_total_time by far (although this
also has to do with the fact that it has so many tasks), and because the domain
is quite simple and the tasks scale in a straightforward way.
On the larger tasks, blind search runtime increases by around 10-15% in Miconic.
My first instinct is that with blind search, the only important place where the
new hash code comes into play is hashing for the state registry, and the most
obvious explanation would be that the new hash function is slower than the old
one. I'd like to confirm whether or not this is true, and if this is the case, I
would like to see if this is because the Jenkins hash is intrinsically slower,
or because there is an overhead due to our design with HashState, feed and
friends. (I'd hope that almost all of this, but not quite all of this, would be
optimized away.)
Can you run a profile on a suitable Miconic task and see if that generates any
insights?
|
msg6034 (view) |
Author: jendrik |
Date: 2017-01-08.10:37:03 |
|
Here are the results of the experiment comparing the new utils::HashMap and utils::HashSet (v4) to
the ancestor revision on the default branch (v4-base):
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v4-base-vs-issue693-v4-release32.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v4-base-vs-issue693-v4-release64.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-sat-issue693-v4-base-vs-issue693-v4-release32.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-sat-issue693-v4-base-vs-issue693-v4-release64.html
I'd say memory looks ok for 32-bit and 64-bit. However, the code seems to get a little slower on average.
|
msg6008 (view) |
Author: malte |
Date: 2017-01-03.17:09:18 |
|
Sounds good! I've added a few more comments.
|
msg6007 (view) |
Author: jendrik |
Date: 2017-01-03.16:36:11 |
|
OK, I'll wait with the experiments until we're happy with the code.
|
msg6006 (view) |
Author: malte |
Date: 2017-01-03.15:52:35 |
|
I think a new experiment comparing v3 to its ancestor on default would be good;
I don't think we can meaningfully interpret these numbers.
BTW, we should really use different configurations for satisficing search, as
none of these configurations is a good one. lmcount is a poor heuristic on its
own, and no configuration without preferred operators is competitive with a
configuration with preferred operators. I would include --alias lama-first and
then 2-3 variations of eager_greedy or lazy_greedy with preferred operators. For
example, eager_greedy with FF and preferred operators, lazy_greedy with FF and
preferred operators, and perhaps one more configuration using cg, cea or add.
|
msg6005 (view) |
Author: jendrik |
Date: 2017-01-03.15:43:46 |
|
I have run experiments comparing the new hashing functions before (v2) and after the addition of
the HashState class (v3). Unfortunately, I missed that in between the two versions the default
branch was merged into the issue branch. If you want, I can run experiments again comparing the
last "default" revision that was merged into the issue branch against the last revision on the
issue branch.
In any case, here are the results:
opt:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v2-vs-issue693-v3-release32.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v2-vs-issue693-v3-release64.html
sat:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-sat-issue693-v2-vs-issue693-v3-release32.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-sat-issue693-v2-vs-issue693-v3-release64.html
Here is the diff between the two revisions:
https://bitbucket.org/jendrikseipp/downward/branches/compare/issue693-v3..issue693-v2#diff
|
msg6004 (view) |
Author: malte |
Date: 2017-01-03.15:42:15 |
|
Sounds good!
In the message, perhaps you can show what the alternatives are. For example, we
cannot write "unordered_set<vector<int>>" because std::hash doesn't exist for
std::vector<int>.
One alternative is to define our own template and write
"utils::hash_set<vector<int>>". The other alternative is to manually specify a
hash function for unordered_set. It would be good to include what this would
look like in the message to downward-dev.
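A rough illustration of the two alternatives at a use site; utils::hash_set and the explicit functor utils::Hash are placeholders for whatever we end up defining:
    #include <unordered_set>
    #include <vector>

    void example() {
        // Alternative 1: our own alias picks the right hash functor for us.
        utils::hash_set<std::vector<int>> closed;
        closed.insert({1, 2, 3});

        // Alternative 2: spell the hash functor out at every use site.
        std::unordered_set<std::vector<int>, utils::Hash<std::vector<int>>> closed_explicit;
        closed_explicit.insert({1, 2, 3});
    }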
|
msg6003 (view) |
Author: jendrik |
Date: 2017-01-03.15:28:25 |
|
OK, then the only question is whether we want to add utils::hash_set and
utils::hash_map templates (see msg5998), right? I'll write to downward-dev about
this.
|
msg6002 (view) |
Author: malte |
Date: 2017-01-03.15:23:23 |
|
Yes, this rules out defining std::hash for types like vector<int> or
pair<int,int>. I've also looked at other sources to confirm this.
|
msg6001 (view) |
Author: jendrik |
Date: 2017-01-03.15:11:43 |
|
http://en.cppreference.com/w/cpp/language/extending_std: "It is allowed to add
template specializations for any standard library template to the namespace std
only if the declaration depends on a user-defined type ..."
If I understand this correctly, we're not allowed to specialize std::hash for
types like vector<int>. Do you agree?
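For example, the kind of specialization this rules out, since it depends only on standard library types (the body is just an arbitrary combiner):
    #include <cstddef>
    #include <functional>
    #include <vector>

    // Not covered by the exception quoted above: no user-defined type involved.
    namespace std {
    template<>
    struct hash<std::vector<int>> {
        size_t operator()(const std::vector<int> &vec) const {
            size_t result = 0;
            for (int value : vec)
                result = result * 31 + hash<int>()(value);
            return result;
        }
    };
    }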
|
msg6000 (view) |
Author: malte |
Date: 2017-01-02.19:38:53 |
|
> Feel free to add the high-level documentation.
Done. If we stop providing a catch-all implementation for std::hash (see
msg5998), then the last comment in hash.h needs to be updated.
|
msg5999 (view) |
Author: jendrik |
Date: 2017-01-02.19:12:14 |
|
All I could find is http://en.cppreference.com/w/cpp/language/extending_std . Not
sure whether it explains the error we're seeing though.
|
msg5998 (view) |
Author: malte |
Date: 2017-01-02.19:01:36 |
|
I was wondering earlier whether it would be preferable not to inject our hash functions
into the std namespace. Then the error should go away, and we can be sure that
we don't accidentally use a potentially poor stdlib hash in places where we want
to use our own hash.
Not using std::hash would mean that we have to specify the extra template
arguments when using std::unordered_set or std::unordered_map, but we could
encapsulate this by providing our own template that picks the right arguments.
For example, we could call our template utils::hash_set and utils::hash_map.
If we want to go this route, I'd suggest discussing it briefly on downward-dev.
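A minimal sketch of that encapsulation, with the suggested alias names; the Hash functor and the get_hash() entry point are assumptions about how our own hashing code would be exposed:
    #include <cstddef>
    #include <unordered_map>
    #include <unordered_set>

    namespace utils {
    // Functor that forwards to our own hashing code instead of std::hash.
    template<typename T>
    struct Hash {
        std::size_t operator()(const T &value) const {
            return get_hash(value);  // assumed entry point of our hash code
        }
    };

    // Aliases pick the functor as the extra template argument, so call sites
    // stay as short as with the std containers.
    template<typename T>
    using hash_set = std::unordered_set<T, Hash<T>>;

    template<typename Key, typename Value>
    using hash_map = std::unordered_map<Key, Value, Hash<Key>>;
    }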
|
msg5997 (view) |
Author: jendrik |
Date: 2017-01-02.18:54:44 |
|
It compiles on gcc 4.8 (bitbucket pipelines) and gcc 5.4 (my laptop). Judging by
the error message, it seems that we're not allowed to overwrite the std::hash struct
in newer gcc versions. I'll do a Google search for this.
|
msg5996 (view) |
Author: malte |
Date: 2017-01-02.18:45:10 |
|
The current code (700f56c30ef6) doesn't compile on my machine at home (gcc
6.2.0). Does it compile for you? I'm attaching what the build process writes to
stderr.
|
msg5968 (view) |
Author: jendrik |
Date: 2016-12-21.21:00:41 |
|
Experiments are running. Feel free to add the high-level documentation.
|
msg5964 (view) |
Author: jendrik |
Date: 2016-12-21.16:59:10 |
|
I updated the pull request with the implementation we discussed. Once you're happy
with the implementation I can run experiments.
|
msg5899 (view) |
Author: malte |
Date: 2016-12-17.03:50:21 |
|
Are you sure it only works well on 64-bit architectures? From the blog
post/description, I only see that it requires 64-bit math, but that's generally
supported in C++11. 64-bit math can be quite inefficient on a 32-bit
architecture, and we would probably see this in the runtime with a 32-bit
compile, but we're planning to discontinue 32-bit compiles, and x64 has been
around for 13 years now.
But actually, reading more about this, I wonder whether all this (either lookup3
or SpookyHash) isn't complete overkill anyway, and whether we might be better served
with something simple such as the one-at-a-time hash, which would have much
lower implementation complexity and could fit well into our current framework of
combining hash functions for different types.
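For reference, the one-at-a-time hash is only a few lines (public domain, from burtleburtle.net):
    #include <cstddef>

    // Jenkins's one-at-a-time hash: consumes the key one byte at a time.
    unsigned int one_at_a_time_hash(const unsigned char *key, std::size_t length) {
        unsigned int hash = 0;
        for (std::size_t i = 0; i < length; ++i) {
            hash += key[i];
            hash += hash << 10;
            hash ^= hash >> 6;
        }
        hash += hash << 3;
        hash ^= hash >> 11;
        hash += hash << 15;
        return hash;
    }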
> I'd propose to focus on the hash function for the state registry in this issue
> and consider 64-bit alternatives once we've made the jump to 64-bit compiles.
> What do you think?
It's not particularly clean to use two completely different approaches to
hashing in two different parts of the code. If we think our current default hash
distributes keys badly, I'm not sure it's a good idea to use it everywhere in
the code. So this would make me a bit sad, but I would probably survive it. ;-)
|
msg5897 (view) |
Author: jendrik |
Date: 2016-12-16.19:18:46 |
|
Here is the link for comparing v1 and v2:
https://bitbucket.org/jendrikseipp/downward/branches/compare/issue693-v2..issue693-v1#diff
I already saw SpookyHash when looking for suitable hash functions. Unfortunately, it only works well on 64-bit
architectures (see SpookyHash homepage and http://blog.reverberate.org/2012/01/state-of-hash-functions-2012.html).
I'd propose to focus on the hash function for the state registry in this issue and consider 64-bit alternatives once
we've made the jump to 64-bit compiles. What do you think?
|
msg5896 (view) |
Author: malte |
Date: 2016-12-16.16:32:00 |
|
Regarding the unsigned int vs. size_t question, after some more thought I think
it would actually be much more future-proof if we computed a 64-bit (or larger)
hash rather than a 32-bit hash. A 64-bit hash value can be used for a 32-bit
hash table, but not vice versa.
I'm sorry to suggest extra work, but I think it would be good if our hash
functions could at least in principle support large hash tables, even if our
state registries are currently limited to 2^31 - 1 states. It is not that
unlikely that a few years down the road we will look into external search
algorithms or will occasionally want to run experiments with significantly more
RAM. And a 64-bit hash function would also enable us to continue using the
standard library's hash tables in places where we don't want to customize.
The author of the hash function we're using writes "More recently I published
SpookyHash, which is many times faster than lookup3 on 64-bit platforms, even
for short keys." SpookyHash provides a 128-bit hash that can also be used as a
32-bit hash or 64-bit hash. See
http://www.burtleburtle.net/bob/hash/index.html#lookup.
As an added bonus, it looks like SpookyHash can support potentially
non-contiguous data nicely via the Init/Update/Final methods. This would allow
us to implement hash functions for something like vector<FactPair> without
non-portable reinterpret_casts, and I think we could not do that with lookup3.
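A sketch of what that could look like for vector<FactPair>, assuming the SpookyHash class interface from the linked page and FactPair's int members var and value:
    #include <cstdint>
    #include <vector>
    #include "SpookyV2.h"  // Bob Jenkins's SpookyHash V2

    uint64_t hash_fact_pairs(const std::vector<FactPair> &facts) {
        SpookyHash state;
        state.Init(0, 0);  // two 64-bit seeds
        for (const FactPair &fact : facts) {
            // Feed the members individually; the data does not have to be one
            // contiguous byte buffer, so no reinterpret_cast is needed.
            state.Update(&fact.var, sizeof(fact.var));
            state.Update(&fact.value, sizeof(fact.value));
        }
        uint64_t hash1, hash2;
        state.Final(&hash1, &hash2);
        return hash1;
    }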
|
msg5895 (view) |
Author: malte |
Date: 2016-12-16.16:11:37 |
|
Is there a way to look at a diff of v1 vs. v2 in bitbucket?
The numbers in the experiment look OK to me. I'm not sure the experimental
trends are more than noise. For example, if you compare score_total_time of base
and v2 domain by domain, then base wins 26 times and v2 wins 18 times. That's not
a big deal.
|
msg5889 (view) |
Author: jendrik |
Date: 2016-12-14.21:21:21 |
|
The old hash_combine() function is currently used to hash
- State objects (used by diverse_potentials())
- patterns
- vector<FactPair> in DTGFactory
Our old hash functions in hash.h operate on size_t, but the new hash function
uses unsigned int. How should we go about this?
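For reference, the old hash_combine() is essentially the well-known Boost recipe:
    #include <cstddef>
    #include <functional>

    // Mixes one more value into a running size_t seed, delegating the
    // per-element hash to std::hash.
    template<typename T>
    inline void hash_combine(std::size_t &hash, const T &value) {
        std::hash<T> hasher;
        hash ^= hasher(value) + 0x9e3779b9 + (hash << 6) + (hash >> 2);
    }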
|
msg5888 (view) |
Author: jendrik |
Date: 2016-12-14.18:59:33 |
|
I'll look into where our hash function is currently used.
Here are the experiment results:
v1: hash function as in issue213-v5.
v2: hash function as in issue213-v5, but adapted to our code style.
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v2-blind-issue693-base-issue693-v1-compare.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v2-blind-issue693-base-issue693-v2-compare.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v2-blind-issue693-v1-issue693-v2-compare.html
|
msg5884 (view) |
Author: malte |
Date: 2016-12-14.17:50:43 |
|
I'm done commenting on bitbucket for now, but one more major comment: rather
than adding a new hash function, I'd prefer getting rid of our old hash function
entirely, replacing it with the new one. Is that possible? (I guess it depends
on which types of objects we currently hash.)
|
msg5879 (view) |
Author: jendrik |
Date: 2016-12-14.10:12:40 |
|
I'll run experiments now.
|
msg5878 (view) |
Author: jendrik |
Date: 2016-12-14.10:12:19 |
|
I made a pull request at https://bitbucket.org/jendrikseipp/downward/pull-requests/66
|
msg5877 (view) |
Author: jendrik |
Date: 2016-12-14.09:37:54 |
|
Split off from issue213.
|