Issue693

Title: new hash function
Priority: feature
Status: resolved
Superseder: (none)
Nosy List: jendrik, malte
Assigned To: jendrik
Keywords: (none)
Optional summary: (none)

Created on 2016-12-14.09:37:54 by jendrik, last changed 2017-07-09.12:11:50 by malte.

Files
File name      Uploaded                      Type
compile.err    malte, 2017-01-02.18:45:10    application/octet-stream
v4-base-blind  jendrik, 2017-01-11.11:49:04  application/octet-stream
v4-blind       jendrik, 2017-01-11.11:49:12  application/octet-stream
Messages
msg6425 Author: malte Date: 2017-07-09.12:11:50
Thanks!
msg6424 Author: jendrik Date: 2017-07-09.12:05:05
Here are the results; I don't see a big effect:

http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v8-blind-issue693-v7-issue693-v8-compare.html

For completeness, here is the latest branch version (v8) vs. the default branch (v7-base):

http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v8-blind-issue693-v7-base-issue693-v8-compare.html

Merged and pushed.
msg6423 Author: malte Date: 2017-07-08.15:38:25
Looks good. I'd be interested in seeing another set of experiments (blind search
only is enough) to see if the change without the wrapper class led to a speedup.
But you can merge either way.
msg6419 Author: jendrik Date: 2017-07-07.19:45:28
I'm done handling your comments. Can I merge this?
msg6418 Author: malte Date: 2017-07-07.18:44:59
Experimental results look good. I'm done with my bitbucket comments.
msg6412 Author: jendrik Date: 2017-07-06.09:18:50
The results are in:

http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v7-opt-issue693-v7-base-issue693-v7-compare.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v7-sat-issue693-v7-base-issue693-v7-compare.html
msg6410 Author: jendrik Date: 2017-07-06.00:08:56
I have made the changes we discussed offline and will run experiments now:
https://bitbucket.org/jendrikseipp/downward/pull-requests/66
msg6349 Author: malte Date: 2017-05-07.17:54:51
I'd like to have a closer look at the code and the experimental results, and also
see how they differ from machine to machine. There are a few other things in front
of this in the review queue, though.
msg6347 Author: jendrik Date: 2017-05-07.17:48:21
How do you propose to continue here?
msg6295 Author: jendrik Date: 2017-04-29.09:23:41
You can find the code under 
experiments/issue693/hash-microbenchmark 
in the repo linked below.
msg6286 Author: malte Date: 2017-04-28.21:35:00
Some very good-looking numbers mixed with some I find hard to understand.

Can I get the code for the microbenchmarks somewhere? I wonder how much it
changes from computer to computer.
msg6284 Author: jendrik Date: 2017-04-28.21:09:42
I implemented a version of HashState using the Boost hash function. Here are the
results (for each bitwidth I report the second iteration to account for
cache effects):

std::unordered_set:     hash_combine + Boost hash
FastHashSet:            HashState + Boost hash
BurtleBurtleHashSet:    hashword() from lookup3
utils::HashSet:         HashState + feed()
HashWordHash:           HashState + feed_ints()
SpookyV2HashSet:        SpookyHashV2


32-bit:

Running nothing 100000 times: 0.000201s

Running insert int into std::unordered_set 100000 times: 0.794704s
Running insert int into FastHashSet 100000 times: 0.788237s
Running insert int into utils::HashSet 100000 times: 0.886164s
Running insert int into SpookyHash 100000 times: 1.15538s

Running insert pair<int, int> into std::unordered_set 100000 times: 0.806538s
Running insert pair<int, int> into FastHashSet 100000 times: 0.807989s
Running insert pair<int, int> into utils::HashSet 100000 times: 0.849372s

Running insert vector<int> of size 1 into std::unordered_set 100000 times: 0.416037s
Running insert vector<int> of size 1 into FastHashSet 100000 times: 0.421279s
Running insert vector<int> of size 1 into BurtleBurtleHashSet 100000 times: 0.445918s
Running insert vector<int> of size 1 into utils::HashSet 100000 times: 0.450099s
Running insert vector<int> of size 1 into HashWordHash 100000 times: 0.445355s
Running insert vector<int> of size 1 into SpookyV2HashSet 100000 times: 0.735016s

Running insert vector<int> of size 10 into std::unordered_set 100000 times: 0.712605s
Running insert vector<int> of size 10 into FastHashSet 100000 times: 0.672364s
Running insert vector<int> of size 10 into BurtleBurtleHashSet 100000 times: 0.695451s
Running insert vector<int> of size 10 into utils::HashSet 100000 times: 0.735414s
Running insert vector<int> of size 10 into HashWordHash 100000 times: 0.695138s
Running insert vector<int> of size 10 into SpookyV2HashSet 100000 times: 1.23303s

Running insert vector<int> of size 100 into std::unordered_set 100000 times: 3.91018s
Running insert vector<int> of size 100 into FastHashSet 100000 times: 3.89545s
Running insert vector<int> of size 100 into BurtleBurtleHashSet 100000 times: 3.72564s
Running insert vector<int> of size 100 into utils::HashSet 100000 times: 3.92713s
Running insert vector<int> of size 100 into HashWordHash 100000 times: 3.75994s
Running insert vector<int> of size 100 into SpookyV2HashSet 100000 times: 4.67119s


64-bit:

Running nothing 100000 times: 0.000199s

Running insert int into std::unordered_set 100000 times: 0.767982s
Running insert int into FastHashSet 100000 times: 0.75542s
Running insert int into utils::HashSet 100000 times: 0.796044s
Running insert int into SpookyHash 100000 times: 0.837655s

Running insert pair<int, int> into std::unordered_set 100000 times: 0.774631s
Running insert pair<int, int> into FastHashSet 100000 times: 0.798033s
Running insert pair<int, int> into utils::HashSet 100000 times: 0.808774s

Running insert vector<int> of size 1 into std::unordered_set 100000 times: 0.440595s
Running insert vector<int> of size 1 into FastHashSet 100000 times: 0.418915s
Running insert vector<int> of size 1 into BurtleBurtleHashSet 100000 times: 0.445275s
Running insert vector<int> of size 1 into utils::HashSet 100000 times: 0.462981s
Running insert vector<int> of size 1 into HashWordHash 100000 times: 0.472938s
Running insert vector<int> of size 1 into SpookyV2HashSet 100000 times: 0.510124s

Running insert vector<int> of size 10 into std::unordered_set 100000 times: 0.662004s
Running insert vector<int> of size 10 into FastHashSet 100000 times: 0.728206s
Running insert vector<int> of size 10 into BurtleBurtleHashSet 100000 times: 0.686017s
Running insert vector<int> of size 10 into utils::HashSet 100000 times: 0.827219s
Running insert vector<int> of size 10 into HashWordHash 100000 times: 0.718663s
Running insert vector<int> of size 10 into SpookyV2HashSet 100000 times: 0.739045s

Running insert vector<int> of size 100 into std::unordered_set 100000 times: 3.78722s
Running insert vector<int> of size 100 into FastHashSet 100000 times: 4.2276s
Running insert vector<int> of size 100 into BurtleBurtleHashSet 100000 times: 3.89874s
Running insert vector<int> of size 100 into utils::HashSet 100000 times: 4.2328s
Running insert vector<int> of size 100 into HashWordHash 100000 times: 4.02751s
Running insert vector<int> of size 100 into SpookyV2HashSet 100000 times: 2.97786s
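The benchmark code itself is in the repo (see msg6295). For reference, here is a
minimal sketch of what such a timing harness might look like; the function name
and the key-scrambling scheme are illustrative, not the actual benchmark code:

#include <chrono>
#include <iostream>
#include <string>
#include <vector>

// Time inserting scrambled int vectors into a given hash set type.
// "HashSet" is instantiated with each of the container/hash-function
// pairings listed above.
template<typename HashSet>
void benchmark_insert_vectors(const std::string &name, int num_inserts,
                              int vector_size) {
    auto start = std::chrono::high_resolution_clock::now();
    HashSet hash_set;
    for (int i = 0; i < num_inserts; ++i) {
        // Scrambled keys avoid the cache-friendly sequential pattern
        // discussed in msg6129.
        int key = static_cast<int>((0xdeadbeef * i) ^ 0xfeedcafe);
        hash_set.insert(std::vector<int>(vector_size, key));
    }
    std::chrono::duration<double> duration =
        std::chrono::high_resolution_clock::now() - start;
    std::cout << "Running insert vector<int> of size " << vector_size
              << " into " << name << " " << num_inserts << " times: "
              << duration.count() << "s" << std::endl;
}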
msg6131 Author: malte Date: 2017-01-17.17:00:21
Thanks! For the next such experiment, I'd recommend running repeated rounds of
the whole suite rather than repeated rounds of each individual test, i.e., not:

for each test:
    run the test 10 times

but rather:

10 times:
    run each test

Otherwise we still run the risk of CPU frequency scaling kicking in too late for
the first tests.

But this is not necessary here, I think, as the data looks fine to me.


Agreed on SpookyHash. I think the next thing we could try is to speed up our
32-bit Jenkins' hash by reducing the overhead and evaluating it within the
context of this mini-benchmark, and if we're successful, put it back into the
planner. 

Discussing detailed suggestions on how to try to reduce the overhead would take
a lot of time, I think, so you can either go ahead with some ideas you have, or
we can discuss it next week when I'm back from Dagstuhl.
msg6130 Author: jendrik Date: 2017-01-17.15:50:05
I addressed your comments and repeated the experiment:


$ export DOWNWARD_BITWIDTH=32 && make && ./benchmark32 
Running nothing 10000 times: 2.7e-05s 1.6e-05s 1.6e-05s 1.7e-05s 1.6e-05s 1.8e-05s 1.8e-05s 1.7e-05s 1.7e-05s 1.7e-05s

Running insert int into std::unordered_set 10000 times: 0.081036s 0.079106s 0.079247s 0.084152s 0.08487s 0.08151s 0.083751s 0.080348s 0.079144s 0.07914s
Running insert int into utils::HashSet 10000 times: 0.084762s 0.085265s 0.084493s 0.0838s 0.08397s 0.085129s 0.085078s 0.086782s 0.104818s 0.087344s

Running insert pair<int, int> into std::unordered_set 10000 times: 0.08351s 0.082174s 0.081757s 0.083469s 0.085102s 0.084095s 0.08828s 0.086784s 0.087166s 0.08912s
Running insert pair<int, int> into utils::HashSet 10000 times: 0.094726s 0.097081s 0.093484s 0.09684s 0.105848s 0.088336s 0.085695s 0.087439s 0.092227s 0.098264s

Running insert vector<int> of size 1 into std::unordered_set 10000 times: 0.04596s 0.056363s 0.048646s 0.047186s 0.04645s 0.0438s 0.042544s 0.042643s 0.042583s 0.042839s
Running insert vector<int> of size 1 into utils::HashSet 10000 times: 0.048968s 0.04695s 0.054449s 0.048542s 0.047915s 0.047623s 0.056844s 0.063033s 0.057115s 0.057683s

Running insert vector<int> of size 10 into std::unordered_set 10000 times: 0.078425s 0.094143s 0.091946s 0.101197s 0.068526s 0.069803s 0.069481s 0.070252s 0.07039s 0.069613s
Running insert vector<int> of size 10 into utils::HashSet 10000 times: 0.079197s 0.078231s 0.078407s 0.081361s 0.077901s 0.075849s 0.075296s 0.074961s 0.075143s 0.076078s

Running insert vector<int> of size 100 into std::unordered_set 10000 times: 0.389071s 0.389709s 0.39034s 0.392658s 0.388745s 0.411826s 0.389423s 0.406438s 0.395687s 0.389923s
Running insert vector<int> of size 100 into utils::HashSet 10000 times: 0.41582s 0.417067s 0.414103s 0.413918s 0.411115s 0.415822s 0.413745s 0.411823s 0.41271s 0.438373s

Running insert vector<int> of size 101 into std::unordered_set 10000 times: 0.400385s 0.392617s 0.394046s 0.39738s 0.39478s 0.391783s 0.393181s 0.399569s 0.393019s 0.392476s
Running insert vector<int> of size 101 into utils::HashSet 10000 times: 0.419522s 0.418543s 0.423371s 0.423999s 0.427279s 0.433636s 0.423573s 0.423132s 0.445995s 0.431476s


$ export DOWNWARD_BITWIDTH=64 && make && ./benchmark64
Running nothing 10000 times: 3.2e-05s 3.5e-05s 2.5e-05s 2.5e-05s 2e-05s 2.5e-05s 2.2e-05s 2.4e-05s 2.5e-05s 2.5e-05s

Running insert int into std::unordered_set 10000 times: 0.077637s 0.075893s 0.075891s 0.076898s 0.075936s 0.076199s 0.075817s 0.075498s 0.076068s 0.082677s
Running insert int into utils::HashSet 10000 times: 0.079573s 0.079173s 0.078947s 0.078562s 0.078708s 0.079013s 0.079192s 0.078822s 0.079377s 0.078879s

Running insert pair<int, int> into std::unordered_set 10000 times: 0.077939s 0.077992s 0.077781s 0.079007s 0.077732s 0.077526s 0.077522s 0.077508s 0.078121s 0.078726s
Running insert pair<int, int> into utils::HashSet 10000 times: 0.083359s 0.081937s 0.081923s 0.081819s 0.090081s 0.08283s 0.081909s 0.082625s 0.081634s 0.082201s

Running insert vector<int> of size 1 into std::unordered_set 10000 times: 0.043533s 0.042524s 0.042537s 0.042486s 0.041758s 0.041832s 0.042071s 0.042023s 0.041996s 0.042037s
Running insert vector<int> of size 1 into utils::HashSet 10000 times: 0.052521s 0.049945s 0.050969s 0.05069s 0.05067s 0.050303s 0.050282s 0.050177s 0.050549s 0.05069s

Running insert vector<int> of size 10 into std::unordered_set 10000 times: 0.06806s 0.067649s 0.067984s 0.068276s 0.070085s 0.067969s 0.067595s 0.06761s 0.068054s 0.067954s
Running insert vector<int> of size 10 into utils::HashSet 10000 times: 0.085893s 0.084267s 0.084279s 0.084266s 0.084104s 0.084591s 0.084925s 0.084917s 0.084119s 0.084324s

Running insert vector<int> of size 100 into std::unordered_set 10000 times: 0.390046s 0.389598s 0.389963s 0.390719s 0.391212s 0.391607s 0.389507s 0.389408s 0.388558s 0.393362s
Running insert vector<int> of size 100 into utils::HashSet 10000 times: 0.429928s 0.429662s 0.431475s 0.428931s 0.430573s 0.431571s 0.430258s 0.432028s 0.429097s 0.431196s

Running insert vector<int> of size 101 into std::unordered_set 10000 times: 0.39213s 0.395047s 0.392967s 0.392531s 0.393602s 0.413045s 0.407416s 0.395272s 0.397204s 0.396622s
Running insert vector<int> of size 101 into utils::HashSet 10000 times: 0.435027s 0.435036s 0.434441s 0.438296s 0.435236s 0.436028s 0.442919s 0.434608s 0.435859s 0.43567s


I had a brief look at SpookyHash and think that due to its complexity it should only be a last resort. How do you propose to continue here?
msg6129 Author: malte Date: 2017-01-17.14:42:29
Oops, I seem to have accidentally deleted my message. Trying again:

Very useful, thanks!

1) I'm surprised that HashSet is faster for pair<int,int> than for int (in
32-bit mode). I think this is because the processor isn't warmed up at the start
of execution (CPU frequency scaling, cold caches). I suggest running 5-10 rounds
of everything (in one process) and checking that the numbers have stabilized
towards the end. It's also important to keep the background load the same
throughout execution.

2) I think the "bad" hash function benefits from the fact that the inserted
numbers are sequential in terms of cache locality. Instead of using "i", I
suggest using something scrambled like (0xdeadbeef * i) ^ 0xfeedcafe.

3) It's alright that the new hash function is slower than the old one; it's not
unexpected, as it scrambles the input better. We should of course still check if
there's overhead we can reduce.

4) Regarding 32-bit vs. 64-bit: if we want to be faster on 64-bit compiles, an
obvious avenue is to use Jenkins's 64-bit hash function (SpookyHash), which he
writes is ~3 times faster than lookup3. It's much more complex, though, and I
expect it will only be useful for longer hash keys, not for int, pair<int,int>
and similar types. (How useful it will be for vector<int> will depend on the
size of the vector -- the longer, the better.) I'd expect that we'd be able to
get much better performance for these by using special-purpose functions for
small keys that avoid much of the constant-time overhead of our generic functions.


> Interestingly, the time for hashing int vectors with utils::HashSet
> decreases from 3.3 seconds to 2.9 seconds if we cast
> the vector size to int before feeding it to the HashState.

This might simply be because it throws 4 bytes of information away, but for
something to work with arbitrarily large vectors, this isn't the right thing to
do. What are the runtimes if the vector is always one element longer?

For our use cases, I'd also be interested in seeing how hashing vectors of fixed
length behaves for some sample lengths, say 10, 100, 1000.
msg6128 Author: jendrik Date: 2017-01-17.12:18:23
I suspected that our Jenkins hashing code might be optimized for 32-bit compiles. To test
this, I wrote a microbenchmark (added to the repo):



$ export DOWNWARD_BITWIDTH=32 && make && ./benchmark32 
Running nothing 100000 times: 0.000481 seconds

Running insert int into std::unordered_set 100000 times: 0.716415 seconds
Running insert int into utils::HashSet 100000 times: 0.836888 seconds

Running insert pair<int, int> into std::unordered_set 100000 times: 0.818349 seconds
Running insert pair<int, int> into utils::HashSet 100000 times: 0.831098 seconds

Running insert vector<int> into std::unordered_set 100000 times: 3.2735 seconds
Running insert vector<int> into utils::HashSet 100000 times: 3.46468 seconds



$ export DOWNWARD_BITWIDTH=64 && make && ./benchmark64 
Running nothing 100000 times: 0.000596 seconds

Running insert int into std::unordered_set 100000 times: 0.746506 seconds
Running insert int into utils::HashSet 100000 times: 0.791947 seconds

Running insert pair<int, int> into std::unordered_set 100000 times: 0.778041 seconds
Running insert pair<int, int> into utils::HashSet 100000 times: 0.80977 seconds

Running insert vector<int> into std::unordered_set 100000 times: 2.70406 seconds
Running insert vector<int> into utils::HashSet 100000 times: 3.31512 seconds



Our new hash functions are slower than the old hash functions in all test cases on both 32-bit and 64-bit. Focusing on 
the last test case (inserting vector<int>, which is probably comparable to inserting PackedStateWrapper) we see that the 
runtimes of both hash functions are almost equal in 32-bit mode, but the old hash functions benefit much more from 
switching to 64-bit than the new ones. Do you think we can do something about this?

Interestingly, the time for hashing int vectors with utils::HashSet decreases from 3.3 seconds to 2.9 seconds if we cast 
the vector size to int before feeding it to the HashState.
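To make the last observation concrete, here is a rough sketch of the feed-based
design (HashState and feed follow the discussion in this issue, but the mixing
step is a placeholder, not the actual hash.h code):

#include <cstdint>
#include <vector>

// Stand-in for HashState: values are fed into mutable state, and a hash
// value is read off at the end.
struct HashState {
    std::uint32_t hash = 0;
    void feed(std::uint32_t value) {
        hash = (hash ^ value) * 0x9e3779b9u;  // placeholder mixing step
    }
};

void feed(HashState &hash_state, const std::vector<int> &vec) {
    // Feeding the size first avoids collisions when several collections are
    // fed into the same HashState. The speedup mentioned above comes from
    // feeding static_cast<int>(vec.size()) (4 bytes) instead of the full
    // 64-bit size_t.
    hash_state.feed(static_cast<std::uint32_t>(vec.size()));
    for (int value : vec)
        hash_state.feed(static_cast<std::uint32_t>(value));
}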
msg6127 Author: jendrik Date: 2017-01-16.11:43:56
All data comes from a single experiment.
msg6126 Author: malte Date: 2017-01-16.11:34:50
Did you generate all data that you mention in your previous message in the same
experiment, or are they combined from different experiments? I'm asking because,
looking at several experiments that test the same revision, it looks like we're
seeing huge grid noise effects in some domains.
msg6125 Author: jendrik Date: 2017-01-16.11:12:00
I think the tables comparing v5 and v6 show 

* peak memory is almost unaffected by the change
* runtime is more or less unaffected in 32-bit mode: 28 domains have higher total_time score and 17 have lower total_time score.
* runtime increases in 64-bit mode: 11 domains have higher total_time score and 34 have lower total_time score.

Here are some relative scatter plots comparing "total_time" for v5 and v6:

http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-relative-scatter-blind-release32-issue693-v5-vs-issue693-v6-total_time.png
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-relative-scatter-blind-release64-issue693-v5-vs-issue693-v6-total_time.png

I'm surprised that only 64-bit is adversely affected, since we use utils::get_hash64() in both versions. The only difference I can see 
is that we need to copy a bigger pointer when constructing PackedStateWrapper in 64-bit mode. However, I doubt that this can explain 
the timing effects.
msg6111 Author: jendrik Date: 2017-01-13.11:10:49
Here are the results for blind search using a 1-minute timeout. The only difference from "default" is
that the state registry uses the new hash function:

http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v5-vs-issue693-v6-release32.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v5-vs-issue693-v6-release64.html
msg6109 Author: malte Date: 2017-01-13.00:09:48
In fact, even a 1-minute timeout would be enough.
msg6108 Author: malte Date: 2017-01-13.00:09:25
A short timeout is enough. 32-bit and 64-bit sounds good, though.
msg6107 Author: jendrik Date: 2017-01-13.00:06:47
For the following experiments, should we test both 32-bit and 64-bit or only one of
the two? Should I use a 5-minute or 30-minute timeout? I'm asking because it
sounds like a lot of experiments.
msg6101 Author: malte Date: 2017-01-12.22:31:13
Sounds good! I had a look at the experiment and at the pull request for v5; both
look good.
msg6099 Author: jendrik Date: 2017-01-12.22:18:46
I chose a slightly different path, but ended up with the desired result. I have added the old hash 
function to v4 and kept the new hash functions. The new hash functions are unused for now. This code 
version is tagged "v5".

I have run an experiment with a search timeout of 5 minutes to check that nothing changes, and I think
the results confirm this:

http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v5-base-vs-issue693-v5-release32.html

I'll change the hash function for the state registry next, since it is one of the most important 
places where we hash.
msg6087 Author: jendrik Date: 2017-01-12.00:06:58
Your plan sounds good, I'll try it tomorrow.
msg6086 Author: malte Date: 2017-01-11.20:58:40
OK, I've now stared at the profiles for an hour or so; time to come up with a
plan. (I'm not sure if the kcachegrind numbers will be much more insightful; I
think any profile will be hard to analyze because we've changed so many places.)

To the extent that we can trust instruction counts to accurately reflect
runtime, we slow down from 18.8 billion units to 20.4 billion units.

Of the 1.6 billion difference, about 0.47 billion (so around 30% of the
additional time) seem to be attributable to hashing states in the state
registry, and 0.3 billion (around 20%) to hashing ScalarEvaluator pointers in
the evaluation context. But I wouldn't really trust these numbers because they
ignore cache locality, and also they account for only half of the penalty.

I'd really like to study this a bit further, and for this I think it would be
good not to change so many things at the same time. Looking at the diff between
v4 and v4-base, there are three main changes:

1. getting rid of the old hash functions
2. introducing the new hash functions and HashSet/HashMap aliases
3. using the new hash tables everywhere

I would like to split this up a bit so that we can test the performance impact
in smaller increments. My suggestion is to have both hash table implementations
available in this branch simultaneously, so that we can switch them out
individually more easily.

One possibility is to split off a new experimental branch from issue693-v4-base
where we proceed as follows:

1. Introduce the new hash functions, not by overwriting the existing hash.h,
but by adding to it. Let's call the new aliases utils::JenkinsHashSet and
utils::JenkinsHashMap, and also introduce aliases in hash.h for our old hash
maps, say utils::OldHashSet and utils::OldHashMap or whatever.

2. Use OldHashSet and OldHashMap everywhere we currently use hash_set and
hash_map.

3. Do a short experiment/profile to verify that this has no performance impact.
It shouldn't, but better safe than sorry because the above is quite a large code
change.

We're then in a position to test the impact of the new hash tables with
individual changes (one line in most cases) for each major point where we use
hash tables.

What do you think? Going from what we currently have to the state after step 2
shouldn't be too difficult, I hope. We'd basically need to globally search and
replace utils::HashMap and utils::HashSet by utils::OldHashMap and
utils::OldHashSet everywhere outside hash.h, and in hash.h we'd need to re-add
the old code and add the OldHashMap and OldHashSet aliases.
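A sketch of what the hash.h side of steps 1 and 2 might look like (the functor
bodies below are placeholders that delegate to std::hash; in the real code,
OldHash would wrap the existing hash_combine-based code and JenkinsHash the new
HashState-based code):

#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>

namespace utils {
template<typename T>
struct OldHash {
    std::size_t operator()(const T &value) const {
        return std::hash<T>()(value);  // placeholder
    }
};

template<typename T>
struct JenkinsHash {
    std::size_t operator()(const T &value) const {
        return std::hash<T>()(value);  // placeholder
    }
};

// With both sets of aliases available, switching one use site between the
// old and new tables is a one-line change.
template<typename T>
using OldHashSet = std::unordered_set<T, OldHash<T>>;
template<typename Key, typename Value>
using OldHashMap = std::unordered_map<Key, Value, OldHash<Key>>;
template<typename T>
using JenkinsHashSet = std::unordered_set<T, JenkinsHash<T>>;
template<typename Key, typename Value>
using JenkinsHashMap = std::unordered_map<Key, Value, JenkinsHash<Key>>;
}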
msg6083 Author: malte Date: 2017-01-11.19:49:38
I had a look. We're optimizing rather low-level code here, so I'm worried that
the simulation options used for these profiles were too coarse. I assume you
used callgrind? (qcachegrind showed me that it estimated the number of clock
cycles as the number of instructions read. I think it would only do that if
callgrind was used, not if cachegrind was used.) Can you create updated profiles
with cachegrind?
msg6082 Author: malte Date: 2017-01-11.19:35:00
I'm looking at the profiles at the moment -- don't know yet.
msg6081 Author: jendrik Date: 2017-01-11.19:34:27
What's your suggestion for how we continue here?
msg6080 Author: malte Date: 2017-01-11.19:16:37
One factor that influences whether a function is inlined is how large it is. I
assume the old hash functions were much smaller than the new ones.
msg6069 Author: jendrik Date: 2017-01-11.11:49:04
I have profiled miconic:s8-4.pddl for both v4 and v4-base for blind
search. I'll attach the resulting profiles. Looking at the children of
EagerSearch::step(), which uses almost all of the time, we find that
only the following functions use more relative time in v4 compared to
v4-base:

1) get_successor_state:        24.15% -> 24.55%
   hash packed states
2) OpenList::insert:           4.27% -> 5.02%
   hash ScalarEvaluator pointers for HeuristicCache
3) utils::HashSet::count():    ?% -> 3.39%
   hash GlobalOperator pointers for testing preferredness

It seems for 3) the function is inlined in the old code version, but
not in the new one. I have no idea why that happens though. Since the
hash set of preferred operators is always empty, it shouldn't make a
(big) difference anyway.
msg6062 Author: malte Date: 2017-01-10.18:34:29
Thanks for the update! I think this confirms the suspicion that for very simple
types and hash functions that are used in inner loops with little going on apart
from adding stuff to the hash tables, the overhead of the new hash functions can
hurt us, which makes sense. We hash everything as if it were a generic sequence,
and for very small keys, the relative overhead for this is high.

At the end, we may have to make a decision between hash quality and hashing
speed. But let's gather more data first.
msg6060 Author: jendrik Date: 2017-01-10.18:29:12
Before reading your last comment, I tried the following:

I had a closer look at the results. The slowdown is very pronounced for the 
systematic patterns. E.g., the total_time for parcprinter-opt11-strips tasks gets
doubled in 32-bit mode. I profiled the systematic pattern generation on p05.pddl 
from that domain. It turns out that almost all of the time before the search starts 
is spent doing dominance pruning, i.e., in the function collection_dominates() in 
dominance_pruning.cc. The only hashing that is done there is to hash pairs of 
pointers. If it had been a primitive type with an obvious hash value (identity) 
that was hashed, I would have brought up adding template specializations for 
primitive types for utils::get_hash() (but I wouldn't have liked that solution). 
Since it's hashing pointer pairs that's causing a slowdown, I'm not sure how to 
proceed.

I'll profile a miconic task for blind search now.
msg6059 Author: malte Date: 2017-01-10.18:15:21
The performance is a bit worse than I'd hope for. I'd rather investigate this a
bit more closely before merging. I suggest focusing on blind search first, since
it should be the configuration where it is easiest to pinpoint where the
performance change comes from (as it uses only a few planner components).

For blind search, a good domain to focus on is miconic because it is the domain
with the strongest negative effect on score_total_time by far (although this
also has to do with the fact that it has so many tasks), and because the domain
is quite simple and the tasks scale in a straightforward way.

On the larger tasks, blind search runtime increases by around 10-15% in Miconic.
My first instinct is that with blind search, the only important place where the
new hash code comes into play is hashing for the state registry, and the most
obvious explanation would be that the new hash function is slower than the old
one. I'd like to confirm whether or not this is true, and if this is the case, I
would like to see if this is because the Jenkins hash is intrinsically slower,
or because there is an overhead due to our design with HashState, feed and
friends. (I'd hope that almost all of this, but not quite all of this, would be
optimized away.)

Can you run a profile on a suitable Miconic task and see if that generates any
insights?
msg6034 Author: jendrik Date: 2017-01-08.10:37:03
Here are the results of the experiment comparing the new utils::HashMap and utils::HashSet (v4) to its 
ancestor on default (v4-base):

http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v4-base-vs-issue693-v4-release32.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v4-base-vs-issue693-v4-release64.html

http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-sat-issue693-v4-base-vs-issue693-v4-release32.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-sat-issue693-v4-base-vs-issue693-v4-release64.html

I'd say memory looks ok for 32-bit and 64-bit. However, the code seems to get a little slower on average.
msg6008 Author: malte Date: 2017-01-03.17:09:18
Sounds good! I've added a few more comments.
msg6007 Author: jendrik Date: 2017-01-03.16:36:11
OK, I'll hold off on the experiments until we're happy with the code.
msg6006 Author: malte Date: 2017-01-03.15:52:35
I think a new experiment comparing v3 to its ancestor on default would be good;
I don't think we can meaningfully interpret these numbers.

BTW, we should really use different configurations for satisficing search, as
none of these configurations is a good one. lmcount is a poor heuristic on its
own, and no configuration without preferred operators is competitive with a
configuration with preferred operators. I would include --alias lama-first and
then 2-3 variations of eager_greedy or lazy_greedy with preferred operators. For
example, eager_greedy with FF and preferred operators, lazy_greedy with FF and
preferred operators, and perhaps one more configuration using cg, cea or add.
msg6005 Author: jendrik Date: 2017-01-03.15:43:46
I have run experiments comparing the new hashing functions before (v2) and after the addition of 
the HashState class (v3). Unfortunately, I missed that in between the two versions the default 
branch was merged into the issue branch. If you want, I can run experiments again comparing the 
last "default" revision that was merged into the issue branch against the last revision on the 
issue branch. 

In any case, here are the results:

opt:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v2-vs-issue693-v3-release32.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-opt-issue693-v2-vs-issue693-v3-release64.html

sat:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-sat-issue693-v2-vs-issue693-v3-release32.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-sat-issue693-v2-vs-issue693-v3-release64.html

Here is the diff between the two revisions:

https://bitbucket.org/jendrikseipp/downward/branches/compare/issue693-v3..issue693-v2#diff
msg6004 Author: malte Date: 2017-01-03.15:42:15
Sounds good!

In the message, perhaps you can show what the alternatives are. For example, we
cannot write "unordered_set<vector<int>>" because std::hash doesn't exist for
std::vector<int>.

One alternative is to define our own template and write
"utils::hash_set<vector<int>>". The other alternative is to manually specify a
hash function for unordered_set. It would be good to include what this would
look like in the message to downward-dev.
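For illustration, the alternatives at a use site might look roughly like this
(utils::Hash and utils::hash_set are placeholder names for whatever hash.h
would provide):

#include <cstddef>
#include <unordered_set>
#include <vector>

namespace utils {
template<typename T>
struct Hash {
    std::size_t operator()(const T &value) const {
        return 0;  // placeholder for our Jenkins-based hash
    }
};
template<typename T>
using hash_set = std::unordered_set<T, Hash<T>>;
}

// std::unordered_set<std::vector<int>> s0;  // does not compile:
//                                           // no std::hash for vector<int>
std::unordered_set<std::vector<int>, utils::Hash<std::vector<int>>> s1;  // explicit
utils::hash_set<std::vector<int>> s2;  // alias picks the hash argument for us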
msg6003 Author: jendrik Date: 2017-01-03.15:28:25
OK, then the only question is whether we want to add utils::hash_set and 
utils::hash_map templates (see msg5998), right? I'll write to downward-dev about 
this.
msg6002 Author: malte Date: 2017-01-03.15:23:23
Yes, this rules out defining std::hash for types like vector<int> or
pair<int,int>. I've also looked at other sources to confirm this.
msg6001 Author: jendrik Date: 2017-01-03.15:11:43
http://en.cppreference.com/w/cpp/language/extending_std: "It is allowed to add 
template specializations for any standard library template to the namespace std 
only if the declaration depends on a user-defined type ..."

If I understand this correctly, we're not allowed to specialize the std::hash
struct for standard types like vector<int>. Do you agree?
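For illustration, this is the kind of specialization the rule forbids, since
std::vector<int> does not depend on a user-defined type:

#include <cstddef>
#include <functional>
#include <vector>

namespace std {
// Undefined behavior: the specialization does not involve a user-defined type.
template<>
struct hash<vector<int>> {
    size_t operator()(const vector<int> &vec) const {
        return 0;  // the body is irrelevant; declaring this is what the rule disallows
    }
};
}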
msg6000 Author: malte Date: 2017-01-02.19:38:53
> Feel free to add the high-level documentation.

Done. If we stop providing a catch-all implementation for std::hash (see
msg5998), then the last comment in hash.h needs to be updated.
msg5999 Author: jendrik Date: 2017-01-02.19:12:14
All I could find is http://en.cppreference.com/w/cpp/language/extending_std . Not 
sure whether it explains the error we're seeing though.
msg5998 Author: malte Date: 2017-01-02.19:01:36
I was wondering before whether it would be preferable not to inject our hash functions
into the std namespace. Then the error should go away, and we can be sure that
we don't accidentally use a potentially poor stdlib hash in places where we want
to use our own hash.

Not using std::hash would mean that we have to specify the extra template
arguments when using std::unordered_set or std::unordered_map, but we could
encapsulate this by providing our own template that picks the right arguments.
For example, we could call our template utils::hash_set and utils::hash_map. 

If we want to go this route, I'd suggest discussing it briefly on downward-dev.
msg5997 Author: jendrik Date: 2017-01-02.18:54:44
It compiles on gcc 4.8 (bitbucket pipelines) and gcc 5.4 (my laptop). Judging by 
the error message it seems that we're not allowed to overwrite the std::hash struct 
in newer gcc versions. I'll do a google search for this.
msg5996 Author: malte Date: 2017-01-02.18:45:10
The current code (700f56c30ef6) doesn't compile on my machine at home (gcc
6.2.0). Does it compile for you? I'm attaching what the build process writes to
stderr.
msg5968 Author: jendrik Date: 2016-12-21.21:00:41
Experiments are running. Feel free to add the high-level documentation.
msg5964 Author: jendrik Date: 2016-12-21.16:59:10
I updated the pull request with the implementation we discussed. Once you're happy 
with the implementation I can run experiments.
msg5899 Author: malte Date: 2016-12-17.03:50:21
Are you sure it only works well on 64-bit architectures? From the blog
post/description, I only see that it requires 64-bit math, but that's generally
supported in C++11. 64-bit math can be quite inefficient on a 32-bit
architecture, and we would probably see this in the runtime with a 32-bit
compile, but we're planning to discontinue 32-bit compiles, and x64 has been
around for 13 years now.

But actually, reading more about this, I wonder whether all this (either lookup3
or SpookyHash) isn't complete overkill anyway, and we might be better served
with something simple such as the one-at-a-time hash, which would have a much
lower implementation complexity and could fit well into our current framework of
combining functions for hashing different types.
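For reference, Jenkins' one-at-a-time hash is only a few lines (this is the
published algorithm; how it would be integrated into hash.h is a separate
question):

#include <cstddef>
#include <cstdint>

std::uint32_t one_at_a_time(const unsigned char *key, std::size_t length) {
    std::uint32_t hash = 0;
    for (std::size_t i = 0; i < length; ++i) {
        hash += key[i];
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}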

> I'd propose to focus on the hash function for the state registry in this issue
> and consider 64-bit alternatives once we've made the jump to 64-bit compiles.
> What do you think?

It's not particularly clean to use two completely different approaches to
hashing in two different parts of the code. If we think our current default hash
distributes keys badly, I'm not sure it's a good idea to use it everywhere in
the code. So this would make me a bit sad, but I would probably survive it. ;-)
msg5897 Author: jendrik Date: 2016-12-16.19:18:46
Here is the link for comparing v1 and v2:

https://bitbucket.org/jendrikseipp/downward/branches/compare/issue693-v2..issue693-v1#diff

I already saw SpookyHash when looking for suitable hash functions. Unfortunately, it only works well on 64-bit 
architectures (see SpookyHash homepage and http://blog.reverberate.org/2012/01/state-of-hash-functions-2012.html). 

I'd propose to focus on the hash function for the state registry in this issue and consider 64-bit alternatives once 
we've made the jump to 64-bit compiles. What do you think?
msg5896 Author: malte Date: 2016-12-16.16:32:00
Regarding the unsigned int vs. size_t question, after some more thought I think
it would actually be much more future-proof if we computed a 64-bit (or larger)
hash rather than a 32-bit hash. A 64-bit hash value can be used for a 32-bit
hash table, but not vice versa.

I'm sorry to suggest extra work, but I think it would be good if our hash
functions could at least in principle support large hash tables, even if our
state registries are currently limited to 2^31 - 1 states. It is not that
unlikely that a few years down the road we will look into external search
algorithms or will occasionally want to run experiments with significantly more
RAM. And a 64-bit hash function would also enable us to continue using the
standard library's hash tables in places where we don't want to customize.

The author of the hash function we're using writes "More recently I published
SpookyHash, which is many times faster than lookup3 on 64-bit platforms, even
for short keys." SpookyHash provides a 128-bit hash that can also be used as a
32-bit hash or 64-bit hash. See
http://www.burtleburtle.net/bob/hash/index.html#lookup.

As an added bonus, it looks like SpookyHash can support potentially
non-contiguous data nicely via the Init/Update/Final methods. This would allow
us to implement hash functions for something like vector<FactPair> without
non-portable reinterpret_casts, and I think we could not do that with lookup3.
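A rough sketch of how that might look, assuming FactPair consists of two ints
(var and value) and using the SpookyHash class from Jenkins' SpookyV2.h:

#include <vector>

#include "SpookyV2.h"  // Jenkins' reference implementation

struct FactPair {  // stand-in for the planner's type
    int var;
    int value;
};

uint64 hash_facts(const std::vector<FactPair> &facts) {
    SpookyHash state;
    state.Init(0, 0);  // two arbitrary fixed seeds
    for (const FactPair &fact : facts) {
        // Feeding the int members one by one needs no reinterpret_cast
        // and never reads struct padding.
        state.Update(&fact.var, sizeof(fact.var));
        state.Update(&fact.value, sizeof(fact.value));
    }
    uint64 hash1, hash2;  // uint64 as typedef'd in SpookyV2.h
    state.Final(&hash1, &hash2);
    return hash1;  // truncate to 32 bits where a 32-bit hash is needed
}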
msg5895 Author: malte Date: 2016-12-16.16:11:37
Is there a way to look at a diff of v1 vs. v2 in bitbucket?

The numbers in the experiment look OK to me. I'm not sure the experimental
trends are more than noise. For example, if you compare score_total_time of base
and v2 domain by domain, then base wins 26 time and v2 wins 18 times. That's not
a big deal.
msg5889 Author: jendrik Date: 2016-12-14.21:21:21
The old hash_combine() function is currently used to hash

- State objects (used by diverse_potentials())
- patterns
- vector<FactPair> in DTGFactory

Our old hash functions in hash.h operate on size_t, but the new hash function 
uses unsigned int. How should we go about this?
msg5888 Author: jendrik Date: 2016-12-14.18:59:33
I'll look into where our hash function is currently used.

Here are the experiment results:

v1: hash function as in issue213-v5.
v2: hash function as in issue213-v5, but adapted to our code style.

http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v2-blind-issue693-base-issue693-v1-compare.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v2-blind-issue693-base-issue693-v2-compare.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue693-v2-blind-issue693-v1-issue693-v2-compare.html
msg5884 Author: malte Date: 2016-12-14.17:50:43
I'm done commenting on bitbucket for now, but one more major comment: rather
than adding a new hash function, I'd prefer getting rid of our old hash function
entirely, replacing it with the new one. Is that possible? (I guess it depends
on which types of objects we currently hash.)
msg5879 Author: jendrik Date: 2016-12-14.10:12:40
I'll run experiments now.
msg5878 Author: jendrik Date: 2016-12-14.10:12:19
I made a pull request at https://bitbucket.org/jendrikseipp/downward/pull-requests/66
msg5877 Author: jendrik Date: 2016-12-14.09:37:54
Split off from issue213.
History
Date                 User     Action  Args
2017-07-09 12:11:50  malte    set     messages: + msg6425
2017-07-09 12:05:05  jendrik  set     status: reviewing -> resolved; messages: + msg6424
2017-07-08 15:38:25  malte    set     messages: + msg6423
2017-07-07 19:45:28  jendrik  set     status: testing -> reviewing; messages: + msg6419
2017-07-07 18:44:59  malte    set     messages: + msg6418
2017-07-06 09:18:50  jendrik  set     messages: + msg6412
2017-07-06 00:08:57  jendrik  set     messages: + msg6410; summary: TODO: use the mini-benchmark to reduce the overhead of the new hash function. ->
2017-05-07 17:54:51  malte    set     messages: + msg6349
2017-05-07 17:48:21  jendrik  set     messages: + msg6347
2017-04-29 09:23:41  jendrik  set     messages: + msg6295; summary: TODO: use the mini-benchmark to reduce the overhead of the new hash function. -> TODO: use the mini-benchmark to reduce the overhead of the new hash function.
2017-04-28 21:35:00  malte    set     messages: + msg6286
2017-04-28 21:09:42  jendrik  set     messages: + msg6284
2017-04-27 23:23:43  jendrik  set     summary: TODO: use the mini-benchmark to reduce the overhead of the new hash function.
2017-01-17 17:00:21  malte    set     messages: + msg6131
2017-01-17 15:50:05  jendrik  set     messages: + msg6130
2017-01-17 14:42:29  malte    set     messages: + msg6129
2017-01-17 12:18:23  jendrik  set     messages: + msg6128
2017-01-16 11:43:56  jendrik  set     messages: + msg6127
2017-01-16 11:34:50  malte    set     messages: + msg6126
2017-01-16 11:12:00  jendrik  set     messages: + msg6125
2017-01-13 11:10:49  jendrik  set     messages: + msg6111
2017-01-13 00:09:48  malte    set     messages: + msg6109
2017-01-13 00:09:25  malte    set     messages: + msg6108
2017-01-13 00:06:47  jendrik  set     messages: + msg6107
2017-01-12 22:31:13  malte    set     messages: + msg6101
2017-01-12 22:18:46  jendrik  set     messages: + msg6099
2017-01-12 00:06:58  jendrik  set     messages: + msg6087
2017-01-11 20:58:40  malte    set     messages: + msg6086
2017-01-11 19:49:39  malte    set     messages: + msg6083
2017-01-11 19:35:00  malte    set     messages: + msg6082
2017-01-11 19:34:27  jendrik  set     messages: + msg6081
2017-01-11 19:16:37  malte    set     messages: + msg6080
2017-01-11 11:49:12  jendrik  set     files: + v4-blind
2017-01-11 11:49:04  jendrik  set     files: + v4-base-blind; messages: + msg6069
2017-01-10 18:34:29  malte    set     messages: + msg6062
2017-01-10 18:29:12  jendrik  set     messages: + msg6060
2017-01-10 18:15:21  malte    set     messages: + msg6059
2017-01-08 10:37:03  jendrik  set     messages: + msg6034
2017-01-03 17:09:18  malte    set     messages: + msg6008
2017-01-03 16:36:11  jendrik  set     messages: + msg6007
2017-01-03 15:52:35  malte    set     messages: + msg6006
2017-01-03 15:43:46  jendrik  set     messages: + msg6005
2017-01-03 15:42:15  malte    set     messages: + msg6004
2017-01-03 15:28:25  jendrik  set     messages: + msg6003
2017-01-03 15:23:23  malte    set     messages: + msg6002
2017-01-03 15:11:43  jendrik  set     messages: + msg6001
2017-01-02 19:38:53  malte    set     messages: + msg6000
2017-01-02 19:12:14  jendrik  set     messages: + msg5999
2017-01-02 19:01:36  malte    set     messages: + msg5998
2017-01-02 18:54:44  jendrik  set     messages: + msg5997
2017-01-02 18:45:10  malte    set     files: + compile.err; messages: + msg5996
2016-12-21 21:00:53  jendrik  set     status: reviewing -> testing
2016-12-21 21:00:41  jendrik  set     messages: + msg5968
2016-12-21 20:05:13  jendrik  set     title: new hash function for state registry -> new hash function
2016-12-21 16:59:10  jendrik  set     messages: + msg5964
2016-12-17 03:50:21  malte    set     messages: + msg5899
2016-12-16 19:18:46  jendrik  set     messages: + msg5897
2016-12-16 16:32:00  malte    set     messages: + msg5896
2016-12-16 16:11:37  malte    set     messages: + msg5895
2016-12-14 21:21:21  jendrik  set     messages: + msg5889
2016-12-14 18:59:34  jendrik  set     messages: + msg5888
2016-12-14 17:50:43  malte    set     messages: + msg5884
2016-12-14 10:12:40  jendrik  set     messages: + msg5879
2016-12-14 10:12:19  jendrik  set     status: unread -> reviewing; messages: + msg5878
2016-12-14 09:37:54  jendrik  create