Issue731

Title: remove legacy hash functions
Priority: feature
Status: deferred
Superseder:
Nosy List: jendrik, malte
Assigned To: jendrik
Keywords:

Created on 2017-07-06.00:35:45 by jendrik, last changed by jendrik.

Summary
Blocked by issue735.
Files
test_hash.cc, uploaded by malte on 2017-09-01.14:52:31 (text/x-c++src)
Messages
msg6496 (view) Author: malte Date: 2017-09-01.16:55:10
I've looked at the dominance pruning code, and I think there are a few things
that could be tried. I'll comment some more over at issue735.
msg6494 (view) Author: jendrik Date: 2017-09-01.16:49:46
Interesting! I pushed your code to the repo. Where do we go from here?
msg6493 (view) Author: malte Date: 2017-09-01.15:52:24
> The new test code should already be pushed.

Thanks! With larger hash sets (10M entries), I get a much larger effect, and it
is again larger if we do 10 rounds of lookups for every key after the
insertions. For the sequential insert-then-read test, the old hash function is
more than 10 times faster on my machine:

Running nothing 1 times: 1e-06s

Running insert sequential int with BoostHash 1 times: 0.885316s
Running insert sequential int with BurtleFeed 1 times: 5.48213s

Running insert scrambled int with BoostHash 1 times: 6.22216s
Running insert scrambled int with BurtleFeed 1 times: 6.28391s

Running insert, then read sequential int with BoostHash 1 times: 3.14857s
Running insert, then read sequential int with BurtleFeed 1 times: 34.718s

Running insert, then read scrambled int with BoostHash 1 times: 20.511s
Running insert, then read scrambled int with BurtleFeed 1 times: 25.3854s

Running nothing 1 times: 0s

Running insert sequential int with BoostHash 1 times: 1.85816s
Running insert sequential int with BurtleFeed 1 times: 5.49561s

Running insert scrambled int with BoostHash 1 times: 6.29328s
Running insert scrambled int with BurtleFeed 1 times: 6.40746s

Running insert, then read sequential int with BoostHash 1 times: 3.15231s
Running insert, then read sequential int with BurtleFeed 1 times: 35.194s

Running insert, then read scrambled int with BoostHash 1 times: 20.461s
Running insert, then read scrambled int with BurtleFeed 1 times: 25.3585s


For reference, here is the experiment code I used (no changes outside main):

int main(int, char **) {
    const int REPETITIONS = 2;
    const int NUM_CALLS = 1;
    const int NUM_INSERTIONS = 10000000;
    const int NUM_READ_PASSES = 10;

    for (int i = 0; i < REPETITIONS; ++i) {
        benchmark("nothing", NUM_CALLS, [] () {});
        cout << endl;

        benchmark("insert sequential int with BoostHash", NUM_CALLS,
                  [&]() {
            unordered_set<int> s;
            for (int i = 0; i < NUM_INSERTIONS; ++i) {
                s.insert(i);
            }
        });
        benchmark("insert sequential int with BurtleFeed", NUM_CALLS,
                  [&]() {
            utils::HashSet<int> s;
            for (int i = 0; i < NUM_INSERTIONS; ++i) {
                s.insert(i);
            }
        });
        cout << endl;

        benchmark("insert scrambled int with BoostHash", NUM_CALLS,
                  [&]() {
            unordered_set<int> s;
            for (int i = 0; i < NUM_INSERTIONS; ++i) {
                s.insert(scramble(i));
            }
        });
        benchmark("insert scrambled int with BurtleFeed", NUM_CALLS,
                  [&]() {
            utils::HashSet<int> s;
            for (int i = 0; i < NUM_INSERTIONS; ++i) {
                s.insert(scramble(i));
            }
        });
        cout << endl;

            benchmark("insert, then read sequential int with BoostHash", NUM_CALLS,
                  [&]() {
            unordered_set<int> s;
            for (int i = 0; i < NUM_INSERTIONS; ++i) {
                s.insert(i);
            }
            for (int j = 0; j < NUM_READ_PASSES; ++j) {
                for (int i = 0; i < NUM_INSERTIONS; ++i) {
                    s.count(i);
                }
            }
        });
        benchmark("insert, then read sequential int with BurtleFeed", NUM_CALLS,
                  [&]() {
            utils::HashSet<int> s;
            for (int i = 0; i < NUM_INSERTIONS; ++i) {
                s.insert(i);
            }
            for (int j = 0; j < NUM_READ_PASSES; ++j) {
                for (int i = 0; i < NUM_INSERTIONS; ++i) {
                    s.count(i);
                }
            }
        });
        cout << endl;

        benchmark("insert, then read scrambled int with BoostHash", NUM_CALLS,
                  [&]() {
            unordered_set<int> s;
            for (int i = 0; i < NUM_INSERTIONS; ++i) {
                s.insert(scramble(i));
            }
            // Note: the lookups use the unscrambled keys.
            for (int j = 0; j < NUM_READ_PASSES; ++j) {
                for (int i = 0; i < NUM_INSERTIONS; ++i) {
                    s.count(i);
                }
            }
        });
        benchmark("insert, then read scrambled int with BurtleFeed", NUM_CALLS,
                  [&]() {
            utils::HashSet<int> s;
            for (int i = 0; i < NUM_INSERTIONS; ++i) {
                s.insert(scramble(i));
            }
            // Note: the lookups use the unscrambled keys.
            for (int j = 0; j < NUM_READ_PASSES; ++j) {
                for (int i = 0; i < NUM_INSERTIONS; ++i) {
                    s.count(i);
                }
            }
        });
        cout << endl;
    }

    return 0;
}
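
For readers without the attachment: the helpers assumed by this code could look
roughly as follows. This is a hypothetical reconstruction; the real definitions
are in the attached test_hash.cc and may differ in detail.

#include <chrono>
#include <functional>
#include <iostream>
#include <string>

// Hypothetical reconstruction of the helpers from test_hash.cc.
// Run "func" num_calls times and report the total wall-clock time.
static void benchmark(const std::string &desc, int num_calls,
                      const std::function<void()> &func) {
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < num_calls; ++i)
        func();
    auto stop = std::chrono::high_resolution_clock::now();
    std::cout << "Running " << desc << " " << num_calls << " times: "
              << std::chrono::duration<double>(stop - start).count()
              << "s" << std::endl;
}

// Map consecutive ints to non-consecutive values so that insertion order
// no longer correlates with the key values (Knuth's multiplicative constant).
static int scramble(int i) {
    return static_cast<int>(static_cast<unsigned int>(i) * 2654435761u);
}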
msg6492 (view) Author: jendrik Date: 2017-09-01.15:32:39
The new test code should already be pushed.
msg6491 (view) Author: malte Date: 2017-09-01.15:31:07
That's not nearly a large enough effect, though. I expect that the effect is
much larger for much larger hash sets. Can you push the test code to the repository?
msg6490 (view) Author: jendrik Date: 2017-09-01.15:30:37
Regarding msg6487, I opened issue735 for analyzing dominance pruning and added 
your second point to the meeting agenda.
msg6488 (view) Author: jendrik Date: 2017-09-01.15:25:54
Your prediction holds. The Boost hash benefits from sequential inserts:

Running insert sequential int with BoostHash 100000 times: 0.797894s
Running insert sequential int with BurtleFeed 100000 times: 0.924217s

Running insert scrambled int with BoostHash 100000 times: 0.858256s
Running insert scrambled int with BurtleFeed 100000 times: 0.922044s
msg6487 (view) Author: malte Date: 2017-09-01.15:07:11
PS: The fact that we have long-running configurations that spend more than 90%
of their time on dominance pruning suggests that we should look a bit more
closely at whether dominance pruning is always a good idea, whether we should
set a time limit for it, etc. Can you open a separate issue for this? We might
be crippling our PDB-based planners in some domains. Woodworking with
astar(cpdbs(systematic(2))) would make a good test domain for this.

More generally, it is not ideal that we discover such bottlenecks more or less
by accident. Perhaps we can discuss how we can do things like these better in
the future, e.g. by having each run in the experiment results include a
categorical attribute that names the bottleneck and a numerical attribute that
says how much time was spent on it. To be really useful, this should also
adequately cover the case where we don't solve a task, i.e., where the
bottleneck leads to running out of time or memory. I think this shouldn't be
too difficult as long as the log output is clear enough to tell us at regular
checkpoints what the code is currently doing, when it started doing it, and
perhaps also what the peak memory usage at that point was.
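
A minimal sketch of what such checkpoint logging could look like (the names
are made up; nothing like this exists in the planner yet):

#include <iostream>
#include <string>

// Hypothetical helper: one line per checkpoint, stating what the planner is
// doing, when it started, and the peak memory observed so far.
void log_checkpoint(const std::string &phase, double start_time_seconds,
                    long peak_memory_kb) {
    std::cout << "[checkpoint] phase=" << phase
              << " started_at=" << start_time_seconds << "s"
              << " peak_memory=" << peak_memory_kb << " KB" << std::endl;
}

// Example call: log_checkpoint("dominance pruning", 12.3, 204800);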
msg6486 (view) Author: malte Date: 2017-09-01.14:52:31
The benchmark code looks fine.

One possible explanation is that the old hash function leads to better cache
locality for certain access patterns.


For hashing pair<X *, Y *>, the following functions in the old hash code are
relevant (with comments and some namespace boilerplate removed; otherwise
taken from the pull request):

template<typename T>
inline void hash_combine(size_t &hash, const T &value) {
    std::hash<T> hasher;
    hash ^= hasher(value) + 0x9e3779b9 + (hash << 6) + (hash >> 2);
}

template<typename TA, typename TB>
struct hash<std::pair<TA, TB>> {
    size_t operator()(const std::pair<TA, TB> &pair) const {
        size_t hash = 0;
        utils::hash_combine(hash, pair.first);
        utils::hash_combine(hash, pair.second);
        return hash;
    }
};

This is more-or-less equivalent to the following computation for
`pair<X *, Y *> p`, assuming "using namespace std":

        size_t h = 0;
        h ^= hash<X *>()(p.first) + 0x9e3779b9 + (h << 6) + (h >> 2);
        h ^= hash<Y *>()(p.second) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;

This can be computed quickly, but the micro-benchmark shows that this shouldn't
matter very much, so this looks like a red herring. I think the performance
difference is not because of *how fast* the hash values are computed but because
of *what* the hash values are.

There are two main ways in which the distribution of hash values can affect
performance:
1) If there are many collisions, performance degrades.
2) If many similar hash values are considered shortly after one another, we can
get better cache locality when accessing the hash buckets.

Our new code tries to scramble the hash values much more, which means that 1)
isn't a likely explanation, but 2) is.

I checked that with gcc (at least on my machine, code attached),
hash<void *>()(x) just returns x itself. Ditto for ints. So the code above
simplifies to:

        size_t h = 0;
        h ^= reinterpret_cast<size_t>(p.first) + 0x9e3779b9 + (h << 6) + (h >> 2);
        h ^= reinterpret_cast<size_t>(p.second) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;

For many kinds of code, it's quite common to have consecutive accesses to
"consecutive pairs", i.e., something like the pair <p, q> followed by the pair
<p, q + 1> (where "+ 1" is pointer arithmetic, so the numerical value actually
increases by sizeof Y). The code above doesn't really scramble p.second very
much, so these will often end up with very similar hash values. For example,
if sizeof Y = 4 and (0x9e3779b9 + (h << 6) + (h >> 2)) doesn't have the 4-bit
set, the two final hash values will only differ by 4, and hence we'll
sequentially access hash buckets in close proximity, giving us good memory
locality.
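
To see the effect numerically, here is a tiny standalone program that applies
the simplified formula above to two "consecutive" pairs (the addresses are
made up for illustration):

#include <cstddef>
#include <iostream>

// Simplified old pair hash from above, applied to raw address values.
std::size_t old_pair_hash(std::size_t first, std::size_t second) {
    std::size_t h = 0;
    h ^= first + 0x9e3779b9 + (h << 6) + (h >> 2);
    h ^= second + 0x9e3779b9 + (h << 6) + (h >> 2);
    return h;
}

int main() {
    // Made-up addresses standing in for p and q; q + 4 simulates the
    // "next" pair <p, q + 1> with sizeof Y = 4.
    std::size_t p = 0x5000;
    std::size_t q = 0x6000;
    std::cout << old_pair_hash(p, q) << "\n"
              << old_pair_hash(p, q + 4) << "\n";  // typically differs by 4
    return 0;
}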


To test this hypothesis, it would be interesting to benchmark hash tables with
sequential access patterns: for example, a hash set where we insert
consecutive integers from 0 to N-1, unscrambled, and a separate experiment
where we insert consecutive integers from 0 to N-1, scrambled.

I haven't looked at the PDB code yet to see how the hashes are actually used
there, so this could be a next step. Reasons why Woodworking might be more
heavily affected than other domains could be either that it has more work than
usual to do in dominance pruning, or that locality of access matters more
there than in other domains, for example because in other domains the working
set of this code is often small enough to more or less completely fit into
whatever cache (L1, L2, L3) is causing this, so that locality of access is
less important. If that is the case, it might be difficult to reproduce the
behaviour on machines with different cache sizes.
msg6484 (view) Author: jendrik Date: 2017-09-01.13:59:46
Let's try number 2.

I compared the "before" and "after" logs for systematic(2) on a Woodworking
task, and it seems that all of the additional runtime comes from dominance
pruning. Dominance pruning hashes pointer pairs. To compare our old and new
hash functions on pointer pairs, I created a new microbenchmark (with the new
hash.h file). Here are the results (of the second benchmark run):


[...]/experiments/issue731/hash-microbenchmark$ export DOWNWARD_BITWIDTH=32 && make && ./benchmark${DOWNWARD_BITWIDTH}

Running nothing 100000 times: 0.000215s

Running insert pair<void *, void *> with BoostHash 100000 times: 0.9221s
Running insert pair<void *, void *> with BurtleFeed 100000 times: 0.996635s

Running count pair<void *, void *> with BoostHash 100000 times: 0.232802s
Running count pair<void *, void *> with BurtleFeed 100000 times: 0.341353s


These numbers don't explain the change in runtime we're seeing. I'm not sure if my approach to generating random pointer pairs 
is valid. You can inspect the code in the pull request (https://bitbucket.org/jendrikseipp/downward/pull-requests/73).
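
For reference, one plausible way to generate such pointer pairs is sketched
below. This is only an assumption about the approach; the actual code is in
the pull request and may differ.

#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Sketch: take addresses inside a real allocation so that the values
// resemble genuine heap pointers.
std::vector<std::pair<void *, void *>> make_pointer_pairs(int num_pairs) {
    static std::vector<int> arena(1 << 20);  // backing storage
    std::mt19937 rng(2017);                  // fixed seed for reproducibility
    std::uniform_int_distribution<std::size_t> dist(0, arena.size() - 1);
    std::vector<std::pair<void *, void *>> pairs;
    pairs.reserve(num_pairs);
    for (int i = 0; i < num_pairs; ++i)
        pairs.emplace_back(&arena[dist(rng)], &arena[dist(rng)]);
    return pairs;
}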
msg6474 (view) Author: malte Date: 2017-08-31.23:47:21
If it weren't for the systematic PDB results, I think this would be fine to merge.

I'd like to look a bit more carefully at what goes on in this configuration. The
results for woodworking-opt08-strips in this configuration, where runtime
doubles for all solved tasks, give me pause. Plus 700 seconds just because of
the new hash function? That's a lot. How does this happen when in our
microbenchmarks the new hash function was at worst a few percentage points
slower than the old one? (And of course the planner does other things than
computing hash values, too.)

So there are two paths to proceed:
1) Remove the changes that make the difference in systematic PDBs, verify that
this configuration is back to old performance, merge, then look at the
systematic PDB stuff in a separate issue.
2) Look at the systematic PDB stuff right now, merge everything in one go.
msg6473 (view) Author: malte Date: 2017-08-31.15:58:32
> (The unexplained errors for the satisficing experiment are due to VAL
> running out of memory. We should probably discuss what to do about that
> in our next Fast Downward meeting.)

We could try the INVAL validator instead
(https://github.com/patrikhaslum/INVAL). I'm not sure if anyone has ever
compared its validation time and memory requirements to VAL's, but unless it's
very difficult to get running, perhaps we can at least manually run it on 2-3
cases where VAL is having difficulties and see how it behaves. It's still
worth discussing this in the Fast Downward meeting, but having some data on
INVAL beforehand could be useful for the discussion.
msg6472 (view) Author: jendrik Date: 2017-08-31.11:00:58
The results are in and I think they look better than expected:

http://ai.cs.unibas.ch/_tmp_files/seipp/issue731-v4-opt-issue731-base-issue731-v4-compare.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue731-v4-sat-issue731-base-issue731-v4-compare.html

(The unexplained errors for the satisficing experiment are due to VAL running out of memory. We should 
probably discuss what to do about that in our next Fast Downward meeting.)

As usual, we're seeing quite a bit of noise: blind search is among the
configurations with the highest overall difference for score_search_time
(-1.37), but I don't see any place where the new hash functions are used in
this configuration.

The most critical configuration is probably the one using systematic PDBs. Its
total score_total_time decreases by 3.41.

What are your thoughts on this?
msg6471 (view) Author: malte Date: 2017-08-28.15:28:34
Code looks good. Based on our earlier experiments, I'm skeptical about the
performance, so we might end up needing results for the intermediate commits
as well. Let's wait and see.
msg6470 (view) Author: jendrik Date: 2017-08-28.11:55:52
I have removed the remaining legacy hash functions and created a pull request 
at https://bitbucket.org/jendrikseipp/downward/pull-requests/73 . Experiments 
are running.

I have divided the PR into 4 commits. The current experiment compares the
fourth commit (v4) to the base revision. If we can't interpret the combined
revisions, we can run more granular experiments later.
msg6416 (view) Author: jendrik Date: 2017-07-07.18:29:40
You're right, thanks for correcting this! I updated the comment in the code for 
issue693 accordingly.
msg6415 (view) Author: malte Date: 2017-07-07.18:24:37
Do we really have evidence that they cause more collisions? I thought the reason
you looked into other hash functions was that the existing ones often hashed to
adjacent keys, which is bad for closed hashing (but good for open hashing
because of better memory locality).

For me, a main reason to get rid of them is because it's simpler to only use and
maintain one set of user-defined hash functions rather than two.
msg6411 (view) Author: jendrik Date: 2017-07-06.00:35:45
We plan to remove the legacy hash functions in utils/hash.h since they produce
too many collisions and since implementing std::hash<T> for non-user-defined
types T causes undefined behaviour
(http://en.cppreference.com/w/cpp/language/extending_std).

We want to use the new utils::HashMap and utils::HashSet for hashing types
that have no specialization of std::hash.
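
For context, the new containers are essentially thin aliases over the standard
ones with the project's own hash functor plugged in. A minimal sketch,
assuming a functor named utils::Hash (see utils/hash.h for the real
definitions):

#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>

namespace utils {
// Stand-in for the project's hash functor; the real one feeds values
// through a Burtle-style hash, whereas this sketch just delegates to
// std::hash.
template<typename T>
struct Hash {
    std::size_t operator()(const T &value) const {
        return std::hash<T>()(value);
    }
};

// The new containers are thin aliases over the standard ones:
template<typename T>
using HashSet = std::unordered_set<T, Hash<T>>;

template<typename Key, typename Value>
using HashMap = std::unordered_map<Key, Value, Hash<Key>>;
}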
History
Date                 User     Action  Args
2017-09-01 22:18:45  jendrik  set     status: reviewing -> deferred; summary: Blocked by issue735.
2017-09-01 16:55:10  malte    set     messages: + msg6496
2017-09-01 16:49:46  jendrik  set     messages: + msg6494
2017-09-01 15:52:24  malte    set     messages: + msg6493
2017-09-01 15:32:39  jendrik  set     messages: + msg6492
2017-09-01 15:31:07  malte    set     messages: + msg6491
2017-09-01 15:30:37  jendrik  set     messages: + msg6490
2017-09-01 15:25:54  jendrik  set     messages: + msg6488
2017-09-01 15:07:11  malte    set     messages: + msg6487
2017-09-01 14:52:31  malte    set     files: + test_hash.cc; messages: + msg6486
2017-09-01 14:52:08  malte    set     messages: - msg6485
2017-09-01 14:52:03  malte    set     messages: + msg6485
2017-09-01 13:59:46  jendrik  set     messages: + msg6484
2017-08-31 23:47:21  malte    set     messages: + msg6474
2017-08-31 15:58:32  malte    set     messages: + msg6473
2017-08-31 11:00:58  jendrik  set     messages: + msg6472
2017-08-28 15:28:34  malte    set     messages: + msg6471
2017-08-28 11:55:52  jendrik  set     status: chatting -> reviewing; assignedto: jendrik; messages: + msg6470
2017-07-07 18:29:40  jendrik  set     messages: + msg6416
2017-07-07 18:24:37  malte    set     status: unread -> chatting; messages: + msg6415
2017-07-06 00:35:45  jendrik  create