Title clean up hash tables and functions
Priority wish Status resolved
Superseder Nosy List florian, jendrik, malte, pvonreth, silvan
Assigned To pvonreth Keywords
Optional summary

Created on 2014-05-21.19:07:56 by malte, last changed by malte.

File name Uploaded Type Edit Remove malte, 2015-03-31.08:25:38 text/x-c++src
msg4304 (view) Author: malte Date: 2015-07-02.01:02:20
> I assume that for other configurations, you did not get such large
> differences, did you?

If you look at the old discussion, there's a suggestion that hash_set/hash_map
might be slightly more memory-efficient than unordered_set/unordered_map. I
looked at all the cases in your experiment data where we lose coverage (across
all configurations and domains: 13 cases). In all cases, the old code was very
close to the memory limit when it solved the problem. The limit is 2 GiB, or
2097152 KiB. Here is the peak memory of the old code for the 13 cases we lose,
in sorted order:


So we're always close to the memory limit: always within 10%, and in most cases
within much less than 5%.

My suggestion: live with the loss in coverage, and take this as an indication
that we could improve M&S by being a bit more careful about memory. If and when
we find time to work on the code some more, memory could be one of the main
focus areas.
msg4303 (view) Author: jendrik Date: 2015-07-01.17:24:53
I don't remember any numbers, but I think none of the tested configs was affected 
really badly.
msg4302 (view) Author: silvan Date: 2015-07-01.15:35:24
For the record, I obtained the following results when comparing a revision
before merging issue436 (074175a9b206) and after merging it (26be536ac799):

Given the small changes in code related to merge-and-shrink with these two
revisions, I really wonder if the performance difference of unordered_set vs
hash_set (used in state_registry) or unordered_map vs hash_map (used in
shrink_bisimulation9 is really that large. I assume that for other
configurations, you did not get such large differences, did you?
msg4292 (view) Author: pvonreth Date: 2015-06-30.14:48:24
I'm sorry but the server went offline without anouncement and will p
probably not come back. The results are not the only thing I lost there.
msg4291 (view) Author: silvan Date: 2015-06-30.09:47:37
Patrick, do you still have the tables of the experiments linked below? I'd be
interested to have a look at them. This issue decreased the performance of
standard merge-and-shrink configurations, and I wonder why this has not been
noticed during the experiments. I assume that either the configs I tested have
not been tested in this issue, or that due to the many "merge from defaults"
included in this issue, the experiments compared against a too old version of
the default branch. Anyway, I'd just be interested to have a look at the
experiments of this issue.
msg4127 (view) Author: jendrik Date: 2015-03-31.12:33:42
Merged and pushed. Thanks for your work on this, Patrick!
msg4126 (view) Author: malte Date: 2015-03-31.08:25:38
I've attached some test code, which on my machine shows that a test program that
uses a hash_map<int, int> can grow this map further than an equivalent program
using an unordered_map<int, int>. If you want to try this out, run it with a
ulimit like so:

g++ -Wno-deprecated -O3 -std=c++11 -o testme
# for unordered map:
( ulimit -v 1048576 && ./testme unordered_map )
# for hash_map
( ulimit -v 1048576 && ./testme hash_map )

That unordered_map is less memory-efficient than hash_map is consistent with
informal experiments we ran previously. (This is the main reason why we hadn't
changed this before.) 

I don't think the effect is large enough to keep us from merging the code,
though. Maybe at some point we can look into special-purpose hash
implementations for the few large hashes we use (mainly the one used for state
registries, possibly others inside merge and shrink?).
msg4125 (view) Author: malte Date: 2015-03-31.08:01:07
These are the regular scatter plots, not the special ones I mentioned in msg4108
that more clearly show slight differences. (Florian or Jendrik, we discussed
this point offline, I think?) But even so, the picture seems clear enough for
the optimal configurations, and I don't think we need the other kind of scatter
plots -- just a note for future experiments where the smaller differences might
be more important.

It seems we get slight improvements here and there (e.g. w.r.t. time in the
landmark configurations), but an overall loss of coverage that is perhaps a bit
too consistent to be random. Looking at the scatter plots for memory usage shows
that the cases we lose are all ones where we were extremely close to the edge of
exhausting the memory limit before. My hypothesis is that the hash_set/hash_map
classes are either a little bit more memory-efficient and/or implemented to deal
more gracefully with out-of-memory conditions compared to
unordered_set/unordered_map. (For example, vectors tend to double in capacity
when they need to be grown. If there isn't enough memory for that, an
implementation might either give up completely or grow by smaller factor
instead. There may be something similar inside the hash tables.)
msg4124 (view) Author: pvonreth Date: 2015-03-30.21:44:41
here are the results for the second run of the experiments including scatter 
msg4111 (view) Author: malte Date: 2015-03-24.18:54:46
No further comments on the code at this point.
msg4110 (view) Author: pvonreth Date: 2015-03-24.13:32:20
I've merged the default branch into my branch and fixed the conflicts.
msg4109 (view) Author: malte Date: 2015-03-24.13:05:09
I followed the link for the pull request, but it doesn't look reviewable -- it
shows merge conflicts in two files.
msg4108 (view) Author: malte Date: 2015-03-24.13:03:55
There seems to be a slight negative trend overall. Do we know why? A while ago
we devised some plots that can be used for comparison plots of attributes where
differences tend to be slight (say, typically less than a factor of 2). Can we
generate some such plots for total time and memory?
msg4107 (view) Author: jendrik Date: 2015-03-24.11:21:27
Patrick has prepared a pull request at and I have 
reviewed the code. He has also made an experiment for the relevant satisficing and 
optimal configurations. The results are here:
msg3729 (view) Author: jendrik Date: 2014-10-08.21:30:21
After this is done, we should revisit issue213.
msg3458 (view) Author: malte Date: 2014-09-19.16:08:03
Any volunteers to work on this?
msg3181 (view) Author: malte Date: 2014-05-21.19:07:56
We should consider switching from GNU's hash tables (hash_map, hash_set) to the
C++ standard hash tables (unordered_map, unordered_set). We stuck with the GNU
versions in the past because they used less memory when we last compared, but
now we have better ways of measuring time and memory impact properly and could
make a better informed decision than back in the day.

Also, we should clean up the hash functions used. Currently, more or less the
same hash function is implemented in several places. The basic hashing
facilities should live in only one place.

Also, some old code uses a different hash function (multiplicative congruential
with factor 17) for no good reason. (See relaxation_heuristic and
domain_transition_graph -- there may be other examples. Best to grep for 'hash'.)
Date User Action Args
2015-07-02 01:02:20maltesetmessages: + msg4304
2015-07-01 17:24:53jendriksetmessages: + msg4303
2015-07-01 15:35:24silvansetmessages: + msg4302
2015-06-30 14:48:24pvonrethsetmessages: + msg4292
2015-06-30 09:47:37silvansetnosy: + silvan
messages: + msg4291
2015-03-31 12:33:42jendriksetstatus: chatting -> resolved
messages: + msg4127
2015-03-31 08:25:38maltesetfiles: +
messages: + msg4126
2015-03-31 08:01:08maltesetmessages: + msg4125
2015-03-30 21:44:41pvonrethsetmessages: + msg4124
2015-03-24 18:54:46maltesetmessages: + msg4111
2015-03-24 13:32:20pvonrethsetmessages: + msg4110
2015-03-24 13:05:09maltesetmessages: + msg4109
2015-03-24 13:03:55maltesetmessages: + msg4108
2015-03-24 11:21:27jendriksetmessages: + msg4107
2015-03-18 13:59:34pvonrethsetassignedto: pvonreth
nosy: + pvonreth
2014-10-08 21:30:21jendriksetmessages: + msg3729
2014-09-19 16:08:03maltesetmessages: + msg3458
2014-09-19 15:42:25floriansetnosy: + florian
2014-08-20 13:34:37jendriksetnosy: + jendrik
2014-05-21 19:07:56maltecreate