Message5866

Author malte
Recipients andrew.coles, erez, florian, jendrik, malte, silvia
Date 2016-12-12.23:21:13
Content
Thanks!

I don't think tweaking the parameter is the best solution at the moment. Unless
there is a performance issue in the implementation that we still have to
identify, perhaps this hash table implementation just is a bit slower than the
standard library one. It stands to reason that the standard library
implementation is optimized for something. Perhaps it is optimized for speed. ;-)

There are a few alternatives we can try. We wanted to avoid separate chaining
because of its space requirements, but we can also try bringing the space
requirements of separate chaining down for our particular use case by
compressing the information. Currently, the values -2 to -2^31 are unused, and
we could exploit this to use a single 4-byte cell to store either a state ID or
(the equivalent of) a pointer to a chain of collided values.

Assuming a load factor of 1.0, which is not too unreasonable when using separate
chaining, a "basic" but space-efficient implementation of separate chaining
takes 12 bytes per entry if there is no dynamic allocation overhead (which I
think is reasonable for a hash table that does not support deletion): 4 bytes
per bucket plus 8 bytes per linked list entry that holds the data.

I think that with some compression tricks, we can bring this down to less than 7
bytes per entry, i.e., the same amount of memory as with closed hashing at a
load factor of 0.57. (Closed hashing of course needs load factors to be lower
than what we can use with open hashing.)

This assumes that we don't need to store the hash keys -- if we do need to store
them, the comparison becomes more favourable for closed hashing because the
relative overhead of the "pointers" becomes smaller. I'm attaching a Python
script that allows estimating the size requirements of different hash table
implementations.
Files
File name Uploaded
hash_sizes.py malte, 2016-12-12.23:21:13
History
Date User Action Args
2016-12-12 23:21:13maltesetmessageid: <1481581273.93.0.797199802162.issue213@unibas.ch>
2016-12-12 23:21:13maltesetrecipients: + malte, erez, andrew.coles, silvia, jendrik, florian
2016-12-12 23:21:13maltelinkissue213 messages
2016-12-12 23:21:13maltecreate