I investigated the memory increase. First off, I could not reproduce it on my laptop, but I could reproduce it on the grid. I observed that big chunks of memory are allocated when the IntHashSet for StateIDs is enlarged. Running the v1 version of lm_hm on mystery/prob30, there were two large jumps: from 291'268KB to 315'844KB, and then from 315'844KB to 381'384KB.
In the code, enlarge() is called in two places:
(i) when the number of entries exceeds the capacity
(ii) when the bucket for the new key is too far away from the ideal bucket of the key and it could not be swapped (sorry, I don't understand the details here)
I assume that case (i) behaves the same in both base and v1, but case (ii) sounds like it could happen non-deterministically, so we may just have been unlucky in v1.
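To make the two triggers concrete, here is a minimal toy sketch in C++. It is not the real IntHashSet: the class name ToyIntSet, the constant MAX_DISTANCE, the modulo hash, and the two counters are all made up for illustration, and the swapping step from case (ii) is left out entirely. It only shows that a table can be enlarged either because it is full or because a key finds no free bucket close enough to its ideal bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy open-addressing set of non-negative ints. NOT the real IntHashSet:
// no swapping, hypothetical names, and a trivial modulo hash.
constexpr int EMPTY = -1;
constexpr std::size_t MAX_DISTANCE = 4;  // hypothetical distance limit

class ToyIntSet {
    std::vector<int> buckets;
    std::size_t num_entries = 0;

    std::size_t ideal_bucket(int key) const {
        return static_cast<std::size_t>(key) % buckets.size();
    }

    // Put key into the first free bucket within MAX_DISTANCE of its
    // ideal bucket. Return false if all of those buckets are occupied.
    bool try_place(int key) {
        std::size_t start = ideal_bucket(key);
        for (std::size_t d = 0; d < MAX_DISTANCE; ++d) {
            int &slot = buckets[(start + d) % buckets.size()];
            if (slot == EMPTY) {
                slot = key;
                return true;
            }
        }
        return false;
    }

    // Double the capacity (repeatedly if needed) and reinsert all keys.
    void enlarge() {
        std::vector<int> old = buckets;
        std::size_t new_capacity = old.size() * 2;
        while (true) {
            buckets.assign(new_capacity, EMPTY);
            bool all_placed = true;
            for (int key : old) {
                if (key != EMPTY && !try_place(key)) {
                    all_placed = false;
                    break;
                }
            }
            if (all_placed)
                return;
            new_capacity *= 2;
        }
    }

public:
    std::size_t capacity_enlarges = 0;  // trigger (i)
    std::size_t distance_enlarges = 0;  // trigger (ii)

    explicit ToyIntSet(std::size_t initial_capacity)
        : buckets(initial_capacity, EMPTY) {
        assert(initial_capacity >= MAX_DISTANCE);
    }

    std::size_t size() const { return num_entries; }
    std::size_t capacity() const { return buckets.size(); }

    bool contains(int key) const {
        std::size_t start = ideal_bucket(key);
        for (std::size_t d = 0; d < MAX_DISTANCE; ++d) {
            if (buckets[(start + d) % buckets.size()] == key)
                return true;
        }
        return false;
    }

    void insert(int key) {
        assert(key >= 0);
        if (contains(key))
            return;
        // Trigger (i): the new entry would exceed the capacity.
        if (num_entries + 1 > buckets.size()) {
            enlarge();
            ++capacity_enlarges;
        }
        // Trigger (ii): no free bucket close enough to the ideal bucket.
        while (!try_place(key)) {
            enlarge();
            ++distance_enlarges;
        }
        ++num_entries;
    }
};
```

In this toy, inserting the keys 0, 8, 16, 24, 32 into a set with 8 buckets enlarges the table through trigger (ii) even though it is only half full, because all five keys share the same ideal bucket. Inserting 0 to 7 and then 9 enlarges it through trigger (i) instead. So at least in the toy, trigger (ii) depends on which keys are inserted and not just on how many.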
In summary, the memory increase is unrelated to the changes made in this issue. I don't know enough about the IntHashSet to judge whether we are ok with this behavior, but I assume that we can't really do any better.