Issue 386: Avoid overhead when creating temporary states

Title	Avoid overhead when creating temporary states
Priority	feature	Status	resolved
Superseder		Nosy List	erez, florian, gabi, jendrik, malte, martin, mkatz, silvan
Assigned To	florian	Keywords
Optional summary

Created on 2013-08-01.02:50:34 by florian, last changed by florian.

Files
File name	Uploaded	Type	Edit	Remove
memory_profiles.png	florian, 2014-02-07.11:44:30	image/png

Messages
msg2939 (view)	Author: florian	Date: 2014-02-07.22:06:58
merged
msg2938 (view)	Author: malte	Date: 2014-02-07.21:27:34
Thanks a lot! I think this is ready to merge, so feel free to merge and close this one. (As always, anyone feeling otherwise feel free to reopen.)
msg2937 (view)	Author: florian	Date: 2014-02-07.20:46:47
> Florian, can you summarize how the current implementation for this issue > affects coverage, runtime and memory usage? This patch introduces a layer of indirection and a subscriber mechanism in the PerStateInformation class. Instead of working only for the global state registry, a PerStateInformation object now looks up the relevant storage in a hashmap indexed with StateRegistry*. We expected this to be a minor change that uses a few additional bytes per PerStateInformation object (only 1-2 at the moment) and has a next-to-nothing time overhead (the result of the hashmap lookup is cached). The timing results are as expected (some noise but nothing strange there). The memory results are also not surprising in most cases. We typically use a little more memory than before (~20kB) and use a little less with iPDB (because temporary states that are not used in the search no longer cause an overhead). Differences in memory usage between the old and the new version are usually below 100kB. There is one exception to this: with Merge-and-Shrink the differences are much more pronounced, in particular on the domain Sokoban. There the memory usage differences are in the order of 10-100 MB. In one case the new version uses 1.8GB while the old uses 1.6GB. The attached memory profile is for this task and only measures a peak memory of 1.3GB, which would mean a 500MB loss (to fragmentation?). The additional memory requirements of the new version on this task is reproducible in release and debug mode.
msg2935 (view)	Author: florian	Date: 2014-02-07.20:04:12
> We could try to change the memory access pattern in merge-and-shrink I added this as issue415
msg2932 (view)	Author: malte	Date: 2014-02-07.19:17:16
Florian, can you summarize how the current implementation for this issue affects coverage, runtime and memory usage, i.e. give us a summary interpretation of the experiment results? I don't think it makes sense for everyone to sift through a 50 MB report file.
msg2931 (view)	Author: malte	Date: 2014-02-07.19:03:52
We could try to change the memory access pattern in merge-and-shrink to try to avoid fragmentation. I don't think this would have to be terribly complicated, but it would certainly require some thought about the data structures used for the abstractions. I think there should also be some speed-up potential there.
msg2928 (view)	Author: florian	Date: 2014-02-07.11:44:30
I'm attaching a memory profile of the base version (right) and the patch (left) on sokoban08/p22 with merge&shrink. It looks like the difference doesn't show up in the profile (both peaks are at 1.3GB). The graphs are not exactly the same but this is normal (the profiler doesn't measure at the same time in both versions and not often enough to catch all the details). The peak is in the initialization of merge &shrink. The profile does not contain the search part. I did a previous profile that showed that memory consumed in the search does not come close to the peak in the initialization in both cases. Running the same call without the profiler results in peaks of 1.6GB (base) and 1.8GB (patch) - significantly more than with the profiler. It is not clear to me why this is so much higher with the patch. Does anyone have any ideas? If this is memory fragmentation, we loose up to 500MB to memory fragmentation (compared to the peak in the profiler). Is there anything we could/should do about this?
msg2827 (view)	Author: florian	Date: 2013-12-18.14:19:37
I can look into it again after the IPC. If it turns out to be memory fragmentation, I think we should merge and create a new issue to fix merge and shrink.
msg2826 (view)	Author: malte	Date: 2013-12-18.13:57:26
I looked in more detail at the merge-and-shrink data in Sokoban, and I don't really see a strong negative. The changes could just as well be arbitrary. If you want to investigate further, I'd focus some experiments on the combination of merge-and-shrink + Sokoban. We could, for example, try our other variations of merge-and-shrink to get more data. We could also add intermediate peak memory reports to the output and look at that data. Ultimately, my best guess at the moment is that we're seeing the effect of memory fragmentation during merge-and-shrink construction. If this hypothesis is correct, one could try to address the problem directly, e.g. by using custom allocators within merge-and-shrink.
msg2824 (view)	Author: florian	Date: 2013-12-18.13:27:43
I don't really know. I guess we will merge this into our IPC submission, since it does not seem to affect LMcut too much and we need multiple StateRegistries. But I don't think we can merge it into the main repository like this. Pity that the problem is not reproducable in a memory profiler. Do you have any other ideas how to find out what is going wrong here?
msg2823 (view)	Author: malte	Date: 2013-12-18.13:18:55
What do you suggest?
msg2820 (view)	Author: florian	Date: 2013-12-18.10:11:35
Rerunning the experiment with a cleared revision cache (and the small changes we discussed) did not change the result: http://ai.cs.unibas.ch/_tmp_files/pommeren/issue386-exp3-compare-v3.html (50 MB) Memory usage still increases in most tasks. The increase is small in most cases but can be huge, e.g., 230 MB in sokoban08/p22 with merge and shrink. Ignoring merge and shrink and the portfolios, the difference in memory is never over 1 MB while it is more that that for a lot of runs with merge and shrink. So it seems whatever this is affects the merge and shrink heuristic in particular.
msg2807 (view)	Author: malte	Date: 2013-12-16.16:20:51
If there is no other good reason for the changed behaviour apparent, it may be worth deleting the data from the revision cache and rerunning the experiment. Changes in memory usage can often be explained by compiler/library upgrades if there is no other good reason.
msg2806 (view)	Author: florian	Date: 2013-12-16.16:01:24
I didn't see your message before answering. The experiment ran both planners at the same time but the base revision was compiled for the first experiment and taken from labs revision cache. The increased memory use seems to be particularly big with merge and shrink. I'll have a look at the logs.
msg2804 (view)	Author: florian	Date: 2013-12-16.14:04:52
I implemented the comments from your last review. Since this was mostly refactoring and comments, I would suggest to merge this without a second experiment.
msg2803 (view)	Author: malte	Date: 2013-12-16.14:03:52
Were both version of the planner compiled and run at the same time, or did you combine data from two experiments?
msg2802 (view)	Author: florian	Date: 2013-12-16.14:01:39
I don't see an obvious reason for the additional memory usage. There is the subscriber mechanism and the additional field to store the registry in each State, but both of them should only use a few bytes, i.e., there are only one or two State objects at the same time and only one subscriber in most cases.
msg2799 (view)	Author: malte	Date: 2013-12-14.22:27:01
Looks mostly OK, but we seem to be using a bit more memory across the board. Any explanation for this? Where do we change/add data structures that could explain this? (Looking at the total_time values, the reduction in coverage in LM-Cut doesn't strike me as a big problem. This can easily be random noise.)
msg2798 (view)	Author: florian	Date: 2013-12-14.22:13:18
Experiments for the recent commit finished: http://ai.cs.unibas.ch/_tmp_files/pommeren/issue386-exp2-compare.html (50 MB) I'll fix the things you commented on in the pull request on Monday.
msg2797 (view)	Author: malte	Date: 2013-12-13.18:12:32
I reviewed the most recent commit.
msg2795 (view)	Author: malte	Date: 2013-12-12.22:07:23
Yes, this looks basically good. However, I'd register "this" rather than "cached_entries"; the SegmentedVector class can remain an implementation detail of PerStateInformation, and this will also allow removing the StateRegistry key from the PerStateInformation's EntryMap upon destruction of the StateRegistry. When we do make this change, I think we should also make sure to forbid copy construction and assignment for PerStateInformation and StateRegistry. I think copying would be broken if we don't support it properly in the subscriber mechanism, and it seems pointless to support it in the first place.
msg2794 (view)	Author: florian	Date: 2013-12-12.22:01:16
I prefer to do it in this issue because it also concerns "overhead when creating temporary states" and we probably have to fix it before the IPC. I'll try to do it now, so the experiments run with all your comments fixed and we won't need to wait for a second batch of experiments next week. Just need to get a ubuntu running on my pc at home, so i can build. Is this what you had in mind? void set_cached_registry(const StateRegistry *registry) { if (cached_registry != registry) { cached_registry = registry; pair<EntryMap::iterator, bool> inserted = entries_by_registry.insert(make_pair(registry, PerStateInformation<Entry>())); cached_entries = inserted.first; if (inserted.second) { registry.subscribe_per_state_information(cached_entries); } } }
msg2793 (view)	Author: malte	Date: 2013-12-12.21:38:27
PS: This probably means that we need a (mostly empty) abstract base class for PerStateInformation<T>. It needs an empty virtual destructor and nothing much else: class PerStateInformationBase { public: PerStateInformationBase(); virtual ~PerStateInformationBase() = 0; };
msg2792 (view)	Author: malte	Date: 2013-12-12.21:32:30
Sounds good, I had the same implementation in mind. Do you want to implement this as part of this issue386 or as a separate issue? To make it sufficiently efficient, the test whether a given StateRegistry and PerStateInformation "know each other" already could be integrated into the existing caching mechanism, I guess.
msg2791 (view)	Author: florian	Date: 2013-12-12.21:24:39
> One question on temporary registries: it seems to me that PerStateInformation > associated with states from temporary state registries is never released. Is > this correct? > > Say we create a StateRegistry for selective max and then do sampling with the > lmcount heuristic. This should create a PerStateInformation for the LM info for > the sampled states. Will we release this information once we discard the state > registry? No, the information would be kept in the current implementation. We should probably use a subscriber mechanism that registers each PerStateInformation which uses States from a registry, to that registry and send them a message if the registry is discarded. I'll try to add something like this.
msg2790 (view)	Author: malte	Date: 2013-12-12.19:47:20
Review is done.
msg2789 (view)	Author: malte	Date: 2013-12-12.19:43:46
I'm still reviewing, but feel free to already start experiments if you are in a hurry. Experiments for the optimal configs should be enough, no need for the satisficing ones. One question on temporary registries: it seems to me that PerStateInformation associated with states from temporary state registries is never released. Is this correct? Say we create a StateRegistry for selective max and then do sampling with the lmcount heuristic. This should create a PerStateInformation for the LM info for the sampled states. Will we release this information once we discard the state registry?
msg2779 (view)	Author: florian	Date: 2013-12-11.11:19:56
I just granted you access. If anyone else is interested, tell me.
msg2778 (view)	Author: malte	Date: 2013-12-10.23:50:56
That's 760 KiB, not bytes. :-) (And it's a Mystery task, not Miconic.) But be that as it may, the results indeed look good. The bitbucket link gives me an access denied message. (I'm logged in.)
msg2777 (view)	Author: florian	Date: 2013-12-10.23:44:46
I started working on this and ran an initial experiment on iPDB. The results look good. Memory usage mostly decreased (1 GB across all tasks), the most increase is 760 bytes in one miconic task. http://ai.cs.unibas.ch/_tmp_files/pommeren/issue386-exp1-compare.html Malte, do you have time to review this? https://bitbucket.org/flogo/downward-issues-issue386/pull-request/1/avoid-overhead-when-creating-temporary/diff I can also run experiments on more configs.
msg2594 (view)	Author: malte	Date: 2013-08-01.10:36:05
Like Florian, I'm worried about changes that set attributes in the heuristic objects that then need to be maintained properly. The more classes we have that are (at least conceptually) immutable, the easier the code is to maintain and debug. So unless performance forbids this, I would prefer a solution where the heuristic is told the correct state registry whenever it needs the information, and I think the best way to achieve this is to package this information with the state. My suggestion is to repurpose the StateHandle class so that it contains an ID and a reference to the registry, and to have everything that needs access to per-state data work on the level of state handles. Done properly, the heuristic code wouldn't even need to know that something like state registries exists. However, this doesn't solve the more fundamental problem that PerStateInfo objects are tied to a specific registry. There are a few possible ways to change this. One self-contained solution would fix this inside PerStateInfo: rather than storing just one segmented vector, have a hash map from StateRegistry pointers to segmented vectors. To keep speed reasonable, we could cache the last requested state registry and its vector, since we assume that we don't switch around between different state registries all the time.
msg2593 (view)	Author: florian	Date: 2013-08-01.09:57:02
> I think the answer to 1 must be no. > Not only do we have some use cases where we want this (state space samples), but > suppose we want a heuristic that does lookahead (as Felner et. al. did) - we do > not want to register the lookahead states. Even for state space samples and lookahead states it can make sense to register them (not necessarily in the global registry). For example if they are evaluated by a heuristic that saves information to each state. The heuristic only gets a state and should not care about whether this state is a sample or a state encountered during search. The question is who should be responsible for registering them. One way to do it would be to require the class that generates the state to register it before passing it to a heuristic. We might also have a new issue here: If heuristics save information per state and are called for a lot of temporary states, should it be possible to tell the heuristic when it is save to remove the data? > My suggestion is to have a *StateRegistry pointer in Heuristic, which tells the > heuristic where it's currently registering states. Having the heuristics that want to store information also register the state might be a solution -- I think we talked about this before in the code review comments. But this can create problems when heuristics are nested. Changing the StateRegistry to something new can be done recursively in all dependent heuristics but this overwrites the previous value. After we are done with the temporary state we might want to set everything back to the old value, i.e. undo the change instead of changing to the global registry.
msg2592 (view)	Author: erez	Date: 2013-08-01.08:11:50
I think the answer to 1 must be no. Not only do we have some use cases where we want this (state space samples), but suppose we want a heuristic that does lookahead (as Felner et. al. did) - we do not want to register the lookahead states. My suggestion is to have a *StateRegistry pointer in Heuristic, which tells the heuristic where it's currently registering states. This will be a temporary object during state space sample/lookahead/whatever, and the global registry elsewhere.
msg2589 (view)	Author: florian	Date: 2013-08-01.03:21:54
Depending on how we solve this, we could also consider splitting the State class into RegisteredState and UnregisteredState. This way we can save some assertions and no longer need invalid StateHandles.
msg2586 (view)	Author: florian	Date: 2013-08-01.02:50:34
We split issue344 into more manageable chunks. The main issue branch will soon be merged and this is a possible follow-up work. I nosied everyone from the original issue, but feel free to erase yourself from nosy, if you are not interested. When temporary states (probes, samples) are created that need to be evaluated, this creates a large memory overhead in the way the StateRegistry is currently set up. There are two types of heuristics involved here: the heuristic that uses temporary states (iPDB, selective max) and potential subheuristics of them that are evaluated for the temporary state. The main problem is with heuristics evaluated for temporary states that want to store per state information. For example, this could be incremental lm-cut inside selective max. In the following I think of those when I say 'heuristics'. These are the main questions: 1) Should states given to a heuristic always be registered? If the answer is yes: 2a) Where should temporary states be registered? If they are registered in the global registry, we would need some way to unregister/clear states after they are no longer needed. This could invalidate existing handles. If we use a temporary registry the problem is that heuristics do not know where the state they get is registered. The PerStateInformation class currently uses the StateRegistry to determine if an index is valid and to resize to the size of the registry automatically. If the answer to 1) is no: 2b) If the heuristic needs to store per state information, the state has to be registered at some point. This would mean the heuristic has to register the state. How should this be done? If we use the global registry, we have the same problem as before. If we use a custom registry inside the heuristic, we might end up storing regular (non-temporary) states twice and we would need a way to know in which registry a state was registered.

History
Date	User	Action	Args
2014-02-07 22:06:58	florian	set	status: in-progress -> resolved messages: + msg2939
2014-02-07 21:27:34	malte	set	messages: + msg2938
2014-02-07 20:46:47	florian	set	messages: + msg2937
2014-02-07 20:04:12	florian	set	messages: + msg2935
2014-02-07 19:17:16	malte	set	messages: + msg2932
2014-02-07 19:03:52	malte	set	messages: + msg2931
2014-02-07 11:44:30	florian	set	files: + memory_profiles.png messages: + msg2928
2013-12-19 18:36:54	florian	set	nosy: + martin
2013-12-18 14:19:37	florian	set	messages: + msg2827
2013-12-18 13:57:26	malte	set	messages: + msg2826
2013-12-18 13:27:43	florian	set	messages: + msg2824
2013-12-18 13:18:55	malte	set	messages: + msg2823
2013-12-18 10:11:35	florian	set	messages: + msg2820
2013-12-16 16:20:51	malte	set	messages: + msg2807
2013-12-16 16:01:24	florian	set	messages: + msg2806
2013-12-16 14:04:52	florian	set	messages: + msg2804
2013-12-16 14:03:52	malte	set	messages: + msg2803
2013-12-16 14:01:39	florian	set	messages: + msg2802
2013-12-14 22:27:01	malte	set	messages: + msg2799
2013-12-14 22:13:18	florian	set	messages: + msg2798
2013-12-13 18:12:32	malte	set	messages: + msg2797
2013-12-12 22:07:23	malte	set	messages: + msg2795
2013-12-12 22:01:16	florian	set	messages: + msg2794
2013-12-12 21:38:27	malte	set	messages: + msg2793
2013-12-12 21:32:30	malte	set	messages: + msg2792
2013-12-12 21:24:39	florian	set	messages: + msg2791
2013-12-12 19:47:20	malte	set	messages: + msg2790
2013-12-12 19:43:46	malte	set	messages: + msg2789
2013-12-11 11:19:56	florian	set	messages: + msg2779
2013-12-10 23:50:56	malte	set	messages: + msg2778
2013-12-10 23:44:46	florian	set	status: chatting -> in-progress assignedto: florian messages: + msg2777
2013-08-01 10:36:05	malte	set	messages: + msg2594
2013-08-01 09:57:02	florian	set	messages: + msg2593
2013-08-01 08:11:50	erez	set	messages: + msg2592
2013-08-01 03:21:54	florian	set	messages: + msg2589
2013-08-01 02:50:34	florian	create

Issue386