Issue386

Title Avoid overhead when creating temporary states
Priority feature Status resolved
Superseder Nosy List erez, florian, gabi, jendrik, malte, martin, mkatz, silvan
Assigned To florian Keywords
Optional summary

Created on 2013-08-01.02:50:34 by florian, last changed by florian.

Files
File name Uploaded Type Edit Remove
memory_profiles.png florian, 2014-02-07.11:44:30 image/png
Messages
msg2939 (view) Author: florian Date: 2014-02-07.22:06:58
merged
msg2938 (view) Author: malte Date: 2014-02-07.21:27:34
Thanks a lot! I think this is ready to merge, so feel free to merge and close
this one. (As always, anyone feeling otherwise feel free to reopen.)
msg2937 (view) Author: florian Date: 2014-02-07.20:46:47
> Florian, can you summarize how the current implementation for this issue
> affects coverage, runtime and memory usage?

This patch introduces a layer of indirection and a subscriber mechanism in the
PerStateInformation class. Instead of working only for the global state
registry, a PerStateInformation object now looks up the relevant storage in a
hashmap indexed with StateRegistry*. We expected this to be a minor change that
uses a few additional bytes per PerStateInformation object (only 1-2 at the
moment) and has a next-to-nothing time overhead (the result of the hashmap
lookup is cached).

The timing results are as expected (some noise but nothing strange there).
The memory results are also not surprising in most cases. We typically use a
little more memory than before (~20kB) and use a little less with iPDB (because
temporary states that are not used in the search no longer cause an overhead).
Differences in memory usage between the old and the new version are usually
below 100kB.
There is one exception to this: with Merge-and-Shrink the differences are much
more pronounced, in particular on the domain Sokoban. There the memory usage
differences are in the order of 10-100 MB. In one case the new version uses
1.8GB while the old uses 1.6GB. The attached memory profile is for this task and
only measures a peak memory of 1.3GB, which would mean a 500MB loss (to
fragmentation?). The additional memory requirements of the new version on this
task is reproducible in release and debug mode.
msg2935 (view) Author: florian Date: 2014-02-07.20:04:12
> We could try to change the memory access pattern in merge-and-shrink
I added this as issue415
msg2932 (view) Author: malte Date: 2014-02-07.19:17:16
Florian, can you summarize how the current implementation for this issue affects
coverage, runtime and memory usage, i.e. give us a summary interpretation of the
experiment results? I don't think it makes sense for everyone to sift through a
50 MB report file.
msg2931 (view) Author: malte Date: 2014-02-07.19:03:52
We could try to change the memory access pattern in merge-and-shrink to try to
avoid fragmentation. I don't think this would have to be terribly complicated,
but it would certainly require some thought about the data structures used for
the abstractions. I think there should also be some speed-up potential there.
msg2928 (view) Author: florian Date: 2014-02-07.11:44:30
I'm attaching a memory profile of the base version (right) and the patch (left)
on sokoban08/p22 with merge&shrink. It looks like the difference doesn't show up
in the profile (both peaks are at 1.3GB). The graphs are not exactly the same
but this is normal (the profiler doesn't measure at the same time in both
versions and not often enough to catch all the details).

The peak is in the initialization of merge &shrink. The profile does not contain
the search part. I did a previous profile that showed that memory consumed in
the search does not come close to the peak in the initialization in both cases.

Running the same call without the profiler results in peaks of 1.6GB (base) and
1.8GB (patch) - significantly more than with the profiler. It is not clear to me
why this is so much higher with the patch. Does anyone have any ideas?

If this is memory fragmentation, we loose up to 500MB to memory fragmentation
(compared to the peak in the profiler). Is there anything we could/should do
about this?
msg2827 (view) Author: florian Date: 2013-12-18.14:19:37
I can look into it again after the IPC. If it turns out to be memory
fragmentation, I think we should merge and create a new issue to fix merge and
shrink.
msg2826 (view) Author: malte Date: 2013-12-18.13:57:26
I looked in more detail at the merge-and-shrink data in Sokoban, and I don't
really see a strong negative. The changes could just as well be arbitrary. If
you want to investigate further, I'd focus some experiments on the combination
of merge-and-shrink + Sokoban. We could, for example, try our other variations
of merge-and-shrink to get more data. We could also add intermediate peak memory
reports to the output and look at that data.

Ultimately, my best guess at the moment is that we're seeing the effect of
memory fragmentation during merge-and-shrink construction.  If this hypothesis
is correct, one could try to address the problem directly, e.g. by using custom
allocators within merge-and-shrink.
msg2824 (view) Author: florian Date: 2013-12-18.13:27:43
I don't really know. I guess we will merge this into our IPC submission, since
it does not seem to affect LMcut too much and we need multiple StateRegistries.
But I don't think we can merge it into the main repository like this. Pity that
the problem is not reproducable in a memory profiler. Do you have any other
ideas how to find out what is going wrong here?
msg2823 (view) Author: malte Date: 2013-12-18.13:18:55
What do you suggest?
msg2820 (view) Author: florian Date: 2013-12-18.10:11:35
Rerunning the experiment with a cleared revision cache (and the small changes we
discussed) did not change the result:

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue386-exp3-compare-v3.html (50 MB)

Memory usage still increases in most tasks. The increase is small in most cases
but can be huge, e.g., 230 MB in sokoban08/p22 with merge and shrink. Ignoring
merge and shrink and the portfolios, the difference in memory is never over 1 MB
while it is more that that for a lot of runs with merge and shrink.
So it seems whatever this is affects the merge and shrink heuristic in particular.
msg2807 (view) Author: malte Date: 2013-12-16.16:20:51
If there is no other good reason for the changed behaviour apparent, it may be
worth deleting the data from the revision cache and rerunning the experiment.
Changes in memory usage can often be explained by compiler/library upgrades if
there is no other good reason.
msg2806 (view) Author: florian Date: 2013-12-16.16:01:24
I didn't see your message before answering. The experiment ran both planners at
the same time but the base revision was compiled for the first experiment and
taken from labs revision cache.

The increased memory use seems to be particularly big with merge and shrink.
I'll have a look at the logs.
msg2804 (view) Author: florian Date: 2013-12-16.14:04:52
I implemented the comments from your last review. Since this was mostly
refactoring and comments, I would suggest to merge this without a second experiment.
msg2803 (view) Author: malte Date: 2013-12-16.14:03:52
Were both version of the planner compiled and run at the same time, or did you
combine data from two experiments?
msg2802 (view) Author: florian Date: 2013-12-16.14:01:39
I don't see an obvious reason for the additional memory usage. There is the
subscriber mechanism and the additional field to store the registry in each
State, but both of them should only use a few bytes, i.e., there are only one or
two State objects at the same time and only one subscriber in most cases.
msg2799 (view) Author: malte Date: 2013-12-14.22:27:01
Looks mostly OK, but we seem to be using a bit more memory across the board.
Any explanation for this? Where do we change/add data structures that could
explain this?

(Looking at the total_time values, the reduction in coverage in LM-Cut doesn't
strike me as a big problem. This can easily be random noise.)
msg2798 (view) Author: florian Date: 2013-12-14.22:13:18
Experiments for the recent commit finished:

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue386-exp2-compare.html (50 MB)

I'll fix the things you commented on in the pull request on Monday.
msg2797 (view) Author: malte Date: 2013-12-13.18:12:32
I reviewed the most recent commit.
msg2795 (view) Author: malte Date: 2013-12-12.22:07:23
Yes, this looks basically good.

However, I'd register "this" rather than "cached_entries"; the SegmentedVector
class can remain an implementation detail of PerStateInformation, and this will
also allow removing the StateRegistry key from the PerStateInformation's
EntryMap upon destruction of the StateRegistry.

When we do make this change, I think we should also make sure to forbid copy
construction and assignment for PerStateInformation and StateRegistry. I think
copying would be broken if we don't support it properly in the subscriber
mechanism, and it seems pointless to support it in the first place.
msg2794 (view) Author: florian Date: 2013-12-12.22:01:16
I prefer to do it in this issue because it also concerns "overhead when creating
temporary states" and we probably have to fix it before the IPC. I'll try to do
it now, so the experiments run with all your comments fixed and we won't need to
wait for a second batch of experiments next week. Just need to get a ubuntu
running on my pc at home, so i can build.


Is this what you had in mind?

   void set_cached_registry(const StateRegistry *registry) {
       if (cached_registry != registry) {
           cached_registry = registry;
           pair<EntryMap::iterator, bool> inserted =
               entries_by_registry.insert(make_pair(registry,
PerStateInformation<Entry>()));
           cached_entries = inserted.first;
           if (inserted.second) {
               registry.subscribe_per_state_information(cached_entries);
           } 
       }
   }
msg2793 (view) Author: malte Date: 2013-12-12.21:38:27
PS: This probably means that we need a (mostly empty) abstract base class for
PerStateInformation<T>. It needs an empty virtual destructor and nothing much else:

class PerStateInformationBase {
public:
    PerStateInformationBase();
    virtual ~PerStateInformationBase() = 0;
};
msg2792 (view) Author: malte Date: 2013-12-12.21:32:30
Sounds good, I had the same implementation in mind.
Do you want to implement this as part of this issue386 or as a separate issue?

To make it sufficiently efficient, the test whether a given StateRegistry and
PerStateInformation "know each other" already could be integrated into the
existing caching mechanism, I guess.
msg2791 (view) Author: florian Date: 2013-12-12.21:24:39
> One question on temporary registries: it seems to me that PerStateInformation
> associated with states from temporary state registries is never released. Is
> this correct?
> 
> Say we create a StateRegistry for selective max and then do sampling with the
> lmcount heuristic. This should create a PerStateInformation for the LM info for
> the sampled states. Will we release this information once we discard the state
> registry?

No, the information would be kept in the current implementation. We should
probably use a subscriber mechanism that registers each PerStateInformation
which uses States from a registry, to that registry and send them a message if
the registry is discarded. I'll try to add something like this.
msg2790 (view) Author: malte Date: 2013-12-12.19:47:20
Review is done.
msg2789 (view) Author: malte Date: 2013-12-12.19:43:46
I'm still reviewing, but feel free to already start experiments if you are in a
hurry. Experiments for the optimal configs should be enough, no need for the
satisficing ones.

One question on temporary registries: it seems to me that PerStateInformation
associated with states from temporary state registries is never released. Is
this correct?

Say we create a StateRegistry for selective max and then do sampling with the
lmcount heuristic. This should create a PerStateInformation for the LM info for
the sampled states. Will we release this information once we discard the state
registry?
msg2779 (view) Author: florian Date: 2013-12-11.11:19:56
I just granted you access. If anyone else is interested, tell me.
msg2778 (view) Author: malte Date: 2013-12-10.23:50:56
That's 760 KiB, not bytes. :-) (And it's a Mystery task, not Miconic.) But be
that as it may, the results indeed look good.

The bitbucket link gives me an access denied message. (I'm logged in.)
msg2777 (view) Author: florian Date: 2013-12-10.23:44:46
I started working on this and ran an initial experiment on iPDB. The results
look good. Memory usage mostly decreased (1 GB across all tasks), the most
increase is 760 bytes in one miconic task.

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue386-exp1-compare.html

Malte, do you have time to review this? 

https://bitbucket.org/flogo/downward-issues-issue386/pull-request/1/avoid-overhead-when-creating-temporary/diff

I can also run experiments on more configs.
msg2594 (view) Author: malte Date: 2013-08-01.10:36:05
Like Florian, I'm worried about changes that set attributes in the heuristic
objects that then need to be maintained properly. The more classes we have that
are (at least conceptually) immutable, the easier the code is to maintain and debug.

So unless performance forbids this, I would prefer a solution where the
heuristic is told the correct state registry whenever it needs the information,
and I think the best way to achieve this is to package this information with the
state.

My suggestion is to repurpose the StateHandle class so that it contains an ID
and a reference to the registry, and to have everything that needs access to
per-state data work on the level of state handles. Done properly, the heuristic
code wouldn't even need to know that something like state registries exists.

However, this doesn't solve the more fundamental problem that PerStateInfo
objects are tied to a specific registry. There are a few possible ways to change
this. One self-contained solution would fix this inside PerStateInfo: rather
than storing just one segmented vector, have a hash map from StateRegistry
pointers to segmented vectors.

To keep speed reasonable, we could cache the last requested state registry and
its vector, since we assume that we don't switch around between different state
registries all the time.
msg2593 (view) Author: florian Date: 2013-08-01.09:57:02
> I think the answer to 1 must be no. 
> Not only do we have some use cases where we want this (state space samples), but 
> suppose we want a heuristic that does lookahead (as Felner et. al. did) - we do 
> not want to register the lookahead states.

Even for state space samples and lookahead states it can make sense to register
them (not necessarily in the global registry). For example if they are evaluated
by a heuristic that saves information to each state. The heuristic only gets a
state and should not care about whether this state is a sample or a state
encountered during search.

The question is who should be responsible for registering them. One way to do it
would be to require the class that generates the state to register it before
passing it to a heuristic.

We might also have a new issue here: If heuristics save information per state
and are called for a lot of temporary states, should it be possible to tell the
heuristic when it is save to remove the data?


> My suggestion is to have a *StateRegistry pointer in Heuristic, which tells the 
> heuristic where it's currently registering states.

Having the heuristics that want to store information also register the state
might be a solution -- I think we talked about this before in the code review
comments. But this can create problems when heuristics are nested. Changing the
StateRegistry to something new can be done recursively in all dependent
heuristics but this overwrites the previous value. After we are done with the
temporary state we might want to set everything back to the old value, i.e. undo
the change instead of changing to the global registry.
msg2592 (view) Author: erez Date: 2013-08-01.08:11:50
I think the answer to 1 must be no. 
Not only do we have some use cases where we want this (state space samples), but 
suppose we want a heuristic that does lookahead (as Felner et. al. did) - we do 
not want to register the lookahead states.

My suggestion is to have a *StateRegistry pointer in Heuristic, which tells the 
heuristic where it's currently registering states. This will be a temporary 
object during state space sample/lookahead/whatever, and the global registry 
elsewhere.
msg2589 (view) Author: florian Date: 2013-08-01.03:21:54
Depending on how we solve this, we could also consider splitting the State class
into RegisteredState and UnregisteredState. This way we can save some assertions
and no longer need invalid StateHandles.
msg2586 (view) Author: florian Date: 2013-08-01.02:50:34
We split issue344 into more manageable chunks. The main issue branch will soon
be merged and this is a possible follow-up work. I nosied everyone from the
original issue, but feel free to erase yourself from nosy, if you are not
interested.

When temporary states (probes, samples) are created that need to be evaluated,
this creates a large memory overhead in the way the StateRegistry is currently
set up. There are two types of heuristics involved here: the heuristic that uses
temporary states (iPDB, selective max) and potential subheuristics of them that
are evaluated for the temporary state. The main problem is with heuristics
evaluated for temporary states that want to store per state information. For
example, this could be incremental lm-cut inside selective max. In the following
I think of those when I say 'heuristics'.

These are the main questions:

1) Should states given to a heuristic always be registered?

If the answer is yes:
2a) Where should temporary states be registered? If they are registered in the
global registry, we would need some way to unregister/clear states after they
are no longer needed. This could invalidate existing handles. If we use a
temporary registry the problem is that heuristics do not know where the state
they get is registered. The PerStateInformation class currently uses the
StateRegistry to determine if an index is valid and to resize to the size of the
registry automatically.

If the answer to 1) is no:
2b) If the heuristic needs to store per state information, the state has to be
registered at some point. This would mean the heuristic has to register the
state. How should this be done? If we use the global registry, we have the same
problem as before. If we use a custom registry inside the heuristic, we might
end up storing regular (non-temporary) states twice and we would need a way to
know in which registry a state was registered.
History
Date User Action Args
2014-02-07 22:06:58floriansetstatus: in-progress -> resolved
messages: + msg2939
2014-02-07 21:27:34maltesetmessages: + msg2938
2014-02-07 20:46:47floriansetmessages: + msg2937
2014-02-07 20:04:12floriansetmessages: + msg2935
2014-02-07 19:17:16maltesetmessages: + msg2932
2014-02-07 19:03:52maltesetmessages: + msg2931
2014-02-07 11:44:30floriansetfiles: + memory_profiles.png
messages: + msg2928
2013-12-19 18:36:54floriansetnosy: + martin
2013-12-18 14:19:37floriansetmessages: + msg2827
2013-12-18 13:57:26maltesetmessages: + msg2826
2013-12-18 13:27:43floriansetmessages: + msg2824
2013-12-18 13:18:55maltesetmessages: + msg2823
2013-12-18 10:11:35floriansetmessages: + msg2820
2013-12-16 16:20:51maltesetmessages: + msg2807
2013-12-16 16:01:24floriansetmessages: + msg2806
2013-12-16 14:04:52floriansetmessages: + msg2804
2013-12-16 14:03:52maltesetmessages: + msg2803
2013-12-16 14:01:39floriansetmessages: + msg2802
2013-12-14 22:27:01maltesetmessages: + msg2799
2013-12-14 22:13:18floriansetmessages: + msg2798
2013-12-13 18:12:32maltesetmessages: + msg2797
2013-12-12 22:07:23maltesetmessages: + msg2795
2013-12-12 22:01:16floriansetmessages: + msg2794
2013-12-12 21:38:27maltesetmessages: + msg2793
2013-12-12 21:32:30maltesetmessages: + msg2792
2013-12-12 21:24:39floriansetmessages: + msg2791
2013-12-12 19:47:20maltesetmessages: + msg2790
2013-12-12 19:43:46maltesetmessages: + msg2789
2013-12-11 11:19:56floriansetmessages: + msg2779
2013-12-10 23:50:56maltesetmessages: + msg2778
2013-12-10 23:44:46floriansetstatus: chatting -> in-progress
assignedto: florian
messages: + msg2777
2013-08-01 10:36:05maltesetmessages: + msg2594
2013-08-01 09:57:02floriansetmessages: + msg2593
2013-08-01 08:11:50erezsetmessages: + msg2592
2013-08-01 03:21:54floriansetmessages: + msg2589
2013-08-01 02:50:34floriancreate