Issue705

Title: Reduce memory usage of successor generator
Priority: wish
Status: resolved
Nosy List: florian, haz, jendrik, malte
Assigned To: florian

Created on 2017-01-11.10:41:50 by florian, last changed by malte.

Files
File name                             Uploaded                      Type
main.cc                               florian, 2017-04-27.19:50:30  text/x-c++src
successor-generator-statistics.patch  malte, 2017-01-11.18:48:48    text/x-patch
Messages
msg6516 (view) Author: malte Date: 2017-09-03.18:01:05
> I never went into too much detail because the implementation might be different
> in another version and I thought that a rough estimate would be sufficient for
> this issue.

The formula in the code looks as detailed as it could possibly be. :-)
(That's why I only quoted a fragment here.)

The main point I overlooked before my previous comment is that the formula in
the code is already parametric (for unordered_map<K,V>). I didn't realize this
at first because this issue only talks about unordered_set<int>, not about
unordered_map or different key types than int.

I would still like to polish the estimation code a bit and add a method for
unordered_set (we currently only have unordered_map), but I think I know how to
proceed now. Thanks, Florian!
msg6515 (view) Author: florian Date: 2017-09-03.17:34:12
The formulas in msg6278 are mostly based on guesses that try to make sense of
the measured data from the program attached to this issue (main.cc). I think I
ran the program in 32-bit and 64-bit mode and looked at the differences back then.
I also remember looking at the implementation on my laptop. The comments in
the source explained the implementation strategy and helped me come up
with a guess for the formula.

I never went into too much detail because the implementation might be different
in another version and I thought that a rough estimate would be sufficient for
this issue.
msg6495 (view) Author: malte Date: 2017-09-01.16:53:31
Not sure if this is the best place to put this information, but it may be better
than putting it nowhere. Once we have resolved these questions, perhaps this
would be useful information for the ForDevelopers section of the homepage.

I wanted to generalize Florian's information on the memory requirements of
unordered_set<int> from msg6278 to unordered_set<T>, but it's not fully clear to
me how to generalize it. Let's look at the non-constant parts of Florian's info
on unordered_set<int>:

    M*2*(i + p) [entries]
    B*p [buckets]

where

    M: number of set elements
    B: number of buckets (roughly in the range [M, 2M]; more on this in msg6278)
    i: size of an integer
    p: size of a pointer

I'm wondering about the M*2*(i+p) part and how to generalize it to non-int sets.
The formula implies that we store two pointers and two ints for every entry.

I assume one of the two ints is the actual payload and the other one is the hash
value. Is this correct? In that case, I think it would be clearer to say that
this is one int and one size_t, as these have different sizes in 64-bit builds?
(But perhaps despite the 4-byte ints in a 64-bit build, we'd get 32 bytes per
entry here rather than 28 due to alignment?)

Are the two pointers a forward and backward pointer in a linked list? This
surprises me slightly as the description of the implementation I found mentioned
that this should be a singly linked list in gcc, but this may be out of date
(http://bannalia.blogspot.de/2013/10/implementation-of-c-unordered.html).
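
For concreteness, here is one way the estimate could generalize, under the
assumption that each node stores one list pointer, a cached size_t hash and the
payload, rounded up to pointer alignment. This is a guess, not verified against
the libstdc++ sources, and the helper name is made up:

    #include <cstddef>

    size_t estimate_unordered_set_bytes(size_t num_entries, size_t num_buckets,
                                        size_t payload_size,
                                        size_t malloc_overhead = 8) {
        const size_t p = sizeof(void *);
        size_t node = p + sizeof(size_t) + payload_size; // next ptr + hash + payload
        node = (node + p - 1) / p * p;                   // round up to alignment
        return num_entries * (node + malloc_overhead)    // one allocation per node
               + num_buckets * p;                        // bucket array
    }

For an int payload on a 64-bit build this gives 24 bytes per node plus
allocation overhead, which is at least consistent with the M*2*(i + p) term
above.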
msg6457 (view) Author: jendrik Date: 2017-07-17.09:43:57
> Jendrik, will this need any follow-up work for the hash functions that are
> used in some of the successor generator nodes?

I don't think so. We plan to keep using the built-in hash functions for hashing 
primitive types and as far as I can see the successor generator code only hashes 
ints.
msg6452 (view) Author: malte Date: 2017-07-15.14:51:12
No separate report needed for this issue. I'd still be interested in seeing
these numbers for the current default, but we can do that over at issue732. Thanks!
msg6450 (view) Author: florian Date: 2017-07-15.08:34:30
I didn't get to this on Friday and probably won't have time on the weekend. Do
you still need the numbers from msg6444? It looks like you already did what you
wanted to do with them. I'll run an experiment for issue732 when I get back and
measure time in there. Is that enough or should I also generate the report for
this issue branch?
msg6447 (view) Author: malte Date: 2017-07-14.15:37:40
The new issue is issue732.

Florian, I've added you to the nosy list, hoping that you could perhaps help me
with the experiments and/or do a code review once the code is ready to be
reviewed. If you would rather not be involved, just let me know. Of course
everybody else is also welcome to join the discussion at issue732.
msg6445 (view) Author: malte Date: 2017-07-14.15:34:49
I've played around a bit with the algorithm for creating the successor
generators, and the initial results look quite promising.

I tested on the last tetris-sat14 task, since the scatter plot suggested this
might be a challenging domain for our current algorithm. The experimental new
implementation creates the successor generator in 0.06 seconds, more than 100
times faster than our current code (8.52 seconds). Without more experiments, I'm
not convinced that the algorithm is implemented correctly, but at least every
operator seems to show up in the resulting successor generator exactly once,
which is a good property to have. :-)

I'll create a new issue for this.
msg6444 (view) Author: malte Date: 2017-07-14.11:44:39
Hi Florian, can you let me know which task is the one where successor generator
creation takes the longest with the new code? Even better if you have a report
-- it looks like the latest report doesn't include data on successor generator
creation time and peak memory increase.
msg6443 (view) Author: florian Date: 2017-07-13.19:27:42
Done. Jendrik, will this need any follow-up work for the hash functions that are
used in some of the successor generator nodes? I saw that issue693 was merged in
the meantime.
msg6441 (view) Author: malte Date: 2017-07-10.18:00:47
OK, feel free to merge.
msg6440 (view) Author: florian Date: 2017-07-10.17:45:00
> The satisficing results don't look that good, and there are some very odd data
> points, for example nomystery-sat11-strips:p06.pddl, where the number of
> evaluations increases from 276 to 6326008. Any ideas? Or is this a
> non-reproducible glitch?

I think this is a glitch. There is a difference in 24 instances. I tried some of
them locally and got very different results. Sometimes the local numbers matched
the grid, sometimes the local master version matched the issue branch on the
grid, sometimes both were different, ... I would suspect that this is an effect
of one of the bugs in the landmark code.
msg6437 (view) Author: malte Date: 2017-07-10.16:51:36
Code looks good; I made one more comment.
msg6436 (view) Author: malte Date: 2017-07-10.16:48:28
The satisficing results don't look that good, and there are some very odd data
points, for example nomystery-sat11-strips:p06.pddl, where the number of
evaluations increases from 276 to 6326008. Any ideas? Or is this a
non-reproducible glitch?
msg6435 (view) Author: florian Date: 2017-07-10.15:14:32
The new results still look good. This revision includes the fixes for your
comments up to Friday; your last comments probably won't influence the results
much (I addressed them today).

Sorry about all the unexplained errors. I ran LM-cut, ipdb, cegar and lama-first
on all tasks from the optimal and satisficing suites, so the optimal configurations
crashed on tasks with axioms or conditional effects. There were no other
unexplained errors.

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-v8-issue705-base-issue705-v12-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-search_time-base-v12.png
msg6422 (view) Author: malte Date: 2017-07-08.15:34:11
Done with my bitbucket comments.
msg6421 (view) Author: malte Date: 2017-07-08.15:01:46
If I recall correctly, the issue is that a variable is already called
interesting if there are any immediate operators, even though these are not
related to the variable. It also seemed to me at first glance as if the big-O
complexity of the
algorithm we're using is not as good as it could be. I suggest adding a TODO
there, such as:

"TODO: Look into more efficient algorithms that don't iterate over all operators
in the subtree at each layer of recursion. It also seems that we currently
consider variables "interesting" as soon as there are immediately applicable
operators, even if no preconditions on the variable exist. This could be improved."

Depending on how soon we can merge this issue, I might have a stab at this.
(This weekend I might have some time; later on perhaps not.)
msg6417 (view) Author: florian Date: 2017-07-07.18:29:55
Thanks for the review. I made some changes and made one more attempt to use
unique_ptr in the classes. This time it worked! I don't know what went wrong the
last time. I hope it's not an OS-specific thing.

I remember we had some discussion about the semantics of var_is_interesting and
immediate operators but cannot recall what I should change.
msg6408 (view) Author: malte Date: 2017-07-05.17:49:00
As discussed offline, I'm done with my code review. The two main things I
remember are the "cout << endl" line and the use of "override".

It would be nice to add a few comments: one explaining why we use raw pointers,
another mentioning that the construction algorithm and its handling of immediate
operators could perhaps be improved a bit. But I don't think we need to do this
improvement in this issue; we should not hold it up further. I'd be happy to
take a stab at this algorithm if I can find time, also to make it more
efficient, but again we should not let this hold up this issue.
msg6405 (view) Author: florian Date: 2017-07-04.19:53:24
Here is the full report:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-v7-issue705-base-issue705-v11-compare.html

I'll run more experiments as you suggested.
msg6403 (view) Author: malte Date: 2017-07-04.19:33:03
I've started looking at the code.

For the final version, it would be interesting to have some more experimental
results, for example for lama-first (on the satisficing suite), A* + lmcut and
A* + cegar or A* + ipdb to see if we see coverage changes somewhere. I assume
for A* + blind we see no effect because the timeout doesn't matter much there
(because it tends to run out of memory), and the tasks where we get substantial
memory savings for the successor generator itself are too large for blind search.
msg6402 (view) Author: malte Date: 2017-07-04.19:27:40
Do you have a lab report of base vs. v11 (other than the scatter plot)?
msg6315 (view) Author: florian Date: 2017-05-02.14:18:07
Here are the results of the final experiment:

(v9) merged in the current default branch, including issue688, which uses operator
IDs instead of OperatorProxy in the successor generator. This reduces the
required memory because OperatorProxy contains an additional pointer to the task.
 
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-sg_peak_mem_diff_per_task_size-v8-v9.png

(v10) then also uses variable IDs instead of variable proxies for variables in
switches, which reduces memory usage for the same reason but to a smaller
degree, because switch variables are only responsible for a small fraction of
memory usage.
 
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-sg_peak_mem_diff_per_task_size-v9-v10.png

(v11) is the revision in the pull request. I'm not aware of any optimization
between v10 and v11. I only moved things around a bit to clean up the code
(split large functions into smaller ones, uncrustify, etc.). Still, the results
show some performance improvement between the versions. I'm not sure where this
is coming from, but I won't look that gift horse in the mouth.
 
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-sg_peak_mem_diff_per_task_size-v10-v11.png


Overall, here is the comparison of the base version to v11:

Memory per task object decreases significantly, especially in the large cases:
 
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-sg_peak_mem_diff_per_task_size-base-v11.png

Search time decreases on average (in some domains by a lot) although there is no
difference in coverage:
  http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-search_time-base-v11.png

Construction time also decreases significantly in most cases. In Tidybot the
time increases somewhat (the two extreme cases are 4.4s to 6.7s and 3.3s to
5.1s) but in general, the effect is clearly positive.
 
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-sg_construction_time-base-v11.png
msg6279 (view) Author: florian Date: 2017-04-28.18:58:26
I think this is ready for review now. There is more that could be done but I
don't think it is necessary until we find a case where the successor generator is a
greater bottleneck. I added some comments on possible optimizations to the
already existing list in the file.

The pull request is on Bitbucket; the experiments for the final version are running:
https://bitbucket.org/FlorianPommerening/downward-issues/pull-requests/34
msg6278 (view) Author: florian Date: 2017-04-28.18:52:58
The estimation in the last post was completely off because I used a 64-bit
compile and assumed 32-bit pointers. The updated formulas make more sense. Say p
is the size of a pointer and i is the size of an int. 

1) Vectors:
       3*p [sizeof(vector)]
     + N*i [entries]
     + 2*p  [overhead]
   Again, there is some alignment issue that causes vector sizes to increase in
   steps of 2 (32-bit) or 4 (64-bit).

2) Hash maps:
       7*p [sizeof(unordered_set)]
     + M*2*(i + p) [entries]
     + B*p [buckets]
     + 2*p [overhead]
   The number of buckets B can be predicted quite well with the small number of
   entries that we are looking at. It's B = 3 * 2^n - 1 for the smallest value
   of n such that B > M. The size per entry 2*(i + p) is what the data structure
   requires but in some combinations of word width and entry size the maps use
   more memory per entry. I also suspect alignment issues here but did not
   investigate further.
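
Written out as code, the formulas above could look like this; a
back-of-the-envelope sketch in the spirit of this message, not actual
repository code:

    #include <cstddef>

    size_t estimate_vector_bytes(size_t num_entries) {
        const size_t p = sizeof(void *);
        const size_t i = sizeof(int);
        return 3 * p             // sizeof(vector)
               + num_entries * i // entries
               + 2 * p;          // allocation overhead
    }

    size_t estimate_hash_map_bytes(size_t num_entries, size_t num_buckets) {
        const size_t p = sizeof(void *);
        const size_t i = sizeof(int);
        return 7 * p                       // sizeof(unordered_set)
               + num_entries * 2 * (i + p) // entries
               + num_buckets * p           // buckets
               + 2 * p;                    // allocation overhead
    }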
msg6254 (view) Author: florian Date: 2017-04-27.19:50:30
I attached a small program to measure the memory impact of vectors and hash
maps. If you call it as "./main N M", it creates 100000 unordered maps, adds M
random values from [1..N] to each, and measures the change in peak memory
usage. When line 45 is commented out and line 46 is commented in, it creates
vectors with N elements instead. I ran this for N = 1..50 and M = 1..N and got
stable results.

The overhead of both classes is much higher than I thought:

1) Vectors: 40 + 4N. Actually, the memory usage increases in steps of 16, so it
stays constant for 4 consecutive sizes (might be an alignment issue).

2) Hash maps: 96 + 32M + 16B where B is an offset that gets bigger with M at
certain stages. The stages start out as 3*2^n - 1 (2, 5, 11, 23, 47) but then
the gaps are slightly larger. The values of B are close to M at the start of the
stage (1, 4, 10, 22, 47).

The next revision adds these formulas as size estimations for the switches. When
creating a switch, we estimate both sizes and use the class with the lower
estimate. Since values of B are more relevant for N and M below 50, I added
explicit values for them, and use an estimation of 3*2^n - 1 for higher values.
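
A minimal sketch of this estimate-and-choose rule, using the measured 64-bit
formulas from above (helper names are made up; the explicit bucket table for
small sizes is omitted here):

    #include <cstddef>

    size_t predict_num_buckets(size_t num_entries) {
        // B = 3 * 2^n - 1 for the smallest n with B > M: 2, 5, 11, 23, 47, ...
        size_t buckets = 2;
        while (buckets <= num_entries)
            buckets = 2 * buckets + 1; // 2 * (3*2^n - 1) + 1 = 3*2^(n+1) - 1
        return buckets;
    }

    bool should_use_hash_switch(size_t domain_size, size_t num_nonempty) {
        size_t vector_estimate = 40 + 4 * domain_size;
        size_t hash_estimate = 96 + 32 * num_nonempty
                               + 16 * predict_num_buckets(num_nonempty);
        return hash_estimate < vector_estimate;
    }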


The memory usage unfortunately increases with this patch:
 
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-sg_peak_mem_diff_per_task_size-v7-v8.png

I assume that is because the estimation is off: I measured vectors of ints, but
the operator proxies we currently store contain an additional pointer to the task. I'll
merge the default branch with issue688 and repeat this experiment.
msg6241 (view) Author: florian Date: 2017-04-27.13:25:11
As we discussed offline, I added another specialized class for a leaf node that
only adds a single operator. As before, this improves our metric (estimated
bytes per task object):
 
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-sg_peak_mem_diff_per_task_size-v6-v7.png
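
A sketch of such a single-operator leaf (declarations approximate; State and
OperatorID are the planner's existing types, and GeneratorBase is the common
virtual base from the split described in msg6227 further below):

    #include <vector>

    class GeneratorLeafSingle : public GeneratorBase {
        OperatorID op_id;
    public:
        explicit GeneratorLeafSingle(OperatorID op_id)
            : op_id(op_id) {}
        void generate_applicable_ops(
            const State &, std::vector<OperatorID> &applicable_ops) const override {
            // No test left to perform at this point; just emit the operator.
            applicable_ops.push_back(op_id);
        }
    };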

The effect on overall search time and memory usage was mostly smaller than the
noise.

The next step is to look for better rules of when to use which switch node.
msg6239 (view) Author: florian Date: 2017-04-27.11:46:46
I added two more specialized classes and a missing vector.reserve():

(v4) just adds the call to reserve and a bunch of statistics output.
Compared to (v3) memory usage actually increases in some cases (mostly
transport and scanalyzer and mostly between 100% and 120% of the
original value). Not sure what's going on here.
  http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-v3-v4.png

(v5) uses an unordered_map instead of a vector for switch nodes when
less than 50% of vector elements are non-zero. Compared to (v4) this
decreases memory usage on the high end of the curve (cases where we
need more bits per task object) and has some overhead on the low end.
Overall it looks quite good, but we could experiment with more
sophisticated criteria for switching to the hash-based switch node:
  http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-v4-v5.png
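
A sketch of what the hash-based switch from (v5) might look like (names
approximate; GeneratorBase is the common virtual base from the split described
in msg6227 further below):

    #include <memory>
    #include <unordered_map>
    #include <vector>

    class GeneratorSwitchHash : public GeneratorBase {
        int switch_var_id;
        std::unordered_map<int, std::unique_ptr<GeneratorBase>> generator_for_value;
    public:
        void generate_applicable_ops(
            const State &state, std::vector<OperatorID> &applicable_ops) const override {
            // Only non-empty children are stored; absent values generate nothing.
            auto it = generator_for_value.find(state[switch_var_id].get_value());
            if (it != generator_for_value.end())
                it->second->generate_applicable_ops(state, applicable_ops);
        }
    };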

(v6) After (v5) a lot of wasted memory came from hash maps with just
one entry, so I added a specialized switch node that just does a single
check. This had a clearly positive result.
  http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-v5-v6.png


I also ran blind search on revisions (v3), (v5), and (v6). I uploaded
the plots that compare the base to (v6). Let me know if you are
interested in other results as well.

Runtime decreases in general but not enough to lead to increased
coverage. Memory usage is only dominated by the successor generator in
very small cases, so there is little effect. Successor generator
construction time can also decrease quite a bit, but again, this usually
doesn't influence overall runtime.
  http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-total_time-base-v6.png
  http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-search_time-base-v6.png
  http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-memory-base-v6.png
 
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-sg_construction_time-base-v6.png
msg6228 (view) Author: malte Date: 2017-04-26.15:00:38
Sounds good! We should also evaluate how efficiently we generate successors,
i.e., run a search. Only running a blind search should be enough, but if you
want, you can also add astar(lmcut()) to have a bit more data.

I don't think we need this for all revisions, but I think it would be good to
have the data for the baseline vs. v3, and then also for the following revisions.
msg6227 (view) Author: florian Date: 2017-04-26.14:56:41
I started working on this issue. As a first step, we wanted to clean up the
interface a bit. Currently, the "switch" node performs three different tasks:
 1) add immediately applicable operators
 2) fork the search into the "don't care successor" and the "value successor"
 3) branch over values of a variable

To that end, we split the switch class into three classes for those three
tasks. The following plots compare the base revision against the revision after
some changes. We measure the change M in peak memory caused by creating the
successor generator (approximating the actual memory usage) and the task size N
reported by the translator (approximating the task encoding size). The x-axes of
the plots show the ratio M/N, which approximates the number of bytes per object
in the task encoding. The y-axes show by how much this ratio changed in the new
version.
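
A rough sketch of the common interface and the fork node from this split
(declarations approximate, not the actual repository code; State and OperatorID
are the planner's existing types):

    #include <memory>
    #include <vector>

    class GeneratorBase {
    public:
        virtual ~GeneratorBase() = default;
        // Appends all operators applicable in the given state.
        virtual void generate_applicable_ops(
            const State &state, std::vector<OperatorID> &applicable_ops) const = 0;
    };

    // Task 2): fork into the "don't care successor" and the "value successor".
    class GeneratorFork : public GeneratorBase {
        std::unique_ptr<GeneratorBase> dont_care_successor;
        std::unique_ptr<GeneratorBase> value_successor;
    public:
        void generate_applicable_ops(
            const State &state, std::vector<OperatorID> &applicable_ops) const override {
            dont_care_successor->generate_applicable_ops(state, applicable_ops);
            value_successor->generate_applicable_ops(state, applicable_ops);
        }
    };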

(v1) After splitting off functionality 1), memory usage decreases in some cases
because we no longer have the overhead of the vector to store immediate
operators when they are not needed:
  http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-base-v1.png

(v2) After additionally splitting off functionality 2), memory usage increases
again (in some cases above the original size) because we now have additional
objects to fork the search. This can also help in situations where we previously
had a switch without a don't care successor but this doesn't occur often. So
this change makes things worse but it cleans up the interface:
  http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-base-v2.png


(v3) After splitting the "switch" nodes into three parts, I removed the "empty"
node that was one of the main reasons for the memory usage in large tasks. In
most node types, we can assert that successors are not empty (otherwise, we
don't create the node). Only when branching over values can a successor be
empty. The new code uses nullptr in this situation and adds a check. The memory
usage clearly improved and also mostly offset the loss from (v2):
  http://ai.cs.unibas.ch/_tmp_files/pommeren/issue705-base-v3.png
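
Sketched, the (v3) convention in the value-branching node (names approximate):

    class GeneratorSwitchVector : public GeneratorBase {
        int switch_var_id;
        // One entry per value of the variable; nullptr marks empty successors.
        std::vector<std::unique_ptr<GeneratorBase>> generator_for_value;
    public:
        void generate_applicable_ops(
            const State &state, std::vector<OperatorID> &applicable_ops) const override {
            const GeneratorBase *child =
                generator_for_value[state[switch_var_id].get_value()].get();
            if (child) // nullptr: no operator has this precondition value
                child->generate_applicable_ops(state, applicable_ops);
        }
    };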


The next step is to add a hash-based switch node for cases where the switch node
over values is only sparsely populated.
msg6085 (view) Author: malte Date: 2017-01-11.20:17:25
> In the meantime, I'd be interested to know if the non-determinism discovered here
> (set for landmarks?) was the cause for altered expansions/evaluations when we tried
> optimizing the successor generator construction (issue556). Perhaps not, as blind
> search was giving non-uniform results, but maybe something has changed since then
> that has inadvertently fixed the issue.

No, that issue only affects landmark-based configurations (and as far as I
understand, only ones using reasonable orders, so not the optimal ones). In
issue557, we already saw changes in expansions/evaluations in A* with blind
search. To me the most likely explanation is that the code in issue556 itself
was buggy because A* with blind search should really be stable in terms of
generated/expanded/evaluated nodes with a fixed operator ordering. But I don't
think anyone looked into this further.
msg6084 (view) Author: haz Date: 2017-01-11.20:09:51
I s'pose. Then again I trust my back-of-the-envelope math far less than you guys 
might. It's the quick statistics that I crave from the memcache outputs -- how 
many internal/leaf nodes in the successor generator, how big is each, what do the 
counts look like with the other metrics (op numbers, precond numbers, etc). All 
could be lifted with the right code alterations, but I've frequently found things 
I never thought to consider when just browsing the memcache stats.

If 662 isn't picked up in the next couple of weeks, I'll have a crack at finding 
a work-around when things ease up a bit commitment-wise. In the meantime, I'd be 
interested to know if the non-determinism discovered here (set for landmarks?) 
was the cause for altered expansions/evaluations when we tried optimizing the 
successor generator construction (issue556). Perhaps not, as blind search was 
giving non-uniform results, but maybe something has changed since then that has 
inadvertently fixed the issue.
msg6079 (view) Author: malte Date: 2017-01-11.19:14:37
Disagree in this case. :-)

It's not difficult to understand the memory complexity of the successor
generator, and the inaccuracies of a back-of-the-envelope estimate (e.g. whether
typical dynamic allocation overhead is 8 bytes, 12 bytes or whatever) don't
really matter much in understanding what is wrong and what would be a better
solution.

Profiling will tell you what happens in a particular task, but that's no
replacement for understanding what happens in general because the successor
generator can have so many different extreme cases (we have seen tasks with
30000 variables; tasks with a few variables but ranges around 1000; tasks with a
million operators; tasks where one operator has hundreds of thousands of
conditional effects -- ok, these don't actually matter for the successor generator).

But of course memory profiling can be very useful in general, and if someone
wants to work on issue662, this would be great. Enthusiasm for cmake-related
tasks is often a bit low around here, though. :-)

Ideally, I think one should both profile and try to understand the data
structure intellectually. When Jendrik and I looked at the state registry hash
table implementation recently, I felt that the combination of the two was very
insightful and critical for designing something better, much more so than only
doing one of the two. But here with the successor generator the data structures
are so simple that a profiler is not actually that useful.
msg6078 (view) Author: haz Date: 2017-01-11.19:00:44
Would be so much easier with memcache in hand (à la issue662).
msg6074 (view) Author: malte Date: 2017-01-11.18:48:48
I'm attaching a patch that provides the statistics in msg6064. Note that the
output is based on 32-bit assumptions, as it uses hard-coded numbers like "4" to
estimate the size of a pointer; hence it will underestimate the memory usage on
a 64-bit system by a factor of roughly 2. The patch applies cleanly to revision
609499afc23a, which as of this writing is an unmerged revision on the issue688
branch.

The patch does not apply cleanly to default, but if someone wants to work on
this, it should be easy to port because it merely adds a bunch of methods (plus
some statistical output at the start and end of successor generator
construction). It may be easier to wait until issue688 is merged, though, to
avoid a merge conflict with this issue as both touch the successor generator.

One aspect I didn't mention in msg6064 is that it is wasteful in data structures
like these to use "VariableProxy" instead of a variable ID. In the example, this
causes us to store 13847 pointers to the same abstract task in the successor
generator. For really large tasks with more complex preconditions than
Logistics, this number can easily be on the order of 10^5-10^6.
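
Schematically, the waste is one task pointer per stored proxy (layout
illustrative, assuming VariableProxy pairs a task pointer with an id as
described above):

    struct VariableProxyLayout {
        const AbstractTask *task; // same pointer repeated in every switch node
        int var_id;
    };
    // Storing just the int var_id saves a pointer per switch node and avoids
    // the 13847 copies of the same task pointer mentioned above.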
msg6066 (view) Author: florian Date: 2017-01-11.10:42:03
Quoting Malte from msg6064:

>> By the way, I'm surprised that a successor generator for logistics98 task #25
>> should take so much memory. There are roughly 40000 operators, and Logistics
>> operators should have either 1 or 2 preconditions. The successor generator size
>> should scale roughly with the total number of preconditions, so less than 80000
>> "units" in this case, and 100 MiB for this sounds a lot.
>
> I looked into this. The thing I missed is that every internal node of the
> successor generator stores a pointer for every value of the tested variable,
> even if there are no operators that have this value as a precondition. This
> could be avoided by using a hash map instead of a vector to access the children,
> and then we would indeed be linear in the total number of preconditions (plus
> the number of actions without preconditions), but of course using hash maps has
> other disadvantages.
> 
> A good compromise could be an "adaptive" implementation that uses vectors when
> they are densely populated and something else otherwise. This could fit into the
> current framework easily because we already have polymorphic node types. This
> could be taken further by e.g. having dedicated node types that just add an
> operator or set of operators to the result set and then chain to another node,
> which would let us avoid lots of empty or tiny vectors for "immediate_operators"
> and friends.
> 
> With the current implementation we can be a factor of "max variable domain size"
> worse than the big-O memory usage we would like, and Logistics is one of the
> domains where variable domains can get very large. There are 29 variables with a
> domain size of 13 (representing trucks), 13 variables with a domain size of 27
> (representing planes), and 18 variables with a domain size of 393 (representing
> packages).
> 
> Creating the successor generator indeed increases peak memory usage by ~100 MiB,
> so this is roughly the amount of space that is actually used by the successor
> generator including memory management overhead. ("Roughly" because just
> measuring peak memory increase is a somewhat dangerous way to measure this,
> especially if a lot of memory has been released recently.)
> 
> I added some code with some rough calculations to measure how much memory is
> used for what. My numbers only add up to 84 MB, so are somewhat inaccurate,
> which I assume is because I didn't take into account memory padding or vector
> overallocation (although I hope we use reserve to avoid the latter), and I
> estimated the memory management overhead somewhat optimistically as 8 bytes per
> call to "new".
> 
> For tasks like these, empty successor generators and pointers to them are
> responsible for the bulk of the memory usage. With 64-bits, this will get worse,
> as all pointers are double the size. If we're unlucky, memory usage will almost
> double with a 64-bit compile. Perhaps it's not quite as bad as that because
> memory management overhead doesn't double. [Addendum: I have now tried it out
> with a 64-bit compile, and memory usage indeed roughly doubles to ~206 MB for
> each successor generator.]
> 
> Here are some numbers I gathered for this Logistics task (32-bit mode). This is
> for the second successor generator created, but all numbers other than the peak
> memory statistics in the first two lines are identical for the first one.
> 
> MALTE: Constructing successor generator... [t=1.0473s, 138584 KB]
> MALTE: Done constructing successor generator... [t=1.45588s, 241480 KB]
> MALTE: SG size estimates: total: 83767368
> MALTE: SG size estimates: object overhead: 62139348
> MALTE: SG size estimates: operators: 527192
> MALTE: SG size estimates: switch var: 110776
> MALTE: SG size estimates: generator for value: 20934664
> MALTE: SG size estimates: default generator: 55388
> MALTE: SG object counts: switches: 13847
> MALTE: SG object counts: leaves: 26208
> MALTE: SG object counts: empty: 5138224
> 
> If the estimates are accurate, then roughly 3/4 of the overall memory is used
> for empty successor generators. This could be cut down to 0 by using nullptr
> instead of millions of identical empty generators, or to a small constant number
> of bytes by reusing the same empty generator, similar to the reuse of nodes in
> the look-up trees for Cartesian abstractions. Of the remaining 1/4 of memory
> used, more than 90% is taken up by the "generator for value" vectors. By storing
> this information more cleverly and perhaps also being a bit cleverer about
> memory allocation (we could reduce overhead to close to zero by using an arena
> allocator), it should be possible to reduce overall successor generator memory
> usage by roughly 90% for tasks like this one.
msg6065 (view) Author: florian Date: 2017-01-11.10:41:50
The successor generator currently wastes a lot of memory. In some Logistics tasks
it requires hundreds of MB. We want to reduce the memory overhead in this issue.
History

Date                 User     Action  Args
2017-09-03 18:01:05  malte    set     messages: + msg6516
2017-09-03 17:34:12  florian  set     messages: + msg6515
2017-09-01 16:53:31  malte    set     messages: + msg6495
2017-07-17 09:43:57  jendrik  set     messages: + msg6457
2017-07-15 14:51:12  malte    set     messages: + msg6452
2017-07-15 08:34:30  florian  set     messages: + msg6450
2017-07-14 15:37:40  malte    set     messages: + msg6447
2017-07-14 15:34:49  malte    set     messages: + msg6445
2017-07-14 11:44:39  malte    set     messages: + msg6444
2017-07-13 19:27:42  florian  set     status: reviewing -> resolved; messages: + msg6443
2017-07-10 18:00:47  malte    set     messages: + msg6441
2017-07-10 17:45:00  florian  set     messages: + msg6440
2017-07-10 16:51:36  malte    set     messages: + msg6437
2017-07-10 16:48:28  malte    set     messages: + msg6436
2017-07-10 15:14:32  florian  set     messages: + msg6435
2017-07-08 15:34:11  malte    set     messages: + msg6422
2017-07-08 15:01:46  malte    set     messages: + msg6421
2017-07-07 18:29:55  florian  set     messages: + msg6417
2017-07-05 17:49:00  malte    set     messages: + msg6408
2017-07-04 19:53:24  florian  set     messages: + msg6405
2017-07-04 19:33:03  malte    set     messages: + msg6403
2017-07-04 19:27:40  malte    set     messages: + msg6402
2017-05-02 14:18:07  florian  set     messages: + msg6315
2017-04-28 18:58:26  florian  set     status: chatting -> reviewing; messages: + msg6279
2017-04-28 18:52:58  florian  set     messages: + msg6278
2017-04-27 19:50:30  florian  set     files: + main.cc; messages: + msg6254
2017-04-27 13:25:11  florian  set     messages: + msg6241
2017-04-27 11:46:46  florian  set     messages: + msg6239
2017-04-26 15:00:38  malte    set     messages: + msg6228
2017-04-26 14:56:41  florian  set     assignedto: florian; messages: + msg6227
2017-01-11 20:17:25  malte    set     messages: + msg6085
2017-01-11 20:09:51  haz      set     messages: + msg6084
2017-01-11 19:14:37  malte    set     messages: + msg6079
2017-01-11 19:00:44  haz      set     messages: + msg6078
2017-01-11 18:48:48  malte    set     files: + successor-generator-statistics.patch; messages: + msg6074
2017-01-11 12:21:59  haz      set     nosy: + haz
2017-01-11 10:42:03  florian  set     status: unread -> chatting; messages: + msg6066
2017-01-11 10:41:50  florian  create