Issue214

Title: more compact state representation
Priority: feature
Status: resolved
Superseder: (none)
Nosy List: andrew.coles, florian, gabi, jendrik, malte
Assigned To: malte
Keywords: (none)
Optional summary: (none)

Created on 2011-01-25.12:02:00 by malte, last changed 2014-03-15.16:26:29 by malte.

Files
File name                                           Uploaded                           Type
faster-packed-states.patch                          andrew.coles, 2011-02-15.13:52:35  text/x-patch
only-use-axiom-evaluator-if-there-are-axioms.patch  andrew.coles, 2011-02-20.00:15:36  text/x-diff
packed-states.patch                                 andrew.coles, 2011-02-15.12:37:31  text/x-diff
packer.patch                                        andrew.coles, 2014-03-15.16:12:08  text/x-diff
Messages
msg3064 Author: malte Date: 2014-03-15.16:26:29
I'll move the discussion over to issue348. Whether or not this is worth inlining
depends on how often the code is called, and that's what issue348 is all about.

Thanks, Andrew!
msg3063 Author: andrew.coles Date: 2014-03-15.16:12:08
Patch attached.

(Tidybot looks like it might be a pathological case, so it may well make a
negligible difference on other domains.)
msg3062 Author: malte Date: 2014-03-15.16:02:11
Hmmm, we tried inlining this exact code, and someone (not naming names ;-)) said
it made a negligible difference. Can you attach a patch? Maybe we made a mistake
there, or maybe we looked at the wrong instances.
msg3061 Author: andrew.coles Date: 2014-03-15.15:54:55
Great to see this is now a feature.  Re. the tidybot issue: part of the problem is
that the compiler cannot inline the implementations of get/set at the points in
the code where they are called, as they're in int_packer.cc rather than
int_packer.h.

A quick test on the first 10 pfiles of tidybot, moving the VariableInfo class
and the implementations of set/get from int_packer.cc to int_packer.h, gives the
following total time figures (in seconds):

pfile  issue214-v5-blind   +inlining   Diff
----------------------------------------------
p01    0.36                0.36        0
p02    33.22               30.26       2.96
p03    1.33                1.25        0.08
p04    22.9                20.94       1.96
p05    1190.98             1083.54     107.44
p06    124.98              114.74      10.24
p07    3.14                2.99        0.15
p08    828.61              752.05      76.56
p09    251.66              229.22      22.44
p10    1200.65             1117.71     82.94
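
For illustration, the moved code looks roughly like this. This is only a
sketch: the real VariableInfo has more to it, and the member names here are
guesses.

#include <vector>

// int_packer.h -- sketch: get/set defined in the header so the
// compiler can inline them at every call site.
class IntPacker {
    struct VariableInfo {
        int word_index;     // which packed word holds this variable
        int shift;          // bit offset within that word
        unsigned int mask;  // (1 << bits_needed) - 1, before shifting
    };
    std::vector<VariableInfo> var_infos;
public:
    int get(const unsigned int *buffer, int var) const {
        const VariableInfo &info = var_infos[var];
        return (buffer[info.word_index] >> info.shift) & info.mask;
    }
    void set(unsigned int *buffer, int var, int value) const {
        const VariableInfo &info = var_infos[var];
        unsigned int &word = buffer[info.word_index];
        word = (word & ~(info.mask << info.shift))
             | (static_cast<unsigned int>(value) << info.shift);
    }
};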
msg3060 Author: jendrik Date: 2014-03-15.12:13:44
Merged and pushed. Let's close this one and do the follow-up work in the new 
issues. Thanks to everyone involved, I think we are all glad that this was 
finally integrated.
msg3048 Author: malte Date: 2014-03-14.01:13:19
Ah, channels crossed. :-) (Sorry, I was typing my message before seeing Jendrik's.)
msg3047 Author: malte Date: 2014-03-14.01:12:03
The time score increases because we run out of memory less frequently and
hence solve more problems, and every solved problem adds to the
score. If we had unlimited memory or used a shorter timeout, the old version
would have a better time score than the new one.

The main domains I suggest watching in the follow-up issues are tidybot (as
Florian says), pegsol (any version, but e.g. pegsol-opt11) and psr-large. These
are the ones where the total time increases most in terms of percentage. The
main configurations we ought to watch are blind, h^2, ipdb, h^CG and perhaps
also some of the less critical cases, such as LM-cut and BJOLP.

I had a look at tidybot. It seems to be unusual in several respects. We find
almost no mutexes, presumably because our algorithm is limited to one "counting
variable" and tidybot would require 2. Consequently, there's a huge number of
(almost all binary) variables. There is also a large number of operators, yet
actual branching is low. On problem opt11 #06 (one of the larger ones we can
solve), there are hundreds of variables, tens of thousands of operators, but the
average branching factor in the explored part of the state space is less than 2
if we take into account duplicates and less than 3 if we don't. So I expect that
we spend a lot of time there in the successor generators.

In any case, let's merge this. I'll move the list of interesting configurations
and domains over to issue348.
msg3046 Author: jendrik Date: 2014-03-14.00:52:34
A valgrind+callgrind analysis for blind search on tidybot-opt11:p10.pddl shows 
that almost 15% of the time is spent in IntPacker::get(). These calls almost 
exclusively come from the successor generator, so this might be one place where 
we should consider unpacking the states in the context of issue348.
msg3045 Author: florian Date: 2014-03-13.21:19:34
Tidybot seems to suffer particularly badly in the timing results. Any idea why?
msg3044 Author: jendrik Date: 2014-03-13.21:12:06
I ran an experiment for blind search on opt and sat domains:

http://ai.cs.unibas.ch/_tmp_files/seipp/issue214-issue214-v5-compare.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue214-issue214-v5-sat-compare.html

Coverage only increases per domain. The code gets a little slower on average,
but there are also many cases where it gets faster, and the time scores actually
increase.
msg3042 Author: malte Date: 2014-03-13.14:13:55
We've had another round of code review and made some more changes to (hopefully)
make the code easier to understand. I would advocate a final round of
experiments to make sure we didn't mess up. If this helps reduce the amount of
work, I think it's enough to just test blind search, but we should test on all
domains so that we also have results for the case with lots of axioms.

If experiments show no further problems, this is ready to merge.

There is one outstanding point for which we should open a new issue: the
question of whether, when and where to unpack the state information. We already have
issue348 related to the axiom evaluator that discusses this point, but this
should not be limited to the axiom evaluator, as our results for heuristics that
access state information many times show.

I suggest we open a new issue with this and deal with it soon, either straight
away or after issue420, which should be quick. We can take the information from
msg2994 and msg3004 to identify the most interesting configurations and
benchmarks for this, and msg3001 has some additional considerations regarding
IDA* etc. that might help guide the code design for the new issue.

Or maybe we don't need a new issue but should rather extend the scope of
issue348 to be about this more general aspect. (I think it makes sense to look
at these things together.)
msg3034 Author: florian Date: 2014-03-06.16:12:03
We already have an issue for refactoring the axiom evaluator (issue348). This
just leaves the refactoring of static variables that Malte mentioned in the code
review for this issue.
msg3032 Author: florian Date: 2014-03-06.13:58:28
> The odd bytes_per_state data point from the scatter plot doesn't seem to show
> up in the complete data. Where does it come from?

This is freecell problem 2-3. There are no preprocessing logs or properties for
this task but the correct output file exists. We assume preprocessing run twice
and I probably cancelled the second run after realizing this. The second
(unsuccessful) run then overwrote the log files and properties. The data point
then shows up as unsolved/None in the plot.
msg3004 Author: malte Date: 2014-02-23.18:14:00
OK, after a few more minutes, the HTML file did finish loading barely within the
memory limit. :-)

Overall, I think the results look good and are ready to merge. The odd
bytes_per_state data point from the scatter plot doesn't seem to show up in the
complete data. Where does it come from?

Another lab-related remark: it would be useful if misleading/meaningless data
points for the portfolio and anytime configs weren't included in the tables at
all. They do make it difficult to identify the things that really need checking,
especially when the tables are so tall and wide that one has to scroll in such a
way that the first table row and column aren't visible, so it's hard to see what
a given cell means.

Not unexpectedly, we have some bad slowdowns in configs with fast heuristic
computation and lots of axioms, in particular h^CG + PSR-Large, where some
runtimes roughly double (359 => 649 seconds; 131 => 274 seconds). Unpacking
states should be helpful here.

It would be nice to see quality scores, too. These are usually what we're
ultimately interested in.

All in all, I think this is ready to merge, but it would be great if we could do
some follow-up work to look into the refactoring I mentioned and possible speed
improvements through unpacking states.
msg3003 Author: malte Date: 2014-02-23.17:44:55
Thanks! I would ignore the LAMA memory results; I don't think one can
meaningfully compare memory usage in anytime configurations, and I'm not even
sure what the data reported by lab actually is in this case. In the time
results, you refer to the same config twice. (Again, comparing on LAMA would not
be very meaningful. A quality comparison for LAMA would be interesting, though.)

I don't think I can open the detailed report on my notebook without running out
of memory, so maybe someone else can look at something odd in the
bytes_per_state data: there seems to be a data point where one revision
requires 8 bytes and the other requires roughly 100000. This should not be possible.
msg3002 Author: florian Date: 2014-02-23.17:29:30
The sat experiments are done.

Bytes per state:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_sat_bytes_per_state.png

Memory:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_sat_memory_eager_greedy_cg.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_sat_memory_lazy_greedy_ff.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_sat_memory_seq_sat_lama_2011.png

Time:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_sat_total_time_eager_greedy_cg.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_sat_total_time_lazy_greedy_ff.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_sat_total_time_lazy_greedy_ff.png

Full report (100 MB):
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214-issue214-sat-compare.html
msg3001 Author: malte Date: 2014-02-23.02:00:16
The code looks good to me, and I think the bin-packing algorithm is also fine,
although I didn't check it completely.

I think it's worth considering two changes to the code.

First, it might be useful to refactor the actual int-packing functionality into
a general utility class rather than having it inside the State class. Ideally, a
class should serve only one purpose, and I think our State class now plays more
than one role, which makes it a bit complex.

Second, we should think about whether and in which situations we want to unpack
states. (This is point 8. from my previous list.) I think it's good that we
generate experimental results for the never-unpack case first, but unpacking
might be useful without complicating the code too much.


Thinking about the points above might also help us clarify certain logical
boundaries in the code. For example, if we wanted to implement IDA*, we would
still need State functionality such as looking up the value of a state variable,
dumping a state, or generating the successor state of a given state and an
operator (which isn't actually inside the State class at the moment). But we
probably wouldn't want to pack states. Maybe this can tell us something about
where the packing functionality ought to live.


I think neither of the two points above is pressing, and I wouldn't mind
deferring them to separate issues if that's what the others prefer. This would
leave only the satisficing planning experiments for this issue.
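
To make the first point concrete, the utility class could expose an interface
along these lines (a rough sketch; all names are only suggestions):

#include <vector>

// Standalone packing utility, independent of State: it is given the
// variable ranges once, computes a packed layout, and afterwards only
// translates between variable values and packed words.
class IntPacker {
public:
    explicit IntPacker(const std::vector<int> &ranges);
    int get(const unsigned int *buffer, int var) const;
    void set(unsigned int *buffer, int var, int value) const;
    int get_num_bins() const;  // number of words per packed state
};

State would then just hold a packed buffer and delegate, which might also make
it easier to support use cases like IDA* where we never pack at all.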
msg3000 Author: florian Date: 2014-02-23.00:23:20
> 1. We should perform some experiments for satisficing planning.

The experiment is running. If all goes well, it will be done tomorrow.

> 2. A code review would be good. Where can we find the code?

https://bitbucket.org/flogo/downward-issues-issue214/pull-request/1/packed-states/diff
Jendrik already did a first round of reviewing.

> 3. Which greedy bin-packing algorithm do we currently use?

The patch is currently using First-fit decreasing.  At least I think it does (it
is not implemented by sorting).
Should we switch to Best-fit decreasing?

> I suggest that 4.-7. should be separate issues.

I agree.

> I don't know if we should open an issue for 4.

Makes sense.

> We already discussed 5. some time ago, but I don't think we have an issue for it
> yet. I think it would be good to open one.

issue423

> Regarding 6., I think this should be separate from 5. because it's less clear
> how to implement something like this in the best way. I would suggest opening an
> issue for this so that the idea does not get lost. This would also give Andrew a
> place to attach the code he mentions in the last paragraph of msg1237.

issue424

Concerning iPDB:
> Not having seen the code, the best explanation I can come up with is that we
> currently work directly on the states everywhere, and this affects the time to
> look up a state variable's value. Is this right?

Yes.

> (I don't know how h^2 is implemented.)

I don't know either.

> Looking at the results also brought up two feature requests for lab

I created issues for them in lab's issue tracker
https://bitbucket.org/jendrikseipp/lab/issue/8
https://bitbucket.org/jendrikseipp/lab/issue/9
msg2996 Author: malte Date: 2014-02-22.16:01:43
Looking at the results also brought up two feature requests for lab:

- String fields, such as "config_nick" and "commandline_config" in the "info"
table, should not be right-justified.

- Similar to the "coverage" tally, it would be nice to have tallies of how many
tasks timed out, how many ran out of memory, how many were proved unsolvable and
whatever other categories make sense. (For experiments like this one, it would
be interesting to see how many cases are left for certain configurations where
we run out of memory.) More generally, we might want to consider how to best
support categorical attributes like our "exit code" attribute.
msg2995 Author: jendrik Date: 2014-02-22.16:01:32
I think your explanation makes sense.
msg2994 Author: malte Date: 2014-02-22.15:57:29
After looking at the results in more detail again, we might want to add:

8. Look at how and when to unpack states.

One thing that is a bit surprising about the results is that certain slow
heuristics (h^2, ipdb) are (slightly) adversely affected in terms of speed,
while blind search isn't. LM-Cut is also doing fine in terms of speed.

Not having seen the code, the best explanation I can come up with is that we
currently work directly on the states everywhere, and this affects the time to
look up a state variable's value. Is this right? This would help explain what
goes on in the IPDB heuristic, which may need to look at some state variables
many times, while LM-Cut only looks at each state variable once to initialize
its internal representation. (I don't know how h^2 is implemented.)
msg2993 Author: malte Date: 2014-02-22.15:39:43
All results look excellent. (I still have to go over the big results table for a
second time to look for oddities.) The bytes_per_state scatter plot is
particularly nice.

Like Florian, I went over all comments for this issue again to consolidate our
TODO list. Here's what I came up with (including 1-2 new ones):

1. We should perform some experiments for satisficing planning.

2. A code review would be good. Where can we find the code?

3. Which greedy bin-packing algorithm do we currently use? Since this is an
offline version of bin-packing (we know all the state variables and their sizes
before we have to start making decisions), the best simple algorithms are
First-fit decreasing and Best-fit decreasing. With a small integer bin size,
these are both quite simple to implement efficiently, and I suggest we go
with best-fit decreasing (see the sketch at the end of this message).

4. We can look into memory savings in other parts of the planner (such as the
open lists).

5. We can look into storing a smaller set of variables for each state. An
obvious space improvement would be to avoid storing the values of derived variables.

6. Taking the previous point further, as mentioned by Andrew in msg1237, we
could avoid storing variables whose values are implied by others through mutex
reasoning. Patrik's planner does something like this; he calls the identifying
variables a "spanning set" of variables. As far as I recall, beyond space
reduction, he could also use this information to make better decisions in
pattern selection and/or heuristic computation (identifying additive sets) in
his iPDB code.

7. We should not waste time in the axiom evaluation code when there are no axioms.

I suggest that 4.-7. should be separate issues.

I don't know if we should open an issue for 4. since it's more of an overall
goal than a concrete suggestion. So I would suggest that we only open an
issue for this if someone plans to work on it.

We already discussed 5. some time ago, but I don't think we have an issue for it
yet. I think it would be good to open one.

Regarding 6., I think this should be separate from 5. because it's less clear
how to implement something like this in the best way. I would suggest opening an
issue for this so that the idea does not get lost. This would also give Andrew a
place to attach the code he mentions in the last paragraph of msg1237.

Regarding 7., this is already on the tracker as issue420.
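
Regarding 3., here is a sketch of best-fit decreasing; bits[var] stands for
the number of bits variable var needs, and all names are made up:

#include <algorithm>
#include <vector>

// Best-fit decreasing: returns, for each variable, the index of the
// word that stores it. Assumes bits[var] <= bits_per_word everywhere.
std::vector<int> best_fit_decreasing(const std::vector<int> &bits,
                                     int bits_per_word) {
    int num_vars = bits.size();
    std::vector<int> order(num_vars);
    for (int var = 0; var < num_vars; ++var)
        order[var] = var;
    // Consider variables in order of decreasing bit requirement.
    std::sort(order.begin(), order.end(),
              [&](int a, int b) {return bits[a] > bits[b];});

    std::vector<int> assignment(num_vars, -1);
    std::vector<int> free_bits;  // remaining capacity of each open word
    for (int var : order) {
        // Best fit: the open word with the least remaining capacity
        // that still has room for this variable.
        int best = -1;
        for (size_t w = 0; w < free_bits.size(); ++w) {
            if (free_bits[w] >= bits[var] &&
                (best == -1 || free_bits[w] < free_bits[best]))
                best = w;
        }
        if (best == -1) {  // nothing fits: open a new word
            best = free_bits.size();
            free_bits.push_back(bits_per_word);
        }
        assignment[var] = best;
        free_bits[best] -= bits[var];
    }
    return assignment;
}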
msg2991 Author: florian Date: 2014-02-22.15:04:23
The links for Merge-and-Shrink scatter plots in msg2986 are wrong. They should be

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_total_time_astar_merge_and_shrink_bisim.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_memory_astar_merge_and_shrink_bisim.png

Thanks Jendrik.
msg2988 Author: malte Date: 2014-02-22.00:51:48
Great work! We need to run a test on our satisficing planning test suite, too.
There are some additional "unusual" domains there where it would be interesting
to see the results, such as psr-large.

> Finally, patch i) from msg1242 seems to solve a problem we recently discussed.
> Is there already an issue for this or is this already merged?

This is issue420, and it would be great if someone could assign it to
themselves. :-)
msg2987 Author: andrew.coles Date: 2014-02-22.00:28:17
Great to see some progress here.  I suspect your 'simpler' approach will get
most of the benefit - if I recall, a lot of it came from storing binary
variables in one bit.

Re. a custom memory allocator for the open list: if you're referring to the
standard scalar open list, and this still uses a std::deque, there's not a lot
one can do to improve on this.  Internally, deques allocate storage space for
several entries at once - however many fit on a page.  This contrasts with the
other STL containers one could use, e.g. a list, which allocate space for each
entry individually.

If there is anything else left which you need to allocate a lot of, then take a
look at the boost pool library: "Pools are generally used when there is a lot of
allocation and deallocation of small objects":

http://www.boost.org/doc/libs/1_55_0/libs/pool/doc/html/boost_pool/pool/introduction.html
msg2986 Author: florian Date: 2014-02-22.00:12:48
This has been lying dormant for some time now. Jendrik and I started working on
this. We based our implementation on Andrew's patch but did not directly use it.
The state representation changed, so it was no longer applicable, and we wanted
to start with something a bit simpler (i.e., without splitting values across
integer boundaries).

Differences between our implementation and Andrew's:
We use a word (int) as our basic storage type. To determine which variable is
stored where, we use greedy bin-packing and select the variable with the largest
domain that still fits in the current word. If all variables are too large, we
start a new word.
We do not yet unpack the state at the start of successor generation (msg1239) or
axiom evaluation (msg1242) but we do calculate the hash function directly on the
packed data (msg1239).
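
In sketch form (helper names invented here, and assuming every variable fits
into a single word on its own), the packing works roughly like this:

#include <map>
#include <vector>

// Greedy packing as described above: keep one "current" word, always
// take the largest remaining variable that still fits into it, and
// open a new word once nothing fits anymore.
std::vector<int> pack_variables(const std::vector<int> &bits,
                                int bits_per_word) {
    std::multimap<int, int> remaining;  // bits needed -> variable
    for (size_t var = 0; var < bits.size(); ++var)
        remaining.insert(std::make_pair(bits[var], static_cast<int>(var)));

    std::vector<int> assignment(bits.size(), -1);
    int word = 0;
    int free_bits = bits_per_word;
    while (!remaining.empty()) {
        // Largest bit requirement that still fits into the current word.
        std::multimap<int, int>::iterator it = remaining.upper_bound(free_bits);
        if (it == remaining.begin()) {  // nothing fits: start a new word
            ++word;
            free_bits = bits_per_word;
            continue;
        }
        --it;
        assignment[it->second] = word;
        free_bits -= it->first;
        remaining.erase(it);
    }
    return assignment;
}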

The results look quite good: coverage increases overall, though there are some
tasks that are no longer solved. Obviously, memory usage decreases; in some
cases we save 90% of memory per state. As Andrew already said, this
unfortunately does not mean that we also save 90% of total memory. We do not get
a coverage increase of 17 on the lmcut domains (but the way I read it, this
result also included other patches).

There are some other ideas in this discussion (some of them were already done as
part of other issues):
The custom allocator that allocates memory in bigger chunks is the main part of
the new state representation (although not implemented as an allocator).
Using only state_var_t* in the open lists was done but changed to StateIDs
recently. We do not yet have a custom allocator for open list entries; maybe we
should look into this, but I guess it should be a new issue.
Finally, patch i) from msg1242 seems to solve a problem we recently discussed.
Is there already an issue for this or is this already merged?

Reports
------------------------

Size of each state in bytes:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_bytes_per_state.png

Memory usage for some configs:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_memory_astar_blind.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_memory_astar_lmcut.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_memory_astar_merge_and_shrink.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_memory_astar_ipdb.png

Total time for these configs:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_total_time_astar_blind.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_total_time_astar_lmcut.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_total_time_astar_merge_and_shrink.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214_total_time_astar_ipdb.png

Full report (60 MB):
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue214-issue214-compare.html
msg2000 Author: malte Date: 2011-12-20.13:19:43
See also issue309.
msg1242 Author: andrew.coles Date: 2011-02-20.00:15:36
Okay, no more memory savings, but it's now faster on average to use the packed
state representation than not :).  This was achieved with two changes:

i) Only use the axiom evaluator if there are axioms in the domain.  In the
interests of fairness, I've applied this to my control (i.e. otherwise
unpatched) copy of the planner too, and the average time taken to find plans is
now 69.3% of what it was previously.  Patch attached.

ii) In AxiomEvaluator::evaluate, unpack the state at the start of the function,
then re-pack it again at the end.

The number of problems solved with these changes is the same as before - the
planner still runs out of memory long before it runs out of time.  But, on average, excluding
run times < 1 second, the run time of the planner with the memory usage patches
is 76.3% of the planner without.  Best case is 22.6%, worst case is 117.9%.  A
scatter-plot of memory-usage-ratio plotted against runtime-ratio is at:

http://personal.cis.strath.ac.uk/~ac/memoryruntimescatterplot.png
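
In sketch form, with made-up names standing in for the real interface, the
evaluator now behaves like this:

#include <vector>

// Toy sketch of changes (i) and (ii); none of these names match the
// actual planner code.
struct PackedState { std::vector<unsigned int> words; };

std::vector<int> unpack(const PackedState &state);           // one int per variable
void repack(const std::vector<int> &values, PackedState &state);
void run_axiom_layers(std::vector<int> &values);             // existing logic

void evaluate(PackedState &state, int num_axioms) {
    if (num_axioms == 0)  // (i): skip the evaluator in axiom-free domains
        return;
    std::vector<int> values = unpack(state);  // (ii): unpack once up front,
    run_axiom_layers(values);                 // work on plain ints throughout,
    repack(values, state);                    // and re-pack once at the end.
}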
msg1241 Author: andrew.coles Date: 2011-02-18.12:19:10
...and another new allocator, this time used for the search space hash table
(allocating objects of type __gnu_cxx::_Hashtable_node<std::pair<StateProxy
const, SearchNodeInfo> >).

Using blind search and running with a 4GB memory limit (but then censoring the
results afterwards at either 1GB, 2GB or 4GB), the number of problems solved is:

Memory limit   Downward   Downward + patches
1GB            268        282
2GB            276        293
4GB            289        310

(Domains used were those in 'suite_lmcut_domains()' from
new-scripts/downward_suites.py, run through the latest translator and
preprocessor, and using the same 'output' file with both configurations.)

Overheads with the new allocator are a little higher than without - (geometric)
mean of 5.4% higher run-times across problems that took longer than 1 second to
solve.

I have a few more ideas, the most obvious of which is changing the HashTable to
use state_var_t* directly rather than StateProxy (whilst still respecting the
need to do something equivalent to make_permanent upon insertion).  I'm also
looking over the open lists (custom allocator for std::deque nodes, at least
when doing blind search).
msg1240 Author: andrew.coles Date: 2011-02-16.01:00:03
Just a brief update: I quickly wrote a new allocator for state-sized chunks of
memory (keep a stack of available state-sized chunks; when this stack becomes
empty, push another megabyte's worth onto it).  It helps.  I'll find out
tomorrow exactly what the benefit of that on its own is, compared to state
packing without it (and similarly, no state packing, but just the custom allocator).

For now, though, it's getting late, so, comparing the packed-states +
custom-allocator binary to that from the sources in the SVN, on a collection of
benchmark domains:

- 21 more problems were solved within 30 minutes/2GB.
- Excluding problems taking less than a second to solve, the average time taken
to find a plan increased by 1.3%.  SD 10%, worst case 17.2%, best case -43%
(i.e. sometimes faster).
msg1239 Author: andrew.coles Date: 2011-02-15.13:52:35
Okay, I've made a couple of changes to the patch that seem to alleviate most of
the time overhead.  New patch attached.  The two changes:

i) Unpacking the state at the start of successor generation, rather than
hammering operator[] all the way down the tree.
ii) Making the hash function work on the packed state variables directly, rather
than using operator[] to get the unpacked values.

A quick test (a harder driverlog problem, on a faster machine):

Unpatched             : Total time: 6.49s, Peak memory: 89756 KB
Old patch             : Total time: 7.97s, Peak memory: 69516 KB
Old patch + (i)       : Total time: 7.88s, Peak memory: 69516 KB
New patch (with both) : Total time: 7.03s, Peak memory: 69516 KB

So generalising from one problem file (in exactly the way one shouldn't ;) ) the
time overheads are now 8% rather than 22%.

I'll fire it at a bunch more domains and problems - downward didn't report
memory usage last time I was looking into this, so it's much easier now to see
what's going on.
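
For reference, change (ii) boils down to a loop over the packed words. A
word-at-a-time FNV-style mix would look something like this (the constants are
the standard 64-bit FNV ones, not necessarily what the patch uses):

#include <cstddef>

std::size_t hash_packed(const unsigned int *words, int num_words) {
    std::size_t hash = 14695981039346656037ULL;  // FNV-1a offset basis
    for (int i = 0; i < num_words; ++i) {
        hash ^= words[i];
        hash *= 1099511628211ULL;  // FNV-1a prime
    }
    return hash;
}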
msg1238 Author: malte Date: 2011-02-15.12:48:43
Wow! :-)

I agree that reducing other overheads will be important too, and I had similar
thoughts as you regarding that. The time overhead is a bit worrying, but maybe
we don't need to worry about blind search too much (then again, overheads do
tend to add up...).

I hope that there are instances (larger ones) where your patch would be a
clearer win. The memory advantage should grow with the number of state
variables, since memory-allocation-related overhead should be more or less constant.

In any case, I'd like to implement a solution like this at some point, even if
it's only to get rid of the current mess with having three different executables
for different state_var_t sizes. Will take me a while to get to it though; there
are a few higher priority items such as cleaning up the current m&s mess.
msg1237 Author: andrew.coles Date: 2011-02-15.12:37:31
Interesting you should mention this - I looked into this a year or two ago and
had a couple of ideas for resolving the main unresolved issues.  The attached
patch (against the trunk SVN search) will do what you propose.  The results,
sadly, aren't as impressive as one might hope.  It starts off promising - on
driverlog pfile 3 it reports:

Unpacked size of each state: 22 bytes
Packed size of each state: 4 bytes

...i.e. it's now using 2 unsigned shorts, instead of 11 unsigned shorts, which is
pretty neat.  But, this doesn't translate into anywhere near a 5.5-fold
reduction in memory usage.  Taking the unpatched downward sources from SVN, at
the end of a blind A* search run on this problem, it reports:

Search time: 14.85s
Total time: 14.85s
Peak memory: 89316 KB

...whereas with the attached patch, this becomes:

Search time: 17.79s
Total time: 17.79s
Peak memory: 71056 KB

So slower, but without much memory gain.  The main underlying issue is that the
state data itself is a relatively small proportion of the overheads incurred in
storing a closed list.  My latest thinking is that the solution to this, more of
an engineering problem than a science problem, involves:

1) A state hash data structure with lower overheads
2) A custom array allocator, returning state-sized chunks of memory from a
larger allocated block:
 - use new[] to get blocks of a few megabytes, return sequential addresses in
this space
 - exploit the fact that closed lists are monotonically expanding, so
deallocation is a non-issue for states we want to keep

If that works, i.e. if the reduction in performance at least translates into a
reasonable reduction in memory usage, then I have another chunk of code
implemented that uses the mutex data from all.groups to allow some variables to
be excluded from the state entirely, inferring their values as needed (e.g.
whether a square is clear in sokoban, whether a truck is empty in driverlog)
from the other variables.  I'm happy to share what I have if anyone's interested
in this.
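
For 2), the allocator I have in mind is essentially the following (a sketch;
the block size and all names are arbitrary):

#include <cstddef>
#include <vector>

// Hand out state-sized chunks from big blocks allocated with new[].
// Since the closed list only grows, chunks are never returned
// individually; everything is freed at once on destruction.
class StateChunkAllocator {
    static const std::size_t BLOCK_BYTES = 1 << 20;  // one megabyte per block
    std::size_t chunk_bytes;    // size of one packed state
    std::size_t used_in_block;  // bytes handed out of the current block
    std::vector<char *> blocks;
public:
    explicit StateChunkAllocator(std::size_t state_bytes)
        : chunk_bytes(state_bytes), used_in_block(BLOCK_BYTES) {}
    ~StateChunkAllocator() {
        for (std::size_t i = 0; i < blocks.size(); ++i)
            delete[] blocks[i];
    }
    void *allocate() {
        if (used_in_block + chunk_bytes > BLOCK_BYTES) {
            blocks.push_back(new char[BLOCK_BYTES]);  // grab another megabyte
            used_in_block = 0;
        }
        void *chunk = blocks.back() + used_in_block;
        used_in_block += chunk_bytes;
        return chunk;
    }
};

Because the closed list only grows, allocate() never needs to hand memory back
individually; everything is released in one go at the end.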
msg1225 Author: malte Date: 2011-01-25.12:01:59
To get rid of the current state_var_t mess, we could use a compact
implementation where the state is a vector of int32s, int64s or whatever, and
each state variable does not correspond to a whole entry but to a subset of bits
that depends on the required range. For example, ten state variables that require
8, 6, 5, 7, 1, 1, 1, 1, 1 and 1 bits could be squeezed into a single 32-bit word.
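
A first sketch of the layout computation, assuming each variable needs fewer
than 32 bits and we process variables in the given order:

#include <cstddef>
#include <vector>

struct Slot {
    int word;           // index of the 32-bit word storing the variable
    int shift;          // bit offset within that word
    unsigned int mask;  // (1 << bits_needed) - 1
};

// Assign each variable a word, shift and mask; a variable that does
// not fit into the rest of the current word starts a new one.
std::vector<Slot> compute_layout(const std::vector<int> &bits) {
    std::vector<Slot> slots;
    int word = 0;
    int shift = 0;
    for (std::size_t var = 0; var < bits.size(); ++var) {
        int b = bits[var];
        if (shift + b > 32) {  // no room left: move to the next word
            ++word;
            shift = 0;
        }
        Slot slot = {word, shift, (1u << b) - 1};
        slots.push_back(slot);
        shift += b;
    }
    return slots;
}

With the example above (8, 6, 5, 7, 1, 1, 1, 1, 1 and 1 bits), all ten
variables end up in word 0 and fill its 32 bits exactly.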
History
Date                 User          Action
2014-03-15 16:26:29  malte         set status: chatting -> resolved; messages: + msg3064
2014-03-15 16:12:08  andrew.coles  set files: + packer.patch; messages: + msg3063
2014-03-15 16:02:11  malte         set messages: + msg3062
2014-03-15 15:54:55  andrew.coles  set status: resolved -> chatting; messages: + msg3061
2014-03-15 12:13:44  jendrik       set status: chatting -> resolved; messages: + msg3060
2014-03-14 01:13:19  malte         set messages: + msg3048
2014-03-14 01:12:03  malte         set messages: + msg3047
2014-03-14 00:52:34  jendrik       set messages: + msg3046
2014-03-13 21:19:34  florian       set messages: + msg3045
2014-03-13 21:12:06  jendrik       set messages: + msg3044
2014-03-13 14:13:55  malte         set messages: + msg3042
2014-03-06 16:12:03  florian       set messages: + msg3034
2014-03-06 13:58:28  florian       set messages: + msg3032
2014-02-23 18:14:00  malte         set messages: + msg3004
2014-02-23 17:44:55  malte         set messages: + msg3003
2014-02-23 17:29:31  florian       set messages: + msg3002
2014-02-23 02:00:16  malte         set messages: + msg3001
2014-02-23 00:23:20  florian       set messages: + msg3000
2014-02-22 16:01:43  malte         set messages: + msg2996
2014-02-22 16:01:32  jendrik       set messages: + msg2995
2014-02-22 15:57:29  malte         set messages: + msg2994
2014-02-22 15:39:43  malte         set messages: + msg2993
2014-02-22 15:04:23  florian       set messages: + msg2991
2014-02-22 00:51:48  malte         set messages: + msg2988
2014-02-22 00:28:17  andrew.coles  set messages: + msg2987
2014-02-22 00:12:48  florian       set nosy: + florian; messages: + msg2986
2013-08-22 10:03:42  jendrik       set nosy: + jendrik
2011-12-20 13:19:43  malte         set messages: + msg2000
2011-03-07 17:34:17  malte         set nosy: + gabi
2011-02-20 00:15:36  andrew.coles  set files: + only-use-axiom-evaluator-if-there-are-axioms.patch; messages: + msg1242
2011-02-18 12:19:10  andrew.coles  set messages: + msg1241
2011-02-16 01:00:03  andrew.coles  set messages: + msg1240
2011-02-15 13:52:35  andrew.coles  set files: + faster-packed-states.patch; messages: + msg1239
2011-02-15 12:48:44  malte         set messages: + msg1238
2011-02-15 12:37:31  andrew.coles  set files: + packed-states.patch; nosy: + andrew.coles; messages: + msg1237
2011-01-25 12:02:06  malte         set status: unread -> chatting
2011-01-25 12:02:00  malte         create