Issue348

Title Rethink how and when to unpack states
Priority wish Status chatting
Superseder Nosy List andrew.coles, florian, gabi, jendrik, johannes, malte
Assigned To florian Keywords
Optional summary
TODOs (incomplete):

1. Check that goalcount heuristic benefits from the changes in this issue 
(converting GlobalStates to States made it slower, see issue554).

Created on 2012-08-30.13:28:12 by florian, last changed by jendrik.

Summary
TODOs (incomplete):

1. Check that goalcount heuristic benefits from the changes in this issue 
(converting GlobalStates to States made it slower, see issue554).
Files
File name Uploaded Type Edit Remove
callgrind.airport.p09.default florian, 2014-10-08.17:34:34 application/octet-stream
callgrind.airport.p09.patched florian, 2014-10-08.17:34:50 application/octet-stream
Messages
msg4337 (view) Author: florian Date: 2015-07-07.22:20:58
@Gabi: I'm adding you to the nosy list because you are working on issue545 and
this might be relevant. Feel free to undo this.

This issue has been lying around for a while again. In the meantime the
situation has changed once again. We currently have two state classes:

There is GlobalState which has an id, is used in the search, by StateRegistry,
and PerStateInformation but does not know anything about the new task interface.
GlobalState objects are immutable and hold a pointer to their data (stored in a
StateRegistry). Each access to variable values is packed/unpacked on-the-fly
individually. In the long run, we want the global task to disappear, which means
we also hav to get rid of GlobalStates.

Then there is the class State which contains unpacked state data. State objects
know about the task interface and can generate FactProxy objects for it. Access
to variable values is a direct array access, but copying States is expensive.
States can currently be created from GlobalStates by the intermediary helper
method convert_global_state. States are currently used within heuristics (those
that already are updated to the new task interface). Operator application is
implemented for them, but does not support axioms.

The problem of packing/unpacking currently exists for both state classes:
GlobalState is always packed, which makes evaluating axioms slow when generating
successors in the search algorithms. State is always unpacked which is a problem
for heuristics like the blind heuristic or PDBs that only access very few facts.
For example, if we currently generate a successor and then evaluate a pdb
heuristic for it, the following happens:

1. StateRegistry::get_successor_state reserves space for packed state data and
copies the the current packed state data there.
2. It then loops through the effects of the operator and checks for each one, if
it fires. To do that, it unpacks each value individually. The effects are then
set in the new packed state individually as well.
3. The AxiomEvaluator is called on the new packed state data and again accesses
each value individually (read and write access).
4. A GlobalState object is created that points to the new packed state data.
5. A State object is created by accessing/unpacking each value individually and
storing the result in an int array.
6. The PDB accesses the values for variables in its pattern.

This is inefficient mostly in step 3. (evaluating axioms would be faster on
unpacked states) and step 5./6. (The PDB only needs a few values and should not
unpack the state).

Malte mentioned in msg4326:
> One possibility is to replace the current concrete State class with
> an abstract interface with unpacking and non-unpacking implementations.

Gabi currently also works on issue545 which will change the state class(es) once
again.
msg3741 (view) Author: malte Date: 2014-10-09.10:48:52
Sure, sounds good.
msg3728 (view) Author: florian Date: 2014-10-08.20:54:46
What do you suggest as a next step? You pointed out a problem in the code review
which accounts for some of the lost runtime, but it cannot account for the whole
difference. Should we have a more detailed look at the profile together?
msg3721 (view) Author: malte Date: 2014-10-08.17:40:59
It makes a certain amount of sense that this causes overhead, but it's
surprising that it's so large and that it varies so much from task to task
(little difference for some tasks, more than five times the runtime for others).
Perhaps there's something special about these airport tasks, such as a high rate
of duplicates.
msg3720 (view) Author: florian Date: 2014-10-08.17:34:34
I ran callgrind on airport/p09 but did not have a close look at the results. On
first glance it looks like pack/unpack just takes too much time. With the blind
heuristic, we unpack every value even though we do not look it during heuristic
calculation.
msg3717 (view) Author: florian Date: 2014-10-08.13:04:42
It might take me a while to get to this but I'll look at the profile for the
Airport tasks.

The code is in my bitbucket repository. Merging in the new default should have
updated the existing pull request, so it now shows the diff of the versions we
compared:
https://bitbucket.org/flogo/downward-issues-issue348/pull-request/1/always-unpack-state/diff
msg3713 (view) Author: malte Date: 2014-10-08.12:02:03
The slowdown in, for example, some Airport tasks with blind search is quite
large. Perhaps you can profile one of these? I'd be interested in having a look
at the code, ideally a comparison of the two versions compared in the
experiment. Can you set up a code review for this delta?
msg3700 (view) Author: florian Date: 2014-10-07.21:25:08
Results for the optimal suite are in (40MB):

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-v3-base-issue348-v3-compare.html

Coverage and memory look good, but everything seems to run a little slower, and
in some cases a lot slower. In the last reports we also compared to the code
before issue214 (packed states), this is not done here.

What do think would be a good next step (more reports, experiments for sat,
profiling, ...)?
msg3670 (view) Author: florian Date: 2014-10-06.12:30:49
I will merge in the new code and re-run the experiments.
msg3638 (view) Author: malte Date: 2014-10-04.19:49:35
It would be good to make progress on this issue -- I think it's an important one.

Can I do anything to help get this one moving again? Also, perhaps we should run
some new experiments that give before/after numbers against the newest code
version? The original experiments are now 4-5 months old, we've had many changes
since then.
msg3407 (view) Author: johannes Date: 2014-09-03.13:23:02
This issue has implications for my numeric fast downward extension too. I
currently tag variables in preprocess with "constant" (contains the value of a
primitive numeric expression such as road length) "derived" (result of a numeric
axiom) "instrumentation" (variable never occurs in preconditions -> only
relevant to compute metrics) and "state-relevant".

The state-relevant variables have to be packed, while states that only differ in
the instrumentation variables can be considered the same state. Constants can be
stored globally once (they never change) and the handling of derived variables
has multiple possible solutions. Following florian the current solution would be
to evaluate all axioms when unpacking the state and not store them in the packed
buffer (and also do not evaluate them on the fly).
msg3188 (view) Author: florian Date: 2014-05-23.16:55:50
We still have an open issue in the code, maybe you could have a look. Its the
TODO in state_registry.cc: the question is whether we can avoid to copy the
state data by reserving uninitialized memory in the pool and then pack the state
into this memory.
(https://bitbucket.org/flogo/downward-issues-issue348/pull-request/1)

Apart from that, we wanted to have a closer look (i.e. profile) at iPDB. It
seems to be the config where the state packing has the largest negative effect.
msg3187 (view) Author: malte Date: 2014-05-23.15:55:06
Thanks! What's the next step?
msg3185 (view) Author: florian Date: 2014-05-23.15:22:07
I ran full experiments after Jendriks review for optimal domains:

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-v2-opt.html

The full results for satisficing domains are for the code version before the
review, but this should be fine, because we mostly changed minor things in the
review

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-v1full-sat.html

Both reports show a lot of failed merge and shrink configs because the
configuration file in lab was out of date.

Finally, I tried to create a report that only shows runs with a significant
change in total time. It compares the code before issue214 to the code after
this issue. It excludes a run if it wasn't solved in both configs, or it was
solved in both configs in under 0.5 seconds. The remaining runs are only
included if their runtime difference is at least 20%.

http://ai.cs.unibas.ch/_tmp_files/pommeren/notablechangesbyrevisionreport.html#total_time
msg3180 (view) Author: florian Date: 2014-05-21.17:53:47
Argh, yes! The reports had the same name and the second one replaced the first.
Looked ok in my browser because I checked the first before uploading the second.
Let me try again:

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-v1-opt.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-v1-sat.html
msg3179 (view) Author: malte Date: 2014-05-21.17:49:49
Numbers look good! Did you want to give two different experiment links?
msg3178 (view) Author: florian Date: 2014-05-21.17:42:44
I did a first round of experiments for unpacking in every state and invited
Jendrik to the following pull request:

https://bitbucket.org/flogo/downward-issues-issue348/pull-request/1

If anyone else wants to have a look, let me know.

The experiments only test the domains and configs Malte suggested in msg3049. We
should run a larger set of benchmarks before the final merge.

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-base-issue348-v1-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue348-base-issue348-v1-compare.html
msg3092 (view) Author: florian Date: 2014-03-28.12:00:38
Some notes from our last meeting:

As a first step, we want to completely unpack the state everywhere outside the
state registry, i.e., only pack it for storage.

As I said offline, I'll finish some other things first and will then work on
this issue if it is still open. I'll leave the issue unassigned until I actually
have time to work on it in case anyone else wants to work on it in the meantime.
msg3082 (view) Author: malte Date: 2014-03-24.13:32:26
Any volunteers to lead this? I'm happy to give advice regarding the design etc.
msg3065 (view) Author: malte Date: 2014-03-15.16:31:52
Andrew reports (in msg3061 and msg3063 of issue214) a nice speed improvement for
blind search in the tidybot domain by inlining the get/set methods for the
packed states. See msg3061 for details.

We should consider the question of inlining or not inlining in the context of
this issue, since

1) inlining speeds up access to packed states, making unpacking less necessary
2) inlining might be less beneficial if we unpack states in one go, since there
will be fewer calls to get/set.

My suspicion regarding 1) is that we will want to unpack even if we inline calls
to get/set. Regarding 2), we should measure and look at particularly bad
domains/configurations.
msg3049 (view) Author: malte Date: 2014-03-14.01:33:48
For these and the related issues on the state space, I suggest we identify a set
of domains and configurations that are particularly worth watching, and limit
our initial experiments to this smaller subset. (Or, if you prefer to run all
experiments on all benchmarks and configs, that's also fine, but I'd like to
additionally have result tables that only talk about the "interesting subset".)

I find it very hard to find problem cases in the 100 MB tables, since the
summaries don't nearly tell the whole story. For example, we know that for this
issue we mainly need to watch total time, and the total time summaries are
rather useless with a large set of configs because the set of commonly solved
instances becomes tiny. I would never have found h^CG + PSR-Large as a problem
case if I hadn't been looking for it specifically in the detailed tables.

Here are my candidates for configs to watch:
- A* + blind
- A* + h^2
- A* + ipdb
- eager_greedy + h^CG (presumably with preferred ops? whatever the config in our
satisficing experiment for issue214 was)
- LM-Cut
- BJOLP

(The last two are not really problem cases, but it's good to have something not
expected to be problematic to compare against.)

Here are my candidates for domains:
- tidybot (e.g. the opt variant)
- pegsol (e.g. opt-11, but I expect pegsol-2008 should work equally well)
- psr-large

There may be other critical cases I am missing. Ideally, I'd be interested in a
list of cases (config, instance) where total time is at least 0.5 seconds for
the old code and increases by at least 20% in the new code in the experiments of
issue214. Can someone prepare this, or if not, can someone make the properties
files for the two main experiments (opt and sat) from this issue available?
(Actually, it'd be nice to have the properties files either way.)
msg3043 (view) Author: jendrik Date: 2014-03-13.17:02:57
Rephrased title to make this issue more general. Originally this was only about the axiom evaluator.

Quoting Malte from issue214:

"""
There is one outstanding point ...: the
question if, when and where to unpack the state information. We already have
issue348 related to the axiom evaluator that discusses this point, but this
should not be limited to the axiom evaluator, as our results for heuristics that
access state information many times show.

I suggest we [...] deal with it soon, either straight
away or after issue420, which should be quick. We can take the information from
msg2994 and msg3004 to identify the most interesting configurations and
benchmarks for this, and msg3001 has some additional considerations regarding
IDA* etc. that might help guide the code design for the new issue.
"""
msg3041 (view) Author: jendrik Date: 2014-03-13.01:10:08
Other related things we should take care of:

1) The initial state representation contains unevaluated axiom values.

2) Update the comment above the declaration of g_initial_state_data in globals.h 
once 1) is done.
msg3033 (view) Author: florian Date: 2014-03-06.16:10:08
After we finished issu214 and issue420, we should try to refactor the axiom
evaluator. States should be unpacked before axioms are evaluated. This would
mean that the axiom evaluator works on unpacked state data and we could move the
typedef for PackedStateEntry into the state class and make it private. We should
then also make sure that axioms are evaluated on the global variable holding the
initial state data. Currently this is done only when the initial state object is
created by the state registry, which is confusing because the global variable
name is not what it seems.
msg2327 (view) Author: florian Date: 2012-08-30.13:28:12
From Malte's comment in Rietveld:
> The way that the axiom evaluator accesses the state at the moment will
probably need some additional thought at some point.

At the moment the evaluator accesses the State class directly and changes the
state variables of it. In the patch for issue344 this is changed to a solution
where the evaluation works directly on the state_var_t array and is called in
the constructor for successor states and in the named constructor for the
initial state.
History
Date User Action Args
2015-10-29 14:02:12jendriksetsummary: TODOs (incomplete): 1. Check that goalcount heuristic benefits from the changes in this issue (converting GlobalStates to States made it slower, see issue554).
2015-07-07 22:20:58floriansetnosy: + gabi
messages: + msg4337
2014-10-09 10:48:52maltesetmessages: + msg3741
2014-10-08 20:54:46floriansetmessages: + msg3728
2014-10-08 17:40:59maltesetmessages: + msg3721
2014-10-08 17:34:50floriansetfiles: + callgrind.airport.p09.patched
2014-10-08 17:34:34floriansetfiles: + callgrind.airport.p09.default
messages: + msg3720
2014-10-08 13:04:42floriansetmessages: + msg3717
2014-10-08 12:02:03maltesetmessages: + msg3713
2014-10-07 21:25:09floriansetmessages: + msg3700
2014-10-06 12:30:49floriansetmessages: + msg3670
2014-10-04 19:49:35maltesetmessages: + msg3638
2014-09-03 13:23:02johannessetnosy: + johannes
messages: + msg3407
2014-05-23 16:55:50floriansetmessages: + msg3188
2014-05-23 15:55:06maltesetmessages: + msg3187
2014-05-23 15:22:07floriansetmessages: + msg3185
2014-05-21 17:53:47floriansetmessages: + msg3180
2014-05-21 17:49:49maltesetmessages: + msg3179
2014-05-21 17:42:44floriansetmessages: + msg3178
2014-05-21 14:46:09floriansetassignedto: florian
2014-03-28 12:00:38floriansetmessages: + msg3092
2014-03-24 13:32:26maltesetmessages: + msg3082
2014-03-15 16:31:52maltesetmessages: + msg3065
2014-03-15 16:27:05andrew.colessetnosy: + andrew.coles
2014-03-14 01:33:48maltesetmessages: + msg3049
2014-03-13 17:02:57jendriksetmessages: + msg3043
title: Rethink the way that the axiom evaluator accesses the state -> Rethink how and when to unpack states
2014-03-13 01:10:08jendriksetnosy: + jendrik
messages: + msg3041
2014-03-06 16:10:08floriansetstatus: unread -> chatting
messages: + msg3033
2012-08-30 13:28:12floriancreate