Issue688

Title: replace GlobalOperator* by operator ids in successor generator and for preferred operators
Priority: feature
Status: resolved
Superseder: (none)
Nosy List: florian, haz, jendrik, malte
Assigned To: florian
Keywords: (none)

Created on 2016-11-23.17:40:05 by jendrik, last changed by florian.

Files
File name Uploaded Type
insert_pushback.cpp florian, 2017-01-09.17:49:48 text/x-c++src
v2-base-callgrind.out florian, 2017-01-09.15:59:07 application/octet-stream
v2-callgrind.out florian, 2017-01-09.15:59:13 application/octet-stream
Messages
msg6255 (view) Author: florian Date: 2017-04-27.20:43:56
Thanks! I did the changes you suggested and merged.
msg6252 (view) Author: malte Date: 2017-04-27.18:41:56
Done with this round of comments.
msg6071 (view) Author: florian Date: 2017-01-11.15:33:08
After some debugging, I was able to trace the different number of evaluations
that we saw with lama-first back to the landmark code. In the task I used to
reproduce this error (hiking-sat14-strips/ptesting-2-3-7) the landmark graph had
a different number of edges in both versions. Looking closer, the reasonable
orderings were missing. This is a known issue in the landmark code (issue383). I
think the reason why the behavior is reproducible within one revision but
non-deterministic across revisions is that it depends on the order of landmarks
which in turn depends on their address in memory (e.g., within a
set<LandmarkNode *>).
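For illustration, a standalone sketch of the effect (made-up struct, not planner
code): a set keyed on pointers is ordered by address, so its iteration order is
an accident of memory layout rather than of the nodes' contents.

#include <iostream>
#include <set>

struct LandmarkNode {
    int id;
};

int main() {
    std::set<LandmarkNode *> nodes;
    for (int i = 0; i < 5; ++i)
        nodes.insert(new LandmarkNode{i});
    // Iteration follows pointer values, i.e., allocation addresses.
    for (const LandmarkNode *node : nodes)
        std::cout << node->id << " ";
    std::cout << std::endl;
    return 0;
}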
msg6070 (view) Author: florian Date: 2017-01-11.13:30:10
I introduced a class OperatorID and used it where axioms are not allowed. Since
this issue is about the successor generator and preferred operators, this covers
pretty much all places that we have touched so far. The class ActionID was no
longer used afterwards, so I removed it. I suggest we add it once we work on a
place where it is useful (in the landmark code or the DTG classes maybe).
msg6068 (view) Author: florian Date: 2017-01-11.10:49:12
> One thing
> we could do is to pay the price of the extra successor generator for now, but
> revisit this later as we tackle the more general problem of how to manage, cache
> and reuse expensive data structures like axiom evaluators, successor generators
> and causal graphs globally. I'm not sure if there is already an issue for this,
> but it should be mentioned somewhere.

This is mentioned in our long term plans
(http://www.fast-downward.org/ForDevelopers/LongTermPlans) and in issue564.
msg6067 (view) Author: florian Date: 2017-01-11.10:44:29
Reducing the memory overhead of the successor generator is now issue705. I'd be
interested in working on it, if I find some time.
msg6064 (view) Author: malte Date: 2017-01-10.21:59:07
> By the way, I'm surprised that a successor generator for logistics98 task #25
> should take so much memory. There are roughly 40000 operators, and Logistics
> operators should have either 1 or 2 preconditions. The successor generator size
> should scale roughly with the total number of preconditions, so less than 80000
> "units" in this case, and 100 MiB for this sounds a lot.

I looked into this. The thing I missed is that every internal node of the
successor generator stores a pointer for every value of the tested variable,
even if there are no operators that have this value as a precondition. This
could be avoided by using a hash map instead of a vector to access the children,
and then we would indeed be linear in the total number of preconditions (plus
the number of actions without preconditions), but of course using hash maps has
other disadvantages.

A good compromise could be an "adaptive" implementation that uses vectors when
they are densely populated and something else otherwise. This could fit into the
current framework easily because we already have polymorphic node types. This
could be taken further by e.g. having dedicated node types that just add an
operator or set of operators to the result set and then chain to another node,
which would let us avoid lots of empty or tiny vectors for "immediate_operators"
and friends.
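To make the hash map idea concrete, a purely hypothetical sketch (none of these
names match our actual successor generator classes):

#include <memory>
#include <unordered_map>
#include <vector>

class GeneratorBase {
public:
    virtual ~GeneratorBase() = default;
    // applicable_ops collects operator ids, as in this issue.
    virtual void generate(const std::vector<int> &state,
                          std::vector<int> &applicable_ops) const = 0;
};

class SparseSwitchNode : public GeneratorBase {
    int switch_var = -1;
    // Only values that actually occur as preconditions get an entry, so
    // memory scales with the number of preconditions rather than with the
    // domain size of switch_var.
    std::unordered_map<int, std::unique_ptr<GeneratorBase>> value_children;
    std::unique_ptr<GeneratorBase> dont_care_child;
public:
    void generate(const std::vector<int> &state,
                  std::vector<int> &applicable_ops) const override {
        auto it = value_children.find(state[switch_var]);
        if (it != value_children.end())
            it->second->generate(state, applicable_ops);
        if (dont_care_child)
            dont_care_child->generate(state, applicable_ops);
    }
};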

With the current implementation we can be a factor of "max variable domain size"
worse than the big-O memory usage we would like, and Logistics is one of the
domains where variable domains can get very large. There are 29 variables with a
domain size of 13 (representing trucks), 13 variables with a domain size of 27
(representing planes), and 18 variables with a domain size of 393 (representing
packages).

Creating the successor generator indeed increases peak memory usage by ~100 MiB,
so this is roughly the amount of space that is actually used by the successor
generator including memory management overhead. ("Roughly" because just
measuring peak memory increase is a somewhat dangerous way to measure this,
especially if a lot of memory has been released recently.)

I added some code with some rough calculations to measure how much memory is
used for what. My numbers only add up to 84 MB, so they are somewhat inaccurate,
which I assume is because I didn't take into account memory padding or vector
overallocation (although I hope we use reserve to avoid the latter), and I
estimated the memory management overhead somewhat optimistically as 8 bytes per
call to "new".

For tasks like these, empty successor generators and pointers to them are
responsible for the bulk of the memory usage. With 64-bits, this will get worse,
as all pointers are double the size. If we're unlucky, memory usage will almost
double with a 64-bit compile. Perhaps it's not quite as bad as that because
memory management overhead doesn't double. [Addendum: I have now tried it out
with a 64-bit compile, and memory usage indeed roughly doubles to ~206 MB for
each successor generator.]

Here are some numbers I gathered for this Logistics task (32-bit mode). This is
for the second successor generator created, but all numbers other than the peak
memory statistics in the first two lines are identical for the first one.

MALTE: Constructing successor generator... [t=1.0473s, 138584 KB]
MALTE: Done constructing successor generator... [t=1.45588s, 241480 KB]
MALTE: SG size estimates: total: 83767368
MALTE: SG size estimates: object overhead: 62139348
MALTE: SG size estimates: operators: 527192
MALTE: SG size estimates: switch var: 110776
MALTE: SG size estimates: generator for value: 20934664
MALTE: SG size estimates: default generator: 55388
MALTE: SG object counts: switches: 13847
MALTE: SG object counts: leaves: 26208
MALTE: SG object counts: empty: 5138224

If the estimates are accurate, then roughly 3/4 of the overall memory is used
for empty successor generators. This could be cut down to 0 by using nullptr
instead of millions of identical empty generators, or to a small constant number
of bytes by reusing the same empty generator, similar to the reuse of nodes in
the look-up trees for Cartesian abstractions. Of the remaining 1/4 of memory
used, more than 90% is taken up by the "generator for value" vectors. By storing
this information more cleverly and perhaps also being a bit cleverer about
memory allocation (we could reduce overhead to close to zero by using an arena
allocator), it should be possible to reduce overall successor generator memory
usage by roughly 90% for tasks like this one.
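For the sharing idea, a hypothetical sketch along the lines of the node types
above; slots that generate nothing would hold non-owning pointers to the single
shared instance:

class EmptyGeneratorNode : public GeneratorBase {
public:
    void generate(const std::vector<int> &,
                  std::vector<int> &) const override {
        // Intentionally generates nothing.
    }
    // One instance shared by all "no operators here" slots instead of
    // millions of identical empty generators.
    static const GeneratorBase *instance() {
        static const EmptyGeneratorNode singleton;
        return &singleton;
    }
};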

Of course, all this is only very tangentially related to this issue. I suggest
that we open a new issue about improving the memory usage of SuccessorGenerator.
Given the 64-bit-unfriendliness of the current implementation and the fact that
it is such a memory hog, perhaps this should also be referenced from the 64-bit
issue(s) dealing with memory usage. 

Anyone want to open the issue? (This does not imply volunteering to work on it. ;-))
Once we have such an issue, I can attach the patch that created the output above.
msg6063 (view) Author: malte Date: 2017-01-10.19:07:00
> 1) The successor generator in the landmark code was not meant to be a temporary
> hack. If I understood the old code correctly, it had a bug where it used the
> global successor generator to generate operators and then used their ids as an
> index to the local task's operators. This probably requires some more thought.

I understand the motivation, but we should not pay a 100 MiB memory cost
unnecessarily (if this is indeed solely due to the additional successor
generator). The least we can do is not to create it in cases where we don't use
it. (This won't help lama-first, which does use it.)

I'm not sure if a good solution will be easy to come by. In that case, one thing
we could do is to pay the price of the extra successor generator for now, but
revisit this later as we tackle the more general problem of how to manage, cache
and reuse expensive data structures like axiom evaluators, successor generators
and causal graphs globally. I'm not sure if there is already an issue for this,
but it should be mentioned somewhere. (It was one of the items on our "Unser
Planer soll schöner werden" ("our planner should become prettier") whiteboard,
but I don't recall how it was phrased there.)

> 2) If there are any hash sets that used GlobalOperator* before and now use
> ActionIDs (or ints), this could explain the different behavior.

Given that the change only happens in lama-first, it seems to be related to the
landmark code. There are only about a dozen changed lines in the landmarks code,
and no hash tables. The change is rather straightforward, and I think the only
way in which we could get different performance is if the new successor
generator (based on the TaskProxy) generates the operators in a different order
from the old one (g_successor_generator). Is this the case? In this case the
preferred operators for lmcount can be generated in a different order, which can
in turn affect search results.

I also had a look at the reproducibility of results for lama-first. On the same
compile, they seem to be reproducible. I ran the same task from hiking-sat14 a
hundred times each with the v3 and v3-base revisions. With v3-base I got 42314
evaluations in every run and with v3 I got 43120 evaluations in every run.


By the way, I'm surprised that a successor generator for logistics98 task #25
should take so much memory. There are roughly 40000 operators, and Logistics
operators should have either 1 or 2 preconditions. The successor generator size
should scale roughly with the total number of preconditions, so less than 80000
"units" in this case, and 100 MiB for this sounds a lot. The successor generator
nodes contain many vectors etc. that cause overhead, but *this* much? I can
confirm the result locally. v3-base has a peak memory usage of 220608 KiB, and
v3 has a peak memory usage of 323748 KiB. Looking at the output, the ~100 MiB
difference already manifests in the output before the actual search starts, so
it is indeed part of the set-up cost. There is also a noticeable increase in
set-up time (2.22 seconds vs. 2.70 seconds until the first state has been
evaluated).
msg6061 (view) Author: florian Date: 2017-01-10.18:33:26
I just wanted to add two quick comments (we can wait until the code review to
discuss them):

1) The successor generator in the landmark code was not meant to be a temporary
hack. If I understood the old code correctly, it had a bug where it used the
global successor generator to generate operators and then used their ids as an
index to the local task's operators. This probably requires some more thought.

2) If there are any hash sets that used GlobalOperator* before and now use
ActionIDs (or ints), this could explain the different behavior.
msg6058 (view) Author: malte Date: 2017-01-10.18:00:04
I had a brief look at the code, and I think it requires more review.

Generating an additional successor generator is not something we should do
lightly if it can cost 100 MiB of memory. As a temporary hack, we might consider
it, but there is no comment in the code that would suggest that this successor
generator isn't meant to stay there forever. Moreover, if I saw it correctly, we
generate the successor generator even if we're never going to use it, e.g. in
admissible landmark heuristics that don't use preferred operators.

At first glance, I also saw a few other things I wasn't too sure about, such as
the ActionID::get_index() method.

This issue is already in the review queue anyway, so I would suggest deferring
the merge until it has been further reviewed.


While we're talking about the design more generally, we also still have to
address Florian's comment:

>> not every int that refers to an operator needs to become an ActionID.
>> ...
>> have both OperatorID and ActionID
>
> Won't this get confusing, because we then have three different ways to store the
> index of an operator (int, OperatorID and ActionID)? Well, we should probably
> discuss this on a case-by-case basis. 
>
> I have currently added
>   ActionID OperatorProxy::get_action_id()
> and marked
>   int OperatorProxy::get_id()
> as deprecated. Should I change this?

Let's split the question about the three different ways of referring to
operators into two separate questions:

(1) Do we want (something like) ActionID and (something like) OperatorID, or do
we want just ActionID?

(2) Should we still use int in some places?

I think here the answer to (1) is quite clear; "operator or axiom" and
"operator" are very different logical types, and it is valuable not to confuse
them. Currently, in reading old code or doing code reviews, I often don't know
whether something is allowed to be an axiom or not. It would be great to get rid
of these doubts. The choice of ActionID vs. OperatorID is different from the
choice of ActionID vs. int: the latter is a design choice where you have to
invoke taste, practicality, or whatever. The former should be an easy choice
because in any given context, only one of ActionID vs. OperatorID will be the
correct type.
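To make this concrete, a minimal sketch of what such a pair of classes could
look like (not a design proposal; the negative-value encoding for axioms is
just the convention Florian mentions in msg6022 below):

class OperatorID {
    int index;
public:
    explicit OperatorID(int index) : index(index) {}
    // By construction, an OperatorID never refers to an axiom.
    int get_index() const {return index;}
};

class ActionID {
    int value;  // Non-negative: operator index; negative: encoded axiom index.
public:
    explicit ActionID(int value) : value(value) {}
    bool is_operator() const {return value >= 0;}
    int get_index() const {return is_operator() ? value : -value - 1;}
};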

Regarding (2), I guess we have to look at what the code looks like. My gut
reaction is to be wary of introducing more specialized types like
"PerOperatorInformation" (or whatever) unless we really have to. If a major
motivation for PerOperatorInformation is the inconvenience of frequently
converting back and forth between IDs and ints, then perhaps we're overusing the
IDs. But I think this is something where it's difficult to make a decision
without having looked at more code. For what it's worth, I also don't think that
the choice will become more difficult than in the current code. In the current
code, we have to decide between "GlobalOperator *" and "int". Switching to IDs
basically makes it a choice between "ID" and "int". So I don't think we're
making things worse in any way. (But of course that doesn't mean further
improvements cannot be made.)

Regarding OperatorProxy::get_action_id vs. OperatorProxy::get_id, I don't have
an answer without looking at the code some more, but this will hopefully be
cleared up in the code review.
msg6057 (view) Author: malte Date: 2017-01-10.17:32:11
msg5699 talks about versions of the code compiled on different platforms, not
different runs of the same executable. (For example, anything that depends on a
hash table can behave differently on different platforms or library versions.)
I'll see if I can reproduce non-deterministic behaviour. Interestingly, it seems
to affect only a few domains, but many tasks in these domains. In the hiking
domain, it affects all but two tasks.

The magnitude of the memory changes seems large in some tasks for "just" another
copy of the successor generator. I'll have a closer look.
msg6056 (view) Author: jendrik Date: 2017-01-10.17:06:52
The successor generator is only "copied" for the lmcount heuristic (for computing 
helpful actions).

The pull request indeed shows the difference between v3 and v3-base.

If I remember correctly, lama-first is not deterministic (see msg5699).
msg6055 (view) Author: malte Date: 2017-01-10.16:32:40
Actually, I also don't really understand the increased memory usage. For
example, in logistics98 task #25, the memory usage for lama-first increases from
roughly 221 MB to roughly 329 MB (accompanied by a modest increase in the number
of evaluations). It is conceivable that this is due to storing another copy of
the successor generator, as some of these Logistics tasks are huge. But then why
doesn't it affect the other configurations?

If I look at the pull request now, it will show me the differences between the
two revisions in this experiment (v3 and v3-base), right?
msg6054 (view) Author: malte Date: 2017-01-10.16:23:23
The only thing that concerns me is the change in the number of evaluations in
lama-first. Should this happen? From the title of the issue (I did not re-read
the discussion or look at the pull request again), it sounds like this should be
a behaviour-neutral change. Do we get a different order of successors or
preferred operators than before?
msg6053 (view) Author: jendrik Date: 2017-01-10.11:51:57
Fluctuations in coverage by +/- 3 are expected, I'd say. I was surprised by the 
fact that lama-first uses noticeably more memory, but this is due to us storing a 
copy of the successor generator now. Overall, I think this looks like it can be 
merged.
msg6052 (view) Author: florian Date: 2017-01-10.11:17:00
Experiments for satisficing configs are done and the results are less clear to
me. Coverage changes by up to 3 tasks, and for lama-first the number of evaluated
nodes changes. I haven't done many experiments with satisficing configs so far. Is
this expected behavior?

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue688-v3-sat-issue688-v3-base-issue688-v3-compare.html
msg6050 (view) Author: florian Date: 2017-01-09.21:23:18
Ahh, no, I forgot "-O3 -DNDEBUG". With those flags I also get another factor of
10. It's interesting that with your numbers insert looks better for empty
vectors. In the experiment for optimal configurations we reclaimed the lost
performance by switching back from insert to push_back:

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue688-v3-opt-issue688-v3-base-issue688-v3-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue688-v3-opt-blind-search_time.png

The unexplained error was a file system issue on the grid which we get from time
to time. It should be unrelated to the issue. Other than that, performance
actually looks slightly better than in the default branch.

Experiments for satisficing configs are still running, but if they look as good,
I think we could merge this issue.
msg6049 (view) Author: malte Date: 2017-01-09.18:29:30
Interesting! Have you compiled with -O3 and -DNDEBUG? I get much lower absolute
numbers on a rather slow machine. Also, the numbers for very small n seem to be
a bit skewed because they are computed first and the cache needs to "warm up"
initially. Plus, it may take a while for CPU frequency scaling to kick in.

If I repeat the test multiple times (in the same run), I get the following
numbers starting from the second repetition:

0	0.00116774s	0.000737738s
1	0.00231885s	0.00629627s
2	0.00375664s	0.00728213s
3	0.00522674s	0.00718928s
4	0.00664204s	0.00750449s
5	0.00843431s	0.0072071s
6	0.00981433s	0.00749877s
7	0.0116694s	0.00734878s
8	0.0137504s	0.00754392s
9	0.014328s	0.00749458s
10	0.0169009s	0.00745978s
11	0.0195376s	0.00743822s
12	0.0195321s	0.00784113s
13	0.0211988s	0.00778723s
14	0.0241689s	0.00784572s
15	0.025188s	0.00772268s
16	0.0264936s	0.00778464s
17	0.033613s	0.00773947s
18	0.0303284s	0.00801455s
19	0.0407786s	0.00800393s

That's around 10x faster than your numbers for push_back and around 30x faster
for insert. Interestingly, on my machine, insert is faster than push_back for
empty vectors, which is probably the most common case in the successor
generator. However, it is slower for 1-4 elements, which will be the next most
common case.

The real numbers will be much worse because the source vectors will be less
likely to be in cache in Fast Downward, but I guess there is no reason that this
should affect either version of the algorithm.

The real numbers will also be worse because I think we don't reuse the result
vectors in the successor generator and hence will need to reallocate
occasionally. In general, reallocation can affect push_back worse than insert
because insert will avoid reallocating twice for a single source vector. But
with our source vectors usually being very small, this is probably negligible.

The numbers are very impressive in absolute terms. Let's look at some larger
values of n, again after some initial warming up of the cache:

0	0.00121572s	0.000772686s
10	0.0211858s	0.00580211s
20	0.0489344s	0.009432s
30	0.0626462s	0.00886397s
40	0.123023s	0.00976372s
50	0.138081s	0.011145s
60	0.200289s	0.0112135s
70	0.213263s	0.0119306s
80	0.234878s	0.0123057s
90	0.26135s	0.0121723s
100	0.280911s	0.0135034s
110	0.329251s	0.0140138s
120	0.300368s	0.0141975s
130	0.423778s	0.0159464s
140	0.375172s	0.0162418s
150	0.442719s	0.0170958s
160	0.450286s	0.0177s
170	0.501583s	0.0180266s
180	0.430109s	0.0190362s
190	0.509671s	0.0195396s
200	0.502858s	0.0205814s

Once we're past ~100 elements, in the numbers for insert we finally see the
linear scaling with n that one would expect. Let's look at the numbers for n =
200: 0.02 seconds for a million insertions of 200 elements each. That's 0.02
seconds / (10^6 * 200 element insertions) = 10^{-10} seconds per element
insertion. This machine has 3.5 GHz, which means that we end up with roughly 3
insertions per CPU cycle. That's highly impressive.
msg6047 (view) Author: florian Date: 2017-01-09.17:49:48
I was interested to see the difference between push_back and insert, so I wrote
a micro benchmark for it (attached; depends on Fast Downward's timer class).
Looks like a for loop over push_back calls is faster if it loops over <= 11
items but slower if it loops over more.
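The gist of what gets timed is the difference between these two ways of
appending one small vector to another (a simplified sketch; the attached file
wraps such calls in timed loops):

#include <vector>

// Append src to dst one element at a time.
void append_push_back(std::vector<int> &dst, const std::vector<int> &src) {
    for (int value : src)
        dst.push_back(value);
}

// Append src to dst in a single call; reallocates at most once.
void append_insert(std::vector<int> &dst, const std::vector<int> &src) {
    dst.insert(dst.end(), src.begin(), src.end());
}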

#items	push_back	insert
0	0.0374227s	0.0971615s
1	0.0539942s	0.259376s
2	0.0733161s	0.25943s
3	0.0925464s	0.258908s
4	0.111984s	0.259604s
5	0.144296s	0.259498s
6	0.162998s	0.259484s
7	0.183151s	0.259243s
8	0.202124s	0.259941s
9	0.221983s	0.260338s
10	0.240904s	0.260577s
11	0.260199s	0.260491s
12	0.280887s	0.260567s
13	0.298408s	0.25992s
14	0.318356s	0.260425s
15	0.336865s	0.259996s
16	0.355382s	0.259523s
17	0.3751s 	0.260194s
18	0.395153s	0.260079s
19	0.414572s	0.260441s
msg6045 (view) Author: florian Date: 2017-01-09.16:27:14
That only applies to v1. After we merged issue629, we could remove the temporary
vector and the virtual method, so v2 and the experiment that is running now are
not affected by this.
msg6044 (view) Author: malte Date: 2017-01-09.16:22:56
Florian, perhaps before you run more experiments, can you look at this comment
by Jendrik?

> Are you sure about the pruning code? In v1 "PruningMethod::prune_operators(const
> GlobalState &state, vector<int> &op_ids)" is called,
> which delegates to the virtual "PruningMethod::prune_operators(const
GlobalState &,
> std::vector<const GlobalOperator *> &)", which does
> nothing in the case of NullPruningMethod. However the non-virtual method creates a
> temporary vector containing all applicable operators.
msg6043 (view) Author: malte Date: 2017-01-09.16:18:34
I'm surprised that this would make such a large difference, but I guess things
can add up. I found this related link (though not for vectors):

http://stackoverflow.com/questions/38483422/push-back-faster-than-insert
msg6042 (view) Author: florian Date: 2017-01-09.15:59:07
Actually, yes. I was expecting too much noise to be able to pinpoint a
difference of 5%, but there was only one path in the call graph that got slower
in the new version: the one going to the successor generator inserting action
ids into the vector of applicable operators. In msg6035 I wrote that this was
only called when building the generator but this is not the case; we also call
it for each state.

From looking at the numbers of calls, I suspect that insert is slower than
push_back here, because most of the vectors are empty. If the overhead of
inserting an empty vector is larger than the overhead of running over an empty
for loop, this could make sense. Both of these overheads should be close to
zero, but with so many calls it could make a difference. I'll attach the
callgrind files in case someone is interested.

I'm now repeating the experiment with a push_back instead of insert again.
msg6041 (view) Author: haz Date: 2017-01-09.15:33:07
Awesome stuff! Does any insight on the slowdown show up in the difference between
the call graphs, then?
msg6038 (view) Author: jendrik Date: 2017-01-09.11:10:44
Thanks, Christian! Malte recently wrote up how to install qcachegrind, a kcachegrind alternative that 
doesn't depend on the KDE packages: http://www.fast-downward.org/ForDevelopers/Profiling
msg6037 (view) Author: haz Date: 2017-01-08.18:16:21
You guys could always valgrind the thing on a properly-sized-problem* that 
demonstrates the slow-down. Example command line I've been using with FD:
- https://goo.gl/PqfLiQ

The following few lines let you view the output (if you're willing to install 
kde components, then just use "kcachegrind callgrind.out.*" to view the output).

*: Shouldn't be so big that you can't solve it with the valgrind overhead, nor 
so small that the parsing dominates.
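Roughly this shape, purely as an illustration (binary name, input file and
search options depend on your setup):

valgrind --tool=callgrind ./downward --search "astar(blind())" < output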
msg6036 (view) Author: jendrik Date: 2017-01-08.17:57:27
Are you sure about the pruning code? In v1 "PruningMethod::prune_operators(const GlobalState &state, vector<int> &op_ids)" is called, 
which delegates to the virtual "PruningMethod::prune_operators(const GlobalState &, std::vector<const GlobalOperator *> &)", which does 
nothing in the case of NullPruningMethod. However the non-virtual method creates a temporary vector containing all applicable operators.

The vector-appending happens while generating successor operators, not while generating the successor generator, but I doubt this could 
explain the difference in runtime.

I think it is very likely that allocating a vector in each state could be responsible for the slowdown. However, the question remains 
why v2 is still slower than its ancestor on default.
msg6035 (view) Author: florian Date: 2017-01-08.12:58:20
Thanks. 

> I would have guessed that the overhead stems from creating the temporary vector
> in the pruning code, but then the overhead should have gone away in v2.

Yes, the temporary is gone now, but that code wasn't used in either version. We
are using plain blind search without pruning here and NullPruningMethod
overrides the main call to the pruning code with an empty method.

Likewise, all of the changes concerning preferred operators should not affect
this. (I thought about OrderedSet<const GlobalOperator *> and OrderedSet<int>
behaving differently, but that also wouldn't matter for blind search).

> The only other possible culprit I can imagine is that v1 uses
> std::vector::insert() for appending vectors to vectors, while v1-base uses 
> push_back().

The slow-down seems to be more or less the same for short and long runtimes, so
I would be surprised if the reason is in the preprocessing (building the
successor generator). I would suspect something that happens at least once per
expansion. We have an additional array access to get from the ID to the
operator, but that should be super fast, shouldn't it?

> Maybe some profiling could help?

With an effect size this small, I doubt we would get useful results, but feel
free to try.


I'm out of ideas on this one.
msg6032 (view) Author: jendrik Date: 2017-01-08.00:06:10
Sure, here you go:

http://ai.cs.unibas.ch/_tmp_files/seipp/issue688-v1-opt-blind-search_time.png

It seems the slowdown was introduced by v1 already. Here is the diff between v1 and v1-base:

https://bitbucket.org/jendrikseipp/downward/branches/compare/issue688-v1..issue688-v1-base#diff

I would have guessed that the overhead stems from creating the temporary vector in the pruning 
code, but then the overhead should have gone away in v2. The only other possible culprit I can 
imagine is that v1 uses std::vector::insert() for appending vectors to vectors, while v1-base uses 
push_back(). Maybe some profiling could help?
msg6031 (view) Author: florian Date: 2017-01-07.13:37:42
The experiments for the current version show a slight negative trend (~5%) in
search_time. Apologies for all the unexplained errors in the report. I didn't
have VAL on my PATH, so every run failed in the last step. This should not
affect the results, though.

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue688-v2-opt-issue688-v2-base-issue688-v2-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue688-v2-opt-blind-search_time.png

Jendrik, could you create the same scatter plot for your experiment
(v1-opt/blind), so we can see where the slow-down comes from?
msg6028 (view) Author: florian Date: 2017-01-06.03:01:21
> In any case, I'd prefer introducing such a class in a separate step to
> introducing (and starting to use) ActionID.

We can introduce it when/if we use ActionID in the pruning code. There it would
be most useful. Can you think of a good name?

> not every int that refers to an operator needs to become an ActionID.
> ...
> have both OperatorID and ActionID

Won't this get confusing, because we then have three different ways to store the
index of an operator (int, OperatorID and ActionID)? Well, we should probably
discuss this on a case-by-case basis. 

I have currently added
  ActionID OperatorProxy::get_action_id()
and marked
  int OperatorProxy::get_id()
as deprecated. Should I change this?
msg6026 (view) Author: malte Date: 2017-01-06.01:14:25
> I suggest we merge this issue first and then open several smaller issues to use
> ActionID in other places where it makes sense. If there are no responses to the
> email on downward-dev until tomorrow, I'll repeat Jendrik's experiment with the
> new code.

Sounds good to me.

For what it's worth, in places where axioms are forbidden, we might want to make
this explicit by using an OperatorID class instead that always refers to an
operator, never to an axiom. (That is, have both OperatorID and ActionID.)
msg6025 (view) Author: malte Date: 2017-01-06.01:09:15
I can see that a class like you describe in msg6023 could be useful, but I'm not
sure I'd like it called PerOperatorInformation because it would be so different
in motivation, use and purpose to PerStateInformation, and I'd rather avoid the
analogy in name. PerStateInformation is very much about the automatic growing
when new states are explored, the sequential numbering of states as they are
generated, the subscriber mechanism etc.

In any case, I'd prefer introducing such a class in a separate step to
introducing (and starting to use) ActionID.

> I backed out the commit in the pruning code. I still think it's a good idea
> but found many other places like it (dtgs, landmarks, m&s, operator counting and
> others). Looking for uses of OperatorProxy::get_id is a good way to find them.

I don't think we need to/should start using ActionID in every place where we
currently use ints. I think it's most important in places where the information
is more long-lived and transcends barriers between components, e.g. in the
result that the pruning code returns to the search code.

That's not an objection to introducing ActionID. Rather, it's a warning that
perhaps not every int that refers to an operator needs to become an ActionID.
msg6024 (view) Author: florian Date: 2017-01-05.23:31:20
I backed out the commit in the pruning code. I still think it's a good idea
but found many other places like it (dtgs, landmarks, m&s, operator counting and
others). Looking for uses of OperatorProxy::get_id is a good way to find them.

I suggest we merge this issue first and then open several smaller issues to use
ActionID in other places where it makes sense. If there are no responses to the
email on downward-dev until tomorrow, I'll repeat Jendrik's experiment with the
new code.
msg6023 (view) Author: florian Date: 2017-01-05.22:41:03
I went through the pruning code replacing ints with ActionIDs and there is a lot
of information stored for each operator or each operator pair. I wrote
id.get_index() a lot, so let me bring up PerOperatorInformation again: I didn't
want to reimplement PerStateInformation with its nested vectors and so on. What
I had in mind was a wrapper for vector<T> with an operator[](ActionID) and a
constructor that takes an OperatorsProxy to initialize the vector's size. In the
pruning code at least, this would make the code more readable.
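Roughly like this (just a sketch; it assumes the OperatorsProxy and ActionID
interfaces discussed in this thread, see msg6022 below):

template<typename T>
class PerOperatorInformation {
    std::vector<T> data;
public:
    explicit PerOperatorInformation(const OperatorsProxy &operators)
        : data(operators.size()) {
    }
    T &operator[](ActionID id) {
        assert(id.is_operator());
        assert(utils::in_bounds(id.get_index(), data));
        return data[id.get_index()];
    }
};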

Maybe you also had something else in mind instead of using id.get_index()?
msg6022 (view) Author: florian Date: 2017-01-05.20:26:46
I see. So it's fine for a user to store a vector<T> and index it with something
like action_id.get_index()? So for example, I could store and access an int for
every operator with something like this:

ActionID id; // from a parameter
vector<int> data(task.get_operators().size());
assert(id.is_operator());
assert(utils::in_bounds(id.get_index(), data));
data[id.get_index()];

I used get_index instead of get_value to hide the implementation detail that we
use negative values for axioms.
msg6021 (view) Author: malte Date: 2017-01-05.20:07:19
Also, in the context of a planning task, every operator/axiom has a clearly
identifiable ID that is a natural part of it, just like every variable has a
number. It makes sense to speak of the 5th action or the 3rd variable. But it
doesn't really make that much sense to speak of "the 5th state" and things like
this; the state with state ID 5 is just the 5th state that happens to be
generated for a given StateRegistry. So states and actions aren't precise
analogs here.

The main motivation for both StateID and ActionID is that we want something more
type-safe than an integer, but beyond this, there are a good number of
differences between the two.
msg6020 (view) Author: malte Date: 2017-01-05.20:01:52
For now, I would say it's neither lower level nor higher level but unrelated. So
for now I'd leave this code alone.

> Also, do we want PerOperatorInformation, PerAxiomInformation, and/or
> PerActionInformation, so we can keep access to the value of an ActionID limited?

No, I don't think so. The main reason why we have PerStateInformation rather
than using something like unordered_map<State, ValueType> everywhere is because
hashing and storing states is expensive and there are exponentially many of
them. But there aren't exponentially many actions.
msg6019 (view) Author: florian Date: 2017-01-05.19:53:20
I was wondering how deep we want to go with this change. AbstractTask has a lot
of methods taking an integer (id) and a bool (is_axiom) to identify an action.
Should we use ActionID in those places or is AbstractTask on a lower level?

Also, do we want PerOperatorInformation, PerAxiomInformation, and/or
PerActionInformation, so we can keep access to the value of an ActionID limited?
msg6017 (view) Author: jendrik Date: 2017-01-05.18:29:29
Florian, feel free to hijack the issue :-)
msg6016 (view) Author: malte Date: 2017-01-05.18:19:42
Small note: the state ID class is called StateID (capital D), and I suggest
staying with this policy (OperatorID, or something-else-ID, see following).

We also discussed the question whether we can find a suitable word to
distinguish "operator" from "axiom or operator", but so far we haven't come up
with an idea.

One possible word is "action". It doesn't shout out "operator or axiom", but if
we make it a convention we follow everywhere, we could probably get used to it.
I'd prefer it over OperatorOrAxiomID or something like this. We currently use
"action" to mean "operator" in a few places in the code, but it would be good to
clean this up anyway and consistently say "operator" where we mean "operator". I
think either choice would be better with the current ambiguity where "operator"
can either mean "operator" or "operator or axiom", depending on context.

As whatever word we choose might end up in many places in the code, it is
perhaps a matter we should discuss on downward-dev.
msg6014 (view) Author: florian Date: 2017-01-05.18:07:56
Malte and I discussed this offline. We want to switch to using a new class for
operator ids (similar to StateId for registered states). This will hopefully
give back some type safety while still being as efficient as using ints. We can
then also use this class to differentiate between axioms and operators (see
msg5640).
Jendrik, if you don't mind, I'll hijack this issue and start with this.
msg5958 (view) Author: florian Date: 2016-12-21.14:54:58
As we discussed, I put the method back in.
msg5952 (view) Author: florian Date: 2016-12-20.23:58:21
For issue700 it would be useful to merge this one first but leave the existing
mechanism for translating operators in place. Jendrik, could you add this back
in and update the pull request, please? We could also rename the method from
  virtual const GlobalOperator *get_global_operator(int index, bool is_axiom)
to 
  virtual int get_global_operator_id(int index, bool is_axiom).
msg5925 (view) Author: jendrik Date: 2016-12-20.13:35:30
Thanks for the comments! I think we need to discuss how to translate operator ids 
between tasks in person, before this can move ahead.

Regarding the expansions: I don't see how the change can affect the number of 
expansions for lama-first. It seems to be something related to the landmarks.
msg5908 (view) Author: florian Date: 2016-12-19.17:11:38
I had a look at the code and left some comments. The main issue is that we still
need the function to translate local to global operators (or operator ids).

For the experiments: do you know why the number of expansions changes with
lama-first (e.g., in grid)? Also, it looks like there is a negative trend in
runtime. Could you add some relative scatter plots?
msg5898 (view) Author: jendrik Date: 2016-12-17.00:38:05
Good catch. Let's discuss the score code off the tracker since it's not related to 
the issue.
msg5894 (view) Author: malte Date: 2016-12-16.15:59:43
I hope I can get far enough down the FIFO during the sprint. In the meantime,
Jendrik, a small bug report for the normalized scores: the scores are often
reported as negative zero (-0.0000), which I think looks a bit odd since they
cannot be negative. Perhaps a numerical stability issue? If you can point me to
the code where they are computed, perhaps I can give some advice.
msg5893 (view) Author: jendrik Date: 2016-12-16.09:52:20
To speed things up, I went ahead and ran some experiments:

http://ai.cs.unibas.ch/_tmp_files/seipp/issue688-v1-opt-issue688-v1-base-issue688-v1-compare.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue688-v1-sat-issue688-v1-base-issue688-v1-compare.html

The tables use the revised comparative report from lab 2. All scores (quality, time and memory 
scores) are now normalized to the range [0, 1] and by default we only report the sum of their sums 
in aggregated tables.

I think the results show the usual fluctuations of issue experiments, but nothing that should stop 
us from merging this.
msg5825 (view) Author: jendrik Date: 2016-11-28.12:15:02
As suggested by Florian, I added it to the review queue.
msg5824 (view) Author: jendrik Date: 2016-11-26.13:59:06
Unfortunately, Florian probably won't be able to find time for this.

Malte, do you think you could have a quick look, or shall I put it in the 
reviewing queue?
msg5819 (view) Author: jendrik Date: 2016-11-23.20:52:47
I agree that we should merge issue629 and issue688 in close succession, but the 
order doesn't matter, I think.

While implementing, I noticed that the data structure for preferred and applicable
operators should be changed at the same time. The issue title reflects this now. 

I have created a pull request at 
https://bitbucket.org/jendrikseipp/downward/pull-requests/62 .

Florian, could you have a look, please?
msg5818 (view) Author: florian Date: 2016-11-23.18:29:19
We should hold off on this at least until the pruning code uses ids (issue629).
msg5817 (view) Author: jendrik Date: 2016-11-23.17:40:05
This is part of issue509.
History
Date                 User     Action
2017-04-27 20:43:56  florian  set status: reviewing -> resolved; messages: + msg6255
2017-04-27 18:41:56  malte    set messages: + msg6252
2017-01-11 15:33:08  florian  set messages: + msg6071
2017-01-11 13:30:10  florian  set messages: + msg6070
2017-01-11 10:49:12  florian  set messages: + msg6068
2017-01-11 10:44:29  florian  set messages: + msg6067
2017-01-10 21:59:07  malte    set messages: + msg6064
2017-01-10 19:07:00  malte    set messages: + msg6063
2017-01-10 18:33:26  florian  set messages: + msg6061
2017-01-10 18:00:04  malte    set messages: + msg6058
2017-01-10 17:32:11  malte    set messages: + msg6057
2017-01-10 17:06:52  jendrik  set messages: + msg6056
2017-01-10 16:32:40  malte    set messages: + msg6055
2017-01-10 16:23:23  malte    set messages: + msg6054
2017-01-10 11:51:57  jendrik  set messages: + msg6053
2017-01-10 11:17:00  florian  set messages: + msg6052
2017-01-09 21:23:18  florian  set messages: + msg6050
2017-01-09 18:29:30  malte    set messages: + msg6049
2017-01-09 17:49:48  florian  set files: + insert_pushback.cpp; messages: + msg6047
2017-01-09 16:27:14  florian  set messages: + msg6045
2017-01-09 16:22:56  malte    set messages: + msg6044
2017-01-09 16:18:35  malte    set messages: + msg6043
2017-01-09 15:59:13  florian  set files: + v2-callgrind.out
2017-01-09 15:59:07  florian  set files: + v2-base-callgrind.out; messages: + msg6042
2017-01-09 15:33:07  haz      set messages: + msg6041
2017-01-09 11:10:44  jendrik  set messages: + msg6038
2017-01-08 18:16:21  haz      set messages: + msg6037
2017-01-08 17:57:27  jendrik  set messages: + msg6036
2017-01-08 12:58:20  florian  set messages: + msg6035
2017-01-08 00:06:10  jendrik  set messages: + msg6032
2017-01-07 13:37:42  florian  set messages: + msg6031
2017-01-06 03:01:21  florian  set messages: + msg6028
2017-01-06 01:14:25  malte    set messages: + msg6026
2017-01-06 01:09:15  malte    set messages: + msg6025
2017-01-05 23:31:20  florian  set messages: + msg6024
2017-01-05 22:41:03  florian  set messages: + msg6023
2017-01-05 20:26:46  florian  set messages: + msg6022
2017-01-05 20:07:19  malte    set messages: + msg6021
2017-01-05 20:01:52  malte    set messages: + msg6020
2017-01-05 19:53:20  florian  set messages: + msg6019
2017-01-05 18:46:16  florian  set assignedto: jendrik -> florian
2017-01-05 18:29:29  jendrik  set messages: + msg6017
2017-01-05 18:19:42  malte    set messages: + msg6016
2017-01-05 18:07:56  florian  set messages: + msg6014
2016-12-28 15:59:26  haz      set nosy: + haz
2016-12-21 14:54:58  florian  set messages: + msg5958
2016-12-20 23:58:21  florian  set messages: + msg5952
2016-12-20 13:35:30  jendrik  set messages: + msg5925
2016-12-19 17:11:38  florian  set messages: + msg5908
2016-12-17 00:38:05  jendrik  set messages: + msg5898
2016-12-16 15:59:43  malte    set messages: + msg5894
2016-12-16 09:52:20  jendrik  set messages: + msg5893
2016-11-28 12:15:02  jendrik  set messages: + msg5825
2016-11-26 13:59:06  jendrik  set status: chatting -> reviewing; messages: + msg5824
2016-11-23 20:52:47  jendrik  set messages: + msg5819; title: replace GlobalOperator* by operator ids in successor generator -> replace GlobalOperator* by operator ids in successor generator and for preferred operators
2016-11-23 18:29:19  florian  set status: unread -> chatting; messages: + msg5818
2016-11-23 18:04:07  florian  set nosy: + florian
2016-11-23 17:40:05  jendrik  create