msg7603 (view) |
Author: malte |
Date: 2018-09-19.19:21:15 |
|
Excellent! Kill the beast!
|
msg7588 (view) |
Author: florian |
Date: 2018-09-19.16:41:48 |
|
I merged revision v2 and ignored the debug branch (it still lives in my
bitbucket repository if we need it for something).
|
msg7583 (view) |
Author: malte |
Date: 2018-09-19.16:16:59 |
|
> I have no doubt that the decrease in performance is significant. I meant
> unlucky with respect to the number of additional timeouts which is much lower
> in the latest experiments on the same revision.
I am not worried about that number for experiments like these, I'd always look
at score_total_time instead for experiments like these, for which timeouts don't
matter because the score converges to 0 anyway as the time limit is reached.
(Well, at least for 30 minute timeouts.)
> Anyway, I agree that this is the compiler's fault. Should I merge revision v2?
> It has the same code as debug8 but without the history of debug commits and
> experiments. It also still returns "const TaskProxy &" instead of "TaskProxy".
Pick what you like best.
|
msg7581 (view) |
Author: florian |
Date: 2018-09-19.16:10:20 |
|
I have no doubt that the decrease in performance is significant. I meant unlucky
with respect to the number of additional timeouts which is much lower in the
latest experiments on the same revision.
Anyway, I agree that this is the compiler's fault. Should I merge revision v2?
It has the same code as debug8 but without the history of debug commits and
experiments. It also still returns "const TaskProxy &" instead of "TaskProxy".
|
msg7573 (view) |
Author: malte |
Date: 2018-09-19.15:03:35 |
|
> May the variance is just very high and we got unlucky in the first experiment.
I don't think so, there is clearly a significant change here not explainable by
noise. For example, looking at the 32-bit results for score_total_time in
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue791-debug-opt4-issue791-debugbase-issue791-debug8-compare.html
we have a negative change in 43 domains and a positive change in only 7 domains.
Generally speaking, I recommend looking at per-domain score_total_time results
in cases like these to get a good overall picture and cancel out some noise (by
aggregating tasks within one domain). If I calculated this correctly, 43:7 is
significant at a p level less than 0.000001.
The positive news is that in the 64-bit, things look quite a bit more random,
with 36 negative vs. 14 positive changes. This is still highly significant, but
at a p level of "only" around 0.025. Based on the 64-bit results and the fact
that the "bad" diff clearly looks like it seems to be the compiler's fault
rather than ours, I think we should accept these performance numbers.
|
msg7566 (view) |
Author: florian |
Date: 2018-09-19.13:46:04 |
|
I have more strange results from the 64bit experiment. I added more debug
versions to further pinpoint the change that influences performance.
debugbase: current version of default branch
debug5: initializes StateRegistry with a task (not a proxy) but then stores both
the task and the proxy and has accessors for both. The unpack method uses
get_task. All other changes of this issue are also contained in this version.
debug6: exactly like debug5 but the unpack method uses get_task_proxy.
debug7: removes the member "task" and the getter for it.
debug8: changes the ctor to use a proxy instead of the pointer
Between debugbase and debug7 the code only got faster (both in 32 bit and in 64
bit builds).
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue791-debug-opt4-issue791-debugbase-issue791-debug7-compare.html
From debug7 to debug8 performance dropped (the drop is less pronounced in the 64
bit build but still noticable).
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue791-debug-opt4-issue791-debug7-issue791-debug8-compare.html
The diff is tiny and doesn't look critical at all:
https://bitbucket.org/FlorianPommerening/downward-issues/compare/issue791-debug8..issue791-debug7#diff
Interestingly, the baseline also has a lot of noise. While the first experiment
had 530 timeouts in the release32 build (increasing to 539 after the patch),
this experiment has 538 (increasing to 540). Overall, the change from base to
debug8 is not that dramatic in this experiment, so we could just ignore it and
merge the issue. May the variance is just very high and we got unlucky in the
first experiment.
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue791-debug-opt4-issue791-debugbase-issue791-debug8-compare.html
|
msg7555 (view) |
Author: malte |
Date: 2018-09-19.09:11:42 |
|
Compiler hiccups aside, the only thing in the diff that might plausibly have an
impact on performance is the existence of the attribute StateRegistry::task,
which affects how the state registry is laid out in memory. (Normally, of course
we would not expect an unnecessary attribute to be useful, but it can
conceivably happen due to cache effects.)
|
msg7554 (view) |
Author: malte |
Date: 2018-09-19.09:06:30 |
|
I would have a look if the behaviour is reproducible with a 64-bit build. If
not, we should not bother further.
|
msg7553 (view) |
Author: florian |
Date: 2018-09-19.08:32:07 |
|
It gets even more curious: I tried what you suggested and saw no effect in the
experiment. Then I wanted to confirm that the diff I mentioned in msg7548 was
responsible. I created two new debug revisions:
debug5: initializes StateRegistry with a task (not a proxy) but then stores both
the task and the proxy and has accessors for both. The unpack method uses
get_task. All other changes of this issue are also contained in this version.
debug6: exactly like debug5 but the unpack method uses get_task_proxy
I also included debugbase (last revision on default branch) and debug4
(including all changes) in the experiment. So in total we have the following diffs:
debugbase -> debug5: most of the changes
debug5 -> debug6: only changes the way unpack accesses the task
debug6 -> debug4: switches ctor from task to task proxy and removes unused get_task
I expected to see the difference from debug5 to debug6 but surprisingly, debug6
is even a little bit faster than debugbase:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue791-debug-opt3-issue791-debugbase-issue791-debug6-compare.html
The actual difference happens when we switch the constructor argument and remove
the dead code from debug6 to debug4:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue791-debug-opt3-issue791-debug4-issue791-debug6-compare.html
Diff (displayed in the wrong way again: red is new, green is old):
https://bitbucket.org/FlorianPommerening/downward-issues/compare/issue791-debug6..issue791-debug4#diff
This code is only executed once, so I have no idea why it would have any
influence on performance.
|
msg7550 (view) |
Author: malte |
Date: 2018-09-18.22:57:46 |
|
Curious. "const TaskProxy &" is an extra level of indirection compared to "const
AbstractTask &". It would be unusual for this to make such a huge difference in
performance because I don't think we're using it in an inner loop. (Are we?)
But you could try return TaskProxy instead of const TaskProxy &. That should
have exactly the same representation in memory as const AbstractTask &.
|
msg7548 (view) |
Author: florian |
Date: 2018-09-18.22:48:42 |
|
The experiments on the intermediate versions are done. I used 4 revisions:
* base: the version on default where the branch started
* debug1: introduces the unpack() method of global states
* debug2: additionally introduces TaskProxy::create_state
* debug3: additionally uses task proxies in the state registry (equivalent to v2
below)
I'm focusing on timeouts in the release build but the reports also have the data
for out-of-memory errors and the debug build if you are interested. The display
of the diffs is not ideal because I committed the revisions in the wrong order.
base -> debug1 (adding unpack):
-9 timeouts (code got faster)
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue791-debug-opt-issue791-base-issue791-debug1-compare.html
actual diff:
https://bitbucket.org/FlorianPommerening/downward-issues/compare/issue791-debug1..issue791-base#diff
diff ignoring the merge of other issues
https://bitbucket.org/FlorianPommerening/downward-issues/compare/issue791-debug1..f2c6ad7320b7#diff
debug1 -> debug2 (adding create_state):
+1 timeout (no large change in speed)
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue791-debug-opt-issue791-debug1-issue791-debug2-compare.html
diff:
https://bitbucket.org/FlorianPommerening/downward-issues/compare/issue791-debug2..issue791-debug1#diff
debug2 -> debug3 (adding get_task_proxy):
+16 timeouts (code got slower)
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue791-debug-opt-issue791-debug2-issue791-debug3-compare.html
diff (displayed the wrong way, the red part is in debug3, the green in debug2)
https://bitbucket.org/FlorianPommerening/downward-issues/compare/issue791-debug2..issue791-debug3#diff
Looks like the overhead comes from changing
TaskProxy task_proxy(registry->get_task());
to
TaskProxy task_proxy = registry->get_task_proxy();
in GlobalState::unpack()
and
const AbstractTask &task;
const AbstractTask &get_task() const {
return task;
}
to
TaskProxy task_proxy;
const TaskProxy &get_task_proxy() const {
return task_proxy;
}
in StateRegistry.
I still do not see why.
|
msg7533 (view) |
Author: malte |
Date: 2018-09-18.18:44:13 |
|
To be fair, the (positive) effect on the debug config, which is also very
consistent, is equally puzzling. This may be one of these cases where the
compiler takes a different turn based on something more or less random. I'm
interested in the results for the intermediate configurations, but maybe there
isn't much we can do here.
Another thing you can try (although I don't know how easy it is) is run the
experiment with a different compiler.
|
msg7532 (view) |
Author: florian |
Date: 2018-09-18.18:26:00 |
|
Unfortunately, there was no difference after changing the `const State &&` thing:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue791-v2-opt-issue791-base-issue791-v2-compare.html
I tried running more local tests with versions of the code that are between the
default branch and the tip of the issue branch but the variance is so high that
it is hard to see any change. I'll put those intermediate tests on the grid as well.
|
msg7519 (view) |
Author: florian |
Date: 2018-09-18.15:16:08 |
|
I made a similar local test and reached the same conclusion. I started another
experiment on the grid. I have no idea where the high variance comes from, we
saw it in issue794 as well.
|
msg7515 (view) |
Author: malte |
Date: 2018-09-18.15:00:05 |
|
I made a little local test of the `const State &&` thing we discussed on
bitbucket, just running
./build.py && ./fast-downward.py
~/downward/benchmarks/barman-opt11-strips/pfile01-001.pddl --search "astar(blind())"
locally with and without the "const". Unfortunately, runtime fluctuates *a lot*
on my machine for this task, by more than 10%, so it's hard to tell a
difference. But for what it's worth, the best of 5 runs was 20.1368s without
const and 21.4119s with const. So maybe we're on to something.
|
msg7513 (view) |
Author: malte |
Date: 2018-09-18.14:01:24 |
|
I looked at the code again and didn't find anything that should have a major
performance impact. There are a few places where we move a State around, rather
than a vector<int>, and certainly the overhead of that is non-zero, but also it
should be small.
A before/after profile might reveal if we do any copying we don't expect, for
example by checking how many times we call the copy constructor of State as
opposed to the move constructor, etc.
|
msg7511 (view) |
Author: florian |
Date: 2018-09-18.13:57:57 |
|
I tried analyzing this with callgrind but the differences are so small, its hard
to see in a profile. I'm trying a different task now and will also try
cachegrind to see if cache efficiency could be responsible.
|
msg7510 (view) |
Author: malte |
Date: 2018-09-18.13:35:26 |
|
It is a small slowdown, but it is quite consistent. It would be good to figure
out what is going on here. Would a profile help?
|
msg7508 (view) |
Author: florian |
Date: 2018-09-18.12:37:23 |
|
Strangely, the debug mode is faster and the release mode is slower after the
change in the issue:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue791-v1-opt-issue791-base-issue791-v1-compare.html
|
msg7504 (view) |
Author: malte |
Date: 2018-09-18.09:26:25 |
|
Code looks clean to me. (I left two comments.) If it works correctly, it can be
merged. ;-) I'd appreciate a small experiment (5 minutes sufficient), and given
the new assertions, this should include a debug config.
|
msg7501 (view) |
Author: jendrik |
Date: 2018-09-18.00:07:52 |
|
I left two small comments. Code looks good to me.
|
msg7493 (view) |
Author: florian |
Date: 2018-09-17.16:03:25 |
|
I updated the pull request with the change from issue824.
|
msg7181 (view) |
Author: florian |
Date: 2018-06-04.20:31:32 |
|
Thanks Jendrik. I made the changes you suggested.
|
msg7176 (view) |
Author: jendrik |
Date: 2018-06-04.18:37:53 |
|
I left some minor comments on Bitbucket. I think this is an elegant solution.
|
msg7170 (view) |
Author: florian |
Date: 2018-06-04.17:51:35 |
|
Pull request #43 shows the difference to branch issue774 because this one should
be merged first. Once it is merged, I'll update the pull request.
https://bitbucket.org/FlorianPommerening/downward-issues/pull-requests/43
|
msg7167 (view) |
Author: florian |
Date: 2018-06-04.17:01:52 |
|
> The registry should also be generated from a task in the same way successor
> generator and similar classes are generated.
Maybe this was a bad idea. The successor generator, int packer, and axiom
evaluator can all be used through a const reference but the state registry
cannot. Returning it as a non-const reference from a task proxy would mean that
the method could not be const. Maybe its better to have a constructor of the
state registry that takes a task proxy as a parameter and have the search engine
create a registry internally.
|
msg7157 (view) |
Author: florian |
Date: 2018-06-04.14:07:08 |
|
As part of using the task interface everywhere, we want to switch the
StateRegistry to the new interface. This will be mostly done with issue774 but
some issues will remain. For example, the registry should use a task proxy
internally, not a shared pointer to an abstract task. The registry should also
be generated from a task in the same way successor generator and similar classes
are generated.
In this issue, we want to clean up the registry and fix these remaining issues.
|
|
Date |
User |
Action |
Args |
2018-09-19 19:21:15 | malte | set | messages:
+ msg7603 |
2018-09-19 16:41:49 | florian | set | status: reviewing -> resolved messages:
+ msg7588 |
2018-09-19 16:16:59 | malte | set | messages:
+ msg7583 |
2018-09-19 16:10:20 | florian | set | messages:
+ msg7581 |
2018-09-19 15:03:35 | malte | set | messages:
+ msg7573 |
2018-09-19 13:46:04 | florian | set | messages:
+ msg7566 |
2018-09-19 09:11:42 | malte | set | messages:
+ msg7555 |
2018-09-19 09:06:30 | malte | set | messages:
+ msg7554 |
2018-09-19 08:32:07 | florian | set | messages:
+ msg7553 |
2018-09-18 22:57:46 | malte | set | messages:
+ msg7550 |
2018-09-18 22:48:42 | florian | set | messages:
+ msg7548 |
2018-09-18 18:44:13 | malte | set | messages:
+ msg7533 |
2018-09-18 18:26:00 | florian | set | messages:
+ msg7532 |
2018-09-18 15:16:08 | florian | set | messages:
+ msg7519 |
2018-09-18 15:00:05 | malte | set | messages:
+ msg7515 |
2018-09-18 14:01:24 | malte | set | messages:
+ msg7513 |
2018-09-18 13:57:57 | florian | set | messages:
+ msg7511 |
2018-09-18 13:35:26 | malte | set | messages:
+ msg7510 |
2018-09-18 12:37:23 | florian | set | messages:
+ msg7508 |
2018-09-18 09:26:25 | malte | set | messages:
+ msg7504 |
2018-09-18 00:07:52 | jendrik | set | messages:
+ msg7501 |
2018-09-17 16:03:25 | florian | set | messages:
+ msg7493 |
2018-06-04 20:31:32 | florian | set | messages:
+ msg7181 |
2018-06-04 18:37:53 | jendrik | set | status: in-progress -> reviewing messages:
+ msg7176 |
2018-06-04 17:51:35 | florian | set | messages:
+ msg7170 |
2018-06-04 17:01:52 | florian | set | messages:
+ msg7167 |
2018-06-04 14:07:08 | florian | create | |