Issue583

Title M&S refactoring part 2: move more functionality to FactoredTransitionSystem
Priority wish Status resolved
Superseder Nosy List florian, malte, silvan
Assigned To silvan Keywords
Optional summary
v1: move interface from TransitionSystem to FactoredTransitionSystem
v2: classes using FactoredTransitionSystem can access underlying Distances objects
v3: Labels class uses FactoredTransitionSystem for label reduction
v4: merge strategies use FactoredTransitionSystem
v5: safety experiment after merge

Created on 2015-10-27.11:18:13 by silvan, last changed by silvan.

Summary
v1: move interface from TransitionSystem to FactoredTransitionSystem
v2: classes using FactoredTransitionSystem can access underlying Distances objects
v3: Labels class uses FactoredTransitionSystem for label reduction
v4: merge strategies use FactoredTransitionSystem
v5: safety experiment after merge
Messages
msg4804 (view) Author: silvan Date: 2015-11-16.09:43:15
The v5 comparison remains as it was before, without the unexplained errors:
http://ai.cs.unibas.ch/_tmp_files/sieverss/issue583-base-v3-issue583-v5-compare.html

Closing this one again.
msg4803 (view) Author: silvan Date: 2015-11-15.19:21:21
After rerunning versions v1 through v4, it seems clear that v1 already caused a
lot of improvement in performance and is probably solely responsible for the
overall increase observed in v5 (which I'll also re-run to get rid of the
unexplained errors -- I seem to have fixed the problem with running VAL). The
other three revisions did not really affect performance.

For ease of access, I re-post the links here:
http://ai.cs.unibas.ch/_tmp_files/sieverss/issue583-base-issue583-v1-compare.html
http://ai.cs.unibas.ch/_tmp_files/sieverss/issue583-v1-issue583-v2-compare.html
http://ai.cs.unibas.ch/_tmp_files/sieverss/issue583-base-v2-issue583-v3-compare.html
http://ai.cs.unibas.ch/_tmp_files/sieverss/issue583-base-v2-issue583-v4-compare.html
msg4801 (view) Author: malte Date: 2015-11-14.20:29:11
> Thanks for mentioning you faced similar problems too. I wanted to mention,
> however, that I did not use or even build and link any LP solver with these
> experiments (in fact, I never did so far).

Sure! We were discussing the "failure" reported by the buildbot, and the
buildbot (at least in some configurations) does link in the LP solvers.

> Concerning the buildbot "baseline", I also don't know how to update it, if it
> exists -- my intuitive understanding so far was that it compares the results
> against earlier results and complains if the difference is too large.

The buildbot compares to numbers stored in a file. We generate(d) these numbers
by running a given planner revision and saving its results. If there are changes
that affect these numbers in a significant way, we need to regenerate them by
creating the file again for the revision we want to compare again in the future.
There is a script to do this; if none of us three knows how this is done, then
Jendrik will know it. If I recall correctly, this was originally implemented by
Erez when we first started using a buildbot, and Jendrik later adapted it to use
lab.
msg4799 (view) Author: silvan Date: 2015-11-14.15:36:36
> There seem to be something like 10000 unexplained errors. Usually there should
> be 0. :-) Do we know why they are there, and are you sure that the rest of the
> results make sense (i.e. are not influenced by these errors)?

This is due to failing to run VAL. I hope I fixed this now by changing something
in .bashrc, but I am not sure...

Thanks for mentioning you faced similar problems too. I wanted to mention,
however, that I did not use or even build and link any LP solver with these
experiments (in fact, I never did so far).

Concerning the buildbot "baseline", I also don't know how to update it, if it
exists -- my intuitive understanding so far was that it compares the results
against earlier results and complains if the difference is too large.
msg4798 (view) Author: malte Date: 2015-11-14.12:08:02
The memory delta seems to be 2 MB across the board (see second link in Silvan's
last message that points to the failing buildbot log).

The memory savings that Silvan mentions are for the "serious" problems where the
data structures used planning algorithms actually matter. Our buildbot tests are
tiny, and memory is dominated by the code itself, with total memory usage around
20 MB for most configurations.

I'm a bit uncomfortable with having the daily/weekly tests on a constant red,
since it means we get a lot of buildbot errors and I should check all of them to
make sure that we don't miss the genuine ones over the spurious ones. But if we
can get this sorted out by the end of next week, I guess that's soon enough.

> (also, I don't know how to do this)

Me neither. Perhaps it would be good if we added some information on this to
http://www.fast-downward.org/ForDevelopers/BuildBot.
msg4797 (view) Author: florian Date: 2015-11-14.11:56:15
Strange that the buildbot still complains about a memory regression, if memory
is reduced everywhere as Silvan said. The memory regression I know about is
caused by linking with CLP. It causes a constant 2MB increase in memory usage.
If you see any more than that, that's caused by something else. We wanted to
discuss this during the next meeting, so I would like to wait with updating the
baseline until then (also, I don't know how to do this).
msg4796 (view) Author: malte Date: 2015-11-14.11:39:01
There seem to be something like 10000 unexplained errors. Usually there should
be 0. :-) Do we know why they are there, and are you sure that the rest of the
results make sense (i.e. are not influenced by these errors)?

Apart from this, the results look good indeed! :-)

Please rerun the individual experiments to find out where we got the improvement
iff you want to.

I've added Florian because of the memory regression you mention. We recently saw
something similar, which was caused by a change in the setup that caused one of
the LP solvers to be included that wasn't included before. This increases the
size of the planner, which also contributes to memory. This seems to still be
the same thing. Florian, do you agree that this is what Silvan sees here? If
yes, can you update the reference values or get someone else to do it so that we
don't keep getting these buildbot errors?
msg4795 (view) Author: silvan Date: 2015-11-14.11:18:33
The results are ready and look quite promising:
http://ai.cs.unibas.ch/_tmp_files/sieverss/issue583-base-v3-issue583-v5-compare.html

I'd like to know from what steps the improvement comes from, but I am not sure
if it is worth rerunning all the comparisons that I posted before. What do you
think?

Although memory seems to be reduced everywhere, the buildbot complains about a
memory regression in gripper with merge-and-shrink:
http://buildbot.fast-downward.org/builders/weekly-aidev-gcc48-LP/builds/15/steps/weekly_planner/logs/stdio
I don't know if the change is due to merging this issue, as there are 8
changesets included in the test, but I suppose others did not touch the
merge-and-shrink code (but maybe other parts that may affect memory?). When
looking at the results I just posted, memory seems to be fine also for gripper,
for all configurations.
msg4778 (view) Author: silvan Date: 2015-11-12.09:23:41
Safety experiment (v5) after merging from default is queued and will take a
while to complete. I nevertheless already merged an pushed, hence closing this one.
msg4769 (view) Author: malte Date: 2015-11-11.13:54:53
Sounds good! (No time for reviewing the code at the moment -- feel free to merge
if you're happy with it.)
msg4768 (view) Author: silvan Date: 2015-11-11.13:50:26
I think that before the ICAPS deadline, I cannot continue working on this issue,
and hence I'd like to close this one for the moment, continuing with further
refactoring in the sense of this issue in a separate one. Many things get
changed in FastDownward right now and having less merging work seems to be a
good option to me. Furthermore, this issue also provides a first non-polished
version of a "common_setup" which uses lab's new FastDownwawrdExperiment class,
which then others could use and improve.

Malte, what do you think? If you agree, would you like to have a look at the
diff or should we postpone this for another round of coding/reviewing in the future?
msg4733 (view) Author: silvan Date: 2015-11-04.14:33:37
Finally, also DFP uses (and the linear strategies) use the new factored
representation. get_vector() is gone.

http://ai.cs.unibas.ch/_tmp_files/sieverss/issue583-base-v2-issue583-v4-compare.html

I recently noticed a bug in DFP, and just now another one. I'll address both in
a separate issue.
msg4732 (view) Author: silvan Date: 2015-11-04.10:31:18
In the third version, I merged from default (new baseline) and the Labels class
also uses the FactoredTransitionSystem class directly. Performance did not change.

http://ai.cs.unibas.ch/_tmp_files/sieverss/issue583-base-v2-issue583-v3-compare.html
msg4730 (view) Author: silvan Date: 2015-11-01.19:12:49
Second step: let classes requiring distance information directly access the
Distances object directly rather than indirectly via FactoredTransitionSystem.
That allows more inlining, but didn't really help. Coverage of CGGL actually
decreases (but increased in version 1).
http://ai.cs.unibas.ch/_tmp_files/sieverss/issue583-v1-issue583-v2-compare.html
msg4705 (view) Author: malte Date: 2015-10-30.15:31:40
Looks good!
msg4704 (view) Author: silvan Date: 2015-10-30.10:44:28
In a first step, I moved lots of the code from TransitionSytem to
FactoredTransitionSystem. The change seems to be correct and did not hurt
performance:
http://ai.cs.unibas.ch/_tmp_files/sieverss/issue583-base-issue583-v1-compare.html
(the unexplained errors stem from a failure of running VAL, probably related to
the new way of using Fast Downward and my wrong maia settings)
msg4676 (view) Author: silvan Date: 2015-10-27.11:18:13
This is part of meta issue567.

We want to move further parts of the code from TransitionSystem to
FactoredTransitionSystem. In particular, HeuristicRepresentation and Distances
could be managed by FactoredTransitionSystem, and other classes using the
"vector of all transition systems" should be changed to use the factored
representation.
History
Date User Action Args
2015-11-16 09:43:15silvansetstatus: chatting -> resolved
messages: + msg4804
summary: v1: move interface from TransitionSystem to FactoredTransitionSystem v2: classes using FactoredTransitionSystem can access underlying Distances objects v3: Labels class uses FactoredTransitionSystem for label reduction v4: merge strategies use FactoredTransitionSystem -> v1: move interface from TransitionSystem to FactoredTransitionSystem v2: classes using FactoredTransitionSystem can access underlying Distances objects v3: Labels class uses FactoredTransitionSystem for label reduction v4: merge strategies use FactoredTransitionSystem v5: safety experiment after merge
2015-11-15 19:21:21silvansetmessages: + msg4803
2015-11-14 20:29:11maltesetmessages: + msg4801
2015-11-14 15:36:36silvansetmessages: + msg4799
2015-11-14 12:08:02maltesetmessages: + msg4798
2015-11-14 11:56:15floriansetmessages: + msg4797
2015-11-14 11:39:01maltesetnosy: + florian
messages: + msg4796
2015-11-14 11:18:33silvansetmessages: + msg4795
2015-11-12 09:23:41silvansetmessages: + msg4778
2015-11-11 13:54:53maltesetmessages: + msg4769
2015-11-11 13:50:26silvansetmessages: + msg4768
2015-11-11 13:43:09silvansetsummary: v1: move interface from TransitionSystem to FactoredTransitionSystem v2: classes using FactoredTransitionSystem can access underlying Distances objects v3: Labels class uses FactoredTransitionSystem for label reduction v4: merge strategies use FactoredTransitionSystem
2015-11-04 14:33:37silvansetmessages: + msg4733
2015-11-04 10:31:18silvansetmessages: + msg4732
2015-11-01 19:12:49silvansetmessages: + msg4730
2015-10-30 15:31:40maltesetmessages: + msg4705
2015-10-30 10:44:28silvansetmessages: + msg4704
2015-10-27 11:18:13silvancreate