Created on 2022-03-07.16:59:38 by clemens, last changed by malte.
msg10673 |
Author: malte |
Date: 2022-03-21.11:55:43 |
|
Sounds good!
|
msg10672 |
Author: clemens |
Date: 2022-03-16.21:45:20 |
|
Looks good to me; I see no reason to reopen this issue.
|
msg10671 |
Author: remo |
Date: 2022-03-12.19:01:58 |
|
The results for the v3 experiments with 30 minute timeout are in:
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v3-satisficing-30min-compare.html
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v3-optimal-30min-compare.html
In the optimal case there are more unexplained errors than in the previous v3 experiment, but they look reasonable to me, as they mostly seem
to consist of additional over-length run logs in the parcprinter domain. The satisficing experiment has fewer unexplained errors, which
seems to be due to fewer TRANSLATE_OUT_OF_TIME errors.
The newly added attribute *lmgraph_generation_time* shows that this step generally takes longer now, with the sum of differences being
positive for all configurations. If I understand correctly, this is expected as we now do computation during the initialization that was
previously done during state evaluation.
As indicated by Malte's local experiments, the slight drop in coverage we observed in the 5 minute v3 experiments does not exist anymore.
All configurations now have the same coverage for both versions, except seq-opt-bjolp-opt, which now solves one more task.
The scores seem to improve very slightly in the optimal case and to worsen by a similarly small margin in the satisficing case.
From my point of view these experiments are in line with our expectations when merging and do not raise new concerns, so I will mark
this issue as resolved for now. Feel free to reopen if I'm missing something.
|
msg10670 |
Author: remo |
Date: 2022-03-11.17:41:36 |
|
We have merged this issue. We will post the results of the running experiments as
soon as they become available, but expect similar results to the previous v3
experiments.
|
msg10669 |
Author: malte |
Date: 2022-03-11.16:52:15 |
|
As discussed on Discord, I ran local experiments for the tasks where coverage changed in the v3 experiments to see whether the differences can be explained by Grid noise.
Results (base vs. v3):
satisficing:
lama-first, caldera-sat18-adl, p13: 242.627s vs. 242.592s
lama-first, transport-sat11-strips, p19: 703.565s vs. 703.433s
lama-first-pref, tetris-sat14-strips, p024: 740.235s vs. 740.440s
lm-zg, assembly, prob29: 468.318s vs. 460.493s
lm-zg, depot, p19: 317.722s vs. 311.192s
optimal:
lm-exhaust, blocks, 10-2: 280.007s vs. 281.045s
lm-exhaust, parcprinter-opt11, 11: 359.748s vs. 348.811s
lm-hm2, freecell, 6-1: 357.093s vs. 357.101s
lm-hm2, pipes-tankage, p08: 219.347s vs. 219.278s
lm-hm2, snake, p04: 397.086s vs. 396.19s
I didn't run the bjolp-opt task because of the LP solver dependency.
This all looks unproblematic to me. Times are usually very close, and where they differ, they tend to be better than before. There are two very slightly slower runs and one slightly larger slowdown (280.007s to 281.045s), which I would not be concerned about.
|
msg10666 |
Author: remo |
Date: 2022-03-11.13:16:21 |
|
The results for the experiments without the debug logs can be found here:
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v3-satisficing-compare.html
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v3-optimal-compare.html
Even though we hoped to improve performance by making landmark dead-end detection more efficient, the
results indicate that this is not the case. While the fluctuations are small, more of them point towards slightly
worse performance, with coverage dropping by one or two tasks for some configurations. Looking at these
differences a little more closely, we concluded that most or all of them can be attributed to the short
timeout of 5 minutes, which leads to one version solving a task barely within the limit while the other
fails. This could also explain the differences in score, because for the configurations where coverage is
the same, scores are also very close.
In order to better see how significant these differences are, we started a (hopefully final) round of
experiments with a 30 minute timeout.
|
msg10665 |
Author: remo |
Date: 2022-03-11.10:00:45 |
|
Here are the results of the experiment I mentioned in msg10662:
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v2-satisficing-compare.html
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v2-optimal-compare.html
Of particular interest are the three attributes related to dead-ends, namely *dead-end_empty_first_achievers*,
*dead-end_empty_possible_achievers*, and *dead-end_empty_possible_achievers_finished*. The first two indicate
the number of times the respective case was encountered, same as in the experiment discussed in msg10642. The
numbers for empty first achievers are the same before and after our changes. At first glance, this does not seem
to be the case for empty possible achievers, as the numbers change in all optimal and one satisficing
configuration. This can be explained by the fact that *dead-end_empty_possible_achievers* counts occurrences in
all tasks, and thus also in failed ones, where the counts are not deterministic because the time and memory limits
kick in at varying points. Because of this we added the "finished" variant of the *dead-end_empty_possible_achievers*
attribute, which only considers tasks where coverage is 1; there we again see that the numbers remain the same, as expected.
While functionality seems to be preserved, performance remains to be discussed. Due to the many over-length run
log errors, these experiments do not reliably measure performance, but so far scores indicate that performance
slightly improves for most configurations but also decreases in some cases. In particular, lama-first with preferred
operators shows lower scores in most measures.
The upcoming experiment aims to be more informative about performance by comparing the two versions without the
debug logs.
|
msg10662 |
Author: remo |
Date: 2022-03-10.14:55:01 |
|
Simon and I created a pull request for the code changes we (Malte, Clemens, Simon, Remo) implemented yesterday: https://github.com/aibasel/downward/pull/98
In order to avoid looping over every landmark in every state, we ensured that the static information required by the function *dead_end_exists* is only computed
once.
The first case considers non-derived unreached landmarks without first achievers. This information is completely static and we therefore only compute it once, when
initializing the LandmarkStatusManager. As the existence of such landmarks means that the task is unsolvable, we store the boolean variable *task_is_unsolvable*
indicating whether this is the case.
The second case considers non-derived landmarks that are needed again but have no possible achievers. Here, the set of landmarks for which this check is relevant
is static and we thus compute and store this set as *unachievable_landmark_ids* during initialization.
The function *dead_end_exists* is still called for every state, but now only has to check the *task_is_unsolvable* flag and then loop over the landmarks in
*unachievable_landmark_ids*, checking whether any of them is needed again, which would imply a dead-end.
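To make the described structure concrete, here is a minimal sketch of the idea. The types and names below (Landmark, StatusManager, needed_again) are simplified placeholders, not the actual Fast Downward classes; the real implementation is in the pull request linked above.

    #include <vector>

    // Simplified stand-in for a landmark-graph node.
    struct Landmark {
        bool is_derived;
        std::vector<int> first_achievers;
        std::vector<int> possible_achievers;
    };

    class StatusManager {
        // Case 1, computed once during initialization: a non-derived landmark
        // without first achievers can never be made true, so the task is
        // unsolvable. (The sketch omits the "unreached" qualification; the
        // real code also takes the landmark status in the initial state into
        // account.)
        bool task_is_unsolvable = false;
        // Case 2, computed once during initialization: non-derived landmarks
        // without possible achievers; if any of them is needed again in some
        // state, that state is a dead-end.
        std::vector<int> unachievable_landmark_ids;

    public:
        explicit StatusManager(const std::vector<Landmark> &landmarks) {
            for (size_t id = 0; id < landmarks.size(); ++id) {
                const Landmark &lm = landmarks[id];
                if (lm.is_derived)
                    continue;
                if (lm.first_achievers.empty())
                    task_is_unsolvable = true;
                if (lm.possible_achievers.empty())
                    unachievable_landmark_ids.push_back(static_cast<int>(id));
            }
        }

        // Still called for every state, but no longer loops over all
        // landmarks: it checks the precomputed flag and the (usually very
        // small) candidate list.
        bool dead_end_exists(const std::vector<bool> &needed_again) const {
            if (task_is_unsolvable)
                return true;
            for (int id : unachievable_landmark_ids)
                if (needed_again[id])
                    return true;
            return false;
        }
    };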
We set up an experiment that evaluates the effect of these changes and will link the results once available.
|
msg10642 |
Author: remo |
Date: 2022-03-08.11:47:57 |
|
Clemens and I have run an experiment to investigate in which domains landmarks are detected as dead-ends by the
function *dead_end_exists*. The results for satisficing and optimal configurations look as follows:
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v1-satisficing-report.html
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v1-optimal-report.html
Despite the unexplained errors related to large run logs, the essential information is there. The attributes
*dead-end_empty_first_achievers* and *dead-end_empty_possible_achievers* show the number of times the respective
case is encountered.
The results are consistent across satisficing and optimal configurations. An empty set of first achievers in
unreached landmarks leads to detected dead-ends only in the mystery domain. Empty possible achievers in needed-
again landmarks appear in the domains: airport, mystery, organic-synthesis, parcprinter, sokoban, woodworking, and
additionally in the satisficing-exclusive schedule domain.
This information should come in handy as a reference when evaluating future solutions.
|
msg10639 |
Author: clemens |
Date: 2022-03-07.16:59:38 |
|
The dead-end detection in the LandmarkStatusManager class loops over every landmark in every state. This is unnecessarily inefficient, because we are only interested in landmarks that are not derived and have empty (first) achievers. The goal of this issue is to implement this test more efficiently, i.e., by looping only over a list of the landmarks that satisfy that criterion.
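For illustration, here is a rough sketch of the per-state check described above, using simplified placeholder types rather than the actual LandmarkStatusManager interface (a sketch of the precomputed variant appears under msg10662 above). The point is that the loop visits every landmark in every state, although only non-derived landmarks with empty (first) achievers can ever trigger the test.

    #include <vector>

    // Simplified stand-in for a landmark-graph node.
    struct Landmark {
        bool is_derived;
        std::vector<int> first_achievers;
        std::vector<int> possible_achievers;
    };

    // Current behaviour as described above: called for every evaluated state
    // and iterating over every landmark, even though only a handful of
    // landmarks are relevant for either of the two dead-end conditions.
    bool dead_end_exists_naive(const std::vector<Landmark> &landmarks,
                               const std::vector<bool> &unreached,
                               const std::vector<bool> &needed_again) {
        for (size_t id = 0; id < landmarks.size(); ++id) {
            const Landmark &lm = landmarks[id];
            if (lm.is_derived)
                continue;
            if (unreached[id] && lm.first_achievers.empty())
                return true;
            if (needed_again[id] && lm.possible_achievers.empty())
                return true;
        }
        return false;
    }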
|
|
Date | User | Action | Args |
2022-03-21 11:55:43 | malte | set | messages:
+ msg10673 |
2022-03-16 21:45:20 | clemens | set | messages:
+ msg10672 |
2022-03-12 19:01:58 | remo | set | status: chatting -> resolved messages:
+ msg10671 |
2022-03-11 17:41:36 | remo | set | messages:
+ msg10670 |
2022-03-11 16:52:15 | malte | set | messages:
+ msg10669 |
2022-03-11 13:16:21 | remo | set | messages:
+ msg10666 |
2022-03-11 10:00:45 | remo | set | messages:
+ msg10665 |
2022-03-10 14:55:01 | remo | set | messages:
+ msg10662 |
2022-03-08 11:47:57 | remo | set | status: unread -> chatting nosy:
+ simon messages:
+ msg10642 |
2022-03-07 16:59:38 | clemens | create | |
|