Title Make landmark dead end detection more efficient
Priority wish Status resolved
Superseder Nosy List clemens, jendrik, malte, remo, salome, simon, thomas
Assigned To Keywords
Optional summary

Created on 2022-03-07.16:59:38 by clemens, last changed by malte.

msg10673 (view) Author: malte Date: 2022-03-21.11:55:43
Sounds good!
msg10672 (view) Author: clemens Date: 2022-03-16.21:45:20
Looks good to me, I see no reason to reopen this issue.
msg10671 (view) Author: remo Date: 2022-03-12.19:01:58
The results for the v3 experiments with 30 minute timeout are in:

In the optimal case there are more unexplained errors than in the previous v3 experiment, but they look reasonable to me as they seem to 
mostly consist of additional over-length run logs in the parcprinter domain. The satisficing experiment has fewer unexplained errors which 
seems to be due to fewer TRANSLATE_OUT_OF_TIME errors.

The newly added attribute *lmgraph_generation_time* shows that this step generally takes longer now, with the sum of differences being 
positive for all configurations. If I understand correctly, this is expected as we now do computation during the initialization that was 
previously done during state evaluation.

As indicated by Malte's local experiments, the slight drop in coverage we observed in the 5 minute v3 experiments does not exist anymore. 
All configurations now have the same coverage for both versions, except seq-opt-bjolp-opt, which now solves one more task.

The scores seem to improve very slightly for the optimal case and to worsen similarly slightly for the satisficing case.

From my point of view these experiments are in line with our thoughts when merging and do not raise new concerns, therefore I will mark 
this issue as resolved for now. Feel free to reopen if I'm missing something.
msg10670 (view) Author: remo Date: 2022-03-11.17:41:36
We have merged this issue. We will post the results of the running experiments as 
soon as they become available, but expect similar results to the previous v3 
msg10669 (view) Author: malte Date: 2022-03-11.16:52:15
As discussed on Discords, I ran local experiments for the tasks where coverage changed in the v3 experiments to see if this can be explained by Grid noise.

Results (base vs. v3):


lama-first, caldera-sat18-adl, p13: 242.627s vs. 242.592s
lama-first, transport-sat11-strips, p19: 703.565s vs. 703.433s
lama-first-pref, tetris-sat14-strips, p024: 740.235s vs. 740.440s
lm-zg, assembly, prob29: 468.318s vs. 460.493s
lm-zg, depot, p19: 317.722s vs. 311.192s


lm-exhaust, blocks, 10-2: 280.007s vs. 281.045s
lm-exhaust, parcprinter-opt11, 11, 359.748s vs. 348.811s
lm-hm2, freecell, 6-1, 357.093s vs. 357.101s
lm-hm2, pipes-tankage, p08, 219.347s vs. 219.278s
lm-hm2, snake, p04, 397.086s vs. 396.19s

I didn't run the bjolp-opt task because of the LP solver dependency.

This looks all unproblematic to me. Times are usually very close, and where they differ, they tend to be better than before. There are two very very slightly slower runs and one slightly larger slowdown (280.007s to 281.045s) which I would not be concerned about.
msg10666 (view) Author: remo Date: 2022-03-11.13:16:21
The results for the experiments without the debug logs can be found here:

Even though we hoped to improve performance by making landmark dead-end detection more efficient, the 
results indicate that this is not the case. While the fluctuations are small, more point towards slightly 
worse performance with coverage dropping by one or two tasks for some configurations. Looking at these 
differences a little more closely, we concluded that most or all of them can be attributed to the short 
timeout of 5 minutes, leading to one configuration solving the task barely within the limit and the other 
failing. This could also explain the differences in score, because for the configurations where coverage is 
the same, scores are also very close.

In order to better see how significant these differences are, we started a (hopefully final) round of 
experiments with a 30 minute timeout.
msg10665 (view) Author: remo Date: 2022-03-11.10:00:45
Here are the results of the experiment I mentioned in msg10662:

Of particular interest are the three attributes related to dead-ends, namely *dead-end_empty_first_achievers*, 
*dead-end_empty_possible_achievers*, and *dead-end_empty_possible_achievers_finished*. The first two indicate 
the number of times the respective case was encountered, same as in the experiment discussed in msg10642. The 
numbers for empty first achievers are the same before and after our changes. At first glance, this does not seem 
to be the case for empty possible achievers, as the numbers change in all optimal and one satisficing 
configuration. This can be explained by the fact that *dead-end_empty_possible_achievers* counts occurrences in 
all, and thus also in failed tasks, where the counts are not deterministic due to time and memory constraints 
firing. Because of this we added the "finished" variant of the *dead-end_empty_possible_achievers* attribute, 
that only considers tasks where coverage is 1 and we again see that the numbers remain the same, as expected.

While functionality seems to be preserved, performance remains to be discussed. Due to the many over-length run 
log errors, these experiments do not reliably measure performance, but so far scores indicate that performance 
slightly improves for most configurations but also decreases in some cases. Especially lama first with preferred 
operators shows lower scores in most measures.

The upcoming experiment aims to be more informative about performance by comparing the two versions without the 
debug logs.
msg10662 (view) Author: remo Date: 2022-03-10.14:55:01
Simon and I created a pull request for the code changes we (Malte, Clemens, Simon, Remo) implemented yesterday:

In order to avoid looping over every landmark in every state, we ensured that the static information required by the function *dead_end_exists* is only computed 

The first case considers non-derived unreached landmarks without first achievers. This information is completely static and we therefore only compute it once, when 
initializing the LandmarkStatusManager. As the existence of such landmarks means that the task is unsolvable, we store the boolean variable *task_is_unsolvable* 
indicating whether this is the case.

The second case considers non-derived landmarks that are needed again but have no possible achievers. Here, the set of landmarks for which this check is relevant 
is static and we thus compute and store this set as *unachievable_landmark_ids* during initialization.

The function *dead_end_exists* is now still called for every state, but only has to first check the *task_is_unsolvable* flag and second loop over the landmarks in 
*unachievable_landmark_ids* and check whether they are needed again, which would imply a dead-end.

We set up an experiment that evaluates the effect of these changes and will link the results once available.
msg10642 (view) Author: remo Date: 2022-03-08.11:47:57
Clemens and I have run an experiment to investigate in which domains landmarks are detected as dead-ends by the 
function *dead_end_exists*. The results for satisficing and optimal configurations look as follows:

Despite the unexplained errors related to large run logs, the essential information is there. The attributes 
*dead-end_empty_first_achievers* and *dead-end_empty_possible_achievers* show the number of times the respective 
case is encountered.

The results are consistent across satisficing and optimal configurations. An empty set of first achievers in 
unreached landmarks leads to detected dead-ends only in the mystery domain. Empty possible achievers in needed-
again landmarks appear in the domains: airport, mystery, organic-synthesis, parcprinter, sokoban, woodworking, and 
additionally in the satisficing-exclusive schedule domain.

This information should come in handy as a reference when evaluating future solutions.
msg10639 (view) Author: clemens Date: 2022-03-07.16:59:38
The dead end detection in the LandmarkStatusManager class loops over every landmark in every state. This is unnecessarily inefficient, because we are only interested in landmarks that are not derived and have empty (first) achievers. The goal of this issue is to implement this test more efficiently, i.e., by looping only over a list of the landmarks that satisfy that criterion.
Date User Action Args
2022-03-21 11:55:43maltesetmessages: + msg10673
2022-03-16 21:45:20clemenssetmessages: + msg10672
2022-03-12 19:01:58remosetstatus: chatting -> resolved
messages: + msg10671
2022-03-11 17:41:36remosetmessages: + msg10670
2022-03-11 16:52:15maltesetmessages: + msg10669
2022-03-11 13:16:21remosetmessages: + msg10666
2022-03-11 10:00:45remosetmessages: + msg10665
2022-03-10 14:55:01remosetmessages: + msg10662
2022-03-08 11:47:57remosetstatus: unread -> chatting
nosy: + simon
messages: + msg10642
2022-03-07 16:59:38clemenscreate