Created on 2022-03-07.16:59:38 by clemens, last changed by malte.
msg10673 |
Author: malte |
Date: 2022-03-21.11:55:43 |
|
Sounds good!
|
msg10672 |
Author: clemens |
Date: 2022-03-16.21:45:20 |
|
Looks good to me; I see no reason to reopen this issue.
|
msg10671 |
Author: remo |
Date: 2022-03-12.19:01:58 |
|
The results for the v3 experiments with 30 minute timeout are in:
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v3-satisficing-30min-compare.html
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v3-optimal-30min-compare.html
In the optimal case there are more unexplained errors than in the previous v3 experiment, but they look reasonable to me, as they mostly seem
to consist of additional over-length run logs in the parcprinter domain. The satisficing experiment has fewer unexplained errors, which
seems to be due to fewer TRANSLATE_OUT_OF_TIME errors.
The newly added attribute *lmgraph_generation_time* shows that this step generally takes longer now, with the sum of differences being
positive for all configurations. If I understand correctly, this is expected as we now do computation during the initialization that was
previously done during state evaluation.
As indicated by Malte's local experiments, the slight drop in coverage we observed in the 5 minute v3 experiments does not exist anymore.
All configurations now have the same coverage for both versions, except seq-opt-bjolp-opt, which now solves one more task.
The scores seem to improve very slightly in the optimal case and to worsen by a similarly small margin in the satisficing case.
From my point of view these experiments are in line with our expectations when merging and do not raise new concerns, so I will mark
this issue as resolved for now. Feel free to reopen if I'm missing something.
|
msg10670 |
Author: remo |
Date: 2022-03-11.17:41:36 |
|
We have merged this issue. We will post the results of the running experiments as
soon as they become available, but expect similar results to the previous v3
experiments.
|
msg10669 |
Author: malte |
Date: 2022-03-11.16:52:15 |
|
As discussed on Discord, I ran local experiments for the tasks where coverage changed in the v3 experiments to see whether the differences can be explained by Grid noise.
Results (base vs. v3):
satisficing:
lama-first, caldera-sat18-adl, p13: 242.627s vs. 242.592s
lama-first, transport-sat11-strips, p19: 703.565s vs. 703.433s
lama-first-pref, tetris-sat14-strips, p024: 740.235s vs. 740.440s
lm-zg, assembly, prob29: 468.318s vs. 460.493s
lm-zg, depot, p19: 317.722s vs. 311.192s
optimal:
lm-exhaust, blocks, 10-2: 280.007s vs. 281.045s
lm-exhaust, parcprinter-opt11, 11: 359.748s vs. 348.811s
lm-hm2, freecell, 6-1: 357.093s vs. 357.101s
lm-hm2, pipes-tankage, p08: 219.347s vs. 219.278s
lm-hm2, snake, p04: 397.086s vs. 396.19s
I didn't run the bjolp-opt task because of the LP solver dependency.
This all looks unproblematic to me. Times are usually very close, and where they differ, they tend to be better than before. There are two very slightly slower runs and one slightly larger slowdown (280.007s to 281.045s), which I would not be concerned about.
|
msg10666 |
Author: remo |
Date: 2022-03-11.13:16:21 |
|
The results for the experiments without the debug logs can be found here:
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v3-satisficing-compare.html
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v3-optimal-compare.html
Even though we hoped to improve performance by making landmark dead-end detection more efficient, the
results indicate that this is not the case. While the fluctuations are small, more of them point towards slightly
worse performance, with coverage dropping by one or two tasks for some configurations. Looking at these
differences a little more closely, we concluded that most or all of them can be attributed to the short
timeout of 5 minutes, which leads to one version solving a task barely within the limit while the other
fails. This could also explain the differences in score, because for the configurations where coverage is
the same, scores are also very close.
In order to better see how significant these differences are, we started a (hopefully final) round of
experiments with a 30 minute timeout.
|
msg10665 |
Author: remo |
Date: 2022-03-11.10:00:45 |
|
Here are the results of the experiment I mentioned in msg10662:
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v2-satisficing-compare.html
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v2-optimal-compare.html
Of particular interest are the three attributes related to dead-ends, namely *dead-end_empty_first_achievers*,
*dead-end_empty_possible_achievers*, and *dead-end_empty_possible_achievers_finished*. The first two indicate
the number of times the respective case was encountered, same as in the experiment discussed in msg10642. The
numbers for empty first achievers are the same before and after our changes. At first glance, this does not seem
to be the case for empty possible achievers, as the numbers change in all optimal and one satisficing
configuration. This can be explained by the fact that *dead-end_empty_possible_achievers* counts occurrences in
all tasks, and thus also in failed ones, where the counts are not deterministic because the time and memory limits
kick in at varying points. Because of this we added the "finished" variant of the *dead-end_empty_possible_achievers*
attribute, which only considers tasks where coverage is 1; there we again see that the numbers remain the same, as expected.
While functionality seems to be preserved, performance remains to be discussed. Due to the many over-length run
log errors, these experiments do not reliably measure performance, but so far scores indicate that performance
slightly improves for most configurations but also decreases in some cases. In particular, lama-first with preferred
operators shows lower scores in most measures.
The upcoming experiment aims to be more informative about performance by comparing the two versions without the
debug logs.
|
msg10662 |
Author: remo |
Date: 2022-03-10.14:55:01 |
|
Simon and I created a pull request for the code changes we (Malte, Clemens, Simon, Remo) implemented yesterday: https://github.com/aibasel/downward/pull/98
In order to avoid looping over every landmark in every state, we ensured that the static information required by the function *dead_end_exists* is only computed
once.
The first case considers non-derived unreached landmarks without first achievers. This information is completely static and we therefore only compute it once, when
initializing the LandmarkStatusManager. As the existence of such landmarks means that the task is unsolvable, we store the boolean variable *task_is_unsolvable*
indicating whether this is the case.
The second case considers non-derived landmarks that are needed again but have no possible achievers. Here, the set of landmarks for which this check is relevant
is static and we thus compute and store this set as *unachievable_landmark_ids* during initialization.
The function *dead_end_exists* is still called for every state, but now only has to check the *task_is_unsolvable* flag and then loop over the landmarks in
*unachievable_landmark_ids*, checking whether any of them is needed again, which would imply a dead-end.
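To make the described structure concrete, here is a minimal sketch of the idea. The types and names below (Landmark, StatusManager, needed_again) are simplified placeholders, not the actual Fast Downward classes; the real implementation is in the pull request linked above.

    #include <vector>

    // Simplified stand-in for a landmark-graph node.
    struct Landmark {
        bool is_derived;
        std::vector<int> first_achievers;
        std::vector<int> possible_achievers;
    };

    class StatusManager {
        // Case 1, computed once during initialization: a non-derived landmark
        // without first achievers can never be made true, so the task is
        // unsolvable. (The sketch omits the "unreached" qualification; the
        // real code also takes the landmark status in the initial state into
        // account.)
        bool task_is_unsolvable = false;
        // Case 2, computed once during initialization: non-derived landmarks
        // without possible achievers; if any of them is needed again in some
        // state, that state is a dead-end.
        std::vector<int> unachievable_landmark_ids;

    public:
        explicit StatusManager(const std::vector<Landmark> &landmarks) {
            for (size_t id = 0; id < landmarks.size(); ++id) {
                const Landmark &lm = landmarks[id];
                if (lm.is_derived)
                    continue;
                if (lm.first_achievers.empty())
                    task_is_unsolvable = true;
                if (lm.possible_achievers.empty())
                    unachievable_landmark_ids.push_back(static_cast<int>(id));
            }
        }

        // Still called for every state, but no longer loops over all
        // landmarks: it checks the precomputed flag and the (usually very
        // small) candidate list.
        bool dead_end_exists(const std::vector<bool> &needed_again) const {
            if (task_is_unsolvable)
                return true;
            for (int id : unachievable_landmark_ids)
                if (needed_again[id])
                    return true;
            return false;
        }
    };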
We set up an experiment that evaluates the effect of these changes and will link the results once available.
|
msg10642 |
Author: remo |
Date: 2022-03-08.11:47:57 |
|
Clemens and I have run an experiment to investigate in which domains landmarks are detected as dead-ends by the
function *dead_end_exists*. The results for satisficing and optimal configurations look as follows:
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v1-satisficing-report.html
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1049-v1-optimal-report.html
Despite the unexplained errors related to large run logs, the essential information is there. The attributes
*dead-end_empty_first_achievers* and *dead-end_empty_possible_achievers* show the number of times the respective
case is encountered.
The results are consistent across satisficing and optimal configurations. An empty set of first achievers in
unreached landmarks leads to detected dead-ends only in the mystery domain. Empty possible achievers in needed-
again landmarks appear in the domains: airport, mystery, organic-synthesis, parcprinter, sokoban, woodworking, and
additionally in the satisficing-exclusive schedule domain.
This information should come in handy as a reference when evaluating future solutions.
|
msg10639 |
Author: clemens |
Date: 2022-03-07.16:59:38 |
|
The dead-end detection in the LandmarkStatusManager class loops over every landmark in every state. This is unnecessarily inefficient, because we are only interested in landmarks that are not derived and have empty (first) achievers. The goal of this issue is to implement this test more efficiently, i.e., by looping only over a list of the landmarks that satisfy that criterion.
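For illustration, here is a rough sketch of the per-state check described above, using simplified placeholder types rather than the actual LandmarkStatusManager interface (a sketch of the precomputed variant appears under msg10662 above). The point is that the loop visits every landmark in every state, although only non-derived landmarks with empty (first) achievers can ever trigger the test.

    #include <vector>

    // Simplified stand-in for a landmark-graph node.
    struct Landmark {
        bool is_derived;
        std::vector<int> first_achievers;
        std::vector<int> possible_achievers;
    };

    // Current behaviour as described above: called for every evaluated state
    // and iterating over every landmark, even though only a handful of
    // landmarks are relevant for either of the two dead-end conditions.
    bool dead_end_exists_naive(const std::vector<Landmark> &landmarks,
                               const std::vector<bool> &unreached,
                               const std::vector<bool> &needed_again) {
        for (size_t id = 0; id < landmarks.size(); ++id) {
            const Landmark &lm = landmarks[id];
            if (lm.is_derived)
                continue;
            if (unreached[id] && lm.first_achievers.empty())
                return true;
            if (needed_again[id] && lm.possible_achievers.empty())
                return true;
        }
        return false;
    }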
|
|
Date | User | Action | Args |
2022-03-21 11:55:43 | malte | set | messages:
+ msg10673 |
2022-03-16 21:45:20 | clemens | set | messages:
+ msg10672 |
2022-03-12 19:01:58 | remo | set | status: chatting -> resolved messages:
+ msg10671 |
2022-03-11 17:41:36 | remo | set | messages:
+ msg10670 |
2022-03-11 16:52:15 | malte | set | messages:
+ msg10669 |
2022-03-11 13:16:21 | remo | set | messages:
+ msg10666 |
2022-03-11 10:00:45 | remo | set | messages:
+ msg10665 |
2022-03-10 14:55:01 | remo | set | messages:
+ msg10662 |
2022-03-08 11:47:57 | remo | set | status: unread -> chatting nosy:
+ simon messages:
+ msg10642 |
2022-03-07 16:59:38 | clemens | create | |
|