Issue1036

Title Make landmark status manager deterministic and flawless
Priority bug Status in-progress
Superseder Nosy List clemens, jendrik, malte, salome, silvan, thomas
Assigned To clemens Keywords
Optional summary

Created on 2021-09-14.10:01:44 by clemens, last changed by clemens.

Messages
msg10451 (view) Author: clemens Date: 2021-10-25.11:10:08
The new experiments ran through, they contain both v1 (now from the correct branch) and v2 which contains the smaller changes discussed with Malte. At a first glance, the results don't look too different from what we've looked at before.
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue1036-v1-v2-optimal-compare.html
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue1036-v1-v2-satisficing-compare.html
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue1036-v1-v2-lama-anytime-compare.html

I didn't have time to look into the other things yet, so there's still a lot to do. I'll keep you updated.
msg10448 (view) Author: malte Date: 2021-10-12.11:20:52
Some other things we discussed that I think should be addressed before merging:

- The code doesn't currently explain what it implements. It should point to the workshop paper in the code comments, and perhaps also in the user documentation.

- The implementation differs from what is described in the workshop paper because it defines the L^- landmarks differently, based on a recomputation in every step rather than storing L^- information with the search nodes as described in the paper. It is not clear if it is semantically equivalent. If yes, we need to explain to ourselves why. If not, we should point this out. Either way, the discrepancy to the literature should be mentioned in the code.

- We need a change log entry. Like all change log entries, it should make clear what changes on the user side. Two important changes are that obedient-reasonable orders are no longer used (but still computed) and that reasonable orders can now be used in admissible heuristics (but we haven't evaluated this). Both of these are a bit hard to communicate. Why spend time computing obedient-reasonable orders if they are subsequently ignored? Why make reasonable orders usable for admissible heuristics but not try them? So I think both of these should be addressed before our next release (in December), though not necessarily in this issue.
msg10447 (view) Author: clemens Date: 2021-10-12.11:13:05
Malte and I had a look at the results and the code together. In summary, Malte didn't find anything too concerning in the experiments. Since this is fixing a bug, we agreed it should be fine to merge even if the experiment shows slightly worse performance in general. Some things we would still like to understand first and I will have a look at:
 - Why is seq-opt-bjolp consistently worse in the airport domain?
 - Why is lm_zg consistently worse in the openstacks-sat14-strips domain?

However, while going through the code diff, we discussed several smaller changes that I will implement and re-evaluate before I look deeper into these questions. One important thing we found out is that this is actually very closely related to issue202. This was not on our radar when Thomas and I wrote the HSDIP paper.
msg10438 (view) Author: clemens Date: 2021-09-14.11:39:37
I have created a pull request, would somebody be weilling to review?
https://github.com/aibasel/downward/pull/70

I have also conducted experiments to check whether this impacts the performance of the planner significantly:
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue383-v1-optimal-issue383-hsdip-base-issue383-hsdip-v1-compare.html
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue383-v1-lama-anytime-issue383-hsdip-base-issue383-hsdip-v1-compare.html
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue383-v1-lama-anytime-issue383-hsdip-base-issue383-hsdip-v1-compare.html

Unfortunately, the experiments were done on a different branch, however with the exact same code version. Should I still rerun them to have the data with the correct revision numbers and algorithm names?

I see some slightly concerning results in the experiments:
- coverage goes down by 2 for "lama" (lama-anytime) and "lama-first" (satisficing), and by 1 for "lama-first-pref" (satisficing)
- time scores go up by 1.5 for "seq-opt-bjolp" (optimal) and by 3 for "lm_zg" (satisficing)
- total costs increas for all LAMA configurations, which is unexpected since we observed a different effect in our HSDIP experiments (the code diff there was more complex, though, because we have fixed some other issues as well)

Other than that, I think the results look fine. I would be interested in other people's opinions: do you think these results are concerning, given we're fixing a bug here? Or could this already be good enough to merge?
msg10436 (view) Author: clemens Date: 2021-09-14.10:01:43
As discussed in issue383, *LandmarkStatusManager::update_accepted_landmarks(..)* is non-deterministic; the function *LandmarkStatusManager::landmark_is_leaf(..)* depends on the acceptance information about other landmarks, so the order in which this is updated may influence the result.

In our HSDIP 2021 paper (https://ai.dmi.unibas.ch/papers/buechner-keller-icaps2021whsdip.pdf), Thomas and I have worked out another issue with the landmark status manager (which we call landmark graph progression in the paper): in the presence of reasonable orderings, some landmarks might not be accepted even though they were made true for the last time in all (optimal) plans. Hence, the resulting heuristic becomes inadmissible and cannot be used for cost-optimal planning. (This is not really a problem in the code, because reasonable orderings are not permitted in combination with optimal planning.)

In this issue, I would like to change the status manager so that it is (i) deterministic and (ii) flawless. We can achieve this by implementing the landmark graph progression called ARO (admissible reasonable orderings) in our paper.
History
Date User Action Args
2021-10-25 11:10:09clemenssetmessages: + msg10451
2021-10-12 11:20:52maltesetmessages: + msg10448
2021-10-12 11:13:06clemenssetstatus: chatting -> in-progress
assignedto: clemens
messages: + msg10447
2021-09-14 11:39:37clemenssetstatus: unread -> chatting
messages: + msg10438
2021-09-14 10:01:44clemenscreate