Issue1071

Title: Have two separate heuristics for admissible and inadmissible LM-count
Priority: wish
Status: resolved
Superseder:
Nosy List: clemens, jendrik, malte, remo, salome, silvan
Assigned To:
Keywords:
Optional summary: This is part of issue987.

Created on 2022-12-14.17:04:50 by clemens, last changed by clemens.

Summary
This is part of issue987.
Messages
msg11028 Author: clemens Date: 2023-02-16.10:56:37
Done and merged. Thanks everyone!
msg11015 Author: clemens Date: 2023-02-14.13:54:26
I've had a thorough live code-review session with Malte, and we identified some things to polish in the code. We also identified an issue that leads to significantly increased landmark generation times in some domains (particularly agricola): the code checks whether the task has conditional effects, which was not done in the admissible=false case of the base version and is not necessary in that case anyway. Independent of this, we decided to restructure the affected part of the code by introducing two-stage initialization (sketched below) to disentangle the constructor a bit. In the meantime, I ran two more sets of experiments, both of which do not look as good as the last version. Here are the reports for the latest one, i.e., v4 (v3 looked very similar):
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1071/data/issue1071-v4-satisficing-eval/issue1071-v4-satisficing-compare.html
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1071/data/issue1071-v4-optimal-eval/issue1071-v4-optimal-compare.html
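As an aside, the two-stage initialization mentioned above follows a pattern roughly like the following sketch. This is not the actual code; the class and member names are illustrative stand-ins.

    // Minimal sketch of two-stage initialization (illustrative names).
    class Options {};

    class LandmarkHeuristic {
    protected:
        // Stage 1: the constructor only stores options and cheap state;
        // nothing expensive or task-dependent happens here.
        explicit LandmarkHeuristic(const Options &opts) {}

        // Stage 2: called from the derived-class constructors once they
        // know their own requirements, e.g., whether the (potentially
        // expensive) check for conditional effects is needed at all.
        void initialize(const Options &opts) {
            // ... build the landmark graph, set up status managers, etc.
        }
    };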

In another session, we looked at these results together and also at the code diff since v2. We did not find anything suspicious there, so unfortunately we have no good explanation for the decline in, for example, the time scores. Our best guess is that it has to do with compilation, especially because we could not reproduce the slow-down for any instance tested locally. This brings us to the suggestion to (once more) live with the fact that strange things happen in the landmark heuristics and that performance is somewhat out of our control. While it is frustrating to continuously observe these declines in performance, the changes all serve to improve code quality, so that we get a better understanding of what is actually going on and how we could implement it more efficiently (potentially from scratch) with optimized data structures.

I will implement some last, very minor changes from our final glance at the code and will then proceed to merge, unless somebody objects by the time I'm ready. I still need to remove the experiments from the repository and write a proper commit message after all these changes.

This reminds me: one more thing we did since my last message was renaming the heuristics. The trend in recent years has been to give heuristics and other features descriptive, unabbreviated names. Additionally, we decided that "counting" is not really what the inadmissible landmark heuristic does since action costs were introduced; rather, it sums up the minimal costs to achieve all remaining landmarks. Therefore, it will be renamed to the "landmark_sum" heuristic, while the admissible one will be called "landmark_cost_partitioning".
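For concreteness, the value computed by the renamed landmark_sum heuristic for a state s is roughly the following (a sketch that glosses over details such as which achievers are counted for a landmark):

    h_sum(s) = \sum_{L \in unreached(s)} \min_{a \in achievers(L)} cost(a)

With unit costs this reduces to counting the unreached landmarks, which is where the old name came from.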
msg10982 Author: clemens Date: 2023-02-06.11:41:02
After the second batch of experiments, the numbers look a lot better. You can find the reports under the following links:
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1071/data/issue1071-v2-satisficing-eval/issue1071-v2-satisficing-compare.html
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1071/data/issue1071-v2-optimal-eval/issue1071-v2-optimal-compare.html

We observe an increase in memory requirements that is mostly minor but very consistent. It seems to be almost constant when comparing different tasks. One hypothesis we came up with is the increased size of the binary, because there are two more classes compared to before. Looking at the search-out-of-memory errors, I assume this is no big deal; in some configurations the difference to before is positive, in others negative.

In the satisficing experiment, the total time scores of lama-first and lama-first-pref increase by 3.72 and 2.42, respectively. Even better, the search time scores improve by 4.81 and 3.53, respectively. Consequently, the improvement happens during search, which means the preprocessing seems to slow down slightly. This can also be seen in the landmark generation time scores, which generally go down in sum, but I attribute this to noise because we did not really touch this part of the code, and it also seems inconsistent when, for example, comparing lama-first against lama-first-pref, which should be identical in this regard.

The lm-zg satisficing configuration slows down slightly according to the scores. In the optimal configurations, we again observe improved time scores, but not by as much as with the lama-first configs mentioned above.

Other than that, I find no eye-catching anomalies in the experiment data and think this is almost ready to merge. There are some more minor things to do beforehand:

1. Resolve the TODO in the code about the documentation of preferred operators: should there be a reference to the paper? That would lead to duplication with the Wiki entry for LM-count.

2. Decide how to deal with ambiguity between *compute_heuristic* and *get_heuristic_value* discussed in the last message and on the pull request.

3. Delete experiment scripts from the repository.

4. Write changelog entry (i.e., draft commit message).
msg10968 Author: clemens Date: 2023-02-03.14:44:29
We are almost done with this issue -- sorry I forgot to keep the issue tracker up to date.

The way we implemented this consists of a base class *LandmarkHeuristic* that takes care of all parts required by both the admissible and inadmissible LM-count. In particular, the common parts are
- computing and storing the landmark graph,
- path-dependent update of reached landmarks, and
- computing preferred operators.
Note that the common way of using preferred operators (an additional priority queue containing only states reached via preferred operators) makes the resulting search non-optimal even with an admissible heuristic. We nevertheless allow the admissible LM-count heuristic to compute preferred operators, since there is no good reason to use this heuristic for sub-optimal planning, which is the only setting where preferred operators could be beneficial.

The base class also declares an abstract function whose implementations in the derived classes compute the heuristic value.

Then, there are two classes that inherit from *LandmarkHeuristic*, namely *LandmarkCountHeuristic* (inadmissible) and *LandmarkCostPartitioningHeuristic* (admissible). LM-count still uses the old plugin name "lmcount", and the cost partitioning heuristic can be called under the name "lmcp" from the command line. The option "admissible=(true|false)" that existed previously is gone, as this distinction is now made through the choice of "lmcount" or "lmcp". The documentation and the remaining options are split based on which of the two heuristics they belong to; common documentation is in the base class.
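For illustration, the hierarchy looks roughly like the following sketch. The signatures are simplified and the types above the hierarchy are stubs standing in for the real Fast Downward classes, which carry more parameters and infrastructure.

    #include <memory>

    // Stubs standing in for the real Fast Downward types (illustrative).
    class State {};
    class LandmarkGraph {};
    class Heuristic {
    public:
        virtual ~Heuristic() = default;
    protected:
        virtual int compute_heuristic(const State &state) = 0;
    };

    class LandmarkHeuristic : public Heuristic {
    protected:
        // Shared parts: the landmark graph, the path-dependent status of
        // reached landmarks, preferred-operator computation.
        std::shared_ptr<LandmarkGraph> lm_graph;

        // Implemented by the derived classes to compute the actual value.
        virtual int get_heuristic_value(const State &state) = 0;

        int compute_heuristic(const State &state) override {
            // Common bookkeeping (progress reached landmarks, generate
            // preferred operators) would happen here, then the value
            // computation is delegated:
            return get_heuristic_value(state);
        }
    };

    // Inadmissible; plugin name "lmcount".
    class LandmarkCountHeuristic : public LandmarkHeuristic {
        int get_heuristic_value(const State &) override { return 0; }  // stub
    };

    // Admissible cost partitioning over landmarks; plugin name "lmcp".
    class LandmarkCostPartitioningHeuristic : public LandmarkHeuristic {
        int get_heuristic_value(const State &) override { return 0; }  // stub
    };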

We ran an experiment to check whether this affects performance. Apparently it does, as almost all numbers in the summary are red:
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1071/data/issue1071-v1-satisficing-eval/issue1071-v1-satisficing-compare.html
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1071/data/issue1071-v1-optimal-eval/issue1071-v1-optimal-compare.html

The largest decrease in the scores concerns time (search time dominates total time). One possible explanation is the additional indirection introduced by the hierarchy: computing the heuristic value (or rather the result) now goes through 4 layers instead of 3 (Evaluator->Heuristic->LandmarkHeuristic->Landmark(Count|CostPartitioning)Heuristic instead of Evaluator->Heuristic->LandmarkCountHeuristic). I'm not sure this suffices as an explanation, but if it does, we could of course test it by removing the common base class and temporarily reintroducing duplicated code. The weird thing is that I cannot reproduce the slow-down locally. I tested several tasks for which the base version required more than 1 minute and v1 required 30-40% more time in the experiments. (I didn't do profiling, only ran the problems multiple times and looked at the time required to solve them locally.)

Since Remo and Salomé both did a code review since I prepared the experiment, which I incorporated today and yesterday, multiple things changed slightly again and I am running a second experiment before testing more locally; let's hope with these changes the difference is not as big anymore.

One thing I haven't incorporated yet is the ambiguity pointed out by Remo: all heuristics have a function *compute_heuristic*. The class *LandmarkHeuristic* implements this function (because lmcount and lmcp share some parts of it) and delegates the actual computation of the heuristic value to its derived classes via the abstract function *get_heuristic_value* (cf. the sketch above). It is not very clear why there are two functions referring to computing the heuristic value within the same heuristic implementation, so we might want to rename *get_heuristic_value* to something more descriptive or discuss other options to get rid of this ambiguity. I haven't had any good ideas so far; does somebody else have a suggestion?

As mentioned, further experiments are running and I will write again once they are done.
msg10954 Author: remo Date: 2023-01-30.18:56:00
I added some very minor comments to the pull request (https://github.com/aibasel/downward/pull/143). It would definitely be good to get a more qualified review before merging.
msg10879 Author: clemens Date: 2022-12-14.17:04:50
The class LandmarkCountHeuristic implements two more or less independent heuristics: 
(1) the inadmissible landmark counting heuristic as used for example in LAMA, and
(2) the admissible cost-partitioning heuristic over landmarks.
In the endeavor of cleaning up the landmark code, we would like to split these into two separate heuristic implementations. I assume this change mainly requires splitting the different parts into two separate files/classes. However, some parts are probably used by both, so we might want to move them into a third file (or maybe to utils). Maybe implementing an abstract base class "LandmarkHeuristic" could be a reasonable solution.
History
Date                 User     Action  Args
2023-02-16 10:56:37  clemens  set     status: in-progress -> resolved; messages: + msg11028
2023-02-14 13:54:26  clemens  set     status: chatting -> in-progress; messages: + msg11015
2023-02-06 11:41:02  clemens  set     messages: + msg10982
2023-02-03 14:44:29  clemens  set     nosy: + salome; messages: + msg10968
2023-01-30 18:56:00  remo     set     nosy: + remo; messages: + msg10954
2022-12-14 19:25:45  silvan   set     nosy: + silvan
2022-12-14 17:29:36  malte    set     nosy: + malte, jendrik
2022-12-14 17:04:50  clemens  create