Issue1075

Title Make Preferred Operators in LM-count less convoluted.
Priority wish Status resolved
Superseder Nosy List clemens, jendrik, malte, remo, salome, silvan, simon
Assigned To Keywords
Optional summary
Part of issue987.

Created on 2023-01-20.11:17:41 by clemens, last changed by clemens.

Summary
Part of issue987.
Files
File name Uploaded Type Edit Remove
parents-past-effect.png clemens, 2023-09-13.10:58:24 image/png
plot-costs.png clemens, 2023-01-31.17:49:25 image/png
Messages
msg11376 (view) Author: clemens Date: 2023-09-13.15:58:58
The changes are merged. Thanks everyone!
msg11375 (view) Author: malte Date: 2023-09-13.14:11:55
Yes, that was the idea.
msg11374 (view) Author: clemens Date: 2023-09-13.13:44:46
Thanks for the reviews Malte and Salomé!

> I assume this is not the code that was actually run in the experiments because it lacks the options that we explored, but in my view this is fine if you're confident that the code is what we want.

Yes, we implemented the things as options for the experiment but they were never intended to stay that way. The new implementation corresponds to the option "future" from the experiments. I understood our discussion from yesterday that this is what we want: A simple and clear definition that works well in practice.
msg11373 (view) Author: malte Date: 2023-09-13.12:29:20
I had a look at the pull request. I assume this is not the code that was actually run in the experiments because it lacks the options that we explored, but in my view this is fine if you're confident that the code is what we want.

I left a question regarding conjunctive landmarks on github.
msg11372 (view) Author: malte Date: 2023-09-13.12:22:38
It's interesting to see that the comparison to LAMA shows that using landmark preferred operators increases expansion/evaluation/generation scores, which means that the preferred operators do improve the guidance of the search. The reason why landmark preferred operators should still be disabled for best performance is therefore likely the computational overhead.

This could change once we actually make an effort to make the landmark heuristic computationally efficient, so that with a more efficient implementation, the best configuration might indeed be LAMA with "legacy" preferred operators. But in my view that's not a good enough reason to keep the legacy preferred operators. Having a clean definition of preferred operators is too attractive not to make this change.
msg11371 (view) Author: clemens Date: 2023-09-13.11:25:45
I created a pull request: https://github.com/aibasel/downward/pull/180
msg11370 (view) Author: clemens Date: 2023-09-13.10:58:24
We have discussed thoroughly offline the potential causes of the effects we see in the experiments. It always comes down to the fact that satisficing planner behaviour is very hard to predict, especially in cases that feature as many individual (and mostly independent) components as LAMA. Since the changes we want to make in this issue only really affect the landmark-part of LAMA, we changed the experiment setting to only consider the landmark heuristic and removed FF from the equation. And with this, we actually observed what we had anticipated in the beginning.

https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1075/data/issue1075-v4-without-ff-eval/issue1075-v4-without-ff.html

Considering all future landmarks as interesting increases coverage by 30 or more tasks. Considering only those where also all parents are past doesn't make a big difference, though. Only in the Schedule domain, it seems to have a visible effect (see the attached plot).

Looking at the scores, the effect is even stronger. From legacy to future, the "generated" score goes up by ~100 and the score for expansions and evaluations by ~150. Going to parents-past improves those further by ~5.

This experiment also confirms what we have already seen before regarding preferring simple landmarks: Ignoring the distinction between simple and disjunctive landmarks yields slighlty better results, improving coverage by 3-10 tasks in each individual interesting-if methods.

So based on this experiment, Malte, Salomé and I agreed that it is time to replace the "legacy" definition of when a landmark is interesting. We prefer the "future" definition, as it is sleak and simple, and also cheaper to compute. The effect of "parents-past" is marginal and mostly explained by a single domain.


One more thing we discussed is the negative effect the change has on the LAMA-like configurations from the previous experiments. We decided they are not important because the LAMA alias we recommend is not affected by the changes since it does not compute preferred operators anyways (option pref=false). Not using preferred operators denotes a difference to the original LAMA planner, though, which was introduced at some point because not using them apparently improved performance. We verified this by running another experiment where we included lama-first and all of our different interesting-if definitions (simple landmarks not preferred). 

https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1075/data/issue1075-v4-vs-lama-eval/issue1075-v4-vs-lama.html

Indeed, not using preferred operators yields higher coverage than with the "legacy" variant, which in turn has higher coverage than "future" and "parents-past". So this encourages our decision, since the LAMA aliases can remain independent of the definition when a landmark is interesting. Moreover, when we only look at landmark heuristics, the changes always improve planner performance in terms of coverage and running time significantly.


So the next step would be to turn the choice of which method to use for defining interesting landmarks into a fixed method. Then I'll ask around for a quick code review and then we're hopefully soon ready to merge and close this issue.
msg11366 (view) Author: malte Date: 2023-09-11.13:54:40
It's probably a good idea to look at the individual domains, not just the totals, in cases like these.
msg11365 (view) Author: clemens Date: 2023-09-11.13:26:43
Now that issue1036 was merged, we decided to reopen the investigation regarding definitions of preferred operators. As a first step, we just ran the same experiment again. Unfortunately, the image did not become any clearer...
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1075/data/issue1075-v4-first-eval/issue1075-v4-first.html
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1075/data/issue1075-v4-anytime-eval/issue1075-v4-anytime.html

Plan quality (parcprinter filtered out):

    | LAMA-anytime | LAMA-first | hm1-first
----|--------------|------------|-----------
LSt |       650664 |     404863 |    380056
LSf |       651133 |     404863 |      "
FSt |       653450 |     405182 |    380487
FSf |       656632 |     405401 |      "
PSt |       657805 |     405199 |    380487
PSf |       656579 |     405418 |      "

The ranking by lowest plan costs are:
LSt, LSf, FSt, PSf, FSf, PSt (LAMA-anytime)
LSt, LSf, FSt, PSt, FSf, PSf (LAMA-first)
L, F/P (hm1)
This time around, the Legacy definition of interesting landmarks seems to be dominating the plan quality metric. Also, simple=true seems to be preferrable most of the time (LAMA-anytime with P is the only exception).

Coverage:
    | LAMA-anytime | LAMA-first | hm1-first
----|--------------|------------|-----------
LSt |         2223 |       2223 |      1520
LSf |         2225 |       2226 |       "
FSt |         2212 |       2214 |      1528
FSf |         2216 |       2216 |       "
PSt |         2212 |       2213 |      1528
PSf |         2216 |       2217 |       "
Here we observe that simple=false consistently improves upon coverage by 2-4 tasks. Moving away from Legacy, however, leads to lower coverage with both LAMA variants but increases coverage in the case of hm1.

At this point, we don't have a good explanation why we see what we see. I will meet with Salomé to discuss how we could investigat this further.
msg10955 (view) Author: clemens Date: 2023-01-31.17:49:25
We are exploring some options as part of the ongoing sprint. Since the last message, we've had many offline discussions, we ran some experiments, and we've had more discussions. I'll try to summarize the current standings here.

We have implemented three options to decide whether a landmark is interesting or not. We call them "legacy" (the implementation on the current main branch), "future" (Option 1 below; all future landmarks are interesting), and "parents  reached" (Option 2 below; all future landmarks for which all predecessors in the landmark graph are reached).

We have also identified another ambiguity that we wanted to try turned on and off. Namely, the implementation on the main branch chooses to prefer only operators that achieve simple landmarks (i.e., those that say a particular fact has to be true on its own at some point), if such a landmark is marked interesting. Otherwise, it also prefers those operators that reach a disjunctive landmarks. (Conjunctive landmarks are never interesting, see issue1072.) We implemented that this feature can be turned on and off, i.e., one can choose to use this rule to prefer simple landmarks over disjuncitve ones ("simple=true") or to consider them equally ("simple=false").

In our experiment, we checked the whole cross product of these two options. We call them:
- LSt (legacy-simple=true, current implementation on the main branch)
- LSf (legacy-simple=false)
- FSt (future-simple=true)
- FSf (future-simple=false)
- PSt (parents_reached-simple=true)
- PSf (parents_reached-simple=false)
We write L, F, and P to refer to the method ignoring the second option.

Here are the full reports:
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1075/data/issue1075-v2-first-eval/issue1075-v2-first.html
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1075/data/issue1075-v2-anytime-eval/issue1075-v2-anytime.html

We see no clear picture in this data. Neither of the options yield a clear winner in either the anytime configurations nor the configurations that take the first solution they find.

Plan quality:
Since parcprinter dominates the report due to its large action costs, we filter it out for this summary. (Note that we cannot really compare between anytime and first configurations because the numbers stem from different reports. In particular, the reports only sum/average over instances solved by all configurations in that reports, so the sum of costs is lower for first configurations because hm1 solves significantly fewer problems.)

    | LAMA-anytime | LAMA-first | hm1-first
----|--------------|------------|-----------
LSt |       654177 |     398428 |    374652
LSf |       655342 |     398492 |      "
FSt |       657030 |     398995 |    375047
FSf |       661213 |     399356 |      "
PSt |       654849 |     397429 |    375285
PSf |       657484 |     399501 |      "
(Since hm1-landmarks are never disjunctive, *St and *Sf are identical.)

The orders bye lowest plan costs are:
LSt, PSt, LSf, FSt, FSf, PSf (LAMA-anytime)
PSt, LSt, LSf, FSt, FSf, PSf (LAMA-first)
L, F, P (hm1)
It looks like there is some preference for the legacy method and for preferring only operators that achieve simple landmarks if there are any. I also add a scatter plot of the costs that has LSt on the x-axis and values for all others on the y-axis.

Coverage:
    | LAMA-anytime | LAMA-first | hm1-first
----|--------------|------------|-----------
LSt |         2367 |       2367 |      1523
LSf |         2367 |       2369 |       "
FSt |         2356 |       2358 |      1531
FSf |         2355 |       2360 |       "
PSt |         2361 |       2362 |      1519
PSf |         2362 |       2369 |       "

There seems to be a slight gain in *Sf compared to *St. With hm1 there's an improvement in F* over L*, but the opposite is the case in both LAMA configurations. hm1 solves 10 tasks more in the schedule domain and 5 more in floortile, but also looses multiple tasks in several other domains.

The image looks similar when looking at time scores and other metrics. There are tasks which were previously solved within less than a second and now they time out. This holds also when comparing the corresponding configurations locally. We did not look much closer in such cases because we have no clue about what to look at...

This data suggests that the performance suffers from simplifying the definition of preferred operators rather than that it benefits. We think this could be related to the flawed progression that should be fixed in a different issue: The progression can currently lead to landmarks marked future even though they are not. If this happens, it is because thay have unreached parents that were actually already reached. Consider the following example landmark graph:

A --gn--> B <--r-- C

There are three simple landmarks caring about the propositions A, B, and C, respectively. The orderings tell us A must hold in the state before B holds for the first time, and achieving C requires to make B false but B must hold at some point after C holds for the first time.

Now assume the following state sequence ("plan") where states are represented by some id followed by the propositions that hold in that state in parentheses:

s0(), s1(A), s2(B), s3(C), s4(B)  

The sequence satisfies all orderings: A holds right before B holds for the first time and B holds after C holds for the first time.

Next, let us explain what happens with the different methods for deciding which landmarks are interesting. We denote by "past" and "fut" sets of landmarks that correspond to how LAMA progresses (incorrectly) which landmarks are past and future in each state.

                      ||      i n t e r e s t i n g
state | past  | fut   || legacy | future | parents reached
------|-------|-------||--------|--------|-----------------
s0()  |       | A,B,C || A,  C  | A,B,C  | A,  C
s1(A) | A     |   B,C ||     C  |   B,C  |     C
s2(B) | A     | A,B,C ||     C  | A,B,C  | A,  C
s3(C) | A,  C | A,B   ||   B    | A,B    | A,B
s4(B) | A,  C | A,B   ||   B    | A,B    | A,B

We can see that the sets of interesting landmarks are larger for "future" and "parents reached". More specifically, starting from s2 they contain A even though A does not have to be reached anymore after s1. This is of course due to the faulty progression function used by LAMA (because A is still considered future) but the legacy preferred operators ignore it anyways because it was reached before (i.e., is past). In other words, it compensates for the sub-optimal definition of LAMA progression with a (in our opinion) sub-optimal definition of interesting landmarks.

Of course this example is artificial and we have no clue whether such cases exist in the IPC benchmars. This just illustrates that it is possible to obtain strange interactions with the progression where the legacy implementation could actually benefit from its more restrictive and counterintuitive definition.

In conclusion, we think it probably makes sense to defer looking deeper into this issue until the progression stuff is integrated.
msg10944 (view) Author: clemens Date: 2023-01-26.09:53:09
Salomé, Simon, and I had a first discussion about this. We decided to start by replacing the current definition of interesting landmarks with two new alternatives and try both of them experimentally. To explain these two options, let me first introduce new terminology (stemming from our yet unpublished paper about landmark progression). Specifically, rather than talking about reached, unreached, and needed-again landmarks, I will henceforth distinguish between "past" and "future" landmarks. Past landmarks are those that have been reached, including those that we found to be needed again. Future landmarks are those that are either unreached or needed again. It is hence possible for a landmark to be both past and future. (Note that due to the flawed progression executed by LAMA, it is actually not true that all landmarks that are considered future are actually future by the definition in our paper, but this is a different story... They are partially touched by issue383 and issue1036.)

Option 1: All future landmarks are interesting. 
Option 2: All future landmarks with only past parent landmarks in the landmark graph are interesting.

We think option 1 is a good baseline and option 2 goes in the direction of the original definition but is aware of needed-again landmarks all the time. Experiments are running, we'll keep you update.
msg10934 (view) Author: clemens Date: 2023-01-20.11:17:41
The landmark count heuristic prefers operators that reach an "interesting landmark". The definition of when a landmark is interesting, seems somewhat arbitrary though: A landmark L is interesing if
(1) all landmarks were reached in the past, L does not hold in the current state but must hold in the goal, or
(2) if L has not been reached but all its parents in the landmark graph are reached.
Furthermore, only disjunctive and simple landmark can be interesting, but not conjunctive ones (see also issue1072).

With criterion (1) the implementation touches on the fact that some landmarks become "needed again" when the search progresses. This can also be the case for other reasons than L being required in the goal, but this is completely ignored as long as not all landmarks have been reached. 
When working on issue1070 we also found that a bug that changed the definition of interesting landmarks rather randomly lead to improved coverage (see msg10916). We would like to investigate this strange behaviour because we think it hints to potential for improvement. Thinking about clearer, less convoluted ways to define when a landmark is interesting might help.
History
Date User Action Args
2023-09-13 15:58:58clemenssetstatus: in-progress -> resolved
messages: + msg11376
2023-09-13 14:11:55maltesetmessages: + msg11375
2023-09-13 13:44:46clemenssetmessages: + msg11374
2023-09-13 12:29:20maltesetmessages: + msg11373
2023-09-13 12:22:38maltesetmessages: + msg11372
2023-09-13 11:25:45clemenssetmessages: + msg11371
2023-09-13 10:58:24clemenssetstatus: chatting -> in-progress
files: + parents-past-effect.png
messages: + msg11370
2023-09-11 13:54:40maltesetmessages: + msg11366
2023-09-11 13:26:44clemenssetmessages: + msg11365
2023-01-31 17:49:40clemenssetnosy: + remo
2023-01-31 17:49:25clemenssetfiles: + plot-costs.png
messages: + msg10955
2023-01-26 09:53:09clemenssetnosy: + simon
messages: + msg10944
2023-01-20 11:17:41clemenscreate