Issue1009

Title: Compute landmark costs in all landmark factories
Priority: bug
Status: resolved
Superseder:
Nosy List: clemens, jendrik, malte, remo, salome, silvan, thomas
Assigned To:
Keywords:
Summary: This is part of meta issue987.

Created on 2021-02-11.13:26:16 by salome, last changed by remo.

Messages
msg10798 Author: remo Date: 2022-07-19.15:12:25
Clemens reviewed the code, leading to some minor changes and fixes.

As a result of a discussion between Malte and Salomé, we decided to handle the
cost of derived landmarks in a way that does not require an additional option.
The idea is to over-approximate the set of achievers of derived landmarks by
considering all operators, consequently setting the cost of such landmarks to
the minimal cost across all operators. This means that derived landmarks are
assigned a cost of one when using a task transformation to unit cost.
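
A minimal sketch of this rule, assuming Fast Downward-style proxy classes (the function name and the Landmark members used here are illustrative, not the merged code):

    #include <algorithm>
    #include <limits>

    // Derived landmarks have no reliable achiever set, so we
    // over-approximate it with the set of all operators.
    int compute_landmark_cost(const Landmark &landmark,
                              const TaskProxy &task_proxy) {
        int min_cost = std::numeric_limits<int>::max();
        if (landmark.is_derived) {
            // Over-approximation: treat every operator as an achiever.
            for (OperatorProxy op : task_proxy.get_operators())
                min_cost = std::min(min_cost, op.get_cost());
        } else {
            OperatorsProxy operators = task_proxy.get_operators();
            for (int op_id : landmark.possible_achievers)
                min_cost = std::min(min_cost, operators[op_id].get_cost());
        }
        return min_cost;
    }

Under a unit-cost transform every operator has cost 1, so derived landmarks indeed end up with cost 1, as described above.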

With these final changes in place we merged the branch, and I am marking the issue as resolved.
msg10790 Author: salome Date: 2022-07-18.17:20:53
Since the coverage still dropped significantly for domains with derived predicates, we added an option which allows us to count derived landmarks as well. The results can be seen here:
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1009-v2-satisficing-issue1009-base-issue1009-v3-compare.html

For the non-lama configurations the coverage stays the same except for hm, where we lose three tasks. I would count this as v3 being unlucky, since base took more than 1600 seconds to solve each of the three lost tasks.
For the lama configurations, we gain between 5 and 14 tasks, which is most likely due to the fact that now *all* derived landmarks are counted, not just the ones true in the initial state. I am a bit surprised that this effect is much stronger for lama-second and don't really have a good explanation for it.
Even though we gain tasks, lama-first has slightly worse search-time scores. I'm guessing this has to do with the fact that the number of expansions can also increase significantly; for example, in the psr-large task p42-s203-n60-l3-f30 we almost double the number of expansions.

We also ran some optimal configurations to confirm that our changes don't affect them. These experiments were run on v2, where the new option was not yet implemented, but I don't think we need to rerun them on v3; the option influences a piece of code that can only be reached in the inadmissible setting. The optimal experiments should mainly confirm that we don't break anything and that the two new vectors that are created for *all* lmcount heuristics don't have a negative impact.
Here are the results: 
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1009-v1-optimal-issue1009-base-issue1009-v2-compare.html
(The errors come from accidentally running the parsers in the wrong order; we reparsed in the right order, but we assume that once a message is in the error log it won't get removed.)

Most notably, for hm we lose both tasks in tidybot-opt14, which base solved in 1571 and 1732 seconds respectively (total time). This is also the main cause for the drop in score-search-time. Otherwise, some tasks flipped from out-of-time to out-of-memory, most notably in exhaust with 11 such flips. This might have to do with the fact that dead-end detection is done a bit differently now; looking at the time score, it seems that at least for exhaust the new approach is more efficient.


All in all I'm quite happy with the results now, but there are three minor points we could still investigate further if someone is unhappy about them:
 1) the loss of time score in lama
 2) hm seemingly getting very unlucky compared to base
 3) the flips from out-of-time to out-of-memory in the optimal setting
msg10773 Author: remo Date: 2022-07-15.13:06:28
Salomé and I set up and ran a second round of experiments with the aim of fixing
the issues mentioned in msg10770:

- Problem 1), where lm_hm printed too much output, was remedied by pulling
  Silvan's fix.

- We handled Problem 2) by adjusting configurations such that they become
  comparable across revisions. In particular, we ran lmcount with the flag
  "transform=adapt_costs(one)" for revisions v1 and v2, which emulates the
  behavior of the base revision, where all landmarks are assigned a cost of 1
  (see the sketch below).
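
For illustration, such a run might look as follows; the input files and the surrounding eager_greedy search are placeholders, only the transform flag matters here:

    ./fast-downward.py domain.pddl problem.pddl \
        --search "eager_greedy([lmcount(lm_rhw(), transform=adapt_costs(one))])"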

We compare the base revision to two revisions implementing our proposed
solution. Revision v1 moves the landmark cost attribute into the lmcount
heuristic, where its value is determined as described in msg10723. Revision v2
additionally implements the change in dead-end detection proposed by Salomé in
the same message.

Comparing satisficing configurations between base and v2 yields the following
results:
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1009-v1-satisficing-issue1009-base-issue1009-v2-compare.html

While there is a drop in coverage across all configurations, the only
significant changes occur in the three domains psr-large, psr-middle, and
philosophers. For the LAMA configurations, an explanation is provided in
point 3) of msg10770. In short, the rhw factory of the base revision falsely
assigns some derived landmarks a cost of 1 instead of 0; revisions v1 and v2
now assign them 0. For the non-rhw configurations, a similar effect seems to be
responsible for the drop in coverage: in the base revision all landmarks,
including derived landmarks, are assigned a cost of 1, whereas the new code
explicitly sets the cost of derived landmarks to 0.

Given these explanations, we think that the loss in coverage can be justified by
the fact that the code handles derived landmarks more consistently now. If we
want to avoid the performance drop, we would presumably have to discuss
alternatives for handling derived landmarks.

The comparison between v1 and v2 looks as follows:
https://ai.dmi.unibas.ch/_tmp_files/christen/issue1009-v1-satisficing-issue1009-v1-issue1009-v2-compare.html

Coverage remains virtually the same, time scores drop slightly for LAMA and
increase slightly for other configurations. Seeing that the changes made from v1
to v2 simplify the dead-end detection code considerably, it seems worth it to
stick with v2.

We also started experiments to verify that the performance in the optimal case
is not affected by the proposed changes; I will post the results once they are
available.
msg10770 Author: salome Date: 2022-07-14.16:55:06
We did a first round of experiments which uncovered a couple of problems with the experiment design:

1) lm_hm generated too much output, and thus a significant number of runs didn't finish. Silvan had already fixed this on the main branch, so we rebased the issue branch to include the fix.

2) lm_exhaust and lm_merged lost a significant number of tasks (~200). We discussed this with Malte: it happens because in satisficing planning it can be counterproductive to look at operator costs. A state that can reach the goal in one step of cost 200 looks less desirable than a state that can reach the goal in 100 steps, each costing one, yet expanding the former state would lead to a solution much faster.
We thus cannot really compare the old and new implementation as is. Instead we will start a new experiment where we use the cost transformation in v1. We might also want to document that satisficing planning is likely to benefit from this transformation, but we're unsure where the best place for this might be.

3) all lama configs are based on rhw, which always considered costs, and use the cost transformation (one or plusone), so we assumed that base and v1 would be similar here. The only difference is that base always uses the minimal cost of the *first* achievers, while v1 does this only if the landmark is not reached yet (otherwise it uses the minimal cost of *all possible* achievers).
However, v1 loses 20+ tasks (i.e. about half) in psr-large. This happens because base sometimes computes the cost of derived landmarks incorrectly: it always initializes the cost of a landmark to 1 and later updates it to the minimal cost of the first achievers. However, it does not perform this update if the landmark is true in the initial state, meaning that all derived landmarks that are true in the initial state keep cost 1.
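
A simplified, self-contained model of the control flow described above (not the literal source; names are illustrative):

    // Model of the base-revision behavior:
    struct LandmarkNode {
        int cost = 1;  // default initialization in base
    };

    void update_cost(LandmarkNode &node, bool true_in_initial_state,
                     int min_first_achiever_cost) {
        // base skips the update for landmarks that hold initially, so
        // derived landmarks true in the initial state keep cost 1
        // instead of the intended 0.
        if (!true_in_initial_state)
            node.cost = min_first_achiever_cost;
    }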
Since this behaviour of base is clearly unintended and a bug, I suggest that we just live with the loss... The only other alternative I see is that we need to (re)think what derived landmarks are and which costs would make sense.
msg10765 Author: malte Date: 2022-07-13.10:11:16
It's good the way it is: with plusone on the heuristic, there is an incentive to proceed towards the goal even when this requires zero-cost actions. There is no reason to use plusone for the g values.
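
For illustration, such a configuration keeps the cost transform on the heuristics only; the following is a sketch in the spirit of lama-second, not its exact definition:

    ./fast-downward.py domain.pddl problem.pddl \
        --evaluator "hlm=lmcount(lm_rhw(), transform=adapt_costs(plusone))" \
        --evaluator "hff=ff(transform=adapt_costs(plusone))" \
        --search "lazy_greedy([hlm, hff], preferred=[hlm, hff])"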
msg10761 Author: salome Date: 2022-07-13.09:32:25
"lama-second" uses no transformation on the search, but transforms both lmcount and ff to "PLUSONE". Should we do it like that or remove the transformation on the heuristics as well?
msg10755 Author: malte Date: 2022-07-12.12:43:09
Looks like a good set of configurations to me.

I would recommend adding a configuration like lama-first, but not ignoring the action costs. This could be something like "lama-second", i.e., the second configuration run by LAMA, if I recall correctly.
msg10753 Author: salome Date: 2022-07-12.12:33:45
We are ready to run a first round of experiments. The code doesn't address caveat 2 yet (reimplementing the dead-end detection), but we're not sure whether that reimplementation is actually a good idea, so it's useful to test a version without it.

We are not entirely sure which configurations we want to test. The code only changes the inadmissible version of lmcount, but it's probably a good idea to run some optimal configs as well to ensure that they are not affected. I suggest the following, which is mostly the configs from issue987 but with a few changes, additions and a removal:

satisficing:
- lama-first (alias)
- like lama-first, but with pref=true in lm_rhw
- lmcount(lm_merged([lm_zg(),lm_rhw()])) *altered*
- lmcount(lm_exhaust()) *new*
- lmcount(lm_hm(m=2)) *new*

optimal:
- --evaluator 'lmc=lmcount(lm_exhaust(),admissible=true)' --search 'astar(lmc,lazy_evaluator=lmc)'
- --evaluator 'lmc=lmcount(lm_hm(m=2),admissible=true)' --search 'astar(lmc,lazy_evaluator=lmc)'
- --alias seq-opt-bjolp
(I removed the seq-opt-bjolp with optimal cost partitioning since I don't think we need to test LP interaction)

I also suggest using a 30-minute timeout instead of the 5-minute timeout we have recently often used for landmark issues, since this issue changes the heuristic estimates and I would expect quite some differences.
msg10744 Author: malte Date: 2022-07-11.16:10:42
Sounds good! However, assertions are only useful if we also test the planner in debug mode.
msg10743 Author: salome Date: 2022-07-11.16:09:42
I looked a little bit into the code to see how the factories handle first/all achievers. The base LandmarkFactory class has a member "achievers_calculated", which is set as follows:
 - all relaxation-based factories set it to true in calc_achievers(), which is called by postprocess() and computes both first and possible achievers.
 - the hm factory also sets it to true in its calc_achievers(), which is also called by its postprocess(). However, this method only computes possible achievers. First achievers are already computed during compute_h_m_landmarks().
 - the merged factory sets it to true if it is true in all subfactories.
 - the reasonable orders factory sets it to the value of its (only) subfactory.

In summary, all factories do compute both first and possible achievers. Whether this computation is correct is a different question. However, given that the admissible heuristic already uses achievers as input to compute the cost partitioning, I would say that we do not need to look into its correctness further (at least not for this issue).

I would also suggest that lmcount add an assertion checking that "achievers_calculated" is true after the factory has computed the graph, along the lines of the sketch below.
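
A minimal sketch of the suggested assertion; the accessor name is hypothetical and the placement in the heuristic setup is approximate:

    #include <cassert>

    // In the lmcount heuristic setup, after the factory builds the graph:
    std::shared_ptr<LandmarkGraph> lm_graph =
        lm_factory->compute_lm_graph(task);
    // Hypothetical accessor for the "achievers_calculated" member:
    assert(lm_factory->achievers_are_calculated());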
msg10732 Author: clemens Date: 2022-06-08.16:48:05
Remo and I started working on this today. So far we have removed the landmark cost attribute from the landmarks and now compute the landmark costs in the setup of the heuristic, as discussed in msg10723. The mentioned caveats remain open TODOs. Nevertheless, here's a link to a pull request draft for our code: https://github.com/aibasel/downward/pull/104. We do not require a code review yet; it's just so everybody can access our progress.
msg10728 Author: malte Date: 2022-05-30.11:32:17
Sounds good. Having attributes that are only used in one setting of the heuristic but not another is a code smell, but it just contributes to an already existing divergence between the admissible and inadmissible case and therefore isn't an obstacle for this issue in my view. We can address this separately at a later point. 

Moving the cost into the heuristic is a step in the right direction.
msg10723 Author: salome Date: 2022-05-25.12:04:12
In issue1025 (which is a duplicate of this issue) Thomas suggested storing the cost in the heuristic instead, which fits option 3) from Malte in msg10047 below.

As far as we're aware, lmcount is the only heuristic currently using this cost attribute, and it uses it only in the inadmissible setting. We thus suggest that lmcount store two values for each landmark (and only in the inadmissible setting):
1) the minimal cost over the *first* achievers, and
2) the minimal cost over *all* achievers.
These values can easily be computed for all factories by iterating over the first/all achievers of each landmark when initializing the heuristic; a sketch follows below.
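
A sketch of that computation, assuming Fast Downward-style data structures (member and helper names are illustrative):

    #include <algorithm>
    #include <limits>
    #include <vector>

    // Computed once when initializing lmcount, inadmissible case only.
    std::vector<int> min_first_achiever_cost;
    std::vector<int> min_possible_achiever_cost;
    for (const auto &node : lm_graph->get_nodes()) {
        int first_cost = std::numeric_limits<int>::max();
        for (int op_id : node->first_achievers)
            first_cost = std::min(first_cost, get_operator_cost(op_id));
        min_first_achiever_cost.push_back(first_cost);

        int any_cost = std::numeric_limits<int>::max();
        for (int op_id : node->possible_achievers)
            any_cost = std::min(any_cost, get_operator_cost(op_id));
        min_possible_achiever_cost.push_back(any_cost);
        // An entry that stays "infinite" means the landmark has no
        // achievers; see the dead-end caveat below.
    }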

Some caveats:
 - We are not sure yet whether all factories set the first/all achievers correctly (especially the hm factory). We will double check this.
 - As discussed in issue1011, we use empty achievers to signal that a landmark cannot be achieved. This is currently also the only way to detect dead-ends in lmcount. We could see if we can make the dead-end detection more elegant by using the stored cost (which would be infinity) instead.
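
If we go that route, the check could look roughly like this (hypothetical helper names; currently the signal is an empty achiever set, as discussed in issue1011):

    // Sketch: a landmark that still has to be achieved but whose minimal
    // achiever cost is "infinite" (it has no achievers) makes the state
    // a dead end.
    const int INF = std::numeric_limits<int>::max();
    for (size_t id = 0; id < min_possible_achiever_cost.size(); ++id) {
        if (landmark_still_required(id) &&
            min_possible_achiever_cost[id] == INF)
            return true;  // dead end
    }
    return false;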
msg10047 Author: malte Date: 2021-02-11.21:38:06
I think option 3) is the clean one: don't store costs in the landmark nodes/landmark graph at all.

A landmark doesn't have a cost as part of its definition. The cost is a property that is externally associated with the landmark. Landmarks were used for a long time for things like goal ordering before people ever thought of associating costs with them; costs only became a thing when people started using landmarks for heuristics. I think it makes more semantic sense to handle the costs on the side of the heuristic.

It should also give us some nice efficiency gains because I think there are currently places inside the heuristic where we reach inside a landmark node only for the purpose of fetching its cost.
msg10031 Author: salome Date: 2021-02-11.13:26:15
Each landmark node has a member cost (previously called min_cost), which is set differently depending on the factory: the lm_rhw factory sets cost to the minimum cost of all (first?) achievers, but all other factories just leave the cost at 1. We instead want the landmark cost to be set to the same value no matter which factory is used.

I see two ways to implement this:
1) Set the cost in the constructor of LandmarkNode. This is similar to the current behavior of lm_rhw, which sets the cost of each node right after creating it.
2) Let lmcount set the cost. This avoids computing costs for landmarks that are later discarded, but is not that nice from a design perspective, since it changes the landmark graph after it has been created by the factory (something we have been moving away from in issue988). We could also store the landmark cost in the heuristic instead, but this doesn't seem very clean to me either.

For either option we lose some synergy with lm_rhw, since it reuses the result of a previous computation, but this is unavoidable if we want to compute costs for other factories. We might need to rethink how to compute costs correctly and how to handle 0-cost actions anyway.
History

Date                 User     Action  Args
2022-07-19 15:12:25  remo     set     status: chatting -> resolved; messages: + msg10798
2022-07-18 17:20:53  salome   set     messages: + msg10790
2022-07-15 13:06:28  remo     set     messages: + msg10773
2022-07-14 16:55:06  salome   set     messages: + msg10770
2022-07-13 10:11:16  malte    set     messages: + msg10765
2022-07-13 09:32:25  salome   set     messages: + msg10761
2022-07-12 12:43:09  malte    set     messages: + msg10755
2022-07-12 12:33:45  salome   set     messages: + msg10753
2022-07-11 16:10:42  malte    set     messages: + msg10744
2022-07-11 16:09:42  salome   set     messages: + msg10743
2022-06-08 16:48:05  clemens  set     nosy: + remo; messages: + msg10732
2022-05-30 11:32:17  malte    set     messages: + msg10728
2022-05-25 12:04:12  salome   set     messages: + msg10723
2021-02-11 21:38:07  malte    set     messages: + msg10047
2021-02-11 13:26:16  salome   create