Issue996

Title Remove cycle-breaking code in landmark factories
Priority wish Status resolved
Superseder Nosy List clemens, jendrik, malte, salome, silvan, thomas
Assigned To thomas Keywords
Optional summary
This is part of meta issue987.

Created on 2021-01-27.20:49:10 by thomas, last changed by clemens.

Summary
This is part of meta issue987.
Files
File name Uploaded Type Edit Remove
domain.pddl clemens, 2023-08-25.11:01:14 application/octet-stream
problem.pddl clemens, 2023-08-25.11:00:58 application/octet-stream
Messages
msg11360 (view) Author: clemens Date: 2023-09-07.17:39:44
Merged, thanks everyone.
msg11359 (view) Author: clemens Date: 2023-09-07.16:41:26
We changed the code to set a flag whether a cycle of natural orderings was found during preprocessing. If this flag is set, the heuristic always returns infinity as the heuristic value.

I have also changed some other stuff like adding or changing comments and log output, but this is really the main thing that changed since the last version. Nevertheless, I ran another experiment set of experiments and this is what we got (comparing v2 to v3, not base to v3):
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue996/data/issue996-v3-satisficing-v2-eval/issue996-v3-satisficing-v2-issue996-v2-issue996-v3-compare.html
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue996/data/issue996-v3-optimal-v2-eval/issue996-v3-optimal-v2-issue996-v2-issue996-v3-compare.html

We didn't expect significant changes and are once more surprised by how different the results look. So we looked at the plot of the runtimes (e.g. https://ai.dmi.unibas.ch/_visualizer/?properties_file=https%3A%2F%2Fai.dmi.unibas.ch%2F_experiments%2Fai%2Fdownward%2Fissue996%2Fdata%2Fissue996-v3-optimal-v2-eval%2Fproperties.tar.xz&xattribute=search_time&yattribute=search_time&y_range=%5B0.4371835156202792%2C+11.0%5D&x_range=%5B0.009000000000000001%2C+11000.0%5D&entries_selection_mode=indicate+versions+by+substring&x_entries_substring=v2&y_entries_substring=v3&relative=True)
There clearly is a pattern to see there, but we don't believe this has to do with the changes. Note that the experiment ran on infai_3, the new partition we don't have much experience with so far. We have observed similar "outliers" before, I think, and could track it down to hyperthreading or something on one of the compute nodes of the cluster. Maybe there's something similar to be uncovered here? We'll start an investigation by informing everyone on the Fast Downward Discord and potentially suggest it as a topic for the upcoming sprint.

Zooming in at the points that ended up on the center line looks like the differences can be explained by random noise. At least, we don't see a systematic pattern of increased runtime of v3 over v2 there. Since Malte already gave green light to merge for the results of v2 and also reviewed the code of v3 already, we would go ahead and merge this now.
msg11350 (view) Author: malte Date: 2023-09-06.13:59:26
I'm done with the code review.

I only looked at the experiment summaries very superficially, and what I saw looks good to merge.

From my side the main thing to do before merging is to address the code review questions regarding the exiting on unsolvable tasks.
msg11298 (view) Author: clemens Date: 2023-08-30.12:14:07
After the change, I ran the experiment again. Here are the reports:
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue996/data/issue996-v2-satisficing-eval/issue996-v2-satisficing-issue996-base-issue996-v2-compare.html
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue996/data/issue996-v2-optimal-eval/issue996-v2-optimal-issue996-base-issue996-v2-compare.html

The results are a bit worse than before I would say, because while coverage only increased in v1, it sometimes decreases in v2. Also, the scores change negatively more often than before, but Salomé and I could not find anything suspicious enough to trigger a deeper investigation. We have looked explicitly on a scatter-plot of the total time and some of the scores and it looks like it's purely random noise.

This means, we think this issue is ready to be merged. However, we would like Malte's feedback first, on whether he's fine with the solution for detecting cycles of natural orderings. And if desired, please also take a look at the results and tell us if you find something you'd like to discuss before we merge.
msg11291 (view) Author: clemens Date: 2023-08-25.11:00:58
My first version removed a piece of code that checked whether the landmark graph has cycles of natural orderings in which case the problem is unsolvable. This was indicated by removing all first achievers from the landmarks in such cycles (see also issue937). This was done in the mk_acyclic function and we also removed all natural orderings involved to fit the name and the rest of the implementation.

Now that I removed the mk_acyclic completely, this logic also disappeared. To not loose the functionality to detect unsolvability based on cycles of natural orderings, I implemented a new function in the landmark heuristic(s) which checks exactly that. It terminates the planner even before the search component is called. I don't know whether this is a proper solution, but it made sense to me and was easy enough to implement for now. If we would like to deal with this differently, maybe it makes sense to open a new issue for this.

For testing, I will add a PDDL problem which causes a cycle of natural orderings when called with --alias lama-first (from the Fast Downward mailing list as mentioned in msg9025 in issue937).
msg11288 (view) Author: clemens Date: 2023-08-23.14:44:17
You can find a pull request for these changes here: https://github.com/aibasel/downward/pull/172

Salomé already volunteered to do a code review, thanks!

I also ran some experiments and had a quick look through the results with Salomé:
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue996/data/issue996-v1-optimal-eval/issue996-v1-optimal-issue996-base-issue996-v1-compare.html
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue996/data/issue996-v1-satisficing-eval/issue996-v1-satisficing-issue996-base-issue996-v1-compare.html

We didn't find anything concerning. Even better, the majority of the effects seem to be positive, which is surprising given that landmark issues tend to shock us with unexpected negative behaviour.

Note that variations in the performance are only really expected for configurations using reasonable orderings, because without them, cycles should only occur in unsolvable tasks. The configurations to focus on hence are: lm_hm2 (opt), lama-first (sat), lama-first-pref (sat), and lm_zg (sat).

Some observations about the optimal case:
- We observe no changes in the configurations without reasonable orderings.
- We solve 3 more tasks with lm_hm2, 2 in snake-opt18-strips and 1 in tidybot-opt11-strips.
- The landmark generation time of lm_hm2 goes down, which makes sense since it previously included the logic to break cycles which we removed.
- The search score of lm_hm2 goes up by 1.41. This seems to be pretty much due to the additionally solved tasks: +0.19, +0.60, and +0.61 (= +1.40). Consequently, the gain in heuristic accuracy due to more orderings seems to be negligible, apart from these three tasks at least.
- There is one task where 33 more states are expanded before the last f-layer of A*. As we show in our paper, this can happen even with more informed landmark heuristics due to their path-dependent nature; while more orderings dominates the heuristic along one path, it might lead to less information when multiple paths lead to the same state but not all of them are discovered. Then, that state might be expanded when considering more orderings when it would not be expanded with fewer.

Some observations about the satisficing case:
- lm_zg seems to not be affected. In particular, the changes lead to the same number of orderings over all, meaning there were never orderings removed due to cycles in lm_zg.
- The LAMA configurations solve 1 and 2 more problems (one each in trucks-strips, and lama-first-pref one more in transport-sat14-strips).
- Plan quality is slightly improved.
- Landmark gaph generation time goes down consistently in the majority of the domains. These times generally seem to be rather noisy, though, so that's probably something worth looking at in a separate issue. Thoughts?
- Time scores go up, but not by much. The improvement is not limited to the additionally solved tasks, though, but spreads consistently through most domains.

Overall, I think this should be almost ready to be merged. I don't see a strong argument against it in terms of the experiment.

It may be worth pointing out the relationship of this issue to issue994. It seems like issue994 was introduced to introduce a command line option to decide whether cycles should be kept or broken. Shortly after that, we must have had the idea to do so using a landmark factory that makes the landmark graph acyclic (as discussed below), and opened a new issue for that. Now that we've changed the objective of this issue to get rid of cycle-breaking entirely, I think issue994 is deprecated. I will add a note there as well and probably close it soon unless somebody objects sooner.
msg11286 (view) Author: clemens Date: 2023-08-22.10:09:11
By "this code" I meant everything related to breaking cycles in the main branch of the Fast Downward repository, not the lm_acyclic factory. Although, the lm_acyclic code would of course also go to waste, since we don't have a good reason to support that at the moment, unless we would like to maintain something that is close to what LAMA did in the past. I don't have strong feelings about this, but if we evaluate dropping the cycle-breaking code entirely and we find that it improves performance, I don't see the point in an lm_acyclic factory. So maybe let's wait with that decision until we have these results?
msg11270 (view) Author: malte Date: 2023-08-07.19:32:51
By "this code" you mean the lm_acyclic factory? Yes, we don't want to introduce this any more as far as I am concerned.

Or by "this code", do you mean the existing cycle-breaking code? If we remove it (which we of course plan to do), we should run experiments to see what the experimental impact is.
msg11263 (view) Author: clemens Date: 2023-08-04.14:34:56
The discussion below is outdated. We now have a good understanding of why cycles were broken in the past: original LAMA would never accept (i.e., consider reached) landmarks that occur in a cycle. However, the landmark status manager responsible for this was refactored in issue1036 and this problem is no longer relevant. I'm confident it is okay to have cycles in the landmark graph now and we can skip the step of moving this logic to its own landmark factory. I therefore changed the title of this issue accordingly. Does anybody have objections to proced by just dropping this code entirely?
msg9943 (view) Author: malte Date: 2021-01-29.13:14:55
> Malte, is it correct that you argue that I should add the lm_acyclic landmark
> factory to all landmark factories in aliases and portfolios, even if it is
> obvious that this won't do anything (e.g., because of the option
> no_orders=true)?

No, I don't feel strongly about this.

For what it's worth, we think that no_orders=true is a suspicious option. I see it is used in the FDSS-2018 portfolio, but this is likely either the effect of a bug (that makes removing orders better than it should be) or of random noise combined with automated parameter tuning.

Apart from the no_orders case, we shouldn't make too many assumptions about which LM factories do or don't produce orders because this is currently not really as it should be and may change. So I would assume that all LM factories can produce cyclic orders when reasonable_orders are enabled, and that all LM factories can produce cyclic orders on unsolvable tasks.

Regarding the other question, I'm not sure. The problem is that lm_merged doesn't keep all landmarks and merges, so the two variants might behave differently. I'd be in favour of either preserving the current behaviour (in terms of the final landmark graph incl. orderigns), or at least running experiments to make sure that we're fine with the new behaviour. If this can be achieved with the "make acyclic only once" version, it's preferable.
msg9940 (view) Author: thomas Date: 2021-01-29.11:39:22
Malte, is it correct that you argue that I should add the lm_acyclic landmark factory to all landmark factories in aliases and portfolios, even if it is obvious that this won't do anything (e.g., because of the option no_orders=true)?

Also, is it OK to add it only once for nested landmark factories, e.g. turn

lmc=lmcount(lm_merged([lm_rhw(),lm_hm(m=1)]),admissible=true)

into 

lmc=lmcount(lm_acyclic(lm_merged([lm_rhw(),lm_hm(m=1)])),admissible=true)

rather than 

lmc=lmcount(lm_acyclic(lm_merged([lm_acyclic(lm_rhw()),lm_acycliclm_hm(m=1))])),admissible=true)

even though the latter is what is happening at the moment?
msg9928 (view) Author: silvan Date: 2021-01-29.08:58:27
To summarize the discussion; should we go ahead and review the new factory for making landmark graphs acyclic? We could do a light review to prevent growing too attached... :-)
msg9922 (view) Author: malte Date: 2021-01-28.21:37:20
> 2. For these configurations, we'd like to know if breaking the cycles has any
> effect. If breaking cycles never has an effect, we can as well rename this
> issue to "remove the cycle breaking part". An experiment that compares the
> configurations with and without cycle breaking should be able to tell us.

I think we want to keep breaking cycles at least for now.

One reason for this is that this a part of the code where there definitely were and probably still are bugs layered on top of other bugs that interact in weird ways. Whether cycle-breaking has an effect on performance may change once the other bugs are fixed.

Another reason is that "established knowledge" is that cycles need to be broken. We believe this is wrong and that we have an interesting scientific contribution related to this insight. But if we want to ever publish this, we will probably regret having removed the part of the code that we need for an experimental comparison.
msg9921 (view) Author: malte Date: 2021-01-28.21:29:43
> 1. We want all configurations for which we know that they produces cycles (we know
> that such configurations must exist, and it's of course not necessary to have all,
> just a large enough subset such that we can answer the following questions). Finding
> a good subset of configurations is something I need help with.

We should have at least a partial answer to this from our ICAPS submission, shouldn't we?
msg9920 (view) Author: thomas Date: 2021-01-28.21:23:38
Unfortunately, I lost the logs of this experiment and cannot answer if cycles are found (I didn't parse anything that could tell us). We'll have to set up another experiment, for which we should clearly describe what we want to know. From the discussion here, I get the following:

1. We want all configurations for which we know that they produces cycles (we know that such configurations must exist, and it's of course not necessary to have all, just a large enough subset such that we can answer the following questions). Finding a good subset of configurations is something I need help with. 
2. For these configurations, we'd like to know if breaking the cycles has any effect. If breaking cycles never has an effect, we can as well rename this issue to "remove the cycle breaking part". An experiment that compares the configurations with and without cycle breaking should be able to tell us.
3. Regardless of cycles, it might be good to look into the question how orders are currently exploited in the code and if it ever pays off to search for reasonable orders. However, this is not part of this issue but should rather be answered somewhere else.
msg9918 (view) Author: malte Date: 2021-01-28.21:05:46
> The configuration issue990-v2-lama-first-pref in the satisficing experiment is a
> configuration as described by you: reasonable orders are enabled (and pref=true as
> well). What I wanted to point out that, despite caring about reasonable orders (and
> hence having cycles in some cases) and even though preferred operators are enabled
> (notice the pref=true), the configuration produce the same results as the
> configuration issue990-v2-lama-first-pref-cyclic) which only differs in the fact that
> cycles are *not* removed.

Do we know that it actually finds cycles to remove? Removing cycles won't have an impact if there aren't any cycles.
msg9917 (view) Author: malte Date: 2021-01-28.21:03:20
> However, the only part in the code that I found (I didn't search properly
> though) that actually does something with orders after the landmark graph has
> been created is code that deals with preferred operators

I think the main use of the edges is for the "needed_again" computation in the status manager, which in turn is used by the heuristic. I don't know if that code works correctly, but at least there are bits of code there that check for edges, and "needed_again" landmarks seems to factor into the heuristic value.

> after all, there should be some reason why cycles are broken - why bother if
> breaking the cycles doesn't affect any part of the code?

There really is no good reason (modulo some uncertainty regarding "obedient reasonable" orders that we should look into). It's based on the breaking of cycles in the use of landmarks for subgoal serialization, where it *does* make sense, but I don't think it ever made sense for heuristic computation. We should really finish the old landmark paper draft that clears all this up. :-)

> This does of course not show that removing cycles isn't relevant somewhere, but
> it's not necessarily relevant for a satificing configuration that creates
> reasonable orders, and not even if preferred operators are enabled in addition.

That makes me wonder how much benefit we actually get from reasonable orders and from needed_again. This may be worth testing, but it's not necessarily related to this issue.
msg9915 (view) Author: thomas Date: 2021-01-28.20:54:34
I didn't want to imply that preferred operators cannot handle cycles or are even affected by it. However, the only part in the code that I found (I didn't search properly though) that actually does something with orders after the landmark graph has been created is code that deals with preferred operators, so I thought that there might be something in the code that crashes in the presence of cycles (after all, there should be some reason why cycles are broken - why bother if breaking the cycles doesn't affect any part of the code?).

The configuration issue990-v2-lama-first-pref in the satisficing experiment is a configuration as described by you: reasonable orders are enabled (and pref=true as well). What I wanted to point out that, despite caring about reasonable orders (and hence having cycles in some cases) and even though preferred operators are enabled (notice the pref=true), the configuration produce the same results as the configuration issue990-v2-lama-first-pref-cyclic) which only differs in the fact that cycles are *not* removed.

This does of course not show that removing cycles isn't relevant somewhere, but it's not necessarily relevant for a satificing configuration that creates reasonable orders, and not even if preferred operators are enabled in addition.
msg9909 (view) Author: malte Date: 2021-01-28.19:54:17
Ah, I missed Thomas's experimental results. Well, they certainly don't contradict what I wrote in the previous message. ;-)

I am not sure if this is really related to preferred operators. It is certainly related to reasonable orders. Cycles in the landmark graphs can only exist in two (overlapping) cases:

1. if reasonable orders are enabled
2. if the landmark code already detects the problem is unsolvable

Case 2 is one we currently don't handle well and in which we often crash. But we already have three issues open for that, linked from issue987.

So Case 1 is the more interesting one here. So the main configurations of interest are the ones with reasonable orders, and in particular the satisficing ones. (For optimal configurations, reasonable orders are considered off-limits in the literature, which is partially based on a misunderstanding, but also we don't know if *our implementation* of reasonable orders is actually correct for optimal configurations due to things like the difference between strict and non-strict orders.)


I think in an issue like this one we should not just look at runtime, expansions etc., but also at the landmarks. In many landmark issues, it would be useful to include statistics about the number and type of landmarks and the number and type of landmark orderings. With many people working on landmark issues, it would probably be a good idea to coordinate on this to not have to reinvent the wheel too much. I think we would then see quite clearly that in some configurations there are never any cycles to break.
msg9908 (view) Author: malte Date: 2021-01-28.19:43:58
For what it's worth, after we've done this, perhaps we will want to throw it away. ;-)

The idea behind making landmark graphs acyclic is largely misguided. Perhaps there is some point to it because of "obedient reasonable" orders, but these also deserve some closer scrutiny and should perhaps go away.

But this is likely something we cannot do right now. We probably need to address other problems with the landmark code first. So for now, I think a landmark factory that makes landmark graphs acyclic could be useful. Just don't grow too attached to it...
msg9874 (view) Author: thomas Date: 2021-01-28.09:26:03
There is no pull request for this yet because it branches of issue990 rather than main (since it's been part of issue990 at some point), which I'll fix once issue990 is merged. However, there is an (initial) implementation for which I ran some tests which can be found here:

https://ai.dmi.unibas.ch/_tmp_files/tkeller/issue990-v2-optimal.html
https://ai.dmi.unibas.ch/_tmp_files/tkeller/issue990-v2-satisficing.html

In the optimal experiment, there are 4 versions of both algorithm configurations: issue-990-base is the same one that was used as base for issue990, issue990-v1 is the final result of issue990 and would be issue996-base if issue990 would have been merged in its current state. issue990-v2 without the "cyclic" suffix is equivalent to the former two configurations, i.e., it uses the new acyclic landmark factory. Finally, issue990-v2 with the "cyclic" suffix does *not* use the new acyclic landmark factory, which results in a landmark factory that keeps the possibly cyclic landmark graph. If I'm not missing something, this doesn't make a difference for either of the two algorithm configurations.

This is as expected as orders only seem to play a role in the lmcount heuristic if preferred operators are used (this still has to be verified, I didn't put enough effort into the code to say this with 100% certainty). I therefore ran one configuration in the satisficing part that uses preferred operators: issue990-v2-lama-first-pref is *not* equivalent to issue990-base-lama-first or issue990-v1-lama-first, but differs from these in the fact that preferred operators are used. Even in this case the results between the cyclic and acyclic variants do not differ much, and it seems that preferred operators cares about orders but not about cycles.

These are obviously preliminary results, but they might serve as a starting point for a discussion on the question if there is even a case where the acyclicity of the landmark graph matters at all (implying that we don't need to add it at all), or if there are algorithm configurations (e.g., in the aliases or portfolios) where we do not need to add the acyclic landmark factory because the configurations are equivalent with and without the addition.
msg9867 (view) Author: thomas Date: 2021-01-27.20:49:09
We would like to move the code that makes a landmark graph acyclic to a new landmark factory
History
Date User Action Args
2023-09-07 17:39:44clemenssetstatus: in-progress -> resolved
messages: + msg11360
2023-09-07 16:41:26clemenssetmessages: + msg11359
2023-09-06 13:59:26maltesetmessages: + msg11350
2023-08-30 12:14:07clemenssetmessages: + msg11298
2023-08-25 11:01:14clemenssetfiles: + domain.pddl
2023-08-25 11:00:58clemenssetfiles: + problem.pddl
messages: + msg11291
2023-08-23 14:44:17clemenssetmessages: + msg11288
2023-08-22 10:09:11clemenssetmessages: + msg11286
2023-08-07 19:32:52maltesetmessages: + msg11270
2023-08-04 14:34:56clemenssetmessages: + msg11263
title: Add landmark factory that generates acyclic landmark graph -> Remove cycle-breaking code in landmark factories
2021-01-29 13:14:55maltesetmessages: + msg9943
2021-01-29 11:39:22thomassetmessages: + msg9940
2021-01-29 08:58:27silvansetmessages: + msg9928
2021-01-28 21:37:20maltesetmessages: + msg9922
2021-01-28 21:29:43maltesetmessages: + msg9921
2021-01-28 21:23:38thomassetmessages: + msg9920
2021-01-28 21:05:46maltesetmessages: + msg9918
2021-01-28 21:03:20maltesetmessages: + msg9917
2021-01-28 20:54:34thomassetmessages: + msg9915
2021-01-28 19:54:17maltesetmessages: + msg9909
2021-01-28 19:43:59maltesetmessages: + msg9908
2021-01-28 09:26:03thomassetmessages: + msg9874
title: Add landmark factory that generate acyclic landmark graph -> Add landmark factory that generates acyclic landmark graph
2021-01-27 20:49:10thomascreate