Issue1072

Title: Landmark Count ignores conjunctive landmarks for preferred operators
Priority: bug
Status: deferred
Superseder:
Nosy List: clemens, jendrik, malte, salome, silvan, simon
Assigned To:
Keywords:
Optional summary:
Part of issue987.

Created on 2022-12-21.13:51:07 by clemens, last changed by clemens.

Summary
Part of issue987.
Messages
msg11646 (view) Author: clemens Date: 2024-07-17.15:46:15
Salomé, Malte and I have talked about this issue in person. Long story short: We decided to defer working on this issue until we have experimented with using landmark achievers to determine preferred operators instead of checking whether operator effects achieve the landmark.

Slightly longer story:
We agreed that, ideally, the definition of when an operator is preferred should be simple. It should be as simple as "an operator is preferred if it achieves a future landmark", where "achieving" means the landmark does not hold in the state where the operator is applied and it holds in the state resulting from applying the operator. Unfortunately, we don't see an easy and efficient way to do this for conjunctive landmarks.

The suggested implementation is an approximation of this, which considers an operator preferred as soon as it achieves one atom in the conjunction of a future landmark which did not hold previously. As Malte has written in msg11639, we usually assume that if a conjunctive landmark is present, then all its elements are also present as simple landmarks. Hence, an operator that achieves a part of the conjunction also achieves the corresponding simple landmark and will therefore already be preferred anyway. The assumption is thus that the proposed change is not very helpful. Nevertheless, we don't like that conjunctive landmarks are dealt with implicitly, without any comment.
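The approximation described above could be sketched as follows. This is a hedged illustration, not the actual Fast Downward code; all names (is_preferred, op_effects, etc.) and the set-based representation are invented for clarity:

```python
# Hypothetical sketch of the proposed approximation: an operator is
# preferred as soon as one of its effects is an atom of a future
# conjunctive landmark that does not yet hold in the current state.
def is_preferred(op_effects, state, future_conjunctive_landmarks):
    """op_effects: set of atoms the operator adds.
    state: set of atoms that currently hold.
    future_conjunctive_landmarks: iterable of sets of atoms (conjunctions)."""
    for conjunction in future_conjunctive_landmarks:
        for atom in conjunction:
            # The operator only "makes progress" if the atom it adds
            # does not already hold (cf. the discussion in msg11522).
            if atom in op_effects and atom not in state:
                return True
    return False
```

Note that this prefers operators that achieve only part of a conjunction, which is exactly the approximation being debated in this issue.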

One idea that came up during the meeting was that we could use the (precomputed) achiever information of landmarks instead of operator effects to decide which operators are preferred. First and possible achievers are computed for all landmarks, no matter whether disjunctive, conjunctive or simple, so one nice aspect of this solution would be that we can compute preferredness independent of the landmark type. A drawback is that achievers are also just an approximation (e.g., conditional effects are considered unconditional in lm_hm), so it is still not the solution we would ideally like to have. Yet, we find it appealing enough to pursue that approach first before putting more effort into the one previously proposed in this issue. If it works well enough, it renders this issue obsolete. Otherwise, we can still dig it up again and continue where we left off.

If we return to this issue, there are some things to improve (see pull request). Probably the most worrying change is that removing a conjunctive landmark takes time linear in the number of landmarks (finding the landmark in the list of landmarks containing one of its atoms). This could probably be solved by a different implementation that removes all conjunctive landmarks in one pass if the respective option is set.
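The single-pass alternative mentioned above is essentially a filter over the landmark list. A minimal sketch, assuming a hypothetical representation where each landmark carries a kind tag (the real code stores this differently):

```python
# One pass over the landmark list: O(n) total instead of O(n) per removed
# landmark. All names and the (kind, content) representation are invented.
def remove_conjunctive_landmarks(landmarks):
    """landmarks: list of (kind, content) pairs, e.g. ("simple", "a")."""
    return [lm for lm in landmarks if lm[0] != "conjunctive"]
```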

For the sake of completeness: We also discussed one possible approach that is closer to the ideal solution than anything mentioned above, but it has its own issues. The idea was to use the set-difference of future landmarks between one state and its successor to determine whether at least one landmark is achieved. At least in eager search, all required information should be available at the time where it is needed. The problem is that preferred operators are computed as part of the heuristic which does not have access to all this information naturally, so implementing this would require fundamental changes. Further, lazy search works differently and does not have the successor information when it would be required for this approach to work. Since LAMA in particular depends on the lazy rather than eager search, this didn't sound like a reasonable approach under these circumstances.
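The set-difference idea from the previous paragraph reduces to a single comparison, assuming (hypothetically) that the future-landmark sets of both a state and its successor were available where preferred operators are computed:

```python
# Sketch of the "ideal" check: a landmark that is future in the parent
# state but no longer future in the successor was achieved by the
# transition. Names are illustrative only.
def achieves_some_landmark(future_in_state, future_in_successor):
    return bool(future_in_state - future_in_successor)
```

As the message explains, the difficulty is not this computation but making both sets available inside the heuristic, especially under lazy search.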

Feel free to add if I forgot something relevant, Malte and Salomé.
msg11639 (view) Author: malte Date: 2024-07-16.17:01:21
I haven't followed the entire discussion, so I'm not entirely sure what definition of preferred operators you have converged on. If you're interested in comments even though they come from a not so well informed position, here are two thoughts (quoting from Clemens's msg11522):

> One thing I thought about when migrating the changes was the following:
> Operators are preferred if they achieve any atom of a conjunctive future
> landmark, even if that atom already holds in the state where this is checked.
> However, in those cases they are not really preferable because the landmark
> can be achieved without applying them. It might be worth checking whether the
> operator actually adds a part of the conjunctive landmark that does not yet
> hold and only prefer those operators.

I'm not 100% convinced in general of the idea that we should prefer operators that "make progress" towards a conjunctive landmark, but don't achieve it. That's to some extent what we used to do for simple and disjunctive landmarks (using a relaxed exploration) as well, but we eventually removed it because it gives us a more complicated definition of what a preferred operator is, while not really being experimentally good enough to warrant the complication.

I view partially achieving a conjunctive landmark in a similar light. Say "a and b" is a conjunctive landmark, and currently neither holds. It follows that "a" is a landmark in the current state and that "b" is a landmark in the current state. In the case where neither of them is true in the initial state, a systematic landmark creation method (like ours) would also *detect* that they are landmarks, so they would already be part of the set of landmarks that we track. (If they are true in the initial state, our method may or may not record them as landmarks -- they clearly are, but perhaps our algorithms don't like such landmarks.)

I think this is a conceptually cleaner way to get the preferred operators to prefer progress than subdividing the achievement of a landmark into multiple steps. Of course, depending on what our algorithms do, it might not work for atoms that are initially true. It also might not work for atoms that were true at some point, are now false, and are not detected to be future. But is that enough reason to move from a comparatively simple concept of preferred operator for landmarks to something like we're considering here?

We might instead resolve those missed opportunities by strengthening our reasoning about future landmarks (e.g., every time "a and b" are future, also infer that "a" is future if not currently true and that "b" is future if not currently true) and limiting preferred operators to those that achieve a future landmark.
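The strengthening rule suggested here can be stated compactly. The sketch below is a hypothetical illustration (names and the representation of conjunctive landmarks as frozensets of atom strings are invented), not a proposal for the actual data structures:

```python
# Whenever a conjunctive landmark is future, each of its conjuncts that
# does not currently hold is itself a future landmark.
def strengthen_future(future_landmarks, state):
    """future_landmarks: set of landmarks; a conjunctive landmark is a
    frozenset of atoms, a simple landmark is a single atom (a string).
    state: set of atoms that currently hold."""
    inferred = set(future_landmarks)
    for lm in future_landmarks:
        if isinstance(lm, frozenset):  # conjunctive landmark
            for atom in lm:
                if atom not in state:
                    inferred.add(atom)  # the conjunct is itself future
    return inferred
```

Preferred operators could then be limited to those achieving any landmark in the strengthened future set, keeping the simple definition of preferredness.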

> The same check could be done for disjunctive and simple landmarks, though:
> If an atom holds, then it makes no sense to prefer an operator that adds it,
> even if the corresponding landmark is future, because it means we need to go
> to a state where the atom does not hold before adding it again to actually
> achieve the future landmark, and only in the state where the atom doesn't
> hold, operators that add it should be preferred.

That makes sense.
msg11638 (view) Author: clemens Date: 2024-07-16.16:11:16
I profiled the running time of visitall-sat11-strips/problem12.pddl with the call
--search "let(hlm,landmark_sum(lm_factory=lm_hm(m=2),pref=true),lazy_greedy([hlm],preferred=[hlm],bound=8))" which was feasible in terms of time required to execute the profiling. The visitall problem discussed below is problem18.pddl, though, and the configuration from the experiment additionally computes reasonable orderings and has no bound on the plan cost. So maybe this profile is not very meaningful, but scaling it up in any of these dimensions was not feasible for me.

The outcome of this very limited experiment is about as expected: Running time goes up for the new version, but only really in the function that computes preferred operators. Everything else seems to be pretty stable across both versions. This is not only true for the sample counter in the profile, but also for the number of evaluated and expanded states, memory footprint, as well as the reported plan. To me, this seems like preferred operators don't appear to be beneficial in visitall, but computing them takes a bit of time. What is a little bit strange, though, is that in the experiment report, problem16.pddl indicates that the new version is more than 20 seconds faster than the base version. This time is gained during search, at the jump from g=177 to g=182 (from 169s to 617s in base and from 182s to 610s in v5). So this would be a counter-argument to the hypothesis that preferred operators don't help. Further, I could not really reproduce the time differences locally, so it could also be that this is just grid noise.

At this point, I don't have any good ideas what else I could analyze, and I also still question whether it's worth doing so. But the question remains: Do we want this change, despite facing a(nother) minor performance decrease of the landmark code? Any input is appreciated!
msg11627 (view) Author: clemens Date: 2024-07-10.17:19:07
I've looked into the snake-task mentioned below. It seems like we just get lucky in one case and unlucky in the other regarding whether the landmark graph can be computed within the time limit or not. In the case where the search time is 0.09 seconds, the total time is 1750.59 seconds, so not too much before timeout. Thus, I assume that in the other case we timed out before starting the search; if the search was started, it might be similarly fast as in the base configuration.

Looking at the times for the task no longer solved in caldera-split, there seems to be a similar problem. In the solved case, the search time is ~340 seconds with a total time of ~1700 seconds, leaving ~1360 seconds for precomputations of which ~1357 are spent in the landmark generation. In the new version where we no longer solve the problem, landmark generation alone takes ~1525 seconds, leaving too little time for the search to complete (assuming it would also take around 340 seconds).

Same statistics for the remaining unsolved task in visitall-sat11: search time ~1135 seconds, ~1603 seconds, difference ~468 seconds which is approximately the time measured for landmark generation. In the unsolved case, landmark generation is in fact slightly faster, so it seems like this is a task we should consider to figure out where our currently suggested changes have limitations.

A general observation is that the lm_hm(m=2) landmark factory seems to have large variance in landmark graph generation time for a given problem; our changes do not affect this part of the code, yet we see big differences in this metric. While unrelated to this issue, it can have an impact on the result, as we can see for the tasks we lose regarding coverage, although we would also expect to see some (random) improvements if it were really unrelated to the change we make here. And then again, maybe we should not focus so much on coverage when evaluating our changes, but introduce other metrics that give us some idea about the statistical significance of the differences we see.

Regarding next steps, I will spend some more time looking into the visitall-sat11 problem. To be honest, though, I'm not sure whether it is really worth that much time. In the end, my opinion is that it is strange that conjunctive landmarks are ignored in the current code base when computing preferred operators, we now have an implementation and results don't look much different, so why not just merge to have a less arbitrary codebase and be happy about that. :-)
msg11620 (view) Author: malte Date: 2024-07-10.11:11:11
I agree with what Clemens wrote.

Additionally, before focusing on optimizing the method to look up conjunctive landmarks it would be a good idea to measure to what extent it is a bottleneck.
msg11618 (view) Author: clemens Date: 2024-07-09.16:57:37
Thinking a bit more about this, I'm convinced it doesn't help to have a flag indicating whether the graph contains any conjunctive landmarks or not. The suggestion was to use this before checking whether there is a conjunctive landmark for a particular fact, but the lookup from facts to conjunctive landmarks should simply be empty if no conjunctive landmarks exist, which is fast to check -- I assume pretty much as fast as checking a boolean flag. Branch prediction might suffer, though, since, as opposed to the flag, it cannot be marked as static or constant information, but I don't know too much about these compiler optimizations, unfortunately.
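The two variants being compared can be illustrated side by side. This is a hypothetical sketch (class and attribute names invented; the real landmark graph is C++ and structured differently); both answer "is this atom part of some conjunctive landmark?":

```python
from collections import defaultdict

class LandmarkGraphSketch:  # hypothetical stand-in for the landmark graph
    def __init__(self, conjunctive_landmarks):
        # Lookup table: atom -> conjunctive landmarks containing it.
        self.conjunctive_by_atom = defaultdict(list)
        for lm in conjunctive_landmarks:
            for atom in lm:
                self.conjunctive_by_atom[atom].append(lm)
        # The flag variant: computed once, never changes afterwards.
        self.contains_conjunctive_landmarks = bool(conjunctive_landmarks)

    def atom_in_conjunctive_landmark(self, atom):
        # If no conjunctive landmarks exist, the map is empty and this is
        # a single cheap hash lookup, comparable to checking the flag.
        return atom in self.conjunctive_by_atom
```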

It's probably a good idea to look into that one task which was previously solved in less than a second and now is not solved within the time limit. I'll do so tomorrow.
msg11534 (view) Author: salome Date: 2024-01-22.17:09:06
In general the results look good to me. For lama it seems that the overhead is on average at around 2-3% of search time, which is of course not ideal but also not too tragic in my opinion. What I find interesting is that in miconic-fulladl the new version is consistently slightly faster (around 10%), while having the same amount of expansions.
For the conjunctive configuration, we lose 3 tasks (in caldera-split, snake and visitall-sat11), most notably in snake we lose a task which the base version solved in only 0.9 seconds. I assume that we are just really unlucky and send the search in the "wrong" direction right at the beginning, but it might be worth taking a look at the task to see what is going on.

One optimization Clemens and I discussed is that we could add a flag to the landmark graph whether or not the graph contains conjunctive landmarks at all. We could then check this flag before we check if there is a conjunctive landmark containing a specific fact. As Clemens pointed out that check might be expensive, and my hope is that a general "contains_conjunctive_landmarks" flag for the graph (which is only set once in the beginning) would work much better for branch prediction. However, it makes the code slightly more ugly in my eyes so I'm not sure if it's worth doing that just to (potentially) regain some of the very small performance loss. @Malte do you have an opinion/hunch here?

The percentage estimates above are guessed from the following scatterplot comparing lama search time base vs v4:
https://ai.dmi.unibas.ch/_visualizer/?xattribute=search_time&yattribute=search_time&properties_file=https%3A%2F%2Fai.dmi.unibas.ch%2F_experiments%2Fai%2Fdownward%2Fissue1072%2Fdata%2Fissue1072-v5-single-eval%2Fproperties.tar.xz&entries_list=issue1072-base-lama-first+issue1072-v5-lama-first+lama&y_range=%5B0.7106740898757299%2C+11.0%5D&x_range=%5B0.009000000000000001%2C+11000.0%5D&relative=True&autoscale=True&xsize=1000&ysize=1000&groupby=domain
msg11526 (view) Author: clemens Date: 2024-01-18.11:22:46
The suggested change actually brought the number of expansions back down to the ballpark of not preferring operators based on conjunctive landmarks. Now, there are several domains where considering them actually reduces the number of expansions quite a bit (e.g., Caldera-split, Nomystery, and Sokoban). In general, the change doesn't seem to affect the configuration much, though. Time scores become a bit worse, expansion/evaluation/generation scores slightly better. See https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1072/data/issue1072-v5-single-eval/issue1072-v5-single-issue1072-base-issue1072-v5-compare.html for the results of a single-search configuration (I didn't run anytime search on v5 yet).

Note that the last change also affects which operators we consider preferred for disjunctive and simple landmarks. The results indicate that it doesn't affect lama-first.

I don't see anything too worrying in the experiment report. Since the change closes a gap in the definition of preferred operators for landmarks, I think it would be a good extension of the code. There is also no clear benefit, though, so I'm not sure how to proceed. I could run the anytime experiment to have some more data if people would be interested in that.
msg11525 (view) Author: clemens Date: 2024-01-17.11:48:12
I ran the experiments, here are the results:
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1072/data/issue1072-v4-single-eval/issue1072-v4-single-issue1072-base-issue1072-v4-compare.html
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1072/data/issue1072-v4-anytime-eval/issue1072-v4-anytime-issue1072-base-issue1072-v4-compare.html

LAMA, which is mainly there to make sure nothing breaks in the non-conjunctive case, doesn't seem to be affected much. It appears to be a bit slower in general, which kind of makes sense: when computing preferred operators, there is now an additional check for every effect of an applicable operator whether there are conjunctive landmarks containing it. Even if there are never conjunctive landmarks in LAMA (which uses lm_rhw()), this might not be predictable by branch prediction or similar optimizations, and therefore checking this over and over again adds up. I didn't verify this hunch, though, and it might as well be something else.

Regarding the configuration which actually uses conjunctive landmarks: Doing so seems to have mixed effects depending on the domain. Schedule stands out with significantly worse number of expansions/evaluations. I will look into that and in particular check whether it might have to do with what I mentioned in my last message: Considering an operator preferred when it achieves a part of the conjunction which already holds might be a bad idea.
msg11522 (view) Author: clemens Date: 2024-01-16.11:15:02
The landmark code has changed a lot since we've been working on this issue. Since we got rid of the mk_acyclic_graph function, I'm curious how this looks today. Because of the many changes, it wouldn't have been trivial to merge the current main branch into the existing pull request, so I decided to copy the changes from the old one into a new, up-to-date branch. You can find a pull request here: https://github.com/aibasel/downward/pull/210

It is not obvious to me from the archived experiments what data we're interested in. Probably most interesting is the effect of considering preferred operators for satisficing heuristics with conjunctive landmarks, e.g. something like lama-first but with lm_hm(m=2) instead of lm_rhw(). Besides that, we need to make sure that the changes don't affect recommended configurations like LAMA negatively. I will set up experiments for these to see results of the current version in an updated codebase.

One thing I thought about when migrating the changes was the following: Operators are preferred if they achieve any atom of a conjunctive future landmark, even if that atom already holds in the state where this is checked. However, in those cases they are not really preferable because the landmark can be achieved without applying them. It might be worth checking whether the operator actually adds a part of the conjunctive landmark that does not yet hold and only prefer those operators. The same check could be done for disjunctive and simple landmarks, though: If an atom holds, then it makes no sense to prefer an operator that adds it, even if the corresponding landmark is future, because it means we need to go to a state where the atom does not hold before adding it again to actually achieve the future landmark, and only in the state where the atom doesn't hold should operators that add it be preferred.
msg11031 (view) Author: malte Date: 2023-02-21.16:45:04
I'm not so worried about what happens in mk_acyclic_graph.

The whole idea of making the landmark graph acyclic is bad in the first place, and we should work on getting rid of this soon. In the meantime, some changes of behaviour within that function are not a concern to me.

(This is just a comment on this specific aspect; I did not look at experimental results more generally. Let me know if I should, or more generally if you're waiting for me to do something/comment on something.)
msg11017 (view) Author: clemens Date: 2023-02-14.15:21:19
To give some more context: The function *mk_acyclic_graph* (I'm very unhappy with that name, by the way, but also with the idea of the function itself, of course) explores the landmark graph starting from every node, one after the other (skipping those nodes for which it previously found that they cannot be part of any cycle). It marks encountered nodes as visited and proceeds until a previously visited node is encountered. If this happens, it removes the first weakest ordering on the path from the visited node back to itself. Hence, this choice depends on the order in which the landmark nodes are considered/expanded, and this might change between versions due to the use of an unordered set as the data structure. The reason a different number of orderings is removed is that cycles can overlap in the set of orderings they consist of, and hence it is possible to get rid of multiple cycles at once by removing a single ordering. Since the implemented algorithm chooses greedily and simply removes the first weakest ordering, the total number does not have to be the same for two different versions.
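The cycle-breaking behavior described above could be sketched roughly as follows. This is a simplified illustration, not the actual Fast Downward implementation: the graph representation, the function name, and modeling ordering strength as integer weights are all invented, and the real code's visited-node bookkeeping is omitted:

```python
# Depth-first walk; whenever a node on the current path is reached again,
# delete the weakest ordering on that cycle (ties broken by position along
# the path, i.e. "first weakest"). Which cycles are found first -- and thus
# which orderings survive -- depends on iteration order.
def break_cycles(edges):
    """edges: dict mapping node -> dict of successor -> strength."""
    def visit(node, path):
        for succ in list(edges.get(node, {})):
            if succ not in edges.get(node, {}):
                continue  # edge was removed while breaking an earlier cycle
            if succ in path:
                i = path.index(succ)
                cycle_edges = list(zip(path[i:], path[i + 1:])) + [(node, succ)]
                cycle_edges = [(u, v) for u, v in cycle_edges
                               if v in edges.get(u, {})]
                u, v = min(cycle_edges, key=lambda e: edges[e[0]][e[1]])
                del edges[u][v]
            else:
                visit(succ, path + [succ])
    for start in list(edges):
        visit(start, [start])
    return edges
```

Because `min` returns the first minimum it encounters, two runs that enumerate the nodes in different orders can remove different (numbers of) orderings, which matches the observed nondeterminism.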
msg11016 (view) Author: simon Date: 2023-02-14.15:02:56
In the new line of experiments (v3), Clemens and I looked into the difference in orderings. For domains like trucks we wrote it off as part of the translator non-determinism. However, blocksworld, for example, is not affected by that but still has different numbers of orderings. We found that the step which makes the landmark graph acyclic uses an unordered set. We checked locally for probBLOCKS-10-0.pddl to confirm that the set is the same. We assume that the different iteration order over the set removes landmark orderings differently, resulting in a different landmark graph with more or fewer edges.
msg11011 (view) Author: simon Date: 2023-02-10.17:30:11
The current version has some form of hierarchy: if an operator leading to a simple landmark can be preferred (due to it being interesting), only operators to simple landmarks are preferred; operators to disjunctive landmarks are preferred only if no simple one was found.

In the PR: https://github.com/aibasel/downward/pull/149 conjunctive landmarks are now able to become interesting if the option is set. Also the hierarchy can be controlled. 

We did some experiments https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1072/data/ 
It seems that the use of the hierarchy does not cause any benefit that is significant over the noise.

As we did not see any reason to have this hierarchy in the first place, the goal is to remove it. Additionally, conjunctive landmarks should always be used if the landmark factory produces them.

A further cleanup of the code is needed and a followup experiment, too.
msg10899 (view) Author: clemens Date: 2022-12-21.13:51:07
The lmcount heuristic ignores conjunctive landmarks when computing preferred operators. In particular, *lgraph->get_node(fact)* returns a null pointer when there is neither a simple landmark for *fact* nor a disjunctive landmark including *fact*. We should discuss how to derive preferred operators also for conjunctive landmarks and not just let these cases silently fall through.
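The silent fall-through can be illustrated with a hedged Python analogue of the C++ logic (all names are hypothetical; the real code works on landmark nodes, not strings):

```python
# Effects whose fact has no simple/disjunctive landmark node are silently
# skipped, so conjunctive landmarks never make an operator preferred.
def preferred_effects(effects, fact_to_node, future_nodes):
    """fact_to_node: maps a fact to its simple or disjunctive landmark
    node; facts covered only by conjunctive landmarks are not in this map.
    future_nodes: set of landmark nodes that are future."""
    preferred = []
    for fact in effects:
        node = fact_to_node.get(fact)  # analogue of lgraph->get_node(fact)
        if node is None:
            continue  # the silent fall-through discussed in this issue
        if node in future_nodes:
            preferred.append(fact)
    return preferred
```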

(This came up when working on issue1070.)
History
Date User Action Args
2024-07-17 15:46:15 clemens set status: in-progress -> deferred
messages: + msg11646
2024-07-16 17:01:21 malte set messages: + msg11639
2024-07-16 16:11:16 clemens set messages: + msg11638
2024-07-10 17:19:07 clemens set messages: + msg11627
2024-07-10 11:11:11 malte set messages: + msg11620
2024-07-09 16:57:37 clemens set messages: + msg11618
2024-01-22 17:09:06 salome set messages: + msg11534
2024-01-18 11:22:46 clemens set messages: + msg11526
2024-01-17 11:48:12 clemens set messages: + msg11525
2024-01-16 11:15:02 clemens set status: chatting -> in-progress
messages: + msg11522
2023-02-21 16:45:04 malte set messages: + msg11031
2023-02-14 15:21:19 clemens set messages: + msg11017
2023-02-14 15:02:56 simon set messages: + msg11016
2023-02-10 17:30:11 simon set nosy: + simon
messages: + msg11011
2022-12-21 13:51:07 clemens create