Issue995

Title: Refactor options for LandmarkFactories
Priority: wish
Status: resolved
Superseder:
Nosy List: clemens, jendrik, malte, salome, silvan, thomas
Assigned To: salome
Keywords:

Created on 2021-01-27.11:15:06 by salome, last changed by salome.

Summary
This is part of meta issue987.
Files
File name (uploaded by, date, type):
issue995-v1-optimal-issue995-base-issue995-v1-search_time-seq-opt-bjolp.png (salome, 2021-02-18.15:10:42, image/png)
issue995-v2-optimal-issue995-base-issue995-v2-search_time-seq-opt-bjolp-optimal.png (salome, 2021-02-19.08:59:10, image/png)
Messages
msg10291 (view) Author: salome Date: 2021-05-06.16:35:08
We decided to keep the command line options as they were in the pull request and I merged the issue.
msg10288 (view) Author: silvan Date: 2021-05-05.14:47:41
If Malte has a strong opinion on what these options should be called, I'm fine with that. Otherwise, I have already expressed my opinion. We can also discuss this again, but then Malte should be part of the discussion.
msg10287 (view) Author: salome Date: 2021-05-05.13:58:31
Silvan, Clemens and I met today to discuss the pull request, and the topic we discussed most was whether to change some of the option names. Since Malte also commented on this in the pull request, I thought it best to pick up the discussion here for better visibility.

In the meeting we agreed to rename the following options, because the new names better describe what each option actually does:
use_orders (already renamed from no_orders) -> discard_orders
disjunctive_landmarks -> discard_disjunctive_landmarks
conjunctive_landmarks -> discard_conjunctive_landmarks

Malte has in the meantime commented on the pull request that "positive" flags are in general good practice, but that we should care most about the user interface and not reverse the "direction" of options unless there is a good reason for it.

I think for use_orders we could accommodate both views by undoing the change and going back to no_orders (although I would suggest the name "discard_orders"). We originally agreed to change that option to a "positive" version, but I don't remember anymore why we wanted this.
For con/disjunctive landmarks, the opinions clash. The easiest way to deal with this is to just leave the options as they are. I do think that discard_... would better describe what we are actually doing, since con/disjunctive landmarks are always built and then (depending on the option) potentially discarded. Another option would be to rename them to "keep_...". This way we keep the "direction" but add some more meaning to the names.
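For what it's worth, here is a minimal standalone sketch of how a positively named flag can be mapped onto the internal "discard" semantics without changing behaviour. The flag name keep_conjunctive_landmarks and the surrounding code are purely illustrative, not the planner's option-parsing API:

    #include <iostream>

    struct LandmarkFactoryOptions {
        bool keep_conjunctive_landmarks = true;   // user-facing, "positive" direction
    };

    void postprocess(const LandmarkFactoryOptions &opts) {
        // Internally, the code reasons in the "discard" direction, since
        // conjunctive landmarks are always built first and possibly removed.
        bool discard_conjunctive_landmarks = !opts.keep_conjunctive_landmarks;
        if (discard_conjunctive_landmarks)
            std::cout << "discarding conjunctive landmarks" << std::endl;
    }

    int main() {
        LandmarkFactoryOptions opts;
        opts.keep_conjunctive_landmarks = false;
        postprocess(opts);
    }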

Opinions? Should we meet to discuss this?
msg10273 (view) Author: malte Date: 2021-04-30.18:37:18
I don't think we have to blame the cache again; the LP solver is highly unpredictable.

The lmcount configurations that use the LP solver aren't particularly good in the first place. We primarily test them to make sure we have decent test coverage and to not miss really big performance changes, but I don't think we care if these landmark heuristic configurations move up or down a little bit. They can move much more dramatically by using a different LP solver version or changing tiny things about the LPs that are semantically neutral.

One thing I noticed in the experiments is that we have some tasks that get killed because they reach the hard output limit. Not great, but it seems to cancel out between base and v3.

So I see nothing really bad in these experimental results.
msg10269 (view) Author: salome Date: 2021-04-30.18:11:41
I pulled in the newest main branch and reran the experiments:

https://ai.dmi.unibas.ch/_tmp_files/simon/issue995-v3-optimal-issue995-v3-base-issue995-v3-compare.html
https://ai.dmi.unibas.ch/_tmp_files/simon/issue995-v3-satisficing-issue995-v3-base-issue995-v3-compare.html

All looks fine except for seq-opt-bjolp-optimal, where we lose over 2 points of time score. I can't really explain this, since the landmark generation time (which is the only thing that should change) actually got faster. I tried to see locally what happens based on the task driverlog/p02 (where we have a ~10% increase in the experiments), but when profiling, the new version was actually faster, and almost all time was spent in the LP solver anyway, which I most definitely didn't touch :).

Do we also attribute this result to weird cache effects, or should we look into it more closely?
msg10132 (view) Author: salome Date: 2021-02-23.17:28:31
To my understanding, landmark generation should not be part of search_time. The search timer is started right before calling engine->search() (planner.cc line 56), which first of all calls initialize (search_engine.cc line 88), which in turn outputs "Conducting best first search". This output occurs in the log after the landmark graph is built.
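As an aside, here is a minimal standalone sketch of that ordering (not the actual planner.cc code, just an illustration of why graph construction is excluded from search_time):

    #include <chrono>
    #include <iostream>
    #include <thread>

    int main() {
        using namespace std::chrono;

        // Stand-in for option parsing / heuristic construction, which is where
        // the landmark factories run and the landmark graph is built.
        std::this_thread::sleep_for(milliseconds(200));

        auto search_start = steady_clock::now();   // the search timer starts here
        // Stand-in for engine->search(), which first calls initialize() and
        // prints "Conducting best first search".
        std::this_thread::sleep_for(milliseconds(100));
        duration<double> search_time = steady_clock::now() - search_start;

        // Reports roughly 0.1s: the 200ms of "graph construction" is not included.
        std::cout << "Search time: " << search_time.count() << "s" << std::endl;
    }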

I nevertheless parsed the landmark generation time and updated the reports (still the same links as before but I'm posting them again):
https://ai.dmi.unibas.ch/_tmp_files/simon/issue995-v2-optimal-issue995-base-issue995-v2-compare.html
https://ai.dmi.unibas.ch/_tmp_files/simon/issue995-v2-satisficing-issue995-base-issue995-v2-compare.html

While landmark generation is a bit slower, it doesn't seem to explain the loss in score to me. For example, in lm_zg, generation time is roughly 1% slower, but the overall search time score is better.

@Clemens: Regarding the segfaults, it seems to be a case of the translator running out of memory and not handling it properly. It only happens in organic-synthesis and also happened in my preliminary experiments, but I don't know why and the logs are not very helpful (for example, the three runs I just checked all did not have an error log).
msg10108 (view) Author: malte Date: 2021-02-19.14:03:01
Is it possible that the search time that is measured includes the time spent by the landmark factories, and the main difference is in the landmark factories?

It would be good to measure and report the time to construct the landmarks in issues like these. I suggest coordinating with Thomas, who is looking into the same thing for issue1004.
msg10105 (view) Author: clemens Date: 2021-02-19.09:57:34
I don't understand why the runtime changes either... I went through the entire changeset again and there was nothing that caught my eye.
The only changed files relevant during search are landmark_graph.(h|cc) and landmark_count_heuristic.(h|cc), correct? The changes there should have no effect on the search, though. I'm sorry, I can't help...
 
One question that popped up when looking at the reports: can you explain the segfaults?
msg10104 (view) Author: salome Date: 2021-02-19.08:58:09
The experiments are finished:

https://ai.dmi.unibas.ch/_tmp_files/simon/issue995-v2-optimal-issue995-base-issue995-v2-compare.html
https://ai.dmi.unibas.ch/_tmp_files/simon/issue995-v2-satisficing-issue995-base-issue995-v2-compare.html

I don't understand the results. lama-first and seq-opt-bjolp with optimal cost partitioning are significantly worse in search_time, even though the issue doesn't change anything in the search part. I also made a relative scatter plot of bjolp-optimal (attached), and the difference looks systematic now. Normal bjolp, on the other hand, seems not to be affected, even though the two configurations only differ in the cost-partitioning part, which the issue didn't touch.
The only thing I changed that is remotely related to the search part is that the landmark graph now stores a boolean indicating whether the reasonable orders factory has been used (we only need this to check at the beginning that admissible=true is not used together with reasonable orders). Could this be the culprit?

Does anyone have a suggestion how I should proceed?
msg10099 (view) Author: salome Date: 2021-02-18.18:21:35
I addressed the small comments in the pull request and started new experiments. The GitHub Actions run failed, but it complains about something related to CPLEX, so I assume this is due to our GitHub Actions story (the tox tests ran through locally).
msg10094 (view) Author: salome Date: 2021-02-18.15:10:42
We discussed the design and the experiment results in a meeting today.

Regarding design, Malte and Thomas consider the compositional approach of having reasonable orders as a landmark factory an advantage and mentioned that other parts of the code are also moving in this direction (for example, SumEvaluator, which takes a vector of Evaluators). We do, however, want to be mindful that the command line doesn't become too complicated.

We briefly discussed whether we want to have a transition period where the command line option for reasonable orders is kept but internally calls the factory. We decided against it, however, since the issue changes the command line usage anyway (removing irrelevant options).


Regarding experiments, we noticed 3 things:
1) Memory usage is higher in lama-first, most likely due to the nested factories.
2) Time scores in optimal configurations are a bit worse, especially search_time where nothing should have changed.
3) lama-first (the only configuration using reasonable orders) has differences in the amount of landmark graph edges.

After the meeting I looked into these issues and found the following:
1) I implemented a function that releases the memory of the operators_eff_lookup vector (in LandmarkFactory; see the sketch below this list for the usual idiom), but couldn't see any difference in the local test I ran. After remembering that the output shows *peak* memory, I realized that the reason I don't see any change is that peak memory is actually reached while building the graph, so freeing memory afterwards doesn't show up. I tried finding an example where the search eventually requires more memory, but couldn't really see a difference there either. I'm thus wondering whether the factories are actually deleted after the graph is built. The landmark heuristic only saves the graph, and as far as I can tell the Options object should also be deleted before the search starts, so I think there should be no more pointers to the factory left. Can someone confirm this?

2) I can't really find a reason for this, or even an example where v1 is consistently worse. I created some absolute and relative scatter plots for the optimal configs. You can't see anything on the absolute plots, and the relative ones seem pretty random to me. I attached the one for bjolp and ran some local tests on the outlier in openstacks-opt11-strips. Here are the search times (in increasing order):
issue995-base: 82.48s, 84.72s, 86.36s, 85.41s
issue995-v1:   83.21s, 84.81s, 85.18s, 95.56s
One of the runs of v1 is noticeably worse, but it's strange that it only happened once (randomness?). Should I investigate further or just start another experiment once I'm done with the small changes from the pull request?

3) This is the fault of my custom parser. Since reasonable orders are a factory now, they get their own output of how many landmarks and edges were found, and the output for the "outer" factory happens after the output of the "inner" factory. The parser, however, parses the *first* occurrence of the "# edges" pattern. I rewrote the parser so that it parses the last occurrence instead. I also updated the report to use the new parser; now there are no differences in edges anymore.
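Regarding the memory-release function mentioned in 1), the usual C++ idiom for actually handing a vector's memory back to the allocator looks like the following minimal sketch. This is a generic illustration, not the actual Fast Downward helper; the variable name operators_eff_lookup is only borrowed from the message above:

    #include <iostream>
    #include <vector>

    // clear() alone typically keeps the allocated capacity; swapping with an
    // empty temporary (or clear() followed by shrink_to_fit()) releases it.
    template<typename T>
    void release_vector_memory(std::vector<T> &vec) {
        std::vector<T>().swap(vec);
    }

    int main() {
        std::vector<std::vector<int>> operators_eff_lookup(100000, std::vector<int>(10, 0));
        release_vector_memory(operators_eff_lookup);
        std::cout << "capacity after release: " << operators_eff_lookup.capacity() << std::endl;  // prints 0
    }

As noted above, this cannot reduce the reported *peak* memory if the peak is already reached while the graph is being built.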


In summary, my plan currently is to fix the smaller things from the pull request and then rerun the experiments, but I would be happy to hear your opinion about my suspicion regarding 1) and your thoughts on 2).
msg10079 (view) Author: clemens Date: 2021-02-17.18:09:02
I have reviewed the pull request. My main criticism was the introduction of a new landmark factory for the reasonable orders. In msg9914, Malte writes about a discussion about this somewhere else. However, it seems that the contents of this discussion are lost, so I could not read up on the background of this change. I would like to open a new discussion about this.

In my understanding, currently there are two major things happening in the landmark factories.
1. Landmarks are computed with different strategies (i.e., RHW, hm, ZG, and exhaust).
2. Postprocessing of landmark graphs according to certain needs (e.g., merged together, extended with reasonable orders, made acyclic, cleaned from conjunctive/disjunctive landmarks, cleaned from all orderings, etc.)

I believe the second category is what we refer to as options within this issue. In my opinion, those things are very similar to each other, so to me it does not make much sense to have one of them (i.e., reasonable orders) in its own factory while keeping everything else as options.

I haven't come up with a design suggestion yet, and I don't think doing so is easy. However, moving reasonable orders into a factory now is a step in the wrong direction in my opinion. If every option might end up as a factory, the calls become extremely nested. Additionally, some nesting orders might not make sense at all.

What are your opinions? And what did I miss in the previous discussion on this?
msg10051 (view) Author: salome Date: 2021-02-12.09:25:15
Here are the experiment results:

https://ai.dmi.unibas.ch/_tmp_files/simon/issue995-v1-optimal-issue995-base-issue995-v1-compare.html
https://ai.dmi.unibas.ch/_tmp_files/simon/issue995-v1-satisficing-issue995-base-issue995-v1-compare.html

Memory and time scores go down a little bit, and we lose one task each in seq-opt-bjolp and lm_hm (optimal). The only configuration using reasonable orders (lama-first) does not lose time, but its memory increase is more significant than in other configurations (except optimal lm_hm). The difference in landmarks for lm_exhaust (satisficing) occurs in trucks.
msg10046 (view) Author: malte Date: 2021-02-11.21:30:53
Another way to view this is to observe that natural orders are transitive, so X -nat-> Y and Y -nat-> X imply X -nat-> X, i.e., "in every plan, X is first achieved strictly before X is first achieved." This implies that there are no plans.
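Written out compactly (where $X \to_{\mathit{nat}} Y$ means that $X$ is first achieved strictly before $Y$ in every plan):

    X \to_{\mathit{nat}} Y \;\wedge\; Y \to_{\mathit{nat}} X
        \;\Longrightarrow\; X \to_{\mathit{nat}} X \quad \text{(by transitivity)},

and $X \to_{\mathit{nat}} X$ cannot hold in any plan, since nothing can be first achieved strictly before itself; hence no plan exists.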

But like Salomé said, we currently have no way of dealing with this. The way we "deal" with it is having the planner crash in the mk_acyclic_graph call (at least in the debug build). I don't think this can be handled nicely before these crash issues are addressed.
msg10042 (view) Author: clemens Date: 2021-02-11.17:56:23
A cycle consisting only of natural orderings is certainly impossible in solvable tasks; it goes against the definition of natural orderings. A natural ordering X->Y implies that Y can only become true after X. Hence, if X can also only become true after Y, then neither can become true at all. However, they must be true at some point in all plans (because they are landmarks). So they can only be correct landmarks if no plan exists.
msg10039 (view) Author: salome Date: 2021-02-11.17:33:52
Yes, that's my assumption as well. But since we don't handle unsolvable tasks yet, I didn't want to implement it that way now. I'll happily change it, though, if you all think it's OK.
msg10038 (view) Author: thomas Date: 2021-02-11.17:27:08
I believe it is only possible to get a cyclic graph without reasonable orderings if the instance is unsolvable.

If this is true, changing the implementation such that we always check for unsolvability and only call mk_acyclic_graph in the reasonable_orderings factory could solve this issue (and possibly also issue937)?
msg10037 (view) Author: salome Date: 2021-02-11.17:25:36
Sorry, just realized it's only one configuration from issue987 that doesn't make sense. The other configuration I removed was lama-first with pref=true since it's not that trivial to run it in both revisions due to changed command line options, but I can set it up if it's needed.
msg10035 (view) Author: salome Date: 2021-02-11.17:17:16
I updated the pull request and added some comments as well.

The main thing I'm worried about is that turning reasonable_orderings into a factory currently causes all configurations with reasonable orderings to run mk_acyclic_graph twice.

I started experiments with configurations from issue987 (minus some that didn't make sense, I'll add a comment in issue987) and will update the results here when they are in.
msg10018 (view) Author: salome Date: 2021-02-10.09:33:47
I created a pull request: https://github.com/aibasel/downward/pull/26

I still need to adapt the aliases etc. (which is probably why the GitHub Actions currently fail), but there are a few things I'm not happy about and would like to discuss already:

1) Reasonable orders is currently an option that all graph factories add and handle in their generate method. We discussed making this a factory, but if we want this, it should be coordinated with issue996 (factory for acyclic graphs).

2) I would like to rename the function "generate" in the different factories to something like "postprocess", since they do things like discarding landmarks/orderings, adding reasonable orderings, setting landmark IDs, making the graph acyclic and computing achievers. Actually, I would like to rework generate / generate_landmarks / generate_relaxed_landmarks in general, since I find the control flow a bit strange, but that is probably best done in a fresh issue.

3) The function for detecting only causal landmarks now has a check that the task has no conditional effects and no axioms. I'm not sure if the no-axioms part is needed, and I'm not sure if this couldn't be handled more gracefully. Furthermore, we're pretty sure that it can't handle conjunctive landmarks correctly, but there is no check in place yet, mostly because I removed the option from lm_hm, which is the only factory that can produce conjunctive landmarks. But we should probably at least add a comment, or even better, explicitly check that the graph contains no conjunctive landmarks.
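For illustration, a standalone sketch of the kind of up-front guard described in 3). The struct and function names are hypothetical stand-ins, not the planner's TaskProxy/LandmarkGraph API:

    #include <iostream>
    #include <stdexcept>

    struct TaskInfo { bool has_conditional_effects; bool has_axioms; };   // hypothetical
    struct GraphInfo { int num_conjunctive_landmarks; };                  // hypothetical

    // Reject unsupported inputs before computing causal landmarks, including
    // the suggested explicit check for conjunctive landmarks.
    void verify_only_causal_landmarks_supported(const TaskInfo &task, const GraphInfo &graph) {
        if (task.has_conditional_effects)
            throw std::runtime_error("only_causal_landmarks: conditional effects not supported");
        if (task.has_axioms)
            throw std::runtime_error("only_causal_landmarks: axioms not supported");
        if (graph.num_conjunctive_landmarks > 0)
            throw std::runtime_error("only_causal_landmarks: conjunctive landmarks not supported");
    }

    int main() {
        try {
            verify_only_causal_landmarks_supported({false, true}, {0});
        } catch (const std::runtime_error &e) {
            std::cerr << e.what() << std::endl;   // "only_causal_landmarks: axioms not supported"
        }
    }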
msg9914 (view) Author: malte Date: 2021-01-28.20:50:39
I agree on removing the irrelevant options.

Note, with any option changes, that we need to make sure that the aliases, portfolios, test scripts and examples on the wiki are updated where necessary. Any command-line change also needs to be described very clearly in CHANGES.md.


However, removing the reasonable orders option from lm_merged is not so simple. From what the others wrote, my understanding is that reasonable orders are computed on demand by a generic procedure. If we merge graph1 and graph2, then lm_merged can potentially find reasonable orders between one landmark from graph1 and another from graph2. Obviously, neither graph1 nor graph2 can do this on its own.

There is also the issue that lm_merged doesn't simply merge the landmark graphs due to some of the limitations we have if I recall correctly (in particular, no overlapping landmarks). This is less important because that restriction is a bit of an unfortunate limitation in the first place, which we should eventually address.

None of this matters if we make computing reasonable orders its own factory, as proposed elsewhere.
msg9913 (view) Author: malte Date: 2021-01-28.20:43:55
> zg
> --
>  -> relevant: reasonable_orders, only_causal_landmarks, no_orders
> Makes mostly sense, but as with hm I thought that the landmarks should always
> be causal.

Yes. This is weirder. Either a corner case in a definition (e.g. landmarks that are part of the goal or initial state) or a bug.
msg9912 (view) Author: malte Date: 2021-01-28.20:41:35
> hm
> --
>  -> relevant: reasonable_orders, only_causal_landmarks, conjunctive_landmarks, no_orders
> Makes mostly sense, however I thought that in the paper they said that the discovered
> landmarks are always causal landmarks.

The tested configuration uses m=2.

I don't really know what the only_causal_landmarks option does, but I wouldn't be surprised at all if it were buggy for conjunctive landmarks. We should look at the code for it.

Irrespective of this, I think it would be interesting to test m=1, too.
msg9911 (view) Author: malte Date: 2021-01-28.20:36:27
The reason for the nondeterministic translation in trucks-strips is the internal timeout for invariant synthesis (300 seconds by default). I am not aware of other domains where invariant synthesis hits 300 seconds. There may be other such domains (although we did test this every now and then and only ever saw trucks-strips), but citycar-sat14-adl certainly doesn't hit the time limit. On my not very fast desktop, no task takes longer than 2.22 seconds to translate. This doesn't completely rule out nondeterministic translator behavior somewhere else, but we tried quite hard some time ago to make the translator deterministic, and I would look for other explanations first. If we want to really rule this out, we can include an md5 (or other) hash of output.sas in the experimental results.

I vaguely recall something in the landmark code behaving nondeterministically every now and then, but I don't know if we ever fixed it or even diagnosed it. It could be something like a set ordered by landmark node pointers where traversal order matters for some things.
msg9895 (view) Author: silvan Date: 2021-01-28.18:10:29
Thanks for the analysis! 

> however I think we need to investigate why exhaust and rhw have changing landmark numbers when they shouldn't, and we should figure out why only_causal_landmarks is relevant for hm and zg.

The fact that the change is small and that for one of the two configs in question the change happened not only in citycar-sat14-adl but also in trucks-strips rang an alarm bell: is the translation of citycar-sat14-adl non-deterministic, as in trucks-strips? That would definitely explain the changes you observed. If the translation is deterministic in citycar-sat14-adl, then finding different landmarks must have a different cause. Since it is an ADL domain, I would first check whether conditional effects or axioms could be the culprit.

> If everyone agrees I will move the options according to this analysis
I agree.

> I would suggest that we remove all options from merged, since the relevant options depend on the subfactories used.
I agree.
msg9891 (view) Author: salome Date: 2021-01-28.17:47:30
I forgot an important note about the experiment: I only computed the graphs, i.e., the code exits after the graph is computed. Otherwise the experiment would have taken ages (it was over 100'000 runs).
msg9890 (view) Author: salome Date: 2021-01-28.17:29:56
The experiment results are in. For each generation method I ran a "base" version with standard settings (no reasonable orderings, allow non-causal landmarks, allow disjunctive landmarks, allow conjunctive landmarks, have orders), as well as variants changing one setting at a time (e.g., the no_orders variant keeps all standard settings but turns no_orders on).
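To make the setup concrete, the per-factory configurations looked roughly like the following (using lm_rhw as an example and the pre-refactor option names mentioned in this issue; the exact invocations may have differed slightly):

    base:      lm_rhw(reasonable_orders=false, only_causal_landmarks=false,
                      disjunctive_landmarks=true, conjunctive_landmarks=true, no_orders=false)
    no_orders: lm_rhw(reasonable_orders=false, only_causal_landmarks=false,
                      disjunctive_landmarks=true, conjunctive_landmarks=true, no_orders=true)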

https://ai.dmi.unibas.ch/_tmp_files/simon/exhaust.html
https://ai.dmi.unibas.ch/_tmp_files/simon/hm.html
https://ai.dmi.unibas.ch/_tmp_files/simon/rhw.html
https://ai.dmi.unibas.ch/_tmp_files/simon/zg.html

There are unexplained errors, most of them about writing too much output (I didn't think dumping the graph could reach that limit...), some when a factory does not support axioms/conditional effects, and a few segfaults in organic-synthesis-opt18-strips and caldera-sat18-adl (which, as I gathered from the meeting this morning, is a known issue?).


Here is a breakdown of the results. I defined an option as relevant if the relevant metrics change when toggling the option.

exhaust
-------
 -> relevant: reasonable_orders, only_causal_landmarks
This makes sense, since exhaust does not generate orders, disjunctive landmarks or conjunctive landmarks. What surprised me, however, is that the number of (simple) landmarks also changes a bit for settings where it shouldn't. This apparently only happens in two domains: citycar-sat14-adl and trucks-strips.

hm
--
 -> relevant: reasonable_orders, only_causal_landmarks, conjunctive_landmarks, no_orders
Mostly makes sense; however, I thought the paper said that the discovered landmarks are always causal landmarks.

rhw
---
 -> relevant: reasonable_orders, only_causal_landmarks, disjunctive_landmarks, no_orders
Makes sense; however, there is a glaring issue: when running with disjunctive_landmarks=false, the graph still contains disjunctive landmarks in some domains. Furthermore, the total number of landmarks again changes a bit for settings where it shouldn't (this only happens in citycar-sat14-adl).

zg
--
 -> relevant: reasonable_orders, only_causal_landmarks, no_orders
Mostly makes sense, but as with hm, I thought that the landmarks should always be causal.


So overall, reasonable_orders and only_causal_landmarks are relevant everywhere, no_orders is relevant everywhere except in exhaust, and disjunctive_landmarks / conjunctive_landmarks are only relevant in rhw / hm, respectively.

If everyone agrees, I will move the options according to this analysis; however, I think we need to investigate why exhaust and rhw have changing landmark numbers when they shouldn't, and we should figure out why only_causal_landmarks is relevant for hm and zg.

A last note: I would suggest that we remove all options from merged, since the relevant options depend on the subfactories used. I think it is nicer from a design perspective and might also be beneficial for performance (throwing stuff away before merging the graphs). One potential downside is that commands might become longer; for example, if we want reasonable orders on both subfactories, we need to specify it separately for each.
msg9849 (view) Author: salome Date: 2021-01-27.11:15:05
Currently, all LandmarkFactories share a list of options which do not always make sense for a specific factory. For example, we can specify whether lm_rhw should keep conjunctive landmarks, but only lm_hm can generate conjunctive landmarks.
We want to look into which options make sense for which factory and remove options from factories where they don't make sense.
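To illustrate with the pre-refactor option names (assuming the documented interface; the exact syntax may differ slightly), the following two configurations produce the same landmark graph, because lm_rhw never generates conjunctive landmarks in the first place:

    lm_rhw(conjunctive_landmarks=true)
    lm_rhw(conjunctive_landmarks=false)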

The documentation of LandmarkFactories (http://www.fast-downward.org/Doc/LandmarkFactory) already provides a list of relevant options. As per Silvan's suggestion on Discord I will verify this with an experiment.
History
Date User Action
2021-05-06 16:35:08  salome   set status: in-progress -> resolved; messages: + msg10291
2021-05-05 14:47:41  silvan   set messages: + msg10288
2021-05-05 13:58:31  salome   set messages: + msg10287
2021-04-30 18:37:18  malte    set messages: + msg10273
2021-04-30 18:11:42  salome   set messages: + msg10269
2021-02-23 17:28:31  salome   set messages: + msg10132
2021-02-19 14:03:01  malte    set messages: + msg10108
2021-02-19 09:57:34  clemens  set messages: + msg10105
2021-02-19 08:59:10  salome   set files: + issue995-v2-optimal-issue995-base-issue995-v2-search_time-seq-opt-bjolp-optimal.png
2021-02-19 08:58:43  salome   set files: - issue995-v1-optimal-issue995-base-issue995-v1-search_time-seq-opt-bjolp-optimal.png
2021-02-19 08:58:09  salome   set files: + issue995-v1-optimal-issue995-base-issue995-v1-search_time-seq-opt-bjolp-optimal.png; messages: + msg10104
2021-02-18 18:21:35  salome   set messages: + msg10099
2021-02-18 15:10:43  salome   set files: + issue995-v1-optimal-issue995-base-issue995-v1-search_time-seq-opt-bjolp.png; messages: + msg10094
2021-02-17 18:09:03  clemens  set messages: + msg10079
2021-02-12 09:25:16  salome   set messages: + msg10051
2021-02-11 21:30:53  malte    set messages: + msg10046
2021-02-11 17:56:23  clemens  set messages: + msg10042
2021-02-11 17:33:52  salome   set messages: + msg10039
2021-02-11 17:27:08  thomas   set messages: + msg10038
2021-02-11 17:25:36  salome   set messages: + msg10037
2021-02-11 17:17:16  salome   set messages: + msg10035
2021-02-10 09:33:47  salome   set messages: + msg10018
2021-01-28 20:50:39  malte    set messages: + msg9914
2021-01-28 20:43:56  malte    set messages: + msg9913
2021-01-28 20:41:35  malte    set messages: + msg9912
2021-01-28 20:36:27  malte    set messages: + msg9911
2021-01-28 18:10:30  silvan   set messages: + msg9895
2021-01-28 17:47:30  salome   set messages: + msg9891
2021-01-28 17:29:56  salome   set messages: + msg9890
2021-01-27 11:17:43  salome   set summary: This is part of meta issue987.
2021-01-27 11:15:06  salome   create