Issue383

Title: Reasonable orders in landmark graph not always generated
Priority: bug
Status: resolved
Superseder:
Nosy List: clemens, florian, gabi, jendrik, malte, salome, silvan, thomas
Assigned To:
Keywords:

Created on 2013-06-21.15:36:04 by salome, last changed by clemens.

Messages
msg11394 (view) Author: clemens Date: 2023-09-22.10:46:47
Done!
msg11393 (view) Author: clemens Date: 2023-09-22.10:27:04
> Clemens, did you test this by trying optimal instead of uniform cost partitioning?

I did now. I tested at least one instance in every domain where the sum of initial heuristic values is lower. With optimal cost partitioning, the initial heuristic values were always at least as high with the new version. 

With this result, I interpret the previous messages as giving the green light for merging, so I'm going to do that now.
msg11391 (view) Author: malte Date: 2023-09-21.20:13:45
> I have verified in one task that something along these lines is the case. (The actual case is a bit
> more complicated, but the reason for lower heuristic values is definitely uniform cost
> partitioning.)

Clemens, did you test this by trying optimal instead of uniform cost partitioning?

For me it would be sufficient to look at one case where we experimentally saw a worse initial h value with uniform cost partitioning. If this is resolved with optimal cost partitioning, then I think we're good to merge. (We don't need to solve the task if it only happens in tasks that are tough to solve -- just make sure that the initial h value doesn't go down when we use optimal instead of uniform cost partitioning on such a task.)
msg11382 (view) Author: salome Date: 2023-09-15.16:44:28
Overall the results sound good! I looked into where the loss for lama comes from, and the most noticeable part are two out-of-memory errors in trucks-strips (everything else is +-1). Looking at the expansions, it seems that they are somewhat consistently worse in this domain. But of course, in other domains they are consistently better, so nothing too concerning in my opinion.

I did a relative scatterplot of the expansions:
https://ai.dmi.unibas.ch/_visualizer/?xattribute=expansions&yattribute=expansions&properties_file=https%3A%2F%2Fai.dmi.unibas.ch%2F_experiments%2Fai%2Fdownward%2Fissue383%2Fdata%2Fissu383-v5-sat-eval%2Fproperties.tar.xz&y_range=%5B0.0009148895225314929%2C+11000.0%5D&x_range=%5B1.8%2C+110000000.00000001%5D&entries_selection_mode=indicate+versions+by+substring&x_entries_substring=base&y_entries_substring=v5&relative=True&groupby=name&xsize=1000&ysize=1000

It shows very nicely that the performance improvement for zg is due to fewer expansions. For lama, I'd say it's mostly balanced: more often worse on smaller problems but more often better on larger problems.

I also did a scatterplot for expansions until last jump in the optimal configs:
https://ai.dmi.unibas.ch/_visualizer/?xattribute=expansions_until_last_jump&yattribute=expansions_until_last_jump&properties_file=https%3A%2F%2Fai.dmi.unibas.ch%2F_experiments%2Fai%2Fdownward%2Fissue383%2Fdata%2Fissu383-v5-opt-eval%2Fproperties.tar.xz&y_range=%5B0.02533069077628825%2C+11.0%5D&x_range=%5B0.9%2C+110000000.00000001%5D&entries_selection_mode=indicate+versions+by+substring&x_entries_substring=base&y_entries_substring=v5&relative=True&groupby=name&xsize=1000&ysize=1000

In my opinion it shows quite clearly that on average we have the same number of expansions or slightly fewer in v5, and the "positive outliers" (fewer expansions) are much more extreme than the negative ones.


All in all, the results look good to me. I also did a thorough code review and have no comments :P.
msg11381 (view) Author: clemens Date: 2023-09-15.14:01:17
Ah, maybe add a pull request ;-) Here it is, a really simple change: removing 3 lines.
https://github.com/aibasel/downward/pull/181
msg11380 (view) Author: clemens Date: 2023-09-15.13:59:05
As usual, I ran one experiment with satisficing configurations and one with optimal ones. The configs slightly differ from the ones recommended in issue987 because the change only affects configs with reasonable orderings, so I added reasonable orderings to all recommendations where it makes sense. (lm_exhaust does not generate any orderings, so that's the only one that is missing.)

https://ai.dmi.unibas.ch/_experiments/ai/downward/issue383/data/issu383-v5-sat-eval/issu383-v5-sat-issue383-base-issue383-v5-compare.html
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue383/data/issu383-v5-opt-eval/issu383-v5-opt-issue383-base-issue383-v5-compare.html

The results are mostly convincing I would say. First of all, the change leads to a significant increase in the number of orderings found by the different landmark generators (all of the additional orderings should be reasonable). In the satisficing benchmarks, we find roughly 13% more orderings with lm_rhw and close to 1% with lm_zg. In the optimal benchmarks, we find a whopping 800% more orderings with lm_hm2 and 2% more with lm_bjolp=lm_merge(lm_rhw, lm_hm1).

Generating these additional orderings increases the lmgraph_generation_time, which is expected and in my opinion not too dramatic. While these additional orderings generally provide more information, it is also more expensive to progress all of them, making the heuristic evaluation slower. We can observe this for example in the lama-first configurations of the satisficing report or the hm2 config in the optimal report. Nevertheless, the improved heuristic guidance seems to merit the change, as it improves the scores for evaluated, expanded, and generated states in all tested configurations. (Smallest improvement in hm2(opt) with ~1 point, largest improvement in zg(sat) with between 18 and 33 points.)

Coverage increases in all optimal configurations, though only by 1 to 3 tasks, so not really significantly. In satisficing, it goes down by 1 or 2 with the lama-first configs, but up by 17 (!) with lm_zg. With lama-first we can instead observe improved plan quality, i.e., sum of costs (for once not dominated by parcprinter).

In the hm2 config of the optimal experiment, there are two peculiarities that might seem to hint at a bug at first:
(1) In some cases, the number of expanded states before the last f-layer goes up and
(2) in some cases the initial heuristic value goes down.
These are unexpected effects in an experiment where we basically aim to improve heuristic quality by adding more exploitable information to the landmark graph.

In our paper about landmark progression, we show why (1) is possible with path-dependent heuristics. To give a brief overview: we show that with more future landmarks (due to considering more reasonable orderings), it is possible that a state (let's call it s) has a higher heuristic value and is therefore expanded later (if at all), so its information is not progressed to its successors. If one of these successors is also reached through another path, it is missing the information from the path through s (which would have been provided if s had been expanded earlier), and therefore the successor might look more promising than it would if it had the information from s. I'm not sure whether I could communicate the main point well enough, but it doesn't really matter, because after thinking about (2) I don't think this is the issue here.

Regarding (2), I needed more thought to come up with an explanation for why this might happen. Now I've convinced myself that the observation is due to the use of uniform cost partitioning. Since we consider more orderings, we potentially have more future landmarks in the initial state. Consequently, the operator costs are potentially split among more landmarks. With this, achieving a single landmark becomes cheaper on average. The following example shows why this can lead to lower heuristic values despite summing over more landmarks.

Consider two landmarks L1 and L2 where L1 has a single achiever (a) and L2 has two achievers (a, b). Assume cost(a)=3 and cost(b)=1. In a scenario with less information, a given state s only considers L1 a future landmark, so the full cost of a is assigned to L1, and the heuristic, which estimates the cost of achieving the goal (i.e., at least all future landmarks), evaluates to 3. However, with more information, i.e., both L1 and L2 are future in s, the cost of a is split among L1 and L2 such that both consider cost(a)=1.5. Then the estimated cost to achieve L1 is 1.5 and the estimated cost to achieve L2 is 1 (because min(cost(a), cost(b))=min(1.5, 1)=1), and consequently the heuristic evaluates to 2.5, which is lower than 3.
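For concreteness, here is the same computation as a tiny self-contained sketch (names like cost_a and h_more are invented for illustration; this is not the actual heuristic code):

#include <algorithm>
#include <iostream>

int main() {
    double cost_a = 3.0, cost_b = 1.0;

    // Less information: only L1 is future, so the full cost of a goes to L1.
    double h_less = cost_a;  // 3

    // More information: L1 and L2 are both future, so the cost of a is
    // split uniformly among the two future landmarks it achieves.
    double shared_a = cost_a / 2.0;            // 1.5 for each of L1 and L2
    double h_L1 = shared_a;                    // a is the only achiever of L1
    double h_L2 = std::min(shared_a, cost_b);  // achievers of L2: a and b
    double h_more = h_L1 + h_L2;               // 1.5 + 1 = 2.5 < 3

    std::cout << "h with less info: " << h_less
              << ", h with more info: " << h_more << std::endl;
    return 0;
}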

I have verified in one task that something along these lines is the case. (The actual case is a bit more complicated, but the reason for lower heuristic values is definitely uniform cost partitioning.) This seems to have a negative impact on performance only in the case of hm2, though. seq-opt-bjolp also uses uniform cost partitioning by default and benefits a lot from the change, with evaluation, expansion and generation scores going up by 12 to 17 points. Similar effects can be seen with optimal cost partitioning and BJOLP landmark generation. So all in all, I would say the results show clear performance improvements, and the underlying change resolves the bug that triggered this issue more than 10 years ago.

Does anybody want to raise any objections to merging this?
msg11379 (view) Author: malte Date: 2023-09-13.17:10:46
Sounds good!
msg11378 (view) Author: clemens Date: 2023-09-13.17:09:54
Going back to the origins of this issue, the problem seems to be a return statement in the reasonable orderings factory. It makes the reported orderings depend on the order in which landmarks are processed, and therefore the result may be non-deterministic. One suggestion was to replace the return with continue, which makes intuitive sense but broke something else. Now that we have updated the landmark progression in issue1036, this shouldn't break anything anymore, I believe, so continue is back on the table.

However, looking at the context of this return/continue, it seems to prevent orderings A --r--> B from being generated if B holds in the initial state. I'm not convinced we want that. With the old progression semantics, this might have made sense, but now I'd argue the more orderings the better; we observed in the past that the gain in heuristic accuracy makes up for the increased heuristic evaluation time.

So now I suggest simply removing the check whether B holds in the initial state. I will set up experiments to see the impact of the change.
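For illustration, in terms of the simplified control flow shown in msg6073, the suggested change amounts to something like this (a sketch with the same placeholder names A and B as in msg6073, not the literal patch):

for (LandmarkNode *node : nodes) {
    // The early exit for nodes that are true in the initial state is gone,
    // so orderings A --r--> B are generated even if B holds initially.
    if (!obedient_orders && node->is_true_in_goal()) {
        A;
    } else {
        B;
    }
}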
msg11290 (view) Author: clemens Date: 2023-08-24.17:22:48
We found a TODO note in the code referring to issue383. It is in landmark_factory.h, lines 36-39. It talks about reasonable orderings not being allowed in optimal configurations but this is no longer the case after issue1036, so we should remove it and the function it refers to when we work on issue383.
msg11287 (view) Author: clemens Date: 2023-08-22.10:56:37
issue1036 is now resolved and I would like to continue working on this issue. I believe we are back to "straightforward" for this issue, namely just dropping the return statement mentioned in the opening message. Since this introduces cycles (as discussed in msg10098), maybe it makes sense to await issue996, which removes the cycle-breaking code, because otherwise we just introduce additional effort.
msg10437 (view) Author: clemens Date: 2021-09-14.10:05:21
I have opened issue1036 to work on one part of the problems discussed here, namely the flawed update of accepted landmarks in the landmark status manager. After resolving that, it should become easier to get to the bottom of the origin of this issue, namely that some reasonable orderings are not generated.
msg10101 (view) Author: clemens Date: 2021-02-18.20:20:36
I agree that our discussions have driven away from the "simple" bugfix the issue originally addressed. Since the change intends to recover the performance loss caused by the fix, I'd vote for keeping it here, but I'm fine with either decision.

We also thought about just checking whether there are reasonable orders at all to resolve the problem. However, we didn't like that solution very much: assume there is a single reasonable order among thousands of other orderings. The graph would say that it has reasonable orders and hence start looping over all orders although only a single one is relevant. It would fix the optimal case, yes, but it wouldn't be a proper solution for the real underlying problem.

I would say yes, reasonable orders could be used for optimal configurations too. However, obedient-reasonable orders should not be used! Since in the current implementation you either get both or neither, the reasonable option should still not be used for optimal configurations! But discussing the sense behind obedient-reasonable orders should probably happen in another thread.
msg10100 (view) Author: salome Date: 2021-02-18.18:37:47
I agree with the intended changes, but I wonder if we shouldn't rather open a new issue for this, since this one is straying further and further from its original topic ;). I don't have a strong opinion on this though, just wanted to bring it up.

I first thought that you could instead, for the moment, just check if reasonable orders are used at all before starting to loop over the parents. This would fix the runtime for the optimal configs because they (currently) cannot use reasonable orderings. But now I remember that you don't have an easy way to check this currently. issue995 would fix this since it moves the use_reasonable_orderings check to the graph rather than the factory. But in order to use this you would either need to mirror that change or wait until issue995 is merged, and neither of these options is great :/.

Also another point now that I'm rereading my text: With the changes in this issue, optimal configurations could actually use reasonable orders right? Or am I missing something?
msg10098 (view) Author: clemens Date: 2021-02-18.18:07:10
I met with Thomas to discuss the experiment results. In a nutshell, the effects for satisficing configurations look promising. They have a positive effect on the search time, but not always on the total time. 
The number of reasonable orders increases for v2 and v3, which is expected. However, from v2 to v3 it decreases again for some problem instances. I was able to track this down to LandmarkFactory::mk_acyclic_graph(). Allowing reasonable orders towards landmarks that hold initially introduces cycles, which are later destroyed again by said function. I thought it would only remove one ordering within each cycle to do so, but it appears to remove multiple in the instance I had a closer look at. On the one hand, I'm not so sure anymore that v3 makes much sense in this issue, as it only introduces additional effort. On the other hand, however, with the updated algorithm to update reached landmarks, cycles should not be a problem anymore, and I'd love to see the code take another step in that direction.

With the optimal configurations, the results are quite different. There, the changes affect the search time negatively in all configurations. Thomas and I have found the source of this and I'll try to briefly explain it here. It has to do with when we check for unreached parents. Let me explain the two variants at hand.
1) In the current main branch, this happens in LandmarkStatusManager::update_reached_lms(): if a landmark was *not reached* previously, it is checked whether all its parents are reached, and if so, this landmark is marked reached.
2) In this issue, we move this check to LandmarkStatusManager::landmark_needed_again(), which is called for all *reached* landmarks in every state.

While the number of *not reached* landmarks should shrink over time, the opposite is the case for *reached*. I'm not quite sure whether this is correct, but potentially we have introduced exponentially more calls to this parent-check compared to before. However, I'm completely sure that the way it was done before (variant 1) is simply wrong. Consider the following example landmark graph:

A -r-> B -n-> C

Now let's say there is a plan <o1, o2, o3, o4> such that o1 achieves B, o2 achieves C, o3 achieves A and o4 achieves B. The plan satisfies all ordering constraints: B holds before C, and later A holds before B. So let us think about what happens when using variant 1 described above:
 - Nothing is reached in the initial state.
 - When applying o1, we update which landmarks are reached. o1 achieves B and B was not reached before, so we check whether all of B's parents are reached. This is not the case since A is not reached, hence B will not be marked reached after applying o1.
 - In the next step, o2 is applied, which achieves C. C was not reached before, so we check for all its parents whether they are reached. This is not the case, as B was not marked reached before, so C is also not marked reached.
 - Applying o3 achieves A, and A has no parents, so it is marked reached (the first landmark that is reached, according to this logic).
 - Last, we apply o4, which achieves B for the second time, although it was not marked reached before. Now that A is reached, we mark B reached as well.

As I've said before, the plan satisfies all orderings. However, apparently C is never marked reached with this strategy of keeping track of reached landmarks. We'd say this should definitely be addressed in this issue, despite the increased runtime.
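To make the example concrete, here is a tiny standalone simulation of variant 1 (all names invented; this is not the Fast Downward code):

#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

int main() {
    // Landmark graph A -r-> B -n-> C, represented by parent lists.
    std::map<std::string, std::vector<std::string>> parents = {
        {"A", {}}, {"B", {"A"}}, {"C", {"B"}}};
    // The plan <o1, o2, o3, o4> achieves B, C, A, B in that order.
    std::vector<std::string> achieved = {"B", "C", "A", "B"};

    std::set<std::string> reached;
    for (const std::string &lm : achieved) {
        if (reached.count(lm))
            continue;  // variant 1 only checks landmarks not reached yet
        bool all_parents_reached = true;
        for (const std::string &parent : parents[lm])
            if (!reached.count(parent))
                all_parents_reached = false;
        if (all_parents_reached)
            reached.insert(lm);
    }
    // Prints that A and B end up reached but C never does, although the
    // plan satisfies all ordering constraints.
    for (std::string lm : {"A", "B", "C"})
        std::cout << lm << ": "
                  << (reached.count(lm) ? "reached" : "NOT reached") << "\n";
    return 0;
}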

We would still like to try and fix the problem of increased runtime and suggest the following solution: checking for unreached parents is only necessary for (obedient-)reasonable orders, as the definition of natural orderings rules out unreached parents altogether. Instead of looping over all parents, it would be sufficient to loop over all reasonable orders. We propose a post-processing of the landmark graph where parents are split into reasonable_parents and others, as sketched below. This should speed up the check for unreached parents significantly. (For example, there are no reasonable orders in the reported optimal configurations at all, so the function would return immediately.)
Taking this post-processing one step further, we could also remove all orderings which are never used later. For example, currently we only ever use greedy-necessary children for one needed-again criterion and reasonable parents for the other. Losing everything else could possibly speed up all these functions where we loop over parents and children only to filter out the relevant ones.
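A rough sketch of the proposed post-processing (all types and field names are invented here, and the edge-type comparison assumes reasonable and obedient-reasonable orders are the weakest in the enum):

#include <map>
#include <vector>

// Invented minimal types; the real classes live in the landmarks code.
enum class EdgeType {obedient_reasonable, reasonable, natural, greedy_necessary, necessary};

struct Node {
    std::map<Node *, EdgeType> parents;
    std::vector<Node *> reasonable_parents;  // filled by the post-processing
};

// Copy each node's (obedient-)reasonable parents into a separate list, so
// the unreached-parent check only iterates over those instead of all parents.
void split_parents(std::vector<Node *> &graph_nodes) {
    for (Node *node : graph_nodes) {
        for (const auto &[parent, edge_type] : node->parents) {
            if (edge_type <= EdgeType::reasonable)
                node->reasonable_parents.push_back(parent);
        }
    }
}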
I will implement the suggested change and set up an experiment to see whether it has the intended effect. 

In the meantime: are there any other opinions or thoughts about this?
msg10080 (view) Author: clemens Date: 2021-02-17.18:35:10
Malte, Thomas, Salomé, Florian and I have met for a discussion to get a deeper understanding of what the LandmarkStatusManager does and what it's supposed to do. It was suggested to write a blog entry that goes deeper into the outcomes of this discussion. Hence, I will not go into much depth here.

I have implemented the discussed changes in stages, so we can see the impact of each individual change.
 - One insight was that the update of *reached* landmarks does not correspond to the theory. Namely, instead of using the information of the parent state for the update, it copies the parent and makes the update in place. This leads to non-deterministic behaviour. Furthermore, the name *reached* suggests that it flags all landmarks that were true at some point on the path. Instead, however, it only marks landmarks as reached if all their parents are reached. Combined with the non-determinism, this has led to erroneous behaviour.
   Our solution uses the intuitive notion of *reached* and marks all landmarks that were true at some point on the path. If a landmark has unreached parents, this is rather an indicator that the landmark is needed again. There are other reasons why a landmark is needed again, and we suggest moving this parent-check there. These semantic changes altogether correspond to v1 in my new implementation.
 - From v1 to v2 I only change one line in the code: the *return* mentioned in the first message of this thread is replaced with a *continue* because it simply doesn't make sense otherwise.
 - The *continue* just mentioned neglects potential reasonable orders that point towards facts that hold in the initial state. Hence, it leads to the assumption that landmarks that hold initially are reached and never needed again. However, this is not necessarily the case. A reasonable order that points towards such a landmark could tell us otherwise, and we would like to find those orders to improve our heuristics. Therefore, v3 only continues for obedient-reasonable orders (which are a completely different story which we did not discuss yet) and adds reasonable orderings pointing towards landmarks that hold initially.

I ran experiments for several configurations of landmark generators. You can find them under the following links. I might add a discussion in another message, but probably not today...
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue383-v1-v2-v3-single-search-compare.html
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue383-v1-v2-v3-anytime-search-compare.html
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue383-v1-v2-v3-optimal-compare.html
msg10006 (view) Author: clemens Date: 2021-02-09.08:33:47
I have created a pull request: https://github.com/aibasel/downward/pull/23

I also ran experiments before merging the changes from main:
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue383-v4-single-search-issue383-base-issue383-v4-compare.html
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue383-v4-anytime-search-issue383-base-issue383-v4-compare.html

As with other issues, I cannot tell whether the impact on the runtime is too high. However, since the code before was clearly doing something strange, maybe the increased runtime is more acceptable in this case.
There are also several domains where the cost attribute is affected negatively. This was surprising to me, but I am used to results from optimal planning, where costs don't change (or otherwise something is clearly wrong). Is it also concerning for satisficing planning? Clearly, we would like to find improved costs with the anytime-search configurations, which is unfortunately not the case. What about the single-search configurations? Is it possible that the costs change after a single LAMA run? Given the significant changes in the logic, I think yes, but this is only my intuition.
msg9999 (view) Author: clemens Date: 2021-02-05.09:50:32
You were correct, Salomé: it's not that simple. I realized that the last paragraph in msg9988 is incorrect.

Consider a planning task with 3 binary variables A, B, and C. Initially, none are true and the goal is C. The following operators exist:
 - o1 with pre={not A, not B, not C} and eff={A}
 - o2 with pre={A} and eff={not A, B}
 - o3 with pre={B, not C} and eff={C}
The only plan is <o1, o2, o3>. I will list the states that occur on that plan for future reference:
 - s0={not A, not B, not C}
 - s1={A, not B, not C}
 - s2={not A, B, not C}
 - s3={not A, B, C}

Several (disjunctive) fact landmarks can be identified, but for my case we'll only need L1={A}, which is the only thing we can achieve from the initial state, and L2={C}, which is required by the goal.

There is a natural ordering L1->L2, as in no plan can C hold before A has held. However, o2 deletes A, so A is not true in s2, the state right before C is achieved for the first time. According to my last statement in msg9988, L1 is a needed landmark in s2 since L1->L2 exists, L1 does not hold, and L2 is not reached. This is incorrect!

The statement really only holds for greedy-necessary orders, which denote a direct precondition for reaching a landmark for the first time.

However, I think there is another case for marking a landmark as needed:
 - For a reasonable order X -> Y, if X is not reached, then mark Y as needed.

Since Y cannot be true when achieving X (i.e., they interfere, see msg9988), if X is not reached then Y must be reached (again) after X is reached. Note that the special case about simultaneously reaching X and Y comes into play here, which Malte suggests getting rid of. Furthermore, this is independent of whether Y holds in the current state or not.
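Putting the corrected criteria together, the check could look roughly like this (a sketch with invented minimal types, not the actual status manager code):

#include <utility>
#include <vector>

enum class EdgeType {obedient_reasonable, reasonable, natural, greedy_necessary, necessary};

struct Landmark {
    bool reached = false;        // true at some point on the path so far
    bool holds_in_state = false; // true in the current state
    std::vector<std::pair<Landmark *, EdgeType>> children;  // this -> child
    std::vector<std::pair<Landmark *, EdgeType>> parents;   // parent -> this
};

// A landmark is needed (again) if it was never reached; if it is
// greedy-necessarily ordered before an unreached landmark and currently
// false; or if a reasonable order points to it from an unreached landmark
// (independent of whether it currently holds).
bool landmark_needed(const Landmark &lm) {
    if (!lm.reached)
        return true;
    for (const auto &[child, type] : lm.children)
        if (type == EdgeType::greedy_necessary && !child->reached &&
            !lm.holds_in_state)
            return true;
    for (const auto &[parent, type] : lm.parents)
        if (type == EdgeType::reasonable && !parent->reached)
            return true;
    return false;
}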
msg9998 (view) Author: malte Date: 2021-02-04.21:08:34
Makes sense to me, although we also need to fix reasonable orders to exclude the co-occurring case.
msg9988 (view) Author: clemens Date: 2021-02-04.11:41:47
Good catch, I will update this and run the experiments again.

Actually, I think it really is this simple. Here's an informal explanation of the ordering types:
 - A natural order X -> Y denotes that in all plans X must hold strictly before Y can hold for the first time.
 - A greedy-necessary order X -> Y denotes that X must hold in the state right before Y becomes true for the first time. Greedy-necessary orders are a special case of natural orders.
 - A reasonable order X -> Y denotes that Y is needed after X and that, in order to achieve X, Y cannot be true (achieving X "interferes" with Y). The order is non-natural because plans may exist where Y holds at some point before X holds for the first time. (One unfortunate special case about reasonable orders: X and Y may become true simultaneously by the same action.)

Independent of the ordering type of X -> Y, if Y is *not reached*, then X is needed (again), given it does not hold in the current state. I suggest simplifying the landmark status manager accordingly: a landmark is either needed or not needed. LM-count does not need more information than that, as it either counts the landmark (or rather its cost) or doesn't count it when computing the heuristic value. In this setting, a landmark L is *needed* if either
- it has never been reached before, or
- L does not hold in the current state and an ordering L -> L' exists such that L' is not reached.
msg9987 (view) Author: salome Date: 2021-02-04.10:49:01
Oups, I meant msg9982.
msg9986 (view) Author: salome Date: 2021-02-04.10:48:01
Did you adjust the semantics of needed_again as you suggested in msg9984? The current implementation sets needed_again only for (greedy-)necessary edges, so if you left it at that, I think pretty much all information from reasonable orderings will be lost. Furthermore, it could be that we lose information since only "reached" is path-dependent (i.e. we intersect reached over all paths but don't do anything like that for needed_again), but I'm not sure if that can make a difference (and what would even be the right thing to do).

I think we might need to rethink what needed_again actually should be on a larger scale, also in combination with allowing cycles in the landmark graph. My simplistic view would be that reached denotes whether we achieved the landmark at any point, and a landmark is needed again if it is currently false and one of its predecessors has not been reached yet. But I'm sure it's not quite as simple as that :).

As a side note: if I understand reasonable orders correctly, they in a way encode similar information to our current semantics of needed_again: a landmark X is needed again if it is reached, currently false, and greedy-necessarily ordered before another landmark Y. A reasonable order from Z to X exists if Z is ordered before Y and interferes with X (it thus makes sense to achieve Z before X, since otherwise we would achieve X, make X false in order to achieve Z, and then need to achieve X again in order to achieve Y). So in a sense the reasonable order tells us that X will eventually be needed again if we make it true before Z. It's of course not the same, and reasonable orders can't replace the current needed_again semantics, but I found it an interesting connection.
msg9984 (view) Author: clemens Date: 2021-02-03.21:53:06
I ran an experiment with a code version as suggested: if a landmark L is true in a state, the *reached* flag is set for that landmark (even if an ordering L'->L exists where L' is not reached). The report can be found here:
https://ai.dmi.unibas.ch/_tmp_files/buechner/issue383-v2-single-search-issue383-base-issue383-v2-compare.html

It seems that wrongly claimed unsolvability is not an issue anymore. However, the costs are higher with the new version. As I don't have much experience, I don't know how bad this is. Can someone help me interpret this?

I also started an experiment with two anytime-configurations that will run over night. I'll keep you updated with the results.
msg9983 (view) Author: clemens Date: 2021-02-03.17:16:27
Of course I meant to say lmcount is not goal-aware. It should be safe.
msg9982 (view) Author: clemens Date: 2021-02-03.17:12:47
I think I have tracked this down to its source. Salomé's suggestion in msg2496 to replace the return with a continue goes in the right direction, but does not fix all problems. But I'll explain that further down.

The main problem here is that lmcount is not safe, meaning it sometimes yields heuristic values > 0 in goal states. This is the case because in the LandmarkStatusManager, a landmark L is set to "not reached" if there exists an ordering L'->L such that L' is not reached (see LandmarkStatusManager::landmark_is_leaf()). However, the loop in which this leaf-check occurs is non-deterministic: it loops over all landmarks in the order of their IDs, so if ID(L') > ID(L), then L will be evaluated before L' and be set to "not reached" since L' will only be set reached later. This can lead to problems because if L'->L is a reasonable ordering, it is theoretically fine if they become true in the same time step.

Currently, I don't know what would be the best approach to fix this. Intuitively, the PerStateBitset *reached* should really denote which landmarks have been reached at least once. I think the information that was intended to be encoded by setting a landmark to not reached if it has an unreached parent should instead be encoded in the landmark_status member, which can set a landmark to be "needed_again".

Now back to why the continue is not the best solution. Basically, it is a good starting point because with this change (and the ones required to fix the above problem), the code should be fine. However, if we continue there, we don't find all orderings that could be interesting. Consider for example a landmark that holds in the initial state and must also hold in the goal, but where some other landmark requires an action that deletes it. We would like to have the information that this landmark is "required again", which we can derive if a reasonable ordering pointing towards that landmark exists.
msg9981 (view) Author: silvan Date: 2021-02-03.09:05:04
In issue987, we collected a set of configurations that should be run for landmark-related changes. See issue990 for scripts from which you can copy.
msg9980 (view) Author: clemens Date: 2021-02-03.08:09:14
I am currently looking into this with Thomas for our cyclic landmark heuristics. We have a potential fix and I am setting up experiments to evaluate it. I use the report in msg6121 as a starting point. Are there any other configurations that should be tested?
msg6124 (view) Author: jendrik Date: 2017-01-15.22:02:56
Sure, I added it to the agenda.
msg6123 (view) Author: malte Date: 2017-01-14.21:02:52
> Coverage goes down by a lot, especially in miconic-fulladl, parcprinter and
> sokoban. Interestingly, miconic-fulladl is the domain that looks really good
> in the total_time plots. It seems that a lot of tasks are now detected as
> unsolvable (see for example
> http://ai.cs.unibas.ch/_tmp_files/pommeren/issue383-v1.html#error-miconic-fulladl).

miconic-fulladl does actually have unsolvable tasks, but not that many. I think
there are 11. So this means that the change is not correct, and we will probably
have to defer this until we actually understand it better.

> Maybe we have to let this issue rest until we can look at the landmark code
> as a whole. Let's hope it's not another 3.5 years.

Jendrik, can you add revising the landmark code to the agenda for the next Fast
Downward meeting that I can attend? I think it would be good to start making a
refactoring plan.
msg6122 (view) Author: malte Date: 2017-01-14.12:52:51
> Another thing that surprised me in the results was that lama-first has the
> same coverage as seq-sat-lama-2011. I thought lama-first is the first step
> of seq-sat-lama-2011. Are the other steps not helping, or just not showing
> up in the report?

Lama is an iterated search configuration, so the later steps are only run after
the first step has succeeded. Their purpose is to find solutions of higher quality.
msg6121 (view) Author: florian Date: 2017-01-14.11:08:13
Unfortunately, this was not the easy fix, I hoped it would be.

(still all results in one report, sorry)
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue383-v1.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue383-v1-total_time-lama-first-issue383-base-issue383-v1.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue383-v1-total_time-seq-sat-lama-2011-issue383-base-issue383-v1.png

Coverage goes down by a lot, especially in miconic-fulladl, parcprinter and
sokoban. Interestingly, miconic-fulladl is the domain that looks really good
in the total_time plots. It seems that a lot of tasks are now detected as
unsolvable (see for example
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue383-v1.html#error-miconic-fulladl).

Looks like this is the same issue mentioned in msg2522 and msg2543.

Another thing that surprised me in the results was that lama-first has the same
coverage as seq-sat-lama-2011. I thought lama-first is the first step of 
seq-sat-lama-2011. Are the other steps not helping, or just not showing up in
the report?

Maybe we have to let this issue rest until we can look at the landmark code as
a whole. Let's hope it's not another 3.5 years.
msg6113 (view) Author: silvan Date: 2017-01-13.11:55:45
> I don't really understand the code either. Silvan, Salomé and Manuel have been
> involved in it more recently, and perhaps they have an idea if the change makes
> sense, although this is probably a part of the code none of us really
> understands. :-)

The recent work mainly consisted of pure code refactoring, but did not deal
with changing the semantics. Unfortunately, I also don't understand what the
code should do (or what reasonable orders should do).
msg6110 (view) Author: malte Date: 2017-01-13.01:48:50
I don't really understand the code either. Silvan, Salomé and Manuel have been
involved in it more recently, and perhaps they have an idea if the change makes
sense, although this is probably a part of the code none of us really
understands. :-)

This cannot affect our optimal configurations since these don't use reasonable
orders, so I guess the worst thing that can happen if we break this is that we
make LAMA-style configurations a bit less efficient, which is something we can
test experimentally. (Potential bugs for optimal configurations are a larger
worry because we might break admissibility by deriving wrong orders, and
ostensibly optimal configurations performing non-optimally is a worse sin than
bad performance.)

Hence, I'm also fine with merging this after an experiment. I suggest testing
with lama-first and with seq-sat-lama-2011. Because the latter is anytime, they
would need different reports, I think. (For the anytime configurations, many of
our attributes don't make sense.)

More long-term, we definitely need to clean up the landmark code. But I think
this is more something for a "task force" than for the review queue. If someone
is interested in this, I suggest we discuss the topic further once I'm back in
office.
msg6091 (view) Author: florian Date: 2017-01-12.11:50:49
I prepared a pull request for the proposed change but I really don't understand
what the code is supposed to do (the patch only recreates the control flow from
6 years ago). Malte, should I put this into the review queue? Or is someone else
familiar enough with this part of the code to review it?

https://bitbucket.org/FlorianPommerening/downward-issues/pull-requests/33/issue383/diff
msg6077 (view) Author: malte Date: 2017-01-11.18:58:09
Thanks! Regarding your suggestion, a change that makes the code deterministic is
certainly welcome if it doesn't screw too much with performance. :-)
msg6076 (view) Author: florian Date: 2017-01-11.18:55:43
The revision is 4f49059e7a0f ("Integrating emil-new branch").
msg6075 (view) Author: malte Date: 2017-01-11.18:54:11
Hi Florian, can you give the global revision ID? "630" is a local revision ID of
your repository. For example, in one of my repositories, "630" is global revision
bd51e9c0fbee ("scripts: separate preprocessing from search in experiments"),
which I assume isn't the right one. (It only touches some Python scripts.)
msg6073 (view) Author: florian Date: 2017-01-11.16:35:25
I had a look at the file history. The "return" statement that causes the
non-deterministic behavior here originated in revision 630. Before that revision
the control flow was different and maybe this change was unintended. Here is the
simplified control flow before and after revision 630:

Before revision 630:
for (LandmarkNode *node : nodes) {
    if (!obedient_orders && node->is_true_in_goal()) {
        A;
    } else if (!node->is_true_in_initial_state()) {
        B;
    }
}

After revision 630:
for (LandmarkNode *node : nodes) {
    if (node->is_true_in_initial_state())
        return;

    if (!obedient_orders && node->is_true_in_goal()) {
        A;
    } else {
        B;
    }
}

Both versions are different from the fix in msg2496 (replacing return by
continue) for landmarks that are true in the initial state *and* the goal state.
For them, the code "A;" was executed in the old version but is not executed with
return or continue. I don't really understand what the code does, so I don't
know if the old code did the right thing, but maybe this difference is worth
looking at.
msg6072 (view) Author: florian Date: 2017-01-11.15:40:56
This issue came up again in the evaluation of issue688.

If I understood the discussion below correctly, there are two problems: the loop
looking for reasonable orders breaks at a non-deterministic time, and if the
loop is run for all landmarks then it produces a secondary bug. Does this mean
that we would get the secondary bug also if we are unlucky with the
non-deterministic order? If so, I would suggest to fix the non-determinism
first, so we get the secondary bug deterministically and then open a new issue
for the bug.
msg2545 (view) Author: malte Date: 2013-07-23.14:10:53
> I tried adding the condition that the edge type needs to be bigger than
> reasonable to "if (!reached[parent_p->get_id()])", and then it could solve the
> tasks sokoban-opt08-strips/p01.pddl

With this change, would reasonable orders still have any effect on the search,
i.e., are they used for anything else?

Generally speaking, I think the use of orderings needs to be reviewed and
rethought a bit more globally. I would suggest an offline discussion.
msg2543 (view) Author: salome Date: 2013-07-21.14:18:48
After some searching we discovered that the solver wrongly reports dead ends in
certain states because it says a landmark is not reached yet even if it is true
in the current state. The function update_reached_lms in
landmark_status_manager.cc checks, for landmarks that are true in the current
state, whether they are leaves; only then are they counted as reached. Leaf
means that all parents of this landmark are already reached.

What we suspect is that with reasonable orderings this doesn't work anymore,
since (at least to my understanding) reasonable orderings do not necessarily
apply. So if a solution needs to go against a reasonable order, it will end up
in a dead end since the landmark with this reasonable order will always have an
unsatisfied parent-landmark. Is this correct?

I tried adding the condition that the edge type needs to be bigger than
reasonable to "if (!reached[parent_p->get_id()])", and then it could solve the
task sokoban-opt08-strips/p01.pddl.
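If I read this right, the tried change corresponds to a guard along these
lines (a fragment sketch with invented details; "bigger than reasonable"
assumes stronger edge types compare greater in the edge-type enum):

// Inside the leaf check: only parents connected by an ordering stronger
// than reasonable may block a true landmark from being counted as reached.
for (const auto &[parent_p, edge_type] : node->parents) {
    if (edge_type > EdgeType::reasonable && !reached[parent_p->get_id()])
        return false;  // node is not a leaf
}
return true;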
msg2522 (view) Author: gabi Date: 2013-06-25.18:36:04
We found some sokoban tasks that are incorrectly classified as unsolvable with
this change, so I backed it out of default.
msg2505 (view) Author: malte Date: 2013-06-24.20:34:55
Brief note to avoid duplicated effort: I'll start a small experiment for this.
msg2504 (view) Author: malte Date: 2013-06-24.18:03:25
Actually, no point in comparing BJOLP here, as it doesn't use reasonable orders.
msg2503 (view) Author: malte Date: 2013-06-24.17:40:10
The same comments as for issue289 apply:

Given the nastiness of the landmark code, I think it'd be good to see a
before/after experiment for all LM generation changes. The two configurations
I'd suggest checking are:

- ipc seq-sat-lama-2011 on all tasks
- ipc seq-opt-bjolp on all optimal tasks without axioms/conditional effects
msg2499 (view) Author: gabi Date: 2013-06-24.11:53:29
Resolved. Thanks Salomé!
msg2496 (view) Author: salome Date: 2013-06-21.15:36:03
When running experiments for issue289 I discovered that on some problems in the
scheduling domain there was a significant difference in expansions between my
fix and the original version, even though both versions found the exact same
landmarks and thus should behave exactly the same. Sometimes the fix has 10
times more expansions or even fails to solve the task, while the other version
has below 100'000 expansions.

I tracked the error back to the generation of reasonable orderings: the
function approximate_reasonable_orders in landmark_factory.cc loops over all
landmarks and searches for reasonable orders. As soon as a landmark is true in
the initial state, the function is exited altogether without checking the rest
of the landmarks. Since we cannot know in which order the landmarks are passed
(it iterates through a set of Landmark pointers), this leads to
non-deterministic behavior.

I think there should rather be a continue statement instead of exiting the
function, as is done just above when checking for disjunctive landmarks. I
tried this out on two problems already and it seems to work, but I will test
it further.
History
Date                 User     Action  Args
2023-09-22 10:46:47  clemens  set     status: chatting -> resolved; messages: + msg11394
2023-09-22 10:27:04  clemens  set     messages: + msg11393
2023-09-21 20:13:45  malte    set     messages: + msg11391
2023-09-15 16:44:28  salome   set     messages: + msg11382
2023-09-15 14:01:17  clemens  set     messages: + msg11381
2023-09-15 13:59:05  clemens  set     messages: + msg11380
2023-09-13 17:10:46  malte    set     messages: + msg11379
2023-09-13 17:09:54  clemens  set     messages: + msg11378
2023-08-24 17:22:48  clemens  set     messages: + msg11290
2023-08-22 10:56:37  clemens  set     messages: + msg11287
2021-09-14 10:05:22  clemens  set     messages: + msg10437
2021-02-18 20:20:37  clemens  set     messages: + msg10101
2021-02-18 18:37:47  salome   set     messages: + msg10100
2021-02-18 18:07:11  clemens  set     messages: + msg10098
2021-02-17 18:35:11  clemens  set     messages: + msg10080
2021-02-09 08:33:48  clemens  set     messages: + msg10006
2021-02-05 09:50:32  clemens  set     messages: + msg9999
2021-02-04 21:08:34  malte    set     messages: + msg9998
2021-02-04 11:41:47  clemens  set     messages: + msg9988
2021-02-04 10:49:01  salome   set     messages: + msg9987
2021-02-04 10:48:01  salome   set     messages: + msg9986
2021-02-03 21:53:06  clemens  set     messages: + msg9984
2021-02-03 17:16:27  clemens  set     messages: + msg9983
2021-02-03 17:12:49  clemens  set     messages: + msg9982
2021-02-03 09:05:04  silvan   set     messages: + msg9981
2021-02-03 08:09:14  clemens  set     nosy: + thomas, clemens; messages: + msg9980
2017-01-15 22:02:56  jendrik  set     messages: + msg6124
2017-01-14 21:02:52  malte    set     messages: + msg6123
2017-01-14 12:52:51  malte    set     messages: + msg6122
2017-01-14 11:08:13  florian  set     messages: + msg6121
2017-01-13 11:55:45  silvan   set     messages: + msg6113
2017-01-13 01:48:50  malte    set     messages: + msg6110
2017-01-12 11:50:49  florian  set     messages: + msg6091
2017-01-11 18:58:09  malte    set     messages: + msg6077
2017-01-11 18:55:43  florian  set     messages: + msg6076
2017-01-11 18:54:11  malte    set     messages: + msg6075
2017-01-11 16:35:25  florian  set     messages: + msg6073
2017-01-11 15:40:56  florian  set     nosy: + florian; messages: + msg6072
2013-07-23 14:10:53  malte    set     messages: + msg2545
2013-07-21 14:18:48  salome   set     messages: + msg2543
2013-06-25 18:36:04  gabi     set     messages: + msg2522
2013-06-24 20:34:55  malte    set     messages: + msg2505
2013-06-24 18:03:25  malte    set     messages: + msg2504
2013-06-24 17:40:10  malte    set     status: resolved -> chatting; messages: + msg2503
2013-06-24 11:53:29  gabi     set     status: unread -> resolved; nosy: + gabi; messages: + msg2499
2013-06-24 11:24:16  silvan   set     nosy: + silvan
2013-06-21 18:57:33  jendrik  set     nosy: + jendrik
2013-06-21 15:37:58  malte    set     nosy: + malte
2013-06-21 15:36:04  salome   create