Issue1038

Title lazy search with reopening does not always respect the g bound
Priority bug Status chatting
Superseder Nosy List jendrik, malte
Assigned To Keywords
Optional summary

Created on 2021-10-31.18:25:18 by jendrik, last changed by jendrik.

Messages
msg10458 (view) Author: jendrik Date: 2021-11-02.10:17:03
Thanks for the analysis, Malte! I think your hypothesis makes sense. I also verified it by comparing real_g values to path costs of all registered states at the end of the lazy search using the configuration from msg10456.

diff --git a/src/search/search_engines/lazy_search.cc b/src/search/search_engines/lazy_search.cc
index 754230748..0704d1e0c 100644
--- a/src/search/search_engines/lazy_search.cc
+++ b/src/search/search_engines/lazy_search.cc
@@ -158,7 +158,7 @@ SearchStatus LazySearch::step() {
 
     SearchNode node = search_space.get_node(current_state);
     bool reopen = reopen_closed_nodes && !node.is_new() &&
-        !node.is_dead_end() && (current_g < node.get_g());
+        !node.is_dead_end() && (current_g < node.get_g()) && (current_real_g < bound);
 
     if (node.is_new() || reopen) {
         if (current_operator_id != OperatorID::no_operator) {
@@ -189,8 +189,19 @@ SearchStatus LazySearch::step() {
                 }
             }
             node.close();
-            if (check_goal_and_set_plan(current_state))
+            if (check_goal_and_set_plan(current_state)) {
+                for (StateID state_id : state_registry) {
+                    const State &state = state_registry.lookup_state(state_id);
+                    vector<OperatorID> path;
+                    search_space.trace_path(state, path);
+                    int path_cost = calculate_plan_cost(path, task_proxy);
+                    SearchNode node = search_space.get_node(state);
+                    if (path_cost != node.get_real_g()) {
+                        cerr << state.get_id() << ": " << path_cost << " != " << node.get_real_g() << endl;
+                    }
+                }
                 return SOLVED;
+            }
             if (search_progress.check_progress(current_eval_context)) {
                 statistics.print_checkpoint_line(current_g);
                 reward_progress();

 
     vector<OperatorID> applicable_ops;
     successor_generator.generate_applicable_ops(s, applicable_ops);

The output is (order: stateID, path cost, real_g):

[...]
#73824: 44 != 42
#73854: 19 != 21
#73895: 44 != 42
#73923: 42 != 40
#73983: 44 != 42
#73984: 44 != 42
#74965: 20 != 22
#74971: 18 != 20
#74972: 18 != 20
#76269: 44 != 42
#76271: 44 != 42
#76272: 44 != 42
#76518: 13 != 15
#78202: 23 != 27
#78203: 24 != 28
#78652: 43 != 41

So the error goes in both directions and indeed some states have path costs >= 43 while all states have real_g values below 43.

Do you have an idea what to do about this? A pragmatic solution would be to discourage transforming the cost function for the search when a bound is given and nodes are reopened.
msg10457 (view) Author: malte Date: 2021-10-31.21:16:59
Hi Jendrik, I can confirm the bug, both for the original revision you're referencing here and with the suggested fix (which looks good to me).

As to why this happens, I have a hypothesis. Our g values and real_g values are not actually correct in general.

Let's ignore the g/real_g distinction for a second and start with something simpler, a standard GBFS (lazy or eager) with no bound and no cost transformations and hence real_g not coming into play. When we find a new path to a node that is better than previous ones and reopening is enabled, we update the g value of this node based on its parent and the action cost. But that means that the g values of all descendants of this node are now wrong, and there is in general no guarantee that they will be reexpanded and update before we terminate. So the g values are in general not internally consistent: for example, we can have a node with a g value of 10 and a parent connected by a cost-1 action with a g value of 8.

As such, this is not necessarily a huge problem for pruning and eventual optimality. It cannot cause the kind of problem you're seeing here. In particular, we can only overestimate g due to such errors. (But this already shows that our pruning is a bit shaky.)

Now, with cost transformations we can also get an error in the opposite direction because our reopening and reconnecting of parents is tied to the *original* g values.

Let's say we have a node with state s, g = 10 and real_g = 5. Later, we find a new path to s with g = 8 and real_g = 7. We will update the parent (because we reopen based on g), now switching to the more expensive path. We also update the real_g value to reflect this, but our update also affects the real path costs of the descendants of this node, and *their* real_g values are not updated and are now too optimistic. So I assume in the example we see here, real_g never reaches 43 but the actual real path cost does.

Does this hypothesis make sense to you? Then the next step would be to test it, and then to think what we can do about it.
msg10456 (view) Author: jendrik Date: 2021-10-31.18:25:17
Under revision 80f9815824738862def6c378b86f0f19cf9267b4, when I run 

./build.py debug && ./fast-downward.py --debug ../benchmarks/sokoban-opt08-strips/p11.pddl --evaluator 'hcea=cea(transform=adapt_costs(one))' --evaluator 'hlm=lmcount(lm_reasonable_orders_hps(lm_rhw()),transform=adapt_costs(one))' --search 'lazy_wastar([hcea,hlm],w=3,preferred=[hcea,hlm],cost_type=one,reopen_closed=true,verbosity=silent,bound=43)'

the planner quickly finds a plan with cost 43, which violates the g bound. I thought the fix would be the following:

diff --git a/src/search/search_engines/lazy_search.cc b/src/search/search_engines/lazy_search.cc
index 754230748..a4bc41925 100644
--- a/src/search/search_engines/lazy_search.cc
+++ b/src/search/search_engines/lazy_search.cc
@@ -158,7 +158,7 @@ SearchStatus LazySearch::step() {
 
     SearchNode node = search_space.get_node(current_state);
     bool reopen = reopen_closed_nodes && !node.is_new() &&
-        !node.is_dead_end() && (current_g < node.get_g());
+        !node.is_dead_end() && (current_g < node.get_g()) && (current_real_g < bound);
 
     if (node.is_new() || reopen) {
         if (current_operator_id != OperatorID::no_operator) {

This actually helps for some tasks and configurations, but not for the above invocation. I have gone over the code many times now, but don't see where the bug is. Does anybody have an idea why lazy search with reopening fails to respect the g bound?
History
Date User Action Args
2021-11-02 10:17:03jendriksetmessages: + msg10458
2021-10-31 21:17:00maltesetstatus: unread -> chatting
messages: + msg10457
2021-10-31 18:25:18jendrikcreate