Issue127

Title: ff() and add() heuristics much slower than lmcut()
Priority: bug
Status: resolved
Superseder: (none)
Nosy List: erez, gabi, malte, patrik, silvia
Assigned To: malte
Keywords: 1.0

Created on 2010-10-02.06:17:29 by patrik, last changed by malte.

Files
File name Uploaded Type
all.groups.gz patrik, 2010-10-03.05:30:11 application/x-gzip
binary-all.groups.gz patrik, 2010-10-04.06:45:20 application/x-gzip
binary-output.sas.gz patrik, 2010-10-04.06:44:57 application/x-gzip
issue127.soln malte, 2011-03-04.18:27:13 application/octet-stream
output.sas.gz patrik, 2010-10-02.06:17:29 application/x-gzip
Messages
msg1260 Author: malte Date: 2011-03-05.05:46:08
I was just slightly surprised to see that the CG heuristic achieved the best
quality here (by quite a gap), so I was a little suspicious that something
fishy might be going on. But there is no reason to suspect anything untoward
related to the original issue, so you can indeed safely ignore this. :-)
msg1259 Author: patrik Date: 2011-03-05.03:44:36
I've long since forgotten which instance this was (and probably also changed the
domain), so I can't validate the plan. I can of course run the planner (with
these heuristic configs) on some problems from the (current version of the) same
domain.

But is validity really an issue? It should be unaffected by changes to the
heuristic, right? (And this is not a "complicated" domain; no cond effs with
simultaneous add/dels or other weird stuff ;))
msg1256 Author: malte Date: 2011-03-04.19:02:12
> I still have to look into the reasonable orders things.

I'm not sure if it has anything to do with reasonable orders actually, or if
this is just the last thing that is printed in the output. In any case, this is
now the separate issue222, and I'm closing this one.
msg1253 Author: malte Date: 2011-03-04.18:27:13
OK, I reran Patrik's tasks on the most recent code, and performance
looks much better -- for example eager_greedy(ff(cost_type=1)) solves
the task within 13.87 seconds using 41448 evaluations, so there are
several thousand evaluations per second. So the basic problem seems to
be fixed. :-) I still have to look into the reasonable orders things.

I tested four heuristics (ff, add, cea, cg) using cost_type=1 or
cost_type=2, with lazy_greedy and eager_greedy and without preferred
operators. (I also did some basic tests with preferred operators, in
which they made performance a bit worse.) We can solve the problem
with ff and cg (either search algorithm), but not with add or cea. On
the binary variant, cea and add give the same init_h value as
expected.

In detail, these are the results I got on the non-binary variant (60
second timeout, debug mode of the planner):

eager_greedy(cea(cost_type=1)):
init_h = 2761048
timed out

eager_greedy(cea(cost_type=2)):
init_h = 3571840
timed out

eager_greedy(add(cost_type=1)):
init_h = 15659154
timed out

eager_greedy(add(cost_type=2)):
init_h = 18667772
timed out

eager_greedy(ff(cost_type=1)):
init_h = 676
solved with length 705, cost 65 after 14.61 seconds

eager_greedy(ff(cost_type=2)):
init_h = 770
solved with length 701, cost 54 after 17.77 seconds

eager_greedy(cg(cost_type=1)):
init_h = 10383
solved with length 676, cost 40 after 16.57 seconds

eager_greedy(cg(cost_type=2)):
init_h = 10450
solved with length 692, cost 39 after 16.13 seconds

I'm attaching the best plan found with these configs so that you can
validate it (I can only validate PDDL-based stuff, but you probably
have some way to validate this, right?).
msg1029 Author: malte Date: 2011-01-05.05:16:48
This should be retested with the new (as of IPC) implementations of these
heuristics. I hope this is now fixed for ff() and add(), and should be fixed for
cea() and cg() once issue186 is resolved.
msg543 Author: patrik Date: 2010-10-04.07:28:18
Yes, I also had the crash with lazy_greedy(add()). (But no more, after update to
r4959.)

After applying fixes like those suggested, ff is much faster. (For issue A, I
added prop->reached_by = 0 just after the first "if"; that should have the same
effect, right?)

Binary versions of the problem files attached.
msg542 Author: malte Date: 2010-10-03.16:48:01
PS: Patrik, h^add() is still much slower here per node than h^max(), which is
due to the bucket-based implementation inside the heuristic; that implementation
is problematic when h values are very large compared to the number of
propositions in the task. But the performance difference on this task is now
much, much smaller than before.

We'll deal with the remaining wastage eventually as part of properly supporting
action costs, but it's somewhat difficult to do that in a way that doesn't
degrade performance in the "easy" cases, so that needs more thought. (I should
note that h^add and consequently FF(h^add) have this problem already in the
unit-cost case since h^add values can be large also for unit-cost problems. So
this is not strictly related to properly supporting action costs -- but we'll
need similar mechanisms in the implementation, so it's worth handling these
together.)
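The bucket-based wastage described above can be sketched in isolation. The following is a minimal toy queue, not the planner's actual data structure: both its memory footprint and its empty-bucket scanning scale with the largest cost value ever pushed, not with the number of queued entries, which is exactly what hurts when h values reach into the millions.

```cpp
#include <cstddef>
#include <vector>

// Minimal bucket-based priority queue (illustration only, not the
// planner's class). push/pop cost is dominated by the range of cost
// values: if costs run up to h^add-sized numbers while only a few
// thousand items are ever queued, most of the work is resizing the
// bucket array and scanning empty buckets.
class BucketQueue {
    std::vector<std::vector<int>> buckets; // buckets[cost] = queued items
    std::size_t current = 0;               // lowest possibly non-empty bucket
public:
    void push(std::size_t cost, int item) {
        if (cost >= buckets.size())
            buckets.resize(cost + 1);      // space grows with max cost value
        buckets[cost].push_back(item);
        if (cost < current)
            current = cost;
    }
    // Returns an item of minimal cost, or -1 if the queue is empty.
    int pop() {
        while (current < buckets.size() && buckets[current].empty())
            ++current;                     // scans over empty buckets
        if (current == buckets.size())
            return -1;
        int item = buckets[current].back();
        buckets[current].pop_back();
        return item;
    }
    std::size_t bucket_count() const { return buckets.size(); }
};
```

With a maximum cost of 1000000, this toy queue allocates over a million buckets to hold three items; a binary heap would do the same job in space proportional to the number of items and O(log n) time per operation.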
msg541 Author: malte Date: 2010-10-03.16:38:54
A proper fix for this for the additive heuristic has been committed in r4958;
relaxed plans are no longer computed there.

I made some basic sanity tests, but more tests are of course appreciated.

Still need to fix this for the FF heuristic and check other heuristics that may
have related issues:
 * FF heuristic in the LAMA/FF synergy implementation
 * context-enhanced additive heuristic
 * causal graph heuristic

Also, we still need to look into the reasonable order thing.
msg540 Author: malte Date: 2010-10-03.15:51:40
Great, thanks Erez!
msg539 Author: malte Date: 2010-10-03.15:50:02
Here's the explanation of what's going on. The relaxed plan computation
(AdditiveHeuristic::collect_relaxed_plan and the code that calls it) is
the culprit. Why does the additive heuristic compute a relaxed plan at
all? Here's the explanation from the code (all code in this message from
additive_heuristic.cc):

// Collecting the relaxed plan marks helpful actions as preferred.
// The plan itself is not used, but it is good to have some sort
// of mechanism not to visit the same operator twice during
// backchaining.

(Somewhat ironically, that purpose of not revisiting the same operator
is not actually achieved at all -- see point (A) below. However, let's
assume we want to keep the relaxed plan, e.g. for fixing this for the FF
heuristic.)

The max and LM-cut heuristics don't support preferred operators, so they
don't compute relaxed plans and hence don't suffer from this slowness.

The relaxed plan computation is not actually necessary if you don't want
preferred operators, so the whole block related to relaxed plans in
AdditiveHeuristic::compute_heuristic could be safely commented out for
your purposes. But of course it's interesting to know why relaxed plan
computation is so slow, and it can be fixed. There is a combination of
factors at work here:

  (A) unnecessary recursion
  (B) unnecessary dynamic memory allocation
  (C) bad guesses of how large the relaxed plan will be

To give some runtime numbers showing how much each factor matters here, I've
hacked SearchProgress::inc_evaluated() in search_progress.h to exit after 100
evaluated states and then run
# ./downward --search 'astar(add())' < output
on a bunch of different versions of the code, which gives the following
runtime (CPU time sys + user):

baseline (r4956):                 23.609 seconds
baseline + (A) fixed:              5.408 seconds
baseline + (A) + (B) fixed:        2.576 seconds
baseline + (A) + (B) + (C) fixed:  0.764 seconds
baseline without relaxed plans:    0.760 seconds

And here's what is wrong with the code and how to fix it. I'm quoting
the two relevant methods from additive_heuristic.cc in their original
form (baseline) with some comments added that explain how to fix (A),
(B) and (C):

First, (A): unnecessary recursions. Here's the code of
collect_relaxed_plan that computes the relaxed plan and as a side effect
marks its applicable operators as preferred.

void AdditiveHeuristic::collect_relaxed_plan(Proposition *goal,
                                             RelaxedPlan &relaxed_plan) {
    UnaryOperator *unary_op = goal->reached_by;
    if(unary_op) { // We have not yet chained back to a start node.
        // [issue127] To fix (A), uncomment the following three lines:
        // if (unary_op->unsatisfied_preconditions == -42)
        //     return;
        // unary_op->unsatisfied_preconditions = -42;
        for(int i = 0; i < unary_op->precondition.size(); i++)
            collect_relaxed_plan(unary_op->precondition[i], relaxed_plan);
        const Operator *op = unary_op->op;
        bool added_to_relaxed_plan = relaxed_plan.insert(op).second;
        if(added_to_relaxed_plan
           && unary_op->h_add_cost == unary_op->base_cost
           && !op->is_axiom()) {
            set_preferred(op); // This is a helpful action.
        }
    }
}

If you ignore the code in the comments, you'll see that the recursion
can revisit the same unary operator a huge number of times --
essentially, if an operator contributes to the h_add value X times, it
will receive X recursive calls. In cases where h_add >> h_FF, this is a
huge overhead, and the commented-out code removes it. (Of course that's
a hackish way, not how it should actually be fixed.)
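The revisit blow-up is easy to reproduce on a toy graph. In this sketch (Node, backchain and count_expansions are illustration names, not planner code), each node in a chain has two predecessor edges pointing at the same earlier node, so an unmemoized backchaining recursion performs 2^(depth+1) - 1 expansions, while the visited-marker variant, analogous to the commented-out fix above, expands each node exactly once.

```cpp
#include <cstddef>
#include <vector>

// Toy illustration of fix (A): backchaining through a graph in which
// a node can be reached along many paths. Without memoization the
// recursion expands a node once per path to it; with a visited flag,
// once in total.
struct Node {
    std::vector<Node *> preds;
    bool visited = false;
};

std::size_t backchain(Node *n, bool memoize) {
    if (memoize) {
        if (n->visited)
            return 0;        // already handled; skip, like the -42 marker
        n->visited = true;
    }
    std::size_t expansions = 1;
    for (Node *p : n->preds)
        expansions += backchain(p, memoize);
    return expansions;
}

// Build a chain where each node has two edges to its predecessor and
// count how many node expansions backchaining from the end performs.
std::size_t count_expansions(int depth, bool memoize) {
    std::vector<Node> nodes(depth + 1);
    for (int i = 1; i <= depth; ++i) {
        nodes[i].preds.push_back(&nodes[i - 1]);
        nodes[i].preds.push_back(&nodes[i - 1]); // second path to same node
    }
    return backchain(&nodes[depth], memoize);
}
```

At depth 10 this gives 2047 expansions without the marker versus 11 with it, mirroring the observed case where an operator contributing X times to the h_add value receives X recursive calls.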

I just looked at the h^cea code, and it looks like that may have a
similar problem, which probably means that the CG heuristic also has a
similar problem. So these should be fixed, too; I'll open a separate
issue for that. Unfortunately, I don't have a test case for h^cea (much
less h^CG), since on this task we get infinite h^cea values.

Patrik, how difficult would it be for you to generate an encoding of
this problem with binary variables only? In that case, h^cea should
equal h^add, so we should be able to reproduce the problem.

For (B) and (C), we need to look into the method that calls
collect_relaxed_plan:

int AdditiveHeuristic::compute_heuristic(const State &state) {
    setup_exploration_queue();
    setup_exploration_queue_state(state);
    relaxed_exploration();

    int total_cost = 0;
    for(int i = 0; i < goal_propositions.size(); i++) {
        int prop_cost = goal_propositions[i]->h_add_cost;
        if(prop_cost == -1)
            return DEAD_END;
        total_cost += prop_cost;
    }

    // [issue127] Comment out the following block to get rid of
    //            preferred operators altogether.
    RelaxedPlan relaxed_plan;
    // [issue127] To fix (B), remove the preceding line and replace
    //            it with the following two:
    // static RelaxedPlan relaxed_plan;
    // relaxed_plan.clear();
    relaxed_plan.resize(2 * total_cost);
    // [issue127] After fixing (B), remove the preceding line to fix (C).
    // Collecting the relaxed plan marks helpful actions as preferred.
    // The plan itself is not used, but it is good to have some sort
    // of mechanism not to visit the same operator twice during
    // backchaining.
    for(int i = 0; i < goal_propositions.size(); i++)
        collect_relaxed_plan(goal_propositions[i], relaxed_plan);
    // [issue127] The following commented-out line contains some
    //            diagnostics that could be used before fixing (B) and
    //            (C) to see how bad our plan size estimates are.
    // cout << "***" << total_cost << "+++" << relaxed_plan.size()
    //      << "..." << relaxed_plan.bucket_count() << endl;

    return total_cost;
}

Note that in the original form, ignoring the comments, this creates a
new relaxed plan (which is a hash_set) for every heuristic computation.
This causes overhead compared to generating it once and reusing it
every time, due to initialization of the hash_set, dynamic memory
allocation, etc. So our fix for issue (B) is to make the relaxed plan
static, which of course means we have to clear it at some point
(clearing doesn't make it smaller, it just empties the buckets, so there
should be no memory allocation overhead the next time it is filled).
Here it's cleared before being used, but it may be nicer to clear it
after being used, before "return total_cost".

Regarding issue (C), note the line "relaxed_plan.resize(2 *
total_cost);". The idea behind it is that we want to avoid too many
expensive resizings of the relaxed plan (which require rehashing etc.),
so we initialize the hash set to a "best guess" of how large it's going
to be. The factor 2 adds some slack, which hash tables need, although 2
is probably more than necessary. More importantly, our guess for the
required size is "total_cost", the h_add value of the state. In this
case, this is a terrible guess: e.g. for the initial state we have
h_add(init) = 15659154 and h_FF(init) = 683, so we overshoot by a
factor of more than 20000.

In the case where we fix neither (B) nor (C), this means that for every
heuristic computation we initialize a new hash set with about 30 million
buckets, where really a hash set with around a thousand entries would be
enough. You can see that this is going to be expensive.

The solution is simple: after we've made the relaxed plan static, there
is no reason to be afraid of reallocations any more, since the bucket
count will grow monotonically over time and hence the cost is easily
amortized over different heuristic value computations. So we simply get
rid of the resize.

(One issue here is that I don't actually know if there's a guarantee for
hash_sets that the bucket size doesn't decrease if you clear(); anyone
know something definitive about that? I'm assuming that there should be
such a guarantee since hash_set::clear is analogous to vector::reserve()
etc., but I'd rather be on the safe side.)
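For what it's worth, the question can be probed empirically with the standard containers. Below is a small sketch of the (B)+(C) reuse pattern, using std::unordered_set as a stand-in for the old hash_set, with fill_and_clear as a hypothetical stand-in for one heuristic evaluation. The C++ standard leaves the bucket count after clear() unspecified, but common implementations (libstdc++, libc++) keep the bucket array.

```cpp
#include <cstddef>
#include <unordered_set>

// Reuse one container across many "evaluations" instead of
// constructing a fresh, pre-sized hash set each time. clear() empties
// the buckets but, in common implementations, does not deallocate
// them, so once the set has reached its steady-state size, later
// fills trigger no fresh bucket allocations.
std::size_t fill_and_clear(std::unordered_set<int> &relaxed_plan, int n) {
    relaxed_plan.clear();        // empties buckets, typically keeps them
    for (int i = 0; i < n; ++i)
        relaxed_plan.insert(i);
    return relaxed_plan.size();
}
```

In libstdc++ and libc++, repeated fill_and_clear calls settle into a steady state with no further bucket reallocation, which is the behavior the static-relaxed-plan fix relies on.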
msg538 Author: erez Date: 2010-10-03.15:33:47
I can confirm the bug with lazy, and have already fixed it in r4957.

It was caused by initializing the lowest_bucket member in StandardOpenList to 
999999, which is lower than the heuristic value. I changed it to 
numeric_limits<int>::max().
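The failure mode generalizes to any running-minimum tracker. Here is a minimal sketch with a hypothetical track_min helper (not the actual StandardOpenList code): with the 999999 sentinel, a heuristic value like this task's 15659154 never becomes the tracked minimum, while numeric_limits<int>::max() is a safe identity element for taking minima over ints.

```cpp
#include <limits>

// Sketch of the sentinel bug (hypothetical helper, not planner code).
// A running minimum initialized to an arbitrary "large" constant like
// 999999 silently misbehaves for inputs above it; initializing to
// numeric_limits<int>::max() works for any int input.
int track_min(int current_lowest, int new_value) {
    return new_value < current_lowest ? new_value : current_lowest;
}
```

The same reasoning applies to any magic-number bound: a sentinel must dominate every value the data can take, and numeric_limits makes that explicit.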
msg537 Author: malte Date: 2010-10-03.15:17:16
Mystery solved, I think. Detailed explanation follows in a longer message.

Nosying everyone since this may affect all users of h_add and h_FF.

Very short summary: it's a performance issue which degrades performance of h_add
and h_FF to Omega(h_add value of the state), which can be terrible in cases
where h_add is grossly overestimating. In your example, this is the case: we
have h_add(init) = 15659154 compared to h_FF(init) = 683. (I don't think it
matters that the task has action costs since as far as I can tell these are
ignored by these heuristics at the moment.)

If anyone knows any benchmark domains where h_add >> h_FF, please let me know so
that we can look at performance there, too!


But firstly, Patrik, can you confirm the crash with lazy_greedy(add())?
If yes, we should open a separate issue for that.

I guess we should also open a separate issue for the slow reasonable order
computation on this task. Anyone got any ideas on that? (Silvia, don't bother
with this if you'd rather work on your thesis...)
msg536 Author: malte Date: 2010-10-03.14:05:00
Hmmm... I can confirm that the FF heuristic takes ages to evaluate states on
this example, that h^max is much faster and that landmark generation takes a
long time (aborted after a minute). However, I get a different result with
h^add: a crash more or less immediately after search is started:

helmert@alfons:~/planning/trunk/downward/search$ ./downward --search
'lazy_greedy(add())' < output
Simplifying transitions... done!
Conducting lazy best first search, bound = 2147483647
Initializing additive heuristic...
Simplifying 11802 unary operators... done! [8753 unary operators]
Best heuristic value: 15659154 [g=0, 1 evaluated, 0 expanded, t=1.28s]
Peak memory: 211216 KB
caught signal 11 -- exiting
Segmentation fault

Did you try with the same search algorithm?

This may be a (separate) issue with the "lazy_greedy" search related to the
large h values -- don't think we've seen values like that before, so maybe
there's another overflow issue somewhere there. With A* I get the same slowness
as with the FF heuristic.
msg535 Author: malte Date: 2010-10-02.17:31:04
As I recall it, Blai's change should make the code slower in cases where action
costs are low. (Unfortunately I don't see a way to implement things in a way
that is efficient both for unit-cost and high-action-cost settings.)

I really wonder what's going on here. Will look into it.

Note: the landmark generation code (in particular generating reasonable orders)
also appears to be very slow for this task.

Patrik, can you also attach the all.groups file (or original PDDL)? That is
needed for some parts of the landmark code.
msg534 Author: patrik Date: 2010-10-02.06:17:29
Example problem attached (only the output.sas file; note: this file was not
generated by the downward translator, but by other means; it's accepted by the
preprocessor, though, and the preprocessor's output is accepted by the search
engine, which generates a valid plan, so I don't think there's an issue with the
input file).

On this problem, the ff and additive heuristics are much slower than lmcut: the
former evaluate ~1 node/second, while lmcut manages ~400 nodes/sec. My first
guess was that this is because the problem has many zero cost actions (which are
not recognised by the ff or add heuristics), but the hmax() heuristic, which
also does not consider action costs, achieves a node evaluation rate that is
comparable with lmcut. (All heuristics were tested in astar() search, but the
slowness of ff and add remains also in, e.g., lazy_greedy.)

Might this have something to do with the improvement of lmcut that Blai did
recently? (cf. issue #116; although this domain doesn't have high action
costs... and it looks like Blai's fix isn't included in the version I'm using
anyway).
History
Date                 User    Action  Args
2011-03-05 05:46:08  malte   set     status: chatting -> resolved; messages: + msg1260
2011-03-05 03:44:36  patrik  set     status: resolved -> chatting; messages: + msg1259
2011-03-04 19:02:12  malte   set     status: testing -> resolved; messages: + msg1256
2011-03-04 18:27:13  malte   set     files: + issue127.soln; messages: + msg1253
2011-01-05 20:45:56  malte   set     status: chatting -> testing
2011-01-05 05:16:48  malte   set     messages: + msg1029
2010-10-15 18:14:15  malte   set     keyword: + 1.0
2010-10-04 07:28:18  patrik  set     messages: + msg543
2010-10-04 06:45:20  patrik  set     files: + binary-all.groups.gz
2010-10-04 06:44:57  patrik  set     files: + binary-output.sas.gz
2010-10-03 16:48:01  malte   set     messages: + msg542
2010-10-03 16:38:54  malte   set     messages: + msg541
2010-10-03 15:51:40  malte   set     messages: + msg540
2010-10-03 15:50:02  malte   set     messages: + msg539
2010-10-03 15:33:47  erez    set     messages: + msg538
2010-10-03 15:17:16  malte   set     nosy: + erez, gabi, silvia; messages: + msg537
2010-10-03 14:05:00  malte   set     messages: + msg536
2010-10-03 05:30:11  patrik  set     files: + all.groups.gz
2010-10-02 17:31:04  malte   set     status: unread -> chatting; assignedto: malte; messages: + msg535
2010-10-02 17:27:05  malte   set     nosy: + malte
2010-10-02 06:17:29  patrik  create