Issue127

Title: ff() and add() heuristics much slower than lmcut()
Priority: bug
Status: resolved
Superseder: (none)
Nosy List: erez, gabi, malte, patrik, silvia
Assigned To: malte
Keywords: 1.0

Created on 2010-10-02.06:17:29 by patrik, last changed by malte.

Files
File name Uploaded Type
all.groups.gz patrik, 2010-10-03.05:30:11 application/x-gzip
binary-all.groups.gz patrik, 2010-10-04.06:45:20 application/x-gzip
binary-output.sas.gz patrik, 2010-10-04.06:44:57 application/x-gzip
issue127.soln malte, 2011-03-04.18:27:13 application/octet-stream
output.sas.gz patrik, 2010-10-02.06:17:29 application/x-gzip
Messages
msg1260 Author: malte Date: 2011-03-05.05:46:08
I was just slightly surprised to see that the CG heuristic achieved the best
quality here (by quite a gap), so I was a little suspicious that something
fishy might be going on. But there is no reason to suspect anything untoward
related to the original issue, so you can indeed safely ignore this. :-)
msg1259 Author: patrik Date: 2011-03-05.03:44:36
I've long since forgotten which instance this was (and probably also changed the
domain), so I can't validate the plan. I can of course run the planner (with
these heuristic configs) on some problems from the (current version of the) same
domain.

But is validity really an issue? It should be unaffected by changes to the
heuristic, right? (And this is not a "complicated" domain; no cond effs with
simultaneous add/dels or other weird stuff ;))
msg1256 Author: malte Date: 2011-03-04.19:02:12
> I still have to look into the reasonable orders things.

I'm not sure if it has anything to do with reasonable orders actually, or if
this is just the last thing that is printed in the output. In any case, this is
now the separate issue222, and I'm closing this one.
msg1253 Author: malte Date: 2011-03-04.18:27:13
OK, I reran Patrik's tasks on the most recent code, and performance
looks much better -- for example eager_greedy(ff(cost_type=1)) solves
the task within 13.87 seconds using 41448 evaluations, so there are
several thousand evaluations per second. So the basic problem seems to
be fixed. :-) I still have to look into the reasonable orders things.

I tested four heuristics (ff, add, cea, cg) using cost_type=1 or
cost_type=2, with lazy_greedy and eager_greedy and without preferred
operators. (I also did some basic tests with preferred operators, in
which they made performance a bit worse.) We can solve the problem
with ff and cg (either search algorithm), but not with add or cea. On
the binary variant, cea and add give the same init_h value as
expected.

In detail, these are the results I got on the non-binary variant (60
second timeout, debug mode of the planner):

eager_greedy(cea(cost_type=1)):
init_h = 2761048
timed out

eager_greedy(cea(cost_type=2)):
init_h = 3571840
timed out

eager_greedy(add(cost_type=1)):
init_h = 15659154
timed out

eager_greedy(add(cost_type=2)):
init_h = 18667772
timed out

eager_greedy(ff(cost_type=1)):
init_h = 676
solved with length 705, cost 65 after 14.61 seconds

eager_greedy(ff(cost_type=2)):
init_h = 770
solved with length 701, cost 54 after 17.77 seconds

eager_greedy(cg(cost_type=1)):
init_h = 10383
solved with length 676, cost 40 after 16.57 seconds

eager_greedy(cg(cost_type=2)):
init_h = 10450
solved with length 692, cost 39 after 16.13 seconds

I'm attaching the best plan found with these configs so that you can
validate it (I can only validate PDDL-based stuff, but you probably
have some way to validate this, right?).
msg1029 Author: malte Date: 2011-01-05.05:16:48
This should be retested with the new (as of IPC) implementations of these
heuristics. I hope this is now fixed for ff() and add(), and should be fixed for
cea() and cg() once issue186 is resolved.
msg543 Author: patrik Date: 2010-10-04.07:28:18
Yes, I also had the crash with lazy_greedy(add()). (But no more, after update to
r4959.)

After applying fixes like those suggested, ff is much faster. (For issue A, I
added prop->reached_by = 0 just after the first "if"; that should have the same
effect, right?)

Binary versions of the problem files attached.
msg542 Author: malte Date: 2010-10-03.16:48:01
PS: Patrik, h^add() is still much slower here per node than h^max(), which is
due to the bucket-based implementation inside the heuristic; that implementation
is problematic when h values are very large compared to the number of
propositions in the task. But the performance difference on this task is now
much, much smaller than before.

We'll deal with the remaining wastage eventually as part of properly supporting
action costs, but it's somewhat difficult to do that in a way that doesn't
degrade performance in the "easy" cases, so that needs more thought. (I should
note that h^add and consequently FF(h^add) have this problem already in the
unit-cost case since h^add values can be large also for unit-cost problems. So
this is not strictly related to properly supporting action costs -- but we'll
need similar mechanisms in the implementation, so it's worth handling these
together.)
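The bucket-based wastage described above can be sketched in isolation. The following is a minimal toy queue, not the planner's actual data structure: both its memory footprint and its empty-bucket scanning scale with the largest cost value ever pushed, not with the number of queued entries, which is exactly what hurts when h values reach into the millions.

```cpp
#include <cstddef>
#include <vector>

// Minimal bucket-based priority queue (illustration only, not the
// planner's class). push/pop cost is dominated by the range of cost
// values: if costs run up to h^add-sized numbers while only a few
// thousand items are ever queued, most of the work is resizing the
// bucket array and scanning empty buckets.
class BucketQueue {
    std::vector<std::vector<int>> buckets; // buckets[cost] = queued items
    std::size_t current = 0;               // lowest possibly non-empty bucket
public:
    void push(std::size_t cost, int item) {
        if (cost >= buckets.size())
            buckets.resize(cost + 1);      // space grows with max cost value
        buckets[cost].push_back(item);
        if (cost < current)
            current = cost;
    }
    // Returns an item of minimal cost, or -1 if the queue is empty.
    int pop() {
        while (current < buckets.size() && buckets[current].empty())
            ++current;                     // scans over empty buckets
        if (current == buckets.size())
            return -1;
        int item = buckets[current].back();
        buckets[current].pop_back();
        return item;
    }
    std::size_t bucket_count() const { return buckets.size(); }
};
```

With a maximum cost of 1000000, this toy queue allocates over a million buckets to hold three items; a binary heap would do the same job in space proportional to the number of items and O(log n) time per operation.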
msg541 Author: malte Date: 2010-10-03.16:38:54
A proper fix for this for the additive heuristic has been committed in r4958;
relaxed plans are no longer computed there.

I made some basic sanity tests, but more tests are of course appreciated.

Still need to fix this for the FF heuristic and check other heuristics that may
have related issues:
 * FF heuristic in the LAMA/FF synergy implementation
 * context-enhanced additive heuristic
 * causal graph heuristic

Also, we still need to look into the reasonable order thing.
msg540 Author: malte Date: 2010-10-03.15:51:40
Great, thanks Erez!
msg539 Author: malte Date: 2010-10-03.15:50:02
Here's the explanation of what's going on. The relaxed plan computation
(AdditiveHeuristic::collect_relaxed_plan and the code that calls it) is
the culprit. Why does the additive heuristic compute a relaxed plan at
all? Here's the explanation from the code (all code in this message from
additive_heuristic.cc):

// Collecting the relaxed plan marks helpful actions as preferred.
// The plan itself is not used, but it is good to have some sort
// of mechanism not to visit the same operator twice during
// backchaining.

(Somewhat ironically, that purpose of not revisiting the same operator
is not actually achieved at all -- see point (A) below. However, let's
assume we want to keep the relaxed plan, e.g. for fixing this for the FF
heuristic.)

The max and LM-cut heuristics don't support preferred operators, so they
don't compute relaxed plans and hence don't suffer from this slowness.

The relaxed plan computation is not actually necessary if you don't want
preferred operators, so the whole block related to relaxed plans in
AdditiveHeuristic::compute_heuristic could be safely commented out for
your purposes. But of course it's interesting to know why relaxed plan
computation is so slow, and it can be fixed. There is a combination of
factors at work here:

  (A) unnecessary recursion
  (B) unnecessary dynamic memory allocation
  (C) bad guesses of how large the relaxed plan will be

To give some runtime numbers showing how much each factor matters here, I've
hacked SearchProgress::inc_evaluated() in search_progress.h to exit after 100
evaluated states and then run
# ./downward --search 'astar(add())' < output
on a bunch of different versions of the code, which gives the following
runtime (CPU time sys + user):

baseline (r4956):                 23.609 seconds
baseline + (A) fixed:              5.408 seconds
baseline + (A) + (B) fixed:        2.576 seconds
baseline + (A) + (B) + (C) fixed:  0.764 seconds
baseline without relaxed plans:    0.760 seconds

And here's what is wrong with the code and how to fix it. I'm quoting
the two relevant methods from additive_heuristic.cc in their original
form (baseline) with some comments added that explain how to fix (A),
(B) and (C):

First, (A): unnecessary recursions. Here's the code of
collect_relaxed_plan that computes the relaxed plan and as a side effect
marks its applicable operators as preferred.

void AdditiveHeuristic::collect_relaxed_plan(Proposition *goal,
                                             RelaxedPlan &relaxed_plan) {
    UnaryOperator *unary_op = goal->reached_by;
    if(unary_op) { // We have not yet chained back to a start node.
        // [issue127] To fix (A), uncomment the following three lines:
        // if (unary_op->unsatisfied_preconditions == -42)
        //     return;
        // unary_op->unsatisfied_preconditions = -42;
        for(int i = 0; i < unary_op->precondition.size(); i++)
            collect_relaxed_plan(unary_op->precondition[i], relaxed_plan);
        const Operator *op = unary_op->op;
        bool added_to_relaxed_plan = relaxed_plan.insert(op).second;
        if(added_to_relaxed_plan
           && unary_op->h_add_cost == unary_op->base_cost
           && !op->is_axiom()) {
            set_preferred(op); // This is a helpful action.
        }
    }
}

If you ignore the code in the comments, you'll see that the recursion
can revisit the same unary operator a huge number of times --
essentially, if an operator contributes to the h_add value X times, it
will receive X recursive calls. In cases where h_add >> h_FF, this is a
huge overhead, and the commented-out code removes it. (Of course that's
a hackish way, not how it should actually be fixed.)
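The revisit blow-up is easy to reproduce on a toy graph. In this sketch (Node, backchain and count_expansions are illustration names, not planner code), each node in a chain has two predecessor edges pointing at the same earlier node, so an unmemoized backchaining recursion performs 2^(depth+1) - 1 expansions, while the visited-marker variant, analogous to the commented-out fix above, expands each node exactly once.

```cpp
#include <cstddef>
#include <vector>

// Toy illustration of fix (A): backchaining through a graph in which
// a node can be reached along many paths. Without memoization the
// recursion expands a node once per path to it; with a visited flag,
// once in total.
struct Node {
    std::vector<Node *> preds;
    bool visited = false;
};

std::size_t backchain(Node *n, bool memoize) {
    if (memoize) {
        if (n->visited)
            return 0;        // already handled; skip, like the -42 marker
        n->visited = true;
    }
    std::size_t expansions = 1;
    for (Node *p : n->preds)
        expansions += backchain(p, memoize);
    return expansions;
}

// Build a chain where each node has two edges to its predecessor and
// count how many node expansions backchaining from the end performs.
std::size_t count_expansions(int depth, bool memoize) {
    std::vector<Node> nodes(depth + 1);
    for (int i = 1; i <= depth; ++i) {
        nodes[i].preds.push_back(&nodes[i - 1]);
        nodes[i].preds.push_back(&nodes[i - 1]); // second path to same node
    }
    return backchain(&nodes[depth], memoize);
}
```

At depth 10 this gives 2047 expansions without the marker versus 11 with it, mirroring the observed case where an operator contributing X times to the h_add value receives X recursive calls.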

I just looked at the h^cea code, and it looks like that may have a
similar problem, which probably means that the CG heuristic also has a
similar problem. So these should be fixed, too; I'll open a separate
issue for that. Unfortunately, I don't have a test case for h^cea (much
less h^CG), since on this task we get infinite h^cea values.

Patrik, how difficult would it be for you to generate an encoding of
this problem with binary variables only? In that case, h^cea should
equal h^add, so we should be able to reproduce the problem.

For (B) and (C), we need to look into the method that calls
collect_relaxed_plan:

int AdditiveHeuristic::compute_heuristic(const State &state) {
    setup_exploration_queue();
    setup_exploration_queue_state(state);
    relaxed_exploration();

    int total_cost = 0;
    for(int i = 0; i < goal_propositions.size(); i++) {
        int prop_cost = goal_propositions[i]->h_add_cost;
        if(prop_cost == -1)
            return DEAD_END;
        total_cost += prop_cost;
    }

    // [issue127] Comment out the following block to get rid of
    //            preferred operators altogether.
    RelaxedPlan relaxed_plan;
    // [issue127] To fix (B), remove the preceding line and replace
    //            it with the following two:
    // static RelaxedPlan relaxed_plan;
    // relaxed_plan.clear();
    relaxed_plan.resize(2 * total_cost);
    // [issue127] After fixing (B), remove the preceding line to fix (C).
    // Collecting the relaxed plan marks helpful actions as preferred.
    // The plan itself is not used, but it is good to have some sort
    // of mechanism not to visit the same operator twice during
    // backchaining.
    for(int i = 0; i < goal_propositions.size(); i++)
        collect_relaxed_plan(goal_propositions[i], relaxed_plan);
    // [issue127] The following commented-out line contains some
    //            diagnostics that could be used before fixing (B) and
    //            (C) to see how bad our plan size estimates are.
    // cout << "***" << total_cost << "+++" << relaxed_plan.size()
    //      << "..." << relaxed_plan.bucket_count() << endl;

    return total_cost;
}

Note that in the original form, ignoring the comments, this creates a
new relaxed plan (which is a hash_set) for every heuristic computation.
This causes overhead compared to generating it once and reusing it
every time, due to initialization of the hash_set, dynamic memory
allocation, etc. So our fix for issue (B) is to make the relaxed plan
static, which of course means we have to clear it at some point
(clearing doesn't make it smaller, it just empties the buckets, so there
should be no memory allocation overhead the next time it is filled).
Here it's cleared before being used, but it may be nicer to clear it
after being used, before "return total_cost".

Regarding issue (C), note the line "relaxed_plan.resize(2 *
total_cost);". The idea behind it is that we want to avoid too many
expensive resizings of the relaxed plan (which require rehashing etc.),
so we initialize the hash set to a "best guess" of how large it's going
to be. The factor 2 adds some slack, which hash tables need, although 2
is probably more than necessary. More importantly, our guess for the
required size is "total_cost", the h_add value of the state. In this
case, this is a terrible guess: e.g. for the initial state we have
h_add(init) = 15659154 and h_FF(init) = 683, so we overshoot by a
factor of more than 20000.

In the case where we fix neither (B) nor (C), this means that for every
heuristic computation we initialize a new hash set with about 30 million
buckets, where really a hash set with around a thousand entries would be
enough. You can see that this is going to be expensive.

The solution is simple: after we've made the relaxed plan static, there
is no reason to be afraid of reallocations any more, since the bucket
count will grow monotonically over time and hence the cost is easily
amortized over different heuristic value computations. So we simply get
rid of the resize.

(One issue here is that I don't actually know if there's a guarantee for
hash_sets that the bucket size doesn't decrease if you clear(); anyone
know something definitive about that? I'm assuming that there should be
such a guarantee since hash_set::clear is analogous to vector::reserve()
etc., but I'd rather be on the safe side.)
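For what it's worth, the question can be probed empirically with the standard containers. Below is a small sketch of the (B)+(C) reuse pattern, using std::unordered_set as a stand-in for the old hash_set, with fill_and_clear as a hypothetical stand-in for one heuristic evaluation. The C++ standard leaves the bucket count after clear() unspecified, but common implementations (libstdc++, libc++) keep the bucket array.

```cpp
#include <cstddef>
#include <unordered_set>

// Reuse one container across many "evaluations" instead of
// constructing a fresh, pre-sized hash set each time. clear() empties
// the buckets but, in common implementations, does not deallocate
// them, so once the set has reached its steady-state size, later
// fills trigger no fresh bucket allocations.
std::size_t fill_and_clear(std::unordered_set<int> &relaxed_plan, int n) {
    relaxed_plan.clear();        // empties buckets, typically keeps them
    for (int i = 0; i < n; ++i)
        relaxed_plan.insert(i);
    return relaxed_plan.size();
}
```

In libstdc++ and libc++, repeated fill_and_clear calls settle into a steady state with no further bucket reallocation, which is the behavior the static-relaxed-plan fix relies on.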
msg538 Author: erez Date: 2010-10-03.15:33:47
I can confirm the bug with lazy, and have already fixed it in r4957.

It was caused by initializing the lowest_bucket member in StandardOpenList to 
999999, which is lower than the heuristic value. I changed it to 
numeric_limits<int>::max().
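The failure mode generalizes to any running-minimum tracker. Here is a minimal sketch with a hypothetical track_min helper (not the actual StandardOpenList code): with the 999999 sentinel, a heuristic value like this task's 15659154 never becomes the tracked minimum, while numeric_limits<int>::max() is a safe identity element for taking minima over ints.

```cpp
#include <limits>

// Sketch of the sentinel bug (hypothetical helper, not planner code).
// A running minimum initialized to an arbitrary "large" constant like
// 999999 silently misbehaves for inputs above it; initializing to
// numeric_limits<int>::max() works for any int input.
int track_min(int current_lowest, int new_value) {
    return new_value < current_lowest ? new_value : current_lowest;
}
```

The same reasoning applies to any magic-number bound: a sentinel must dominate every value the data can take, and numeric_limits makes that explicit.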
msg537 Author: malte Date: 2010-10-03.15:17:16
Mystery solved, I think. Detailed explanation follows in a longer message.

Nosying everyone since this may affect all users of h_add and h_FF.

Very short summary: it's a performance issue which degrades performance of h_add
and h_FF to Omega(h_add value of the state), which can be terrible in cases
where h_add is grossly overestimating. In your example, this is the case: we
have h_add(init) = 15659154 compared to h_FF(init) = 683. (I don't think it
matters that the task has action costs since as far as I can tell these are
ignored by these heuristics at the moment.)

If anyone knows any benchmark domains where h_add >> h_FF, please let me know so
that we can look at performance there, too!


But firstly, Patrik, can you confirm the crash with lazy_greedy(add())?
If yes, we should open a separate issue for that.

I guess we should also open a separate issue for the slow reasonable order
computation on this task. Anyone got any ideas on that? (Silvia, don't bother
with this if you'd rather work on your thesis...)
msg536 Author: malte Date: 2010-10-03.14:05:00
Hmmm... I can confirm that the FF heuristic takes ages to evaluate states on
this example, that h^max is much faster and that landmark generation takes a
long time (aborted after a minute). However, I get a different result with
h^add: a crash more or less immediately after search is started:

helmert@alfons:~/planning/trunk/downward/search$ ./downward --search
'lazy_greedy(add())' < output
Simplifying transitions... done!
Conducting lazy best first search, bound = 2147483647
Initializing additive heuristic...
Simplifying 11802 unary operators... done! [8753 unary operators]
Best heuristic value: 15659154 [g=0, 1 evaluated, 0 expanded, t=1.28s]
Peak memory: 211216 KB
caught signal 11 -- exiting
Segmentation fault

Did you try with the same search algorithm?

This may be a (separate) issue with the "lazy_greedy" search related to the
large h values -- don't think we've seen values like that before, so maybe
there's another overflow issue somewhere there. With A* I get the same slowness
as with the FF heuristic.
msg535 Author: malte Date: 2010-10-02.17:31:04
As I recall it, Blai's change should make the code slower in cases where action
costs are low. (Unfortunately I don't see a way to implement things in a way
that is efficient both for unit-cost and high-action-cost settings.)

I really wonder what's going on here. Will look into it.

Note: the landmark generation code (in particular generating reasonable orders)
also appears to be very slow for this task.

Patrik, can you also attach the all.groups file (or original PDDL)? That is
needed for some parts of the landmark code.
msg534 Author: patrik Date: 2010-10-02.06:17:29
Example problem attached (only the output.sas file; note: this file was not
generated by the downward translator, but by other means; it's accepted by the
preprocessor, though, and the preprocessor's output is accepted by the search
engine, which generates a valid plan, so I don't think there's an issue with the
input file).

On this problem, the ff and additive heuristics are much slower than lmcut: the
former evaluate ~1 node/second, while lmcut manages ~400 nodes/sec. My first
guess was that this is because the problem has many zero cost actions (which are
not recognised by the ff or add heuristics), but the hmax() heuristic, which
also does not consider action costs, achieves a node evaluation rate that is
comparable with lmcut. (All heuristics were tested in astar() search, but the
slowness of ff and add remains also in, e.g., lazy_greedy.)

Might this have something to do with the improvement of lmcut that Blai did
recently? (cf. issue #116; although this domain doesn't have high action
costs... and it looks like Blai's fix isn't included in the version I'm using
anyway).
History
Date                 User    Action  Args
2011-03-05 05:46:08  malte   set     status: chatting -> resolved; messages: + msg1260
2011-03-05 03:44:36  patrik  set     status: resolved -> chatting; messages: + msg1259
2011-03-04 19:02:12  malte   set     status: testing -> resolved; messages: + msg1256
2011-03-04 18:27:13  malte   set     files: + issue127.soln; messages: + msg1253
2011-01-05 20:45:56  malte   set     status: chatting -> testing
2011-01-05 05:16:48  malte   set     messages: + msg1029
2010-10-15 18:14:15  malte   set     keyword: + 1.0
2010-10-04 07:28:18  patrik  set     messages: + msg543
2010-10-04 06:45:20  patrik  set     files: + binary-all.groups.gz
2010-10-04 06:44:57  patrik  set     files: + binary-output.sas.gz
2010-10-03 16:48:01  malte   set     messages: + msg542
2010-10-03 16:38:54  malte   set     messages: + msg541
2010-10-03 15:51:40  malte   set     messages: + msg540
2010-10-03 15:50:02  malte   set     messages: + msg539
2010-10-03 15:33:47  erez    set     messages: + msg538
2010-10-03 15:17:16  malte   set     nosy: + erez, gabi, silvia; messages: + msg537
2010-10-03 14:05:00  malte   set     messages: + msg536
2010-10-03 05:30:11  patrik  set     files: + all.groups.gz
2010-10-02 17:31:04  malte   set     status: unread -> chatting; assignedto: malte; messages: + msg535
2010-10-02 17:27:05  malte   set     nosy: + malte
2010-10-02 06:17:29  patrik  create