Issue 77: Add evaluation contexts - Fast Downward issue tracker

Title	Add evaluation contexts
Priority	wish	Status	resolved
Superseder		Nosy List	erez, florian, gabi, jendrik, malte, silvia
Assigned To	malte	Keywords
Optional summary

Created on 2010-03-01.10:21:12 by erez, last changed by malte.

Files
File name	Uploaded	Type	Edit	Remove
relativescatter.py	florian, 2015-06-04.16:33:16	text/x-python
svn.diff	erez, 2010-03-03.13:08:46	application/octet-stream

Messages
msg4259 (view)	Author: malte	Date: 2015-06-05.13:03:42
Great! :-)
msg4258 (view)	Author: jendrik	Date: 2015-06-05.12:12:40
Making blind search faster is issue537.
msg4256 (view)	Author: jendrik	Date: 2015-06-05.12:04:20
I'll make a small announcement on the mailing list.
msg4255 (view)	Author: jendrik	Date: 2015-06-05.12:03:23
I converted your notes into https://bitbucket.org/jendrikseipp/lab/issue/18/add-relative-scatter-plot-report issue77 is merged and pushed :)
msg4254 (view)	Author: malte	Date: 2015-06-05.11:12:32
I see, so 0.0 is the baseline rather than 1.0 as I thought? Then the slowdown is much less dramatic than I thought (which makes sense -- I didn't remember it to be this severe from my previous experiment). If we add the report to lab, can we change the y labelling to something a bit more intuitive? I think we had misunderstandings over the correct interpretation before, and if I recall correctly, there is also currently something asymmetric about values above vs. below the baseline. I'll write up some suggestions about the plot here, but feel free to move them somewhere more appropriate, e.g. if you want to open a lab issue: - Show the ratio between the two values on the y axis, so 1.0 means "the same", > 1 means "larger than base" and < 1 means "smaller than base". (As usual, we need to do something about zero values.) - Use a logarithmic scale for the y axis. Then smaller changes are more visible (which is what we want here) and changes that are equally dramatic up or down are at the same distance from the baseline. With a non-logarithmic scale, the image of the ratios would be optically misleading in the sense that random data would not cluster evenly around the baseline. - Make sure that outliers don't dominate the range of the scale. If we automatically set the y range to fit the largest data point, then a single extreme value means that all the "normal" data points get tightly squashed around the baseline. One way to achieve this is to fix the y range to something like [0.5, 2.0]; another to set the y range in such a way that it fits a certain percentage of the data points (say 95%). In either case, outliers should be shown too in the same way we show unsolved tasks as outliers in our scatter plots in papers. - Always make the y range symmetric so that the baseline ends up in the middle. Then it's visually immediately clear if something is good, bad or neutral.
msg4253 (view)	Author: florian	Date: 2015-06-05.10:35:30
I think a point on y = 0.4 means 40% more time (140% of the x-coordinate), not a factor of 4.
msg4252 (view)	Author: malte	Date: 2015-06-05.00:19:42
Thanks! If I read this correctly, things tend to be a factor 2-4 slower in blind search, which is quite significant, but I still think we can merge this and worry about optimization later. I also didn't see anything terrifying in the eager satisficing search results. If everybody else is happy with this, I think this can be merged. Thanks a lot for all the work, Jendrik! :-)
msg4251 (view)	Author: jendrik	Date: 2015-06-05.00:11:05
Thanks, Florian! Here is the relative scatter plot: http://ai.cs.unibas.ch/_tmp_files/seipp/astar_blind-total_time.png Also, the results for eager search with satisficing heuristics are in: http://ai.cs.unibas.ch/_tmp_files/seipp/issue77-issue77-v7-sat-eager-issue77-v7-base-issue77-v7-compare.html
msg4243 (view)	Author: florian	Date: 2015-06-04.16:33:16
I can't find it in the experiments either (where have we used that so far?) I'm attaching the version of the file that I found on my harddisk. We should merge this into lab sometime, preferably after updating the scatter plot class.
msg4242 (view)	Author: jendrik	Date: 2015-06-04.15:12:18
Florian, can you add a note about the report that Malte mentions to the experiments/README file, please? I can't find it in the experiments.
msg4241 (view)	Author: malte	Date: 2015-06-04.15:07:20
> Here is the scatter plot: Thanks, but for things like these, can we make it a default to use the modified scatter plot we discussed a while ago that makes it easier to see differences that are in the factor of 1-2 range? (The one where equal results end up in a horizontal line.) Perhaps we can come up with a name for this kind of plot to avoid confusion in the future.
msg4240 (view)	Author: jendrik	Date: 2015-06-04.14:54:57
Here is the scatter plot: http://ai.cs.unibas.ch/_tmp_files/seipp/issue77-issue77-v7-opt-issue77-v7-base-issue77-v7-total_time-astar_blind.png Making iPDB faster again is issue536. Should I open a separate issue for making blind search faster again?
msg4238 (view)	Author: malte	Date: 2015-06-04.12:36:19
I suggest opening a separate issue for iPDB. We know where the performance degradation is coming from and that it can be reversed, so it should not keep us up here. The new issue should mention that we should do a comparison to a revision of the default branch that does not yet include the issue77 code. Before we close this, I would also like to have a better overview of the performance impact on blind search. Coverage remains the same because we always run out of memory before running out of memory, but there is still a significant slowdown that we should measure and perhaps do something about later. (Even if it's not a problem with IPC settings, it could be a problem for other people with shorter timeouts or more generally who case about speeds.) Can we add a scatter plot with total time for blind search? All other numbers look OK to me: there are slowdowns across the board, but if we can make blind search faster again, the other ones will be faster, too.
msg4237 (view)	Author: jendrik	Date: 2015-06-04.12:25:24
The comments from the latest code review have been applied and the first experiment is finished: http://ai.cs.unibas.ch/_tmp_files/seipp/issue77-issue77-v7-opt-issue77-v7-base-issue77-v7-compare.html Coverage drops significantly for ipdb on pegsol (due to hill-climbing taking longer). Should we try to fix this in this issue or open a new one?
msg4228 (view)	Author: jendrik	Date: 2015-06-01.12:00:48
Malte, I have incorporated your comments from the code review into the evaluation context classes and the lazy search. The experiment comparing the old (last revision on "default" that we merged into "issue77") and new (tip on "issue77") lazy search looks good, I think: http://ai.cs.unibas.ch/_tmp_files/seipp/issue77-issue77-v5-sat-lazy-issue77-v5-base-issue77-v5-compare.html You can find the diff of the compared revisions at https://bitbucket.org/jendrikseipp/downward/branches/compare/issue77-v5..issue77-v5-base#diff I have also finished adapting EHC to the new evaluation contexts. The corresponding experiment results are at http://ai.cs.unibas.ch/_tmp_files/seipp/issue77-issue77-v6-sat-ehc-issue77-v6-base-issue77-v6-compare.html and the diff is at https://bitbucket.org/jendrikseipp/downward/branches/compare/issue77-v6..issue77-v6-base#diff The only difference between issue77-v5 and issue77-v6 are some fixes to EHC: https://bitbucket.org/jendrikseipp/downward/branches/compare/issue77-v6..issue77-v5
msg4097 (view)	Author: malte	Date: 2015-03-19.17:56:28
The branch contains a working implementation of the evaluation contexts, and it has enabled a large number of simplifications related to the open list and evaluator interfaces. A first performance test looks alright, but at the moment all search algorithms other than eager search are disabled. (They have not been converted to the new interfaces yet.) I also had to disable some heuristics that used unusual features of the old interfaces. I'll work on making these disabled features available again, after which I hope we'll be able to merge this soon (if performance numbers remain alright).
msg4042 (view)	Author: malte	Date: 2015-03-09.13:37:13
I've started working on this again, but now with a slightly different intention. The idea is that the evaluation context can be used to store heuristic information and preferred operators for the current state, which helps a lot in simplifying our current rather complex and fragile information flow between search, open lists and evaluators/heuristics. The work in progress is here: https://bitbucket.org/malte/downward/branch/issue77 It already has much simpler code for the open lists and (most) evaluators, but it is still in an intermediate state and needs work. Many parts of the code are removed for the Makefile for the time being because they are not converted yet. The only search algorithm available at the moment is eager search.
msg2306 (view)	Author: florian	Date: 2012-07-30.11:39:05
I have an idea for that (second half of msg2297) but I think we could do this in two steps. First make the assumptions explicit so we() get a better overview of where they are needed and then try to avoid having to make them in the first place. () Ok, you probably have that overview already, so that "we" would mostly be myself.
msg2305 (view)	Author: erez	Date: 2012-07-30.11:21:29
Thanks, now I understand what it's supposed to do. However, I'm still not convinced that this is the right thing to do - I think a bigger change might actually simplify the code, rather than just making the same assumptions we have now explicit. I don't have any concrete ideas, but I think we should discuss this at some point.
msg2304 (view)	Author: malte	Date: 2012-07-30.11:14:09
> My approach would be to take one or two tasks of medium difficulty from each of > the 2011 domains to avoid testing only one kind of problem structure. Any > suggestions? We're not utilizing the grid heavily, so I suggest simply doing what we usually do: test on all applicable tasks.
msg2303 (view)	Author: florian	Date: 2012-07-30.11:12:38
Yes, I also planned on running some tasks for each of the searches to check for errors I might have introduced (e.g. wrong evaluation context). We should probably wait with tests until the general idea of the patch has stabilized a bit more. The current status is more that of a suggestion. My approach would be to take one or two tasks of medium difficulty from each of the 2011 domains to avoid testing only one kind of problem structure. Any suggestions?
msg2302 (view)	Author: malte	Date: 2012-07-30.10:58:43
> change the efficiency check the efficiency
msg2301 (view)	Author: malte	Date: 2012-07-30.10:58:25
In addition to a code review, we should also change the efficiency of whatever changes we make. For the proposed change so far, a before/after comparison with A* + blind heuristic would be the best test.
msg2300 (view)	Author: florian	Date: 2012-07-30.10:37:10
I didn't mean to press you for a review with this upload, Erez (although I don't mind if you do one, of course :-)). Gabi asked me to upload the patch to Rietveld so she could comment on it there. As for the description, this is everything we discussed for issue 344 + a change in the signature of the methods of Evaluator, ScalarEvaluator and Heuristic. All methods that actually do an evaluation or set the values for a state, start a context. This context is identified by state.get_id(). All other functions (is_dead_end, get_value, get_heuristic, get_preferred_operators) are passed this context and check whether it matches the last evaluated context. If it does, the call is forwarded to methods like is_last_evaluated_dead_end. Basically it is the same as before, but the assumption that a state is first evaluated and then queried for its properties is made explicit and checked during execution time. The rest of the patch just adds the (hopefully) correct state id as a parameter to all occurrences of the changed methods.
msg2299 (view)	Author: erez	Date: 2012-07-29.11:03:28
It's kind of hard to comment on this, without knowing what it's supposed to do. Do you have some kind of high level description of what this does?
msg2298 (view)	Author: florian	Date: 2012-07-27.15:12:15
The full patch is now on Rietveld for review: http://codereview.appspot.com/6454058
msg2297 (view)	Author: florian	Date: 2012-07-24.13:43:38
I started implementing a patch at https://bitbucket.org/flogo/fast-downward/compare/issue77..default this includes the patch for issue344. If you only want to see the changes relative to the patch we discussed over there take a look at the last changeset https://bitbucket.org/flogo/fast-downward/changeset/35ed4f4733be There are still a couple of open questions: * Right now I startet a new context in a heuristic, whenever a call to evaluate() or set_evaluator_value() was made. An alternative would be to start the context when the node is reached (i.e. in Heuristic::reach_state()) but that would require a call to evaluate afterwards, so the values are actually calculated. * In the enforced hill climbing search, I'm not so sure about which state to use as a context when iterating over the operators. Could someone take a look at the patch there? * AlternationOpenList::insert() uses the current context right now, which assumes that insert() is always called after an evaluate on the respective state. Should we add a context to the insert method's signature? ---- An alternative to the patch would be to completely remove the evaluate() method from the classes Evaluator and Heuristic and let the Evaluators decide if/when they need to do the evaluation. The get_heuristic method could then look something like this: if validate_context(state.get_id()) { return last_evaluated_value; } else { start_evaluation_context(state.get_id()); last_evaluated_value = calculate_heuristic(state) return last_evaluated_value; } This sounds much cleaner to me, but I have to admit that I didn't get the distinction between Heuristic, ScalarEvaluator and Evaluator completely. Since the Evaluator's evaluate() method does not have a state parameter, it doesn't have the necessary information to evaluate its contained heuristics. As far as I can see it all Evaluators except GEvaluator and PrefEvaluator depend on the state directly or indirectly. Those that do not depend on the state directly (e.g. the open lists) depend on evaluate(state) to be called for their contained heuristics before they are evaluated themselves.
msg2296 (view)	Author: florian	Date: 2012-07-20.17:39:18
I'll start working on this using state ids as an evaluation context (see issue 344)
msg261 (view)	Author: malte	Date: 2010-03-03.16:09:19
OK, let's keep this issue open to keep track of the evaluation context idea. I've changed the issue title.
msg260 (view)	Author: erez	Date: 2010-03-03.15:50:59
Reverted
msg259 (view)	Author: malte	Date: 2010-03-03.13:24:43
Maybe that's the best solution for now, in terms of keeping the code as simple as possible. A method for dumping the search space would be generally useful, I think, so that could be added to the search space class directly.
msg258 (view)	Author: erez	Date: 2010-03-03.13:21:56
It's not really NEEDED, but I do have something that dumps the search space, including the g-value of each state. I also have the option to use the g-value of a state as a feature for selective max, but I don't use it since it doesn't really help, and it's also ugly code. I would not mind at all if we just forgot about this completely, and sometime later think about the evaluation context you suggested.
msg257 (view)	Author: malte	Date: 2010-03-03.13:14:32
Not nice indeed. :-( If it has a different signature, it should probably also have a different name, but two similar methods that duplicate functionality is also not nice. (On the other hand, it may be comparable to the map access methods, where there are also two methods, one that only works if the element exists and one that always works and returns a pair like in your case.) Where is this method needed? It might be easier to come up with a good interface for this if we know the use cases.
msg256 (view)	Author: erez	Date: 2010-03-03.13:08:46
I did this, but since get_node() can create a SearchNode, and get_node() const can not, I returned a pair of <bool, SearchNode> to handle the case of a non-existing node. It came out pretty ugly, so I'm attaching the svn diff. Please take a look and let me know if you think I should check this in or not.
msg255 (view)	Author: malte	Date: 2010-03-03.00:55:22
Right, the usual solution in cases like that is to have two methods called get_node, one that is const and one that isn't. That's a very common pattern; all STL containers do that.
msg254 (view)	Author: erez	Date: 2010-03-03.00:52:59
This might be a problem, since get_node(state) is not const. However, we could create a similar const method, and use that in the heuristic.
msg253 (view)	Author: malte	Date: 2010-03-02.13:03:20
I had another idea for giving the heuristics access to that information, which was to define an "evaluation context" class that would be passed to the evaluator and would include things such as the g value. The advantage I see for that over passing the whole SearchSpace is that it would make for a leaner interface, i.e., less coupling. I was hoping that with such an evaluation context we might also be able to implement lazy vs. eager search by just using slightly different evaluators, reusing the same search code. But I haven't really thought this through properly yet, and in the meantime I'm fine with passing a pointer to the search space to the heuristic as long as it is const. I'd be against passing a non-const pointer.
msg252 (view)	Author: erez	Date: 2010-03-01.10:21:12
Now that the SearchSpace is located in the SearchEngine, we can add a pointer to the SearchSpace object from the Heuristic, which would: (a) allow us to get rid of the global SearchSpace object (b) allow a heuristic to have access to node information (g, etc.), which was the reason for the global search space from the beginning. What does everyone think?

History
Date	User	Action	Args
2015-06-05 13:03:42	malte	set	messages: + msg4259
2015-06-05 12:12:40	jendrik	set	messages: + msg4258
2015-06-05 12:04:20	jendrik	set	status: chatting -> resolved messages: + msg4256
2015-06-05 12:03:23	jendrik	set	messages: + msg4255 title: Add "evaluation contexts", to be passed into the evaluators by the search methods? -> Add evaluation contexts
2015-06-05 11:12:32	malte	set	messages: + msg4254
2015-06-05 10:35:30	florian	set	messages: + msg4253
2015-06-05 00:19:42	malte	set	messages: + msg4252
2015-06-05 00:11:05	jendrik	set	messages: + msg4251
2015-06-04 16:33:16	florian	set	files: + relativescatter.py messages: + msg4243
2015-06-04 15:12:18	jendrik	set	messages: + msg4242
2015-06-04 15:07:20	malte	set	messages: + msg4241
2015-06-04 14:54:57	jendrik	set	messages: + msg4240
2015-06-04 12:36:19	malte	set	messages: + msg4238
2015-06-04 12:25:24	jendrik	set	messages: + msg4237
2015-06-01 12:00:48	jendrik	set	nosy: + jendrik messages: + msg4228
2015-03-19 17:56:29	malte	set	messages: + msg4097
2015-03-09 13:37:13	malte	set	assignedto: florian -> malte messages: + msg4042
2012-07-30 11:39:05	florian	set	messages: + msg2306
2012-07-30 11:21:30	erez	set	messages: + msg2305
2012-07-30 11:14:09	malte	set	messages: + msg2304
2012-07-30 11:12:38	florian	set	messages: + msg2303
2012-07-30 10:58:43	malte	set	messages: + msg2302
2012-07-30 10:58:25	malte	set	messages: + msg2301
2012-07-30 10:37:11	florian	set	messages: + msg2300
2012-07-29 11:03:29	erez	set	messages: + msg2299
2012-07-27 15:12:15	florian	set	messages: + msg2298
2012-07-24 13:43:39	florian	set	messages: + msg2297
2012-07-20 17:39:19	florian	set	assignedto: erez -> florian messages: + msg2296 nosy: + florian
2010-03-03 16:09:19	malte	set	messages: + msg261 title: Poll: should a heuristic know the SearchSpace? -> Add "evaluation contexts", to be passed into the evaluators by the search methods?
2010-03-03 15:50:59	erez	set	messages: + msg260
2010-03-03 13:24:43	malte	set	messages: + msg259
2010-03-03 13:21:56	erez	set	messages: + msg258
2010-03-03 13:14:32	malte	set	messages: + msg257
2010-03-03 13:08:46	erez	set	files: + svn.diff messages: + msg256
2010-03-03 00:55:22	malte	set	messages: + msg255
2010-03-03 00:52:59	erez	set	messages: + msg254
2010-03-02 13:03:20	malte	set	messages: + msg253
2010-03-01 10:21:12	erez	create

Issue77