Issue77

Title Add evaluation contexts
Priority wish Status resolved
Superseder Nosy List erez, florian, gabi, jendrik, malte, silvia
Assigned To malte Keywords
Optional summary

Created on 2010-03-01.10:21:12 by erez, last changed by malte.

Files
File name Uploaded Type Edit Remove
relativescatter.py florian, 2015-06-04.16:33:16 text/x-python
svn.diff erez, 2010-03-03.13:08:46 application/octet-stream
Messages
msg4259 (view) Author: malte Date: 2015-06-05.13:03:42
Great! :-)
msg4258 (view) Author: jendrik Date: 2015-06-05.12:12:40
Making blind search faster is issue537.
msg4256 (view) Author: jendrik Date: 2015-06-05.12:04:20
I'll make a small announcement on the mailing list.
msg4255 (view) Author: jendrik Date: 2015-06-05.12:03:23
I converted your notes into 
https://bitbucket.org/jendrikseipp/lab/issue/18/add-relative-scatter-plot-report

issue77 is merged and pushed :)
msg4254 (view) Author: malte Date: 2015-06-05.11:12:32
I see, so 0.0 is the baseline rather than 1.0 as I thought? Then the slowdown is
much less dramatic than I thought (which makes sense -- I didn't remember it to
be this severe from my previous experiment).

If we add the report to lab, can we change the y labelling to something a bit
more intuitive? I think we had misunderstandings over the correct interpretation
before, and if I recall correctly, there is also currently something asymmetric
about values above vs. below the baseline. I'll write up some suggestions about
the plot here, but feel free to move them somewhere more appropriate, e.g. if
you want to open a lab issue:

- Show the ratio between the two values on the y axis, so 1.0 means "the same",
> 1 means "larger than base" and < 1 means "smaller than base". (As usual, we
need to do something about zero values.)

- Use a logarithmic scale for the y axis. Then smaller changes are more visible
(which is what we want here) and changes that are equally dramatic up or down
are at the same distance from the baseline. With a non-logarithmic scale, the
image of the ratios would be optically misleading in the sense that random data
would not cluster evenly around the baseline.

- Make sure that outliers don't dominate the range of the scale. If we
automatically set the y range to fit the largest data point, then a single
extreme value means that all the "normal" data points get tightly squashed
around the baseline. One way to achieve this is to fix the y range to something
like [0.5, 2.0]; another to set the y range in such a way that it fits a certain
percentage of the data points (say 95%). In either case, outliers should be
shown too in the same way we show unsolved tasks as outliers in our scatter
plots in papers.

- Always make the y range symmetric so that the baseline ends up in the middle.
Then it's visually immediately clear if something is good, bad or neutral.
msg4253 (view) Author: florian Date: 2015-06-05.10:35:30
I think a point on y = 0.4 means 40% more time (140% of the x-coordinate), not a
factor of 4.
msg4252 (view) Author: malte Date: 2015-06-05.00:19:42
Thanks! If I read this correctly, things tend to be a factor 2-4 slower in blind
search, which is quite significant, but I still think we can merge this and
worry about optimization later. I also didn't see anything terrifying in the
eager satisficing search results.

If everybody else is happy with this, I think this can be merged. Thanks a lot
for all the work, Jendrik! :-)
msg4251 (view) Author: jendrik Date: 2015-06-05.00:11:05
Thanks, Florian! Here is the relative scatter plot:
http://ai.cs.unibas.ch/_tmp_files/seipp/astar_blind-total_time.png 

Also, the results for eager search with satisficing heuristics are in:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue77-issue77-v7-sat-eager-issue77-v7-base-issue77-v7-compare.html
msg4243 (view) Author: florian Date: 2015-06-04.16:33:16
I can't find it in the experiments either (where have we used that so far?)

I'm attaching the version of the file that I found on my harddisk. We should
merge this into lab sometime, preferably after updating the scatter plot class.
msg4242 (view) Author: jendrik Date: 2015-06-04.15:12:18
Florian, can you add a note about the report that Malte mentions to the 
experiments/README file, please? I can't find it in the experiments.
msg4241 (view) Author: malte Date: 2015-06-04.15:07:20
> Here is the scatter plot:

Thanks, but for things like these, can we make it a default to use the modified
scatter plot we discussed a while ago that makes it easier to see differences
that are in the factor of 1-2 range? (The one where equal results end up in a
horizontal line.) Perhaps we can come up with a name for this kind of plot to
avoid confusion in the future.
msg4240 (view) Author: jendrik Date: 2015-06-04.14:54:57
Here is the scatter plot: 
http://ai.cs.unibas.ch/_tmp_files/seipp/issue77-issue77-v7-opt-issue77-v7-base-issue77-v7-total_time-astar_blind.png

Making iPDB faster again is issue536. Should I open a separate issue for making blind search faster again?
msg4238 (view) Author: malte Date: 2015-06-04.12:36:19
I suggest opening a separate issue for iPDB. We know where the performance
degradation is coming from and that it can be reversed, so it should not keep us
up here. The new issue should mention that we should do a comparison to a
revision of the default branch that does not yet include the issue77 code.

Before we close this, I would also like to have a better overview of the
performance impact on blind search. Coverage remains the same because we always
run out of memory before running out of memory, but there is still a significant
slowdown that we should measure and perhaps do something about later. (Even if
it's not a problem with IPC settings, it could be a problem for other people
with shorter timeouts or more generally who case about speeds.) Can we add a
scatter plot with total time for blind search?

All other numbers look OK to me: there are slowdowns across the board, but if we
can make blind search faster again, the other ones will be faster, too.
msg4237 (view) Author: jendrik Date: 2015-06-04.12:25:24
The comments from the latest code review have been applied and the first experiment is finished: 
http://ai.cs.unibas.ch/_tmp_files/seipp/issue77-issue77-v7-opt-issue77-v7-base-issue77-v7-compare.html

Coverage drops significantly for ipdb on pegsol (due to hill-climbing taking longer). Should we try to fix this in this 
issue or open a new one?
msg4228 (view) Author: jendrik Date: 2015-06-01.12:00:48
Malte, I have incorporated your comments from the code review into the evaluation context classes and the lazy search. The experiment comparing the old (last 
revision on "default" that we merged into "issue77") and new (tip on "issue77") lazy search looks good, I think:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue77-issue77-v5-sat-lazy-issue77-v5-base-issue77-v5-compare.html 

You can find the diff of the compared revisions at
https://bitbucket.org/jendrikseipp/downward/branches/compare/issue77-v5..issue77-v5-base#diff

I have also finished adapting EHC to the new evaluation contexts. The corresponding experiment results are at
http://ai.cs.unibas.ch/_tmp_files/seipp/issue77-issue77-v6-sat-ehc-issue77-v6-base-issue77-v6-compare.html
and the diff is at 
https://bitbucket.org/jendrikseipp/downward/branches/compare/issue77-v6..issue77-v6-base#diff

The only difference between issue77-v5 and issue77-v6 are some fixes to EHC:
https://bitbucket.org/jendrikseipp/downward/branches/compare/issue77-v6..issue77-v5
msg4097 (view) Author: malte Date: 2015-03-19.17:56:28
The branch contains a working implementation of the evaluation contexts, and it
has enabled a large number of simplifications related to the open list and
evaluator interfaces. A first performance test looks alright, but at the moment
all search algorithms other than eager search are disabled. (They have not been
converted to the new interfaces yet.) I also had to disable some heuristics that
used unusual features of the old interfaces.

I'll work on making these disabled features available again, after which I hope
we'll be able to merge this soon (if performance numbers remain alright).
msg4042 (view) Author: malte Date: 2015-03-09.13:37:13
I've started working on this again, but now with a slightly different intention.
The idea is that the evaluation context can be used to store heuristic
information and preferred operators for the current state, which helps a lot in
simplifying our current rather complex and fragile information flow between
search, open lists and evaluators/heuristics.

The work in progress is here: https://bitbucket.org/malte/downward/branch/issue77

It already has much simpler code for the open lists and (most) evaluators, but
it is still in an intermediate state and needs work. Many parts of the code are
removed for the Makefile for the time being because they are not converted yet.
The only search algorithm available at the moment is eager search.
msg2306 (view) Author: florian Date: 2012-07-30.11:39:05
I have an idea for that (second half of msg2297) but I think we could do this in
two steps. First make the assumptions explicit so we(*) get a better overview of
where they are needed and then try to avoid having to make them in the first place.

(*) Ok, you probably have that overview already, so that "we" would mostly be
myself.
msg2305 (view) Author: erez Date: 2012-07-30.11:21:29
Thanks, now I understand what it's supposed to do.
However, I'm still not convinced that this is the right thing to do - I think a 
bigger change might actually simplify the code, rather than just making the same 
assumptions we have now explicit.
I don't have any concrete ideas, but I think we should discuss this at some 
point.
msg2304 (view) Author: malte Date: 2012-07-30.11:14:09
> My approach would be to take one or two tasks of medium difficulty from each of
> the 2011 domains to avoid testing only one kind of problem structure. Any
> suggestions?

We're not utilizing the grid heavily, so I suggest simply doing what we usually
do: test on all applicable tasks.
msg2303 (view) Author: florian Date: 2012-07-30.11:12:38
Yes, I also planned on running some tasks for each of the searches to check for
errors I might have introduced (e.g. wrong evaluation context). We should
probably wait with tests until the general idea of the patch has stabilized a
bit more. The current status is more that of a suggestion.

My approach would be to take one or two tasks of medium difficulty from each of
the 2011 domains to avoid testing only one kind of problem structure. Any
suggestions?
msg2302 (view) Author: malte Date: 2012-07-30.10:58:43
> change the efficiency

*check* the efficiency
msg2301 (view) Author: malte Date: 2012-07-30.10:58:25
In addition to a code review, we should also change the efficiency of whatever
changes we make. For the proposed change so far, a before/after comparison with
A* + blind heuristic would be the best test.
msg2300 (view) Author: florian Date: 2012-07-30.10:37:10
I didn't mean to press you for a review with this upload, Erez (although I don't
mind if you do one, of course :-)). Gabi asked me to upload the patch to
Rietveld so she could comment on it there.

As for the description, this is everything we discussed for issue 344 + a change
in the signature of the methods of Evaluator, ScalarEvaluator and Heuristic. All
methods that actually do an evaluation or set the values for a state, start a
context. This context is identified by state.get_id(). All other functions
(is_dead_end, get_value, get_heuristic, get_preferred_operators) are passed this
context and check whether it matches the last evaluated context. If it does, the
call is forwarded to methods like is_last_evaluated_dead_end.

Basically it is the same as before, but the assumption that a state is first
evaluated and then queried for its properties is made explicit and checked
during execution time.

The rest of the patch just adds the (hopefully) correct state id as a parameter
to all occurrences of the changed methods.
msg2299 (view) Author: erez Date: 2012-07-29.11:03:28
It's kind of hard to comment on this, without knowing what it's supposed to do.
Do you have some kind of high level description of what this does?
msg2298 (view) Author: florian Date: 2012-07-27.15:12:15
The full patch is now on Rietveld for review: http://codereview.appspot.com/6454058
msg2297 (view) Author: florian Date: 2012-07-24.13:43:38
I started implementing a patch at
https://bitbucket.org/flogo/fast-downward/compare/issue77..default this includes
the patch for issue344. If you only want to see the changes relative to the
patch we discussed over there take a look at the last changeset
https://bitbucket.org/flogo/fast-downward/changeset/35ed4f4733be

There are still a couple of open questions:

* Right now I startet a new context in a heuristic, whenever a call to
evaluate() or set_evaluator_value() was made. An alternative would be to start
the context when the node is reached (i.e. in Heuristic::reach_state()) but that
would require a call to evaluate afterwards, so the values are actually calculated.
* In the enforced hill climbing search, I'm not so sure about which state to use
as a context when iterating over the operators. Could someone take a look at the
patch there?
* AlternationOpenList::insert() uses the current context right now, which
assumes that insert() is always called after an evaluate on the respective
state. Should we add a context to the insert method's signature?

----

An alternative to the patch would be to completely remove the evaluate() method
from the classes Evaluator and Heuristic and let the Evaluators decide if/when
they need to do the evaluation. The get_heuristic method could then look
something like this:

  if validate_context(state.get_id()) {
    return last_evaluated_value;
  } else {
    start_evaluation_context(state.get_id());
    last_evaluated_value = calculate_heuristic(state)
    return last_evaluated_value;
  }

This sounds much cleaner to me, but I have to admit that I didn't get the
distinction between Heuristic, ScalarEvaluator and Evaluator completely. Since
the Evaluator's evaluate() method does not have a state parameter, it doesn't
have the necessary information to evaluate its contained heuristics.
As far as I can see it all Evaluators except GEvaluator and PrefEvaluator depend
on the state directly or indirectly. Those that do not depend on the state
directly (e.g. the open lists) depend on evaluate(state) to be called for their
contained heuristics before they are evaluated themselves.
msg2296 (view) Author: florian Date: 2012-07-20.17:39:18
I'll start working on this using state ids as an evaluation context (see issue 344)
msg261 (view) Author: malte Date: 2010-03-03.16:09:19
OK, let's keep this issue open to keep track of the evaluation context idea.
I've changed the issue title.
msg260 (view) Author: erez Date: 2010-03-03.15:50:59
Reverted
msg259 (view) Author: malte Date: 2010-03-03.13:24:43
Maybe that's the best solution for now, in terms of keeping the code as simple
as possible.

A method for dumping the search space would be generally useful, I think, so
that could be added to the search space class directly.
msg258 (view) Author: erez Date: 2010-03-03.13:21:56
It's not really NEEDED, but I do have something that dumps the search space,
including the g-value of each state.
I also have the option to use the g-value of a state as a feature for selective
max, but I don't use it since it doesn't really help, and it's also ugly code.

I would not mind at all if we just forgot about this completely, and sometime
later think about the evaluation context you suggested.
msg257 (view) Author: malte Date: 2010-03-03.13:14:32
Not nice indeed. :-( If it has a different signature, it should probably also
have a different name, but two similar methods that duplicate functionality is
also not nice. (On the other hand, it may be comparable to the map access
methods, where there are also two methods, one that only works if the element
exists and one that always works and returns a pair like in your case.)

Where is this method needed?  It might be easier to come up with a good
interface for this if we know the use cases.
msg256 (view) Author: erez Date: 2010-03-03.13:08:46
I did this, but since get_node() can create a SearchNode, and get_node() const
can not, I returned a pair of <bool, SearchNode> to handle the case of a
non-existing node.
It came out pretty ugly, so I'm attaching the svn diff. Please take a look and
let me know if you think I should check this in or not.
msg255 (view) Author: malte Date: 2010-03-03.00:55:22
Right, the usual solution in cases like that is to have two methods called
get_node, one that is const and one that isn't. That's a very common pattern;
all STL containers do that.
msg254 (view) Author: erez Date: 2010-03-03.00:52:59
This might be a problem, since get_node(state) is not const.
However, we could create a similar const method, and use that in the heuristic.
msg253 (view) Author: malte Date: 2010-03-02.13:03:20
I had another idea for giving the heuristics access to that information, which
was to define an "evaluation context" class that would be passed to the
evaluator and would include things such as the g value. The advantage I see for
that over passing the whole SearchSpace is that it would make for a leaner
interface, i.e., less coupling. I was hoping that with such an evaluation
context we might also be able to implement lazy vs. eager search by just using
slightly different evaluators, reusing the same search code.

But I haven't really thought this through properly yet, and in the meantime I'm
fine with passing a pointer to the search space to the heuristic as long as it
is const.

I'd be against passing a non-const pointer.
msg252 (view) Author: erez Date: 2010-03-01.10:21:12
Now that the SearchSpace is located in the SearchEngine, we can add a pointer to
the SearchSpace object from the Heuristic, which would:
(a) allow us to get rid of the global SearchSpace object
(b) allow a heuristic to have access to node information (g, etc.), which was
the reason for the global search space from the beginning.

What does everyone think?
History
Date User Action Args
2015-06-05 13:03:42maltesetmessages: + msg4259
2015-06-05 12:12:40jendriksetmessages: + msg4258
2015-06-05 12:04:20jendriksetstatus: chatting -> resolved
messages: + msg4256
2015-06-05 12:03:23jendriksetmessages: + msg4255
title: Add "evaluation contexts", to be passed into the evaluators by the search methods? -> Add evaluation contexts
2015-06-05 11:12:32maltesetmessages: + msg4254
2015-06-05 10:35:30floriansetmessages: + msg4253
2015-06-05 00:19:42maltesetmessages: + msg4252
2015-06-05 00:11:05jendriksetmessages: + msg4251
2015-06-04 16:33:16floriansetfiles: + relativescatter.py
messages: + msg4243
2015-06-04 15:12:18jendriksetmessages: + msg4242
2015-06-04 15:07:20maltesetmessages: + msg4241
2015-06-04 14:54:57jendriksetmessages: + msg4240
2015-06-04 12:36:19maltesetmessages: + msg4238
2015-06-04 12:25:24jendriksetmessages: + msg4237
2015-06-01 12:00:48jendriksetnosy: + jendrik
messages: + msg4228
2015-03-19 17:56:29maltesetmessages: + msg4097
2015-03-09 13:37:13maltesetassignedto: florian -> malte
messages: + msg4042
2012-07-30 11:39:05floriansetmessages: + msg2306
2012-07-30 11:21:30erezsetmessages: + msg2305
2012-07-30 11:14:09maltesetmessages: + msg2304
2012-07-30 11:12:38floriansetmessages: + msg2303
2012-07-30 10:58:43maltesetmessages: + msg2302
2012-07-30 10:58:25maltesetmessages: + msg2301
2012-07-30 10:37:11floriansetmessages: + msg2300
2012-07-29 11:03:29erezsetmessages: + msg2299
2012-07-27 15:12:15floriansetmessages: + msg2298
2012-07-24 13:43:39floriansetmessages: + msg2297
2012-07-20 17:39:19floriansetassignedto: erez -> florian
messages: + msg2296
nosy: + florian
2010-03-03 16:09:19maltesetmessages: + msg261
title: Poll: should a heuristic know the SearchSpace? -> Add "evaluation contexts", to be passed into the evaluators by the search methods?
2010-03-03 15:50:59erezsetmessages: + msg260
2010-03-03 13:24:43maltesetmessages: + msg259
2010-03-03 13:21:56erezsetmessages: + msg258
2010-03-03 13:14:32maltesetmessages: + msg257
2010-03-03 13:08:46erezsetfiles: + svn.diff
messages: + msg256
2010-03-03 00:55:22maltesetmessages: + msg255
2010-03-03 00:52:59erezsetmessages: + msg254
2010-03-02 13:03:20maltesetmessages: + msg253
2010-03-01 10:21:12erezcreate