Issue69

Title: A* with LM-CUT needs performance comparison between r3612/r3613 and HEAD
Priority: bug
Status: resolved
Superseder:
Nosy List: erez, jendrik, malte
Assigned To: malte
Keywords: 1.0
Optional summary:

Created on 2010-01-13.14:07:19 by erez, last changed by malte.

Files
File name                       Uploaded by  Date                 Type
exp-js-issue69-eval-abs-d.html  jendrik      2011-08-12.17:39:04  text/html
exp-js-issue69-eval-abs-p.html  jendrik      2011-08-12.17:43:23  text/html
lmcut.tar.gz                    erez         2010-01-21.19:17:31  application/x-gzip
plots-v2.tar.gz                 jendrik      2011-08-12.21:21:07  application/x-gzip
Messages
msg1610 (view) Author: malte Date: 2011-08-13.22:10:52
Thanks a lot, everyone! I think we're done here; the scatter plots tell the
story quite well.

r3613 is better than r3612, so it's sufficient to compare r3613 to the newer
revision. This comparison shows a bit of a loss of quality (measured in
expansions) but a bit of a gain in speed in most domains when using the newer code.

Overall the new code is a bit worse due to the loss of coverage in Airport. At
this point, I wouldn't roll back the change because it's only a single domain
and it's not clear at all whether the better performance of the old code there
isn't only an accident.

But I've opened a new issue, issue263, to make a note of the fact that it'd be
good to study different tie-breaking methods for LM-Cut at some point.
msg1598 (view) Author: jendrik Date: 2011-08-12.22:00:03
Done.
msg1597 (view) Author: malte Date: 2011-08-12.21:28:12
Wonderful! How do I generate my own scatter plots? I'd like to create some "all
problems in one plot" plots that exclude the problematic domains for the
respective comparison.

Can you add the instructions to the wiki?
msg1595 (view) Author: jendrik Date: 2011-08-12.21:21:07
Here are the new plots. Missing values are now rendered on their own line and the 
bug has been fixed.
msg1593 (view) Author: malte Date: 2011-08-12.19:52:46
Thanks!

Some preliminary notes on the results after looking at the logs:

- The 3612-ou planner ignored action costs, so its results in the 2008
  domains should be disregarded. In domains with zero-cost actions
  (openstacks, sokoban) they are also occasionally suboptimal. The
  results already show this directly in Sokoban, where other planners
  find better plans; one example in Openstacks 2008 is Opt #06, where
  a cost 3 plan is found but a cost 2 plan exists.

- The 3613-ou planner segfaults on Openstacks 2008 and PegSol, and
  occasionally on Sokoban. So it should not be considered on those
  domains. It doesn't attempt to solve ParcPrinter because of the high
  action costs.

- The 5272-ou planner doesn't attempt to solve ParcPrinter because of
  the high action costs.

- 3612-ou doesn't appear to have a meaningful advantage over 3613-ou,
  but I'd still like to have a look at the expansions and time plots
  for these.

- The main comparison should be 3613-ou vs. 5272-ou, though.
msg1589 (view) Author: jendrik Date: 2011-08-12.17:57:43
Sure, the experiment is located on habakuk:
/home/downward/jendrik/downward/new-scripts/exp-js-issue69*
msg1588 (view) Author: malte Date: 2011-08-12.17:55:34
Can you give me access to the full logs? Some of the results look suspicious;
the 0s probably make sense because the relevant configs refuse to work on some
of these domains, but I'd like to investigate a few things (e.g. the 30 problems
solved by r3612 in openstacks-opt08-strips -- probably using an inadmissible
heuristic?).
msg1587 (view) Author: jendrik Date: 2011-08-12.17:49:00
Yes, done.
msg1586 (view) Author: malte Date: 2011-08-12.17:48:15
Should the attached old issue69-all-plots.tar.gz (version from
2011-08-11.21:23:52) be deleted? I assume they're affected by the bug?
msg1585 (view) Author: jendrik Date: 2011-08-12.17:43:23
Here are the html reports.
msg1583 (view) Author: jendrik Date: 2011-08-12.17:36:32
I found a bug in the scatter plot code. Will fix it and upload a new set of plots 
soon.
msg1574 (view) Author: malte Date: 2011-08-12.11:19:03
> coverage:
> - same set of tasks solved

Actually, that is probably not true. That was based on the information in file
issue69ouSTRIPSeval-p-abs.html, but I guess that was a different experiment? It
doesn't include data for the 2008 domains. Do we have the complete coverage
information for the recent experiment somewhere? I'd be interested in both the
high-res (-p-abs.html) and low-res (-d-abs.html) version.
msg1573 (view) Author: malte Date: 2011-08-12.11:10:42
comparison r3612 vs. r3613
==========================

coverage:
- same set of tasks solved

expansions:
- no differences:
  blocks, depots, driverlog, grid, gripper, logistics00,
  logistics98, miconic, openstacks-strips, pathways-noneg,
  pipesworld-notankage, rovers, satellite, tpp, trucks, zenotravel

- tiny, ignorable differences:
  freecell

- few tasks with difference, hence no clear trend:
  airport, elevators-sat08, mprime, mystery, pipesworld-tankage,
  psr-small

- differences, no clear winner:
  sokoban-sat08

- r3612 appears to be somewhat better:
  sokoban-opt08

- r3613 appears to be somewhat better:
  scanalyzer-08

- r3613 clearly better:
  elevators-opt08, transport-opt08, transport-sat08,
  woodworking-opt08, woodworking-sat08

time:

picture very similar to expansions, but advantage of r3613 over r3612
is greater in domains where it is already clearly better, and it seems
to have a slight speed advantage across the board (i.e. in all
domains)
msg1572 (view) Author: malte Date: 2011-08-12.11:06:46
Wonderful! That I can work with. :-) One more feature request though: the plots
would be more useful if they also included data points for the tasks that are
solved by one, but not both of the configurations. Here's one way of doing this:
Assume that the range of values is 0...10^6. Then pick something suitably larger
(say, 10^7) and treat "unsolved" as that value. On the axis, don't write 10^7
but "fail" or some such. If the "fail" point can be placed exactly on the
rightmost/topmost line of the plot (which would be great), then I think no
axis label is necessary. Tasks where neither configuration finds a solution
should not be included.
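A minimal matplotlib sketch of this "fail" idea, with invented expansion counts (None marks an unsolved task) and the 10^7 sentinel placed exactly on the topmost/rightmost axis line:

```python
import matplotlib
matplotlib.use("Agg")  # render to file, no display needed
import matplotlib.pyplot as plt

# Hypothetical expansion counts; None marks a task the config did not solve.
xs = [12, 450, 30000, None, 700, None]
ys = [10, 600, None, 25000, 650, None]

FAIL = 10 ** 7  # suitably larger than the largest real value
# Map "unsolved" to FAIL; drop tasks that neither config solved.
paired = [(x if x is not None else FAIL, y if y is not None else FAIL)
          for x, y in zip(xs, ys)
          if x is not None or y is not None]

fig, ax = plt.subplots()
ax.scatter([p[0] for p in paired], [p[1] for p in paired])
ax.set_xscale("log")
ax.set_yscale("log")
# Make FAIL coincide with the plot border, then label it "fail".
ax.set_xlim(1, FAIL)
ax.set_ylim(1, FAIL)
ticks = [10, 10 ** 3, 10 ** 5, FAIL]
labels = ["10^1", "10^3", "10^5", "fail"]
ax.set_xticks(ticks)
ax.set_xticklabels(labels)
ax.set_yticks(ticks)
ax.set_yticklabels(labels)
fig.savefig("scatter-with-fail.png")
```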

To warm up, I compared the expansions for r3612 to r3613 in all-plots, although
that is not terribly interesting. The results are as expected: r3613 more or
less dominates r3612 on the tasks that have action costs > 1, and otherwise they
behave essentially the same. One thing is strange though (or I don't understand
it). If I extract issue69-all-plots.tar.gz and do
$ display *expansions*3612*3613*.png
to go through the plots, you see that there are very few points above the
diagonal in any domain. However, the file
"exp-js-issue69-eval-scatter-expansions-3612-ou+3613-ou-p.png" in the middle
gives a qualitatively very different picture.

Is that the plot for *all* tasks combined? If yes, then I think the data it uses
is wrong -- there's no way this is the union of the per-domain plots. Or am I
misunderstanding something?
msg1567 (view) Author: jendrik Date: 2011-08-11.21:23:52
Here you go.
msg1559 (view) Author: jendrik Date: 2011-08-11.13:55:53
> I haven't checked everything, but I would go by unit. If it measures time, nodes
> or states, it should be logarithmic. If it measures length or cost, it should be
> linear. Do we have any other units of measurement?
For now I set
linear_attributes = ['cost', 'coverage', 'plan_length']
>
>> With the two attributes search_time and expansions and the three code
>> versions ('3612-ou', '3613-ou', '5272-ou') generating all possible plots
>> would yield 35 * 3 * 2 = 210 images. Do you want them all (7.5 MB)?
> Yes, please! Plus a plot with all data points, with one data point per task (not
> per domain). Or do we have that already? I'm a bit confused at the moment
> regarding what is what. (Actually, it'd be great if the plots could be more
> self-explanatory, including info like "Each data point represents XYZ",
> basically what you wrote in the issue comment.)
That's true. I already added that.
>
> Related to this, there was a video in the IJCAI video competition that
> highlighted a result exploration tool with some very neat capabilities. Check
> video 15 ("Using Multiple Models to Understand Data") on
> http://ijcai-11.iiia.csic.es/program/video_track. The "Interactive
> Visualizations" section (about halfway in) is impressive and looks *really* useful.
You're right, that looks very useful indeed. The author doesn't provide any links to binaries or source code on his 
homepage though.

I'll prepare the new plots ASAP.
msg1557 (view) Author: malte Date: 2011-08-11.12:08:20
> I made logarithmic scaling the default for now. Can you list the attributes
> that should not have logarithmic scaling?

I haven't checked everything, but I would go by unit. If it measures time, nodes
or states, it should be logarithmic. If it measures length or cost, it should be
linear. Do we have any other units of measurement?

> With the two attributes search_time and expansions and the three code
> versions ('3612-ou', '3613-ou', '5272-ou') generating all possible plots
> would yield 35 * 3 * 2 = 210 images. Do you want them all (7.5 MB)?

Yes, please! Plus a plot with all data points, with one data point per task (not
per domain). Or do we have that already? I'm a bit confused at the moment
regarding what is what. (Actually, it'd be great if the plots could be more
self-explanatory, including info like "Each data point represents XYZ",
basically what you wrote in the issue comment.)

Related to this, there was a video in the IJCAI video competition that
highlighted a result exploration tool with some very neat capabilities. Check
video 15 ("Using Multiple Models to Understand Data") on
http://ijcai-11.iiia.csic.es/program/video_track. The "Interactive
Visualizations" section (about halfway in) is impressive and looks *really* useful.

Maybe we should set aside some time to discuss the
analysis/summarization/visualization aspect of our scripts, and where we want to
go with it? It's a lot of work to reinvent the wheel here, and I think there are
a few tools around that we should learn about. (Even if we don't end up using
one of them, it should be inspirational.)
msg1551 (view) Author: jendrik Date: 2011-08-11.02:25:02
For which domains do you want plots? The experiment has values for 35 domains:

domains = ['airport', 'blocks', 'depot', 'driverlog', 'elevators-opt08-strips',
           'elevators-sat08-strips', 'freecell', 'grid', 'gripper',
           'logistics00', 'logistics98', 'miconic', 'mprime', 'mystery',
           'openstacks-opt08-strips', 'openstacks-sat08-strips',
           'openstacks-strips', 'parcprinter-08-strips', 'pathways-noneg',
           'pegsol-08-strips', 'pipesworld-notankage', 'pipesworld-tankage',
           'psr-small', 'rovers', 'satellite', 'scanalyzer-08-strips',
           'sokoban-opt08-strips', 'sokoban-sat08-strips', 'tpp',
           'transport-opt08-strips', 'transport-sat08-strips', 'trucks-strips',
           'woodworking-opt08-strips', 'woodworking-sat08-strips',
           'zenotravel']

With the two attributes search_time and expansions and the three code versions ('3612-ou', '3613-ou', '5272-ou') generating all possible plots would yield 35 * 
3 * 2 = 210 images. Do you want them all (7.5 MB)?
msg1550 (view) Author: jendrik Date: 2011-08-11.01:44:45
> One minor niggle:
> 
> - The 0s on the axes appear to be set in a different font and at a different
> height than the other numbers. It should be the same font and location as the 10
> in 10^1, 10^2 etc. (Not important for us, but would need to be changed if we
> want to eventually use these plots in papers.)

I found a rather elegant hack to do this.

> One more important niggle:
> 
> - I assume that these are for a single domain now? If so, can the domain be
> mentioned somewhere, e.g. "evaluations (parcprinter)" or some such instead of
> "evaluations" in the caption at the top?

Ah, I added different functionality: each scatter plot point can now be the sum
(or the average for scores, or the geometric mean for times) of a domain's values.
This behaviour can be changed by setting --res problem or --res domain.
I guess --res problem should be the default?
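A sketch of the geometric-mean aggregation mentioned above (the runtime values are invented):

```python
import math

def geometric_mean(values):
    # Computed via logs so a product of many large runtimes cannot overflow.
    assert values and all(v > 0 for v in values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical per-task search times (in seconds) for one domain:
print(geometric_mean([0.5, 2.0, 8.0]))  # 2.0, since 0.5 * 2 * 8 = 2**3
```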

> 
> Runtimes should be displayed on a logarithmic
> scale even if we run an experiment where all runtimes are less than 10 seconds.

I made logarithmic scaling the default for now. Can you list the attributes that should not have logarithmic scaling?
msg1539 (view) Author: malte Date: 2011-08-10.13:21:23
Very pretty!

One minor niggle:

- The 0s on the axes appear to be set in a different font and at a different
height than the other numbers. It should be the same font and location as the 10
in 10^1, 10^2 etc. (Not important for us, but would need to be changed if we
want to eventually use these plots in papers.)

One more important niggle:

- I assume that these are for a single domain now? If so, can the domain be
mentioned somewhere, e.g. "evaluations (parcprinter)" or some such instead of
"evaluations" in the caption at the top?

> I added the requested features and made the scales logarithmic if the max
> value is bigger than 10^5.

Magnitude is not a good criterion. It depends on what is measured, not what the
values are. Plan costs in ParcPrinter should not be displayed on a logarithmic
scale despite their being >= 10^6. Runtimes should be displayed on a logarithmic
scale even if we run an experiment where all runtimes are less than 10 seconds.

If you check out the runtime scatter plots, you see why you don't want a linear
scale here: almost always all the data points are crowded in the far bottom left
corner, so the space is not used well.
msg1537 (view) Author: jendrik Date: 2011-08-10.13:12:23
I added the requested features and made the scales logarithmic if the max value 
is bigger than 10^5.
msg1526 (view) Author: malte Date: 2011-08-09.19:56:42
PS: http://www.scipy.org/Cookbook/Matplotlib has an example ("Custom log plot
labels") that has logarithmic axes for a scatter plot with nice axis labels
starting at 0 (so no need for transforming values to 1? Although I'd test that
value pairs like (0, 8) or (0, 0) actually show up in the plot.)

Regarding the diagonal line, the next example does that, but I'm not sure if and
how they could be overlaid. If there's a way to access the max x (and y) value
that is displayed inside the plot, drawing a line from (0,0) to (max, max) would
do the trick, of course.
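One way this might look in matplotlib (a sketch with invented data): overlaying a scatter plot with the y=x line drawn from the queried axis limits. Since the axes are logarithmic, the line starts at the lower limit rather than at 0.

```python
import matplotlib
matplotlib.use("Agg")  # render to file, no display needed
import matplotlib.pyplot as plt

# Hypothetical expansion counts for two configurations.
xs = [3, 40, 500, 6000]
ys = [5, 30, 700, 4000]

fig, ax = plt.subplots()
ax.scatter(xs, ys)
ax.set_xscale("log")
ax.set_yscale("log")
# Query the displayed axis limits and overlay the y=x reference line.
lower = min(ax.get_xlim()[0], ax.get_ylim()[0])
upper = max(ax.get_xlim()[1], ax.get_ylim()[1])
ax.plot([lower, upper], [lower, upper], color="gray", linewidth=1)
fig.savefig("scatter-diagonal.png")
```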
msg1525 (view) Author: malte Date: 2011-08-09.19:51:39
> 3) I have searched for but couldn't find an easy way to add an infinite
> diagonal line.

In other plotting packages, you'd do that by plotting the function f(x) = x. Is
that difficult to do in matplotlib? Or is it difficult to combine a scatter plot
with a function plot?
msg1524 (view) Author: jendrik Date: 2011-08-09.19:48:50
> What plotting package/library do you use to generate these?
I used matplotlib.

3) I have searched for but couldn't find an easy way to add an infinite diagonal 
line.
msg1523 (view) Author: malte Date: 2011-08-09.18:35:25
Very nice! What plotting package/library do you use to generate these?

Some wishes:

1) labels for expansions should not start at -100000 or some such -- they won't
go below 0. ;-)

2) for things growing exponentially like expansions, it's more common (and much
more useful) to make both scales logarithmic. If that causes a problem because
of zero entries, one simple fix is to change all 0 entries to 1.

3) Scatter plots that compare two comparable things along the same dimension
often contain the y=x line to easily distinguish the "good" from the "bad"
region, like here:
http://www.cawcr.gov.au/projects/verification/scatterplot.gif. That would be
useful here, too, although the gray grid lines already help.

4) most importantly: per-domain results would be very very useful!
msg1522 (view) Author: jendrik Date: 2011-08-09.17:53:00
I got the hint and added a ScatterPlotReport class ;) Currently the scatter 
plots can only report by suite and not by domain. Results are attached.
msg1420 (view) Author: malte Date: 2011-07-18.11:29:32
I think we should still do a definitive comparison of coverage, expansion count
and solution speed. I don't have time to prepare it myself, though.

The old experiment is probably already fine in terms of what it measures, but we
need to prepare the data in a way that makes it more accessible (e.g.
scatter-plots per domain).
msg1419 (view) Author: erez Date: 2011-07-18.08:01:46
I doubt we're going to change the LM-CUT implementation back to the way it was, 
but it's Malte's call.
msg1418 (view) Author: jendrik Date: 2011-07-17.21:10:07
I would like to remove issue69.py from the scripts directory. Do you still need a 
comparison experiment for this issue or is it fixed?
msg611 (view) Author: malte Date: 2010-10-28.00:00:36
I still haven't completely analyzed the code, but it looks like this might
indeed be a tie-breaking issue. I can try switching the code back to the old way
of breaking ties, but it might easily break again when implementing proper
action cost support for LM-cut, so my preference is to address this at the same
time as implementing proper action cost support for LM-cut.

For the action cost support, it would be good to run some before/after
experiments on the IPC-2008 instances, which I've now added to the repository
(apart from cybersec, since this domain alone would more than double the repo
size, I think).
msg608 (view) Author: jendrik Date: 2010-10-26.20:00:03
The easiest way would be to pass the desired config on the commandline

cd /home/downward/{trunk/master}/new-scripts
./downward-reports.py issue69-ou-STRIPS-{trans3613-}eval/ -c 5038-5038-3613-ou --res problem
msg607 (view) Author: malte Date: 2010-10-26.19:35:16
That was indeed the problem. Is there a way to find out the search time (or
total time, or number of expansions) for particular instances of particular
configurations where the instance was not solved by all?
msg606 (view) Author: jendrik Date: 2010-10-26.19:30:49
The experiment yielded the same results as earlier. Maybe you have looked at the 
wrong subtable? For HEAD-HEAD-3612-ou and HEAD-HEAD-3613-ou Airport 25 is solved 
and appears in the solved table correctly. It doesn't appear in the tables where 
only commonly solved problems are shown.
msg605 (view) Author: jendrik Date: 2010-10-26.17:45:03
I just ran an experiment with HEAD-HEAD-3612-ou and HEAD-HEAD-3613-ou 
and airport 25 and now it seems to work. I'll submit the whole domain 
for another experiment and get back to you.

Maybe there were some errors in the older version of the new-scripts.

msg604 (view) Author: malte Date: 2010-10-26.13:22:05
Small correction: it looks like the old code solved tasks #1-#38 without any
gaps (which is a bit strange, since the paper reports 37 solved tasks?)

I made a few more tests with #25 with translator and preprocessor from HEAD and
various versions of the search component. Up to r3636, it is solved fine. In
r3637, I get a bug leading to an invalid plan, but this is easy to fix, and after
fixing it, it again works fine. In r3638, the "getting stuck" behaviour starts,
which makes some sense since this is the revision that actually changed the
behaviour of the heuristic.

I'll look into this more thoroughly and see if there is a way to avoid these
problems without losing the performance advantages of the r3638 (and subsequent)
algorithm changes.

One thing is strange: as far as I can tell, versions HEAD-HEAD-3612-ou and
HEAD-HEAD-3613-ou should easily solve Airport #25, but they don't in Jendrik's
data. (These names do mean HEAD for translate and preprocess and 3612/3613 for
release-search, right?) Jendrik, can you check that these results were indeed
generated with the proper code versions etc.?
msg603 (view) Author: malte Date: 2010-10-26.12:11:09
OK, I looked at the airport results, old (= the ICAPS paper) and new (=
Jendrik's results) a bit more closely. The old results solved all tasks from
#1-#38 except #18 and #20. The new ones solve all tasks from #1-#24 except #18
and they solve #36. So there's a big gap in the range #25-#35.

I tried the smallest of these, #25, with everything@2449 and the current hg tip.
In both cases, we get h = 212 for the initial state, which is also the optimal
plan length. With the old code, we immediately zip down to a solution with no
real search, i.e., there are exactly 213 expansions which is optimal since it's
the length of the solution. With the new code, we zip down immediately to
h=93/g=118 in the first 118 expansions and then get stuck there. There are
various possible explanations for this; it could be a bug, or it could be worse
luck with tie-breaking decisions.

The new code shows the same behaviour no matter whether we use the "old"
translator and preprocessor or the "new" ones, so it looks like the search
code is the culprit.

That narrows it down sufficiently for now, so I don't think we'll need these
experiments at the moment, Erez.
msg602 (view) Author: malte Date: 2010-10-26.11:50:01
I think we used svn+ssh://downward-svn/branches/everything@2449.
msg601 (view) Author: erez Date: 2010-10-26.11:16:45
I'll see what I can do when the current running experiment finishes.
msg600 (view) Author: malte Date: 2010-10-26.10:57:18
Would there be enough time available to run *only* Airport maybe?

Unfortunately I'm not sure what exactly was run for the ICAPS paper, since
Carmel ran those experiments and the logs are not available any more. (I asked
him about them last week.) I can check my notes for some best guesses.
msg599 (view) Author: erez Date: 2010-10-26.10:29:38
I'm afraid the Technion machines are going to be busy until the ICAPS deadline.

Summarizing the difference in solved tasks between the latest version and what 
was reported in the paper:
Airport: -10
Driverlog: -1
Gripper: +1
Miconic: +1
Mprime: -2
pw-tankage: -1

So except for airport, the differences are fairly small.
Malte - do you know which translator version was used for the experiments 
reported in the paper?
msg598 (view) Author: malte Date: 2010-10-26.03:00:27
Hmmm, very close in terms of solved problems to the current translator version.
I guess this is good because it means that our good results were not solely due
to a buggy translator illicitly dropping actions, but it means we still don't
know what is going on here.

This is much worse than either Erez's results or what we report in the ICAPS
2009 paper. Both Erez's results and the ICAPS paper results were run on Technion
machines which I know are faster than what we are using here, so this *might*
explain it. Still, we should double-check this -- it'd be a pity for the trunk
to have so much worse coverage than what we report in the ICAPS paper.

Erez, would it be possible for you to repeat Jendrik's experiments on the
Technion machines? I think it'd be enough just to drop the 3613 version from the
experiment, but it'd be good to have both translator versions. So we want:

 * 3612-HEAD-3612-ou
 * 3612-HEAD-HEAD-ou
 * HEAD-HEAD-3612-ou
 * HEAD-HEAD-HEAD-ou
msg597 (view) Author: jendrik Date: 2010-10-26.02:42:09
To distract you from the fishy code, I present to you the latest grid results ;)

We should definitely have a look at the run code at the next meeting.
msg588 (view) Author: malte Date: 2010-10-25.02:52:09
Hmmm, this looks a bit fishy.

I don't think that time.clock() takes into account the time used in the called
processes, so my guess would be that the "passed_time" computation does not make
a lot of sense. We should probably log that. About setrlimit(), I don't really
know -- it might depend on what the RUN_COMMAND is -- but I would assume that
it's a per-process limit. We could test that at our next meeting.

Also, subprocess.call with shell=True should generally be avoided in clean code,
since it opens all kinds of cans of worms w.r.t. shell syntax and quoting.
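A small sketch of both points, in modern Python for illustration (the command run here is a placeholder, not the actual RUN_COMMAND): os.times() captures CPU time of finished child processes, which per-process clocks like time.clock() miss, and the command is passed as a list instead of shell=True:

```python
import os
import subprocess
import sys

# Per-process clocks only measure THIS process; os.times() additionally
# reports CPU time consumed by finished child processes.
before = os.times()
# Passing the command as a list (no shell=True) avoids quoting pitfalls.
returncode = subprocess.call([sys.executable, "-c", "sum(range(10**6))"])
after = os.times()
child_cpu = ((after.children_user - before.children_user)
             + (after.children_system - before.children_system))
print("child CPU seconds:", child_cpu)
```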
msg587 (view) Author: jendrik Date: 2010-10-25.02:15:02
The gkigrid jobs get a default timeout of 1830 seconds. The default 
timeout for the run of 1800 seconds is set in the file 
data/run-template.py. I am guessing that the timeout is set for the 
whole run (including preprocess command), but I'm unsure about the 
workings of resource.setrlimit(). It would be good if you could have a 
look at the file and see what can be improved there. That file is 
probably the most important bit of the scripts, but hasn't gotten much 
attention...

The CHECK_INTERVAL is as low as 0.5 sec, because that makes local 
testing a little faster, but I should probably increase that.
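For reference, a minimal sketch of how such a CPU limit might be installed with resource.setrlimit() (the command is a placeholder, not the actual run-template.py code; POSIX only):

```python
import resource
import subprocess
import sys

TIMEOUT = 1800  # CPU seconds allowed

def set_cpu_limit():
    # RLIMIT_CPU is a per-process limit; child processes inherit it,
    # but each gets its own counter -- it is not cumulative for the run.
    resource.setrlimit(resource.RLIMIT_CPU, (TIMEOUT, TIMEOUT))

# Hypothetical run command; preexec_fn installs the limit in the child
# between fork and exec, so only the run is constrained.
returncode = subprocess.call([sys.executable, "-c", "print('run')"],
                             preexec_fn=set_cpu_limit)
```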
msg586 (view) Author: malte Date: 2010-10-24.17:14:14
Thanks!

By the way, how do the experiments manage the timeouts? Since the vast majority
of our experiments care only about the search component, we usually *only* apply
the 30 minute timeout to search. Is that what is happening here, or is there an
overall 30 minute timeout? Shouldn't make a huge difference since
translation/preprocessing tends to be fast for the tasks that we can solve with
an optimal planner, but still this should not be neglected.
msg585 (view) Author: jendrik Date: 2010-10-24.16:11:25
I have just submitted a job to the grid with the following revs:

combinations = [
    (TranslatorSvnCheckout(rev=3613), PreprocessorSvnCheckout(),
     PlannerSvnCheckout(rev=3612)),
    (TranslatorSvnCheckout(rev=3613), PreprocessorSvnCheckout(),
     PlannerSvnCheckout(rev=3613)),
    (TranslatorSvnCheckout(rev=3613), PreprocessorSvnCheckout(),
     PlannerSvnCheckout(rev='HEAD')),
]
msg584 (view) Author: malte Date: 2010-10-24.12:35:33
Yes, a run with an equally old translator version would be useful.

The preprocessor version can/should be the most recent one; all that happened
there since r3612 are some bug fixes that shouldn't affect LM-cut.
msg583 (view) Author: erez Date: 2010-10-24.10:33:45
The logs from my runs are in the lmcut.tar.gz attached here.
I'm afraid I ran these on my old computer, so I have no more data, and no way to 
recreate these.
It's very possible that there is also some connection to the translator here, so 
we might want to try to use the same version of the translator as in r3612, and 
run this comparison again.
msg582 (view) Author: jendrik Date: 2010-10-24.04:27:01
I'm attaching the detailed results for a better analysis.
msg581 (view) Author: malte Date: 2010-10-24.03:57:08
Hmm... the results for the trunk version are best and the difference between
r3612 and r3613 are negligible, but the results are consistently worse than what
Erez reports in msg226.

Erez, do you still have the logs for these runs available or can you reproduce
them in some way?
msg580 (view) Author: jendrik Date: 2010-10-24.03:39:06
I have taken this issue as an example for the new comparisons module and ran 
some tests: ou on all STRIPS domains for r3612/r3613 and HEAD (HEAD being ~5038, 
one of the latest SVN revisions).

The results are attached.

BTW, the scripts now convert HEAD etc. to the actual revision number.
msg228 (view) Author: erez Date: 2010-01-21.19:17:31
The results are attached
msg227 (view) Author: malte Date: 2010-01-21.15:13:37
If you still have the raw data, I'd need to have a look at the runtime and
expansion numbers for the individual instances.
msg226 (view) Author: erez Date: 2010-01-21.08:58:33
I ran a comparison on: airport, blocks, depot, freecell, logistics, mprime,
pathways, psr-small, pipesworld-tankage, pipesworld-notankage and zenotravel.

In terms of solved problems, these are the differences (for the new LM-CUT):
airport: -11  (38 old, 27 new)
mprime: -1 (25 old, 24 new)
pipesworld-notankage: +1 (17 old, 18 new)
pipesworld-tankage: +1 (11 old, 12 new)
zenotravel: +1 (12 old, 13 new)

So overall there was an improvement in 3 domains, and a decrease in 2.
However, the decrease in airport is huge.
I ran it using the scripts, so if there are some additional reports you want me
to run, just let me know.
msg225 (view) Author: malte Date: 2010-01-14.01:06:04
I had a little bit of time left, so I checked the behaviour. It's not 100% clear to
me that there is a fixable problem. Revision r3638 made some algorithmic changes to
LM-cut that probably affect which h_max supporter is chosen in case of ties.
Since LM-cut is only well-defined up to tie breaking, this can adversely affect
h values.

It could just be the case that we were lucky with tie-breaking before and are
not so lucky anymore, and unfortunately in domains like Airport this can mean
all or nothing. But it's equally possible that we're much better in other
domains, or even other Airport instances.

So I think the right approach here is a thorough performance comparison between
r3612 (for unit-cost problems) or r3613 (for general cost problems) and HEAD
across all domains. If there's a systematic decrease of heuristic quality, we
can look into possible causes for this -- maybe there is a bug, but then the
performance study will hopefully tell us where to look.
msg224 (view) Author: malte Date: 2010-01-13.15:10:33
PS: If anyone wants to do any performance experiments with h^LM-cut for a paper,
*don't* use the current trunk. I made some major unpublished experimental
algorithmic changes there recently, and the reference version for use in papers
should be the version described in the LM-cut paper.

I think r3304 would be a good version to use at least w.r.t. the heuristic
implementation. (I don't know about the search algorithms without checking).
msg223 (view) Author: erez Date: 2010-01-13.15:09:40
I found this by running selective-max again (after the changes I made to
selective-max). Airport was the most obvious example, but other domains also
suffered in performance (both pipesworld domains for example). I'm not sure if
this is all due to lm-cut, but it's worth looking into those as well.
msg222 (view) Author: malte Date: 2010-01-13.15:07:25
That's what I feared. The problem is that the code now computes the cuts
slightly differently, which can affect the h values (since h^LM-cut, like h^FF,
has some arbitary choice points). So it's possible that we're now simply unlucky
and get slightly worse cuts for obscure reasons, or it may be a genuine bug.
Will be a bit of work to find out which one it is. Oh well. :-(

Anyway, thanks! I'll look into it.
msg221 (view) Author: erez Date: 2010-01-13.15:04:15
Here it is:

r3636 - no bug
r3637 - different bug (wrong solution of length 1)
r3638 - bug
msg220 (view) Author: malte Date: 2010-01-13.14:47:07
Thanks for reporting! We should try to isolate the revision in which the
performance first degraded (e.g. using some kind of binary search).

Looks like I won't have time for this in the near future, so if you could do it,
great! Otherwise we can just let this one sit for a while, of course.
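The binary search over revisions suggested here can be sketched as a small driver. This is a minimal sketch, not project code: the `is_bad` predicate is hypothetical and would in practice check out the revision, build the planner, and run a benchmark such as Airport 25 under a time limit. It also assumes the regression is monotone (every revision before the culprit is good, every one after is bad), which the "different bug" in r3637 reported in msg221 above shows need not hold exactly.

```python
def first_bad_revision(good, bad, is_bad):
    """Binary search for the first revision in (good, bad] for which
    is_bad(rev) is True, assuming is_bad is monotone over revisions:
    False up to some revision, True from then on."""
    assert good < bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # regression introduced at mid or earlier
        else:
            good = mid   # regression introduced after mid
    return bad

# Hypothetical predicate simulating a regression introduced in r3638;
# a real one would build and benchmark the checked-out revision.
print(first_bad_revision(3612, 3700, lambda r: r >= 3638))  # -> 3638
```

With a good and a bad endpoint this needs only O(log n) builds instead of testing every revision in between.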
msg219 (view) Author: erez Date: 2010-01-13.14:07:19
There is a major difference in performance between the current A* + LM-CUT and
the first results we had with LM-CUT.
Example: Airport 25 - previously solved in 96 seconds with 213 expanded
states. Now it times out after 30 minutes.

I'm not sure if this is because of changes in LM-CUT, search engine, or maybe
something else. I don't think it's the recent translator changes, since I did
not re-run the translator after these changes.
History
Date                 User     Action  Args
2011-08-13 22:10:52  malte    set     status: chatting -> resolved; messages: + msg1610
2011-08-12 22:00:03  jendrik  set     messages: + msg1598
2011-08-12 21:28:12  malte    set     messages: + msg1597
2011-08-12 21:21:07  jendrik  set     files: + plots-v2.tar.gz; messages: + msg1595
2011-08-12 19:52:46  malte    set     messages: + msg1593
2011-08-12 17:57:43  jendrik  set     messages: + msg1589
2011-08-12 17:55:34  malte    set     messages: + msg1588
2011-08-12 17:49:00  jendrik  set     messages: + msg1587
2011-08-12 17:48:46  jendrik  set     files: - issue69-all-plots.tar.gz
2011-08-12 17:48:15  malte    set     messages: + msg1586
2011-08-12 17:43:24  jendrik  set     files: + exp-js-issue69-eval-abs-p.html; messages: + msg1585
2011-08-12 17:39:56  jendrik  set     files: - issue69-scatter-domain.tar.gz
2011-08-12 17:39:51  jendrik  set     files: - issue69-scatter.tar.gz
2011-08-12 17:39:39  jendrik  set     files: - issue69ouSTRIPSeval-d-abs.html
2011-08-12 17:39:36  jendrik  set     files: - issue69ouSTRIPSeval-p-abs.html
2011-08-12 17:39:23  jendrik  set     files: - issue69ouSTRIPStrans3613eval-d-abs.html
2011-08-12 17:39:04  jendrik  set     files: + exp-js-issue69-eval-abs-d.html
2011-08-12 17:36:32  jendrik  set     messages: + msg1583
2011-08-12 11:19:03  malte    set     messages: + msg1574
2011-08-12 11:10:42  malte    set     messages: + msg1573
2011-08-12 11:06:47  malte    set     messages: + msg1572
2011-08-11 21:23:52  jendrik  set     files: + issue69-all-plots.tar.gz; messages: + msg1567
2011-08-11 13:55:53  jendrik  set     messages: + msg1559
2011-08-11 12:08:20  malte    set     messages: + msg1557
2011-08-11 02:25:02  jendrik  set     messages: + msg1551
2011-08-11 01:44:45  jendrik  set     messages: + msg1550
2011-08-10 13:21:23  malte    set     messages: + msg1539
2011-08-10 13:12:23  jendrik  set     files: + issue69-scatter-domain.tar.gz; messages: + msg1537
2011-08-09 19:56:42  malte    set     messages: + msg1526
2011-08-09 19:51:39  malte    set     messages: + msg1525
2011-08-09 19:48:50  jendrik  set     messages: + msg1524
2011-08-09 18:35:25  malte    set     messages: + msg1523
2011-08-09 17:53:00  jendrik  set     files: + issue69-scatter.tar.gz; messages: + msg1522
2011-07-18 11:29:32  malte    set     messages: + msg1420
2011-07-18 08:01:46  erez     set     messages: + msg1419
2011-07-17 21:10:07  jendrik  set     messages: + msg1418
2010-10-28 00:00:36  malte    set     messages: + msg611
2010-10-26 20:00:03  jendrik  set     messages: + msg608
2010-10-26 19:35:16  malte    set     messages: + msg607
2010-10-26 19:30:49  jendrik  set     messages: + msg606
2010-10-26 17:45:04  jendrik  set     messages: + msg605
2010-10-26 13:22:05  malte    set     messages: + msg604
2010-10-26 12:11:09  malte    set     messages: + msg603
2010-10-26 11:50:01  malte    set     messages: + msg602
2010-10-26 11:16:45  erez     set     messages: + msg601
2010-10-26 10:57:18  malte    set     messages: + msg600
2010-10-26 10:29:38  erez     set     messages: + msg599
2010-10-26 03:00:27  malte    set     messages: + msg598
2010-10-26 02:42:09  jendrik  set     files: + issue69ouSTRIPStrans3613eval-d-abs.html; messages: + msg597
2010-10-25 02:52:09  malte    set     messages: + msg588
2010-10-25 02:15:03  jendrik  set     messages: + msg587
2010-10-24 17:14:15  malte    set     messages: + msg586
2010-10-24 16:11:25  jendrik  set     messages: + msg585
2010-10-24 12:35:33  malte    set     messages: + msg584
2010-10-24 10:33:45  erez     set     messages: + msg583
2010-10-24 04:27:03  jendrik  set     files: + issue69ouSTRIPSeval-p-abs.html; messages: + msg582
2010-10-24 03:57:08  malte    set     messages: + msg581
2010-10-24 03:39:06  jendrik  set     files: + issue69ouSTRIPSeval-d-abs.html; nosy: + jendrik; messages: + msg580
2010-03-22 14:34:28  malte    set     keyword: + 1.0
2010-01-21 19:17:31  erez     set     files: + lmcut.tar.gz; messages: + msg228
2010-01-21 15:13:38  malte    set     messages: + msg227
2010-01-21 08:58:34  erez     set     messages: + msg226
2010-01-14 01:06:04  malte    set     messages: + msg225; title: A* with LM-CUT degradation -> A* with LM-CUT needs performance comparison between r3612/r3613 and HEAD
2010-01-13 15:10:33  malte    set     messages: + msg224
2010-01-13 15:09:40  erez     set     messages: + msg223
2010-01-13 15:07:25  malte    set     assignedto: malte; messages: + msg222
2010-01-13 15:04:15  erez     set     messages: + msg221
2010-01-13 14:47:07  malte    set     status: unread -> chatting; messages: + msg220
2010-01-13 14:07:19  erez     create