msg4788 (view) |
Author: malte |
Date: 2015-11-12.18:10:14 |
|
Thanks for the quick fix, Jendrik! :-) I assume this requires no changes in
common_setup.py, right? (If it doesn't, no need to reply.)
Given that this was a lab issue, we won't need the branch I started in my
bitbucket repository for this, so I will strip it to avoid accidentally pushing
it to master later. If you pulled it, you'll have to strip it, too.
|
msg4787 (view) |
Author: jendrik |
Date: 2015-11-12.15:47:38 |
|
Rerunning the test experiment showed no unexpected errors and the correct
revisions are compared. A new lab version has been released.
|
msg4786 (view) |
Author: jendrik |
Date: 2015-11-12.14:00:51 |
|
Yes, this only affects experiments comparing *multiple* revisions.
|
msg4785 (view) |
Author: silvan |
Date: 2015-11-12.13:03:55 |
|
Jendrik, does this only affect experiments where we use more than one revision?
I have already run lots of experiments on research branches, and at least after
adding a new option, for example, the correct revision must have been used;
otherwise the option would not have been accepted by Fast Downward.
|
msg4784 (view) |
Author: jendrik |
Date: 2015-11-12.12:30:45 |
|
The bug is now fixed in lab. I'll rerun the issue481 experiment and tag a new lab
bugfix release afterwards.
|
msg4783 (view) |
Author: jendrik |
Date: 2015-11-12.12:13:32 |
|
Thanks to your investigation, I was able to pinpoint the error to the new
whole-planner experiment class. There's a bug that causes all "revisions" to use
the same (random) revision. I'll report back once I've fixed the bug.
I've changed the title and added Silvan to the nosy list since he's already
using the new experiment class for comparing revisions.
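For reference, one common way a bug like this arises in Python experiment code is a callback that closes over a loop variable, so every run ends up with whichever revision was bound last. This is a hypothetical sketch, not lab's actual implementation; the function names and revision strings are made up:

```python
def make_run_commands_buggy(revisions):
    """Build one checkout command per revision (buggy version)."""
    commands = []
    for rev in revisions:
        # BUG: 'rev' is looked up when the lambda is *called*, not when it
        # is defined, so every command sees the final value of 'rev'.
        commands.append(lambda: "checkout " + rev)
    return commands

def make_run_commands_fixed(revisions):
    # FIX: bind the current value of 'rev' via a default argument.
    return [lambda rev=rev: "checkout " + rev for rev in revisions]

buggy = [command() for command in make_run_commands_buggy(["base", "v2"])]
fixed = [command() for command in make_run_commands_fixed(["base", "v2"])]
print(buggy)  # ['checkout v2', 'checkout v2'] -- all runs use one revision
print(fixed)  # ['checkout base', 'checkout v2']
```

The symptom matches what was observed: every "revision" in the experiment silently uses the same (effectively random) one.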
|
msg4782 (view) |
Author: malte |
Date: 2015-11-12.11:17:42 |
|
I've run a few more tests with issue481, and I get the following curious result:
- If I run the baseline revision and the issue branch revision in the
experiment, then I get the unexplained errors for both revisions. For the issue
branch, this is not so surprising because it currently comments out the signal
handler for debug reasons. However, for the baseline revision it is surprising.
- If I run the same experiment, but only using the baseline revision (i.e. the
only change is I remove one of the revisions), then the baseline revision
doesn't produce errors any more. That is, the baseline revision *only produces
errors if the issue branch revision is also part of the experiment*.
Of course, there may also be random failures involved, and the above observation
might be the outcome of a random process. But I suspect there is something more
going on there.
One possible explanation is that the wrong code is run or that the results are
somehow jumbled up during fetching or report generation. I don't have time to
look into this more at the moment, but I'll try to look at it again later.
|
msg4781 (view) |
Author: malte |
Date: 2015-11-12.10:34:04 |
|
OK, the issue481 experiment seems to fail reproducibly, or at least it failed
similarly on a second attempt. I made a smaller version of it that only considers
the floortile domain, and I got similar errors as before (in the issue481 branch
under v2-*.py).
I'll try to look into this a bit more over the next days if I can find the time.
|
msg4780 (view) |
Author: malte |
Date: 2015-11-12.09:41:09 |
|
No problems with the grid experiment either, so this cannot be reproduced for
now. I'll try to repeat the original experiment from issue481 to see if the
errors there are reproducible. If not, I'd close this for now, since it might
just be a sporadic grid issue (although I don't really know how that could be
the case).
|
msg4777 (view) |
Author: malte |
Date: 2015-11-12.07:05:59 |
|
Let's wait for the outcome of the experiments. But I think one difference is
that we used to set the memory limit within lab, whereas now we rely on setting
the memory limit in the driver script (with the driver option for setting memory
limits). My understanding is that the new code doesn't set a memory limit within
lab at all. Is that right? (If it does set one, with which method is it set and
how high is it?)
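To make the distinction concrete, here is a minimal sketch of applying a memory limit from the calling side, assuming a POSIX system. The helper name is made up and is not lab's or the driver's actual API; it only shows the general mechanism (capping the child's address space before exec):

```python
import resource
import subprocess
import sys

def run_with_memory_limit(cmd, limit_bytes):
    """Run cmd in a child process whose address space is capped via RLIMIT_AS."""
    def set_limit():
        # Runs in the child between fork and exec (POSIX only).
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
    return subprocess.run(cmd, preexec_fn=set_limit)

# A child that tries to allocate ~1 GiB under a 256 MiB cap should fail
# with a nonzero exit code; without the cap it would succeed.
result = run_with_memory_limit(
    [sys.executable, "-c", "x = bytearray(1024 ** 3)"],
    256 * 1024 * 1024)
print(result.returncode != 0)
```

If lab no longer does anything like this and the limit comes only from the driver's `--search-memory-limit`, the effective limit (and how a violation manifests) depends entirely on the driver's mechanism.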
|
msg4776 (view) |
Author: jendrik |
Date: 2015-11-12.00:27:42 |
|
Hmm, I've looked at the way the memory limit is set by the old and new experiment
classes and couldn't find a meaningful difference.
|
msg4775 (view) |
Author: malte |
Date: 2015-11-11.23:58:03 |
|
I've started an experiment on maia, but the queue is very full, so I don't
expect it to run soon.
In the meantime I've also tried to reproduce this manually with the mentioned
revision by running
./fast-downward.py --search-memory-limit=2G seq-p04-007.pddl --search "eager_greedy(ff())"
manually on different machines. (This is from the floortile-sat11-strips domain;
I had copied the PDDL files to the current directory.) I've also tried with 128M
instead of 2G.
So far, none of the manual attempts could reproduce this. I tried on my home
desktop, on maia, and on ase01. In all six cases (three machines, two memory
limits), the planner shut down cleanly after hitting the memory limit. So it
looks like either we can't reproduce it at all, or we can only reproduce it when
running within a grid job.
If the latter is the case, it may be due to some interaction with the latest
version of lab, since the main thing that has changed recently in this
department is the lab upgrade and usage of whole-planner experiments.
I'll send another update when the grid experiment is done.
|
msg4774 (view) |
Author: malte |
Date: 2015-11-11.22:56:37 |
|
I'm not sure if I have time to really work on this, but I can try to reproduce
it and find out when this was introduced. I've started a pull request here for this:
https://bitbucket.org/malte/downward/pull-requests/5/issue594-dont-let-bad_alloc-escape-from/diff
|
msg4773 (view) |
Author: jendrik |
Date: 2015-11-11.22:38:03 |
|
Working on issue481 we noticed that the planner is often aborted when it runs
out of memory without our out-of-memory handler being called. This happens e.g.
in revision 6642b246b180 using the configuration ["--search",
"eager_greedy(ff())"]. The "floortile-sat11-strips" domain should provide a
good test suite since the error happens very often there. We should try to find
out where this regression happened and fix it.
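As a rough illustration of what a working out-of-memory handler looks like: Fast Downward's real handler lives in C++ (dealing with std::bad_alloc), so this Python sketch, with a made-up message and exit code, only mimics the intended behavior of reporting the failure and exiting cleanly instead of being aborted:

```python
import subprocess
import sys
import textwrap

# Child script: set a hard memory limit, then allocate past it. The
# try/except plays the role of the out-of-memory handler: when it runs,
# the process prints a message and exits with a dedicated exit code
# (both chosen arbitrarily here) instead of dying without a trace.
child_script = textwrap.dedent("""
    import resource
    resource.setrlimit(resource.RLIMIT_AS, (128 * 1024 * 1024,) * 2)
    try:
        data = bytearray(1024 ** 3)
    except MemoryError:
        print("out of memory")
        raise SystemExit(12)
""")

proc = subprocess.run([sys.executable, "-c", child_script],
                      capture_output=True, text=True)
print(proc.stdout.strip())   # the handler's message
print(proc.returncode)       # the handler's exit code
```

The bug described above is the opposite behavior: the process hits the limit and is aborted before any such handler gets a chance to run, so the experiment sees an unexplained error instead of a clean out-of-memory exit.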
|
|
Date | User | Action | Args
2015-11-12 18:10:14 | malte | set | messages: + msg4788
2015-11-12 15:47:38 | jendrik | set | status: in-progress -> resolved, assignedto: jendrik, messages: + msg4787
2015-11-12 14:00:51 | jendrik | set | messages: + msg4786
2015-11-12 13:03:56 | silvan | set | messages: + msg4785
2015-11-12 12:30:45 | jendrik | set | messages: + msg4784
2015-11-12 12:13:32 | jendrik | set | status: chatting -> in-progress, nosy: + silvan, messages: + msg4783, title: fix catching out-of-memory errors -> lab: use correct revisions in FastDownwardExperiment
2015-11-12 11:17:42 | malte | set | messages: + msg4782
2015-11-12 10:34:04 | malte | set | messages: + msg4781
2015-11-12 09:41:09 | malte | set | messages: + msg4780
2015-11-12 07:05:59 | malte | set | messages: + msg4777
2015-11-12 00:27:42 | jendrik | set | messages: + msg4776
2015-11-11 23:58:03 | malte | set | messages: + msg4775
2015-11-11 22:56:37 | malte | set | status: unread -> chatting, messages: + msg4774
2015-11-11 22:38:03 | jendrik | create |