Issue 391: Coverage drop in Pegsol with iPDB - Fast Downward issue tracker

Title	Coverage drop in Pegsol with iPDB
Priority	bug	Status	resolved
Superseder		Nosy List	florian, gabi, malte, silvan
Assigned To		Keywords

Created on 2013-09-16.18:06:29 by gabi, last changed by malte.

Messages
msg2917 (view)	Author: malte	Date: 2014-01-28.16:14:32
It seems that everybody is at least OK with closing this, and Silvan and I are strongly in favour of closing it. :-) So I'm closing it; if I'm interpreting the old messages incorrectly, just reopen.
msg2916 (view)	Author: silvan	Date: 2014-01-28.14:03:24
At least, we did not find any obvious problem in the code and the recent performance fix for iPDB could be somewhat related to the pegsol performance, although it is unclear if the two cases are really related.
msg2915 (view)	Author: gabi	Date: 2014-01-28.14:02:20
It seems that there is no problem in the code and we can simply close this issue. Right?
msg2914 (view)	Author: silvan	Date: 2014-01-28.13:59:22
I agree, the most likely explanation seems to be that the reported values in the SoCS paper are incorrect. Gabi, do you have any opinion on this? I would like to get rid of this open issue at some time... ;)
msg2893 (view)	Author: malte	Date: 2014-01-06.21:19:30
I see -- but this doesn't really solve the mystery why the numbers in the SoCS paper are quite good (18 solved tasks), but when we try to reproduce them with what we think should be the correct version, we only solve 3. So I would treat these numbers as suspect w.r.t. the SoCS paper version.
msg2892 (view)	Author: florian	Date: 2014-01-06.21:17:16
Yes, I was looking at the 2008 benchmark. As for the detailed info, I believe this is from a per-problem report that I looked at with Silvan some time ago. It contains some information about the number of iterations, the initial h value and the (max?) improvement, but not the full logs.
msg2891 (view)	Author: malte	Date: 2014-01-06.21:08:49
I think you're looking at different benchmark sets. The SoCS paper uses the 2011 benchmark suite, which has 20 peg solitaire instances. The paper shows a coverage of 18, and issue402 shows a coverage of 19. That's close enough considering that we made other improvements since then and are likely running on a faster machine. Regarding msg2656, I'm not really sure where the detailed information there comes from, since the main problem here is that we don't have the logs for the actual experiments run back then and can't reproduce them.
msg2890 (view)	Author: florian	Date: 2014-01-06.21:00:24
I re-read the messages here and to me it looks like issue402 was not responsible for the original change in coverage. Gabi mentioned (msg2656) that the reported coverage was 18 and with issue402 it is 29. This could be due to the other changes we made, but still seems like a large difference. Also, we can guess from the reports that the SoCS version stopped hill-climbing earlier (also msg2656) and issue402 should not influence this. I still am with Silvan on this one: without knowing the exact code and having a way to reproduce the results it will be hard to find out anything new, so I would be ok with closing this issue.
msg2889 (view)	Author: malte	Date: 2014-01-06.20:42:43
Florian and Gabi, what do you think?

BTW, regarding the earlier comment on hg meld: I always run this as "hg meld -r rev1:rev2", i.e., with a colon instead of two -r arguments, when I want to compare two specific revisions. In my past experience, this always seemed to work. The four main forms of hg diff/meld I use regularly are:

$ hg diff
Compare working directory to parent.
$ hg diff -r 10
Compare working directory to revision 10.
$ hg diff -c 10
Compare revision 10 to its parent.
$ hg diff -r 10:20
Compare revision 10 to revision 20.
msg2888 (view)	Author: silvan	Date: 2014-01-06.20:36:25
I'd be happy to be lazy and close this issue :) But maybe others have different opinions?
msg2885 (view)	Author: malte	Date: 2014-01-06.18:39:53
What is your preference?
msg2880 (view)	Author: silvan	Date: 2014-01-06.11:51:12
To answer your older questions: We do not have any logs from the runs, unless they exist in some kind of archived form on a DVD in Freiburg (I remember you asked us to pack some of the experiment data for the paper and give it to Uli for archiving).

I still have two old repositories:
- one from the teamprojekt, where the last revision is 232de6d0ff7c (which is after you merged in our pdb code).
- one for the socs paper, where we back-integrated our two variants of pdb (base and efficient) and where we have many of the experiment scripts we used for the socs experiments. I therefore believe we also used the downward version from this repository for the experiments. The last "merge from master" revision is 1246ebf3408f.

Anyway, when I compare those two revisions, all changes I find that are related to pdbs concern additional statistics and options parsing. I think it is safe to assume that these revisions behave the same (unless some changes in the landmarks code, the mas code or the lmcut heuristic could have some impact on the pdbs).

I then compared the newer of those versions (1246ebf3408f) against the version used in the experiments below (64c3312cf51f), which dates from September 17, 2013. The changes there include the addition of the dominance pruner and some other smallish changes which I cannot believe to be a reason for the observed behavior (but of course there are many changes outside the pdbs, e.g. in the landmarks and the state representation).

(Btw., if you try to have a look at the diffs yourself: for me, hg meld -r <rev> -r <rev> did not work, it always compared against the revision the repository was currently updated to. So you would possibly need two clones, update to the respective revisions and diff manually.)

So, to conclude and to answer your latest question: I am not sure at all what caused the observed behavior, and I am not sure if the issue is resolved by the increased coverage from issue402. But I would be happy to "accept" it as resolved because I cannot think of any other, better reason why the coverage in pegsol could have been lost.
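For the "two clones, diff manually" workaround mentioned above, a minimal sketch of the manual comparison might look as follows. It assumes both clones are already checked out at the respective revisions; the function name changed_files and the demo file names and contents are made up for illustration and do not correspond to files in the repositories discussed here.

```python
import filecmp
import tempfile
from pathlib import Path
from typing import List


def changed_files(old_root: str, new_root: str, ignore=(".hg",)) -> List[str]:
    """List relative paths that differ or exist in only one of two trees.

    Note: filecmp.dircmp compares shallowly, so two files with identical
    size and modification time are treated as equal without reading them.
    """
    result: List[str] = []

    def walk(cmp: filecmp.dircmp, prefix: Path) -> None:
        for name in cmp.diff_files:
            result.append((prefix / name).as_posix())
        for name in cmp.left_only:
            result.append((prefix / name).as_posix() + " (only in old)")
        for name in cmp.right_only:
            result.append((prefix / name).as_posix() + " (only in new)")
        # Recurse into directories present in both trees.
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(old_root, new_root, ignore=list(ignore)), Path(""))
    return sorted(result)


# Demo on two throwaway directory trees (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    old, new = Path(tmp, "old"), Path(tmp, "new")
    for root, text in ((old, "num_samples = 1000\n"), (new, "num_samples = 100\n")):
        (root / "pdbs").mkdir(parents=True)
        (root / "pdbs" / "options.txt").write_text(text)
        (root / "README").write_text("same\n")
    (new / "pdbs" / "dominance_pruner.txt").write_text("new file\n")
    demo_changes = changed_files(str(old), str(new))

print(demo_changes)
```

This only narrows down which files to inspect; the actual content diff of each listed file would still be done with a diff tool of choice.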
msg2867 (view)	Author: malte	Date: 2013-12-30.19:45:15
It looks like issue402 might be related to this. Do you think we can close this as resolved by issue402?
msg2676 (view)	Author: malte	Date: 2013-09-26.19:11:33
OK, I guess both need a bit of implementation. Maybe the 7 is less strange than the 3 because we made a number of iPDB-related changes in the last months, but it would still be good to find out at which point we jumped from 0 to 7. Regarding the 3, I guess this means we need to find out what exactly we ran for the SoCS experiment, and maybe also where. Is there a good way to find this out? For example, do we still have the log files from these runs?
msg2675 (view)	Author: silvan	Date: 2013-09-26.19:09:02
A bit late, but here are the results:
http://ai.cs.unibas.ch/_tmp_files/sieverss/ipdb-old-new-revisions-d.html
http://ai.cs.unibas.ch/_tmp_files/sieverss/ipdb-old-new-revisions-p.html
Interestingly, the results do not reflect the ones from the papers (at least for the default ipdb config used). The left column shows the old socs-version code, which achieves a coverage of 3 (!= 18), and the right column shows the newest downward version, which achieves a coverage of 7 (!= 0). This is still strange...
msg2669 (view)	Author: silvan	Date: 2013-09-20.15:25:29
Experiment is running. For now I only took the default ipdb-config; if you want more diverse configurations (e.g., as in the socs paper), let me know.
msg2666 (view)	Author: malte	Date: 2013-09-19.17:50:37
Great! Can we get started by making an experiment that compares the old and new code on peg solitaire?
msg2663 (view)	Author: silvan	Date: 2013-09-19.10:35:01
We do have the old code and it is in the repository. The last revision I ever pushed to our Teamprojekt-repository is this one: 232de6d0ff7c. I am very sure that the experiments for the SoCS-paper have been run on this revision. Furthermore, I could reproduce the behavior on the first pegsol-instance, which is solved in about 60s with the old code. As there were some issues with running the old revision, I pushed a fixed version to an ai-repos repository, to which I've granted Malte access. If anyone else is interested, let me know.
msg2657 (view)	Author: malte	Date: 2013-09-16.18:15:10
Is there a way for us to reproduce these results, i.e., is the old SoCS code in the repository? If yes, what revision?
msg2656 (view)	Author: gabi	Date: 2013-09-16.18:06:29
In their PDB paper, Sievers, Ortlieb and Helmert (SoCS 2012) report 18 solved pegsol instances with iPDB. In their IJCAI 13 paper, Pommerening, Röger and Helmert report a coverage of 0, although there should be no difference. Florian and Silvan already had a deeper look at the first instance: It seems that in the first four hill-climbing iterations, similar patterns are found. We do not have detailed logs for the SoCS results; according to the iPDB output we find patterns of the same size, but we cannot know whether they are actually the same. However, with the old results these 4 iterations took 86 seconds, in contrast to more than 330 seconds with the new results. Afterwards, the hill-climbing search in the SoCS results stops with an h-value of 1, but in the newer results the hill-climbing search continues because it finds new patterns with an improvement of 32. It runs the hill-climbing until it times out and finds larger and larger patterns, increasing the h-value to 2 (h* is 3). It is unclear why we observe this different behaviour.
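To make the stopping behaviour described above concrete, here is a minimal sketch of a generic hill-climbing loop with an improvement threshold and a time limit. It is not the Fast Downward implementation; the names hill_climb, count_improvement, min_improvement and max_time, the threshold of 10 and all toy numbers other than the improvement of 32 are assumptions for illustration. It only shows how a single differing improvement count at one step can make one run stop and another continue.

```python
import time
from typing import Callable, Dict, List, Tuple


def hill_climb(
    initial: int,
    neighbours: Callable[[int], List[int]],
    count_improvement: Callable[[int, int], int],
    min_improvement: int,
    max_time: float,
) -> Tuple[int, List[int]]:
    """Greedy hill climbing: move to the best neighbour while it improves
    by at least min_improvement and the time limit is not exceeded."""
    current = initial
    history: List[int] = []  # improvement achieved in each accepted step
    start = time.monotonic()
    while time.monotonic() - start < max_time:
        best, best_improvement = None, -1
        for candidate in neighbours(current):
            improvement = count_improvement(current, candidate)
            if improvement > best_improvement:
                best, best_improvement = candidate, improvement
        if best is None or best_improvement < min_improvement:
            break  # no neighbour improves enough: stop
        current = best
        history.append(best_improvement)
    return current, history


# Toy search space: a "pattern" is just its size; the only neighbour is
# the next larger size, up to size 8.
def neighbours(size: int) -> List[int]:
    return [size + 1] if size < 8 else []


def make_counter(gains: Dict[int, int]) -> Callable[[int, int], int]:
    # Improvement of moving to `candidate`; 50 unless overridden.
    return lambda current, candidate: gains.get(candidate, 50)


# Both runs agree on the first four steps and differ only at the fifth.
old_final, old_history = hill_climb(1, neighbours, make_counter({6: 0}), 10, 5.0)
new_final, new_history = hill_climb(1, neighbours, make_counter({6: 32}), 10, 5.0)
print(old_final, old_history)
print(new_final, new_history)
```

In this toy setting the first run stops after four steps, while the second keeps growing the pattern until no neighbour is left, which mirrors the qualitative difference reported between the SoCS results and the newer results.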

History
Date	User	Action	Args
2014-01-28 16:14:32	malte	set	status: chatting -> resolved messages: + msg2917
2014-01-28 14:03:24	silvan	set	messages: + msg2916
2014-01-28 14:02:20	gabi	set	messages: + msg2915
2014-01-28 13:59:22	silvan	set	messages: + msg2914
2014-01-06 21:19:30	malte	set	messages: + msg2893
2014-01-06 21:17:16	florian	set	messages: + msg2892
2014-01-06 21:08:49	malte	set	messages: + msg2891
2014-01-06 21:00:24	florian	set	messages: + msg2890
2014-01-06 20:42:43	malte	set	messages: + msg2889
2014-01-06 20:36:25	silvan	set	messages: + msg2888
2014-01-06 18:39:53	malte	set	messages: + msg2885
2014-01-06 11:51:12	silvan	set	messages: + msg2880
2013-12-30 19:45:15	malte	set	messages: + msg2867
2013-09-26 19:11:33	malte	set	messages: + msg2676
2013-09-26 19:09:02	silvan	set	messages: + msg2675
2013-09-20 15:25:29	silvan	set	messages: + msg2669
2013-09-19 17:50:37	malte	set	messages: + msg2666
2013-09-19 10:35:01	silvan	set	messages: + msg2663
2013-09-16 18:15:10	malte	set	status: unread -> chatting messages: + msg2657
2013-09-16 18:06:29	gabi	create
