Issue 869: skip hard-to-translate tasks in daily/weekly tests

Title	skip hard-to-translate tasks in daily/weekly tests
Priority	feature	Status	resolved
Superseder		Nosy List	florian, jendrik, malte
Assigned To	jendrik	Keywords
Optional summary

Created on 2018-11-26.13:52:37 by jendrik, last changed by malte.

Messages
msg8122 (view)	Author: malte	Date: 2018-11-28.12:51:31
Agreed, such a comparison would be nice. Ideally with different levels of granularity: something fast, something moderately comprehensive, and everything. I suggest we open an issue for more tests iff someone wants to work on it in the near or medium future.
msg8121 (view)	Author: florian	Date: 2018-11-28.12:49:27
I see. I thought we would check that the translator output doesn't change from one revision to the next. I think this would be a useful test to add to the nightly/weekly build, and it could also take a bit longer. The local tests should remain fast.
msg8119 (view)	Author: malte	Date: 2018-11-28.12:39:13
The original intention was that these tests can be run in 10 seconds, so that everyone can quickly run them before they push. We should also be clear about what we are testing here: unless I'm mistaken, we only test that Python 2.7 and Python 3 result in the same translator output, not that this translator output matches any reference translator output or similar. So we're not comparing to something objective, but only looking out for Python version dependencies. For these, I think one task in (almost) every domain should give us sufficient coverage. So I guess what I'm saying is: before we extend this test to more tasks, I would prefer us to think about: 1) What do we want to test? 2) What is the purpose of these tests? 3) Who should run these tests and in which situations?
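To make concrete what "same translator output" means in the message above, here is a minimal sketch that compares two outputs line by line. It assumes both outputs have already been read into strings; the function name `diff_outputs` and the labels are hypothetical and are not taken from the actual test script.

```python
import difflib


def diff_outputs(text_a, text_b, label_a="python2.7", label_b="python3"):
    """Return unified-diff lines between two translator outputs.

    An empty list means the two outputs are identical line by line.
    """
    return list(
        difflib.unified_diff(
            text_a.splitlines(),
            text_b.splitlines(),
            fromfile=label_a,
            tofile=label_b,
            lineterm="",
        )
    )


# Toy strings standing in for real translator output.
same = diff_outputs("a\nb", "a\nb")
different = diff_outputs("a\nb", "a\nc")
print(same)
print("\n".join(different))
```

Note that such a check only detects differences between the two interpreters. It says nothing about whether either output is correct.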
msg8117 (view)	Author: florian	Date: 2018-11-28.12:29:51
I'm a bit late to the party but I would suggest that if we use a specific selection of tasks for the translator tests anyway, we test a few more tasks than just the first in each domain. How about testing all tasks that take less than 10 seconds and 2GB or some similar selection?
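A selection like the one suggested above could be sketched as follows. This is only an illustration under stated assumptions: the input format (a dict mapping task names to `(runtime_seconds, peak_memory_kib)` pairs), the function name `quick_tasks`, and the reading of "2GB" as 2 GiB are all hypothetical. A value of `None` is treated as a failed run and skipped.

```python
MAX_SECONDS = 10.0
MAX_MEMORY_KIB = 2 * 1024 * 1024  # assumption: "2GB" read as 2 GiB, in KiB


def quick_tasks(results):
    """Return the sorted names of tasks that stay below both limits.

    `results` maps a task name to (runtime_seconds, peak_memory_kib).
    Runs with a missing value (None) are treated as failures and skipped.
    """
    selected = []
    for task, (runtime, memory_kib) in sorted(results.items()):
        if runtime is None or memory_kib is None:
            continue  # the run hit a limit or crashed
        if runtime < MAX_SECONDS and memory_kib < MAX_MEMORY_KIB:
            selected.append(task)
    return selected


# Made-up numbers, for illustration only.
example = {
    "domain-a:p01.pddl": (0.4, 50_000),
    "domain-a:p02.pddl": (12.0, 60_000),     # too slow
    "domain-b:p01.pddl": (None, None),       # failed run
    "domain-c:p01.pddl": (3.0, 3_000_000),   # too much memory
}
print(quick_tasks(example))
```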
msg8102 (view)	Author: jendrik	Date: 2018-11-27.08:21:30
I had pushed only to the master repo, not my own repo. The new revisions are now in both repos. Regarding your earlier question about memory: I had overlooked that tasks with runtime=None may have used too much memory or too much time.
msg8099 (view)	Author: malte	Date: 2018-11-27.00:53:51
Awesome, thanks! I see no new commits on the pull request, though?
msg8097 (view)	Author: jendrik	Date: 2018-11-26.19:17:43
Thanks for the comments on Bitbucket! I took care of them and merged this. The nightly test is green now and the weekly test should be in a few minutes.
msg8096 (view)	Author: malte	Date: 2018-11-26.17:55:09
Thanks, Jendrik! I've gone through the experimental results, and this list of domains looks good to me. The pull request looks good; I left two small comments.

> Memory usage is not a problem as all first tasks from IPC 2018
> use less than 2 GB.

As I wrote, I think the first task of organic-synthesis-sat18-strips needs more than 10 GiB; I aborted it at that point. I've now run it a bit longer, and it used roughly 15 GiB within 7 minutes. At that point I had to abort because the machine started swapping too much. In your experiments it looks like this one failed. Or perhaps you meant all first tasks apart from the ones in the domains you mentioned?
msg8095 (view)	Author: jendrik	Date: 2018-11-26.15:18:34
I made a pull request at https://bitbucket.org/jendrikseipp/downward/pull-requests/108 .
msg8093 (view)	Author: jendrik	Date: 2018-11-26.14:45:19
Here are the times and peak memory usages for translating all IPC benchmarks up to 2018: https://ai.dmi.unibas.ch/_tmp_files/seipp/issue869-base-translate-all.html

Based on the runtime results, I propose to ignore the first tasks from the following directories:

agricola-sat18-strips
organic-synthesis-sat18-strips
organic-synthesis-split-opt18-strips
organic-synthesis-split-sat18-strips

Memory usage is not a problem as all first tasks from IPC 2018 use less than 2 GB.
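The proposal above ("the first task in every domain, except for a few excluded directories") could be sketched roughly like this. The function name `first_tasks`, the `domain:task` naming, and the dict-based input are assumptions made for illustration; the actual change is the one in the pull request. Only the four excluded directory names come from this message.

```python
# Directories whose first task is too expensive to translate (from this message).
EXCLUDED_DOMAINS = frozenset({
    "agricola-sat18-strips",
    "organic-synthesis-sat18-strips",
    "organic-synthesis-split-opt18-strips",
    "organic-synthesis-split-sat18-strips",
})


def first_tasks(tasks_by_domain, excluded=EXCLUDED_DOMAINS):
    """Return 'domain:task' for the first task of each non-excluded domain.

    "First" means first in plain string order, so file names that are not
    zero-padded (p2 vs. p10) would need a different sort key.
    """
    suite = []
    for domain in sorted(tasks_by_domain):
        tasks = sorted(tasks_by_domain[domain])
        if domain in excluded or not tasks:
            continue
        suite.append(f"{domain}:{tasks[0]}")
    return suite


# Toy benchmark listing, for illustration only.
benchmarks = {
    "gripper": ["prob02.pddl", "prob01.pddl"],
    "organic-synthesis-sat18-strips": ["p01.pddl"],
    "agricola-sat18-strips": ["p01.pddl", "p02.pddl"],
}
print(first_tasks(benchmarks))
```

Keeping the exclusions in one named set makes it easy to extend the list when later benchmark sets add further expensive first tasks.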
msg8089 (view)	Author: jendrik	Date: 2018-11-26.13:52:37
Quoting Malte: "The problem seems to be that the translator test is supposed to only use tasks that are quick to complete. We have implemented this requirement as 'use the first task in each domain', but since we've added the IPC 2018 benchmarks, this no longer works. For example, the task 'organic-synthesis-sat18-strips-p01.pddl' uses more than 18 GiB in the translator on my machine, and I aborted it after 10 minutes. I think the solution for this would be to define a different suite for these tests, for example something as simple as 'The first task in all domains except X, Y and Z.'"

History
Date	User	Action	Args
2018-11-28 12:51:31	malte	set	messages: + msg8122
2018-11-28 12:49:27	florian	set	messages: + msg8121
2018-11-28 12:39:13	malte	set	messages: + msg8119
2018-11-28 12:29:51	florian	set	nosy: + florian messages: + msg8117
2018-11-27 08:21:30	jendrik	set	messages: + msg8102
2018-11-27 00:53:51	malte	set	messages: + msg8099
2018-11-26 19:17:43	jendrik	set	status: reviewing -> resolved messages: + msg8097
2018-11-26 17:55:10	malte	set	messages: + msg8096
2018-11-26 15:18:34	jendrik	set	status: in-progress -> reviewing messages: + msg8095
2018-11-26 14:45:19	jendrik	set	status: unread -> in-progress messages: + msg8093
2018-11-26 13:52:37	jendrik	create
