|
Created on 2018-11-26.13:52:37 by jendrik, last changed by malte.
msg8122 (view) |
Author: malte |
Date: 2018-11-28.12:51:31 |
|
Agreed, such a comparison would be nice. Ideally with different levels of
granularity: something fast, something moderately comprehensive, and everything.
I suggest we open an issue for more tests iff someone wants to work on it in the
near or medium future.
|
msg8121 (view) |
Author: florian |
Date: 2018-11-28.12:49:27 |
|
I see, I thought we would compare that the translator output doesn't change from
one revision to the next. I think this would be a useful test to add to the
nightly/weekly build and could also take a bit longer. The local tests should
remain fast.
|
msg8119 (view) |
Author: malte |
Date: 2018-11-28.12:39:13 |
|
The original intention was that these tests can be run in 10 seconds, so that
everyone can quickly run this test before they push.
We should also be clear what we are testing here: unless I'm mistaken, we only
test that Python 2.7 and Python 3 result in the same translator output, not that
this translator output matches any reference translator output or similar. So
we're not comparing to something objective, but only looking out for Python
version dependencies. For these, I think one task in (almost) every domain
should give us sufficient coverage.
So I guess what I'm saying is: before we extend this test to more tasks, I would
prefer us to think about:
1) What do we want to test?
2) What is the purpose of these tests?
3) Who should run these tests and in which situations?
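
A minimal sketch of the kind of check described above: run the same script under several interpreters and compare the files they write byte for byte. The function name and parameters are hypothetical, and the sketch assumes that the script writes its result to a file called output.sas in the current working directory; it is not the test that is actually in the repository.

```python
import os
import subprocess
import tempfile


def translator_outputs_match(interpreters, script_args, output_name="output.sas"):
    """Run the same script under each interpreter and compare the output files.

    Hypothetical helper: assumes the script writes `output_name` into the
    current working directory. Paths in `script_args` must be absolute,
    because each run happens in its own temporary directory.
    """
    outputs = []
    for interpreter in interpreters:
        with tempfile.TemporaryDirectory() as workdir:
            subprocess.check_call([interpreter] + list(script_args), cwd=workdir)
            with open(os.path.join(workdir, output_name), "rb") as f:
                outputs.append(f.read())
    # Only checks that the runs agree with each other, not that the output
    # matches any reference output.
    return all(out == outputs[0] for out in outputs)
```

A call could look like translator_outputs_match(["python2.7", "python3"], [translate_script, domain_file, task_file]), where the three paths are placeholders for absolute paths on the local machine.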
|
msg8117 (view) |
Author: florian |
Date: 2018-11-28.12:29:51 |
|
I'm a bit late to the party but I would suggest that if we use a specific
selection of tasks for the translator tests anyway, we test a few more tasks
than just the first in each domain. How about testing all tasks that take less
than 10 seconds and 2GB or some similar selection?
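
A rough sketch of such a selection, assuming the measurements are available as a list of dictionaries. The keys "task", "runtime" (seconds) and "memory_mb" are hypothetical and not the format of any existing report; entries without a measured value are skipped, on the assumption that those runs did not finish.

```python
def select_fast_tasks(results, max_seconds=10.0, max_memory_mb=2048):
    """Return the names of tasks that stayed below both limits.

    `results` is a list of dicts with the hypothetical keys
    "task", "runtime" (seconds) and "memory_mb" (peak memory).
    """
    selected = []
    for entry in results:
        runtime = entry.get("runtime")
        memory = entry.get("memory_mb")
        # A missing value is treated as "did not finish", so skip the task.
        if runtime is None or memory is None:
            continue
        if runtime < max_seconds and memory < max_memory_mb:
            selected.append(entry["task"])
    return selected
```

The two limits are keyword arguments so that a faster local suite and a more comprehensive nightly suite could reuse the same function with different thresholds.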
|
msg8102 (view) |
Author: jendrik |
Date: 2018-11-27.08:21:30 |
|
I had pushed only to the master repo, not my own repo. The new revisions are now
in both repos.
Regarding your earlier question about memory: I had overlooked that tasks
with runtime=None may have used too much memory or too much time.
|
msg8099 (view) |
Author: malte |
Date: 2018-11-27.00:53:51 |
|
Awesome, thanks! I see no new commits on the pull request, though?
|
msg8097 (view) |
Author: jendrik |
Date: 2018-11-26.19:17:43 |
|
Thanks for the comments on Bitbucket! I took care of them and merged this. The
nightly test is green now and the weekly test should be green in a few minutes.
|
msg8096 (view) |
Author: malte |
Date: 2018-11-26.17:55:09 |
|
Thanks, Jendrik!
I've gone through the experimental results, and this list of domains looks good
to me.
The pull request looks good; I left two small comments.
> Memory usage is not a problem as all first tasks from IPC 2018
> use less than 2 GB.
As I wrote, I think the first task of organic-synthesis-sat18-strips needs more
than 10 GiB; I aborted it at that point. I've now run it a bit longer, and it
used roughly 15 GiB within 7 minutes. At that point I had to abort because the
machine started swapping too much. In your experiments it looks like this one
failed. Or perhaps you meant all first tasks apart from the ones in the domains
you mentioned?
|
msg8095 (view) |
Author: jendrik |
Date: 2018-11-26.15:18:34 |
|
I made a pull request at https://bitbucket.org/jendrikseipp/downward/pull-requests/108 .
|
msg8093 (view) |
Author: jendrik |
Date: 2018-11-26.14:45:19 |
|
Here are the times and peak memory usages for translating all IPC benchmarks up
to 2018:
https://ai.dmi.unibas.ch/_tmp_files/seipp/issue869-base-translate-all.html
Based on the runtime results, I propose to ignore the first tasks from the
following directories:
agricola-sat18-strips
organic-synthesis-sat18-strips
organic-synthesis-split-opt18-strips
organic-synthesis-split-sat18-strips
Memory usage is not a problem as all first tasks from IPC 2018 use less than 2
GB.
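
The proposed suite could be sketched as follows. This assumes a benchmarks directory with one subdirectory per domain, task files ending in .pddl, and domain files that contain "domain" in their name; the function name and these layout assumptions are illustrative and not taken from the pull request.

```python
import os

# Directories whose first task is too slow to translate (list from this message).
EXCLUDED_DOMAINS = {
    "agricola-sat18-strips",
    "organic-synthesis-sat18-strips",
    "organic-synthesis-split-opt18-strips",
    "organic-synthesis-split-sat18-strips",
}


def first_tasks(benchmarks_dir, excluded=EXCLUDED_DOMAINS):
    """Return (domain, task_file) for the first task of each non-excluded domain."""
    suite = []
    for domain in sorted(os.listdir(benchmarks_dir)):
        domain_dir = os.path.join(benchmarks_dir, domain)
        if domain in excluded or not os.path.isdir(domain_dir):
            continue
        # Assumption: every .pddl file without "domain" in its name is a task.
        tasks = sorted(
            name for name in os.listdir(domain_dir)
            if name.endswith(".pddl") and "domain" not in name)
        if tasks:
            suite.append((domain, tasks[0]))
    return suite
```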
|
msg8089 (view) |
Author: jendrik |
Date: 2018-11-26.13:52:37 |
|
Quoting Malte:
"The problem seems to be that the translator test is supposed to only use
tasks that are quick to complete. We have implemented this requirement
as "use the first task in each domain", but since we've added the IPC
2018 benchmarks, this no longer works. For example, the task
"organic-synthesis-sat18-strips-p01.pddl" uses more than 18 GiB in the
translator on my machine, and I aborted it after 10 minutes.
I think the solution for this would be to define a different suite for
these tests, for example something as simple as "The first task in all
domains except X, Y and Z.""
|
|
Date | User | Action | Args |
2018-11-28 12:51:31 | malte | set | messages: + msg8122 |
2018-11-28 12:49:27 | florian | set | messages: + msg8121 |
2018-11-28 12:39:13 | malte | set | messages: + msg8119 |
2018-11-28 12:29:51 | florian | set | nosy: + florian; messages: + msg8117 |
2018-11-27 08:21:30 | jendrik | set | messages: + msg8102 |
2018-11-27 00:53:51 | malte | set | messages: + msg8099 |
2018-11-26 19:17:43 | jendrik | set | status: reviewing -> resolved; messages: + msg8097 |
2018-11-26 17:55:10 | malte | set | messages: + msg8096 |
2018-11-26 15:18:34 | jendrik | set | status: in-progress -> reviewing; messages: + msg8095 |
2018-11-26 14:45:19 | jendrik | set | status: unread -> in-progress; messages: + msg8093 |
2018-11-26 13:52:37 | jendrik | create | |
|