Issue398

Title invariant synthesis: determinism and speed issues
Priority wish Status resolved
Superseder Nosy List gabi, jendrik, malte
Assigned To gabi Keywords translator
Optional summary

Created on 2013-11-22.21:22:57 by malte, last changed by malte.

Files
File name Uploaded Type Edit Remove
analyze-invariant-generation-alldomains-eval.tar.gz jendrik, 2013-12-03.12:05:39 application/gzip
analyze-invariant-generation-eval.tar.gz jendrik, 2013-12-01.11:37:27 application/gzip
Messages
msg11153 (view) Author: malte Date: 2023-07-21.14:43:52
[We looked at this as part of a triaging round. Opinions below agreed at least by the group that was present. ;-)]

Marking this as resolved because it's a wish issue with no specific implementation plan and there is nothing in the discussion that we don't know well.

Specifically, we know that our invariant synthesis algorithm is unsuitable for grounded tasks and not a great invariant synthesis algorithm in general. The solution for this would be to implement something better; I don't think we should expend the effort trying to doctor with the existing algorithm much more to make it more reproducible in the cases where it does poorly. (There would be some value in this, but I think it isn't good value for effort.)

The real solution would be to implement something like Jussi's lifted invariant synthesis algorithm, perhaps with some extra stuff to deal with language fragments he doesn't cover. If someone is interested in actively working on this, this would be worth adding a new issue, but we generally don't create these without a concrete intention to follow through.
msg2757 (view) Author: jendrik Date: 2013-12-03.17:43:06
I removed pathways from suite_satisficing_with_ipc11 and suite_all.
msg2756 (view) Author: malte Date: 2013-12-03.13:39:04
OK, we should make sure not to use this in our papers then. Maybe remove
pathways from the suite? (pathways-noneg is a semantically identical
reformulation with negative preconditions compiled away)
msg2755 (view) Author: jendrik Date: 2013-12-03.13:27:16
ALL includes both pathways and pathways-noneg.
msg2754 (view) Author: malte Date: 2013-12-03.12:56:29
Thanks! It's the custom that if a domain has multiple alternative formulations
of alternative problems, planners can pick their formulation. That's why ALL
only includes one formulation per domain. (Does it include pathways +
pathways-neg, for example? I'm not sure if we'd want that.)

If I didn't overlook anything, it seems there are no further problematic cases.
msg2753 (view) Author: jendrik Date: 2013-12-03.12:05:39
I didn't know that the suite ALL does not include airport-adl, no-mprime and no-
mystery. These are included in the attached results now. Also I fixed a markup 
error for the empty lists in assembly.
msg2744 (view) Author: gabi Date: 2013-12-02.17:44:53
Using the action name to get the heavy action is indeed a bug. I split this up
as issue399.
msg2741 (view) Author: malte Date: 2013-12-02.14:27:35
> Also, while looking at the source code of BalanceChecker.get_heavy_action, it
> seems to rely on the assumption that action names are unique, which I think
> isn't necessarily true after normalization. What do you think?

@Gabi: what are your thoughts on this?
msg2740 (view) Author: malte Date: 2013-12-02.14:25:32
Observations:

* It seems we only have data for our "regular" domains? (See msg2719.)

* There seems to be something wrong in the parser used to generate the
"taskwise-invariant-attributes" report. The empty lists as in Assembly look odd.

* There are two domains where runtimes are very large: openstacks-strips and
trucks-strips. Interestingly, we never seem to time out. Instead, we hit the
MAX_CANDIDATES limit of 100000. This is very different from what we see in
msg2717 and suggests that there is indeed a severe speed degradation with the
version tested in that message.

* Apart from the two critical domains and a third one, psr-small, we never look
at more than 1000 candidates. In psr-small, where we get up to 2574 candidates.

* Also, apart from these three domains (openstacks-strips, trucks-strips,
psr-small), the maximal invariant size before grounding is 4, whereas in these
domains we find invariants of size up to 24, and probably would find even larger
ones if we didn't cut the generation short after a certain time/number of
candidates.

* These three domains are the only ones that are fully grounded. At least that's
what 'grep -L '?' benchmarks/*/*dom*' suggests.

In conclusion, it looks like our invariant synthesis algorithm is not that
suitable for problems that are fully grounded. This isn't a surprise, but it's
interesting to see this brought out so nicely in the data.

There are different ways in which this could be addressed. If we want to
preserve the old performance *and* not lose any invariants *and* be
deterministic, I think the only real solution is to detect grounded problems and
use a more suitable algorithm for them. Of course this would be a lot of work. I
suggest we discuss this further offline.
msg2738 (view) Author: malte Date: 2013-12-01.12:27:44
Interesting, looks like we don't find any invariants in Schedule. Shouldn't we?
I thought we used to in the past.
msg2737 (view) Author: jendrik Date: 2013-12-01.11:37:27
Here are the changes I made to log the requested values, please check that I 
made the correct measurements: 
https://bitbucket.org/jendrikseipp/downward/commits/94d34322b477ca5cf90943ba22c2
8899765133f8?at=default

The results are attached.
msg2736 (view) Author: malte Date: 2013-11-30.18:06:17
Yes, please.

I just checked a code: it's a BFS, as it's using a deque where candidates are
popped from the beginning and pushed to the end.
msg2735 (view) Author: jendrik Date: 2013-11-30.18:04:43
I don't know how the code corresponds to a BFS or DFS, but I could try to produce 
the numbers you requested in msg2719. Should I do that?
msg2723 (view) Author: malte Date: 2013-11-23.19:59:42
Ah, thanks, I was misled by bitbucket's diff context, which only shows the
too-heavy test. I should have expanded the view.

Still, if I see it correctly, all that happens here is that we modify the order
of successor generation in a breadth-first search. Right? (Or do we perform a
depth-first search?)

Whatever the successor order, in a BFS we proceed layer-by-layer through the
space of candidates, and we abort if we have explored the whole tree (which
doesn't depend on the order) or time out (which does depend on the order, but
isn't reproducible either way).

So I don't think the added code and time complexity offsets the benefits here.
msg2722 (view) Author: jendrik Date: 2013-11-23.19:50:10
operator_unbalanced() has side effects since it adds new candidates.
msg2721 (view) Author: malte Date: 2013-11-23.19:44:09
Hmmm, this doesn't affect the outcome of the algorithm (except that it may be
faster or slower, but not in a predictable way). In a loop like

for x in y:
    if f(x):
        return False

the order in which y is traversed doesn't matter unless f has side effects
(which it doesn't here).

I would recommend to revert this change.

Also, while looking at the source code of BalanceChecker.get_heavy_action, it
seems to rely on the assumption that action names are unique, which I think
isn't necessarily true after normalization. What do you think?
msg2720 (view) Author: jendrik Date: 2013-11-23.19:37:21
The change that I made and Gabi tested can be seen here (if you click on the 
"Diff" button): 
https://bitbucket.org/jendrikseipp/downward/branch/issue22
msg2719 (view) Author: malte Date: 2013-11-22.21:35:03
Because it involves a timeout, invariant synthesis will never be deterministic
in a strong sense. For cases that don't involve the timeout or cutoff, I'm not
sure if the change discussed in msg2693 actually makes a difference. Can I have
a look at the changeset somewhere?

Maybe we should add some stronger cutoff criteria so that we don't hit the time
limit as often as we currently do. This might also help with determinism.

I suggest the following experiment: for every task that the translator can
handle (including domain formulations we don't have in our standard benchmark
suites, such as airport-adl), report the following data:

- did invariant synthesis terminate normally
- how much time did invariant synthesis take
- how many candidates were seen
- how many invariants were found
- what is the length of each of these invariants, *before* grounding (i.e.,
don't convert to "fact groups", stay with the first-order representation)

Simple prints are enough; we don't need a lab report for this.
msg2717 (view) Author: malte Date: 2013-11-22.21:22:57
Moved over from issue22. This is about the part of that issue described in
msg2685 and later.

Summary: invariant synthesis in the translator isn't always deterministic, and
we'd like to do something about this. It can also be quite slow.

Quoting from msg2707:
========================================================================
There is some significant performance degeneration in openstacks-strips.
I assume that the problem is that the operator description in this domain are 
already grounded, so there are lots of actions to be processed by the
invariance analysis.

Jendrik, I think we should try to catch this non-determinism earlier so that we 
do not need to sort the threatening actions so often. Can you check whether 
the difference occurs first with the actions_to_check set or already earlier
in the sets with the threatening actions for a predicate?

If someone is interested, hre are the problematic translator times:

<task> <old translator time> -> <new translator time>

openstacks-strips/p06.pddl 40.6s -> 60.8s
openstacks-strips/p07.pddl 44.4s -> 60.8s
openstacks-strips/p08.pddl 56.8s -> 300.6s
openstacks-strips/p09.pddl 53.6s -> 300.6s
openstacks-strips/p10.pddl 41.0s -> 300.8s
openstacks-strips/p11.pddl 40.0s -> 300.7s
openstacks-strips/p12.pddl 45.3s -> 300.8s
openstacks-strips/p13.pddl 36.9s -> 300.8s
openstacks-strips/p14.pddl 41.0s -> 300.8s
openstacks-strips/p15.pddl 44.9s -> 300.8s
openstacks-strips/p16.pddl 40.3s -> 300.8s
openstacks-strips/p17.pddl 45.3s -> 301.0s
openstacks-strips/p18.pddl 45.6s -> 302.2s
openstacks-strips/p19.pddl 38.1s -> 301.2s
openstacks-strips/p20.pddl 38.4s -> 301.7s
openstacks-strips/p21.pddl 40.1s -> 301.7s
openstacks-strips/p22.pddl 38.7s -> 301.8s
openstacks-strips/p23.pddl 38.4s -> 301.8s
openstacks-strips/p24.pddl 51.6s -> 309.5s
openstacks-strips/p25.pddl 54.3s -> 308.9s
openstacks-strips/p26.pddl 42.9s -> 305.2s
openstacks-strips/p27.pddl 47.9s -> 312.2s
openstacks-strips/p28.pddl 116.7s -> 357.8s
openstacks-strips/p29.pddl 107.8s -> 352.1s
openstacks-strips/p30.pddl 62.5s -> 321.3s
========================================================================
History
Date User Action Args
2023-07-21 14:43:52maltesetstatus: chatting -> resolved
messages: + msg11153
2014-10-04 19:57:57maltesetkeyword: + translator
2013-12-03 17:43:06jendriksetmessages: + msg2757
2013-12-03 13:39:04maltesetmessages: + msg2756
2013-12-03 13:27:16jendriksetmessages: + msg2755
2013-12-03 12:56:29maltesetmessages: + msg2754
2013-12-03 12:05:39jendriksetfiles: + analyze-invariant-generation-alldomains-eval.tar.gz
messages: + msg2753
2013-12-02 17:44:53gabisetmessages: + msg2744
2013-12-02 14:27:35maltesetmessages: + msg2741
2013-12-02 14:25:33maltesetmessages: + msg2740
2013-12-01 12:27:44maltesetmessages: + msg2738
2013-12-01 11:37:27jendriksetfiles: + analyze-invariant-generation-eval.tar.gz
messages: + msg2737
2013-11-30 18:06:17maltesetmessages: + msg2736
2013-11-30 18:04:43jendriksetmessages: + msg2735
2013-11-23 19:59:42maltesetmessages: + msg2723
2013-11-23 19:50:10jendriksetmessages: + msg2722
2013-11-23 19:44:09maltesetmessages: + msg2721
2013-11-23 19:37:21jendriksetmessages: + msg2720
2013-11-22 21:35:03maltesetmessages: + msg2719
2013-11-22 21:22:57maltecreate