Issue 398: invariant synthesis: determinism and speed issues

Title	invariant synthesis: determinism and speed issues
Priority	wish	Status	resolved
Superseder		Nosy List	gabi, jendrik, malte
Assigned To	gabi	Keywords	translator
Optional summary

Created on 2013-11-22.21:22:57 by malte, last changed by malte.

Files
File name	Uploaded	Type	Edit	Remove
analyze-invariant-generation-alldomains-eval.tar.gz	jendrik, 2013-12-03.12:05:39	application/gzip
analyze-invariant-generation-eval.tar.gz	jendrik, 2013-12-01.11:37:27	application/gzip

Messages
msg11153 (view)	Author: malte	Date: 2023-07-21.14:43:52
[We looked at this as part of a triaging round. Opinions below agreed at least by the group that was present. ;-)] Marking this as resolved because it's a wish issue with no specific implementation plan and there is nothing in the discussion that we don't know well. Specifically, we know that our invariant synthesis algorithm is unsuitable for grounded tasks and not a great invariant synthesis algorithm in general. The solution for this would be to implement something better; I don't think we should expend the effort trying to doctor with the existing algorithm much more to make it more reproducible in the cases where it does poorly. (There would be some value in this, but I think it isn't good value for effort.) The real solution would be to implement something like Jussi's lifted invariant synthesis algorithm, perhaps with some extra stuff to deal with language fragments he doesn't cover. If someone is interested in actively working on this, this would be worth adding a new issue, but we generally don't create these without a concrete intention to follow through.
msg2757 (view)	Author: jendrik	Date: 2013-12-03.17:43:06
I removed pathways from suite_satisficing_with_ipc11 and suite_all.
msg2756 (view)	Author: malte	Date: 2013-12-03.13:39:04
OK, we should make sure not to use this in our papers then. Maybe remove pathways from the suite? (pathways-noneg is a semantically identical reformulation with negative preconditions compiled away)
msg2755 (view)	Author: jendrik	Date: 2013-12-03.13:27:16
ALL includes both pathways and pathways-noneg.
msg2754 (view)	Author: malte	Date: 2013-12-03.12:56:29
Thanks! It's the custom that if a domain has multiple alternative formulations of alternative problems, planners can pick their formulation. That's why ALL only includes one formulation per domain. (Does it include pathways + pathways-neg, for example? I'm not sure if we'd want that.) If I didn't overlook anything, it seems there are no further problematic cases.
msg2753 (view)	Author: jendrik	Date: 2013-12-03.12:05:39
I didn't know that the suite ALL does not include airport-adl, no-mprime and no- mystery. These are included in the attached results now. Also I fixed a markup error for the empty lists in assembly.
msg2744 (view)	Author: gabi	Date: 2013-12-02.17:44:53
Using the action name to get the heavy action is indeed a bug. I split this up as issue399.
msg2741 (view)	Author: malte	Date: 2013-12-02.14:27:35
> Also, while looking at the source code of BalanceChecker.get_heavy_action, it > seems to rely on the assumption that action names are unique, which I think > isn't necessarily true after normalization. What do you think? @Gabi: what are your thoughts on this?
msg2740 (view)	Author: malte	Date: 2013-12-02.14:25:32
Observations: * It seems we only have data for our "regular" domains? (See msg2719.) * There seems to be something wrong in the parser used to generate the "taskwise-invariant-attributes" report. The empty lists as in Assembly look odd. * There are two domains where runtimes are very large: openstacks-strips and trucks-strips. Interestingly, we never seem to time out. Instead, we hit the MAX_CANDIDATES limit of 100000. This is very different from what we see in msg2717 and suggests that there is indeed a severe speed degradation with the version tested in that message. * Apart from the two critical domains and a third one, psr-small, we never look at more than 1000 candidates. In psr-small, where we get up to 2574 candidates. * Also, apart from these three domains (openstacks-strips, trucks-strips, psr-small), the maximal invariant size before grounding is 4, whereas in these domains we find invariants of size up to 24, and probably would find even larger ones if we didn't cut the generation short after a certain time/number of candidates. * These three domains are the only ones that are fully grounded. At least that's what 'grep -L '?' benchmarks//dom' suggests. In conclusion, it looks like our invariant synthesis algorithm is not that suitable for problems that are fully grounded. This isn't a surprise, but it's interesting to see this brought out so nicely in the data. There are different ways in which this could be addressed. If we want to preserve the old performance and* not lose any invariants and be deterministic, I think the only real solution is to detect grounded problems and use a more suitable algorithm for them. Of course this would be a lot of work. I suggest we discuss this further offline.
msg2738 (view)	Author: malte	Date: 2013-12-01.12:27:44
Interesting, looks like we don't find any invariants in Schedule. Shouldn't we? I thought we used to in the past.
msg2737 (view)	Author: jendrik	Date: 2013-12-01.11:37:27
Here are the changes I made to log the requested values, please check that I made the correct measurements: https://bitbucket.org/jendrikseipp/downward/commits/94d34322b477ca5cf90943ba22c2 8899765133f8?at=default The results are attached.
msg2736 (view)	Author: malte	Date: 2013-11-30.18:06:17
Yes, please. I just checked a code: it's a BFS, as it's using a deque where candidates are popped from the beginning and pushed to the end.
msg2735 (view)	Author: jendrik	Date: 2013-11-30.18:04:43
I don't know how the code corresponds to a BFS or DFS, but I could try to produce the numbers you requested in msg2719. Should I do that?
msg2723 (view)	Author: malte	Date: 2013-11-23.19:59:42
Ah, thanks, I was misled by bitbucket's diff context, which only shows the too-heavy test. I should have expanded the view. Still, if I see it correctly, all that happens here is that we modify the order of successor generation in a breadth-first search. Right? (Or do we perform a depth-first search?) Whatever the successor order, in a BFS we proceed layer-by-layer through the space of candidates, and we abort if we have explored the whole tree (which doesn't depend on the order) or time out (which does depend on the order, but isn't reproducible either way). So I don't think the added code and time complexity offsets the benefits here.
msg2722 (view)	Author: jendrik	Date: 2013-11-23.19:50:10
operator_unbalanced() has side effects since it adds new candidates.
msg2721 (view)	Author: malte	Date: 2013-11-23.19:44:09
Hmmm, this doesn't affect the outcome of the algorithm (except that it may be faster or slower, but not in a predictable way). In a loop like for x in y: if f(x): return False the order in which y is traversed doesn't matter unless f has side effects (which it doesn't here). I would recommend to revert this change. Also, while looking at the source code of BalanceChecker.get_heavy_action, it seems to rely on the assumption that action names are unique, which I think isn't necessarily true after normalization. What do you think?
msg2720 (view)	Author: jendrik	Date: 2013-11-23.19:37:21
The change that I made and Gabi tested can be seen here (if you click on the "Diff" button): https://bitbucket.org/jendrikseipp/downward/branch/issue22
msg2719 (view)	Author: malte	Date: 2013-11-22.21:35:03
Because it involves a timeout, invariant synthesis will never be deterministic in a strong sense. For cases that don't involve the timeout or cutoff, I'm not sure if the change discussed in msg2693 actually makes a difference. Can I have a look at the changeset somewhere? Maybe we should add some stronger cutoff criteria so that we don't hit the time limit as often as we currently do. This might also help with determinism. I suggest the following experiment: for every task that the translator can handle (including domain formulations we don't have in our standard benchmark suites, such as airport-adl), report the following data: - did invariant synthesis terminate normally - how much time did invariant synthesis take - how many candidates were seen - how many invariants were found - what is the length of each of these invariants, before grounding (i.e., don't convert to "fact groups", stay with the first-order representation) Simple prints are enough; we don't need a lab report for this.
msg2717 (view)	Author: malte	Date: 2013-11-22.21:22:57
Moved over from issue22. This is about the part of that issue described in msg2685 and later. Summary: invariant synthesis in the translator isn't always deterministic, and we'd like to do something about this. It can also be quite slow. Quoting from msg2707: ======================================================================== There is some significant performance degeneration in openstacks-strips. I assume that the problem is that the operator description in this domain are already grounded, so there are lots of actions to be processed by the invariance analysis. Jendrik, I think we should try to catch this non-determinism earlier so that we do not need to sort the threatening actions so often. Can you check whether the difference occurs first with the actions_to_check set or already earlier in the sets with the threatening actions for a predicate? If someone is interested, hre are the problematic translator times: <task> <old translator time> -> <new translator time> openstacks-strips/p06.pddl 40.6s -> 60.8s openstacks-strips/p07.pddl 44.4s -> 60.8s openstacks-strips/p08.pddl 56.8s -> 300.6s openstacks-strips/p09.pddl 53.6s -> 300.6s openstacks-strips/p10.pddl 41.0s -> 300.8s openstacks-strips/p11.pddl 40.0s -> 300.7s openstacks-strips/p12.pddl 45.3s -> 300.8s openstacks-strips/p13.pddl 36.9s -> 300.8s openstacks-strips/p14.pddl 41.0s -> 300.8s openstacks-strips/p15.pddl 44.9s -> 300.8s openstacks-strips/p16.pddl 40.3s -> 300.8s openstacks-strips/p17.pddl 45.3s -> 301.0s openstacks-strips/p18.pddl 45.6s -> 302.2s openstacks-strips/p19.pddl 38.1s -> 301.2s openstacks-strips/p20.pddl 38.4s -> 301.7s openstacks-strips/p21.pddl 40.1s -> 301.7s openstacks-strips/p22.pddl 38.7s -> 301.8s openstacks-strips/p23.pddl 38.4s -> 301.8s openstacks-strips/p24.pddl 51.6s -> 309.5s openstacks-strips/p25.pddl 54.3s -> 308.9s openstacks-strips/p26.pddl 42.9s -> 305.2s openstacks-strips/p27.pddl 47.9s -> 312.2s openstacks-strips/p28.pddl 116.7s -> 357.8s openstacks-strips/p29.pddl 107.8s -> 352.1s openstacks-strips/p30.pddl 62.5s -> 321.3s ========================================================================

History
Date	User	Action	Args
2023-07-21 14:43:52	malte	set	status: chatting -> resolved messages: + msg11153
2014-10-04 19:57:57	malte	set	keyword: + translator
2013-12-03 17:43:06	jendrik	set	messages: + msg2757
2013-12-03 13:39:04	malte	set	messages: + msg2756
2013-12-03 13:27:16	jendrik	set	messages: + msg2755
2013-12-03 12:56:29	malte	set	messages: + msg2754
2013-12-03 12:05:39	jendrik	set	files: + analyze-invariant-generation-alldomains-eval.tar.gz messages: + msg2753
2013-12-02 17:44:53	gabi	set	messages: + msg2744
2013-12-02 14:27:35	malte	set	messages: + msg2741
2013-12-02 14:25:33	malte	set	messages: + msg2740
2013-12-01 12:27:44	malte	set	messages: + msg2738
2013-12-01 11:37:27	jendrik	set	files: + analyze-invariant-generation-eval.tar.gz messages: + msg2737
2013-11-30 18:06:17	malte	set	messages: + msg2736
2013-11-30 18:04:43	jendrik	set	messages: + msg2735
2013-11-23 19:59:42	malte	set	messages: + msg2723
2013-11-23 19:50:10	jendrik	set	messages: + msg2722
2013-11-23 19:44:09	malte	set	messages: + msg2721
2013-11-23 19:37:21	jendrik	set	messages: + msg2720
2013-11-22 21:35:03	malte	set	messages: + msg2719
2013-11-22 21:22:57	malte	create

Issue398