Title Non-deterministic invariant synthesis
Priority bug Status chatting
Superseder Nosy List florian, gabi, guillem, jendrik, malte
Assigned To Keywords translator
Optional summary

Created on 2018-12-13.11:17:10 by gabi, last changed by malte.

File name Uploaded Type Edit Remove
cprofile.log florian, 2018-12-13.20:07:19 text/x-log
msg8453 (view) Author: malte Date: 2019-01-08.12:00:50
(Gabi and my previous message overlapped, hence the repetition.)
msg8452 (view) Author: malte Date: 2019-01-08.12:00:21
Gabi and I have talked on the phone regarding how to proceed with this issue. We
believe that the current situation is not satisfactory enough to close the
issue, so we will leave it open for now.

To summarize where we are now for the time we will continue working on this
issue, I think it would be good to better understand what causes the nondeterminism.

On the obvious level, the cause of nondeterminism is that certain facts and
actions are considered in an unpredictable order. But that was intentional
because for the algorithm as designed originally for STRIPS, the order should
not matter to the end result unless one of the limits is reached. The algorithm
was "complete" in a certain sense, or at least we thought it was.

This seems to be no longer true with the enhancements we have made since then
(some ADL support, filtering out of certain things based on the outcome of the
relaxed exploration). We can make it deterministic again by sorting actions and
preconditions, but I think that would only patch the problem but not really fix
it. It looks like one of the basic design assumptions of the algorithms is no
longer valid, and I think it would be useful to improve our understanding of the
algorithm if we want to address the nondeterminism in a principled way.

That, or replace the algorithm by another one, as mentioned earlier.

In any case, I don't think it's a problem to let this one rest a bit. It's not
causing any real harm except making some of our ADL experiments a bit less
msg8451 (view) Author: gabi Date: 2019-01-08.11:55:07
Also there is no large pressure, I still would like to keep the issue open.

Unlike other known non-determinism, which is caused by fixed time and candidate
limits in the invariant synthesis, the problem here is inherent in the algorithm.

To fix it properly, we first need to better understand the problem. My educated
guess would be that it might be resolved by a stronger refinement method for
unbalanced actions.
msg8450 (view) Author: malte Date: 2019-01-08.11:52:40
Ah, yes, this must be it. The failure was for citycar-opt14-adl.

I assume the sat vs. opt domains are identical (or essentially identical;
invariant synthesis depends on the instance, but only in very small ways), so it
was probably just luck that we previously only saw nondeterminism for one of the
two test domains.

For the others: Gabi has added the domain to the blacklist.
msg8449 (view) Author: gabi Date: 2019-01-08.11:51:28
I checked this. The problem also occurs on citycar-opt14-adl. I removed it from
the tests.
msg8448 (view) Author: gabi Date: 2019-01-08.11:02:16
Maybe we have to remove all citycar domains. I only removed citycar-sat14-adl,
where the test failed earlier.
msg8447 (view) Author: gabi Date: 2019-01-08.10:59:51
I did the change directly in the default branch (dd9fbb86dff9, Thu, 20 Dec 2018
16:58:51 +0100). Otherwise, we had seen a daily email from the build bot.
msg8446 (view) Author: gabi Date: 2019-01-08.10:51:00
Malte, here is the plot you asked for in msg8408:
msg8445 (view) Author: malte Date: 2019-01-08.10:42:58
I know, but that's not merged. :-)
msg8444 (view) Author: gabi Date: 2019-01-08.10:40:19
I already removed the domain from the tests (msg8405). I don't see that I will
have time to further work on this in the near future.
msg8441 (view) Author: malte Date: 2019-01-08.07:38:42
Hi all, are there plans to work on this in the next weeks? If not, I suggest we
remove the citycar domain from the translator tests for now.
msg8408 (view) Author: malte Date: 2018-12-21.00:59:35
Would it be possible to see the same kind of plot for invariant synthesis time
rather than overall translator runtime? I think there should already be an
attribute for this, but I don't know what it's called.
msg8405 (view) Author: gabi Date: 2018-12-20.17:05:13
I started from a fresh branch and did not include the sorting of threatening
actions but only the other changes. We still get slower on some domains but more
mildly than in the other versions:

How should we proceed? For the moment, I removed the failing test (which would
still fail with these changes).
msg8403 (view) Author: malte Date: 2018-12-20.13:25:32
Fully agreed on the < and <= operators, of course. These are just wrong. I think
the reason why this worked before is that we never have invariant parts with the
same predicate anyway in a given collection. I don't mind sorting the initial
state, although it seems unrelated to this issue.

Regarding the other changes in and, I'm less
convinced of the utility.
msg8402 (view) Author: florian Date: 2018-12-20.10:52:51
I see some small advantage in making changes that reduce the places where the
algorithm is non-deterministic even if we don't get rid of all causes.

Maybe we should keep some of the changes that don't affect the speed. I think
sorting the initial state didn't make a performance difference for example. I
would also merge the fix of the < and <= operators as they didn't define a total
order before. Basically, we could merge everything except for sorting the
actions to check.
msg8377 (view) Author: malte Date: 2018-12-18.12:34:23

The invariant synthesis already isn't deterministic (because of the invariant
synthesis timeout); we have seen this for a long time in trucks-strips. I'm not
sure if the diff we have reduces the number of tasks on which we get
non-deterministic behaviour or rather increases it, because we now also get
timeouts on openstacks-strips, which we didn't get before. (On the positive
side, it addresses the new non-determinism in citycar.)

If we add the speed disadvantage, I think I would rather stay with the current
implementation. We can make the buildbot green again by reducing the set of
tasks on which we test the translator. I wouldn't see this as major cheating. We
already had to reduce the benchmark suite for the translator test because of the
(overall translator) timeouts in some of the new IPC 2018 domains, and the
promise of nondetermism was already violated for trucks-strips.

So one easy way to address this for now would be to remove the failing task(s)
from the translator test suite, but keep a note to add it/them back later once
we have a better invariant synthesis algorithm.
msg8376 (view) Author: gabi Date: 2018-12-18.12:20:03
The question is what we do before trying these things, in particular to get rid
of the build bot failures.

The way I see it, we can either  significantly slow down the translator for
openstacks-strips and trucks-strips or for the moment give up on a deterministic
translator behavior. I am not really happy with either "solution" but given the
Christmas break I don't see that anyone will implement alternative methods
within the next few days/weeks.
msg8375 (view) Author: malte Date: 2018-12-18.12:11:03
I was thinking of trying out Jussi's most recent first-order invariant synthesis
algorithm. In the case of grounded tasks, I think it is equivalent to his older
propositional algorithm (need to verify the details), so hopefully we can get
away with only using one algorithm for all tasks.

One problem is that we don't just need invariants. We need a finite-domain
variable representation that is based on these invariants. Our current algorithm
essentially generates them for free, at the cost of not being a very good
invariant synthesis algorithm.

This is essentially about finding cliques in the mutex graph, which is not
trivial to do for the very large graphs. I think Jussi had to solve a similar
problem in at least one of his papers, so maybe we can borrow ideas from there
or ask him how he addressed it. (He doesn't need a finite-domain encoding, but I
remember that he has at least one paper which is about encoding mutexes
compactly as cliques or semi-cliques because they can otherwise blow up the SAT
encoding too much.)
msg8374 (view) Author: jendrik Date: 2018-12-18.11:02:14
I like your suggestion.
msg8373 (view) Author: gabi Date: 2018-12-18.10:58:43
I tried a few more variants:

- first checking all actions whether they are too heavy and only testing balance
afterwards does not change the overall picture with the exception that we get
significantly slower in organic-synthesis-split and significantly faster in
- sorting actions based on the number of add and delete effects speeds up some
tasks and slows down others. There is no clear pattern and no large outliers. I
tested two versions, both consider actions with many add effects early, one
breaks ties preferring few delete effects, the other many.

At the end of the story, it are always the pre-grounded tasks that are causing
problems. So I propose that we merge the version where we just sort the action
by their index (= the version v2 Malte reviewed + his suggested changes).
In the long term we should consider to implement another invariant algorithm
(Jussi's algorithm for ground instances?) to special case pre-grounded tasks.

msg8339 (view) Author: florian Date: 2018-12-14.14:36:05
I'm not too familiar with the details but would it make sense to explicitly sort
actions by something like the number of threats or number of delete effects? It
seems like breaking out of the loop early is the main trick here, so if we have
an unbalanced action early in the order, this might already speed things up.

To keep things deterministic, just sorting by the number of threats is probably
not enough but we can always sort by index as a secondary criteria.
msg8336 (view) Author: gabi Date: 2018-12-14.14:21:36
Ups, my previous post unnosied Guillem. Re-adding him...

How should we proceed here? I tend to give up on openstacks-strips and use the
deterministic version with the sorted action order.

If we really want to mess around with the action order, it might make more sense
to do so with a broader perspective, shuffling them for example after normalization.
msg8335 (view) Author: gabi Date: 2018-12-14.14:17:13
The randomization tends to speed up the process. Here is a scatter plot that
compares the previous deterministic version (v2) to the (deterministically)
randomized version (v4):

Overall, the randomized action ordering tends to have a positive impact on the

Regarding the result of the computation, it again does not make a difference for
most of the domains.
Only on the citycar domain, we are now rather on the unlucky side and get more
variables (but extremely lucky on one trucks-strips task):
Also with this randomized version, the runtime on openstacks-strips is worse
than with the current non-deterministic one. It is just not so bad.
msg8334 (view) Author: malte Date: 2018-12-14.14:13:03
Hi Guillem, we're aware of PYTHONHASHSEED, but relying on deterministic hashing
is strongly discouraged in Python. The translator tests explicitly set
PYTHONHASHSEED to "random" to make sure we don't rely on any specific hash
values accidentally.

In any case, I think in this issue we use hashing with the default hash value of
an object, which is its memory address, so even with non-randomized hashing,
hash values will differ because memory addresses will differ between runs.
msg8333 (view) Author: guillem Date: 2018-12-14.14:07:42
A bit late into the discussion, but a small aside: have you found exactly what is the cause of the non-determinism in those instances where this 
300s time limit is not hit? For instance, why is the order of atoms in the initial state  non-deterministic?

If this is already solved, ignore the following, I probably missed it in the discussion;
if it's not: the description of the whole issue would seem related to Python3's random salting of its hash function?

Could this be the cause? This has given me some strange non-determinism issues in the past, so just in case it helps.
msg8321 (view) Author: gabi Date: 2018-12-14.11:51:30
This does not count because the nightly test just considers small instances. ;-)

I will run some tests with randomized action orders, just to see the effect.
msg8320 (view) Author: malte Date: 2018-12-14.11:45:21
> I think we need the changes we made here to have a deterministic translator,
> so I would suggest to merge this anyway.

Note that Florian's numbers in msg8314 mean that we will still become
nondeterministic in openstacks-strips with this change because we hit the 300s
time limit (which is inherently unpredictable).
msg8318 (view) Author: gabi Date: 2018-12-14.11:24:59
This still cannot help (because if all predicates have arity 0, considering the
predicate name is equivalent to considering the atom).

Anyone having any idea how we should proceed here? I don't see how we can work
us out without doing silly (and wrong) things (such as reverse-ordering actions
if it works better on the benchmark set). Maybe we could shuffle the actions
(with a fixed seed to stay deterministic) to average the effects out.

I think we need the changes we made here to have a deterministic translator, so
I would suggest to merge this anyway.
msg8317 (view) Author: gabi Date: 2018-12-14.11:05:19
Ah, maybe I should have a coffee first. Of course in a ground task the
candidates are ground (because predicates always have arity 0).
msg8316 (view) Author: gabi Date: 2018-12-14.11:02:00
Hm, wrong thought. The invariant candidates that we generate are not ground, so
we cannot eliminate actions on this basis.
msg8315 (view) Author: gabi Date: 2018-12-14.10:51:19
Ah, excellent observation Florian!

The problem with the pre-grounded tasks is that we only consider the predicate
name when identifying potentially threatening actions. If for grounded actions,
we would consider the entire atom instead, we could decrease the number of
actions that we need to check significantly.
msg8314 (view) Author: florian Date: 2018-12-14.10:30:26
I think I have narrowed it down a bit: the function check_balance loops trough
actions that need to be checked but stops as soon as it finds an unbalanced of
too heavy action. Before making this deterministic, we were lucky with these
early exits from the loop.

Without sorting
Calls to check_balance: 100000
Actions to check: 31740999
Actions checked:   1153714 (3.634775%)
Time: 62 seconds

With sorting:
Calls to check_balance: 56224
Actions to check: 29503176
Actions checked:   5411207 (18.341100%)
Time: 300 seconds

Even tough we call check_balance less often when sorting, the number of actions
we potentially have to check is roughly the same. However, since we detect the
unbalance later, we need to check more actions before finding an unbalanced one
which increases the number of calls to operator_unbalanced.

I'm not sure why this is so systematic in openstacks-strips or if we can do
anything about it.
msg8311 (view) Author: florian Date: 2018-12-13.23:08:06
I tried around with sorting in different places and found that the sorting we
introduced in a3b64455 made a large difference

With sorting the actions, we go through ~57000 candidates in 300 seconds in the
main loop of find_invariants. Without sorting, we go through ~10000 candidates
in ~65 seconds. The number of candidates changed, because without sorting we are
back to the non-determinism this issue addresses.

To test whether the slow-down comes from the effort of sorting or from a
secondary effect, I sorted the list to a dummy variable and then iterated over
the unsorted list. This took 69 seconds, so about 4 seconds for the actual
sorting but still far from the 300 seconds we got when using the sorted data in
the main loop. Somehow the order of actions influences which candidates we see
in which order and this influences the speed. However, the number of candidates
we check is not the important factor.
msg8305 (view) Author: malte Date: 2018-12-13.20:25:00
I have had a look and see nothing that immediately struck the eye, though when
sorting I guess it's always important to pay attention to the magic comparison
functions involved, which I did not do thoroughly.

Certainly, if we previously hashed by address (using the C-implemented default
implementation) and now sort with a user-defined comparison function, that can
easily slow down these operations by a factor of 10-100 just because of the
function call overhead. So if performance is problematic, profiling as Florian
already started doing seems the natural thing to do.
msg8303 (view) Author: florian Date: 2018-12-13.20:07:19
I profiled the translation of openstacks-strips/p08 with cprofile (output
attached). I'm not quite sure about the result: the overall process took around
300 seconds, and "sorted" shows up with a cumulative time of 300.205s. However,
following the functions with the most cumulative time goes down a path where I
don't see how sorted would play a role (pddl_to_sas > compute_groups >
get_groups > find_invariants > check_balance, operator_unbalanced >
add_effect_unbalanced > unbalanced_renamings > lhs_satisfiable > is_solvable).

>>> import pstats
>>> p = pstats.Stats('cprofile.log')
>>> p.sort_stats('cumulative').print_stats(15)
Thu Dec 13 19:47:13 2018    cprofile.log

         510322622 function calls (508610275 primitive calls) in 301.092 seconds

   Ordered by: cumulative time
   List reduced from 912 to 15 due to restriction <15>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.009    0.009  301.092  301.092 ./<module>)
        1    0.003    0.003  301.053  301.053 ./
        1    0.001    0.001  300.626  300.626 ./
118569/22363    5.971    0.000  300.205    0.013 {sorted}
        1    0.000    0.000  300.187  300.187 ./
        1    0.000    0.000  300.184  300.184 ./
       71    0.294    0.004  300.072    4.226
    37307    5.802    0.000  299.698    0.008 ./
  3234757   15.563    0.000  262.598    0.000
  3234757   23.980    0.000  235.433    0.000
  4375209   17.737    0.000  168.147    0.000
  4375209   12.663    0.000   77.253    0.000 ./
  7574397   11.079    0.000   39.761    0.000 ./
  4376043   22.182    0.000   36.227    0.000
  4375209   14.820    0.000   30.778    0.000 ./
msg8295 (view) Author: gabi Date: 2018-12-13.17:55:10
Getting back to the actual issue: the changes seem to have this resolved.

Experimental results are available from

The changes only affect citycar (and trucks-strips) and there we are on the
lucky side, getting the larger variable domains.

A scatter plot of the translator runtime can be found at:
There is a set of tasks with a much larger runtime (openstacks and trucks-strips).

Malte, can you have a look at the change?
msg8293 (view) Author: gabi Date: 2018-12-13.16:19:13
I think it is because we could equally refine the candidate for destroy_road but
our refinement method just does not see this:

destroy_road(?xy_initial: junction, ?xy_final: junction, ?r1: road)
    Atom road_connect(?r1, ?xy_initial, ?xy_final)
    Atom in_place(?r1)
  NegatedAtom in_place(?r1)
  NegatedAtom road_connect(?r1, ?xy_initial, ?xy_final)
  forall ?c1: car
      Atom at_car_road(?c1, ?r1)
      NegatedAtom at_car_road(?c1, ?r1)
  forall ?c11: car
      Atom at_car_road(?c11, ?r1)
      Atom at_car_jun(?c11, ?xy_initial)
msg8292 (view) Author: malte Date: 2018-12-13.16:01:39
Because we can infer that the action is never applicable?
msg8291 (view) Author: gabi Date: 2018-12-13.15:56:35
Ok, now I think I understood what's going on. Here is an example:

Consider invariant candidate checking invariant candidate
{arrived 0 [1], starting 0 [1], at_car_jun 0 [1]}

It is unbalanced by two actions - destroy_road and move_car_out_road, whereas
the latter has a delete effect that allows refinement and the first one does not.

If move_car_out_road gets processed first, we refine the candidate to 
enqueued {at_car_road 0 [1], arrived 0 [1], at_car_jun 0 [1], starting 0 [1]},
which is indeed an invariant.

If destroy_road is processed first, we cannot refine but directly refuse the

Quiz question: Why can {at_car_road 0 [1], arrived 0 [1], at_car_jun 0 [1],
starting 0 [1]} be accepted as an invariant if a subset is irreparably unbalance
by action destroy_road?
msg8290 (view) Author: gabi Date: 2018-12-13.15:36:49
With different orders of processing the action, I am pretty sure that we did see
different sets of invariants. It's not clear to me, why.

I am not aware of any dominance check during the invariant synthesis.

Independent from this, I wonder whether we could save some time by first testing
all actions for being too heavy and only afterwards testing balance. Otherwise,
we might unnecessarily generate refined candidates that are doomed to fail the
heaviness test.
msg8289 (view) Author: malte Date: 2018-12-13.15:15:17
The nondeterministic order should not matter because the algorithm is a complete
breadth-first search. Even if the "hits" are found in a different order, they
should be the same set. (It matters if the algorithm is cut off early because of
the time or iteration limit, and we know this happens in trucks-strips. But that
doesn't seem to be what is happening in citycar?)

It looks like one of the invariants is dominated by another one in one of the
runs. Are we supposed to detect that dominance and remove the dominated one, and
do we somehow mess this up in one of the cases?
msg8286 (view) Author: gabi Date: 2018-12-13.14:38:15
Florian and I had a look into this and we are quite surprised that the test did
not fail previously. We found three sources of non-determinism:

- The order of atoms in the initial state was non-deterministic.
- The invariant detection processed action in a somewhat random order.
- The invariants were instantiated in a non-deterministic order.

Experiments are running. We have created a pull request at
msg8282 (view) Author: gabi Date: 2018-12-13.11:17:10
The nightly buildbot complained that we get different translator outputs with
python 2.7 and python3 but a closer inspection revealed that this is not due to
the python version but some general non-determinism.

With two runs of with python3 on instance
citycar-opt14-adl/p2-2-2-1-2.pddl I found the following sets of invariants:

{starting 0 1}
{starting 1 [0]}
{starting 0 [1]}
{at_car_jun 0 [1], at_car_road 0 [1], starting 0 [1]}
{at_car_jun 0 [1], arrived 0 [1], at_car_road 0 [1], starting 0 [1]}


{starting 0 1}
{starting 1 [0]}
{starting 0 [1]}
{at_car_road 0 [1], starting 0 [1], at_car_jun 0 [1]}

In most runs we find the five invariants. I guess we need some intermediate
sorting at some place of the invariants synthesis.
Date User Action Args
2019-01-08 12:00:50maltesetmessages: + msg8453
2019-01-08 12:00:21maltesetmessages: + msg8452
2019-01-08 11:55:07gabisetmessages: + msg8451
2019-01-08 11:52:40maltesetmessages: + msg8450
2019-01-08 11:51:28gabisetmessages: + msg8449
2019-01-08 11:02:16gabisetmessages: + msg8448
2019-01-08 10:59:51gabisetmessages: + msg8447
2019-01-08 10:51:00gabisetmessages: + msg8446
2019-01-08 10:42:58maltesetmessages: + msg8445
2019-01-08 10:40:19gabisetmessages: + msg8444
2019-01-08 07:38:42maltesetmessages: + msg8441
2018-12-21 00:59:35maltesetmessages: + msg8408
2018-12-20 17:05:13gabisetmessages: + msg8405
2018-12-20 13:25:32maltesetmessages: + msg8403
2018-12-20 10:52:51floriansetmessages: + msg8402
2018-12-18 12:34:23maltesetmessages: + msg8377
2018-12-18 12:20:03gabisetmessages: + msg8376
2018-12-18 12:11:03maltesetmessages: + msg8375
2018-12-18 11:02:14jendriksetmessages: + msg8374
2018-12-18 10:58:43gabisetmessages: + msg8373
2018-12-14 14:36:05floriansetmessages: + msg8339
2018-12-14 14:21:36gabisetnosy: + guillem
messages: + msg8336
2018-12-14 14:17:13gabisetnosy: - guillem
messages: + msg8335
2018-12-14 14:13:03maltesetmessages: + msg8334
2018-12-14 14:07:42guillemsetnosy: + guillem
messages: + msg8333
2018-12-14 11:51:30gabisetmessages: + msg8321
2018-12-14 11:45:21maltesetmessages: + msg8320
2018-12-14 11:24:59gabisetmessages: + msg8318
2018-12-14 11:05:19gabisetmessages: + msg8317
2018-12-14 11:02:00gabisetmessages: + msg8316
2018-12-14 10:51:19gabisetmessages: + msg8315
2018-12-14 10:30:26floriansetmessages: + msg8314
2018-12-13 23:08:06floriansetmessages: + msg8311
2018-12-13 20:25:00maltesetmessages: + msg8305
2018-12-13 20:07:19floriansetfiles: + cprofile.log
messages: + msg8303
2018-12-13 17:55:10gabisetmessages: + msg8295
2018-12-13 16:19:13gabisetmessages: + msg8293
2018-12-13 16:01:39maltesetmessages: + msg8292
2018-12-13 15:56:35gabisetmessages: + msg8291
2018-12-13 15:36:49gabisetmessages: + msg8290
2018-12-13 15:15:17maltesetmessages: + msg8289
2018-12-13 14:38:15gabisetstatus: unread -> chatting
messages: + msg8286
2018-12-13 13:38:18floriansetnosy: + florian
2018-12-13 11:23:17jendriksetnosy: + malte, jendrik
2018-12-13 11:17:10gabicreate