msg10991 (view) |
Author: gabi |
Date: 2023-02-09.12:40:00 |
|
While working on another issue, we discovered a source of non-determinism we should fix:
invariants store their parts as a frozenset, and iterating over its elements is
non-deterministic. I observed this in Invariant.check_balance when collecting the
actions_to_check. This causes the actions to be considered in different orders in the
balance check. Since we refine the invariants based on the first unbalanced action we
encounter, this can lead to different final invariants overall.
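For illustration, a minimal sketch of the problem and of one possible fix; the strings
below only stand in for invariant parts, and an actual fix would iterate the
InvariantPart objects in some content-based order.

    parts = frozenset(["at_car_jun 0 [1]", "arrived 0 [1]", "starting 0 [1]"])

    # Non-deterministic across interpreter runs: frozenset iteration follows
    # hash order, and string hashes are randomized by default.
    collected_nondet = list(parts)

    # Deterministic: impose a content-based total order before iterating.
    collected_det = sorted(parts)
    print(collected_det)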
|
msg8453 (view) |
Author: malte |
Date: 2019-01-08.12:00:50 |
|
(Gabi's and my previous messages overlapped, hence the repetition.)
|
msg8452 (view) |
Author: malte |
Date: 2019-01-08.12:00:21 |
|
Gabi and I have talked on the phone about how to proceed with this issue. We do
not consider the current situation satisfactory enough to close the issue, so we
will leave it open for now.
To summarize where we stand for when we continue working on this issue: I think
it would be good to better understand what causes the nondeterminism.
On the obvious level, the cause of nondeterminism is that certain facts and
actions are considered in an unpredictable order. But that was intentional
because for the algorithm as designed originally for STRIPS, the order should
not matter to the end result unless one of the limits is reached. The algorithm
was "complete" in a certain sense, or at least we thought it was.
This seems to be no longer true with the enhancements we have made since then
(some ADL support, filtering out of certain things based on the outcome of the
relaxed exploration). We can make it deterministic again by sorting actions and
preconditions, but I think that would only patch the problem but not really fix
it. It looks like one of the basic design assumptions of the algorithms is no
longer valid, and I think it would be useful to improve our understanding of the
algorithm if we want to address the nondeterminism in a principled way.
That, or replace the algorithm by another one, as mentioned earlier.
In any case, I don't think it's a problem to let this one rest a bit. It's not
causing any real harm except making some of our ADL experiments a bit less
predictable.
|
msg8451 (view) |
Author: gabi |
Date: 2019-01-08.11:55:07 |
|
Also, even though there is no great pressure, I would still like to keep the issue open.
Unlike other known non-determinism, which is caused by fixed time and candidate
limits in the invariant synthesis, the problem here is inherent in the algorithm.
To fix it properly, we first need to better understand the problem. My educated
guess would be that it might be resolved by a stronger refinement method for
unbalanced actions.
|
msg8450 (view) |
Author: malte |
Date: 2019-01-08.11:52:40 |
|
Ah, yes, this must be it. The failure was for citycar-opt14-adl.
I assume the sat vs. opt domains are identical (or essentially identical;
invariant synthesis depends on the instance, but only in very small ways), so it
was probably just luck that we previously only saw nondeterminism for one of the
two test domains.
For the others: Gabi has added the domain to the blacklist.
|
msg8449 (view) |
Author: gabi |
Date: 2019-01-08.11:51:28 |
|
I checked this. The problem also occurs on citycar-opt14-adl. I removed it from
the tests.
|
msg8448 (view) |
Author: gabi |
Date: 2019-01-08.11:02:16 |
|
Maybe we have to remove all citycar domains. I only removed citycar-sat14-adl,
where the test failed earlier.
|
msg8447 (view) |
Author: gabi |
Date: 2019-01-08.10:59:51 |
|
I did the change directly in the default branch (dd9fbb86dff9, Thu, 20 Dec 2018
16:58:51 +0100). Otherwise, we would have seen a daily email from the build bot.
|
msg8446 (view) |
Author: gabi |
Date: 2019-01-08.10:51:00 |
|
Malte, here is the plot you asked for in msg8408:
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue879-v1_time_invariants_scatter.png
|
msg8445 (view) |
Author: malte |
Date: 2019-01-08.10:42:58 |
|
I know, but that's not merged. :-)
|
msg8444 (view) |
Author: gabi |
Date: 2019-01-08.10:40:19 |
|
I already removed the domain from the tests (msg8405). I don't expect to have
time to work further on this in the near future.
|
msg8441 (view) |
Author: malte |
Date: 2019-01-08.07:38:42 |
|
Hi all, are there plans to work on this in the next weeks? If not, I suggest we
remove the citycar domain from the translator tests for now.
|
msg8408 (view) |
Author: malte |
Date: 2018-12-21.00:59:35 |
|
Would it be possible to see the same kind of plot for invariant synthesis time
rather than overall translator runtime? I think there should already be an
attribute for this, but I don't know what it's called.
|
msg8405 (view) |
Author: gabi |
Date: 2018-12-20.17:05:13 |
|
I started from a fresh branch and included only the other changes, not the
sorting of threatening actions. We still get slower on some domains, but less
severely than in the other versions:
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue879-v1_time_scatter.png
How should we proceed? For the moment, I removed the failing test (which would
still fail with these changes).
|
msg8403 (view) |
Author: malte |
Date: 2018-12-20.13:25:32 |
|
Fully agreed on the < and <= operators, of course. These are just wrong. I think
the reason why this worked before is that we never have invariant parts with the
same predicate anyway in a given collection. I don't mind sorting the initial
state, although it seems unrelated to this issue.
Regarding the other changes in invariants.py and invariant_finder.py, I'm less
convinced of the utility.
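For illustration, a minimal sketch of what a consistent total order looks like; the
class and field names are simplified stand-ins for the translator's InvariantPart, and
the point is only that all comparison operators should be derived from one
content-based key so that < and <= agree and define a total order.

    from functools import total_ordering

    @total_ordering
    class InvariantPart:
        def __init__(self, predicate, order):
            self.predicate = predicate
            self.order = tuple(order)

        def _key(self):
            return (self.predicate, self.order)

        def __eq__(self, other):
            return self._key() == other._key()

        def __lt__(self, other):
            return self._key() < other._key()

        def __hash__(self):
            return hash(self._key())

    parts = [InvariantPart("starting", [0, 1]), InvariantPart("at_car_jun", [0])]
    parts.sort()  # well-defined: ordered by (predicate, order)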
|
msg8402 (view) |
Author: florian |
Date: 2018-12-20.10:52:51 |
|
I see some small advantage in making changes that reduce the places where the
algorithm is non-deterministic even if we don't get rid of all causes.
Maybe we should keep some of the changes that don't affect the speed. I think
sorting the initial state didn't make a performance difference for example. I
would also merge the fix of the < and <= operators as they didn't define a total
order before. Basically, we could merge everything except for sorting the
actions to check.
|
msg8377 (view) |
Author: malte |
Date: 2018-12-18.12:34:23 |
|
Agreed.
The invariant synthesis already isn't deterministic (because of the invariant
synthesis timeout); we have seen this for a long time in trucks-strips. I'm not
sure if the diff we have reduces the number of tasks on which we get
non-deterministic behaviour or rather increases it, because we now also get
timeouts on openstacks-strips, which we didn't get before. (On the positive
side, it addresses the new non-determinism in citycar.)
If we add the speed disadvantage, I think I would rather stay with the current
implementation. We can make the buildbot green again by reducing the set of
tasks on which we test the translator. I wouldn't see this as major cheating. We
already had to reduce the benchmark suite for the translator test because of the
(overall translator) timeouts in some of the new IPC 2018 domains, and the
promise of determinism was already broken for trucks-strips.
So one easy way to address this for now would be to remove the failing task(s)
from the translator test suite, but keep a note to add it/them back later once
we have a better invariant synthesis algorithm.
|
msg8376 (view) |
Author: gabi |
Date: 2018-12-18.12:20:03 |
|
The question is what we do before trying these things, in particular to get rid
of the build bot failures.
The way I see it, we can either significantly slow down the translator for
openstacks-strips and trucks-strips or for the moment give up on a deterministic
translator behavior. I am not really happy with either "solution" but given the
Christmas break I don't see that anyone will implement alternative methods
within the next few days/weeks.
|
msg8375 (view) |
Author: malte |
Date: 2018-12-18.12:11:03 |
|
I was thinking of trying out Jussi's most recent first-order invariant synthesis
algorithm. In the case of grounded tasks, I think it is equivalent to his older
propositional algorithm (need to verify the details), so hopefully we can get
away with only using one algorithm for all tasks.
One problem is that we don't just need invariants. We need a finite-domain
variable representation that is based on these invariants. Our current algorithm
essentially generates them for free, at the cost of not being a very good
invariant synthesis algorithm.
This is essentially about finding cliques in the mutex graph, which is not
trivial to do for the very large graphs. I think Jussi had to solve a similar
problem in at least one of his papers, so maybe we can borrow ideas from there
or ask him how he addressed it. (He doesn't need a finite-domain encoding, but I
remember that he has at least one paper which is about encoding mutexes
compactly as cliques or semi-cliques because they can otherwise blow up the SAT
encoding too much.)
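For illustration only, a hedged sketch of the "cliques of mutually exclusive atoms"
idea; this is neither Jussi's algorithm nor the translator's current method, just a
greedy clique cover over a hand-coded mutex relation.

    def greedy_clique_cover(atoms, mutex):
        # atoms: list of hashable atoms; mutex(a, b) -> True iff a and b are
        # mutually exclusive. Each atom ends up in exactly one clique.
        remaining = list(atoms)
        cliques = []
        while remaining:
            clique = [remaining.pop(0)]
            rest = []
            for atom in remaining:
                if all(mutex(atom, member) for member in clique):
                    clique.append(atom)
                else:
                    rest.append(atom)
            remaining = rest
            cliques.append(clique)
        return cliques

    # Toy usage with a hand-coded mutex relation:
    mutex_pairs = {frozenset(p) for p in [("at-A", "at-B"), ("at-A", "at-C"),
                                          ("at-B", "at-C"), ("full", "empty")]}
    is_mutex = lambda a, b: frozenset((a, b)) in mutex_pairs
    print(greedy_clique_cover(["at-A", "at-B", "at-C", "full", "empty"], is_mutex))
    # -> [['at-A', 'at-B', 'at-C'], ['full', 'empty']]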
|
msg8374 (view) |
Author: jendrik |
Date: 2018-12-18.11:02:14 |
|
I like your suggestion.
|
msg8373 (view) |
Author: gabi |
Date: 2018-12-18.10:58:43 |
|
I tried a few more variants:
- first checking all actions for being too heavy and only testing balance
afterwards does not change the overall picture, except that we get
significantly slower in organic-synthesis-split and significantly faster in
trucks-strips.
- sorting actions based on the number of add and delete effects speeds up some
tasks and slows down others. There is no clear pattern and no large outliers. I
tested two versions; both consider actions with many add effects early, but one
breaks ties by preferring few delete effects and the other many (see the sketch
below).
In the end, it is always the pre-grounded tasks that cause problems. So I
propose that we merge the version where we just sort the actions by their index
(= the version v2 Malte reviewed + his suggested changes).
In the long term we should consider implementing another invariant synthesis
algorithm (Jussi's algorithm for ground instances?) to special-case pre-grounded
tasks.
Opinions?
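As an illustration of the two tie-breaking variants, here is a small sketch; the
Action type and its fields are made up for this sketch and do not reflect the
translator's actual data structures.

    from collections import namedtuple

    Action = namedtuple("Action", ["index", "add_effects", "del_effects"])

    def key_few_deletes(action):
        # many add effects first; ties broken by preferring few delete effects,
        # then by index to stay deterministic
        return (-len(action.add_effects), len(action.del_effects), action.index)

    def key_many_deletes(action):
        # many add effects first; ties broken by preferring many delete effects
        return (-len(action.add_effects), -len(action.del_effects), action.index)

    actions = [Action(0, ["p", "q"], ["r"]),
               Action(1, ["p", "q"], []),
               Action(2, ["p"], [])]
    print(sorted(actions, key=key_few_deletes))   # action 1 before action 0
    print(sorted(actions, key=key_many_deletes))  # action 0 before action 1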
|
msg8339 (view) |
Author: florian |
Date: 2018-12-14.14:36:05 |
|
I'm not too familiar with the details but would it make sense to explicitly sort
actions by something like the number of threats or number of delete effects? It
seems like breaking out of the loop early is the main trick here, so if we have
an unbalanced action early in the order, this might already speed things up.
To keep things deterministic, just sorting by the number of threats is probably
not enough, but we can always sort by index as a secondary criterion.
|
msg8336 (view) |
Author: gabi |
Date: 2018-12-14.14:21:36 |
|
Oops, my previous post removed Guillem from the nosy list. Re-adding him...
How should we proceed here? I tend towards giving up on openstacks-strips and
using the deterministic version with the sorted action order.
If we really want to mess around with the action order, it might make more sense
to do so with a broader perspective, for example shuffling the actions after
normalization.
|
msg8335 (view) |
Author: gabi |
Date: 2018-12-14.14:17:13 |
|
The randomization tends to speed up the process. Here is a scatter plot that
compares the previous deterministic version (v2) to the (deterministically)
randomized version (v4):
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue879-v4_time_scatter.png
Overall, the randomized action ordering tends to have a positive impact on the
runtime.
Regarding the result of the computation, it again does not make a difference for
most of the domains.
Only on the citycar domain, we are now rather on the unlucky side and get more
variables (but extremely lucky on one trucks-strips task):
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue879-v4-issue879-base-issue879-v4-compare.html
Also with this randomized version, the runtime on openstacks-strips is worse
than with the current non-deterministic one; it is just not as bad.
|
msg8334 (view) |
Author: malte |
Date: 2018-12-14.14:13:03 |
|
Hi Guillem, we're aware of PYTHONHASHSEED, but relying on deterministic hashing
is strongly discouraged in Python. The translator tests explicitly set
PYTHONHASHSEED to "random" to make sure we don't rely on any specific hash
values accidentally.
In any case, I think in this issue we use hashing with the default hash value of
an object, which is derived from its memory address, so even with non-randomized
hashing, hash values will differ between runs because memory addresses do.
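A small self-contained demo of this behaviour (not translator code): with randomized
string hashing, the iteration order of a set of strings can differ between interpreter
runs.

    import os
    import subprocess
    import sys

    code = "print(list({'at_car_jun', 'arrived', 'starting', 'at_car_road'}))"
    env = dict(os.environ, PYTHONHASHSEED="random")
    for _ in range(3):
        # each child process gets its own hash seed, so the printed order varies
        subprocess.run([sys.executable, "-c", code], env=env)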
|
msg8333 (view) |
Author: guillem |
Date: 2018-12-14.14:07:42 |
|
A bit late to the discussion, but a small aside: have you found out exactly what causes the non-determinism in those
instances where the 300s time limit is not hit? For instance, why is the order of atoms in the initial state non-deterministic?
If this is already solved, ignore the following, I probably missed it in the discussion;
if it's not: the description of the whole issue seems related to Python 3's random salting of its hash function,
e.g. https://stackoverflow.com/a/27522708/10697400.
Could this be the cause? It has given me some strange non-determinism issues in the past, so just in case it helps.
|
msg8321 (view) |
Author: gabi |
Date: 2018-12-14.11:51:30 |
|
This does not count because the nightly test just considers small instances. ;-)
I will run some tests with randomized action orders, just to see the effect.
|
msg8320 (view) |
Author: malte |
Date: 2018-12-14.11:45:21 |
|
> I think we need the changes we made here to have a deterministic translator,
> so I would suggest to merge this anyway.
Note that Florian's numbers in msg8314 mean that we will still become
nondeterministic in openstacks-strips with this change because we hit the 300s
time limit (which is inherently unpredictable).
|
msg8318 (view) |
Author: gabi |
Date: 2018-12-14.11:24:59 |
|
This still cannot help (because if all predicates have arity 0, considering the
predicate name is equivalent to considering the atom).
Does anyone have an idea how we should proceed here? I don't see how we can work
our way out of this without doing silly (and wrong) things (such as
reverse-ordering actions if that works better on the benchmark set). Maybe we
could shuffle the actions (with a fixed seed to stay deterministic) to average
the effects out.
I think we need the changes we made here to have a deterministic translator, so
I would suggest merging this anyway.
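A minimal sketch of the "shuffle with a fixed seed" idea; the function name is made up,
and a dedicated Random instance keeps the result reproducible and independent of hash
values or other uses of the random module.

    import random

    def deterministic_shuffle(actions, seed=2018):
        rng = random.Random(seed)       # fixed, arbitrary seed
        shuffled = list(actions)
        rng.shuffle(shuffled)
        return shuffled

    print(deterministic_shuffle(range(10)))  # same permutation on every run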
|
msg8317 (view) |
Author: gabi |
Date: 2018-12-14.11:05:19 |
|
Ah, maybe I should have a coffee first. Of course in a ground task the
candidates are ground (because predicates always have arity 0).
|
msg8316 (view) |
Author: gabi |
Date: 2018-12-14.11:02:00 |
|
Hm, wrong thought. The invariant candidates that we generate are not ground, so
we cannot eliminate actions on this basis.
|
msg8315 (view) |
Author: gabi |
Date: 2018-12-14.10:51:19 |
|
Ah, excellent observation Florian!
The problem with the pre-grounded tasks is that we only consider the predicate
name when identifying potentially threatening actions. If, for grounded actions,
we considered the entire atom instead, we could significantly decrease the
number of actions that we need to check.
|
msg8314 (view) |
Author: florian |
Date: 2018-12-14.10:30:26 |
|
I think I have narrowed it down a bit: the function check_balance loops through
the actions that need to be checked but stops as soon as it finds an unbalanced
or too heavy action. Before making this deterministic, we were lucky with these
early exits from the loop.
Without sorting:
Calls to check_balance: 100000
Actions to check: 31740999
Actions checked: 1153714 (3.634775%)
Time: 62 seconds
With sorting:
Calls to check_balance: 56224
Actions to check: 29503176
Actions checked: 5411207 (18.341100%)
Time: 300 seconds
Even though we call check_balance less often when sorting, the number of actions
we potentially have to check is roughly the same. However, since we detect the
imbalance later, we need to check more actions before finding an unbalanced one,
which increases the number of calls to operator_unbalanced.
I'm not sure why this is so systematic in openstacks-strips or if we can do
anything about it.
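For illustration, a sketch of the early-exit pattern described here; this is not the
translator's actual check_balance, and all names are made up.

    def check_balance(actions_to_check, is_too_heavy, is_unbalanced):
        for action in actions_to_check:
            if is_too_heavy(action) or is_unbalanced(action):
                return False  # early exit: all later actions are skipped
        return True

    # If a failing action comes first, one check suffices; if it comes last,
    # every action is examined before the candidate is rejected.
    print(check_balance(range(5), lambda a: a == 0, lambda a: False))
    print(check_balance(range(5), lambda a: a == 4, lambda a: False))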
|
msg8311 (view) |
Author: florian |
Date: 2018-12-13.23:08:06 |
|
I experimented with sorting in different places and found that the sorting we
introduced in a3b64455 made a large difference
(https://bitbucket.org/roeger/downward/commits/a3b644556783f6652991987d70e878315c6ee57d).
With sorting the actions, we go through ~57000 candidates in 300 seconds in the
main loop of find_invariants. Without sorting, we go through ~10000 candidates
in ~65 seconds. The number of candidates changed, because without sorting we are
back to the non-determinism this issue addresses.
To test whether the slow-down comes from the effort of sorting or from a
secondary effect, I sorted the list to a dummy variable and then iterated over
the unsorted list. This took 69 seconds, so about 4 seconds for the actual
sorting but still far from the 300 seconds we got when using the sorted data in
the main loop. Somehow the order of actions influences which candidates we see
in which order and this influences the speed. However, the number of candidates
we check is not the important factor.
|
msg8305 (view) |
Author: malte |
Date: 2018-12-13.20:25:00 |
|
I have had a look and saw nothing that immediately struck the eye, though when
sorting I guess it's always important to pay attention to the magic comparison
functions involved, which I did not check thoroughly.
Certainly, if we previously hashed by address (using the C-implemented default
implementation) and now sort with a user-defined comparison function, that can
easily slow down these operations by a factor of 10-100 just because of the
function call overhead. So if performance is problematic, profiling as Florian
already started doing seems the natural thing to do.
|
msg8303 (view) |
Author: florian |
Date: 2018-12-13.20:07:19 |
|
I profiled the translation of openstacks-strips/p08 with cProfile (output
attached). I'm not quite sure about the result: the overall process took around
300 seconds, and "sorted" shows up with a cumulative time of 300.205s. However,
following the functions with the most cumulative time goes down a path where I
don't see how sorted would play a role (pddl_to_sas > compute_groups >
get_groups > find_invariants > check_balance, operator_unbalanced >
add_effect_unbalanced > unbalanced_renamings > lhs_satisfiable > is_solvable).
>>> import pstats
>>> p = pstats.Stats('cprofile.log')
>>> p.sort_stats('cumulative').print_stats(15)
Thu Dec 13 19:47:13 2018 cprofile.log
510322622 function calls (508610275 primitive calls) in 301.092 seconds
Ordered by: cumulative time
List reduced from 912 to 15 due to restriction <15>
      ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           1    0.009    0.009  301.092  301.092 ./translate.py:4(<module>)
           1    0.003    0.003  301.053  301.053 ./translate.py:679(main)
           1    0.001    0.001  300.626  300.626 ./translate.py:530(pddl_to_sas)
118569/22363    5.971    0.000  300.205    0.013 {sorted}
           1    0.000    0.000  300.187  300.187 ./fact_groups.py:109(compute_groups)
           1    0.000    0.000  300.184  300.184 ./invariant_finder.py:138(get_groups)
          71    0.294    0.004  300.072    4.226 ./invariant_finder.py:91(find_invariants)
       37307    5.802    0.000  299.698    0.008 ./invariants.py:241(check_balance)
     3234757   15.563    0.000  262.598    0.000 ./invariants.py:280(operator_unbalanced)
     3234757   23.980    0.000  235.433    0.000 ./invariants.py:317(add_effect_unbalanced)
     4375209   17.737    0.000  168.147    0.000 ./invariants.py:350(unbalanced_renamings)
     4375209   12.663    0.000   77.253    0.000 ./invariants.py:398(lhs_satisfiable)
     7574397   11.079    0.000   39.761    0.000 ./constraints.py:154(is_solvable)
     4376043   22.182    0.000   36.227    0.000 ./invariants.py:55(ensure_conjunction_sat)
     4375209   14.820    0.000   30.778    0.000 ./invariants.py:403(imply_del_effect)
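For reference, a hedged, self-contained sketch of how such a profile can be generated
and read with cProfile/pstats; the exact invocation used for the numbers above is not
stated here, and the toy workload below only stands in for a translator run (e.g.
python3 -m cProfile -o cprofile.log ./translate.py DOMAIN PROBLEM).

    import cProfile
    import pstats

    def busy_work():
        return sorted(range(200000), key=lambda x: -x)

    cProfile.runctx("busy_work()", globals(), locals(), filename="cprofile.log")
    stats = pstats.Stats("cprofile.log")
    stats.sort_stats("cumulative").print_stats(5)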
|
msg8295 (view) |
Author: gabi |
Date: 2018-12-13.17:55:10 |
|
Getting back to the actual issue: the changes seem to have resolved this.
Experimental results are available from
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue879-v2-issue879-base-issue879-v2-compare.html
The changes only affect citycar (and trucks-strips) and there we are on the
lucky side, getting the larger variable domains.
A scatter plot of the translator runtime can be found at:
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue879-v2_time_scatter.png
There is a set of tasks with a much larger runtime (openstacks and trucks-strips).
Malte, can you have a look at the change?
|
msg8293 (view) |
Author: gabi |
Date: 2018-12-13.16:19:13 |
|
I think it is because we could equally refine the candidate for destroy_road but
our refinement method just does not see this:
destroy_road(?xy_initial: junction, ?xy_final: junction, ?r1: road)
Precondition:
Conjunction
Atom road_connect(?r1, ?xy_initial, ?xy_final)
Atom in_place(?r1)
Effects:
NegatedAtom in_place(?r1)
NegatedAtom road_connect(?r1, ?xy_initial, ?xy_final)
forall ?c1: car
if
Atom at_car_road(?c1, ?r1)
then
NegatedAtom at_car_road(?c1, ?r1)
forall ?c11: car
if
Atom at_car_road(?c11, ?r1)
then
Atom at_car_jun(?c11, ?xy_initial)
|
msg8292 (view) |
Author: malte |
Date: 2018-12-13.16:01:39 |
|
Because we can infer that the action is never applicable?
|
msg8291 (view) |
Author: gabi |
Date: 2018-12-13.15:56:35 |
|
Ok, now I think I understand what's going on. Here is an example:
Consider the invariant candidate
{arrived 0 [1], starting 0 [1], at_car_jun 0 [1]}
It is unbalanced by two actions, destroy_road and move_car_out_road, where the
latter has a delete effect that allows refinement and the former does not.
If move_car_out_road gets processed first, we refine the candidate to (and enqueue)
{at_car_road 0 [1], arrived 0 [1], at_car_jun 0 [1], starting 0 [1]},
which is indeed an invariant.
If destroy_road is processed first, we cannot refine but directly reject the
candidate.
Quiz question: Why can {at_car_road 0 [1], arrived 0 [1], at_car_jun 0 [1],
starting 0 [1]} be accepted as an invariant if a subset is irreparably unbalanced
by the action destroy_road?
|
msg8290 (view) |
Author: gabi |
Date: 2018-12-13.15:36:49 |
|
With different orders of processing the actions, I am pretty sure that we did
see different sets of invariants. It is not clear to me why.
I am not aware of any dominance check during the invariant synthesis.
Independently of this, I wonder whether we could save some time by first testing
all actions for being too heavy and only afterwards testing balance. Otherwise,
we might unnecessarily generate refined candidates that are doomed to fail the
heaviness test.
|
msg8289 (view) |
Author: malte |
Date: 2018-12-13.15:15:17 |
|
The nondeterministic order should not matter because the algorithm is a complete
breadth-first search. Even if the "hits" are found in a different order, they
should be the same set. (It matters if the algorithm is cut off early because of
the time or iteration limit, and we know this happens in trucks-strips. But that
doesn't seem to be what is happening in citycar?)
It looks like one of the invariants is dominated by another one in one of the
runs. Are we supposed to detect that dominance and remove the dominated one, and
do we somehow mess this up in one of the cases?
|
msg8286 (view) |
Author: gabi |
Date: 2018-12-13.14:38:15 |
|
Florian and I had a look into this and we are quite surprised that the test did
not fail previously. We found three sources of non-determinism:
- The order of atoms in the initial state was non-deterministic.
- The invariant detection processed actions in a somewhat random order.
- The invariants were instantiated in a non-deterministic order.
Experiments are running. We have created a pull request at
https://bitbucket.org/roeger/downward/pull-requests/3/issue879/diff
|
msg8282 (view) |
Author: gabi |
Date: 2018-12-13.11:17:10 |
|
The nightly buildbot complained that we get different translator outputs with
Python 2.7 and Python 3, but a closer inspection revealed that this is not due
to the Python version but to some general non-determinism.
With two runs of find_invariants.py with python3 on instance
citycar-opt14-adl/p2-2-2-1-2.pddl I found the following sets of invariants:
{starting 0 1}
{starting 1 [0]}
{starting 0 [1]}
{at_car_jun 0 [1], at_car_road 0 [1], starting 0 [1]}
{at_car_jun 0 [1], arrived 0 [1], at_car_road 0 [1], starting 0 [1]}
and
{starting 0 1}
{starting 1 [0]}
{starting 0 [1]}
{at_car_road 0 [1], starting 0 [1], at_car_jun 0 [1]}
In most runs we find the five invariants. I guess we need some intermediate
sorting at some point of the invariant synthesis.
|