Issue544

Title non-deterministic behaviour when simplifying the set of unary operators
Priority bug Status resolved
Superseder Nosy List jendrik, malte, manuel, salome
Assigned To manuel Keywords
Optional summary

Created on 2015-06-29.14:57:02 by manuel, last changed by manuel.

Messages
msg4652 (view) Author: manuel Date: 2015-10-13.09:25:56
Changes are merged. Unary operators are now sorted by operator_no, effect, cost,
preconditions (lexicographical).
msg4532 (view) Author: salome Date: 2015-07-29.12:33:38
I made a table which shows the amount of unary operators (before pruning) for
the sat suite: http://ai.cs.unibas.ch/_tmp_files/simon/unary_ops.html (the
summarized value per domain is the maximum). As far as I have seen only two
problems in the two scanalyzer domains (probably even the same problem) have
more unary operators than sattellite #33: 1555200. But it seems that it is
considerably less most of the time (in the order of 10'000-100'000).

I agree that option B should be faster, and also that we should not introduce
unique IDs for sorting. We could theoretically only compare preconditions and
effects since this is the key in the map and thus guaranteed to be unique, but
as you pointed out a comparison of the original operator id is done fast and
will already give a result most of the time.
msg4525 (view) Author: malte Date: 2015-07-28.19:10:13
Salomé: you're right, I should have reacquainted myself better with the code.
:-) The second for loop needs a deterministic iteration order for the algorithm
to be deterministic.

For what it's worth, Satellite #33 has roughly 1 million unary operators (of
which a whopping 20 are pruned).

Let's look at the two candidate algorithms we've considered:

A) Use an unordered_map to perform the pruning, then sort the unary operators at
the end.

B) Use a map to perform the pruning which uses a comparison function that
guarantee a canonical order.

I expect that algorithm A) is much faster than B). Let's simplify a bit and
assume that we have reasonable hash functions and that the "small steps" of the
algorithm such as comparing keys are O(1). This is not quite true, but it's true
if we assume a fixed upper bound on the number of preconditions of a unary
operators, which is often true.

Without the sort, algorithm A is O(n) for n operators. Adding a sort makes it
O(n log n).

Algorithm B is O(n log n) anyway, and the constant factor involved in
manipulating a large map tends to be much worse than the one involved in
sorting. (One way to view this is that Algorithm B, as part of its job, performs
a sort of the unary operators, too: it sorts by inserting lots of things into a
map and then extracting them into a vector. Inserting into a map and then
extracting a vector is an O(n log n) sorting algorithm, but it tends to be much
slower than std::sort in terms of absolute time.)


BTW, unfortunately sorting by operator_no is not enough to give unary operators
a deterministic order because every operator can spawn many unary operators (and
the operator_no is not an ID for the unary operator but the original operator's
number, unless I'm much mistaken).

We would need a comparison function that also takes into account at least the
effect and the preconditions. (operator_no + effect is enough if we don't have
conditional effects, but not if we have them). Once we compare these three, we
might as well compare all four things that define the operator (operator_no,
precondition, effect, cost).

I think it would be good to compare ID first because two unary operators will
have different operator IDs 99% of the time, and then the test is done quickly.

An alternative could be to give unary operators a unique numerical ID, too, but
I'd much rather not. I'd be worried about the memory footprint; this is a hot
data structure in heuristic computation, and keeping it small can be key to
keeping the main data we need in the cache.
msg4508 (view) Author: salome Date: 2015-07-28.10:11:15
If I understand the code correctly, removing equivalent unary operators happens
before dominance testing by inserting them in a map. But by creating the map we
do not have a deterministic order in which the unary operators are iterated over
for dominance testing (and pushing into the final unary operator vector).
What we could do to get the final order of the unary operator deterministic is
to sort the vector (by operator id) at the end. Then we could even use an
unordered_map. However this might be very expensive, my small test problem has
already over 5000 unary operators. If you are interested how many unary
operators can occur in general I can see if I can create a nice table in lab :).
msg4494 (view) Author: malte Date: 2015-07-27.18:25:26
Yes, using the IDs to compare sounds good.

It would be nice to use unordered_map instead of map, but I googled a little
bit, and it seems that there are no guarantees of reproducible behaviour even
when inserting the same elements in the same order etc. (But I could find no
official word on this.)

However, maybe that's a red herring: we don't necessarily *need* a container
with a reproducible iteration order at all. IIRC, what we want to do here is a
dominance test, and for a dominance testing algorithm, ambiguity can only arise
if we have two elements that dominate each other (i.e., are equivalent).

It should be possible to change the "dominance relation" so that two elements
can never dominate each other, e.g. by comparing the ID of the two objects to
decide which one dominates which other one if they are otherwise identical. Not
sure how clear this is; if it's not clear, I can probably more easily elaborate
in person than on the tracker. :-)
msg4493 (view) Author: salome Date: 2015-07-27.17:53:29
I am reopening this issues because the current Map implementation is not
guaranteed to be deterministic. This is because the key for the map contains
Pointers (more precisely the key is the following: pair<vector<Proposition *>,
Proposition *>). Since we do not define a custom comparison function for the key
it will use a standard comparison based on the pointers rather than the actual
objects.

My suggestion is to define a custom comparison function which works on the
actual Propositions and not the pointers (Propositions have an id which should
be deterministic, so we can just compare them).

I was also wondering if we might even use an unordered_map if we define the
hashing and equal function ourselves. However I am not sure that if given 
1) the hashing and equal function are determinstic (and do not work with
pointers) and 
2) the insertion order is the same
then the iterator will iterate over the map in the same order. Does anyone know
more about this?
msg4321 (view) Author: manuel Date: 2015-07-03.16:20:33
The changes are merged.
msg4320 (view) Author: malte Date: 2015-07-03.15:15:19
Latest version looks good to me. I think this can be merged.
msg4311 (view) Author: malte Date: 2015-07-02.15:12:13
Sounds good! If you can give me a link to a pull request, I'll have a look at
the code changes, and if they are fine, we can merge.
msg4310 (view) Author: manuel Date: 2015-07-02.14:46:04
I would recommend to replace the unordered_map with a map in a first step as
even heuristic values of heuristics based on delete relaxations are not
deterministic. It would help to verify the changes made in relation to issue108.

In a later step we could evaluate if there is a more efficient but deterministic
implementation that uses the unordered map.
msg4305 (view) Author: malte Date: 2015-07-02.01:17:14
Hi Manuel, what is your suggested resolution? To use map instead of unordered_map?
msg4299 (view) Author: manuel Date: 2015-07-01.12:46:10
Here are the general results of using the unordered_map vs. using the usual map.
http://ai.cs.unibas.ch/_tmp_files/heusner/issue544-sat-v1-issue544-base-issue544-v1-compare.html

The results are often worse for the usual map.

In order to determine the variance among runs of a single instance, we run 20
times an instance from trucks-strips and 20 times an instance from freecell.
http://ai.cs.unibas.ch/_tmp_files/heusner/issue544-regression-v1-issue544-base-issue544-v1-compare.html

I think the huge variance in the results and the general dominance of the usual
map (at least in these two instances) calls for a fix of this issue.
msg4284 (view) Author: manuel Date: 2015-06-29.14:57:02
FD is not deterministic when using preferred operators of a delete-relaxation
heuristic. This behaviour is caused by iterating through an unordered_map during
the process of simplifying the set of unary operators.

We will replace the unordered_map by the slower but sorted map and empirically
evaluate the consequences.
History
Date User Action Args
2015-10-13 09:25:56manuelsetstatus: in-progress -> resolved
messages: + msg4652
2015-07-29 12:33:38salomesetmessages: + msg4532
2015-07-28 19:10:13maltesetmessages: + msg4525
2015-07-28 10:11:15salomesetmessages: + msg4508
2015-07-27 18:25:26maltesetmessages: + msg4494
2015-07-27 17:53:29salomesetstatus: resolved -> in-progress
messages: + msg4493
2015-07-03 16:34:33maltesetstatus: in-progress -> resolved
2015-07-03 16:20:33manuelsetmessages: + msg4321
2015-07-03 15:15:19maltesetmessages: + msg4320
2015-07-03 10:31:00salomesetnosy: + salome
2015-07-02 15:12:13maltesetmessages: + msg4311
2015-07-02 14:46:04manuelsetmessages: + msg4310
2015-07-02 01:17:14maltesetmessages: + msg4305
2015-07-01 12:46:10manuelsetmessages: + msg4299
2015-06-29 15:07:00jendriksetnosy: + malte, jendrik
2015-06-29 14:57:02manuelcreate