Issue 544: non-deterministic behaviour when simplifying the set of unary operators

Title	non-deterministic behaviour when simplifying the set of unary operators
Priority	bug	Status	resolved
Superseder		Nosy List	jendrik, malte, manuel, salome
Assigned To	manuel	Keywords
Optional summary

Created on 2015-06-29.14:57:02 by manuel, last changed by manuel.

Messages
msg4652 (view)	Author: manuel	Date: 2015-10-13.09:25:56
Changes are merged. Unary operators are now sorted by operator_no, effect, cost, preconditions (lexicographical).
msg4532 (view)	Author: salome	Date: 2015-07-29.12:33:38
I made a table which shows the amount of unary operators (before pruning) for the sat suite: http://ai.cs.unibas.ch/_tmp_files/simon/unary_ops.html (the summarized value per domain is the maximum). As far as I have seen only two problems in the two scanalyzer domains (probably even the same problem) have more unary operators than sattellite #33: 1555200. But it seems that it is considerably less most of the time (in the order of 10'000-100'000). I agree that option B should be faster, and also that we should not introduce unique IDs for sorting. We could theoretically only compare preconditions and effects since this is the key in the map and thus guaranteed to be unique, but as you pointed out a comparison of the original operator id is done fast and will already give a result most of the time.
msg4525 (view)	Author: malte	Date: 2015-07-28.19:10:13
Salomé: you're right, I should have reacquainted myself better with the code. :-) The second for loop needs a deterministic iteration order for the algorithm to be deterministic. For what it's worth, Satellite #33 has roughly 1 million unary operators (of which a whopping 20 are pruned). Let's look at the two candidate algorithms we've considered: A) Use an unordered_map to perform the pruning, then sort the unary operators at the end. B) Use a map to perform the pruning which uses a comparison function that guarantee a canonical order. I expect that algorithm A) is much faster than B). Let's simplify a bit and assume that we have reasonable hash functions and that the "small steps" of the algorithm such as comparing keys are O(1). This is not quite true, but it's true if we assume a fixed upper bound on the number of preconditions of a unary operators, which is often true. Without the sort, algorithm A is O(n) for n operators. Adding a sort makes it O(n log n). Algorithm B is O(n log n) anyway, and the constant factor involved in manipulating a large map tends to be much worse than the one involved in sorting. (One way to view this is that Algorithm B, as part of its job, performs a sort of the unary operators, too: it sorts by inserting lots of things into a map and then extracting them into a vector. Inserting into a map and then extracting a vector is an O(n log n) sorting algorithm, but it tends to be much slower than std::sort in terms of absolute time.) BTW, unfortunately sorting by operator_no is not enough to give unary operators a deterministic order because every operator can spawn many unary operators (and the operator_no is not an ID for the unary operator but the original operator's number, unless I'm much mistaken). We would need a comparison function that also takes into account at least the effect and the preconditions. (operator_no + effect is enough if we don't have conditional effects, but not if we have them). Once we compare these three, we might as well compare all four things that define the operator (operator_no, precondition, effect, cost). I think it would be good to compare ID first because two unary operators will have different operator IDs 99% of the time, and then the test is done quickly. An alternative could be to give unary operators a unique numerical ID, too, but I'd much rather not. I'd be worried about the memory footprint; this is a hot data structure in heuristic computation, and keeping it small can be key to keeping the main data we need in the cache.
msg4508 (view)	Author: salome	Date: 2015-07-28.10:11:15
If I understand the code correctly, removing equivalent unary operators happens before dominance testing by inserting them in a map. But by creating the map we do not have a deterministic order in which the unary operators are iterated over for dominance testing (and pushing into the final unary operator vector). What we could do to get the final order of the unary operator deterministic is to sort the vector (by operator id) at the end. Then we could even use an unordered_map. However this might be very expensive, my small test problem has already over 5000 unary operators. If you are interested how many unary operators can occur in general I can see if I can create a nice table in lab :).
msg4494 (view)	Author: malte	Date: 2015-07-27.18:25:26
Yes, using the IDs to compare sounds good. It would be nice to use unordered_map instead of map, but I googled a little bit, and it seems that there are no guarantees of reproducible behaviour even when inserting the same elements in the same order etc. (But I could find no official word on this.) However, maybe that's a red herring: we don't necessarily need a container with a reproducible iteration order at all. IIRC, what we want to do here is a dominance test, and for a dominance testing algorithm, ambiguity can only arise if we have two elements that dominate each other (i.e., are equivalent). It should be possible to change the "dominance relation" so that two elements can never dominate each other, e.g. by comparing the ID of the two objects to decide which one dominates which other one if they are otherwise identical. Not sure how clear this is; if it's not clear, I can probably more easily elaborate in person than on the tracker. :-)
msg4493 (view)	Author: salome	Date: 2015-07-27.17:53:29
I am reopening this issues because the current Map implementation is not guaranteed to be deterministic. This is because the key for the map contains Pointers (more precisely the key is the following: pair<vector<Proposition >, Proposition >). Since we do not define a custom comparison function for the key it will use a standard comparison based on the pointers rather than the actual objects. My suggestion is to define a custom comparison function which works on the actual Propositions and not the pointers (Propositions have an id which should be deterministic, so we can just compare them). I was also wondering if we might even use an unordered_map if we define the hashing and equal function ourselves. However I am not sure that if given 1) the hashing and equal function are determinstic (and do not work with pointers) and 2) the insertion order is the same then the iterator will iterate over the map in the same order. Does anyone know more about this?
msg4321 (view)	Author: manuel	Date: 2015-07-03.16:20:33
The changes are merged.
msg4320 (view)	Author: malte	Date: 2015-07-03.15:15:19
Latest version looks good to me. I think this can be merged.
msg4311 (view)	Author: malte	Date: 2015-07-02.15:12:13
Sounds good! If you can give me a link to a pull request, I'll have a look at the code changes, and if they are fine, we can merge.
msg4310 (view)	Author: manuel	Date: 2015-07-02.14:46:04
I would recommend to replace the unordered_map with a map in a first step as even heuristic values of heuristics based on delete relaxations are not deterministic. It would help to verify the changes made in relation to issue108. In a later step we could evaluate if there is a more efficient but deterministic implementation that uses the unordered map.
msg4305 (view)	Author: malte	Date: 2015-07-02.01:17:14
Hi Manuel, what is your suggested resolution? To use map instead of unordered_map?
msg4299 (view)	Author: manuel	Date: 2015-07-01.12:46:10
Here are the general results of using the unordered_map vs. using the usual map. http://ai.cs.unibas.ch/_tmp_files/heusner/issue544-sat-v1-issue544-base-issue544-v1-compare.html The results are often worse for the usual map. In order to determine the variance among runs of a single instance, we run 20 times an instance from trucks-strips and 20 times an instance from freecell. http://ai.cs.unibas.ch/_tmp_files/heusner/issue544-regression-v1-issue544-base-issue544-v1-compare.html I think the huge variance in the results and the general dominance of the usual map (at least in these two instances) calls for a fix of this issue.
msg4284 (view)	Author: manuel	Date: 2015-06-29.14:57:02
FD is not deterministic when using preferred operators of a delete-relaxation heuristic. This behaviour is caused by iterating through an unordered_map during the process of simplifying the set of unary operators. We will replace the unordered_map by the slower but sorted map and empirically evaluate the consequences.

History
Date	User	Action	Args
2015-10-13 09:25:56	manuel	set	status: in-progress -> resolved messages: + msg4652
2015-07-29 12:33:38	salome	set	messages: + msg4532
2015-07-28 19:10:13	malte	set	messages: + msg4525
2015-07-28 10:11:15	salome	set	messages: + msg4508
2015-07-27 18:25:26	malte	set	messages: + msg4494
2015-07-27 17:53:29	salome	set	status: resolved -> in-progress messages: + msg4493
2015-07-03 16:34:33	malte	set	status: in-progress -> resolved
2015-07-03 16:20:33	manuel	set	messages: + msg4321
2015-07-03 15:15:19	malte	set	messages: + msg4320
2015-07-03 10:31:00	salome	set	nosy: + salome
2015-07-02 15:12:13	malte	set	messages: + msg4311
2015-07-02 14:46:04	manuel	set	messages: + msg4310
2015-07-02 01:17:14	malte	set	messages: + msg4305
2015-07-01 12:46:10	manuel	set	messages: + msg4299
2015-06-29 15:07:00	jendrik	set	nosy: + malte, jendrik
2015-06-29 14:57:02	manuel	create

Issue544