Issue 1063: Compute indirect subtype relationship more efficiently

Title	Compute indirect subtype relationship more efficiently
Priority	feature	Status	chatting
Superseder		Nosy List	jendrik, malte, mkatz
Assigned To		Keywords
Optional summary

Created on 2022-09-02.21:26:42 by mkatz, last changed by mkatz.

Files
File name	Uploaded	Type	Edit	Remove
dfs-patch.diff	malte, 2022-09-03.16:32:46	text/x-patch
examples.tar.gz	mkatz, 2022-09-03.18:14:05	application/x-gzip
supertypes.patch	mkatz, 2022-09-02.21:27:51	application/octet-stream

Messages
msg10837 (view)	Author: mkatz	Date: 2022-09-09.19:19:55
Thank you for the explanation of the source of the translator output difference in these two domains. I was not worried, since I have seen these domains having difference in another issue (1055). One very minor comment on the patch, I think there is a typo in the comment, "clourse" -> "closure".
msg10836 (view)	Author: malte	Date: 2022-09-09.17:52:01
Thanks a lot, Michael! These results are very useful. Regarding the IPC domains behaving abnormally: - trucks-strips and citycar are known to have translator variation. - For trucks-strips this is because the invariant synthesis reaches a timeout and takes whatever invariants were found until then. If different runs have slightly different speed, they will use different invariants, leading to variation. - For citycar we are aware of the variation but as far as I know didn't investigate it. In any case, this is something we've observed before for identical translator code, so should not be a concern for this issue. - Failures in some "hard to ground" IPC 2018 domains are also known. The runtime results look good. I ran the "deepest" example locally because it failed in the experiment with and without the patch. With the patch, the "parsing" stage completes after 21.210 seconds, but I had to kill the process at the "Generating Datalog..." stage when it reached 16 GB of memory usage. I don't know if this hints at another performance issue or if this task is simply too large. It has roughly 10,000 types, roughly 500,000,000 descendant type relationships and roughly 10000 objects, so perhaps it's OK to choke on it. I aborted the pre-patch version after 60 minutes, when it was still in the parsing stage. We can calculate that the inner loop of Warshall's algorithm will execute 1,000,000,000 times, and this won't be fast in Python. As far as I am concerned, that's enough experimentation. If nobody speaks up in the next days, I'll work on merging it then.
msg10828 (view)	Author: mkatz	Date: 2022-09-06.23:44:55
I have run the experiment on my machines and it seems like there are only two domains where there is a difference: 1. trucks-strips, where there are not types at all and 2. citycar-{opt,sat}14-adl There are also some cases where translator fails (with and without the change), but I suppose that is normal. The full list can be found here: https://github.com/ctpelok77/downward/blob/issue1063- v1/experiments/issue1063/different_output_sas.csv The time comparison can be found here: 1. IPC tasks https://planning-results.github.io/issue1063/issue1063-v1-issue1063- issue1063-v1-compare.html 2. the examples https://planning-results.github.io/issue1063/issue1063-v1- issue1063-issue1063-examples-compare.html
msg10827 (view)	Author: mkatz	Date: 2022-09-03.18:14:05
The patch works on our example, thanks! Also, attaching a new archive, with two more "deep" types: 1. "deeper" with a chain of 1000 types 2. "deepest" with a chain of 10000 types. The code uses tarski, with disabling to handling supertypes in their code to speed things up (otherwise it takes forever for chains of > 30 types), and even with that code disabled, generating the task with 10k types takes some time, so I don't think we can go significantly higher than 10k with tarski.
msg10826 (view)	Author: malte	Date: 2022-09-03.16:45:31
Updating issue title in preparation for the change log entry. Proposed change log entry: - translator: Compute indirect subtype relationship more efficiently. <https://issues.fast-downward.org/issue1063> Use depth-first search instead of Warshall's algorithm to compute indirect subtypes. This can speed up the translator significantly on inputs with very many types. In an informal experiment, runtime improved from 12.88s to 1.80s and from 13.73s to 3.04s on two instances with ~500 types.
msg10825 (view)	Author: malte	Date: 2022-09-03.16:36:23
Can someone do a code review? (The new patch is dfs-patch.diff. You can apply it with "git apply dfs-patch.diff" from within the repository.) Should we run tests? This should not change translator output, so if we run tests, what I'd do is check that translator output remains the same (apart from the cases where we expect variation) and look at translator runtime. I expect no noticeable changes on the IPC suite because this will only kick in for very large numbers of types.
msg10824 (view)	Author: malte	Date: 2022-09-03.16:32:46
I'm attaching a new patch for this that performs a DFS from every node (type). In my tests it's slightly faster than Michael's code on the two examples and substantially faster than the current code, with translator time dropping from 12.88s to 1.80s on the shallow and 13.73s to 3.04s on the deep example. The implementation only changes one function plus some test code when using graph.py as the main file, so it is very local. Michael, can you test this on your real example? Would this change work for you?
msg10823 (view)	Author: malte	Date: 2022-09-03.16:08:33
Thinking about it some more while working on the implementation, precomputing SCCs makes the implementation more complicated but buys us nothing in big-O complexity (unless I'm overlooking something) over doing a straight DFS from every node. I'll try that first. So we may end up not reusing any code after all.
msg10822 (view)	Author: malte	Date: 2022-09-03.14:57:47
Hi Michael, thanks! What you mentioned in your last comment is what I was planning to do. We already have code in sccs.py that implement's Tarjan's SCC algorithm and returns the SCCs in topological sort order. his only returns the SCCs themselves, not the edges between them, so we need to recover them with a separate depth-first traversel of the components, but this should be simple enough given that they form a DAG. One complication is that we cannot implement the traversal recursively because Python limits recursion to around 1000 levels. I think your code will break if you make the "deep" example a bit larger (1000 types deep), perhaps you can test this to confirm. Were the problems generated with a generator, and if yes, can you include larger ones with 1000 and 10000 types so that I can test scaling a bit more? The proposed implementation will include some overhead because of its two-stage nature but I'd prefer it because it would reuse building blocks. But I'd like to see a bit more clearly what the cost is.
msg10821 (view)	Author: mkatz	Date: 2022-09-02.21:26:42
The function set_supertypes() in the parser performs a transitive closure in order to compute the super-types of all types. The algorithm used is Warshall, with n^3 complexity, which is not efficient in most cases. Note that a typical case is a DAG hierarchical structure, most often a tree. In such cases, a much more efficient algorithm is possible. Attached is a patch that implements the computation for DAGs, as well as two example problems with a large amount of types in different tree structures (deep is a chain, shallow is two levels). Some comments on the patch (from Malte verbatim): ----------------------------------- Here's my issues with the patch in its current form: 1. It adds a new algorithm for computing the transitive closure, after which the old one isn't used any more. But the old one still remains as dead code. 2. It adds a second implementation of topological sorting to the translator. It already has one in the variable ordering code. 3. For cases where the subtype digraph isn't a DAG, we currently get a well-defined behaviour (basically, we allow it). The new code doesn't have a well-defined behaviour. I think we should either allow it or report it as an error, but not accept it silently with undefined behaviour. Nobody defined the semantics of PDDL types, so we can only guess what they are supposed to mean. ;-) (Unless someone gave a definition in the meantime that I haven't seen, for example in the PDDL book. But I talked to Patrik at some point and he said they explicitly wanted to stay out of the discussion of precise semantics.) The way that Fast Downward handles subtype relationships is by saying that if X is a subtype of Y, then every X can be used as a Y, i.e., we postulate this logical implication: forall o: (X(o) -> Y(o)) This is all Fast Downward does, and this semantics doesn't require acyclicity. But if you do have X and Y in an inheritance cycle, then they are necessarily equivalent, and hence having both types in the model is redundant. But redundant doesn't mean illegal. I think we used to require in the code that you're only allowed to use a type as a supertype if it has already been introduced, and then cycles are impossible. But people complained about this because they wanted to use types as supertypes before they were defined. Another related discussion was whether something (acyclic) like the following should be allowed: (:types A - object B - object C - A C - B ) We originally didn't allow it; we required a Java-like type structure, where you always end up with a tree. But some people thought that this isn't what the semantics should be, which triggered a discussion between the intended semantics with a few people that cared about PDDL at the time (like Derek) with no convergence on a commonly agreed definition. I think that's how we ended up with the current implementation that simply allows anything (if I recall correctly -- it can also be that we check somewhere that there isn't a cycle and complain if there is one; my memory is hazy). ----------------------------------- From here onwards Michael again: Since all types on a cycle are equivalent, we could first run Tarjan's strongly connected components algorithm, and then perform the computation on the condensed graph (which is DAG).

History
Date	User	Action	Args
2022-09-09 19:19:55	mkatz	set	messages: + msg10837
2022-09-09 17:52:01	malte	set	messages: + msg10836
2022-09-06 23:44:55	mkatz	set	messages: + msg10828
2022-09-03 18:14:21	mkatz	set	files: - examples.tar.gz
2022-09-03 18:14:05	mkatz	set	files: + examples.tar.gz messages: + msg10827
2022-09-03 16:45:31	malte	set	messages: + msg10826 title: Translator is slow when problems have a large amount of object types -> Compute indirect subtype relationship more efficiently
2022-09-03 16:36:23	malte	set	messages: + msg10825
2022-09-03 16:32:46	malte	set	files: + dfs-patch.diff messages: + msg10824
2022-09-03 16:08:33	malte	set	messages: + msg10823
2022-09-03 14:57:47	malte	set	status: unread -> chatting nosy: + malte, jendrik messages: + msg10822
2022-09-02 21:29:11	mkatz	set	files: + examples.tar.gz
2022-09-02 21:27:51	mkatz	set	files: + supertypes.patch
2022-09-02 21:26:42	mkatz	create

Issue1063