Issue1063

Title Compute indirect subtype relationship more efficiently
Priority feature Status chatting
Superseder Nosy List jendrik, malte, mkatz
Assigned To Keywords
Optional summary

Created on 2022-09-02.21:26:42 by mkatz, last changed by mkatz.

Files
File name Uploaded Type Edit Remove
dfs-patch.diff malte, 2022-09-03.16:32:46 text/x-patch
examples.tar.gz mkatz, 2022-09-03.18:14:05 application/x-gzip
supertypes.patch mkatz, 2022-09-02.21:27:51 application/octet-stream
Messages
msg10837 (view) Author: mkatz Date: 2022-09-09.19:19:55
Thank you for the explanation of the source of the translator output difference in 
these two domains. I was not worried, since I have seen these domains having 
difference in another issue (1055).

One very minor comment on the patch, I think there is a typo in the comment, 
"clourse" -> "closure".
msg10836 (view) Author: malte Date: 2022-09-09.17:52:01
Thanks a lot, Michael! These results are very useful.

Regarding the IPC domains behaving abnormally:

- trucks-strips and citycar are known to have translator variation.

- For trucks-strips this is because the invariant synthesis reaches a timeout and takes whatever invariants
  were found until then. If different runs have slightly different speed, they will use different invariants,
  leading to variation.

- For citycar we are aware of the variation but as far as I know didn't investigate it. In any case, this
  is something we've observed before for identical translator code, so should not be a concern for this issue.

- Failures in some "hard to ground" IPC 2018 domains are also known.

The runtime results look good.

I ran the "deepest" example locally because it failed in the experiment with and without the patch. With the patch, the "parsing" stage completes after 21.210 seconds, but I had to kill the process at the "Generating Datalog..." stage when it reached 16 GB of memory usage. I don't know if this hints at another performance issue or if this task is simply too large. It has roughly 10,000 types, roughly 500,000,000 descendant type relationships and roughly 10000 objects, so perhaps it's OK to choke on it. I aborted the pre-patch version after 60 minutes, when it was still in the parsing stage. We can calculate that the inner loop of Warshall's algorithm will execute 1,000,000,000 times, and this won't be fast in Python.

As far as I am concerned, that's enough experimentation. If nobody speaks up in the next days, I'll work on merging it then.
msg10828 (view) Author: mkatz Date: 2022-09-06.23:44:55
I have run the experiment on my machines and it seems like there are only two 
domains where there is a difference: 
1. trucks-strips, where there are not types at all and
2. citycar-{opt,sat}14-adl
There are also some cases where translator fails (with and without the change), 
but I suppose that is normal. The full list can be found here: 
https://github.com/ctpelok77/downward/blob/issue1063-
v1/experiments/issue1063/different_output_sas.csv
The time comparison can be found here:
1. IPC tasks https://planning-results.github.io/issue1063/issue1063-v1-issue1063-
issue1063-v1-compare.html
2. the examples https://planning-results.github.io/issue1063/issue1063-v1-
issue1063-issue1063-examples-compare.html
msg10827 (view) Author: mkatz Date: 2022-09-03.18:14:05
The patch works on our example, thanks!

Also, attaching a new archive, with two more "deep" types: 
1. "deeper" with a chain of 1000 types
2. "deepest" with a chain of 10000 types.
The code uses tarski, with disabling to handling supertypes in their code to speed 
things up (otherwise it takes forever for chains of > 30 types), and even with 
that code disabled, generating the task with 10k types takes some time, so I don't 
think we can go significantly higher than 10k with tarski.
msg10826 (view) Author: malte Date: 2022-09-03.16:45:31
Updating issue title in preparation for the change log entry. Proposed change log entry:

- translator: Compute indirect subtype relationship more efficiently.
  <https://issues.fast-downward.org/issue1063>
  Use depth-first search instead of Warshall's algorithm to compute indirect subtypes.
  This can speed up the translator significantly on inputs with very many types.
  In an informal experiment, runtime improved from 12.88s to 1.80s and from 13.73s to 3.04s
  on two instances with ~500 types.
msg10825 (view) Author: malte Date: 2022-09-03.16:36:23
Can someone do a code review? (The new patch is dfs-patch.diff. You can apply it with "git apply dfs-patch.diff" from within the repository.)

Should we run tests?

This should not change translator output, so if we run tests, what I'd do is check that translator output remains the same (apart from the cases where we expect variation) and look at translator runtime. I expect no noticeable changes on the IPC suite because this will only kick in for very large numbers of types.
msg10824 (view) Author: malte Date: 2022-09-03.16:32:46
I'm attaching a new patch for this that performs a DFS from every node (type). In my tests it's slightly faster than Michael's code on the two examples and substantially faster than the current code, with translator time dropping from 12.88s to 1.80s on the shallow and 13.73s to 3.04s on the deep example. The implementation only changes one function plus some test code when using graph.py as the main file, so it is very local.

Michael, can you test this on your real example? Would this change work for you?
msg10823 (view) Author: malte Date: 2022-09-03.16:08:33
Thinking about it some more while working on the implementation, precomputing SCCs makes the implementation more complicated but buys us nothing in big-O complexity (unless I'm overlooking something) over doing a straight DFS from every node. I'll try that first. So we may end up not reusing any code after all.
msg10822 (view) Author: malte Date: 2022-09-03.14:57:47
Hi Michael, thanks!

What you mentioned in your last comment is what I was planning to do. We already have code in sccs.py that implement's Tarjan's SCC algorithm and returns the SCCs in topological sort order. 

his only returns the SCCs themselves, not the edges between them, so we need to recover them with a separate depth-first traversel of the components, but this should be simple enough given that they form a DAG. One complication is that we cannot implement the traversal recursively because Python limits recursion to around 1000 levels. I think your code will break if you make the "deep" example a bit larger (1000 types deep), perhaps you can test this to confirm.

Were the problems generated with a generator, and if yes, can you include larger ones with 1000 and 10000 types so that I can test scaling a bit more? The proposed implementation will include some overhead because of its two-stage nature but I'd prefer it because it would reuse building blocks. But I'd like to see a bit more clearly what the cost is.
msg10821 (view) Author: mkatz Date: 2022-09-02.21:26:42
The function set_supertypes() in the parser performs a transitive closure in order 
to compute the super-types of all types. The algorithm used is Warshall, with n^3 
complexity, which is not efficient in most cases. 
Note that a typical case is a DAG hierarchical structure, most often a tree. In 
such cases, a much more efficient algorithm is possible.

Attached is a patch that implements the computation for DAGs, as well as two 
example problems with a large amount of types in different tree structures (deep 
is a chain, shallow is two levels).

Some comments on the patch (from Malte verbatim):
-----------------------------------
Here's my issues with the patch in its current form:

1. It adds a new algorithm for computing the transitive closure, after
which the old one isn't used any more. But the old one still remains as
dead code.

2. It adds a second implementation of topological sorting to the
translator. It already has one in the variable ordering code.

3. For cases where the subtype digraph isn't a DAG, we currently get a
well-defined behaviour (basically, we allow it). The new code doesn't
have a well-defined behaviour. I think we should either allow it or
report it as an error, but not accept it silently with undefined behaviour.


Nobody defined the semantics of PDDL types, so we can only guess what
they are supposed to mean. ;-) (Unless someone gave a definition in the
meantime that I haven't seen, for example in the PDDL book. But I talked
to Patrik at some point and he said they explicitly wanted to stay out
of the discussion of precise semantics.)

The way that Fast Downward handles subtype relationships is by saying
that if X is a subtype of Y, then every X can be used as a Y, i.e., we
postulate this logical implication:

forall o: (X(o) -> Y(o))

This is all Fast Downward does, and this semantics doesn't require
acyclicity. But if you do have X and Y in an inheritance cycle, then
they are necessarily equivalent, and hence having both types in the
model is redundant. But redundant doesn't mean illegal.

I think we used to require in the code that you're only allowed to use a
type as a supertype if it has already been introduced, and then cycles
are impossible. But people complained about this because they wanted to
use types as supertypes before they were defined. Another related
discussion was whether something (acyclic) like the following should be
allowed:

(:types
  A - object
  B - object
  C - A
  C - B
)

We originally didn't allow it; we required a Java-like type structure,
where you always end up with a tree. But some people thought that this
isn't what the semantics should be, which triggered a discussion between
the intended semantics with a few people that cared about PDDL at the
time (like Derek) with no convergence on a commonly agreed definition. I
think that's how we ended up with the current implementation that simply
allows anything (if I recall correctly -- it can also be that we check
somewhere that there isn't a cycle and complain if there is one; my
memory is hazy).

-----------------------------------

From here onwards Michael again:

Since all types on a cycle are equivalent, we could first run Tarjan's strongly 
connected components algorithm, and then perform the computation on the condensed 
graph (which is DAG).
History
Date User Action Args
2022-09-09 19:19:55mkatzsetmessages: + msg10837
2022-09-09 17:52:01maltesetmessages: + msg10836
2022-09-06 23:44:55mkatzsetmessages: + msg10828
2022-09-03 18:14:21mkatzsetfiles: - examples.tar.gz
2022-09-03 18:14:05mkatzsetfiles: + examples.tar.gz
messages: + msg10827
2022-09-03 16:45:31maltesetmessages: + msg10826
title: Translator is slow when problems have a large amount of object types -> Compute indirect subtype relationship more efficiently
2022-09-03 16:36:23maltesetmessages: + msg10825
2022-09-03 16:32:46maltesetfiles: + dfs-patch.diff
messages: + msg10824
2022-09-03 16:08:33maltesetmessages: + msg10823
2022-09-03 14:57:47maltesetstatus: unread -> chatting
nosy: + malte, jendrik
messages: + msg10822
2022-09-02 21:29:11mkatzsetfiles: + examples.tar.gz
2022-09-02 21:27:51mkatzsetfiles: + supertypes.patch
2022-09-02 21:26:42mkatzcreate