Issue 735: analyze dominance pruning - Fast Downward issue tracker

Title	analyze dominance pruning
Priority	wish	Status	resolved
Superseder		Nosy List	florian, jendrik, malte, silvan
Assigned To	jendrik	Keywords
Optional summary	After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) ~~Get rid of the hack mentioned in dominance_pruning.cc.~~

Created on 2017-09-01.15:29:33 by jendrik, last changed by malte.

After evaluating the current code:

1) Consider precomputing dominance of individual PDBs by the current
collection. => done in issue735-v3

2) Consider using int-based representations for PDB heuristics; see 
msg6514. 
=> issue755

3) Get rid of the hack mentioned in dominance_pruning.cc.

Files
File name	Uploaded	Type	Edit	Remove
hacky-version-of-dominance_pruning.cc	malte, 2017-09-02.02:24:57	text/x-c++src

Messages
msg6723 (view)	Author: malte	Date: 2017-12-05.12:10:04
Thanks a lot, Jendrik!
msg6717 (view)	Author: jendrik	Date: 2017-12-05.11:31:07
Merged.
msg6714 (view)	Author: malte	Date: 2017-12-05.10:14:56
Looks good!
msg6713 (view)	Author: jendrik	Date: 2017-12-05.10:09:41
The hack is gone. The new "Pruner" code looks good to me. If you're fine with my small changes, I think we can merge this.
msg6706 (view)	Author: jendrik	Date: 2017-12-05.08:59:57
I'll look into passing the number of values, now that we're happy with the experiment results.
msg6705 (view)	Author: malte	Date: 2017-12-05.01:10:49
Many thanks! When combining the data from msg6703 and msg6704, it is interesting to see that before this issue, dominance pruning for cpdbs(systematic(2)) was actually mildly detrimental for coverage. While it would be nicer not to lose coverage in any domain, I think the new code is fast enough to merge. Before we merge, we need to get rid of the hack, though. Anyone willing to look into this? It's "only" about passing the number of variables in the task into the pruning function, but I don't know how readily available this information is at the caller site. Once this is done, all that needs to be done is a code review (if someone wants to).
msg6704 (view)	Author: jendrik	Date: 2017-12-04.23:04:05
Here are the results evaluating the dominance_pruning option for v3: http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3-no-pruning.html Coverage drops only in airport and parcprinter when enabling dominance pruning.
msg6703 (view)	Author: jendrik	Date: 2017-12-04.17:26:24
Here are the results: All revisions and configurations in single table (including debug build): http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3.html Comparing sequential revisions: http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3-issue735-base-issue735-v1-compare.html http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3-issue735-v1-issue735-v2-compare.html http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3-issue735-v2-issue735-v3-compare.html Comparing base and v3: http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3-issue735-base-issue735-v3-compare.html I started another experiment that compares dominance pruning vs. no dominance pruning for v3.
msg6699 (view)	Author: malte	Date: 2017-12-04.02:12:28
I've tagged the code from msg6697 as issue735-v2. I've also implemented the procomputation mentioned in msg6698 and tagged it as issue735-v3. It would be good to run some experiments comparing base, v1, v2 and v3, and also to do a run of v3 in debug mode. I've updated the summary.
msg6698 (view)	Author: malte	Date: 2017-12-04.01:45:27
Some further thoughts: 1) The new code still potentially recomputes dominance information for the same PDB over and over again, which is not necessary. For the case of cpdb(2), this probably won't matter (or might even be a good idea), but in cases where we have larger patterns, it might be useful to only compute once whether the current pattern collection dominates a given pattern. This should not be a difficult addition to the current code. 2) The new code converts the pattern collection from a pointer-based to an int-based representation. As mentioned in msg6514, this could be useful for computing the actual heuristic values, too. When we close this issue, we should not forget this aspect. I've added these to points to the summary.
msg6697 (view)	Author: malte	Date: 2017-12-03.19:36:41
I looked into the code and eventually found a nicer way to compute this, which I've pushed to Jendrik's repository. On miconic-s15-0, this improves dominance pruning from 22.66 seconds to 0.19 seconds on my machine compared to v1. There is currently one hack in the code that still needs to be removed: the new algorithm needs to know the number of state variables, but this information is currently not available. In the hack, I'm using the number of state variables in the global task, but of course this needs to be changed. The correct solution is for the caller of this code to pass the number of state variables in. Jendrik, can you look into removing the hack? I can also do it if you prefer, but I don't know how soon I'll find the time. Also, can you run an experiment with the new code and compare it to base and v1? There are a bunch of assertions that it would be good to exercise, so I suggest also running an experiment in debug mode. The new code should be more 64-bit-friendly because it uses vector<int>-based data structures in the inner loops rather than pointers.
msg6696 (view)	Author: malte	Date: 2017-12-03.15:31:29
I had a look at the code and left a few comments. The experimental results show that this is a huge improvement, but they also show that this reamins very time-critical, so more could be done.
msg6695 (view)	Author: malte	Date: 2017-12-03.15:01:39
Come to think about it, I think the performance issue here was not so much with 64-bit mode as such but with the new hash functions we want to use. So I'm not sure if a 64-bit experiment would show much. It would be interesting to have data with/without the new hash functions, but I'm not sure how easy that would be.
msg6694 (view)	Author: malte	Date: 2017-12-03.12:35:42
Thanks! Can you also run experiments without dominance pruning for comparison? Can you also run all three configurations in 64-bit mode, since this was what we were originally interested in when looking at this?
msg6693 (view)	Author: jendrik	Date: 2017-12-03.12:19:47
Here are the results: http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v1-issue735-base-issue735-v1-compare.html The results include some dominance-pruning-specific attributes. They all start with the "dominance_pruning_" prefix. "dominance_pruning_failed" is 1 if the PDB collection could be built and the maximum additive subsets could be computed but dominance pruning did not finish successfully. "pdb_collection_construction_time" includes both the time to construct the PDBs and comuting the maximum additive subsets.
msg6692 (view)	Author: jendrik	Date: 2017-12-02.23:44:02
You can find the pull request at https://bitbucket.org/jendrikseipp/downward/pull-requests/78 and experiments are running.
msg6691 (view)	Author: malte	Date: 2017-12-02.22:40:11
Please do. Eight seconds is still a long time, but much better than 55.
msg6690 (view)	Author: jendrik	Date: 2017-12-02.21:11:48
I just implemented option 1) and for woodworking-opt08:p22.pddl and systematic(2) the time for dominance pruning goes down from 55 to 8 seconds. Should I prepare a pull request for this version?
msg6539 (view)	Author: malte	Date: 2017-09-23.13:05:47
A while back, I wrote a little Python/pseudo-code version of the code just to understand what it implements. I'm including it here in case it's useful for further work on the issue. If I remember correctly (it's been a while), it is implemented essentially like the current code except that one function is split into two. # PDBCollection: vector<PDB> # PDBCollectionVector: vector<PDBCollection> # PDBRelation: hash_set<pair<PDB, PDB>> def compute_superset_relation(pdbs): return PDBRelation((pdb1, pdb2) for pdb1 in pdbs for pdb2 in pdbs if pdb1.is_superset_of(pdb2)) def is_pdb_dominated(subset_pdb, superset_pdbs, superset_relation): return any((pdb, subset_pdb) in superset_relation for pdb in superset_pdbs) def does_collection_dominate(supercollection, subcollection, superset_relation): return all(is_pdb_dominated(pdb, supercollection) for pdb in subcollection) def prune_dominated_subsets(pdbs, max_additive_subsets): # pdbs: PDBCollection # max_additive_subsets: PDBCollectionVector nondominated_subsets = [] subset_removed = set() superset_relation = computer_superset_relation(pdbs) for coll1 in max_additive_subsets: useful = True for coll2 in max_additive_subsets: if coll1 is coll2 or coll2 in subset_removed: continue if does_collection_dominate(coll2, coll1, superset_relation): useful = False break if useful: nondominated_subsets.append(coll1) else: subset_removed.add(coll1) # Omitted: compute union of PDBs in the remaining PDB collections. # Is this only used for diagnostic output? return nondominated_subsets
msg6530 (view)	Author: silvan	Date: 2017-09-06.17:47:09
It sounds like this should not wait too long, but if it can, I could work on it starting in roughly a month.
msg6514 (view)	Author: malte	Date: 2017-09-03.17:22:53
Something we do not need to consider for this issue, but is a related thought: Representing the canonical heuristic with int indices rather than PDB pointers might be interesting more generally because it may allow us to avoid hash tables in some other places, too. I haven't looked at the code, but I vaguely remember that some PDB-based configurations were among the ones that suffered more than others when switching to 64 bits. This is often related to the use of hash tables. (I just tried to find evidence for PDB heuristics doing poorly with 64 bits, but couldn't find much, so I may be misremembering this.)
msg6501 (view)	Author: malte	Date: 2017-09-02.02:28:16
Anyone interested in working on this? This would be nice because issue731 is blocked by this. No familiarity with the PDB code is necessary, the dominance pruning code only uses it in trivial ways. This is essentially an algorithm that works with a vector of vectors of vectors of ints. :-) In other words, a fun little self-contained task.
msg6500 (view)	Author: malte	Date: 2017-09-02.02:24:57
I've made a little test implementation of points 4 and 5 in my previous message, using a vector<vector<bool>> organized the other way around from the current code (i.e., it effectively encodes "subset", not "superset"). The implementation difficulty with this is that we need a way to map from PDBs to indexes in the vector, and in my little test I used an unordered_map<pointer, int> to do this for every subset check, which is almost pointless because it means we still have to do a hash table lookup whenever the old code did a hash table lookup (plus, of course, the new bit-vector lookup). But it's a smaller hash table (one entry per pattern, not one entry per subset relationship) and thus more likely to fit in the cache. I ran this on my machine on woodworking-opt08-strips #22 and recorded the time for dominance pruning: issue731-base: 107 seconds issue731-v4: 176 seconds test code: 122 seconds So we get back a substantial chunk of the time lost between issue731-base and issue731-v4, which was the aim. This is despite the hash lookups which I think should dominate overall runtime of the new code. I'd be surprised if the new code weren't substantially faster than issue731-base on this instance if we could avoid these lookups. Avoiding the lookups requires converting everything from pointers to indices, which is tedious, but not difficult and not likely to be too expensive (it is linear-time while the overall algorithm is quadratic). I think this kind of conversion comes up in quite a few places, especially now that we're planning to move to 64 bits and often like ints for being more compact than pointers. I wonder if it's worth thinking about some generic helper functions or classes for this kind of conversion (and perhaps also the conversion back). But I'm not sure if it can be made generic enough to be reusable while also being easy to implement and use. Perhaps a few simple helper functions could already go a long way. So this looks promising. The main open question for me is if there are other planning tasks with different characteristics regarding the number of patterns etc. where the current approach is viable in terms of memory but the two-dimensional bit vector is not (i.e., a task with lots of patterns and a very sparse superset relation). If yes, this would complicate things. The code is very ugly, was implemented in 5 minutes and contains no great insights, so I wasn't planning to push it anywhere. But I'm attaching it here just in case.
msg6497 (view)	Author: malte	Date: 2017-09-01.17:14:40
I have had a look at the code, and there are some opportunities for data structure and/or algorithmic changes: 1) One possibility is to precompute the union of variables included in each collection, then use this as a quick necessary criterion for dominance before doing the full test. collection1 can only dominate collection2 if it includes a (not necessarily strict) superset of variables. 2) The test whether a given single PDB is dominated by a given collection occurs as a "subtest" of the overall dominance test, and it could be precomputed. Currently, we likely redo this test lots of times. The question is whether this is worth the memory cost. 3) Depending on how we organize the loops, we might be working with the same fixed collection for a long time (for one iteration of the outer loop), and we might be able to organize the data so that information related to this collection is kept closely together in memory and perhaps even release it once we're done with this collection. 4) The superset relation data structure is currently a "flat" hash set of pairs. Using something like a hash table of hash tables, if organized the right way around, might be better for memory locality and more generally access speed. 5) We store the superset relation as a hash set of pairs of pointers. According to information in issue705, I would roughly estimate the memory requirements of such a hash set as (5 + sqrt(2)) * sizeof(void ) N if there are N pairs in the hash set. (sqrt(2) occurs as the geometric mean of 1 and 2 to account for the fact that the number of buckets can overshoot N by a factor of 1 to 2.) This is roughly 51 bytes per entry on a 64-bit build (half that on a 32-bit build, but let's look into the future). If we used a bit vector instead, we'd only need 1 bit per pair, but of course we would need this bit for every pair, not just the pairs where the superset relation holds. So if the probability that an arbitrary pattern is a superset of another is larger than 1/408 (51 * 8), we have a win for the bit vector, and it would certainly be much faster to access and have good memory locality if we can orient it the right way around. I have no good intuition how dense the hash set is in practice. I think it could be useful to measure this in an experiment. For this, we would need to count the number of entries in the superset relation and then divide this by the number of patterns squared (or perhaps to avoid very low numbers: divide the number of patterns squared by the number of entries in the superset relation).
msg6489 (view)	Author: jendrik	Date: 2017-09-01.15:29:33
We noticed in issue731 that dominance pruning can be a major bottleneck at least for systematic PDBs. Malte writes: "The fact that we have long running configurations that spend more than 90% of their time on dominance pruning suggests that we should look a bit more closely whether dominance pruning is always a good idea, whether we should set a time limit for it, etc. [...] We might be crippling our PDB-based planners in some domains. Woodworking with astar(cpdbs(systematic(2))) would make a good test domain for this."

History
Date	User	Action	Args
2017-12-05 12:10:04	malte	set	messages: + msg6723
2017-12-05 11:31:07	jendrik	set	status: reviewing -> resolved messages: + msg6717
2017-12-05 10:14:56	malte	set	messages: + msg6714
2017-12-05 10:09:41	jendrik	set	messages: + msg6713 summary: After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominance_pruning.cc. -> After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) ~~Get rid of the hack mentioned in dominance_pruning.cc.~~
2017-12-05 08:59:57	jendrik	set	messages: + msg6706
2017-12-05 01:10:49	malte	set	messages: + msg6705 summary: After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominated_pruning.cc. -> After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominance_pruning.cc.
2017-12-05 01:06:47	malte	set	messages: + msg6703
2017-12-05 00:58:10	malte	set	messages: - msg6703
2017-12-04 23:04:05	jendrik	set	messages: + msg6704
2017-12-04 17:26:24	jendrik	set	messages: + msg6703 summary: After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominated_pruning.cc. -> After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominated_pruning.cc.
2017-12-04 08:43:17	jendrik	set	summary: After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) Consider using int-based representations for PDB heuristics; see msg6514. 3) Get rid of the hack mentioned in dominated_pruning.cc. -> After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominated_pruning.cc.
2017-12-04 02:12:28	malte	set	messages: + msg6699 summary: After evaluating the current code: 1) Consider precomputing dominance of individual PDBs by the current collection. 2) Consider using int-based representations for PDB heuristics; see msg6514. -> After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) Consider using int-based representations for PDB heuristics; see msg6514. 3) Get rid of the hack mentioned in dominated_pruning.cc.
2017-12-04 01:45:27	malte	set	messages: + msg6698 summary: After evaluating the current code: 1) Consider precomputing dominance of individual PDBs by the current collection. 2) Consider using int-based representations for PDB heuristics; see msg6514.
2017-12-03 19:36:41	malte	set	messages: + msg6697
2017-12-03 15:31:29	malte	set	messages: + msg6696
2017-12-03 15:01:39	malte	set	messages: + msg6695
2017-12-03 12:35:42	malte	set	messages: + msg6694
2017-12-03 12:19:47	jendrik	set	messages: + msg6693
2017-12-02 23:44:02	jendrik	set	status: chatting -> reviewing messages: + msg6692
2017-12-02 22:40:11	malte	set	messages: + msg6691
2017-12-02 21:11:48	jendrik	set	assignedto: jendrik messages: + msg6690
2017-09-23 13:05:47	malte	set	messages: + msg6539
2017-09-06 17:47:09	silvan	set	messages: + msg6530
2017-09-03 17:22:53	malte	set	messages: + msg6514
2017-09-02 02:28:16	malte	set	messages: + msg6501
2017-09-02 02:24:57	malte	set	files: + hacky-version-of-dominance_pruning.cc messages: + msg6500
2017-09-01 23:45:44	florian	set	nosy: + florian
2017-09-01 17:14:40	malte	set	status: unread -> chatting messages: + msg6497
2017-09-01 15:30:16	silvan	set	nosy: + silvan
2017-09-01 15:29:33	jendrik	create

Issue735