Issue735

Title analyze dominance pruning
Priority wish Status resolved
Superseder Nosy List florian, jendrik, malte, silvan
Assigned To jendrik Keywords
Optional summary
After evaluating the current code:

1) ~~Consider precomputing dominance of individual PDBs by the current
collection.~~ => done in issue735-v3

2) ~~Consider using int-based representations for PDB heuristics; see 
msg6514.~~ 
=> issue755

3) ~~Get rid of the hack mentioned in dominance_pruning.cc.~~

Created on 2017-09-01.15:29:33 by jendrik, last changed by malte.

Summary
After evaluating the current code:

1) Consider precomputing dominance of individual PDBs by the current
collection. => done in issue735-v3

2) Consider using int-based representations for PDB heuristics; see 
msg6514. 
=> issue755

3) Get rid of the hack mentioned in dominance_pruning.cc.
Files
File name Uploaded Type Edit Remove
hacky-version-of-dominance_pruning.cc malte, 2017-09-02.02:24:57 text/x-c++src
Messages
msg6723 (view) Author: malte Date: 2017-12-05.12:10:04
Thanks a lot, Jendrik!
msg6717 (view) Author: jendrik Date: 2017-12-05.11:31:07
Merged.
msg6714 (view) Author: malte Date: 2017-12-05.10:14:56
Looks good!
msg6713 (view) Author: jendrik Date: 2017-12-05.10:09:41
The hack is gone. The new "Pruner" code looks good to me. If you're fine with my 
small changes, I think we can merge this.
msg6706 (view) Author: jendrik Date: 2017-12-05.08:59:57
I'll look into passing the number of values, now that we're happy with the 
experiment results.
msg6705 (view) Author: malte Date: 2017-12-05.01:10:49
Many thanks! When combining the data from msg6703 and msg6704, it is interesting
to see that before this issue, dominance pruning for cpdbs(systematic(2)) was
actually mildly detrimental for coverage.

While it would be nicer not to lose coverage in any domain, I think the new code
is fast enough to merge. Before we merge, we need to get rid of the hack,
though. Anyone willing to look into this? It's "only" about passing the number
of variables in the task into the pruning function, but I don't know how readily
available this information is at the caller site.

Once this is done, all that needs to be done is a code review (if someone wants to).
msg6704 (view) Author: jendrik Date: 2017-12-04.23:04:05
Here are the results evaluating the dominance_pruning option for v3:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3-no-pruning.html

Coverage drops only in airport and parcprinter when enabling dominance pruning.
msg6703 (view) Author: jendrik Date: 2017-12-04.17:26:24
Here are the results:

All revisions and configurations in single table (including debug build):
http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3.html

Comparing sequential revisions:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3-issue735-base-issue735-v1-compare.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3-issue735-v1-issue735-v2-compare.html
http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3-issue735-v2-issue735-v3-compare.html

Comparing base and v3:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v3-issue735-base-issue735-v3-compare.html

I started another experiment that compares dominance pruning vs. no dominance pruning for v3.
msg6699 (view) Author: malte Date: 2017-12-04.02:12:28
I've tagged the code from msg6697 as issue735-v2.

I've also implemented the procomputation mentioned in msg6698 and tagged it as
issue735-v3.

It would be good to run some experiments comparing base, v1, v2 and v3, and also
to do a run of v3 in debug mode.

I've updated the summary.
msg6698 (view) Author: malte Date: 2017-12-04.01:45:27
Some further thoughts:

1) The new code still potentially recomputes dominance information for the same
PDB over and over again, which is not necessary. For the case of cpdb(2), this
probably won't matter (or might even be a good idea), but in cases where we have
larger patterns, it might be useful to only compute once whether the current
pattern collection dominates a given pattern. This should not be a difficult
addition to the current code.

2) The new code converts the pattern collection from a pointer-based to an
int-based representation. As mentioned in msg6514, this could be useful for
computing the actual heuristic values, too. When we close this issue, we should
not forget this aspect.

I've added these to points to the summary.
msg6697 (view) Author: malte Date: 2017-12-03.19:36:41
I looked into the code and eventually found a nicer way to compute this, which
I've pushed to Jendrik's repository. On miconic-s15-0, this improves dominance
pruning from 22.66 seconds to 0.19 seconds on my machine compared to v1.

There is currently one hack in the code that still needs to be removed: the new
algorithm needs to know the number of state variables, but this information is
currently not available. In the hack, I'm using the number of state variables in
the global task, but of course this needs to be changed. The correct solution is
for the caller of this code to pass the number of state variables in.

Jendrik, can you look into removing the hack? I can also do it if you prefer,
but I don't know how soon I'll find the time.

Also, can you run an experiment with the new code and compare it to base and v1?
There are a bunch of assertions that it would be good to exercise, so I suggest
also running an experiment in debug mode.

The new code should be more 64-bit-friendly because it uses vector<int>-based
data structures in the inner loops rather than pointers.
msg6696 (view) Author: malte Date: 2017-12-03.15:31:29
I had a look at the code and left a few comments.

The experimental results show that this is a huge improvement, but they also
show that this reamins very time-critical, so more could be done.
msg6695 (view) Author: malte Date: 2017-12-03.15:01:39
Come to think about it, I think the performance issue here was not so much with
64-bit mode as such but with the new hash functions we want to use. So I'm not
sure if a 64-bit experiment would show much. It would be interesting to have
data with/without the new hash functions, but I'm not sure how easy that would be.
msg6694 (view) Author: malte Date: 2017-12-03.12:35:42
Thanks! Can you also run experiments without dominance pruning for comparison?

Can you also run all three configurations in 64-bit mode, since this was what we
were originally interested in when looking at this?
msg6693 (view) Author: jendrik Date: 2017-12-03.12:19:47
Here are the results:
http://ai.cs.unibas.ch/_tmp_files/seipp/issue735-v1-issue735-base-issue735-v1-compare.html

The results include some dominance-pruning-specific attributes. They all start with the 
"dominance_pruning_" prefix. 

"dominance_pruning_failed" is 1 if the PDB collection could be built and the maximum 
additive subsets could be computed but dominance pruning did not finish successfully.

"pdb_collection_construction_time" includes both the time to construct the PDBs and 
comuting the maximum additive subsets.
msg6692 (view) Author: jendrik Date: 2017-12-02.23:44:02
You can find the pull request at 
https://bitbucket.org/jendrikseipp/downward/pull-requests/78 and
experiments are running.
msg6691 (view) Author: malte Date: 2017-12-02.22:40:11
Please do. Eight seconds is still a long time, but much better than 55.
msg6690 (view) Author: jendrik Date: 2017-12-02.21:11:48
I just implemented option 1) and for woodworking-opt08:p22.pddl and systematic(2) 
the time for dominance pruning goes down from 55 to 8 seconds. Should I prepare a 
pull request for this version?
msg6539 (view) Author: malte Date: 2017-09-23.13:05:47
A while back, I wrote a little Python/pseudo-code version of the code just to
understand what it implements. I'm including it here in case it's useful for
further work on the issue. If I remember correctly (it's been a while), it is
implemented essentially like the current code except that one function is split
into two.

# PDBCollection: vector<PDB>
# PDBCollectionVector: vector<PDBCollection>
# PDBRelation: hash_set<pair<PDB, PDB>>

def compute_superset_relation(pdbs):
    return PDBRelation((pdb1, pdb2)
                       for pdb1 in pdbs for pdb2 in pdbs
                       if pdb1.is_superset_of(pdb2))

def is_pdb_dominated(subset_pdb, superset_pdbs, superset_relation):
    return any((pdb, subset_pdb) in superset_relation
               for pdb in superset_pdbs)

def does_collection_dominate(supercollection, subcollection, superset_relation):
    return all(is_pdb_dominated(pdb, supercollection)
               for pdb in subcollection)

def prune_dominated_subsets(pdbs, max_additive_subsets):
    # pdbs: PDBCollection
    # max_additive_subsets: PDBCollectionVector
    nondominated_subsets = []
    subset_removed = set()

    superset_relation = computer_superset_relation(pdbs)

    for coll1 in max_additive_subsets:
        useful = True
        for coll2 in max_additive_subsets:
            if coll1 is coll2 or coll2 in subset_removed:
                continue
            if does_collection_dominate(coll2, coll1, superset_relation):
                useful = False
                break
        if useful:
            nondominated_subsets.append(coll1)
        else:
            subset_removed.add(coll1)

    # Omitted: compute union of PDBs in the remaining PDB collections.
    # Is this only used for diagnostic output?

    return nondominated_subsets
msg6530 (view) Author: silvan Date: 2017-09-06.17:47:09
It sounds like this should not wait too long, but if it can, I could work on it
 starting in roughly a month.
msg6514 (view) Author: malte Date: 2017-09-03.17:22:53
Something we do not need to consider for this issue, but is a related thought: 

Representing the canonical heuristic with int indices rather than PDB pointers
might be interesting more generally because it may allow us to avoid hash tables
in some other places, too.

I haven't looked at the code, but I vaguely remember that some PDB-based
configurations were among the ones that suffered more than others when switching
to 64 bits. This is often related to the use of hash tables. (I just tried to
find evidence for PDB heuristics doing poorly with 64 bits, but couldn't find
much, so I may be misremembering this.)
msg6501 (view) Author: malte Date: 2017-09-02.02:28:16
Anyone interested in working on this?

This would be nice because issue731 is blocked by this. No familiarity with the
PDB code is necessary, the dominance pruning code only uses it in trivial ways.
This is essentially an algorithm that works with a vector of vectors of vectors
of ints. :-) In other words, a fun little self-contained task.
msg6500 (view) Author: malte Date: 2017-09-02.02:24:57
I've made a little test implementation of points 4 and 5 in my previous message,
using a vector<vector<bool>> organized the other way around from the current
code (i.e., it effectively encodes "subset", not "superset").

The implementation difficulty with this is that we need a way to map from PDBs
to indexes in the vector, and in my little test I used an unordered_map<pointer,
int> to do this for every subset check, which is almost pointless because it
means we still have to do a hash table lookup whenever the old code did a hash
table lookup (plus, of course, the new bit-vector lookup). But it's a smaller
hash table (one entry per pattern, not one entry per subset relationship) and
thus more likely to fit in the cache.

I ran this on my machine on woodworking-opt08-strips #22 and recorded the time
for dominance pruning:

issue731-base: 107 seconds
issue731-v4:   176 seconds
test code:     122 seconds

So we get back a substantial chunk of the time lost between issue731-base and
issue731-v4, which was the aim. This is despite the hash lookups which I think
should dominate overall runtime of the new code. I'd be surprised if the new
code weren't substantially faster than issue731-base on this instance if we
could avoid these lookups.

Avoiding the lookups requires converting everything from pointers to indices,
which is tedious, but not difficult and not likely to be too expensive (it is
linear-time while the overall algorithm is quadratic). I think this kind of
conversion comes up in quite a few places, especially now that we're planning to
move to 64 bits and often like ints for being more compact than pointers. I
wonder if it's worth thinking about some generic helper functions or classes for
this kind of conversion (and perhaps also the conversion back). But I'm not sure
if it can be made generic enough to be reusable while also being easy to
implement and use. Perhaps a few simple helper functions could already go a long
way.

So this looks promising. The main open question for me is if there are other
planning tasks with different characteristics regarding the number of patterns
etc. where the current approach is viable in terms of memory but the
two-dimensional bit vector is not (i.e., a task with lots of patterns and a very
sparse superset relation). If yes, this would complicate things.

The code is very ugly, was implemented in 5 minutes and contains no great
insights, so I wasn't planning to push it anywhere. But I'm attaching it here
just in case.
msg6497 (view) Author: malte Date: 2017-09-01.17:14:40
I have had a look at the code, and there are some opportunities for data
structure and/or algorithmic changes:

1) One possibility is to precompute the union of variables included in each
collection, then use this as a quick necessary criterion for dominance before
doing the full test. collection1 can only dominate collection2 if it includes a
(not necessarily strict) superset of variables.

2) The test whether a given single PDB is dominated by a given collection occurs
as a "subtest" of the overall dominance test, and it could be precomputed.
Currently, we likely redo this test lots of times. The question is whether this
is worth the memory cost.

3) Depending on how we organize the loops, we might be working with the same
*fixed* collection for a long time (for one iteration of the outer loop), and we
might be able to organize the data so that information related to this
collection is kept closely together in memory and perhaps even release it once
we're done with this collection.

4) The superset relation data structure is currently a "flat" hash set of pairs.
Using something like a hash table of hash tables, if organized the right way
around, might be better for memory locality and more generally access speed.

5) We store the superset relation as a hash set of pairs of pointers. According
to information in issue705, I would roughly estimate the memory requirements of
such a hash set as (5 + sqrt(2)) * sizeof(void *) * N if there are N pairs in
the hash set. (sqrt(2) occurs as the geometric mean of 1 and 2 to account for
the fact that the number of buckets can overshoot N by a factor of 1 to 2.)

This is roughly 51 bytes per entry on a 64-bit build (half that on a 32-bit
build, but let's look into the future).

If we used a bit vector instead, we'd only need 1 bit per pair, but of course we
would need this bit for *every* pair, not just the pairs where the superset
relation holds. So if the probability that an arbitrary pattern is a superset of
another is larger than 1/408 (51 * 8), we have a win for the bit vector, and it
would certainly be much faster to access and have good memory locality if we can
orient it the right way around.

I have no good intuition how dense the hash set is in practice. I think it could
be useful to measure this in an experiment. For this, we would need to count the
number of entries in the superset relation and then divide this by the number of
patterns squared (or perhaps to avoid very low numbers: divide the number of
patterns squared by the number of entries in the superset relation).
msg6489 (view) Author: jendrik Date: 2017-09-01.15:29:33
We noticed in issue731 that dominance pruning can be a major bottleneck at least 
for systematic PDBs.

Malte writes:

"The fact that we have long running configurations that spend more than 90%
of their time on dominance pruning suggests that we should look a bit more
closely whether dominance pruning is always a good idea, whether we should set a
time limit for it, etc. [...] We might be crippling our PDB-based planners in 
some domains. Woodworking with astar(cpdbs(systematic(2))) would make a good test 
domain for this."
History
Date User Action Args
2017-12-05 12:10:04maltesetmessages: + msg6723
2017-12-05 11:31:07jendriksetstatus: reviewing -> resolved
messages: + msg6717
2017-12-05 10:14:56maltesetmessages: + msg6714
2017-12-05 10:09:41jendriksetmessages: + msg6713
summary: After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominance_pruning.cc. -> After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) ~~Get rid of the hack mentioned in dominance_pruning.cc.~~
2017-12-05 08:59:57jendriksetmessages: + msg6706
2017-12-05 01:10:49maltesetmessages: + msg6705
summary: After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominated_pruning.cc. -> After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominance_pruning.cc.
2017-12-05 01:06:47maltesetmessages: + msg6703
2017-12-05 00:58:10maltesetmessages: - msg6703
2017-12-04 23:04:05jendriksetmessages: + msg6704
2017-12-04 17:26:24jendriksetmessages: + msg6703
summary: After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominated_pruning.cc. -> After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominated_pruning.cc.
2017-12-04 08:43:17jendriksetsummary: After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) Consider using int-based representations for PDB heuristics; see msg6514. 3) Get rid of the hack mentioned in dominated_pruning.cc. -> After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) ~~Consider using int-based representations for PDB heuristics; see msg6514.~~ => issue755 3) Get rid of the hack mentioned in dominated_pruning.cc.
2017-12-04 02:12:28maltesetmessages: + msg6699
summary: After evaluating the current code: 1) Consider precomputing dominance of individual PDBs by the current collection. 2) Consider using int-based representations for PDB heuristics; see msg6514. -> After evaluating the current code: 1) ~~Consider precomputing dominance of individual PDBs by the current collection.~~ => done in issue735-v3 2) Consider using int-based representations for PDB heuristics; see msg6514. 3) Get rid of the hack mentioned in dominated_pruning.cc.
2017-12-04 01:45:27maltesetmessages: + msg6698
summary: After evaluating the current code: 1) Consider precomputing dominance of individual PDBs by the current collection. 2) Consider using int-based representations for PDB heuristics; see msg6514.
2017-12-03 19:36:41maltesetmessages: + msg6697
2017-12-03 15:31:29maltesetmessages: + msg6696
2017-12-03 15:01:39maltesetmessages: + msg6695
2017-12-03 12:35:42maltesetmessages: + msg6694
2017-12-03 12:19:47jendriksetmessages: + msg6693
2017-12-02 23:44:02jendriksetstatus: chatting -> reviewing
messages: + msg6692
2017-12-02 22:40:11maltesetmessages: + msg6691
2017-12-02 21:11:48jendriksetassignedto: jendrik
messages: + msg6690
2017-09-23 13:05:47maltesetmessages: + msg6539
2017-09-06 17:47:09silvansetmessages: + msg6530
2017-09-03 17:22:53maltesetmessages: + msg6514
2017-09-02 02:28:16maltesetmessages: + msg6501
2017-09-02 02:24:57maltesetfiles: + hacky-version-of-dominance_pruning.cc
messages: + msg6500
2017-09-01 23:45:44floriansetnosy: + florian
2017-09-01 17:14:40maltesetstatus: unread -> chatting
messages: + msg6497
2017-09-01 15:30:16silvansetnosy: + silvan
2017-09-01 15:29:33jendrikcreate