Title PDB dominance pruning crashes for empty patterns
Priority bug Status reviewing
Superseder Nosy List jendrik, malte, silvan
Assigned To silvan Keywords
Optional summary

Created on 2018-05-04.16:11:53 by jendrik, last changed by silvan.

File name Uploaded Type Edit Remove silvan, 2018-06-04.14:00:22 text/x-python
msg7247 (view) Author: silvan Date: 2018-06-08.16:33:33
Apparently, yes.
msg7246 (view) Author: malte Date: 2018-06-08.16:32:36
> I changed the validate_and_normalize functions to be called from the manual
> generators. They now also throw a user input error if the pattern is empty or
> if the pattern collection contains an empty pattern.

I would not do that: there is nothing formally wrong with empty patterns, and I
can see users wanting to use them, e.g. in experiments where you try out
patterns with 0..N variables. It may make sense for us to normalize them away in
pattern collections, but I don't think they should be illegal. (Note that the
main point for this issue is that we don't want duplicates and empty patterns in
the additive subsets, which is slightly different from not having them in
pattern collections. But I suppose filtering them away in pattern collections
makes sense, too.)

More generally, looks like this needs some more offline discussion of what we want?
msg7245 (view) Author: silvan Date: 2018-06-08.16:25:25
I changed the validate_and_normalize functions to be called from the manual
generators. They now also throw a user input error if the pattern is empty or if
the pattern collection contains an empty pattern. They sort in the input for
removing duplicates, in which case they only print a warning.

I also added functions that check whether a given pattern or pattern collection
is valid, which means that it is non-empty (and for collections, does not
contain empty patterns) and that it is unique, but not necessarily sorted.
(There is too much overlap in the code, this is so far currently intended for
testing and discussing what we need).

Specifying empty patterns/pattern collection is now not possible anymore.
However, as Malte wrote in his last comment, it is not clear what exactly to
demand for the normalization. I think demanding non-empty patterns is
reasonable, but the greedy pattern generator with max_states set to 1 does not
satisfy this and the genetic algorithm with sizes set to 1 produces an empty
collection. In contrast, the hillclimbing generator with pdb_max_size and
collection_max_size set to 1 still generates the pattern collection consisting
of a singleton pattern for each goal variable, and thus satisfies these
normalization conditions. 

Any thoughts on what we should require and how we make sure that all generators
satisfy the requirements?
msg7224 (view) Author: malte Date: 2018-06-06.17:57:28
I see.

We currently call validate_and_normalize_patterns from the
PatternCollectionInformation constructor, and I'm not sure this is the best
place. As the presence of user output shows, this is clearly a method targeted
of validation of user-level input rather than something intended to clean up
internally generated pattern collections. If internally generated pattern
collections triggered a warning about duplicates etc., I'd view this as a bug.

So I would do this validation/normalization as part of the user input processing
before generating the PatternCollectionInformation, and not have it as part of
the code paths for internally generated collections.

If our internally generated collections currently violate some normalization
assumptions and if hence we also need some form of normalization of internally
generated collections, that would be good to know. Things like hill-climbing
patterns and systematic patterns are perhaps not in sorted order, but should
satisfy the other normalization requirements (all variables in bounds, no
duplicates, no empty patterns).

So: it's probably not a good idea to use the same code for cleaning up
user-provided patterns and internally generated patterns. The
PatternCollectionInformation constructor should perhaps not normalize patterns
itself, but rather expect that they already satisfy some form of normalization
(which one exactly we might need to think about a bit more). It would make sense
for it to assert this normalization, similar to the current
"information_is_valid" code.
msg7222 (view) Author: silvan Date: 2018-06-06.17:39:25
I just found a complication which may be a reason of why so far we did not touch
the given patterns: when later on adding pdbs and/or max additive subsets to the
PatternCollectionInformation from outside, these should be consistent with the
pattern collection given at construction time. When modifying the given pattern
collection, especially reordering or removing patterns, we would need
communicate this change to the outside to that the set of pdbs could be
reordered/modified in the same way.
msg7221 (view) Author: malte Date: 2018-06-06.17:35:17
We discussed that a clean and consistent way to do this is to normalize pattern
collections on input, similar to the way that we already normalize individual

Currently we do a "sort-of" normalization where we warn about certain
abnormalities, but don't modify the collection. It would be more consistent with
pattern normalization and easier for the downstream code to just normalize.

In addition to what the normalization check already does, normalization should
remove empty patterns from pattern collections. Then the dominance-testing code
can assert that patterns are never empty. We should also document what
guarantees normalization offers, and in the code that relies on these
assumptions (e.g. asserting we don't have an empty pattern), it may be useful to
add a pointer to the code that ensures this guarantee, so that the next time
around it is clear why we can make this assumption.
msg7209 (view) Author: malte Date: 2018-06-05.19:31:29
> I suggest we briefly discuss this live tomorrow.

(If you think that makes sense, please remind me tomorrow.)
msg7208 (view) Author: malte Date: 2018-06-05.19:30:44
I thought about this some more: empty patterns break a strict partial ordering
property we had before. If we disallow empty patterns, for families of additive
patterns C and C', we have that "C dominates C' and C' dominates C" iff "C =
C'", which is a useful property to have. Among other things, it means that a
family can be treated like a *set*  in the sense that what is pruned does not
depend on the order in which we process the additive sets. If empty patterns are
present, we don't have that any more. For example, with the two families [[4,
6], []] and [[4, 6]], which one survives depends on the order of processing
because either dominates the other, and if we're not careful, it could be easy
to end up in a situation where we prune both because both are dominated by
something else.

Hence I think it might be better to address the problem we have here by
filtering out empty patterns from additive sets somewhere. It may not be
necessary right now, but I think it could be a cleaner solution and could help
us avoid trouble later. I suggest we briefly discuss this live tomorrow.
msg7169 (view) Author: silvan Date: 2018-06-04.17:43:51
Changed it and also added one sentence that explains that we consider empty
patterns to be dominated by non-empty pattern collections.
msg7168 (view) Author: malte Date: 2018-06-04.17:35:51
Done with the review.
msg7165 (view) Author: silvan Date: 2018-06-04.15:25:02
Changing the dominance pruning code to consider empty patterns always to be
dominated by non-empty pattern collections fixes the problem for all tested
corner cases.
msg7160 (view) Author: malte Date: 2018-06-04.14:12:22
The test cases look good to me.
msg7156 (view) Author: silvan Date: 2018-06-04.14:00:08
I wrote a small test script (attached) and tested:

The single pdb heuristic (I suppose this is what you meant by "manual
algorithm") with empty patterns does not fail, so the first two configs succeed.

The CPDB heuristic with an empty collection also works, but it doesn't work as
soon as one pattern of the collection (configs #4 and #5).

Did I miss more corner cases?

Otherwise, I'll look into fixing the pruning code.
msg7119 (view) Author: malte Date: 2018-05-07.14:00:21
Changed the title. Does someone want to write a test that tests these and other
corner cases? (Corner cases I would test: empty patterns with the manual
algorithm and with the greedy algorithm; manual pattern collections that are
empty, that consist only of the empty pattern, and that consist of the empty
pattern and another one; ipdb with parameters set so low that it cannot perform
an iteration.) The tests should be run in debug mode, and I suppose it's enough
to test one or a few tasks.

I can look into fixing the pruning code if nobody else does it first.
msg7118 (view) Author: silvan Date: 2018-05-07.13:26:02
I also think rather the heuristics (or other users) of pattern (collection)
generators should complain if they get an empty pattern and they can't handle it.
msg7117 (view) Author: malte Date: 2018-05-07.13:19:23
I wouldn't say the bug is with the genetic algorithm. The "manual" and "greedy"
pattern generators can also generate empty patterns. I think the bug is that
they are not correctly processed. Other thoughts? If not, we should rename this
msg7112 (view) Author: jendrik Date: 2018-05-04.16:11:53
The call 

./ && ./ ../benchmarks/airport/p01-airport1-p1.pddl --search "astar(cpdbs(genetic()))"

results in a segmentation fault when the dominance pruning code tries to access the first variable in an empty pattern 
(this can be seen in debug mode).
Date User Action Args
2018-06-08 16:33:33silvansetmessages: + msg7247
2018-06-08 16:32:36maltesetmessages: + msg7246
2018-06-08 16:25:25silvansetmessages: + msg7245
2018-06-06 17:57:28maltesetmessages: + msg7224
2018-06-06 17:39:25silvansetmessages: + msg7222
2018-06-06 17:35:17maltesetmessages: + msg7221
2018-06-05 19:31:29maltesetmessages: + msg7209
2018-06-05 19:30:44maltesetmessages: + msg7208
2018-06-04 17:43:51silvansetmessages: + msg7169
2018-06-04 17:35:51maltesetmessages: + msg7168
2018-06-04 15:25:02silvansetstatus: in-progress -> reviewing
messages: + msg7165
2018-06-04 14:12:22maltesetmessages: + msg7160
2018-06-04 14:00:22silvansetfiles: +
2018-06-04 14:00:08silvansetstatus: chatting -> in-progress
assignedto: silvan
messages: + msg7156
2018-05-07 14:00:21maltesetmessages: + msg7119
title: genetic algorithm generates empty patterns -> PDB dominance pruning crashes for empty patterns
2018-05-07 13:26:02silvansetmessages: + msg7118
2018-05-07 13:19:23maltesetstatus: unread -> chatting
messages: + msg7117
2018-05-04 18:49:55silvansetnosy: + silvan
2018-05-04 16:11:53jendrikcreate