Issue908

Title Change MaxAdditiveSubsets to use indices into the pattern/PDB collection rather than shared pointers to PDBs
Priority wish Status resolved
Superseder Nosy List florian, jendrik, malte, silvan
Assigned To silvan Keywords
Optional summary

Created on 2019-03-12.15:31:32 by silvan, last changed by silvan.

Messages
msg8733 (view) Author: silvan Date: 2019-03-21.16:40:36
Done (adding the fix, not a bug, I hope).
msg8730 (view) Author: silvan Date: 2019-03-21.16:33:04
Since I already fixed a more subtle bug today, I feel allowed to add another
small one ;-)
msg8729 (view) Author: malte Date: 2019-03-21.16:32:20
Sure, if we don't introduce a bug. :-)
msg8728 (view) Author: silvan Date: 2019-03-21.16:31:41
They seem small and clear enough so that I could do them in one commit on the
default branch, if you don't mind.
msg8727 (view) Author: malte Date: 2019-03-21.16:28:22
> Ah, I thought this was already done since after you posted that message, you
> left 3-4 comments on bitbucket 

Those were ones I could do straight away, whereas I thought for the code review
I'd have to set aside a larger chunk of time.

But I had a quick look now and noticed that the changes are fairly localized, in
which case I don't really have anything to add except for the cosmetic comments
I made. I would have wanted to look at it in more detail if there had been more
structural changes. So no further comments besides the ones I already made,
which you can feel free to ignore given that this is already merged and
reconsidering things would cause extra work.
msg8725 (view) Author: silvan Date: 2019-03-21.16:17:12
Ah, I thought this was already done since after you posted that message, you
left 3-4 comments on bitbucket :-)
msg8724 (view) Author: malte Date: 2019-03-21.16:16:23
Sorry, I also wanted to have a quick look (see msg8683), but didn't get to it
yet. I'll do that immediately.
msg8721 (view) Author: silvan Date: 2019-03-21.15:53:01
Thanks! Merged.
msg8713 (view) Author: jendrik Date: 2019-03-20.14:01:41
The results look very good and the number of expansions doesn't change, so I'm
fine with merging this.
msg8712 (view) Author: silvan Date: 2019-03-20.13:52:49
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue908-v6-issue908-base-issue908-v6-compare.html
msg8707 (view) Author: jendrik Date: 2019-03-19.23:41:59
I had a look but the renaming makes reviewing this quite hard. I'd be fine with
merging this after running a final experiment that checks that the number of
expansions remains the same compared to the base version. I'd be interested in
results for hillclimbing(), systematic(2) and systematic(3) patterns.
msg8702 (view) Author: silvan Date: 2019-03-19.11:24:58
I implemented the suggestion with 
using PatternID = int;
using PatternClique = vector<PatternID>;
and changed "max additive subsets" to "pattern cliques" everywhere.

Jendrik, did you finish reviewing already (you left some comments)? Do you want
to have a look at the commit which renames everything (I think the diff is now
quite hard to read, even though I haven't renamed the file max_additive_pdb_sets
yet).
msg8687 (view) Author: malte Date: 2019-03-17.14:25:09
> Besides using this name convention for variables, should we keep defining this
> name as an alias for what is now "vector(vector(int))" and what used to be
> "vector(vector(shared_ptr(PatternDatabase)))" (the tracker complaints if using
> langle/rangle )?

The good thing about these type aliases is that they can make the code more
readable. The bad thing is that they can make it less obvious what type is used.
It is always a judgement call.

Personally, I think that an "Xs" (i.e., "X" with plural "s") alias for vector<X>
can be more harmful than helpful because it obscures the fact that it's a
vector, and vector<X> is very readable anyway. When we use an "Xs" alias, it's
often mainly because "X" is very long because it is itself a templated type. So
I would rather define an alias for "MaxAdditivePatternSubset" and then use
"vector<MaxAdditivePatternSubset>".

Perhaps the names are complicated enough here that it's worth introducing some
new terminology, like "PatternClique" for a "maximal additive subset" of
patterns. This always gets rid of the issue that "subset" sounds like a set-like
representation when in truth it's a sequence.

And with the "int"s everywhere, perhaps it would also be useful to use aliases
for ints that are specific kinds of indexes here, as in the delete realaxation
heuristics with PropID and OpID. If we have "PatternID", it might make things a
bit clearer in some places.

All of these just meant as suggestions.
msg8686 (view) Author: silvan Date: 2019-03-17.11:42:26
Here is the comparison against the version that does not use a member for
h_values:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue908-v5-issue908-v4-issue908-v5-compare.html
I think we can definitely use the "proper" variant without the member.

I interpreted the "pattern subset" differently, namely as a "subset of a
pattern", hence my (confusing) suggestion. I'm fine with going for
MaxAdditivePatternSubsets then. Besides using this name convention for
variables, should we keep defining this name as an alias for what is now
"vector(vector(int))" and what used to be
"vector(vector(shared_ptr(PatternDatabase)))" (the tracker complaints if using
langle/rangle )?
msg8683 (view) Author: malte Date: 2019-03-15.15:08:40
Wow! I expected some improvement, but these are amazingly good results.

Regarding the name: MaxAdditivePatternSubsets sounds good to me and better than
the names involving PDB and also better than the name without either Pattern or
PDB. I think the name is at least as correct as
MaxAdditivePatternCollectionSubsets; the latter confuses me a bit. A "max
additive pattern subset" is clearly a (maximal, additive) subset of patterns, so
with the plural "subsets", it denotes a bunch of these things, which is what it is.

I will have a quick look at the diff, but it would be good if someone else could
review this more thoroughly.
msg8682 (view) Author: silvan Date: 2019-03-15.14:58:04
Latest results are very good:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue908-v4-issue908-base-issue908-v4-compare.html

We still use the name "MaxAdditivePDBSubsets" for what now is a vector of 
vector of ints, and also the file containing the functions to compute them
is called like this. Do you think keeping the word "PDB" for that type is
important? I know that from a heuristic point of view, it makes sense to speak
of "additive PDBs", but I think we would also call patterns to be additive or
not, and in the code, computing max. additive subsets is independent of PDBs, it
only requires patterns. So I would suggest to rename the type (and the file) to
MaxAdditiveSubsets or MaxAdditivePatternSubsets, or maybe even more correctly,
MaxAdditivePatternCollectionSubsets (but very long).
msg8677 (view) Author: silvan Date: 2019-03-14.09:35:47
(There is another place in the hill climbing code where we actually can use the
same trick.)
msg8676 (view) Author: silvan Date: 2019-03-14.08:37:57
The hill climbing generator already uses the CanonicalPDBs class, so the answer
is no.

Actually, I'm not really sure profiling will help me here, since the difference
in runtime can be significant in both directions (see agricola, for example). Do
you think the difference be caused by other effects (cache locality, memory
layout)?
msg8675 (view) Author: jendrik Date: 2019-03-14.08:34:24
Can't you use the same trick there?
msg8674 (view) Author: silvan Date: 2019-03-14.08:23:59
As expected, search time is now greatly decreased:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue908-v3-issue908-base-issue908-v3-compare.html

Still, the time to compute hill climbing patterns is higher on average than
before. I'll try once more to find some reason for this by profiling.
msg8673 (view) Author: silvan Date: 2019-03-13.20:11:07
Indeed, it does :-)
msg8672 (view) Author: malte Date: 2019-03-13.18:54:50
That sounds like an excellent suggestion.
msg8671 (view) Author: jendrik Date: 2019-03-13.18:53:31
Previously (and in the current version of the pull request), we compute the
heuristic value in each PDB multiple times. With the index-based version, we can
now calculate the heuristic value in each PDB once and sum and max over these
values afterwards.
msg8670 (view) Author: silvan Date: 2019-03-13.18:43:22
I implemented the suggested changes and run some experiments:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue908-v2-issue908-base-issue908-v2-compare.html

Unfortunately, while memory decreases significantly, search time increases
noticeably for both hill climbing and systematic patterns. After looking a long
time at the diff, trying to figure out differences, also running a profiler, I
can only guess that it is due to the one level extra indirection we now have
when computing the value of the CPDB heuristic: looping over max. additive
subsets now means to retrieve an index which is used to look up the stored PDB
(pointer) instead of directly retrieving the PDB pointer.

With hill climbing patterns, also the generation time
(generator_computation_time) often increases a lot, but also sometimes decreases.

I'd be happy if someone could have a look at the pull-request to check if there
is anything obvious which I didn't see:
https://bitbucket.org/SilvanS/fd-dev/pull-requests/46/issue908/diff
msg8663 (view) Author: silvan Date: 2019-03-12.15:31:32
As a preparation for issue906 and as a general step towards not having to use
shared_ptr to PDBs everywhere, we don't want to store shared pointers to PDBs in
MaxAdditiveSubsets anymore, but instead store the indices into the vector of
patterns.
History
Date User Action Args
2019-03-21 16:40:36silvansetmessages: + msg8733
2019-03-21 16:33:04silvansetmessages: + msg8730
2019-03-21 16:32:20maltesetmessages: + msg8729
2019-03-21 16:31:41silvansetmessages: + msg8728
2019-03-21 16:28:22maltesetmessages: + msg8727
2019-03-21 16:17:12silvansetmessages: + msg8725
2019-03-21 16:16:23maltesetmessages: + msg8724
2019-03-21 15:53:01silvansetstatus: reviewing -> resolved
messages: + msg8721
summary: TODOs: - don't forget to change the name of the file max_additive_pdb_sets to pattern_cliques in the end. ->
2019-03-20 14:01:41jendriksetmessages: + msg8713
2019-03-20 13:52:49silvansetmessages: + msg8712
2019-03-19 23:41:59jendriksetmessages: + msg8707
2019-03-19 11:24:58silvansetmessages: + msg8702
summary: TODOs: - don't forget to change the name of MaxAdditivePDBSubsets to something without PDBs in its name, e.g., MaxAdditiveSubsets. -> TODOs: - don't forget to change the name of the file max_additive_pdb_sets to pattern_cliques in the end.
2019-03-18 09:33:57floriansetnosy: + florian
2019-03-17 14:25:09maltesetmessages: + msg8687
2019-03-17 11:42:26silvansetmessages: + msg8686
2019-03-15 15:08:40maltesetmessages: + msg8683
2019-03-15 14:58:04silvansetmessages: + msg8682
2019-03-14 09:35:47silvansetmessages: + msg8677
2019-03-14 08:37:57silvansetmessages: + msg8676
2019-03-14 08:34:24jendriksetmessages: + msg8675
2019-03-14 08:23:59silvansetmessages: + msg8674
2019-03-13 20:11:07silvansetmessages: + msg8673
2019-03-13 18:54:51maltesetmessages: + msg8672
2019-03-13 18:53:31jendriksetmessages: + msg8671
2019-03-13 18:43:22silvansetstatus: in-progress -> reviewing
messages: + msg8670
summary: TODOs: - don't forget to change the name of MaxAdditivePDBSubsets to something without PDBs in its name, e.g., MaxAdditiveSubsets.
2019-03-12 15:31:32silvancreate