Issue 908: Change MaxAdditiveSubsets to use indices into the pattern/PDB collection rather than shared pointers to PDBs

Title	Change MaxAdditiveSubsets to use indices into the pattern/PDB collection rather than shared pointers to PDBs
Priority	wish	Status	resolved
Superseder		Nosy List	florian, jendrik, malte, silvan
Assigned To	silvan	Keywords
Optional summary

Created on 2019-03-12.15:31:32 by silvan, last changed by silvan.

Messages
msg8733 (view)	Author: silvan	Date: 2019-03-21.16:40:36
Done (adding the fix, not a bug, I hope).
msg8730 (view)	Author: silvan	Date: 2019-03-21.16:33:04
Since I already fixed a more subtle bug today, I feel allowed to add another small one ;-)
msg8729 (view)	Author: malte	Date: 2019-03-21.16:32:20
Sure, if we don't introduce a bug. :-)
msg8728 (view)	Author: silvan	Date: 2019-03-21.16:31:41
They seem small and clear enough so that I could do them in one commit on the default branch, if you don't mind.
msg8727 (view)	Author: malte	Date: 2019-03-21.16:28:22
> Ah, I thought this was already done since after you posted that message, you > left 3-4 comments on bitbucket Those were ones I could do straight away, whereas I thought for the code review I'd have to set aside a larger chunk of time. But I had a quick look now and noticed that the changes are fairly localized, in which case I don't really have anything to add except for the cosmetic comments I made. I would have wanted to look at it in more detail if there had been more structural changes. So no further comments besides the ones I already made, which you can feel free to ignore given that this is already merged and reconsidering things would cause extra work.
msg8725 (view)	Author: silvan	Date: 2019-03-21.16:17:12
Ah, I thought this was already done since after you posted that message, you left 3-4 comments on bitbucket :-)
msg8724 (view)	Author: malte	Date: 2019-03-21.16:16:23
Sorry, I also wanted to have a quick look (see msg8683), but didn't get to it yet. I'll do that immediately.
msg8721 (view)	Author: silvan	Date: 2019-03-21.15:53:01
Thanks! Merged.
msg8713 (view)	Author: jendrik	Date: 2019-03-20.14:01:41
The results look very good and the number of expansions doesn't change, so I'm fine with merging this.
msg8712 (view)	Author: silvan	Date: 2019-03-20.13:52:49
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue908-v6-issue908-base-issue908-v6-compare.html
msg8707 (view)	Author: jendrik	Date: 2019-03-19.23:41:59
I had a look but the renaming makes reviewing this quite hard. I'd be fine with merging this after running a final experiment that checks that the number of expansions remains the same compared to the base version. I'd be interested in results for hillclimbing(), systematic(2) and systematic(3) patterns.
msg8702 (view)	Author: silvan	Date: 2019-03-19.11:24:58
I implemented the suggestion with using PatternID = int; using PatternClique = vector<PatternID>; and changed "max additive subsets" to "pattern cliques" everywhere. Jendrik, did you finish reviewing already (you left some comments)? Do you want to have a look at the commit which renames everything (I think the diff is now quite hard to read, even though I haven't renamed the file max_additive_pdb_sets yet).
msg8687 (view)	Author: malte	Date: 2019-03-17.14:25:09
> Besides using this name convention for variables, should we keep defining this > name as an alias for what is now "vector(vector(int))" and what used to be > "vector(vector(shared_ptr(PatternDatabase)))" (the tracker complaints if using > langle/rangle )? The good thing about these type aliases is that they can make the code more readable. The bad thing is that they can make it less obvious what type is used. It is always a judgement call. Personally, I think that an "Xs" (i.e., "X" with plural "s") alias for vector<X> can be more harmful than helpful because it obscures the fact that it's a vector, and vector<X> is very readable anyway. When we use an "Xs" alias, it's often mainly because "X" is very long because it is itself a templated type. So I would rather define an alias for "MaxAdditivePatternSubset" and then use "vector<MaxAdditivePatternSubset>". Perhaps the names are complicated enough here that it's worth introducing some new terminology, like "PatternClique" for a "maximal additive subset" of patterns. This always gets rid of the issue that "subset" sounds like a set-like representation when in truth it's a sequence. And with the "int"s everywhere, perhaps it would also be useful to use aliases for ints that are specific kinds of indexes here, as in the delete realaxation heuristics with PropID and OpID. If we have "PatternID", it might make things a bit clearer in some places. All of these just meant as suggestions.
msg8686 (view)	Author: silvan	Date: 2019-03-17.11:42:26
Here is the comparison against the version that does not use a member for h_values: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue908-v5-issue908-v4-issue908-v5-compare.html I think we can definitely use the "proper" variant without the member. I interpreted the "pattern subset" differently, namely as a "subset of a pattern", hence my (confusing) suggestion. I'm fine with going for MaxAdditivePatternSubsets then. Besides using this name convention for variables, should we keep defining this name as an alias for what is now "vector(vector(int))" and what used to be "vector(vector(shared_ptr(PatternDatabase)))" (the tracker complaints if using langle/rangle )?
msg8683 (view)	Author: malte	Date: 2019-03-15.15:08:40
Wow! I expected some improvement, but these are amazingly good results. Regarding the name: MaxAdditivePatternSubsets sounds good to me and better than the names involving PDB and also better than the name without either Pattern or PDB. I think the name is at least as correct as MaxAdditivePatternCollectionSubsets; the latter confuses me a bit. A "max additive pattern subset" is clearly a (maximal, additive) subset of patterns, so with the plural "subsets", it denotes a bunch of these things, which is what it is. I will have a quick look at the diff, but it would be good if someone else could review this more thoroughly.
msg8682 (view)	Author: silvan	Date: 2019-03-15.14:58:04
Latest results are very good: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue908-v4-issue908-base-issue908-v4-compare.html We still use the name "MaxAdditivePDBSubsets" for what now is a vector of vector of ints, and also the file containing the functions to compute them is called like this. Do you think keeping the word "PDB" for that type is important? I know that from a heuristic point of view, it makes sense to speak of "additive PDBs", but I think we would also call patterns to be additive or not, and in the code, computing max. additive subsets is independent of PDBs, it only requires patterns. So I would suggest to rename the type (and the file) to MaxAdditiveSubsets or MaxAdditivePatternSubsets, or maybe even more correctly, MaxAdditivePatternCollectionSubsets (but very long).
msg8677 (view)	Author: silvan	Date: 2019-03-14.09:35:47
(There is another place in the hill climbing code where we actually can use the same trick.)
msg8676 (view)	Author: silvan	Date: 2019-03-14.08:37:57
The hill climbing generator already uses the CanonicalPDBs class, so the answer is no. Actually, I'm not really sure profiling will help me here, since the difference in runtime can be significant in both directions (see agricola, for example). Do you think the difference be caused by other effects (cache locality, memory layout)?
msg8675 (view)	Author: jendrik	Date: 2019-03-14.08:34:24
Can't you use the same trick there?
msg8674 (view)	Author: silvan	Date: 2019-03-14.08:23:59
As expected, search time is now greatly decreased: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue908-v3-issue908-base-issue908-v3-compare.html Still, the time to compute hill climbing patterns is higher on average than before. I'll try once more to find some reason for this by profiling.
msg8673 (view)	Author: silvan	Date: 2019-03-13.20:11:07
Indeed, it does :-)
msg8672 (view)	Author: malte	Date: 2019-03-13.18:54:50
That sounds like an excellent suggestion.
msg8671 (view)	Author: jendrik	Date: 2019-03-13.18:53:31
Previously (and in the current version of the pull request), we compute the heuristic value in each PDB multiple times. With the index-based version, we can now calculate the heuristic value in each PDB once and sum and max over these values afterwards.
msg8670 (view)	Author: silvan	Date: 2019-03-13.18:43:22
I implemented the suggested changes and run some experiments: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue908-v2-issue908-base-issue908-v2-compare.html Unfortunately, while memory decreases significantly, search time increases noticeably for both hill climbing and systematic patterns. After looking a long time at the diff, trying to figure out differences, also running a profiler, I can only guess that it is due to the one level extra indirection we now have when computing the value of the CPDB heuristic: looping over max. additive subsets now means to retrieve an index which is used to look up the stored PDB (pointer) instead of directly retrieving the PDB pointer. With hill climbing patterns, also the generation time (generator_computation_time) often increases a lot, but also sometimes decreases. I'd be happy if someone could have a look at the pull-request to check if there is anything obvious which I didn't see: https://bitbucket.org/SilvanS/fd-dev/pull-requests/46/issue908/diff
msg8663 (view)	Author: silvan	Date: 2019-03-12.15:31:32
As a preparation for issue906 and as a general step towards not having to use shared_ptr to PDBs everywhere, we don't want to store shared pointers to PDBs in MaxAdditiveSubsets anymore, but instead store the indices into the vector of patterns.

History
Date	User	Action	Args
2019-03-21 16:40:36	silvan	set	messages: + msg8733
2019-03-21 16:33:04	silvan	set	messages: + msg8730
2019-03-21 16:32:20	malte	set	messages: + msg8729
2019-03-21 16:31:41	silvan	set	messages: + msg8728
2019-03-21 16:28:22	malte	set	messages: + msg8727
2019-03-21 16:17:12	silvan	set	messages: + msg8725
2019-03-21 16:16:23	malte	set	messages: + msg8724
2019-03-21 15:53:01	silvan	set	status: reviewing -> resolved messages: + msg8721 summary: TODOs: - don't forget to change the name of the file max_additive_pdb_sets to pattern_cliques in the end. ->
2019-03-20 14:01:41	jendrik	set	messages: + msg8713
2019-03-20 13:52:49	silvan	set	messages: + msg8712
2019-03-19 23:41:59	jendrik	set	messages: + msg8707
2019-03-19 11:24:58	silvan	set	messages: + msg8702 summary: TODOs: - don't forget to change the name of MaxAdditivePDBSubsets to something without PDBs in its name, e.g., MaxAdditiveSubsets. -> TODOs: - don't forget to change the name of the file max_additive_pdb_sets to pattern_cliques in the end.
2019-03-18 09:33:57	florian	set	nosy: + florian
2019-03-17 14:25:09	malte	set	messages: + msg8687
2019-03-17 11:42:26	silvan	set	messages: + msg8686
2019-03-15 15:08:40	malte	set	messages: + msg8683
2019-03-15 14:58:04	silvan	set	messages: + msg8682
2019-03-14 09:35:47	silvan	set	messages: + msg8677
2019-03-14 08:37:57	silvan	set	messages: + msg8676
2019-03-14 08:34:24	jendrik	set	messages: + msg8675
2019-03-14 08:23:59	silvan	set	messages: + msg8674
2019-03-13 20:11:07	silvan	set	messages: + msg8673
2019-03-13 18:54:51	malte	set	messages: + msg8672
2019-03-13 18:53:31	jendrik	set	messages: + msg8671
2019-03-13 18:43:22	silvan	set	status: in-progress -> reviewing messages: + msg8670 summary: TODOs: - don't forget to change the name of MaxAdditivePDBSubsets to something without PDBs in its name, e.g., MaxAdditiveSubsets.
2019-03-12 15:31:32	silvan	create

Issue908