Issue1007

Title: Integrate CEGAR-based pattern collection generation algorithms
Priority: wish    Status: resolved
Superseder:    Nosy List: jendrik, malte, patfer, salome, silvan, thomas
Assigned To: silvan    Keywords:
Optional summary:

Created on 2021-02-10.16:43:36 by silvan, last changed by silvan.

Files
File name Uploaded Type
nomysteryopt11-p17-profiles.tar.gz silvan, 2021-05-04.17:09:35 application/gzip
Messages
msg10314 (view) Author: silvan Date: 2021-06-16.10:20:27
Merged! Thanks everyone for their help!
msg10312 (view) Author: malte Date: 2021-05-29.12:35:29
Sorting doesn't make a difference; under normal conditions, set<X> is substantially slower than unordered_set<X> both in theoretical complexity (O(log n) for insertion and membership test vs. O(1)) and in the constant factors involved. This can be different if the hash function is expensive to compute or leads to too many collisions.
msg10311 (view) Author: silvan Date: 2021-05-28.22:37:25
I think Thomas left a comment about this. Then I checked how duplicate detection is done in hill-climbing and saw that it uses set<Pattern>. I assumed the motivation is that patterns are sorted anyway and therefore using set can be faster than hash set in this scenario. If this assumption is wrong, then we should indeed revert the change here, but then we should also discuss why hill-climbing does duplicate detection this way and if this shouldn't be changed, too.
msg10310 (view) Author: malte Date: 2021-05-28.22:17:19
Normally we would prefer hash sets over balanced trees (set) because they are usually much faster (as long as the hash function is reasonable). What is the motivation for going in the opposite direction? I am surprised that this speeds things up, but I haven't looked at the details of the code.
msg10309 (view) Author: silvan Date: 2021-05-26.18:00:18
I implemented another change: use set<Pattern> for duplicate detection instead of HashSet<Pattern>. This reduces CEGAR runtime quite a bit:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v18-fixed-seed-issue1007-v17-issue1007-v18-compare.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v18-multiple-seeds-issue1007-v17-issue1007-v18-compare.html
Yet coverage, on average, actually decreases by 0.9 for the multiple CEGAR configuration. I still think the change makes sense and we should keep it. Any other thoughts?
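For illustration, here is a small self-contained sketch (not the actual Fast Downward code) of the two duplicate-detection variants discussed here. A Pattern is a sorted vector of variable ids, so std::set can compare patterns lexicographically, while a hash-based set needs a hash function over the whole vector; PatternHash below is a made-up example of such a function.

    #include <functional>
    #include <set>
    #include <unordered_set>
    #include <vector>

    using Pattern = std::vector<int>;

    // Illustrative hash; the real code would use the project's hashing utilities.
    struct PatternHash {
        std::size_t operator()(const Pattern &pattern) const {
            std::size_t hash = 0;
            for (int var : pattern)
                hash = hash * 31 + std::hash<int>()(var);
            return hash;
        }
    };

    int main() {
        std::set<Pattern> tree_based;                         // O(log n) insert/lookup
        std::unordered_set<Pattern, PatternHash> hash_based;  // amortized O(1), plus hashing cost
        Pattern pattern{0, 3, 7};
        tree_based.insert(pattern);
        hash_based.insert(pattern);
        // insert(...).second is false if the pattern was already present.
        bool is_duplicate = !tree_based.insert(pattern).second;
        return is_duplicate ? 0 : 1;
    }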
msg10308 (view) Author: silvan Date: 2021-05-20.17:21:13
I finished the remaining missing changes and would appreciate a review of the added documentation.
msg10307 (view) Author: silvan Date: 2021-05-18.17:33:17
CEGAR configurations:

v14 vs v15: 
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v15-fixed-seed-issue1007-v14-issue1007-v15-compare.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v15-multiple-seeds-issue1007-v14-issue1007-v15-compare.html
Clearly some positive trends regarding scores, likely due to v15 using int instead of size_t (see issue1018).

So from here on we consider v15 the new "baseline" for CEGAR configurations. 

v15 vs v16:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v16-v17-fixed-seed-issue1007-v15-issue1007-v16-compare.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v16-v17-multiple-seeds-issue1007-v15-issue1007-v16-compare.html
v16 compares slightly positively in terms of memory.

v15 vs v17:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v16-v17-fixed-seed-issue1007-v15-issue1007-v17-compare.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v16-v17-multiple-seeds-issue1007-v15-issue1007-v17-compare.html
Results are somewhat mixed regarding runtimes, expansions and coverage, which increase or decrease for different (fixed seed) configurations, but remain roughly the same on average. The computation time of the CEGAR algorithm decreases substantially.

v16 vs v17:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v16-v17-fixed-seed-issue1007-v16-issue1007-v17-compare.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v16-v17-multiple-seeds-issue1007-v16-issue1007-v17-compare.html
The comparison is similar to v17 compared to v15, but slightly worse (because v16 is slightly better overall compared to v15). Coverage slightly decreases on average, so do memory and search time scores, but expansion score and total time score increase for the multiple CEGAR configuration. And, as compared to v15, the CEGAR runtime is substantially faster.

The next step is to decide on which version to keep. I think the question is whether we want to give up on computing strongly optimal plans in exchange for some saved runtime, without other large improvements.
msg10304 (view) Author: silvan Date: 2021-05-11.18:24:29
baseline PDB configurations:

v16:
vs v15: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v16-v17-ipdb-sys-issue1007-v15-issue1007-v16-compare.html
Surprisingly, if anything, there is a mildly positive effect on the baseline PDB configurations.
vs base-v2: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v16-v17-ipdb-sys-issue1007-base-v2-issue1007-v16-compare.html

v17:
vs v16: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v16-v17-ipdb-sys-issue1007-v16-issue1007-v17-compare.html
Although the total time score decreases, there is even an increase in coverage of +2 for sys-3, possibly due to lower memory usage from not using the additional vector.
vs base-v2: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v16-v17-ipdb-sys-issue1007-base-v2-issue1007-v17-compare.html
Surprisingly, compared to the main version, there is even a +3 in coverage with sys-3. Maybe the additional int used in AbstractOperator or something else helps to achieve a better memory layout of PatternDatabase? Anyway, all of this shows that the baseline is not at all negatively affected and that even such small changes in code can lead to changes in performance which we might otherwise consider problematic (consider the -2 in total time score here, for example...)
msg10303 (view) Author: silvan Date: 2021-05-11.18:15:09
To clarify a few things, here is a description of the latest versions:

v15: like v14 which we decided to be a good implementation for computing plans directly when computing PDBs, but with the latest version of main integrated.

v16: only track the number of zero-cost operators in domains that have zero-cost operators. This means not using the vector which counts the zero-cost operators when there are none. This should not impact the baseline because all the new if clauses are nested within if clauses which are only true when computing a plan (see the sketch below).

v17: do not track zero-cost operators at all. This removes the vector from PatternDatabase but the code no longer computes strongly optimal plans.
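As a rough illustration of the v16 idea (hypothetical names, not the actual PatternDatabase code): the counter vector for zero-cost operators is only allocated when a plan is requested and the task actually contains zero-cost operators, so the baseline code path never touches it.

    #include <limits>
    #include <vector>

    struct DistanceInfo {
        std::vector<int> distances;
        // Stays empty unless a plan is requested and the task has 0-cost operators.
        std::vector<int> num_zero_cost_ops;
    };

    DistanceInfo init_distance_info(
        int num_abstract_states, bool compute_plan, bool task_has_zero_cost_ops) {
        DistanceInfo info;
        info.distances.assign(num_abstract_states, std::numeric_limits<int>::max());
        if (compute_plan && task_has_zero_cost_ops) {
            // Only pay the memory cost when the counters are actually needed.
            info.num_zero_cost_ops.assign(num_abstract_states, 0);
        }
        return info;
    }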
msg10302 (view) Author: malte Date: 2021-05-07.16:58:26
But isn't that exactly the same in the other two versions? Maybe I misunderstand something. Discuss it on Discord?
msg10301 (view) Author: silvan Date: 2021-05-07.16:53:45
This is about testing the impact of having (empty) vectors in PDBs and an additional int in AbstractOperator.
msg10300 (view) Author: malte Date: 2021-05-07.16:52:38
Do these changes make any difference for these configurations? I thought these two versions would only make a difference in the settings where we extract a plan, which we don't do for these configurations, right?
msg10297 (view) Author: silvan Date: 2021-05-07.14:59:08
The results which Jendrik asked for are here: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v15-ipdb-sys-issue1007-base-v2-issue1007-v15-compare.html
There is no performance penalty for the systematic pattern generator; however, the hill climbing generator now (after merging from main) compares a bit worse than in our previous test in terms of CPDB computation time score. Note that this is still the version where we use 2 additional vectors.

Should I run the same comparison with the other two versions (where we disable one vector in non-zero-cost domains or where we drop the zero-cost operator distinction entirely)? Or do you think the results aren't too bad?
msg10296 (view) Author: malte Date: 2021-05-07.11:06:43
>> If I understand correctly, many PDB configs now produce substantially less
>> output. This seems somewhat unrelated to the issue, but I'm not against it. It
>> should be mentioned in the change log, though.
> 
> They should output exactly the same as before.

Right, I misremembered what was changed. The obscure "combo" generator no longer dumps the collection, but that makes sense and was probably only ever a debug feature. We discussed on Zoom that the optional dumping of the collection, which we always want to disable, should be removed from the statistics dump method altogether.
msg10295 (view) Author: silvan Date: 2021-05-07.09:34:38
> My main concern is that the changes make the PatternDatabase implementation more complicated and they add members to the class that only the CEGAR pattern generators need. Have you tested experimentally that the additional memory that's needed for them is not problematic for systematic(2) and systematic(3)?

I'll do that.

> I am not sure why the changes to the abstract tasks are included, but I may have forgotten about a discussion related to this.

There is no need for these changes - I made them when adding the now-removed ProjectedTask but I thought that it would make a lot of sense to rename that one method because I found it very non-obvious what it does and why it is needed.

> Unless I missed it, I think there is no changelog entry yet.

Good point, I'll add one.

> If I understand correctly, many PDB configs now produce substantially less output. This seems somewhat unrelated to the issue, but I'm not against it. It should be mentioned in the change log, though.

They should output exactly the same as before.
msg10294 (view) Author: malte Date: 2021-05-06.23:57:11
I've had a look at the pull request and left some comments. I only considered changes to existing files, not new files, because I'm less worried about new code and we already have other reviewers.

I am not sure why the changes to the abstract tasks are included, but I may have forgotten about a discussion related to this.

Unless I missed it, I think there is no changelog entry yet.

If I understand correctly, many PDB configs now produce substantially less output. This seems somewhat unrelated to the issue, but I'm not against it. It should be mentioned in the change log, though.

I think that's all I have (plus the individual comments on the pull request).
msg10293 (view) Author: jendrik Date: 2021-05-06.23:39:20
I left some comments on GitHub.

My main concern is that the changes make the PatternDatabase implementation more complicated and they add members to the class that only the CEGAR pattern generators need. Have you tested experimentally that the additional memory that's needed for them is not problematic for systematic(2) and systematic(3)?

I think the cleanest approach to this would be to merge issue1019 before this one.
msg10292 (view) Author: silvan Date: 2021-05-06.20:08:14
We discussed the results and decided to keep this variant, as performance mostly improves and this way of computing plans requires far fewer lines of code.

Malte wanted to have a look at the pull request, which is now up to date after merging the latest version from main.
msg10290 (view) Author: silvan Date: 2021-05-06.10:29:44
We discussed giving the computation of plans alongside PDBs a last shot: we give up on uniform randomization and only store a single generating operator for each state during Dijkstra's algorithm. Basically, this is what we tried to do in v11, but with the fix for zero-cost operators, which we count for tie-breaking.

Single config comparison:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v14-fixed-seed-issue1007-v8c-issue1007-v14-compare-hillclimbing.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v14-fixed-seed-issue1007-v8c-issue1007-v14-compare-cegar.html
While coverage increases overall, there are small changes in several domains, including a -3 in agricola with the single CEGAR config.
With the multiple CEGAR algorithm, there are large differences in computation time across domains: the computation is now a lot faster on elevators, but a lot slower on logistics98.

Average results:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v14-multiple-seeds-issue1007-v8c-issue1007-v14-compare.html
Coverage and most of the scores increase on average, but as in the single config results, they also decrease in a few domains. The memory score decreases overall for the tested configs.
msg10289 (view) Author: silvan Date: 2021-05-06.09:00:56
I updated the links of msg10281 because they didn't contain the issue name or the date. The reports can now be found under:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v13-fixed-seed-issue1007-v8c-issue1007-v13-compare-hillclimbing.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v13-fixed-seed-issue1007-v8c-issue1007-v13-compare-cegar.html
msg10286 (view) Author: silvan Date: 2021-05-04.17:09:34
I looked at nomystery-opt11-p06 because the single CEGAR config has a large relative construction time difference: 2.00s vs 11.44s. The logs clearly show that computing the PDBs including plans does not scale once patterns include at least ~5 variables. I then profiled p17 of that domain (0.30s vs 1.27s) and saw that sorting the applicable operators takes up too much runtime. But even when not sorting (thus giving up on the uniform random selection of generating operators), computing plans alongside the PDBs is too expensive (for example, 25% of the time is spent asking the Mersenne Twister object for the next pseudo-random value). I attach the three profiles (v8c: the previous implementation computing plans using ProjectedTask; v13: the latest version; v13nosort: without sorting).
msg10285 (view) Author: silvan Date: 2021-05-04.15:53:45
The second link in msg10281 should be: https://ai.dmi.unibas.ch/_tmp_files/sieverss/cpdbs-cegar.html
msg10283 (view) Author: silvan Date: 2021-05-04.10:43:00
Here is the average report over 10 runs of each configuration: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v13-multiple-seeds-issue1007-v8c-issue1007-v13-compare.html
It seems to confirm that due to the slower computation time of CEGAR, multiple CEGAR runs fewer iterations and finds fewer patterns, although the numbers (both for iterations and patterns) go in both directions.
msg10281 (view) Author: silvan Date: 2021-05-03.16:40:45
I updated the report to include some more attributes that are already part of the output:
cpdbs_computation_time: runtime of the construction + dominance pruning and computation of max cliques
cpdbs_num_pattern: size of the collection
cpdbs_total_pdb_size: summed PDB sizes over all PDBs (seems to not have been recorded correctly)

The same three attributes are also included only for the CEGAR computation, named cegar_*, plus additionally cegar_num_iterations, which counts the number of CEGAR iterations.

Report for iPDB: https://ai.dmi.unibas.ch/_tmp_files/sieverss/cpdbs-hillclimbing.html 
Report for CEGAR configs: https://ai.dmi.unibas.ch/_tmp_files/sieverss/cpdbs-hillclimbing.html

The computation time of the patterns increases significantly (both for iPDB and CEGAR). The number of iterations (and with that, often, the size of the collection/summed PDB sizes) of both single and multiple CEGAR in particular decreases significantly.
msg10274 (view) Author: thomas Date: 2021-04-30.19:35:12
Is there any way to add information on the size of the generated PDBs? E.g., the number of abstract states / transitions in the largest PDB, or the sum of abstract states / transitions over all generated PDBs?

I think it would be good to know if the new way of generating plans causes the decrease in time and memory scores, or if these are caused by it becoming more likely that larger (but fewer?) PDBs are created. If the latter is the case, we might be in a situation where parameters have been optimized for the original implementation, and those parameters simply do not work as well for the changed implementation.

Could you also add information on the initial h-value?
msg10271 (view) Author: silvan Date: 2021-04-30.18:20:55
The result for single configurations (with a fixed random seed) for v13 compared against the previous "set version" v8c are in: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v13-fixed-seed-issue1007-v8c-issue1007-v13-compare.html
Even though coverage increases for some configurations, both memory and time scores significantly decrease and the best config loses 11 coverage. I wouldn't put too much hope in the experiments averaged over 10 runs. Looking at iPDB, it seems that sorting the applicable operators costs too much runtime.
msg10265 (view) Author: silvan Date: 2021-04-30.11:55:49
We discussed further how we could hopefully fix the performance by not storing all generating operators for each predecessor, but only one of them (v11). To avoid the resulting bias, we chose to sort the operators by their effect and skip equivalent ones, thus picking generating operators uniformly at random (v12). Unfortunately, the problem still persisted:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v11-v12-fixed-seed-issue1007-v8c-issue1007-v12-compare.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v11-v12-fixed-seed-issue1007-v10-issue1007-v11-compare.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v11-v12-fixed-seed-issue1007-v11-issue1007-v12-compare.html

A closer inspection revealed that we run into cycles on problems with zero-cost operators. We discussed this and decided that we could try to solve this issue by additionally counting, for each state, how many 0-cost operators lead to this state using the generating operator. Experiments for this fix, v13, are running.
msg10252 (view) Author: silvan Date: 2021-04-28.09:24:29
As discussed in the sprint meeting, Thomas and I implemented the alternative of computing plans in an interleaved fashion during PDB construction. While it has a minor negative impact on standard iPDB, performance degrades significantly for the CEGAR algorithms:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v10-fixed-seed-issue1007-v8c-issue1007-v10-compare.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v10-multiple-seeds-issue1007-v8c-issue1007-v10-compare.html
I looked at many log files to see what causes the changes, but the only effect I could see was that the new version computes different patterns, due to finding different plans (thus possibly picking different flaws etc.). It seems that the new version needs to merge patterns much more frequently, therefore requiring more memory. 

I don't really know if there is anything we can do about this - Thomas and I made some effort to ensure that we do not exclude choices too early on: during Dijkstra's, when updating the distances of a predecessor, we reset the list of operators leading to this predecessor and in later iterations include any other operators leading to the same predecessor with the same cost. When Dijkstra's finishes, we compute the wildcard plan by following a random generating operator of each encountered state and then computing all parallel operators leading to the same successor state. This mimics what we did so far with the EHC approach on the projected task.
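To make the bookkeeping described above a bit more concrete, here is a rough sketch of the relaxation step with hypothetical names (it is not the actual PatternDatabase code): each abstract state keeps the list of operators that achieve its current best goal distance, which is reset when a strictly better path is found and extended when an equally cheap one is found later.

    #include <vector>

    struct SearchInfo {
        std::vector<int> distances;                    // best known goal distance per abstract state
        std::vector<std::vector<int>> generating_ops;  // operator ids achieving that distance
    };

    // Called during the Dijkstra pass when op_id leads from `predecessor`
    // towards the goal with total cost `new_distance`.
    void relax(SearchInfo &info, int predecessor, int op_id, int new_distance) {
        if (new_distance < info.distances[predecessor]) {
            // Strictly better path: forget all previously recorded operators.
            info.distances[predecessor] = new_distance;
            info.generating_ops[predecessor] = {op_id};
        } else if (new_distance == info.distances[predecessor]) {
            // Equally cheap path found later: keep the operator as an
            // alternative for the wildcard plan extraction.
            info.generating_ops[predecessor].push_back(op_id);
        }
    }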
msg10250 (view) Author: silvan Date: 2021-04-26.08:53:22
Finally, I implemented and tested a change which Thomas suggested: instead of using ProjectedTask and its successor generator, build a forward-version of MatchTree when computing the PDB. This could have some synergy because we already compute all abstract operators anyway.

Unfortunately, implementing this was much more complicated than I anticipated. The main reason is that the computation of PDBs requires quite a few steps: 
- compute the hash function (hash_multipliers), which, as a by-product, computes num_states (for the final PDB)
- compute variable_to_index for the computation of abstract operators
- compute abstract operators (requires hash function, variable_to_index)
- compute match tree (requires abstract operators, hash function)
- compute distances (requires all of the above)

I tried separating these steps into separate classes (e.g., Projection, HashFunction, AbstractOperators) and functions/factories, but I don't like the resulting interface between the functions/factories too much. If we want to pursue this idea further, I think it makes sense to tackle the refactoring of PDB into several pieces in a separate issue.
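For reference, a minimal sketch of the first step listed above, the perfect hash over a pattern (illustrative code with made-up names, not the actual implementation): hash_multipliers[i] is the product of the domain sizes of the preceding pattern variables, and num_states falls out as the product over all of them.

    #include <cstddef>
    #include <vector>

    struct HashFunction {
        std::vector<int> pattern;                  // variable ids of the projection
        std::vector<std::size_t> hash_multipliers;
        std::size_t num_states;
    };

    HashFunction compute_hash_function(
        const std::vector<int> &pattern, const std::vector<int> &domain_sizes) {
        HashFunction result{pattern, {}, 1};
        result.hash_multipliers.reserve(pattern.size());
        for (int var : pattern) {
            result.hash_multipliers.push_back(result.num_states);
            result.num_states *= domain_sizes[var];
        }
        return result;
    }

    // Rank a state (given by its variable values) into its abstract state index.
    std::size_t rank(const HashFunction &h, const std::vector<int> &state_values) {
        std::size_t index = 0;
        for (std::size_t i = 0; i < h.pattern.size(); ++i)
            index += h.hash_multipliers[i] * state_values[h.pattern[i]];
        return index;
    }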

I nevertheless evaluated the initial implementation and compared the standard CEGAR configs plus iPDB (and already fixed the segfaults on unsolvable tasks which are still reported here):
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v9-fixed-seed-issue1007-v8c-issue1007-v9-compare.html
In the comparison of single configurations, there is a rather visible negative impact on runtime scores for most configurations, and coverage ranges between -1 and +4. For the CEGAR configs, however, there are lots of randomization effects.
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v9-multiple-seeds-issue1007-v8c-issue1007-v9-compare.html#summary
The average report shows similar results: coverage is mixed and runtime scores go rather down.

To pursue this issue, I'd like to discuss
a) what you think of the whole idea of not using ProjectedTask but MatchTree for computing plans for PDBs
b) any ideas of how the existing classes PatternDatabase, AbstractOperator and MatchTree could be rearranged to allow using them separately from using the PDB constructor

In my opinion, as long as there is no clear benefit in terms of code design or experimental results of using MatchTree over ProjectedTask, I wouldn't pursue this any further. Regardless of this decision, I would like to come up with a solution for b) since I think the PatternDatabase class should be split into computation and the actual PDB anyway (and I already spent a good amount of time doing this initial implementation).
msg10249 (view) Author: silvan Date: 2021-04-26.08:04:12
I have results for the intermediate versions v7b and v7c. Recall that I originally expected no performance changes or randomization effects from v7 to v8 but that the average comparison showed a minus of 1.7 coverage for one multiple CEGAR config.

v7 vs v7b: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v7-v7b-v7c-fixed-seed-issue1007-v7-issue1007-v7b-compare.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v7-v7b-v7c-issue1007-v7-issue1007-v7b-compare.html
No large differences

v7b vs v7c:
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v7-v7b-v7c-fixed-seed-issue1007-v7b-issue1007-v7c-compare.html
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v7-v7b-v7c-issue1007-v7b-issue1007-v7c-compare.html
No large differences (if anything, slightly positive)

v7c vs v8: this actually contains a small difference: instead of passing unpacked State objects for computing plans, pass the unpacked values (vector<int>)
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v7c-v8-fixed-seed-issue1007-v7c-issue1007-v8-compare.html
Fixed seed comparison: coverage is stable, slight decrease of scores for the blacklisting + stagnation multiple CEGAR config (which was the one with coverage loss when comparing v7 to v8 originally)
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v7c-v8-issue1007-v7c-issue1007-v8-compare.html
Average comparison: some decrease of scores, but nothing too wild. Also, the coverage differences of those three pairwise comparisons don't add up to the originally observed -1.7.

I'll leave everything as is for those revisions.
msg10247 (view) Author: silvan Date: 2021-04-15.19:13:35
I implemented many changes, mainly due to Thomas' review. I tagged several revisions and report results averaged over 10 runs and single configuration comparisons.

results:
v8: compared to v7, lots of refactoring due to Thomas' review. I tried to exclude changes that would affect randomization or performance generally (obviously, this didn't work out):
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v7-v8-v8b-issue1007-v7-issue1007-v8-compare.html
In the average report, coverage goes down a bit for the multiple CEGAR config, scores do, too, but we have observed such changes even on average for minor changes before. I'll nevertheless tag a few revisions in between v7 and v8.
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v7-v8-v8b-fixed-seed-issue1007-v7-issue1007-v8-compare.html
In the fixed seed comparison, everything looks good except for the multiple CEGAR variant with both blacklisting and stagnation, where coverage goes down by 1.

v8b: randomize order of operators of wildcard plan steps (important for refining)
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v7-v8-v8b-issue1007-v8-issue1007-v8b-compare.html
Average comparison: coverage is stable, but for multiple CEGAR, total time increases quite a bit (and expansions decrease), memory increases a bit. But I wouldn't change anything here -- randomizing everything is important.
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v7-v8-v8b-fixed-seed-issue1007-v8-issue1007-v8b-compare.html
In the fixed seed comparison, the change of randomization has a very visible effect (which is why we also make the average reports).

v8c: multiple CEGAR: use a single RNG object for all calls to CEGAR instead of creating a new one every time
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v8b-v8c-v8d-issue1007-v8b-issue1007-v8c-compare.html
Average comparison: for multiple CEGAR, coverage decreases slightly, but time scores increase a bit, memory stays the same. This is a very typical effect of changing randomization.
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v8b-v8c-v8d-fixed-seed-issue1007-v8b-issue1007-v8c-compare.html
The fixed seed comparison also shows a strong influence of using the same RNG object, with positive and negative impact on scores (memory/search time/total time), but only positive impact on expansion score. I'm hence inclined to preserve this change (even though the average coverage slightly decreases).

v8d: use vector<bool> instead of unordered_set<int> for blacklisted variables
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v8b-v8c-v8d-issue1007-v8c-issue1007-v8d-compare.html
Average comparison: coverage is stable but all scores slightly decrease.
https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v8b-v8c-v8d-fixed-seed-issue1007-v8c-issue1007-v8d-compare.html
The fixed seed comparison shows mostly no changes; there is no visible trade-off of memory/time consumption. From the code perspective, I prefer the original unordered_set<int> solution and therefore just reverted the code accordingly.
msg10239 (view) Author: thomas Date: 2021-03-30.20:43:59
I reviewed the cegar and single generator and would like to discuss the issue with the projection generation that came up in our meeting yesterday before I proceed.
msg10168 (view) Author: silvan Date: 2021-03-08.19:19:51
v7: refactor multiple CEGAR generator which simplifies its structure quite a bit. Coverage loss still in range of previously observed "noise": https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v7-multiple-seeds-issue1007-v6-issue1007-v7-compare.html
msg10160 (view) Author: silvan Date: 2021-03-01.21:20:40
I couldn't find anything else, so I'm fine leaving the changes as they are. I think the code is now ready for deeper code reviews: https://github.com/aibasel/downward/pull/33

The core new files are cegar.*, steepest_ascent_enforced_hill_climbing.*, and pattern_collection_generator_*_cegar.*. One could easily review the other files (changes to PDBs, projected_task.* etc.) independently of those added files.
msg10159 (view) Author: silvan Date: 2021-03-01.09:49:21
The last link should have been: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v5-multiple-seeds-issue1007-v4-b-issue1007-v5-compare.html
msg10158 (view) Author: silvan Date: 2021-02-28.11:22:18
v4-b (compared to v4-a): https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v4-b-multiple-seeds-issue1007-v4-a-issue1007-v4-b-compare.html no large differences

v5 compared to v4-b (this contains the "original change of v5", i.e., plans are directly computed as sequences of parallel operators rather than as sequences of states): https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v5-issue1007-v4-issue1007-v5-compare.html This is the change that made performance go down for the single CEGAR configurations. I'll try to investigate a bit more.
msg10152 (view) Author: silvan Date: 2021-02-26.16:22:46
I have to retract the statement regarding version v5, unfortunately. The random number generator between v4 and v5 differed (unintentionally) because I used a different compiler to compile v4 and v5. In particular, I swapped the order of loading CMake and GCC-8.3.0 on the grid. After recompiling both versions with the same compiler, the results are less pronounced and negative for single CEGAR.

Therefore, I ran another experiment which compares a single and a multiple CEGAR configuration of v4 and v5 using the average over 10 runs with different random seeds. For the single CEGAR configuration, coverage decreases by 2.7; the multiple CEGAR variant remains the same (no more insane improvements): https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v5-multiple-seeds-issue1007-v4-issue1007-v5-compare.html

To track down the differences that cause this, I added further intermediate code versions v4-a and v4-b. v4-a differs from v4 only by going in reverse order over the goals of the planning task, !which are already given in a random order!. Nevertheless, this causes (positive) changes in coverage even in a randomized experiment: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v4-a-multiple-seeds-issue1007-v4-issue1007-v4-a-compare.html

For version v4-b, experiments haven't finished yet. However, since we get changes in coverage even when averaging over 10 runs due to such small changes, I won't try to nail down the cause of the changes for the single CEGAR configs much further. I think that in the end, before merging the issue, we should instead conduct a large experiment to see the performance compared to what we reported in the paper.

In the meantime, I also tagged version v6: the main difference to v5, besides lots of refactoring, is to use a vector instead of a hash map to map variables to projected variables in ProjectedTask. No differences observed: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v6-multiple-seeds-issue1007-v5-issue1007-v6-compare.html
msg10116 (view) Author: silvan Date: 2021-02-21.19:18:11
v5: we split off the computation of plans for projections via steepest ascent enforced hill climbing into its own file (extern function). We also simplified the computation of plans (note: we compute "wildcard" plans, i.e., sequences of sets of "parallel" operators that all induce the same abstract transition) by not computing a sequence of states first and then finding all operators that induce that state sequence, but rather finding and storing those operators right away, thus not needing to store a sequence of states. This improved performance substantially (disregarding possible random effects; just evaluating a single random seed): https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v5-issue1007-v4-issue1007-v5-compare.html
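As a small illustration of the wildcard plan representation described above (the type and names are only a sketch, not the actual code): a plan is a sequence of steps, and each step holds all operators that induce the same abstract transition, so any one of them can be chosen when the plan is used.

    #include <utility>
    #include <vector>

    // One step = the ids of all "parallel" operators inducing the same
    // abstract transition; the plan is the sequence of such steps.
    using WildcardPlan = std::vector<std::vector<int>>;

    void add_step(WildcardPlan &plan, std::vector<int> parallel_operator_ids) {
        plan.push_back(std::move(parallel_operator_ids));
    }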
msg10073 (view) Author: malte Date: 2021-02-16.19:57:21
> I would suggest to go with the fixed version that always allows merging, as
> described in the paper, and not further investigate this, since the
> performance is on par with before.

Agreed. We only considered the alternatives because we thought that the fixed version performed significantly worse than the buggy ones, and this is no longer what we think.
msg10072 (view) Author: silvan Date: 2021-02-16.18:25:15
Experiments for v4: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v4-single-cegar-allow-merging-options.html
(I forgot to set the CPLEX path, hence the PhO configs failed.)
Even if assuming some variance due to randomization, as observed with the previous experiment, we can observe that not allowing merging harms performance. The other two configs are equivalent to the previously reported experiment which compares the version without and with bug fix. 

I would suggest to go with the fixed version that always allows merging, as described in the paper, and not investigate this further, since the performance is on par with before. For the final experiments before merging this issue, we can still consider using SCP/PhO in addition to the canonical heuristic and we should definitely use the average over multiple runs with different random seeds, as done in the paper.
msg10071 (view) Author: silvan Date: 2021-02-16.16:45:52
Regarding v2/v3 (bug fix):
- this is the first experiment which made me worry about performance degradation: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v3-issue1007-v2-issue1007-v3-compare.html
- in fact, running one of two configs 10 times each and averaging shows that there is *no* large performance problem; total time is increased a bit but coverage is unaffected: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v3-single-cegar-wildcard-average-issue1007-v2-issue1007-v3-compare.html

I still added the options discussed before (allowing merges only for precondition flaws, as before, for all types of flaws, as described in the paper, or not at all). This is v4 in the code. Experiments are running; however, only for a fixed random seed and I wonder if this is a good idea...
msg10069 (view) Author: silvan Date: 2021-02-16.09:10:44
I made some progress and also ran experiments to ensure that there are no major performance changes. I tagged the following versions:

base: code from the paper repo adapted to the latest Fast Downward version
v1: refactoring: splitting off the algorithmic part from the generators
v2: refactoring of options; changes affect the RNG because things happen in different orders now
v3: bug fix: if an abstract solution of a pattern is not a concrete solution, consider all goal variables of the task as flaws (unless already blacklisted), even if they are already included in the collection (see the sketch below). Previously, we would never raise such flaws because all goal variables are always part of the collection from the start. This only affects the single CEGAR algorithm.
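A minimal sketch of the fixed flaw handling in v3, with made-up names (the actual logic lives in the single CEGAR algorithm): when the abstract plan of a pattern is not a concrete plan, every goal variable that is not blacklisted is raised as a flaw, regardless of whether it already occurs in the collection.

    #include <unordered_set>
    #include <vector>

    struct FactPair {
        int var;
        int value;
    };

    std::vector<int> get_goal_violation_flaws(
        const std::vector<FactPair> &goals,
        const std::unordered_set<int> &blacklisted_variables) {
        std::vector<int> flawed_variables;
        for (const FactPair &goal : goals) {
            // Note: no check whether the variable is already in the collection.
            if (!blacklisted_variables.count(goal.var))
                flawed_variables.push_back(goal.var);
        }
        return flawed_variables;
    }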

Experiments:
- base/v1: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v1-issue1007-base-issue1007-v1-compare.html
  looks good, no large changes
- v1/v2: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1007-v2-best-average-issue1007-v1-issue1007-v2-compare.html
  the changed randomization doesn't show any bad effects (coverage increases by 0.3 averaged over 10 runs for the single best config I tested)
- v2/v3: initial experiments confirm what I expected: performance decreases substantially because we now spend many iterations of the algorithm blacklisting goal variables, since it is likely that we can never merge the two patterns containing the goal variables (recall that all goal variables are part of the initial collection already). Previously, we just didn't raise any goal violation flaws, which was much more efficient. I'm running a full set of averaged experiments for a single config, but this will take a while and I think that we will need to find a solution for this problem anyway.
msg10023 (view) Author: silvan Date: 2021-02-10.16:43:36
We want to integrate the CEGAR-based algorithms for pattern collection generation described in the ICAPS 2019 paper by Rovner et al.
History
Date User Action Args
2021-06-16 10:20:39  silvan  set  summary: Pull request: https://github.com/aibasel/downward/pull/33 ->
2021-06-16 10:20:28  silvan  set  status: in-progress -> resolved
                                  messages: + msg10314
2021-05-29 12:35:29  malte   set  messages: + msg10312
2021-05-28 22:37:25  silvan  set  messages: + msg10311
2021-05-28 22:17:19  malte   set  messages: + msg10310
2021-05-26 18:00:19  silvan  set  messages: + msg10309
2021-05-20 17:21:14  silvan  set  messages: + msg10308
2021-05-18 17:33:18  silvan  set  messages: + msg10307
2021-05-11 18:24:29  silvan  set  messages: + msg10304
2021-05-11 18:15:09  silvan  set  messages: + msg10303
2021-05-07 16:58:26  malte   set  messages: + msg10302
2021-05-07 16:53:45  silvan  set  messages: + msg10301
2021-05-07 16:52:38  malte   set  messages: + msg10300
2021-05-07 14:59:08  silvan  set  messages: + msg10297
2021-05-07 11:06:43  malte   set  messages: + msg10296
2021-05-07 09:34:38  silvan  set  messages: + msg10295
2021-05-06 23:57:11  malte   set  messages: + msg10294
2021-05-06 23:39:20  jendrik set  messages: + msg10293
2021-05-06 20:08:14  silvan  set  messages: + msg10292
2021-05-06 10:29:45  silvan  set  messages: + msg10290
2021-05-06 09:00:58  silvan  set  messages: + msg10289
2021-05-04 17:09:35  silvan  set  files: + nomysteryopt11-p17-profiles.tar.gz
                                  messages: + msg10286
2021-05-04 15:53:46  silvan  set  messages: + msg10285
2021-05-04 10:43:01  silvan  set  messages: + msg10283
2021-05-03 16:40:45  silvan  set  messages: + msg10281
2021-04-30 19:35:12  thomas  set  messages: + msg10274
2021-04-30 18:20:56  silvan  set  messages: + msg10271
2021-04-30 11:55:49  silvan  set  messages: + msg10265
2021-04-28 09:24:30  silvan  set  messages: + msg10252
2021-04-26 08:53:23  silvan  set  messages: + msg10250
2021-04-26 08:04:13  silvan  set  messages: + msg10249
2021-04-15 19:13:39  silvan  set  messages: + msg10247
2021-03-30 20:43:59  thomas  set  messages: + msg10239
2021-03-08 19:19:52  silvan  set  messages: + msg10168
2021-03-01 21:20:41  silvan  set  messages: + msg10160
                                  summary: Pull request: https://github.com/aibasel/downward/pull/33
2021-03-01 09:49:21  silvan  set  messages: + msg10159
2021-02-28 11:22:19  silvan  set  messages: + msg10158
2021-02-26 16:22:47  silvan  set  messages: + msg10152
2021-02-21 19:18:20  silvan  set  nosy: + thomas
2021-02-21 19:18:12  silvan  set  messages: + msg10116
2021-02-16 19:57:21  malte   set  messages: + msg10073
2021-02-16 18:25:15  silvan  set  messages: + msg10072
2021-02-16 16:45:53  silvan  set  messages: + msg10071
2021-02-16 09:10:44  silvan  set  messages: + msg10069
2021-02-10 16:43:36  silvan  create