Issue1042

Title: Refactoring of pruning method: create LimitedPruning class and move statistics to base pruning class
Priority: wish    Status: resolved
Superseder:    Nosy List: jendrik, malte, silvan
Assigned To: silvan    Keywords:
Optional summary: Part of meta issue966.

Created on 2022-02-03.18:15:43 by silvan, last changed by silvan.

Summary
Part of meta issue966.
Messages
msg10760 (view) Author: silvan Date: 2022-07-13.08:51:17
For the record, with GCC 10.3.0, the effect is less visible than with Clang, but similar to the one with GCC 8.3.0: https://ai.dmi.unibas.ch/_tmp_files/sieverss/2022-06-27-regression-gcc10.3.0-6769e68c6a728ab2a76cc12e2e6ae066f7fd289e-2a9cc5a5a6870031808c4e896393c1c82d9289b2-compare.html
msg10754 (view) Author: malte Date: 2022-07-12.12:39:35
I'll wait for you to create the new issue rather than create it myself and cause confusion, but here is already some input for it: the main culprit for the slowdown seems to be the calls to the timer in PruningMethod::prune_operators (timer.resume and timer.stop).
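
For orientation, a minimal sketch of the structure in question, assuming the interface discussed in this issue (a public, non-virtual prune_operators wrapping a virtual prune); names follow the messages, the body is not the actual source:

    void PruningMethod::prune_operators(
        const State &state, std::vector<OperatorID> &op_ids) {
        timer.resume();                        // suspected overhead
        int num_ops_before = op_ids.size();
        prune(state, op_ids);                  // virtual call to the concrete method
        num_unpruned_successors_generated += num_ops_before;
        num_pruned_successors_generated += op_ids.size();
        timer.stop();                          // suspected overhead
    }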

On my local machine, I ran some tests on barman-opt11-strips/pfile01-001.pddl with the M&S configuration. This is one of the tasks with a bad slowdown in Silvan's gcc experiment (from 42ish seconds to 52ish seconds). Locally, I got the following search times (not total runtimes of the search component!) with three run repetitions:

- 1st revision in Silvan's experiment: 18.25s, 17.89s, 17.91s
- 2nd revision in Silvan's experiment: 23.35s, 23.20s, 23.20s
- 2nd revision without timer calls:    18.48s, 18.09s, 18.15s

Further minimizing PruningMethod::prune_operators, leaving only the call to prune, brought no further clear benefit: 18.13s, 18.36s, 18.12s
msg10752 (view) Author: malte Date: 2022-07-12.12:10:19
Yes, that's probably a good idea.
msg10751 (view) Author: silvan Date: 2022-07-12.10:50:35
Results with Clang 12.0.1: https://ai.dmi.unibas.ch/_tmp_files/sieverss/2022-06-27-regression-clang12.0.1-6769e68c6a728ab2a76cc12e2e6ae066f7fd289e-2a9cc5a5a6870031808c4e896393c1c82d9289b2-compare.html

Looking at time scores, the performance degradation is similar, even slightly more pronounced. I think we should do something about this. Should I open a new issue for this?
msg10737 (view) Author: malte Date: 2022-07-04.12:14:32
Does someone want to look into this?

The hypothesis "bad compiler luck" could be tested by running with different compilers. At least g++ vs. clang should not be too difficult to test.
msg10736 (view) Author: silvan Date: 2022-07-04.12:11:30
I stumbled over this issue again when I recently merged the main branch into several of my repositories and noticed a more significant drop in the search time score. I tracked this down to the changes introduced in this issue and ran an experiment using more than just LM-cut: https://ai.dmi.unibas.ch/_tmp_files/sieverss/2022-06-27-regression-6769e68c6a728ab2a76cc12e2e6ae066f7fd289e-2a9cc5a5a6870031808c4e896393c1c82d9289b2-compare.html

For LM-cut, the observation is consistent with the observations in this issue, namely the search time score drops a bit (and consistently), in particular in miconic. However, with configurations with faster heuristic evaluation like M&S, CPDBs and blind, the decrease in the summed search time score is around 10 rather than around 1, which I thought could be a reason to reconsider whether the change of this issue is acceptable.

Also, I think we don't run any regression tests in our GitHub actions; otherwise we would have noticed this regression earlier. I don't remember whether we wanted to re-introduce them at some point or whether we decided they were not so useful.
msg10601 (view) Author: silvan Date: 2022-02-11.12:01:07
Merged.
msg10600 (view) Author: silvan Date: 2022-02-11.11:55:44
Thanks a lot for the deep analysis! I removed the wrong override of print_statistics().

Regarding using a log in the future: I absolutely agree. I wouldn't have been against printing statistics also with null pruning, but I'm also fine with deferring thinking about this until we introduce a log.
msg10599 (view) Author: malte Date: 2022-02-11.11:49:19
For limited pruning, I looked at the operator-centric configuration because this was the one that was most severely affected in coverage. I looked at the same miconic task again (28-0) because we again had a clear effect in the grid experiment: 5.78s for base, 6.71s for v3. I have similar conclusions: no algorithmic cause for the slowdown, looks like bad compiler luck.

In the following, when I say "v3", I mean v3 with a local change to enable statistics printing for limited pruning.

In detail I got the following runtimes (total time of search component) with time spent on pruning in parentheses:

base: 8.90s (0.44s), 8.82s (0.43s), 8.91s (0.43s), 8.90s (0.43s), 8.89s (0.43s)
v3:   8.88s (0.45s), 8.81s (0.44s), 8.81s (0.44s), 8.87s (0.44s), 8.89s (0.44s)

No reason to be concerned there.


miconic was also one of the domains where we lost coverage in this configuration in our grid experiment, so I also ran the affected task (once).

Base took 435.37s to solve the task on my machine, v3 took 442.66s. However, after 4.93s (base) and 4.99s (v3), pruning switched itself off because the pruning ratio was 0. Once pruning is switched off, limited pruning doesn't have an additional indirection to another pruning method any more. The total time spent on pruning for the runs was 0.25s (base) and 0.28s (v3). So the runtime difference of ~7s cannot be attributed to pruning but must happen in unchanged parts of the code.


All in all, to me this looks good to merge after we've added the missing statistics printing.


BTW, from this experience I have some new input on the earlier discussion of having NullPruning versus having a null pointer if we don't want to prune. It was useful in this case to be able to print the statistics for NullPruning (primarily for the runtime). I also realized that being able to enable printing of all statistics for null pruning, even though it never prunes anything, would be useful for experiments: it makes it easier to report pruning statistics in lab experiments like these because all configurations can be handled uniformly.

For the future, I think it would be useful to give the pruning class a log option so that pruning statistics can be selectively enabled/disabled (in particular, so that they can be enabled for null pruning; other people might also be interested in disabling them for other pruning methods). Then print_statistics won't need to be virtual any more.

I'm not even sure we need different logging-related defaults for null pruning vs. other pruning methods. Perhaps it's a reasonable default to not print anything for any of the pruning methods, and only print in verbose mode.
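
A sketch of this direction, assuming a log member with a verbosity query along these lines (hypothetical code, not an existing interface):

    void PruningMethod::print_statistics() const {
        // No longer virtual: every pruning method, including null pruning,
        // reports uniformly, and the log option decides whether to print.
        if (log.is_at_least_normal()) {
            log << "total successors before pruning: "
                << num_unpruned_successors_generated << std::endl;
            log << "total successors after pruning: "
                << num_pruned_successors_generated << std::endl;
            log << "time for pruning operators: " << timer << std::endl;
        }
    }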
msg10598 (view) Author: malte Date: 2022-02-11.11:36:01
Already mentioned on Discord, just adding for the record: in LimitedPruning, printing of statistics is disabled in the header file. This should be changed before merging.
msg10597 (view) Author: malte Date: 2022-02-11.10:55:52
I ran a local test using astar(blind()) (= the null pruning method) on miconic/s28-0.pddl. In Silvan's experiments, we see a consistent slow-down for astar(blind()) in miconic, often around 10%, often a bit less, but a very consistent picture. For this task, the base took 5.73s and v3 took 6.16s in Silvan's experiment.

In our discussion we said that calling size() twice and two additions of longs shouldn't take that much time. I noticed that there is some further new overhead for null pruning, namely starting and stopping the timer. This is probably a bit more expensive than the other things, but not at the scale we're seeing. In any case, all of these overheads were already present in the past for methods other than null pruning, and these methods also have increased overheads.

The results of my initial tests were (three runs each):

base: 8.45s, 8.47s, 8.48s
v3: 8.32s, 8.34s, 8.36s

So v3 was consistently a bit *faster*.

I then wanted to know how much time is actually spent inside this pruning code. The timer inside PruningMethod measures this, but for null pruning, reporting of statistics is disabled. So I locally enabled it (by commenting out the overrides for print_statistics) and reran the test. This again changed the runtime:

v3 with statistics printed: 8.05s, 7.93s, 8.01s

So again consistently *faster*.

To make sure I wasn't suffering too much from time-dependent effects (different background noise on my computer), I then also reran base 7 more times and got times very similar to the original ones:

base: 8.44s, 8.44s, 8.46s, 8.40s, 8.48s, 8.42s, 8.46s.

The time spent inside the pruning code according to the pruning statistics output over five runs was:

pruning: 0.004s, 0.003s, 0.003s, 0.003s, 0.003s

Note the extra digit. This is 0.00s if we round to two digits after the point as before. There is a bit of a Heisenberg problem with this in that the timer cannot fully account for the time spent in the timer calls themselves, but at a minimum it should be able to account for half of it with our implementation. If we only spend 0.00s inside the changed code, then the 0.43s of runtime difference on the grid (which is faster than my computer) cannot be due to algorithmic inefficiencies.

I think this is strong evidence that what we're seeing is compiler fickleness where small, possibly unrelated changes affect memory layout or optimizations. It's not runtime noise, as the times for the same code are quite consistent, and it's not "real" code efficiency, as it clearly isn't an algorithmic optimization to enable the printing of statistics.


I only looked at one config and one task, but perhaps this is already good enough to conclude that the effects are due to an "unlucky compile". The null-pruning config was the second-worst affected config in Silvan's latest experiment. But just to be safe, I'll also look at a limited pruning config. It could be that we're seeing two separate effects: an unlucky compile *and* an efficiency cost of the new limited pruning.
msg10591 (view) Author: silvan Date: 2022-02-10.22:52:41
I could see the comments before, just not the one regarding the friend discussion. Anyway, I found it now.

I agree that this is not an ideal solution. What also isn't ideal with the compositional approach is that we now use two instances of the pruning method class, each of which carries two longs and one pointer. It is probably negligible, but still unnecessary in principle.

I ran experiments after fixing limited pruning to use the internal pruning method: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1042-v3-issue1042-base-issue1042-v3-compare.html
Qualitatively, there is still an increase in runtime in all configs, including in particular those using limited pruning.

Maybe in the end the compositional approach isn't ideal after all? I'd still be fine merging this as it is, but I don't have a strong opinion. Probably something we should discuss tomorrow morning.
msg10590 (view) Author: malte Date: 2022-02-10.21:01:14
I probably forgot "Finish review" on github. Perhaps the comments arrived now?

The protected/friend thing is unfortunate because friend is nasty and because it somewhat breaks the encapsulation of LimitedPruning if the base class needs to know about it. But let's stick to the current solution then. I think alternatives would be more complicated. If we come up with a better idea later, we can implement it then.
msg10589 (view) Author: silvan Date: 2022-02-10.19:39:18
I tried answering via email because I couldn't find your comment on Github. LimitedPruning should have called prune and not prune_operators, of course. This error must have slipped in when renaming the methods. The friend declaration is there exactly so that LimitedPruning can call the private prune method. Changing it to protected doesn't solve this issue and still requires a friend declaration.
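
To make the setup concrete, a sketch of the relationship under discussion (assumed shape, not the actual source):

    class PruningMethod {
        // The friend declaration exists so that LimitedPruning can forward
        // directly to the wrapped method's private prune, bypassing the timer
        // and statistics bookkeeping done in prune_operators.
        friend class LimitedPruning;
        virtual void prune(
            const State &state, std::vector<OperatorID> &op_ids) = 0;
    public:
        void prune_operators(
            const State &state, std::vector<OperatorID> &op_ids);
    };

    // Inside LimitedPruning::prune, the forwarding call should therefore be
    //     pruning_method->prune(state, op_ids);   // not prune_operators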
msg10587 (view) Author: malte Date: 2022-02-10.19:22:24
I would like to discuss the prune vs. prune_operators thing before merging, but that's the only thing.
msg10579 (view) Author: silvan Date: 2022-02-10.16:02:11
Yes, the only difference is moving code from stubborn sets to the base class (the statistics tracking and the timer), creating the limited pruning method, and the aforementioned change of the interface (one public pruning method, one virtual internal one) that we discussed. I was alone on the grid when the experiments ran.

It is clear what you mean regarding syntax, I hope I fixed it in the change log, but I didn't care about the experiment configs.

I agree that this might be very well due to noise. I remember that we observed even larger differences than these here when integrating the CEGAR-PDBs code and we couldn't reproduce them or find good reasons for them.

I tried reproducing this locally. For smaller tasks, runtimes can vary a lot and deviate in both directions from the experimental report. For a larger task (openstacks-opt11-strips/p08.pddl), the difference can be reproduced: astar(lmcut) needs 35s on v2 compared to 32s on base. I tried floortile-opt14-strips/p03-4-3-2.pddl for the opposite (base should be slower than v2), but couldn't reproduce it (get the same times). I'm kind of reluctant to dive into profiling here due to the very small absolute differences which would make it hard.

Should we have a deeper look or do we accept the change in score (< 1 mostly) and call this issue done?
msg10574 (view) Author: malte Date: 2022-02-10.14:08:08
I was going to ask if we could add the pruning statistics to the report, but it's probably not necessary. We can see indirectly that nothing has changed there from score_generated.

So these were now run in a clean experiment, and the only difference between the two revisions are the changes to the stubborn set code? Some of the runtime swings are pretty wild, but I guess that sometimes happens depending on the grid load. It would be interesting to see if any of the bad cases can be reproduced locally in non-noisy runs.

At this point I wouldn't rule out noise. There are also many cases that get faster.

BTW, the configs in the report still have syntax errors (missing parentheses) that are only accepted due to bugs in the option parser. (I think I commented on this on github in the context of the change log message, but let me know if you need more details.)
msg10571 (view) Author: silvan Date: 2022-02-10.10:54:42
New results: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1042-v2-issue1042-base-issue1042-v2-compare.html

Unfortunately, all configs solve fewer tasks and the superficial reason seems to be an increased runtime. 

I don't really have a good idea of what causes this. Might it be the additional call of the "internal" prune method within the public one, prune_operators, which is called from search? Since even the null-pruning config is affected, I don't have any other ideas.
msg10569 (view) Author: silvan Date: 2022-02-10.09:40:58
I addressed the comments and now wait for experiments to finish.
msg10566 (view) Author: malte Date: 2022-02-09.18:38:03
I looked at the pull request and I like it. I left a few polishing comments. Regarding the experiments, I don't think the one we have is very helpful, but I won't insist on another experiment before merging.

(Once again, it would be nice if we had something we run automatically that means we don't have to do these "let's be careful just in case" experiments all the time...)
msg10565 (view) Author: malte Date: 2022-02-09.18:27:34
I looked at the results, but I didn't comment on them because I wasn't sure what to make of them. It would be good to know whether the time difference is really caused by running the two halves of the experiments at different times. Also, with these reverted changes, the code tested in the experiment is now quite different from the code we're thinking of merging.
msg10552 (view) Author: silvan Date: 2022-02-09.09:05:07
I reverted the change. Can you please have a look at the latest results I posted?
msg10545 (view) Author: malte Date: 2022-02-08.18:11:07
> Should we revert removal of null pruning then and let it override
> "print_statistics" to not do anything?

I'd be happier with that (although I'm sorry for causing extra work).

Sorry for the late reply, my 14:00 meeting took much longer than I expected.
msg10539 (view) Author: silvan Date: 2022-02-08.16:31:20
I additionally ran the baseline and created a comparison report: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1042-base-issue1042-base-issue1042-v1-compare.html

Note that both experiments ran individually (they use different command line options), which probably explains the tendency for v1 to be slower (A* without pruning runs out of time in two cases where this didn't happen in the baseline version).
msg10537 (view) Author: silvan Date: 2022-02-08.14:16:48
Thanks for the explanations!

Should we revert removal of null pruning then and let it override "print_statistics" to not do anything?
msg10536 (view) Author: malte Date: 2022-02-08.13:40:54
The null pruning method was intentionally there to avoid exactly the kind of "if"s that we now had to add. It's a common thing in the strategy pattern to have a "do-nothing" strategy when that is a viable option exactly to avoid having a special case for this, and I quite like it. I also think it has value from a user perspective. For example, you can easily set up an n-by-m experiment comparing A* with n heuristics and m configurations of pruning without having to special-case the "don't prune" scenario.
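
In code, the do-nothing strategy is just an empty implementation; a minimal sketch under the class design of this issue (name and exact signature are assumptions):

    class NullPruningMethod : public PruningMethod {
        virtual void prune(const State &, std::vector<OperatorID> &) override {
            // Intentionally empty: the search code can call the pruning
            // method unconditionally, with no "if (pruning)" special case.
        }
    };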

The "add_pruning_options_to_parser" is in the wrong place only because we don't currently have a right place for these kinds of things. Note the other similar comments in the same file and the comment for this method that mentions to move this somewhere else once we've added pruning to the other search algorithms.

I agree it shouldn't be a static function of SearchEngine; it should be a free function instead. This is already mentioned as a TODO in the header file. I don't think this is dissimilar from the many other similar methods for recurring options (e.g. to add a RNG option, perhaps also one for verbosity)? Perhaps rather than being in the file that implemented the base class, it could also be in a separate utilities file or something like this, but I think this is better done when we do the general long-overdue cleanup of the SearchEngine class hierarchy.

It's very similar to "add_succ_order_options" for lazy search: there is nothing about these options that is conceptually specific to lazy search, so we moved the related option code to the base class for the moment even though it is currently only used by lazy search, with the intention of eventually adding the missing functionality to eager search. Moving this back to lazy search would not be a step in the intended direction.
msg10535 (view) Author: silvan Date: 2022-02-08.13:20:50
> For the former, I think I understand the motivation behind this (it probably has something to do with allowing no output for null pruning while having output for the others), but would have preferred a less disruptive solution. I don't think this change is mentioned in the change log, is it? It's a user-facing change (some command-line configurations that used to work no longer work), so it definitely should be.

Not wanting to print statistics was the trigger, but I think there is no reason to have a null pruning method. I assume Florian was right that this was added in this form due to the lack of knowledge of "optional options" (or them not being implemented yet). I don't see why any user would use/have used pruning=null and would need to adapt their command line now. But I agree that the nicer solution would be to have a separate issue for this. Should I move this change into its own issue?

>For the latter, I don't see a motivation or a connection to this issue, and I think it's just a bad change. The only reason why other search algorithms (lazy search, EHC) currently don't have pruning is that we didn't bother to add the option yet, but there is nothing about pruning that seems related to eager search to me.

I think that it generally makes sense to make small clean-ups while stumbling over things that are not perfect or have TODOs associated with them. I interpreted the second part of the comment of add_pruning_options_to_parser

"   Method doesn't belong here because it's only useful for certain derived classes.
   TODO: Figure out where it belongs and move it there. */"

as "we are not sure all search engines can or should support pruning". I really think that adding "add_pruning_options" to SearchEngine was a mistake in the first place, since SearchEngine doesn't even have a member "pruning_method", only "EagerSearch" does. Anyway, I see your point regarding this being unrelated to this issue and therefore reverted the change.

>Regarding the change log more generally, I think changes like these should explain the command-line change with an example. I would use one with atom-based stubborn sets since this is also a configuration we would recommend.
Added.
msg10533 (view) Author: malte Date: 2022-02-08.12:54:16
Regarding the experiments, I think experiments that allow us to do a before/after comparison are a lot easier to interpret than experiments that just test the new revision but not the original one. But I guess it's OK.

Regarding the pull request, I don't like that this does so many things that are unrelated to the issue, such as removing the null pruning method and moving the pruning options from common functionality in a search engine base class to eager search only.

For the former, I think I understand the motivation behind this (it probably has something to do with allowing no output for null pruning while having output for the others), but would have preferred a less disruptive solution. I don't think this change is mentioned in the change log, is it? It's a user-facing change (some command-line configurations that used to work no longer work), so it definitely should be.

For the latter, I don't see a motivation or a connection to this issue, and I think it's just a bad change. The only reason why other search algorithms (lazy search, EHC) currently don't have pruning is that we didn't bother to add the option yet, but there is nothing about pruning that seems related to eager search to me.

Regarding the change log more generally, I think changes like these should explain the command-line change with an example. I would use one with atom-based stubborn sets since this is also a configuration we would recommend.
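
For illustration, a hypothetical before/after pair of the kind meant here (plugin and option names are assumptions, not verbatim from the change log):

    # before: the limit options were part of the stubborn-set method itself
    --search "astar(lmcut(), pruning=atom_centric_stubborn_sets(min_required_pruning_ratio=0.2))"
    # after: the limits move to the LimitedPruning wrapper
    --search "astar(lmcut(), pruning=limited_pruning(pruning=atom_centric_stubborn_sets(), min_required_pruning_ratio=0.2))"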
msg10532 (view) Author: silvan Date: 2022-02-08.10:51:55
I ran A* with LM-cut on the optimal benchmarks for 5 minutes, in the following configurations:
- no pruning (we replaced the null pruning method by making pruning an optional option)
- simple stubborn sets without and with limits
- atom-centric stubborn sets without and with limits

Here are the results: https://ai.dmi.unibas.ch/_tmp_files/sieverss/issue1042-v1.html

There are no crashes and I couldn't find any unexpected results.

From my side, this is ready to be merged.
msg10521 (view) Author: malte Date: 2022-02-07.19:32:46
Experiments: I think so, yes, I feel a bit uncomfortable if we don't run any experiments. Happy to use a short timeout (like 1 minute) and a limited set of configurations/benchmarks.
msg10498 (view) Author: silvan Date: 2022-02-04.11:31:47
I created a pull-request: https://github.com/aibasel/downward/pull/89
Do we need to run experiments for this issue?
msg10494 (view) Author: silvan Date: 2022-02-03.18:15:43
We want to move the current functionality for disabling pruning if not enough pruning is done within x expansions to a separate LimitedPruning class, which wraps an actual pruning class. We further want to move handling of statistics, which is currently done in the stubborn set base class, to the pruning base class. The latter will keep track of the number of generated states and the number of pruned states and print statistics depending on a verbosity level, which allows disabling printing statistics for the null pruning method.
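
A sketch of the intended wrapper, with the limit logic moved out of the stubborn set classes (names follow the description above; compute_pruning_ratio is a hypothetical helper based on the statistics kept in the base class):

    class LimitedPruning : public PruningMethod {
        std::shared_ptr<PruningMethod> pruning_method;
        const double min_required_pruning_ratio;
        const int num_expansions_before_checking_pruning_ratio;
        int num_pruning_calls = 0;
        bool is_pruning_disabled = false;

        virtual void prune(
            const State &state, std::vector<OperatorID> &op_ids) override {
            if (is_pruning_disabled)
                return;  // cheap early exit once pruning switched itself off
            if (num_pruning_calls == num_expansions_before_checking_pruning_ratio
                    && compute_pruning_ratio() < min_required_pruning_ratio) {
                // Too little pruning achieved: disable pruning for the rest
                // of the search.
                is_pruning_disabled = true;
                return;
            }
            ++num_pruning_calls;
            pruning_method->prune(state, op_ids);
        }
    };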
History
Date                 User    Action  Args
2022-07-13 08:51:17  silvan  set     messages: + msg10760
2022-07-12 12:39:35  malte   set     messages: + msg10754
2022-07-12 12:10:19  malte   set     messages: + msg10752
2022-07-12 10:50:35  silvan  set     messages: + msg10751
2022-07-04 12:14:32  malte   set     messages: + msg10737
2022-07-04 12:11:30  silvan  set     messages: + msg10736
2022-02-11 12:01:07  silvan  set     status: testing -> resolved; messages: + msg10601
2022-02-11 11:55:44  silvan  set     messages: + msg10600
2022-02-11 11:49:19  malte   set     messages: + msg10599
2022-02-11 11:36:01  malte   set     messages: + msg10598
2022-02-11 10:55:52  malte   set     messages: + msg10597
2022-02-10 22:52:41  silvan  set     messages: + msg10591
2022-02-10 21:01:14  malte   set     messages: + msg10590
2022-02-10 19:39:18  silvan  set     messages: + msg10589
2022-02-10 19:22:24  malte   set     messages: + msg10587
2022-02-10 16:02:11  silvan  set     messages: + msg10579
2022-02-10 14:08:08  malte   set     messages: + msg10574
2022-02-10 10:54:42  silvan  set     messages: + msg10571
2022-02-10 09:40:58  silvan  set     messages: + msg10569
2022-02-09 18:38:03  malte   set     messages: + msg10566
2022-02-09 18:27:34  malte   set     messages: + msg10565
2022-02-09 09:05:07  silvan  set     messages: + msg10552
2022-02-08 18:11:07  malte   set     messages: + msg10545
2022-02-08 16:31:20  silvan  set     messages: + msg10539
2022-02-08 14:16:48  silvan  set     messages: + msg10537
2022-02-08 13:40:54  malte   set     messages: + msg10536
2022-02-08 13:20:50  silvan  set     messages: + msg10535
2022-02-08 12:54:16  malte   set     messages: + msg10533
2022-02-08 10:51:56  silvan  set     status: in-progress -> testing; messages: + msg10532
2022-02-07 19:32:46  malte   set     messages: + msg10521
2022-02-04 11:31:47  silvan  set     status: unread -> in-progress; assignedto: silvan; messages: + msg10498
2022-02-03 18:15:43  silvan  create