Title M&S refactoring part 2: get rid of LabelEquivalenceRelation and use label mappings local to TransitionSystem instead
Priority feature Status reviewing
Superseder Nosy List atorralba, jendrik, malte, salome, silvan
Assigned To silvan Keywords
Optional summary
Part of issue567, and to some extent follow-up of issue851.

Pull request:

Possible follow-up issue:
- rework EquivalenceRelation which can be the bottleneck of the merge-and-shrink computation in certain tasks

Created on 2019-09-18.15:34:48 by silvan, last changed by silvan.

Part of issue567, and to some extent follow-up of issue851.

Pull request:

Possible follow-up issue:
- rework EquivalenceRelation which can be the bottleneck of the merge-and-shrink computation in certain tasks
msg10803 (view) Author: silvan Date: 2022-07-21.14:50:55
For completeness, experiments comparing main against the latest version:
msg10802 (view) Author: silvan Date: 2022-07-20.10:54:28
This issue is now ready for a final code review.

The main change was to replace LabelEquivalenceRelation, which stored locally equivalent labels in a list<int> and additionally stored an iterator into that list for fast removal during label reduction, by a simple vector<int> without storing an additional iterator. That greatly simplified the code and also made everything use less memory and run faster, except for label reductions in cases where the local label equivalence classes are huge. For example, in agricola-opt18, runtime of label reduction increases significantly because looking for just a few reduced labels in a large local label equivalence class takes very long time.
msg10801 (view) Author: silvan Date: 2022-07-20.08:57:06
I re-ran the latest experiments (after merging from main; which is base-v4), using *no* time limit for the M&S computation, as this made it harder to see the impact of the change. Indeed, the results are a lot clearer now, clearly positive in almost all cases.

v15 (all local labels are made contiguous always):
v16 (local labels are not contiguous and we reserve memory for the maximum number of possible labels):
v17 (as v16, but without reserving memory):

v15 vs v17:

As stated below, I think we should stick with v17 because making local labels contiguous doesn't have benefits and reserving very large amounts of memory has a clear negative impact.

v18 (store the number of active aka. non-reduced labels rather than computing them on demand):
I'm still surprised that this has a slightly negative impact on time rather than on memory (for storing one additional int). 

v19 (use vector plus sort and unique rather than set into vector for computing transitions of reduced labels):
msg10793 (view) Author: silvan Date: 2022-07-19.08:37:19
In v19, we use a vector to collect all transitions of a reduced labels and then use sort and unique. Previously, we used a set and turned this into a vector at the end. Definitely a positive impact:
msg10789 (view) Author: silvan Date: 2022-07-18.16:14:08
v18 keeps track of the number of active (aka. non-reduced) labels in the Labels class rather than computing that number over and over again for each label reduction.

Results look somewhat mixed w.r.t.  Inspecting a few instances where the construction time of merge-and-shrink increases, this seems to be due to our insufficiently fine-grained time limit feature. Sometimes, the previous configuration would just finish a merge/label reduction slightly after 900s (the time limit for M&S), and the new configuration, being slightly faster, continues with the next iteration where label reduction takes a few minutes before the time limit is checked again and M&S computation is terminated. There are also cases where the M&S computation gets slightly slower - this can only be due to fluctuations on the grid. Overall, coverage is not affected and the changes in runtime scores are small so that we can keep the change.
msg10783 (view) Author: silvan Date: 2022-07-18.10:04:54
I merged the latest main branch into this issue and re-ran experiments: base-v4 vs v15

Everything looks good.

Further, I re-evaluated the difference of making all local labels contiguous (v15; introduced in v3) vs. possibly adding a new local label when applying label reduction. In version v16, the code was moved back to not making local labels contiguous, however, reserving memory for the maximum number of labels which is also the (theoretical) maximum number of local labels a transition system could have. This significantly lowers the number of cases where the construction of the atomic FTS succeeds:

In v17, I therefore evaluated the middle ground of not making local labels contiguous, but neither reserving memory, thus accepting a few reallocations when creating new local labels. This version is comparable with v15, slightly faster (expected), but no clear drawbacks:

Since I don't see any direct benefits for making local labels contiguous, I would opt for the latest version which doesn't do this but which also doesn't (unnecessarily) large amounts of memory.
msg10617 (view) Author: silvan Date: 2022-02-21.15:09:56
I need to double-check but if I remember correctly, then I discussed with Florian and Salomé that we wanted to try using bitsets for storing global labels and see if this helps reducing the bottle neck that label reduction can be in some domains. That would mean that the issue is waiting for me and not for you :)
msg10616 (view) Author: malte Date: 2022-02-21.15:00:00
Hi all, what's the status? Silvan, is this waiting for code review? If yes, can you find a reviewer? I'd also like to have a quick look before merging, but things will move much faster if I know someone else approved it already.

The experimental results look fine to me.
msg10441 (view) Author: jendrik Date: 2021-09-22.16:18:22
I'd err on the side of readable code.
msg10440 (view) Author: silvan Date: 2021-09-22.14:58:49
In v14, I implemented a change suggested by Florian: before, TransitionSystem had three vectors to store information about local labels: local_label_to_global_labels, mapping each local label to the group of locally equivalent global labels it represents; local_label_to_cost, mapping each local label to the minimum cost of the represented global labels; local_label_to_transitions, mapping each local label to the set of transitions of the represented global labels. Now, there is a class LocalLabelInfo holding the group of represented global labels, the cost and the transitions, and TransitionSystem has a single vector storing LocalLabelInfo for each local label.

The results suggest that this refactoring is beneficial for the construction time of the heuristic, possibly due to improved cache efficiency (for a local label, its information is now stored together), however at the cost of an increase in memory consumption (I assume because of the custom class which serves as an intermediate wrapper):

I'm not sure if we like this trade-off. The code, however, was improved through this refactoring because it avoids repetitive pieces of code when accessing different information for a given local label. Any thoughts on the results/the refactoring?
msg10439 (view) Author: silvan Date: 2021-09-21.10:55:35
Results after some refactoring; no performance changes:
msg10244 (view) Author: malte Date: 2021-04-09.12:04:55
> Apparently, the overhead of that method, which creates a HashState and calls
> feed, is too large when hashing plain data types like int. I think we should
> check if the planner uses HashMap/HashSet with int or similar data types at
> other places. Also, we should document that using the stl data structures
> directly is preferred in such cases.

I think this is not the first time we discover this. These hash functions are designed with container objects in mind and with an emphasis on distributing as uniformly as possible. They are certainly not cryptographic hash functions, but in the spectrum of "cheap computation vs. uniform distribution", they emphasize uniform distribution more than the typical library implementations.

What makes this even more complicated is that trivial hash functions like
    [](int i) { return i; }
that can be very problematic for distribution (for mod-based hash tables) can be excellent for cache locality in typical scenarios where we see many keys clustered close to each other.

We also have the dedicated IntHashSet type that can be useful in certain contexts, but I think it is primarily designed to be memory-efficient for very large hash tables, not to be fast for small ones. (It is used only by the state registry.)

The question is how best to document things like these. If we just mention this in hash.h, I am not sure we will see it when we need to.
msg10243 (view) Author: silvan Date: 2021-04-09.10:55:21
It turns out that the culprit of the performance loss in v12e was not the simplification of EquivalenceRelation, but the switch from std::unordered_map to utils::HashMap. I got skeptical when "std::uint64_t get_hash64(const T &value)" showed up in a profile. Indeed, reverting this change in v12e fixes the performance loss:
and also mildly improves over 12c, although the scale of changes could be in the range of noise:

Apparently, the overhead of that method, which creates a HashState and calls feed, is too large when hashing plain data types like int. I think we should check if the planner uses HashMap/HashSet with int or similar data types at other places. Also, we should document that using the stl data structures directly is preferred in such cases.

Furthermore, when looking at a profile of the first task of parking-opt14-strips, we noticed that EquivalenceRelation::refine is responsible for 87% of the run-time of the first three merge-and-shrink iterations. It might be worth looking into completely reworking EquivalenceRelation in another issue. I added an item to the summary.
msg10242 (view) Author: silvan Date: 2021-04-06.19:23:37
v12: only refactoring in comparison to v9-fix (reported in the previous comment):
By now, I assume that changes in runtime scores of up to 2 can just be attributed to noise in the sense of code changes that should not impact the performance, but might affect memory layout or whatever else.

v12b: removes the class GlobalLabel (which only contained an int standing for the cost of the label) and GlobalLabels stores vector<int> label_costs directly instead of unique pointer to GlobalLabel:
If anything, I expected a positive impact because of one less layer of indirection and not using unique pointers. But the effect is "negative", in the same amount as v12 compared to v9-fix was positive even though this was pure refactoring.

v12c: add iterators for GlobalLabels which allows iterating over "valid" labels, which, however, is only used during the construction of the atomic FTS:
Basically no effect, was expected.

v12d: Florian and I greatly simplified the class EquivelanceRelation (in algorithms). It only works with ints now, only can be constructed as the universal equivalence relation, and no longer supports "unspecified elements". This allows using it with only valid labels, which should avoid having reduced labels as "unspecified elements" in the equivalence relation.
The results are bad and I don't know (yet) why. We moved previously templated methods to the cc file, maybe this has an effect. We'll need to investigate this further.
msg10241 (view) Author: silvan Date: 2021-04-05.18:49:10
I noticed that the loop that erases old labels from transition systems when applying label reductions unnecessarily considered all labels of a label group instead of stopping after erasing the label. This significantly improves the performance:
msg10238 (view) Author: silvan Date: 2021-03-30.10:24:39
After merging the latest changes from main (v9), the changes in this issue compare less favourably to main (base-v3) than previously, although overall still positively:
(successful constructions for more tasks, less memory consumption, often faster, sometimes significantly slower, however).

Compared to the previous version (v8), the merge-and-shrink construction slows down a bit (ms_construction_time, score_ms_construction_time):
I don't really understand why, as all changes between v8 and v9 are due to changes in main. 

I therefore also compared the two main revisions against each other:
The results of this comparison are very surprising to me because I don't know what could cause this difference. The diff between the two versions only makes cosmetic changes in M&S. The major things we changed in Fast Downward, like the state representation, shouldn't affect the construction of M&S heuristics.

Since the latest version still uses much less memory and only becomes significantly slower in a few domains, I think we can accept that strange performance change. (We cannot do something about the weird change due to main anyway.) Instead, when discussing this issue with Florian and Salomé, we thought it would be a good idea to look at the problematic domains, in particular, agricola, where construction time can be doubled compared to main. 

I did that and found that applying label reductions to transition systems is the culprit: where we previously stored , for each label, the id of its "local label group" and an iterator into that group (a list), we now only store the group id and need to search for the label in its group (now a vector) for removing it. I attempted to improve the situation using set (v10) or HashSet (v11). While this helped in the previously problematic case agricola (to some extent, only), it significantly slowed down the computation in other domains:
So I think using vectors (v9) is the right direction and we should discuss if we can live with slower construction times in a few domains. Alternatively, we could (again) store an iterator into a label group for each label, but I didn't like that previous part of the code too much, so I would rather try to avoid it.

Florian, Salomé and I also discussed if reverting the change that all "local label ids" (i.e., the label groups) are stored contiguously (added in v3). This might trade some memory usage for runtime, although back then we didn't observe too much penalty for enforcing the contiguous storing of label groups.
msg9714 (view) Author: malte Date: 2020-09-17.09:23:09
Sounds good to me. The cost partitioning thing is not a blocker or an overriding concern, just a thing I suggest we keep in mind. It is a goal to open these things up a bit more given how much the BDE project focuses on this aspect.
msg9713 (view) Author: silvan Date: 2020-09-16.18:38:01
The new implementation would make implementing optimal cost partitioning easier. For SCP, it would probably not help but could make it more complicated as we need to compute the remaining costs of the actual labels, not some labels "local" to a transition system.

Nevertheless, I'm not sure if this consideration (regarding cost partitioning) should be a central concern in this issue because I think that the change in this issue makes the code related to transition systems easier in general. But of course, getting someone else to review this first would be good. I hoped Alvaro could have a look into this since he also would like to see this change, I think, but he might be too busy in Denmark now. I'll ask Florian/Thomas if they can find some time.
msg9712 (view) Author: malte Date: 2020-08-31.17:53:51
Hi Silvan, to move this forward, it would be good if someone else could look at the  code first.

This is currently an implementation detail of merge-and-shrink, but it may become more visible when we add more cost partitioning or operator-counting constraint features. Does the new representation help with that, hinder that, or is it neutral? Would it make sense to get someone else who was included in cost partitioning and merge-and-shrink to review?
msg9343 (view) Author: silvan Date: 2020-06-09.09:30:59
I evaluated the current implementation (with all improvements together) after merging from default:
msg9332 (view) Author: silvan Date: 2020-05-24.16:00:21
I looked into this again today but couldn't actually find anything else I would like to do in this issue. So now, with an unnecessary delay of 8 months :-), I'd like to ask for a review. Or, as a first step, for ideas of how to simplify reviewing in this case. The pull request list the following changes:

+0 -59
M src/search/algorithms/
+64 -1
M src/search/algorithms/equivalence_relation.h
+3 -7
M src/search/merge_and_shrink/
+8 -9
M src/search/merge_and_shrink/
+5 -5
M src/search/merge_and_shrink/factored_transition_system.h
+69 -47
M src/search/merge_and_shrink/
+72 -0
A src/search/merge_and_shrink/
+47 -0
A src/search/merge_and_shrink/global_labels.h
+0 -134
D src/search/merge_and_shrink/
+0 -122
D src/search/merge_and_shrink/label_equivalence_relation.h
+3 -4
M src/search/merge_and_shrink/
+0 -62
D src/search/merge_and_shrink/
+0 -46
D src/search/merge_and_shrink/labels.h
+4 -1
M src/search/merge_and_shrink/
+1 -2
M src/search/merge_and_shrink/
+1 -3
M src/search/merge_and_shrink/
+270 -129
M src/search/merge_and_shrink/
+50 -52
M src/search/merge_and_shrink/transition_system.h 

The files labels.* and global_labels.* should be reviewed next to each other (the new class is a bit simpler) and some of the logic of what used to be in label_equivalence_relation is now part of transition_system (and fts_factory, which creates transition systems), however also modified as described below. The change to equivalence_relation is due to making refine being a templated method (therefore in the header). The rest are only small changes required to adapt to the changed interface of transition systems.
msg8967 (view) Author: silvan Date: 2019-09-26.14:32:44
Added link to meta issue567.

I just found that issue851 can be seen as a predecessor of this issue. To quote its summary: 
Possible further improvements: do not store information for all labels in each
transition system, but only for the relevant ones. This requires either a
concept of local label numbers and a mapping between global and local labels,
or using hash maps instead of vectors whenever storing information per label,
leaving out irrelevant labels.
msg8964 (view) Author: silvan Date: 2019-09-21.17:35:43
I'll you know, yes.

In step 3, after reducing labels and shrinking, we always make sure that there are no gaps in vector<vector<Transition>>, i.e., we have a contiguous "numbering" of "local labels". 

The effect can be both negative or positive, especially in large instances, sometimes reducing memory and speed, sometimes trading the former for the latter:
msg8963 (view) Author: malte Date: 2019-09-20.21:36:42
Hi Silvan, I'm curious, but lack the time, so I'll stay quiet for the time being.  But sounds exciting! Let us know when something needs to be done in terms of reviewing etc.
msg8962 (view) Author: silvan Date: 2019-09-19.11:41:38
In a small second step, I did some more minor refactoring to avoid having a global label mapping from global to local labels, but instead having it individually for each transition system. No large changes:
msg8961 (view) Author: silvan Date: 2019-09-18.22:04:06
The results of the first step are in. The only change is to replace LabelEquivalenceRelation (which uses a hash map to map global labels to local label group ids and a list for said groups) by a combination of global to local and local to global label mappings in each transition system, only using vectors.

There is a clear positive performance trend and we can finish the computation of atomic FTS and the whole heuristic in more cases than before. (The attributes ms_out_of_memory and ms_out_of_time were not correctly parsed as of this writing; the report will be updated tomorrow.)

Here's a pull request if someone is already curious:
msg8960 (view) Author: silvan Date: 2019-09-18.15:34:48
In a recent discussion with Álvaro, we came to the conclusion that it would be worth trying to change the representation of transition systems. In particular, getting rid of LabelEquivalenceRelation and its underlying data structure of using a hash map from labels to local label group ids, and using lists for label groups. Using the notion of "local labels" and vector<int> to store both a mapping from "global" to local labels and vice versa could not only help simplify the code greatly, but also potentially improve performance.

In a first step, we will replace LabelEquivalenceRelation by appropriate mappings (vector<int>) as just described. Later, we could also try to always keep local and/or global label numbers contiguous and thereby get rid of the need for testing if a global label number is valid or not (because it has been reduced). In principle, we could also attempt the same thing for transition systems in factored transition systems. Finally, what is up for debate is whether we want to not store transitions of irrelevant labels, like it was the case years ago before the introduction of generalized label reduction.
Date User Action Args
2022-07-21 14:50:55silvansetmessages: + msg10803
2022-07-20 10:54:28silvansetmessages: + msg10802
2022-07-20 08:57:06silvansetmessages: + msg10801
2022-07-19 08:37:19silvansetmessages: + msg10793
2022-07-18 16:14:08silvansetmessages: + msg10789
2022-07-18 10:04:54silvansetmessages: + msg10783
2022-02-21 15:09:56silvansetmessages: + msg10617
2022-02-21 15:00:01maltesetmessages: + msg10616
2021-09-22 16:18:22jendriksetmessages: + msg10441
2021-09-22 14:58:49silvansetmessages: + msg10440
2021-09-21 10:55:36silvansetmessages: + msg10439
2021-04-09 12:04:55maltesetmessages: + msg10244
2021-04-09 10:55:22silvansetmessages: + msg10243
summary: Part of issue567, and to some extent follow-up of issue851. Pull request: -> Part of issue567, and to some extent follow-up of issue851. Pull request: Possible follow-up issue: - rework EquivalenceRelation which can be the bottleneck of the merge-and-shrink computation in certain tasks
2021-04-06 19:23:38silvansetmessages: + msg10242
2021-04-05 18:49:10silvansetmessages: + msg10241
2021-03-30 10:24:41silvansetnosy: + salome
messages: + msg10238
2020-09-17 09:23:10maltesetmessages: + msg9714
2020-09-16 18:38:02silvansetmessages: + msg9713
2020-08-31 17:53:52maltesetmessages: + msg9712
2020-07-24 14:23:54silvansetsummary: Part of issue567, and to some extent follow-up of issue851. -> Part of issue567, and to some extent follow-up of issue851. Pull request:
2020-06-09 09:30:59silvansetstatus: in-progress -> reviewing
messages: + msg9343
2020-05-24 16:00:22silvansetmessages: + msg9332
2019-09-26 14:32:44silvansettitle: M&S: get rid of LabelEquivalenceRelation and use label mappings local to TransitionSystem instead -> M&S refactoring part 2: get rid of LabelEquivalenceRelation and use label mappings local to TransitionSystem instead
messages: + msg8967
summary: Part of issue567, and to some extent follow-up of issue851.
2019-09-21 17:35:43silvansetmessages: + msg8964
2019-09-20 21:36:42maltesetmessages: + msg8963
2019-09-19 11:41:38silvansetmessages: + msg8962
2019-09-19 09:40:57jendriksetnosy: + jendrik
2019-09-18 22:04:06silvansetmessages: + msg8961
2019-09-18 15:34:48silvancreate