Title M&S refactoring part 2: get rid of LabelEquivalenceRelation and use label mappings local to TransitionSystem instead
Priority feature Status resolved
Superseder Nosy List atorralba, jendrik, malte, salome, silvan
Assigned To silvan Keywords
Optional summary
Part of issue567, and to some extent follow-up of issue851.

Pull request:

Possible follow-up issue:
- rework EquivalenceRelation which can be the bottleneck of the merge-and-shrink computation in certain tasks

Created on 2019-09-18.15:34:48 by silvan, last changed by silvan.

Part of issue567, and to some extent follow-up of issue851.

Pull request:

Possible follow-up issue:
- rework EquivalenceRelation which can be the bottleneck of the merge-and-shrink computation in certain tasks
msg11020 (view) Author: silvan Date: 2023-02-14.16:37:04
msg11013 (view) Author: silvan Date: 2023-02-10.20:31:10
We decided to go with v24. I merged the latest changes from main and re-ran experiments, also using sbMIASM as merge strategy combined with SCCs.

The results are still very positive, although not as much any more as with v20, which was before we did some refactoring and changed the algorithm for applying label reductions:

With that, we need to write a commit message and are ready to commit the changes of this issue.
msg11000 (view) Author: malte Date: 2023-02-10.13:33:49
How do we proceed? Merge v24?

There is probably more to be done here, but also it doesn't all have to be done in one commit.
msg10997 (view) Author: silvan Date: 2023-02-09.22:59:18
In v25, we hoped to remedy the slowdown of v24 by only sorting label groups on demand, i.e., before removing labels from groups due to label reduction. Unfortunately, things only got a lot worse:
msg10988 (view) Author: silvan Date: 2023-02-09.08:55:41
v22: Move the constructor of LocalLabelInfo to header again.
restores (some of) the runtime performance

v23: I noticed that with the removal of LocalEquivalenceRelation, I (accidentally?) also removed the iterator for TransitionSystem, which resulted in *not* skipping inactive local labels (which do not represent any label anymore). v23 fixes this.
slight decrease of M&S construction time

v24: Always have label groups sorted and when applying label reductions, remove reduced labels in batches instead of individually
Qquite a notable increase of M&S construction time (and thus total time), also slight increase of memory. It seems that sorting is more expensive than what we save by repeated resizing of vectors.
msg10987 (view) Author: silvan Date: 2023-02-08.13:30:34
Malte and I discussed the results and looked a bit at the code. The change that not only fixed the constructor using "move", but also moved it to the implementation file, should be partially reverted by moving the fixed version of it to the header file again. Otherwise, we continue with evaluating more changes.
msg10986 (view) Author: silvan Date: 2023-02-08.09:14:58
After addressing all minor comments, refactoring the iterator class for TransitionSystem to use STL-based iterators and getting rid of a friend declaration in LocalLabelInfo, I tagged version v21 but stumbled over some performance changes:

Note that the constant increase in memory is due to a difference in compilation, probably the inclusion of CPLEX. I re-compiled v20 under the same conditions in the meantime and there is no such increase.

To track the difference in M&S construction time, I tagged more versions:
v20a: after minor refactoring: - no differences
v20b: refactoring the iterator class: - beneficial changes
v20c: getting rid of friend declaration:
- slowed down computation. I really don't know why.
v21: using move in the move constructor - another slowdown; no clue why
msg10960 (view) Author: malte Date: 2023-02-01.23:45:43
From my side, feel free to merge.
msg10958 (view) Author: silvan Date: 2023-02-01.10:27:22
The debug experiments finished without errors:
msg10956 (view) Author: silvan Date: 2023-01-31.19:13:11
Thanks a lot for the comments! I fixed all minor things where no discussion was necessary and replied to a few other comments where we should discuss. 

Regarding EquivalenceRelation, I fixed 1 and 3 (the removal of the latter was an oversight). I'm fine with not dealing with that class in more detail in this issue. I think that we simplified (and thereby improved) it already quite a bit now.
msg10947 (view) Author: malte Date: 2023-01-27.16:02:11
Regarding EquivalenceRelation:

1. It seems that the system #include can (and hence should) be moved from the header to the implementation file.
2. There are parts of the design that I find weird, but it looks like this weirdness was already previously present.
3. It seems that the refine method used to include a comment about its algorithmic complexity, and now this comment is gone. Is there a good reason for this? As a user of such a class, I'd generally like to know such things when they are not obvious.

More generally, I think the class looks in need of revision and also better documentation, especially as we want to keep treating it as a general tool (i.e., have it inside the algorithms directory). But this doesn't necessarily have to be part of this issue.

I cannot really review the class fully at the moment, but if you're happy to proceed without such a review, then we can proceed.
msg10946 (view) Author: malte Date: 2023-01-27.13:58:27
The experiments look good, and I'm done looking at the pull request except that I still need to look at equivalence_relation.{cc,h}.

I left mostly polishing comments that you can ignore for the most part if you prefer, apart from things like the typo and changelog entry. I'll have a look at the equivalence relation class later.
msg10945 (view) Author: silvan Date: 2023-01-27.12:04:43
Interpretation of results: the computation of merge-and-shrink now is significantly faster and requires less memory, with very few exceptions only. This allows the atomic FTS to be computed in 34 more tasks than before. Total runtime is also reduced and a few more tasks can be solved.
msg10803 (view) Author: silvan Date: 2022-07-21.14:50:55
For completeness, experiments comparing main against the latest version:
msg10802 (view) Author: silvan Date: 2022-07-20.10:54:28
This issue is now ready for a final code review.

The main change was to replace LabelEquivalenceRelation, which stored locally equivalent labels in a list<int> and additionally stored an iterator into that list for fast removal during label reduction, by a simple vector<int> without storing an additional iterator. That greatly simplified the code and also made everything use less memory and run faster, except for label reductions in cases where the local label equivalence classes are huge. For example, in agricola-opt18, runtime of label reduction increases significantly because looking for just a few reduced labels in a large local label equivalence class takes very long time.
msg10801 (view) Author: silvan Date: 2022-07-20.08:57:06
I re-ran the latest experiments (after merging from main; which is base-v4), using *no* time limit for the M&S computation, as this made it harder to see the impact of the change. Indeed, the results are a lot clearer now, clearly positive in almost all cases.

v15 (all local labels are made contiguous always):
v16 (local labels are not contiguous and we reserve memory for the maximum number of possible labels):
v17 (as v16, but without reserving memory):

v15 vs v17:

As stated below, I think we should stick with v17 because making local labels contiguous doesn't have benefits and reserving very large amounts of memory has a clear negative impact.

v18 (store the number of active aka. non-reduced labels rather than computing them on demand):
I'm still surprised that this has a slightly negative impact on time rather than on memory (for storing one additional int). 

v19 (use vector plus sort and unique rather than set into vector for computing transitions of reduced labels):
msg10793 (view) Author: silvan Date: 2022-07-19.08:37:19
In v19, we use a vector to collect all transitions of a reduced labels and then use sort and unique. Previously, we used a set and turned this into a vector at the end. Definitely a positive impact:
msg10789 (view) Author: silvan Date: 2022-07-18.16:14:08
v18 keeps track of the number of active (aka. non-reduced) labels in the Labels class rather than computing that number over and over again for each label reduction.

Results look somewhat mixed w.r.t.  Inspecting a few instances where the construction time of merge-and-shrink increases, this seems to be due to our insufficiently fine-grained time limit feature. Sometimes, the previous configuration would just finish a merge/label reduction slightly after 900s (the time limit for M&S), and the new configuration, being slightly faster, continues with the next iteration where label reduction takes a few minutes before the time limit is checked again and M&S computation is terminated. There are also cases where the M&S computation gets slightly slower - this can only be due to fluctuations on the grid. Overall, coverage is not affected and the changes in runtime scores are small so that we can keep the change.
msg10783 (view) Author: silvan Date: 2022-07-18.10:04:54
I merged the latest main branch into this issue and re-ran experiments: base-v4 vs v15

Everything looks good.

Further, I re-evaluated the difference of making all local labels contiguous (v15; introduced in v3) vs. possibly adding a new local label when applying label reduction. In version v16, the code was moved back to not making local labels contiguous, however, reserving memory for the maximum number of labels which is also the (theoretical) maximum number of local labels a transition system could have. This significantly lowers the number of cases where the construction of the atomic FTS succeeds:

In v17, I therefore evaluated the middle ground of not making local labels contiguous, but neither reserving memory, thus accepting a few reallocations when creating new local labels. This version is comparable with v15, slightly faster (expected), but no clear drawbacks:

Since I don't see any direct benefits for making local labels contiguous, I would opt for the latest version which doesn't do this but which also doesn't (unnecessarily) large amounts of memory.
msg10617 (view) Author: silvan Date: 2022-02-21.15:09:56
I need to double-check but if I remember correctly, then I discussed with Florian and Salomé that we wanted to try using bitsets for storing global labels and see if this helps reducing the bottle neck that label reduction can be in some domains. That would mean that the issue is waiting for me and not for you :)
msg10616 (view) Author: malte Date: 2022-02-21.15:00:00
Hi all, what's the status? Silvan, is this waiting for code review? If yes, can you find a reviewer? I'd also like to have a quick look before merging, but things will move much faster if I know someone else approved it already.

The experimental results look fine to me.
msg10441 (view) Author: jendrik Date: 2021-09-22.16:18:22
I'd err on the side of readable code.
msg10440 (view) Author: silvan Date: 2021-09-22.14:58:49
In v14, I implemented a change suggested by Florian: before, TransitionSystem had three vectors to store information about local labels: local_label_to_global_labels, mapping each local label to the group of locally equivalent global labels it represents; local_label_to_cost, mapping each local label to the minimum cost of the represented global labels; local_label_to_transitions, mapping each local label to the set of transitions of the represented global labels. Now, there is a class LocalLabelInfo holding the group of represented global labels, the cost and the transitions, and TransitionSystem has a single vector storing LocalLabelInfo for each local label.

The results suggest that this refactoring is beneficial for the construction time of the heuristic, possibly due to improved cache efficiency (for a local label, its information is now stored together), however at the cost of an increase in memory consumption (I assume because of the custom class which serves as an intermediate wrapper):

I'm not sure if we like this trade-off. The code, however, was improved through this refactoring because it avoids repetitive pieces of code when accessing different information for a given local label. Any thoughts on the results/the refactoring?
msg10439 (view) Author: silvan Date: 2021-09-21.10:55:35
Results after some refactoring; no performance changes:
msg10244 (view) Author: malte Date: 2021-04-09.12:04:55
> Apparently, the overhead of that method, which creates a HashState and calls
> feed, is too large when hashing plain data types like int. I think we should
> check if the planner uses HashMap/HashSet with int or similar data types at
> other places. Also, we should document that using the stl data structures
> directly is preferred in such cases.

I think this is not the first time we discover this. These hash functions are designed with container objects in mind and with an emphasis on distributing as uniformly as possible. They are certainly not cryptographic hash functions, but in the spectrum of "cheap computation vs. uniform distribution", they emphasize uniform distribution more than the typical library implementations.

What makes this even more complicated is that trivial hash functions like
    [](int i) { return i; }
that can be very problematic for distribution (for mod-based hash tables) can be excellent for cache locality in typical scenarios where we see many keys clustered close to each other.

We also have the dedicated IntHashSet type that can be useful in certain contexts, but I think it is primarily designed to be memory-efficient for very large hash tables, not to be fast for small ones. (It is used only by the state registry.)

The question is how best to document things like these. If we just mention this in hash.h, I am not sure we will see it when we need to.
msg10243 (view) Author: silvan Date: 2021-04-09.10:55:21
It turns out that the culprit of the performance loss in v12e was not the simplification of EquivalenceRelation, but the switch from std::unordered_map to utils::HashMap. I got skeptical when "std::uint64_t get_hash64(const T &value)" showed up in a profile. Indeed, reverting this change in v12e fixes the performance loss:
and also mildly improves over 12c, although the scale of changes could be in the range of noise:

Apparently, the overhead of that method, which creates a HashState and calls feed, is too large when hashing plain data types like int. I think we should check if the planner uses HashMap/HashSet with int or similar data types at other places. Also, we should document that using the stl data structures directly is preferred in such cases.

Furthermore, when looking at a profile of the first task of parking-opt14-strips, we noticed that EquivalenceRelation::refine is responsible for 87% of the run-time of the first three merge-and-shrink iterations. It might be worth looking into completely reworking EquivalenceRelation in another issue. I added an item to the summary.
msg10242 (view) Author: silvan Date: 2021-04-06.19:23:37
v12: only refactoring in comparison to v9-fix (reported in the previous comment):
By now, I assume that changes in runtime scores of up to 2 can just be attributed to noise in the sense of code changes that should not impact the performance, but might affect memory layout or whatever else.

v12b: removes the class GlobalLabel (which only contained an int standing for the cost of the label) and GlobalLabels stores vector<int> label_costs directly instead of unique pointer to GlobalLabel:
If anything, I expected a positive impact because of one less layer of indirection and not using unique pointers. But the effect is "negative", in the same amount as v12 compared to v9-fix was positive even though this was pure refactoring.

v12c: add iterators for GlobalLabels which allows iterating over "valid" labels, which, however, is only used during the construction of the atomic FTS:
Basically no effect, was expected.

v12d: Florian and I greatly simplified the class EquivelanceRelation (in algorithms). It only works with ints now, only can be constructed as the universal equivalence relation, and no longer supports "unspecified elements". This allows using it with only valid labels, which should avoid having reduced labels as "unspecified elements" in the equivalence relation.
The results are bad and I don't know (yet) why. We moved previously templated methods to the cc file, maybe this has an effect. We'll need to investigate this further.
msg10241 (view) Author: silvan Date: 2021-04-05.18:49:10
I noticed that the loop that erases old labels from transition systems when applying label reductions unnecessarily considered all labels of a label group instead of stopping after erasing the label. This significantly improves the performance:
msg10238 (view) Author: silvan Date: 2021-03-30.10:24:39
After merging the latest changes from main (v9), the changes in this issue compare less favourably to main (base-v3) than previously, although overall still positively:
(successful constructions for more tasks, less memory consumption, often faster, sometimes significantly slower, however).

Compared to the previous version (v8), the merge-and-shrink construction slows down a bit (ms_construction_time, score_ms_construction_time):
I don't really understand why, as all changes between v8 and v9 are due to changes in main. 

I therefore also compared the two main revisions against each other:
The results of this comparison are very surprising to me because I don't know what could cause this difference. The diff between the two versions only makes cosmetic changes in M&S. The major things we changed in Fast Downward, like the state representation, shouldn't affect the construction of M&S heuristics.

Since the latest version still uses much less memory and only becomes significantly slower in a few domains, I think we can accept that strange performance change. (We cannot do something about the weird change due to main anyway.) Instead, when discussing this issue with Florian and Salomé, we thought it would be a good idea to look at the problematic domains, in particular, agricola, where construction time can be doubled compared to main. 

I did that and found that applying label reductions to transition systems is the culprit: where we previously stored , for each label, the id of its "local label group" and an iterator into that group (a list), we now only store the group id and need to search for the label in its group (now a vector) for removing it. I attempted to improve the situation using set (v10) or HashSet (v11). While this helped in the previously problematic case agricola (to some extent, only), it significantly slowed down the computation in other domains:
So I think using vectors (v9) is the right direction and we should discuss if we can live with slower construction times in a few domains. Alternatively, we could (again) store an iterator into a label group for each label, but I didn't like that previous part of the code too much, so I would rather try to avoid it.

Florian, Salomé and I also discussed if reverting the change that all "local label ids" (i.e., the label groups) are stored contiguously (added in v3). This might trade some memory usage for runtime, although back then we didn't observe too much penalty for enforcing the contiguous storing of label groups.
msg9714 (view) Author: malte Date: 2020-09-17.09:23:09
Sounds good to me. The cost partitioning thing is not a blocker or an overriding concern, just a thing I suggest we keep in mind. It is a goal to open these things up a bit more given how much the BDE project focuses on this aspect.
msg9713 (view) Author: silvan Date: 2020-09-16.18:38:01
The new implementation would make implementing optimal cost partitioning easier. For SCP, it would probably not help but could make it more complicated as we need to compute the remaining costs of the actual labels, not some labels "local" to a transition system.

Nevertheless, I'm not sure if this consideration (regarding cost partitioning) should be a central concern in this issue because I think that the change in this issue makes the code related to transition systems easier in general. But of course, getting someone else to review this first would be good. I hoped Alvaro could have a look into this since he also would like to see this change, I think, but he might be too busy in Denmark now. I'll ask Florian/Thomas if they can find some time.
msg9712 (view) Author: malte Date: 2020-08-31.17:53:51
Hi Silvan, to move this forward, it would be good if someone else could look at the  code first.

This is currently an implementation detail of merge-and-shrink, but it may become more visible when we add more cost partitioning or operator-counting constraint features. Does the new representation help with that, hinder that, or is it neutral? Would it make sense to get someone else who was included in cost partitioning and merge-and-shrink to review?
msg9343 (view) Author: silvan Date: 2020-06-09.09:30:59
I evaluated the current implementation (with all improvements together) after merging from default:
msg9332 (view) Author: silvan Date: 2020-05-24.16:00:21
I looked into this again today but couldn't actually find anything else I would like to do in this issue. So now, with an unnecessary delay of 8 months :-), I'd like to ask for a review. Or, as a first step, for ideas of how to simplify reviewing in this case. The pull request list the following changes:

+0 -59
M src/search/algorithms/
+64 -1
M src/search/algorithms/equivalence_relation.h
+3 -7
M src/search/merge_and_shrink/
+8 -9
M src/search/merge_and_shrink/
+5 -5
M src/search/merge_and_shrink/factored_transition_system.h
+69 -47
M src/search/merge_and_shrink/
+72 -0
A src/search/merge_and_shrink/
+47 -0
A src/search/merge_and_shrink/global_labels.h
+0 -134
D src/search/merge_and_shrink/
+0 -122
D src/search/merge_and_shrink/label_equivalence_relation.h
+3 -4
M src/search/merge_and_shrink/
+0 -62
D src/search/merge_and_shrink/
+0 -46
D src/search/merge_and_shrink/labels.h
+4 -1
M src/search/merge_and_shrink/
+1 -2
M src/search/merge_and_shrink/
+1 -3
M src/search/merge_and_shrink/
+270 -129
M src/search/merge_and_shrink/
+50 -52
M src/search/merge_and_shrink/transition_system.h 

The files labels.* and global_labels.* should be reviewed next to each other (the new class is a bit simpler) and some of the logic of what used to be in label_equivalence_relation is now part of transition_system (and fts_factory, which creates transition systems), however also modified as described below. The change to equivalence_relation is due to making refine being a templated method (therefore in the header). The rest are only small changes required to adapt to the changed interface of transition systems.
msg8967 (view) Author: silvan Date: 2019-09-26.14:32:44
Added link to meta issue567.

I just found that issue851 can be seen as a predecessor of this issue. To quote its summary: 
Possible further improvements: do not store information for all labels in each
transition system, but only for the relevant ones. This requires either a
concept of local label numbers and a mapping between global and local labels,
or using hash maps instead of vectors whenever storing information per label,
leaving out irrelevant labels.
msg8964 (view) Author: silvan Date: 2019-09-21.17:35:43
I'll you know, yes.

In step 3, after reducing labels and shrinking, we always make sure that there are no gaps in vector<vector<Transition>>, i.e., we have a contiguous "numbering" of "local labels". 

The effect can be both negative or positive, especially in large instances, sometimes reducing memory and speed, sometimes trading the former for the latter:
msg8963 (view) Author: malte Date: 2019-09-20.21:36:42
Hi Silvan, I'm curious, but lack the time, so I'll stay quiet for the time being.  But sounds exciting! Let us know when something needs to be done in terms of reviewing etc.
msg8962 (view) Author: silvan Date: 2019-09-19.11:41:38
In a small second step, I did some more minor refactoring to avoid having a global label mapping from global to local labels, but instead having it individually for each transition system. No large changes:
msg8961 (view) Author: silvan Date: 2019-09-18.22:04:06
The results of the first step are in. The only change is to replace LabelEquivalenceRelation (which uses a hash map to map global labels to local label group ids and a list for said groups) by a combination of global to local and local to global label mappings in each transition system, only using vectors.

There is a clear positive performance trend and we can finish the computation of atomic FTS and the whole heuristic in more cases than before. (The attributes ms_out_of_memory and ms_out_of_time were not correctly parsed as of this writing; the report will be updated tomorrow.)

Here's a pull request if someone is already curious:
msg8960 (view) Author: silvan Date: 2019-09-18.15:34:48
In a recent discussion with Álvaro, we came to the conclusion that it would be worth trying to change the representation of transition systems. In particular, getting rid of LabelEquivalenceRelation and its underlying data structure of using a hash map from labels to local label group ids, and using lists for label groups. Using the notion of "local labels" and vector<int> to store both a mapping from "global" to local labels and vice versa could not only help simplify the code greatly, but also potentially improve performance.

In a first step, we will replace LabelEquivalenceRelation by appropriate mappings (vector<int>) as just described. Later, we could also try to always keep local and/or global label numbers contiguous and thereby get rid of the need for testing if a global label number is valid or not (because it has been reduced). In principle, we could also attempt the same thing for transition systems in factored transition systems. Finally, what is up for debate is whether we want to not store transitions of irrelevant labels, like it was the case years ago before the introduction of generalized label reduction.
Date User Action Args
2023-02-14 16:37:04silvansetstatus: reviewing -> resolved
messages: + msg11020
2023-02-10 20:31:10silvansetmessages: + msg11013
2023-02-10 13:33:49maltesetmessages: + msg11000
2023-02-09 22:59:18silvansetmessages: + msg10997
2023-02-09 08:55:41silvansetmessages: + msg10988
2023-02-08 13:30:34silvansetmessages: + msg10987
2023-02-08 09:14:58silvansetmessages: + msg10986
2023-02-01 23:45:43maltesetmessages: + msg10960
2023-02-01 10:27:22silvansetmessages: + msg10958
2023-01-31 19:13:11silvansetmessages: + msg10956
2023-01-27 16:02:11maltesetmessages: + msg10947
2023-01-27 13:58:27maltesetmessages: + msg10946
2023-01-27 12:04:43silvansetmessages: + msg10945
2022-07-21 14:50:55silvansetmessages: + msg10803
2022-07-20 10:54:28silvansetmessages: + msg10802
2022-07-20 08:57:06silvansetmessages: + msg10801
2022-07-19 08:37:19silvansetmessages: + msg10793
2022-07-18 16:14:08silvansetmessages: + msg10789
2022-07-18 10:04:54silvansetmessages: + msg10783
2022-02-21 15:09:56silvansetmessages: + msg10617
2022-02-21 15:00:01maltesetmessages: + msg10616
2021-09-22 16:18:22jendriksetmessages: + msg10441
2021-09-22 14:58:49silvansetmessages: + msg10440
2021-09-21 10:55:36silvansetmessages: + msg10439
2021-04-09 12:04:55maltesetmessages: + msg10244
2021-04-09 10:55:22silvansetmessages: + msg10243
summary: Part of issue567, and to some extent follow-up of issue851. Pull request: -> Part of issue567, and to some extent follow-up of issue851. Pull request: Possible follow-up issue: - rework EquivalenceRelation which can be the bottleneck of the merge-and-shrink computation in certain tasks
2021-04-06 19:23:38silvansetmessages: + msg10242
2021-04-05 18:49:10silvansetmessages: + msg10241
2021-03-30 10:24:41silvansetnosy: + salome
messages: + msg10238
2020-09-17 09:23:10maltesetmessages: + msg9714
2020-09-16 18:38:02silvansetmessages: + msg9713
2020-08-31 17:53:52maltesetmessages: + msg9712
2020-07-24 14:23:54silvansetsummary: Part of issue567, and to some extent follow-up of issue851. -> Part of issue567, and to some extent follow-up of issue851. Pull request:
2020-06-09 09:30:59silvansetstatus: in-progress -> reviewing
messages: + msg9343
2020-05-24 16:00:22silvansetmessages: + msg9332
2019-09-26 14:32:44silvansettitle: M&S: get rid of LabelEquivalenceRelation and use label mappings local to TransitionSystem instead -> M&S refactoring part 2: get rid of LabelEquivalenceRelation and use label mappings local to TransitionSystem instead
messages: + msg8967
summary: Part of issue567, and to some extent follow-up of issue851.
2019-09-21 17:35:43silvansetmessages: + msg8964
2019-09-20 21:36:42maltesetmessages: + msg8963
2019-09-19 11:41:38silvansetmessages: + msg8962
2019-09-19 09:40:57jendriksetnosy: + jendrik
2019-09-18 22:04:06silvansetmessages: + msg8961
2019-09-18 15:34:48silvancreate