Title: Refactor landmark code in connection to exploration class
Priority: wish
Status: resolved
Superseder:
Nosy List: clemens, jendrik, malte, remo, salome, silvan, thomas
Assigned To:
Keywords:
Summary:

Created on 2022-02-08.17:17:54 by clemens, last changed by salome.

File name Uploaded Type
callgrind.out.base-nomystery salome, 2022-04-25.14:19:08 application/octet-stream
callgrind.out.v2-nomystery salome, 2022-04-25.14:19:12 application/octet-stream
lmgraph_gen_time_relative-v3-opt.pdf salome, 2022-05-10.14:34:29 application/pdf
lmgraph_gen_time_relative-v3-sat.pdf salome, 2022-05-10.14:34:22 application/pdf
time_relative-soplex.pdf salome, 2022-04-26.10:09:35 application/pdf
time_relative-v3-opt.pdf salome, 2022-05-10.14:34:08 application/pdf
time_relative-v3.1-opt.pdf salome, 2022-05-23.17:23:16 application/pdf
time_relative.pdf salome, 2022-04-25.14:18:59 application/pdf
msg10730 (view) Author: salome Date: 2022-06-01.16:35:52
The issue is merged now.
msg10729 (view) Author: malte Date: 2022-05-30.11:37:02
I think this looks good to be merged.
msg10722 (view) Author: salome Date: 2022-05-24.20:38:15
You might be onto something. There is a cache that reuses old builds, and in my experiments it used the same build for base while we had these weird results. I manually deleted this cache when I reran v3-opt because I needed to rebuild the planner with both cplex and soplex.
(The v2 soplex run was done in a different directory without using the cache and with a build that only had soplex.)
msg10721 (view) Author: malte Date: 2022-05-24.19:35:24
> I also again attached a relative scatterplot (time_relative-v3.1-opt.pdf) It
> seems that the systematic slowdown is gone now... I'm completely baffled. The
> only thing I can think of now is if the randomization of the runs is not
> random enough which for some reason negatively affected one configuration, but
> it doesn't make sense either...

That's not it, unless Jendrik made changes to lab since the last time we checked this. From what I recall, you can easily verify yourself from the scripts that lab produces how the randomization works and that it indeed is successful at ensuring that there is no systematic bias of certain task IDs running in a certain order. Besides, the very clear patterns we saw in the scatter plots in my view don't fit the kind of behavior we would expect from these kinds of grid influences, it very much looks like something that comes from the planner process itself.

It's more likely some kind of change in the grid environment, the compilers, or the factors that affect how CPLEX is loaded.

@Jendrik: is there still a cache that reuses old builds if they are available? That could cause systematic errors even in clean experiments if there were compiler changes in the meantime.

In any case, I agree with Clemens. At least in principle -- I haven't had time to look at the data myself yet, and perhaps I should do this first; but I cannot do it right now, I only have a weak mobile Internet connection at the moment.
msg10720 (view) Author: clemens Date: 2022-05-23.17:56:09
I don't really know what to make of this... My gut feeling tells me to just merge and forget about this, but that doesn't sound like a very satisfying thing to do. Nevertheless, we've tried a lot of things and couldn't find any useful traces so far, so I don't see any big issues with this procedure given the newest results.

I've added a changelog entry.
msg10719 (view) Author: salome Date: 2022-05-23.17:23:16
I reran v3-optimal with both cplex and soplex. The results are in the same link as the old v3 experiment:

I also again attached a relative scatterplot (time_relative-v3.1-opt.pdf). It seems that the systematic slowdown is gone now... I'm completely baffled. The only thing I can think of now is that the randomization of the runs is not random enough and for some reason negatively affected one configuration, but that doesn't make sense either...
msg10718 (view) Author: malte Date: 2022-05-23.13:56:21
We discussed offline to add configurations using Soplex instead of CPLEX to the experiment. Please rerun everything in a single experiment, so that we can also compare numbers between Soplex and CPLEX. (Data from experiments that were not run together generally cannot be trusted because they have systematic errors based on what else happened on the grid at the same time.)

If the Soplex numbers are OK, I suggest we live with the bad CPLEX numbers. So then the final step would be to write the changelog entry and merge.
msg10717 (view) Author: salome Date: 2022-05-10.14:34:08
I addressed the issue with increased landmark generation time and ran another round of experiments:

Overall the results look good, landmark generation time in settlers is now faster, and in the other domains we are either faster or the slowdown seems to be random noise. The two attached relative scatter plots (lmgraph_gen_time_relative-v3-{opt/sat}.pdf) also confirm this in my eyes.

The LP issue still persists, and even got worse. For seq-opt-bjolp-opt, we now have a coverage loss of 8, and a more or less consistent slowdown of around 10% in search time (see attached scatterplot time_relative-v3-opt.pdf). Between v2 and v3, only the two classes Exploration and LandmarkFactoryRelaxation were touched, which I see as further evidence that the slowdown is out of our control.

My suggestion would be to write the changelog entry and then merge the issue, probably with Malte taking a brief look over the code first. But if anyone has any ideas what we can do about the LP issue I'm happy to try it out.
msg10712 (view) Author: salome Date: 2022-05-04.12:48:35
On a different note: When Clemens was about to write the changelog entry, he noticed that in some domains, the landmark generation time goes up significantly. The most extreme example is the domain settlers-sat18-adl, especially for the lama configurations, where the time increases by an order of magnitude.

I already (most likely) found the reason: We changed the way we check whether a UnaryOperator is excluded because it makes an excluded Proposition true. Long story short, we loop over UnaryOperators instead of the original operators, which is highly inefficient when an operator has many effects (in the problem I looked at, it was usually between 70 and 120). I will reimplement this part and run another round of experiments.
msg10711 (view) Author: salome Date: 2022-05-04.12:44:05
We continued looking into the LP issue and verified (for the nomystery task) that in both configurations the LP solver obtains the exact same constraints (in the same order) and builds identical LPs. We also checked whether the time difference actually comes from the LP by measuring the time spent in lp.load() and lp.solve() with two separate timers. We could confirm this in two tasks:

nomystery p06:
Load Timer: 0.28939s (v2) vs  0.210336s (base)
Solve Timer: 1.20449s (v2) vs 0.910594s (base)
Actual search time: 1.72785s (v2) vs 1.3478s (base)

scanalyzer p11: 
Load Timer: 16.8642s (v2) vs 13.5181s (base)
Solve Timer: 70.6282s (v2) vs 60.9222s (base)
Actual search time: 113.912s (v2) vs 100.667s (base)

We also profiled cache misses on nomystery, and v2 seems to have fewer cache misses than base. This is probably connected to the fact that when I profiled the problem with callgrind, v2 was also marginally faster (although here it was a bit slower: 59.2s search time for v2 vs 58.1s for base). I don't really know how to interpret the output, so I just copy-paste it here:

==12535== I   refs:      13,256,187,515
==12535== I1  misses:         3,548,016
==12535== LLi misses:            18,053
==12535== I1  miss rate:           0.03%
==12535== LLi miss rate:           0.00%
==12535== D   refs:       6,271,138,087  (4,358,546,819 rd   + 1,912,591,268 wr)
==12535== D1  misses:       219,927,982  (  145,307,212 rd   +    74,620,770 wr)
==12535== LLd misses:           218,227  (       48,963 rd   +       169,264 wr)
==12535== D1  miss rate:            3.5% (          3.3%     +           3.9%  )
==12535== LLd miss rate:            0.0% (          0.0%     +           0.0%  )
==12535== LL refs:          223,475,998  (  148,855,228 rd   +    74,620,770 wr)
==12535== LL misses:            236,280  (       67,016 rd   +       169,264 wr)
==12535== LL miss rate:             0.0% (          0.0%     +           0.0%  )

==16500== I   refs:      13,261,760,492
==16500== I1  misses:         3,606,991
==16500== LLi misses:            18,080
==16500== I1  miss rate:           0.03%
==16500== LLi miss rate:           0.00%
==16500== D   refs:       6,272,656,549  (4,358,012,215 rd   + 1,914,644,334 wr)
==16500== D1  misses:       220,968,487  (  146,286,892 rd   +    74,681,595 wr)
==16500== LLd misses:           226,172  (       43,790 rd   +       182,382 wr)
==16500== D1  miss rate:            3.5% (          3.4%     +           3.9%  )
==16500== LLd miss rate:            0.0% (          0.0%     +           0.0%  )
==16500== LL refs:          224,575,478  (  149,893,883 rd   +    74,681,595 wr)
==16500== LL misses:            244,252  (       61,870 rd   +       182,382 wr)
==16500== LL miss rate:             0.0% (          0.0%     +           0.0%  )

To summarize, we are 99% certain that CPLEX is "at fault" and that we feed it exactly the same data, but we have no explanation why such a systematic difference can happen. Given that pretty much the only thing the code *does* change is the Exploration object, I could try out what happens if we delete it before the search (in both base and v2). But I'm not sure what (if any) information we could get out of this.
msg10710 (view) Author: salome Date: 2022-04-26.10:09:35
The soplex experiments are done:

We still lose 3 tasks; for these, the base config required 295.64, 285.02 and 290.27 seconds respectively to solve them. Looking at the attached scatterplot (time_relative-soplex.pdf), the performance variation looks much more random now, so I'd conclude that the trend we saw was indeed due to the LP solver.

I will ask Florian if he has any ideas off the top of his head what we can do to improve CPLEX's performance, but I could imagine that there is not too much we can do about it...
msg10709 (view) Author: salome Date: 2022-04-25.17:57:46
I didn't test it, but they should get exactly the same LPs. We don't change anything in the search or landmark graph. The only thing that is different after the preprocessing is the Exploration object (it has fewer/different members).

In the profiles for the nomystery problem about 2/3 is spent in the LP. I installed Soplex on the grid and local tests on the nomystery problem yield no difference between base and v2. I will start a full experiment with soplex (only this configuration since the others don't use LPs).
msg10708 (view) Author: malte Date: 2022-04-25.14:32:59
Do we feed the same LPs, or essentially the same LPs? Is the bulk of the time spent inside the LP solver? A simple thing we can do is swap out the LP solver (use Soplex instead of CPLEX or vice versa) and see if we see the same trends.
msg10707 (view) Author: salome Date: 2022-04-25.14:18:59
I tried to further look into the seq-opt-bjolp-opt results. The attached relative scatterplot shows that there is indeed a systematic overhead for v2. I then looked at the most extreme point (with ~45% worse runtime), which is nomystery-opt08-strips/p06. I could reproduce the worse performance locally on my laptop and on the grid login node (see below), but when I ran a profiler the v2-version was only marginally slower (base: 14 571 278 551, v2: 14 574 852 506), and SearchEngine::search() is actually faster on v2 by a tiny margin. I will also attach the profiles.

I'm not sure where to go from here. Is there anything more we can glean from the profiles? I'm wondering if it might be some oddity with the LP solver (since this is the only configuration that uses LPs). Any ideas?

Local runs of nomystery-opt08-strips/p06 on the grid login node (runtime in s):

msg10697 (view) Author: malte Date: 2022-04-11.19:28:52
I think the results for the seq-opt-bjolp-opt configuration are weird enough that we should try to understand a bit better what is going on there.
msg10696 (view) Author: salome Date: 2022-04-11.17:24:26
I found the bug and reran the experiments:

To me the results look good. We gain a bit of landmark generation time and a bit of memory.

In the optimal setting for bjolp with optimal cost partitioning, v2 was a bit unlucky and lost two tasks that base solved with less than 10 seconds to spare (5 minute timeout). What seemed a bit strange to me was that for this configuration v2 also performed worse across the board in the search time score, while for the others v2 was consistently better. Given that the differences are quite small in any case, this might however just be noise.

I will address Clemens' comments tomorrow, but afterwards this issue is ready to be merged from my side.
msg10694 (view) Author: clemens Date: 2022-04-08.17:46:15
I also reviewed the changes and didn't spot anything that needs to be changed in my opinion. Except: there is no changelog entry yet. ;-)
msg10693 (view) Author: remo Date: 2022-04-04.16:17:53
I looked at the code and could not identify any notable issues. I added some minor comments on GitHub.
msg10692 (view) Author: salome Date: 2022-04-04.16:05:21
The results are in:

While there are some promising aspects (landmark generation time and memory goes down), we also obtain new segfaults in miconic-fulladl in the satisficing setting. Guess I'll go bug-hunting :).
msg10684 (view) Author: clemens Date: 2022-04-04.10:59:15
I planned to review this sometime soon. Unfortunately, this week is already pretty packed for me due to upcoming deadlines and the regular day to day activity during the semester, so I cannot predict when I'll have a chance to look at this.

Moreover, since I was involved in some of the decisions and implemented some changes myself, it might make sense to have somebody else look into it as well. Maybe Remo would be willing to have a look at the changes? He has been involved when we kicked the refactoring of the exploration off, so he already knows some relevant background.
msg10680 (view) Author: malte Date: 2022-04-02.13:05:44
I responded to the comments on GitHub, but did not yet do a code review. Perhaps someone else wants to do a code review first? Otherwise I hope I can do one next week if you remind me.
msg10679 (view) Author: salome Date: 2022-04-01.16:35:55
Clemens and I refactored the Exploration class according to the points stated here, as well as some other minor changes.

The pull request can be found here:
I added some small comments already that might be worth discussing.

I also started some experiments and will link them here as soon as they finish.
msg10547 (view) Author: malte Date: 2022-02-08.18:18:54
Another thing to note is that we do an h^max exploration but never use the actual h^max values, only the infinite/finite distinction. I would suggest to:

- Change the interface so that it only returns the boolean reachability information that the callers are actually interested in, and

- Change the implementation so that it doesn't keep track of these costs but only of reachability. (Note: we already have a similar qualitative reachability analysis in LM-Cut.)

- We should also be as clear as possible about the ways in which compute_reachability_with_excludes is exact and the cases in which it is approximate, and perhaps change the names of functions like "relaxed_task_solvable" that are approximate to reflect that they are approximate (and how).
msg10541 (view) Author: clemens Date: 2022-02-08.17:17:54
While working on issue1041, we took some time to look into the *landmarks::Exploration* class. We should act on some ideas to refactor it while our understanding of the code is not forgotten. After resolving issue1044 (removing dead code) we should address the following things:

- Initialize *lvl_var* inside *Exploration::compute_reachability_with_excludes()* instead of outside.
- Extend *excluded_op_ids* with all operators that achieve any of *excluded_props* unconditionally. This can be done along the lines of *LandmarkFactoryRelaxation::achieves_non_conditional()* which is currently the only place where this happens. The only other context where *compute_reachability_with_excludes()* is used is in *LandmarkFactoryRelaxation::is_causal_landmarks()* where *excluded_props* is empty (and therefore no operators need to be excluded for this reason).
Date User Action Args
2022-06-01 16:35:52salomesetstatus: chatting -> resolved
messages: + msg10730
2022-05-30 11:37:02maltesetmessages: + msg10729
2022-05-24 20:38:15salomesetmessages: + msg10722
2022-05-24 19:35:24maltesetmessages: + msg10721
2022-05-23 17:56:09clemenssetmessages: + msg10720
2022-05-23 17:23:16salomesetfiles: + time_relative-v3.1-opt.pdf
messages: + msg10719
2022-05-23 13:56:21maltesetmessages: + msg10718
2022-05-10 14:34:29salomesetfiles: + lmgraph_gen_time_relative-v3-opt.pdf
2022-05-10 14:34:22salomesetfiles: + lmgraph_gen_time_relative-v3-sat.pdf
2022-05-10 14:34:08salomesetfiles: + time_relative-v3-opt.pdf
messages: + msg10717
2022-05-04 12:48:35salomesetmessages: + msg10712
2022-05-04 12:44:05salomesetmessages: + msg10711
2022-04-26 10:09:35salomesetfiles: + time_relative-soplex.pdf
messages: + msg10710
2022-04-25 17:57:46salomesetmessages: + msg10709
2022-04-25 14:32:59maltesetmessages: + msg10708
2022-04-25 14:19:12salomesetfiles: + callgrind.out.v2-nomystery
2022-04-25 14:19:08salomesetfiles: + callgrind.out.base-nomystery
2022-04-25 14:18:59salomesetfiles: + time_relative.pdf
messages: + msg10707
2022-04-11 19:28:52maltesetmessages: + msg10697
2022-04-11 17:24:26salomesetmessages: + msg10696
2022-04-08 17:46:15clemenssetmessages: + msg10694
2022-04-04 16:17:53remosetmessages: + msg10693
2022-04-04 16:05:21salomesetmessages: + msg10692
2022-04-04 10:59:15clemenssetmessages: + msg10684
2022-04-02 13:05:44maltesetmessages: + msg10680
2022-04-01 16:35:56salomesetmessages: + msg10679
2022-02-09 09:43:31silvansetnosy: + silvan
2022-02-08 18:18:54maltesetmessages: + msg10547
2022-02-08 17:34:43thomassetnosy: + thomas
2022-02-08 17:17:54clemenscreate