Issue 1124: Improve SoPlex performance - Fast Downward issue tracker

Title	Improve SoPlex performance
Priority	wish	Status	chatting
Superseder		Nosy List	florian, jendrik, malte, silvan
Assigned To	florian	Keywords
Optional summary

Created on 2023-10-10.18:04:36 by florian, last changed by florian.

Messages
msg11462 (view)	Author: florian	Date: 2023-10-17.12:12:36
I ran experiments with diverse potential heuristics, an operator-counting heuristic, and the landmark CP heuristic. For these heuristics, I'm comparing 5 versions: * the old code running OSI and SoPlex 3.1.1, * the current code running SoPlex 6.0.3, a version running SoPlex 6.0.4* with the fix that avoids storing/clearing temporary information, * the version above that additionally stores the basis after every solve and restores it afterwards, and * one version that stores the basis and additionally switches off the scaler. For the last two versions, the assumption was that whenever the dimensions of the LP match the previous solve, it is OK to recover the previous basis. Unfortunately we got segfaults and changed expansion numbers from the landmark configuration in those cases, so maybe this is not the case. Switching off the scaler might still be positive but we cannot evaluate it with this experiment if the basis restoration is broken. For diverse potential heuristics, we already got a big performance improvement from switching off OSI (+24 tasks), the remaining changes only show minor changes in this config. Coverage and time score actually go down by ~2 with the fix in SoPlex. This doesn't make sense for the fix alone because it only avoids unnecessary effort but there are other changes going from 6.0.3 to 6.0.4. This config in general is somewhat hard to predict because it only solves LPs at the beginning and there is a non-deterministic choice of which optimal LP solution is used as a potential function. The number of expansions and the search behavior are thus different, making it more difficult to compare versions. Overall, I'd say with this config, both of the non-OSI versions are fine. Report: https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1124/data/issue1124-v2-eval/report-diverse-potentials.html For the landmark heuristic, I don't have results for the OSI version because of the name change of the heuristic. Looking at the SoPlex fix that avoids storing unnecessary data shows only a marginal improvement (coverage goes up by 2 but there are some domains with one additional task solved, some with one fewer. The time score is roughly the same. The memory score improves but this configuration doesn't normally run out of memory). Overall, this looks like noise to me. Report: https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1124/data/issue1124-v2-eval/report-lm-cp.html The operator-counting heuristic was the starting point for this issue, since we lost 13 tasks when switching off OSI. While memory performance increased a lot, this didn't change much as the config also rarely runs out of memory. The lost tasks all timed out. With the SoPlex fix, things speed up somewhat and we recover 3 of the lost tasks. Storing the basis seemed to be OK here (same number of expansions everywhere) and increased the overall time score by almost 7 (without an effect on coverage unfortunately). Additionally switching off the scaler also has a positive effect on runtime (+2 total time score, +2 coverage). Interestingly, the time score of the final config (665.33) is higher than that of the OSI config (661.36) but the number of timeouts is higher. My guess is that the impact of restoring the basis is very dependent on the domain. For example, it helps a lot in Miconic, where it increases the time score by 4 without solving additional tasks. Report: https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1124/data/issue1124-v2-eval/report-seq-lmcut.html Plot showing the effect on runtime in Miconic: https://ai.dmi.unibas.ch/_visualizer/?xattribute=total_time&yattribute=total_time&properties_file=https%3A%2F%2Fai.dmi.unibas.ch%2F_experiments%2Fai%2Fdownward%2Fissue1124%2Fdata%2Fissue1124-v2-eval%2Fproperties.tar.xz&groupby=domain&xsize=1000&ysize=1000&entries_list=soplex604x-basis-noscale-seq-lmcut+osi-seq-lmcut+seq-lmcut&y_range=%5B0.4580125258299728%2C+11.0%5D&x_range=%5B0.009000000000000001%2C+1100.0%5D&relative=true
msg11448 (view)	Author: florian	Date: 2023-10-10.18:04:36
When profiling SoPlex in issue1076, we noticed that its performance degraded compared to the version using OSI. Together with the OSI developers, I traced this down to a couple of issues: * The new code spends a lot of time in the "scaler" (a way to multiply all coefficients by a constant to get better numerical stability). This can be switched off and improved performance in the instance I used for profiling. * SoPlex cleared some temporary data in cases where this was not necessary. This is fixed by https://github.com/scipopt/soplex/pull/20 The pull request is merged but not released yet. * OSI (or the old SoPlex version) seems to warm-start in cases where the new SoPlex cannot do this. In particular, this happened in an operator-counting configuration with landmark constraints. From one solve to the next bounds were changed (not an issue for warm starts) but we also remove all landmark constraints and add new ones. In the example I looked at, these landmark constraints didn't change in a lot of cases (so we added the same constraints we removed). The new SoPlex invalidates the current solution and starts from scratch, while the old one could solve the LP with 0 simplex iterations (i.e., the old basis immediately led to an optimal solution). A quick fix for this is to actively store the basis after every solve and restore it before the next solve if the dimensions of the problem still match. This also improved performance in the instance where I profiled this. I plan to run some experiments to see if the performance increase holds up.

History
Date	User	Action	Args
2023-10-17 12:12:36	florian	set	messages: + msg11462
2023-10-17 12:12:16	florian	set	messages: - msg11461
2023-10-17 12:12:00	florian	set	status: unread -> chatting messages: + msg11461
2023-10-10 18:04:36	florian	create