Issue1076

Title: Replace OSI with our own interface to LP/MIP solvers
Priority: feature
Status: resolved
Superseder:
Nosy List: florian, gabi, jendrik, malte, silvan, thomas
Assigned To: florian
Keywords:

Created on 2023-01-27.21:08:30 by florian, last changed by florian.

Summary
Pull Request:
* https://github.com/aibasel/downward/pull/151

Related issues:
* issue926
* issue972
* issue1010
* issue1047
Messages
msg11139 (view) Author: florian Date: 2023-07-20.16:04:30
The wiki is up to date now.
msg11136 (view) Author: florian Date: 2023-07-20.12:41:30
We decided to accept the lost performance of SoPlex and merged the code.

The remaining step is to update the wiki. We will update the GitHub actions in issue1093.
msg11114 (view) Author: florian Date: 2023-07-17.09:29:53
I ran experiments comparing the SoPlex interface to its OSI counterpart. In some preliminary results posted on Discord we lost hundreds of tasks in coverage. A profile showed that a lot of time was spent in the simplifier, which did not show up in the profile of the old code at all. I switched it off and this improved the situation in all configs. The preliminary results also had suboptimal plans, but this was due to a bug that is now fixed. With these changes, the coverage of diverse potential heuristics actually increased by 28 tasks, with clear improvements in time and memory usage for most tasks. However, the coverage of lmcount and operator-counting both dropped (-9 and -11 tasks) due to slower LP-solving times.

I profiled sokoban-opt08-strips/p04.pddl with operator counting to find out what is going on. The old interface spent 9.6B ticks evaluating the heuristic, whereas the new interface spent 16.2B. In both cases, 2.8B of those are accounted for by computing LM-cut landmarks. The OSI-based code then spent most of the remaining time solving the LP (6.1B ticks), while the new code uses more time in two places:

1) changing the bounds: 2.3B ticks are spent changing constraint bounds, and 2.1B of those are in a function SLUFactorRational::clear(). I think this is related to solving the LP exactly over the rationals. SoPlex can solve the LP both over the reals and over the rationals and can keep both representations synced. I explicitly switched off the rational mode and the syncing (see the sketch after point 2), but clear() is called unconditionally. I later found out that these settings are off by default anyway, so my change to switch them off explicitly did nothing.

2) solving: The new interface spends 10.8B ticks on solving where the old one only used 6.1B ticks. I cannot trace this down further because the old interface used the legacy interface of SoPlex <= 3, while we require the current main branch (> 6.0.3) to be compatible with C++20. The functions called are different, but as far as I can tell, the main time loss comes from a place where the new code stores a discovered solution (SoPlexBase::_storeSolutionReal, 1.3B ticks) and from the part where the simplex algorithm makes a variable leave the basis (SoPlexBase::leave, 5.0B ticks vs. 1.2B with OSI).
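
For reference, the settings mentioned under 1) can be toggled through SoPlex's parameter interface. A minimal sketch (these parameter names exist in SoPlex's public API, but the exact calls in our interface may differ):

    #include "soplex.h"

    void configure(soplex::SoPlex &lp) {
        // Disable the presolver/simplifier that dominated the profile.
        lp.setIntParam(soplex::SoPlex::SIMPLIFIER,
                       soplex::SoPlex::SIMPLIFIER_OFF);
        // Solve over the reals only and do not keep a rational
        // representation of the LP in sync (the defaults anyway).
        lp.setIntParam(soplex::SoPlex::SOLVEMODE,
                       soplex::SoPlex::SOLVEMODE_REAL);
        lp.setIntParam(soplex::SoPlex::SYNCMODE,
                       soplex::SoPlex::SYNCMODE_ONLYREAL);
    }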
msg11097 (view) Author: malte Date: 2023-07-03.18:06:54
Awesome, great news!
msg11096 (view) Author: florian Date: 2023-07-02.00:07:38
Domenico and I profiled this and found three differences from OSI that slowed things down in our code (a rough sketch of the three fixes follows below):

1) We forced dual simplex, whereas OSI left the choice of primal/dual simplex up to CPLEX. It turns out dual simplex is a bad choice for diverse potential heuristics because we cannot warm start after changing the objective function. The solution was to also leave the choice up to CPLEX.

2) When loading an LP into CPLEX, OSI uses a column-wise representation, which matches the internal data structures of CPLEX, whereas we used a row-wise representation. Again, the solution was to switch to OSI's approach.

3) Finally, OSI keeps a local copy of constraint upper and lower bounds. Looking them up through the interface was noticeably slowing things down in a profile, so I also switched to using a local copy. This makes for slightly more complex code because we have to keep our local copy up to date, but it paid off overall (8 additional tasks solved with lmcount).
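
To make the three fixes concrete, here is a minimal sketch against the CPLEX C callable library. CPXsetintparam and CPXaddcols are real CPLEX functions, but the helper names and the cache class are illustrative, not the exact code from the pull request:

    #include <cplex.h>
    #include <vector>

    // (1) Let CPLEX choose between primal and dual simplex instead of
    // forcing dual simplex, which cannot warm start after objective changes.
    void use_automatic_lp_method(CPXENVptr env) {
        CPXsetintparam(env, CPXPARAM_LPMethod, CPX_ALG_AUTOMATIC);
    }

    // (2) Load the LP column by column (CPXaddcols matches CPLEX's
    // internal column-wise data structures) instead of row by row.
    void add_column(CPXENVptr env, CPXLPptr lp, double objective,
                    const std::vector<int> &rows,
                    const std::vector<double> &coefficients,
                    double lower_bound, double upper_bound) {
        int column_start = 0;
        CPXaddcols(env, lp, 1, static_cast<int>(rows.size()), &objective,
                   &column_start, rows.data(), coefficients.data(),
                   &lower_bound, &upper_bound, nullptr);
    }

    // (3) Cache constraint bounds locally so that reading them does not
    // go through the solver interface; every write must update both copies.
    class ConstraintBoundsCache {
        std::vector<double> lower;
        std::vector<double> upper;
    public:
        void resize(int num_constraints) {
            lower.resize(num_constraints);
            upper.resize(num_constraints);
        }
        void store(int row, double lb, double ub) {
            lower[row] = lb;
            upper[row] = ub;
            // ...also forward the new bounds to CPLEX here...
        }
        double get_lower(int row) const {return lower[row];}
        double get_upper(int row) const {return upper[row];}
    };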

I ran another round of experiments and the results look good to me now: all the good parts of msg11033 are still qualitatively true (the exact numbers change slightly; for example, the number of additional tasks solved with h^+ is now 84 instead of 85).

With lmcount, we still lose 3 tasks that previously needed ~90-100% of the time limit, but this is now offset by 3 tasks solved within ~90-100% of the time limit that the baseline doesn't solve. Overall coverage is now the same for lmcount, while memory usage goes down as before. Compared to msg11033, the time score now goes up by 1.52 (distributed across all domains with no particular outliers). The fix here was loading the data in column form and caching the bounds.

For the operator-counting heuristic, these two changes also had an effect: while we now only solve 3 instead of 5 additional tasks, running time and memory usage are now better for almost all tasks, with no large exceptions.

For potential heuristics, we now actually compute the identical potential function because we call CPLEX in the same way and it is deterministic. This means we also solve the same tasks. The time score goes down a little (-0.68 overall, distributed across domains) and 11 tasks that previously ran out of memory now run out of time. However, I think this is just because we save on memory (the memory score goes up by 5.38). The additional time seems to be spent in the search (the search time score goes down by more than the total time score), which is unrelated to the changes in this issue. The changes are minor, so I suspect this is just noise, perhaps from linker optimizations or code layout effects.

Details of the experiment are here, if you are interested:
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1076/data/issue1076-v5-eval/issue1076-v5-c2a93e68d-bdfaaa176-compare.html

All in all, I would consider the CPLEX interface done, which just leaves the SoPlex part.
msg11036 (view) Author: florian Date: 2023-03-13.17:36:09
> perhaps we could also ask Domenico for his thoughts?

I was just writing to him about this when I noticed that this discussion was out of date, so the last update was just to bring everything up to date before sending a link. :-)
msg11034 (view) Author: malte Date: 2023-03-13.17:08:08
Hi Florian, thanks for the update, this is encouraging progress.
Universally good news would perhaps have been too much to ask for. ;-)

If we stop being able to make progress at some point where it's still mixed news, perhaps we could also ask Domenico for his thoughts?
msg11033 (view) Author: florian Date: 2023-03-13.16:57:53
The issue in msg11010 is now fixed (it turns out OSI handled this case wrong, but not in a way that affected us).

I also ran some experiments which can be found here (ignore v1 and v2, they still had some bugs):
https://ai.dmi.unibas.ch/_experiments/ai/downward/issue1076/data/issue1076-v3-eval/

Sorry for the bad names:
 * c2a93e68d is the commit on the main branch, using OSI
 * 7a11a0fc is on the issue branch and uses CPLEX directly

There are a lot of unexplained errors, but all of them are out-of-memory errors that are correctly recognized as such. I suspect lab lists them as unexplained because of the output on stderr, where we print "Ran out of memory while calling CPLEX.".

I'll go config by config here as the change impacts them very differently:

# h^+ (as an operator-counting MIP)

Performance in h^+ improves drastically, with only positive effects and a total of 85 additional tasks solved.
Most of the additional tasks previously ran out of time, some out of memory. The number of evaluations/expansions stays the same, so this is indeed the same heuristic, just computed faster. I'm not sure exactly where this change comes from, but I suspect the default MIP strategy of CPLEX is simply better than the parameters OSI sets by default.


# Operator-Counting LP with h^SEQ and LM-cut constraints

Evaluations/Expansions are also the same and we also solve more tasks, but only 5 more. Time scores go up by about 5 and the memory score by 9. Total time varies, with factors between 0.4 and 1.5 in extreme cases, but as the scores indicate, there is a positive trend and solving is faster on average. Memory is saved in almost all cases, using between 95% and 99% of the memory used by the baseline.


# Cost partitioning over Landmarks (still called lmcount in this revision, optimal/admissible version)

Evaluations/Expansions are also the same, but here we lose 7 tasks and get 7 additional timeouts. As above, we also reduce memory usage by 1-5%, so the lost tasks are due to time. Two of them are duplicates (the 2011 and 2008 versions of a task in pegsol and parcprinter), so I'd treat this as 5 tasks lost. In the baseline, these tasks all take more than 250s of the 300s time limit; the worst case is pegsol11/p14 at 255s. The overall and search time scores drop by about 2.75. Something is clearly slowing down here, and the fact that this happens for lmcount but not for the operator-counting heuristics hints that the problem lies in loading the LP into the solver (which happens once for operator-counting, but once per state for lmcount).


# Diverse Potential Heuristics

This is somewhat harder to analyze because multiple optimal solutions to the LP can exist, and each one yields a different potential heuristic that might be better or worse. Here the number of expansions and evaluations varies, so we get different heuristics that then also take a different amount of time and memory. Overall, we lose 11 tasks.

The scores are better in some domains and worse in others, and of course tasks that ran out of time or memory while trying to find a potential function get a score of -1. Overall, the time score is worse by 3 (evaluations) and 3.4 (expansions). From scrolling over the scores, the positive and negative changes look evenly distributed except for the ones influenced by -1, so I don't think we get systematically worse heuristics.

While we do save memory on average (score +1.91), the time scores go down and we get 28 additional timeouts and 17 fewer memouts. The total time score drops by 10.31, while the search time score drops by 6.09. This makes sense, as the LP-related work happens before the search starts for this configuration. With diverse potential heuristics, we also solve multiple LPs.

Since this configuration is non-deterministic in a sense, I suggest we start by looking into the performance of lmcount and run the potential heuristics again once we have fixed something there.
msg11010 (view) Author: florian Date: 2023-02-10.17:09:10
There is also an open issue in the CPLEX interface:
When setting up constraints in our code, we use lower and upper bounds, while CPLEX internally uses a different representation (right-hand side, sense, and range). In CPLEX's model, we cannot specify an unsatisfiable constraint like
  2 <= x <= 1
Of course, we could catch such constraints before passing them to CPLEX, but if we later change the bounds of the constraint (e.g., change the upper bound in the example above to 4), the constraint and the old bounds can become relevant again. We still have to figure out how OSI handled this case.
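
For illustration, converting our (lower, upper) bounds to CPLEX's ranged-row representation could look like the sketch below (the sense/rhs/range semantics are CPLEX's documented behavior; the struct and function are made up). It shows exactly where the unsatisfiable case breaks down:

    // A ranged row in CPLEX (sense 'R') means:
    //   rhs <= expression <= rhs + range   if range >= 0
    //   rhs + range <= expression <= rhs   if range < 0
    struct CplexRow {
        char sense;
        double rhs;
        double range;
    };

    CplexRow to_cplex_row(double lower, double upper) {
        // For lower <= upper this encodes exactly [lower, upper].
        // For lower > upper (e.g., 2 <= x <= 1), range becomes negative
        // and CPLEX reinterprets the row as [upper, lower], so the
        // unsatisfiable constraint cannot be represented directly.
        return {'R', lower, upper - lower};
    }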
msg11007 (view) Author: florian Date: 2023-02-10.16:58:32
We started work on the new interface and I added a link to a pull request to the summary. Currently, the code is not ready for review yet. The general setup for the interface is done, the CMake scripts are adapted, and there is an initial version of the CPLEX interface. In this version, we assumed that CPLEX copies the data passed to the interface, which is not the case for all library functions. The next step is to find out which library calls copy the data and which ones do not. The SoPlex interface is not implemented yet.
msg10962 (view) Author: malte Date: 2023-02-02.15:54:41
Thanks! I think that's sufficient evidence that we can focus the new interface on SoPlex and CPLEX only.
msg10961 (view) Author: florian Date: 2023-02-02.13:05:45
I installed Gurobi locally and compiled Fast Downward with it. I took some shortcuts to get this working (e.g., the CMake find scripts I wrote will not work on other systems), but it is sufficient for a local test.

I ran 4 configs with two tasks each on my laptop and measured total time, except for the potential heuristics, where I measured the time to initialize the heuristic instead, because the search itself doesn't use the LP solver. Mostly CPLEX was faster, but there was also a case where Gurobi was faster, and for the most part the times seem to be in the same ballpark. An exception is the potential heuristic on Airport, where Gurobi is much slower.

                                 CPLEX      Gurobi
A*/h^SEQ                        171.31s    166.90s    miconic/s9-0.pddl
  (total time)                  361.51s    424.01s    termes-opt18-strips/p03.pddl

A*/div potentials                32.42s     95.99s    satellite/p30-HC-pfile10.pddl
  (time to initialize heuristic)  8.07s    200.56s    airport/p35-airport4halfMUC-p12.pddl

A*/opcount, pho, lmcut           31.26s     60.05s    woodworking-opt11-strips/p05.pddl
  (total time)                   24.76s     29.16s    freecell/p02.pddl

A*/h^+                           20.96s     30.92s    blocks/probBLOCKS-6-2.pddl
  (total time)                    7.33s      7.41s    depot/p01.pddl

All in all, I'd say Gurobi is still interesting to try out in a full experiment eventually, but we do not have to support it as part of this issue.
msg10959 (view) Author: malte Date: 2023-02-01.23:41:58
I would say more advanced changes to the interface should not be part of this issue. I think lpsolve is an attractive nuisance that it might be good to explicitly *not* support -- otherwise there is a risk of the "pyperplan effect", where we see lots of people using lpsolve (and getting terrible runtimes) because it's easier to set up than the alternatives.

As discussed, I think it would be nice to invest a bit of work in finding out how Gurobi would perform before we rip it out, given its strong performance numbers and the fact that we currently claim to support it in our documentation. But if it turns out to be too much trouble, it shouldn't hold up this issue.

Regarding versions, upgrading to more recent versions sounds good to me; I don't know of a good reason to write code for an old version with the immediate plan of changing it again to move to a new one.
msg10949 (view) Author: florian Date: 2023-01-27.21:08:30
We want to write our own interface to drop the dependency on OSI. The first
step is to figure out the scope of this issue. Since we already have an
interface to OSI on our end, I propose to keep using it as-is and only change
how it works internally, i.e., talk directly to the solvers rather than to OSI.

There are more advanced options that we also discussed previously. For
example, we could make the classes for LPConstraint and LPVariable smarter so
that, for example, bounds can be updated on them (a hypothetical sketch
follows below). I would consider this, and any other change that requires
modifying the calling code, as future work for now. This probably means we
have to copy the data into the solvers, but we do so currently as well.

Regarding supported solvers, I would aim for SoPlex and CPLEX for now, with
the option of supporting Gurobi in the future. In recent tests I saw, Gurobi
was fantastic specifically for LPs. We also have a module on the grid for
Gurobi with a license that allows us to run experiments (the normal academic
license is limited to one parallel instance). So Gurobi is worth trying out,
but if we can support SoPlex and CPLEX, I'd say that is good enough for this
issue. Regarding other solvers, lpsolve might be interesting because it can
be installed through apt. I would like to keep this issue small, though, so I
suggest we limit it to the solvers we currently use. I don't think it makes
sense to drop either of them from this issue, as we want CPLEX for
experiments and SoPlex for release containers.

There is another question about which versions of the solvers we want to
support and how we want to deal with multiple versions in the interface. For
comparability with the current code, it might make sense to support the same
versions that we currently use (although we already support multiple ones, so
this is not completely clear-cut). On the other hand, it would be nice to
support more recent versions. SoPlex specifically had an API-breaking change
after the version we currently use, so we would probably need changes in the
interface (or two interfaces) if we want to support both new and old versions.
I guess we could generally have one interface per solver and handle version
differences with #ifdefs (see the sketch below), but I'm not sure if there is
a better way of handling this.

Finally, there are also a couple of related issues that we should update or fix along the way:
* issue926 and issue972 discuss support for Gurobi and how to use it on the grid
* issue1010 is about switching to newer versions of SoPlex and CPLEX
* issue1047 is about exiting cleanly when using SoPlex as a MIP solver
History
Date                 User     Action  Args
2023-07-20 16:04:30  florian  set     status: chatting -> resolved; messages: + msg11139
2023-07-20 12:41:30  florian  set     messages: + msg11136
2023-07-17 09:29:53  florian  set     messages: + msg11114
2023-07-03 18:06:54  malte    set     messages: + msg11097
2023-07-02 00:07:38  florian  set     messages: + msg11096
2023-03-13 17:36:09  florian  set     messages: + msg11036
2023-03-13 17:08:08  malte    set     messages: + msg11034
2023-03-13 16:57:53  florian  set     messages: + msg11033
2023-02-10 17:09:10  florian  set     messages: + msg11010
2023-02-10 16:58:32  florian  set     messages: + msg11007; summary: Related issues: * issue926 * issue972 * issue1010 * issue1047 -> Pull Request: * https://github.com/aibasel/downward/pull/151 Related issues: * issue926 * issue972 * issue1010 * issue1047
2023-02-02 15:54:41  malte    set     messages: + msg10962
2023-02-02 13:05:45  florian  set     messages: + msg10961
2023-02-01 23:41:58  malte    set     messages: + msg10959
2023-01-30 14:49:18  gabi     set     nosy: + gabi
2023-01-30 07:55:35  thomas   set     nosy: + thomas
2023-01-27 21:08:30  florian  create