Issue443

Title: Introduce interface class between LP user code and LP solvers
Priority: wish
Status: resolved
Superseder: (none)
Nosy list: florian, jendrik, malte, silvan
Assigned to: florian
Keywords: (none)
Optional summary: (none)

Created on 2014-07-23.16:18:47 by florian, last changed by malte.

Files
File name  Uploaded  Type
callgrind.out.default florian, 2014-12-17.18:28:51 application/octet-stream
callgrind.out.experimental-v3 florian, 2014-12-19.21:34:14 application/octet-stream
callgrind.out.issue433-experimental florian, 2014-12-17.18:29:03 application/octet-stream
callgrind.out.p09.default florian, 2014-12-22.22:09:18 application/octet-stream
callgrind.out.p09.experimental-v3 florian, 2014-12-22.22:09:46 application/octet-stream
Messages
msg4057 (view) Author: malte Date: 2015-03-15.14:03:36
Hi Silvan, see issue508, especially msg4054.
msg4056 (view) Author: silvan Date: 2015-03-15.14:02:22
Hey guys, I am unable to compile the master version after the integration of
this issue.

In file included from /usr/include/c++/4.8/memory:81:0,
                 from lp_solver.h:7,
                 from lp_solver.cc:1:
/usr/include/c++/4.8/bits/unique_ptr.h: In instantiation of ‘void
std::default_delete<_Tp>::operator()(_Tp*) const [with _Tp = OsiSolverInterface]’:
/usr/include/c++/4.8/bits/unique_ptr.h:184:16:   required from
‘std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = OsiSolverInterface; _Dp =
std::default_delete<OsiSolverInterface>]’
lp_solver.h:97:5:   required from here
/usr/include/c++/4.8/bits/unique_ptr.h:65:22: error: invalid application of
‘sizeof’ to incomplete type ‘OsiSolverInterface’
  static_assert(sizeof(_Tp)>0,
                      ^
make: *** [.obj/lp_solver.release.o] Error 1
msg4044 (view) Author: florian Date: 2015-03-09.17:52:12
Merged and looking forward to the integration of more LP heuristics :-)
msg4041 (view) Author: malte Date: 2015-03-08.13:08:01
If you like the results, please merge!
msg4040 (view) Author: florian Date: 2015-03-07.11:30:18
I compared the current tip of the default branch with the issue branch. Results
still look good:

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-v4-base-issue443-v4-compare.html

Should I merge?
msg4036 (view) Author: malte Date: 2015-03-06.16:33:17
Yes, experiments before merging would be good.
msg4033 (view) Author: florian Date: 2015-03-06.15:32:05
Thank you for the review. I updated the code in the bitbucket pull request with
your suggestions. Should I run another experiment now?
msg3989 (view) Author: florian Date: 2014-12-29.20:29:50
I updated the pull request with the new code version. Could you do a review as a
next step? Of course this is not urgent, I probably will not work on this until
February.

https://bitbucket.org/flogo/downward-issues-issue443/pull-request/1/issue443/diff

The results for this revision look similar to the last experimental version (+45
coverage and almost always faster):
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-v3-compare.html

I also opened issue506 for the changes we did not want to do right now.
msg3987 (view) Author: malte Date: 2014-12-25.12:00:01
Looks very good indeed! :-) Do the crashed nodes work again, and if not, have
they been reported? (If you report them, please say that they're not urgent.)

> I agree, this sounds good but can wait until later. Should I open a separate
> issue for this change?

Yes, sounds good.

> For now, I suggest that I reimplement experimental-v4 in the main issue branch
> (without all the hacks and debug code from the experimental branch) and run
> experiments for that version. I prefer experimental-v4 over experimental-v5
> because it uses a clearer interface. The advantage that experimental-v5 has
> (10-20%) from not copying the data is probably something that we can achieve in
> the new issue when we store all data in the LP class as you suggested.

Sounds good.
msg3986 (view) Author: florian Date: 2014-12-24.13:25:34
I added a second vector<LPConstraint> to LM-count and in each round copy all
non-empty LPConstraints to this vector and use the result to generate the LP
(experimental-v4). I also tried changing the interface to accept a
vector<LPConstraint *> to avoid the copy (experimental-v5). The results of both
versions look very good:

== Comparing base to experimental-v4 ==
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-experimental-v4-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v4-v5-relative-scatter-time-v4.png

== Comparing base to experimental-v5 ==
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-experimental-v5-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v4-v5-relative-scatter-time-v5.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v4-v5-relative-scatter-time-v5-no-psr.png

== Comparing experimental-v4 to experimental-v5 ==
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-experimental-v4-issue443-experimental-v5-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v4-v5-relative-scatter-time-v4-v5.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v4-v5-relative-scatter-time-v4-v5-no-psr.png


Notes:
Some runs crashed on the grid because 2 nodes stopped working during the
experiment. I'm ignoring this here because we will have to run another
experiment for the final version anyway and I think the trend is clear enough
without the crashed tasks.

The scatter plots for experimental-v5 have two strongly negative points from the
domain psr-small at the very left of the plot that throw off the scale a bit. I
added alternative plots that leave out this domain.



> I still like the idea where the actual data for the constraints lives right in
> the LP class, because I think it allows some more global optimizations and
> should also make it comparatively easy to add/remove/modify individual
> constraints in any order. I think this should be a separate evolution step,
though.

I agree, this sounds good but can wait until later. Should I open a separate
issue for this change?

For now, I suggest that I reimplement experimental-v4 in the main issue branch
(without all the hacks and debug code from the experimental branch) and run
experiments for that version. I prefer experimental-v4 over experimental-v5
because it uses a clearer interface. The advantage that experimental-v5 has
(10-20%) from not copying the data is probably something that we can achieve in
the new issue when we store all data in the LP class as you suggested.
msg3985 (view) Author: malte Date: 2014-12-23.14:48:27
I still like the idea where the actual data for the constraints lives right in
the LP class, because I think it allows some more global optimizations and
should also make it comparatively easy to add/remove/modify individual
constraints in any order. I think this should be a separate evolution step, though.

For now, I suggest you go with whatever appears easiest and matches the previous
performance (or at least gets close to it). Let's try to improve it in a
separate step.
msg3984 (view) Author: florian Date: 2014-12-23.14:38:47
This was subtle. The matrix can be different in one case: if constraints *at the
end* of the matrix are empty, or the *last few* LP variables are unused, then
the old code does not create rows/columns for them. I don't think we can do
this in general, because we allow modifying an existing problem. If a variable
is unused now, but a constraint that is added later uses it, we should make it
part of the LP from the beginning.

I think a solution for LM-count is to remove empty constraints in the LM-count
code and then just pass non-empty constraints to the LP. I'm not quite sure how
to make this efficient. We could change the interface to allow passing a
vector<LPConstraint *> instead of a vector<LPConstraint> but this seems a bit
awkward if we have some constraints and just want to pass them to the LP as they
are.
msg3983 (view) Author: malte Date: 2014-12-23.12:02:09
Odd. You could dump the matrices and bounds (or a hash value of the contents?)
to see if they are really identical?
msg3982 (view) Author: florian Date: 2014-12-23.11:56:52
I ran another profile for a version that normalizes the matrix in both the
default and experimental branch before loading it into Cplex. After
normalization, the matrix is column-ordered, there are no gaps, and each column
is sorted. The only way I can see for the call to Cplex to behave differently
is if the matrix has different coefficients, but that should lead to different
heuristic values at least in some cases. Still, the profile of the experimental
version has more samples in almost all methods. This is most significant in
"CPXlpopt" (91050 vs. 52218), which is called by initial_solve(), but copying the
LP ("CPXcopylp") also takes longer (3793 vs. 2728).
msg3981 (view) Author: florian Date: 2014-12-22.22:19:03
Profiles for the opt11 task are somewhat confusing. The time spent in Cplex
methods differs significantly between the two versions. However, the number
of evaluations, expansions, etc. is the same, so I think it still calculates the
same heuristic. The new version should call the same Cplex methods with the same
parameters; the difference should only affect our wrapper and the OSI layer.
msg3980 (view) Author: florian Date: 2014-12-22.15:50:27
The results of experimental-v3 (initial_solve, vectors, reuse of temporary
objects) look better than before but are still worse than the baseline on average.

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-experimental-v3-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v3-relative-scatter-time.png

We previously profiled the sokoban task opt08/p04, which now runs faster than
the baseline. I'll redo the profile on one of the sokoban tasks that still
shows worse performance (e.g. opt11/p09) and try to find out what's going on.
msg3975 (view) Author: florian Date: 2014-12-19.21:34:14
Re-using LPConstraints and the vectors for bounds and objective shows a clear
improvement in the profile.

                      base        experimental-v3  diff
cost_sharing_h_value  73711       74196            -485   (- 0.7%)
  assign wrapper          -       14512         
    assign             7606        7485            +121   (+ 1.6%)
    matrix             5382        4193            +1189  (+22.1%)
    self(wrapper)         0        2834            -2834

  initialSolve        58139       58352            -213   (- 0.4%)

  self(cost_sharing)   2584        1332            +1252  (+48.4%)

The new version (experimental-v3) only loses time compared to the base version
while copying the data into the temporary objects. The time mentioned under
"self(wrapper)" is the time spent in the method LpSolver::assign_problem() which
prepares the data for OSI. I'll run a larger set of experiments on this version.
msg3972 (view) Author: malte Date: 2014-12-19.16:45:23
> I think using initialSolve() after assigning a new problem is a natural thing
> to do

That's true, and it's good to stick to that while we're trying to find out why
performance dropped. In the big picture, though, it may be cleaner and faster to
reuse the LP in the lmcount heuristic, so that may be something worth looking at
later.
msg3971 (view) Author: florian Date: 2014-12-19.15:28:25
I had a look at the absolute numbers in the profiles. The following table gives
an overview. The numbers are in "million profile samples" and each percentage is
the difference relative to the measurement in "base".

                      base        experimental  diff
cost_sharing_h_value  73711       84454         -10743 (-14.5%)
  assign wrapper          -       15113         
    assign             7606        7685         -79    (- 1.0%)
    matrix             5382        4179         +1203  (+22.4%)
    self(wrapper)         0        3249         -3249

  initialSolve        58139       58365         -226   (- 0.4%)

  create temporaries      -        5102         -5102 LPConstraint::insert()
  remove temporaries      -        2019         -2019 ~LPConstraint()

  self(cost_sharing)   2584        3855         -1271 (-49.2%)



Almost no change in assign() and initialSolve() makes sense, because they are
called with the same parameters.
Creating the matrix is faster in the experimental version because the parameter
"colordered" has a bad setting in the default branch. The remaining difference
seems to come from temporary objects in the assign wrapper and in the cost
partitioning (LPConstraints).

I'll try to reuse LPConstraint objects to avoid recreating them in every step.
msg3970 (view) Author: florian Date: 2014-12-19.13:52:47
> All our new versions still lose some performance in the generation of the matrix
> for the constraint solver, but we recover that (at least for score_total_time,
> not for coverage) by using "resolve" instead of "initialSolve". Does all this
> sound correct?

Using resolve() is very beneficial for some tasks and much slower for others,
but on average it seems to help, yes.

> This means that perhaps for a fair comparison to the baseline it might be worth
> using a configuration with "initialSolve" and trying to get that close in
> performance to the baseline, so that we really see our overhead in matrix
> generation. Or maybe we could look at a variant of the baseline that uses
> resolve to compare to that. What do you think?

I think using initialSolve() after assigning a new problem is a natural thing to
do, so I prefer to look for an efficient version of that. One way to get closer
to the baseline would be to generate matrix entries column-by-column instead of
row-by-row. I suggest that I try this with an interface where the user can just
add one column at a time to see if this is faster. If it is, we can think of a
way to get this performance with a nicer interface.
msg3969 (view) Author: malte Date: 2014-12-19.00:07:29
OK, so I think we can roughly summarize this as follows. We are mostly varying
two dimensions: how to solve ("resolve" vs. "initial") and how to generate the
matrix ("base matrix" vs. "COIN vectors" vs. "vectors").

Comparing our five versions and focusing on score_total_time, I think we have,
from best to worst:

experimental-v2 ("vectors/resolve"): best
baseline ("base/initial"): second-best
v1 ("COIN vectors/resolve"), experimental-v1 ("vectors/initial"): middle
category; close to each other
v2 ("COIN vectors/initial"): clearly worst

This suggests a rather clear picture with the following total orderings:
"base" better than "vectors" better than "COIN vectors"
"resolve" better than "initial"

All our new versions still lose some performance in the generation of the matrix
for the constraint solver, but we recover that (at least for score_total_time,
not for coverage) by using "resolve" instead of "initialSolve". Does all this
sound correct?

This means that perhaps for a fair comparison to the baseline it might be worth
using a configuration with "initialSolve" and trying to get that close in
performance to the baseline, so that we really see our overhead in matrix
generation. Or maybe we could look at a variant of the baseline that uses
resolve to compare to that. What do you think?
msg3968 (view) Author: florian Date: 2014-12-18.20:27:37
Results for v2 are generally worse than before:

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-v2-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-v2-relative-scatter-time.png
msg3967 (view) Author: florian Date: 2014-12-18.11:31:29
Experimental version v2 is like the experimental version v1 but always uses
resolve() instead of initial_solve(). The results compared to the base version
are better on average but the variance is much higher. This explains why the
regular version v1 had a larger variance (it also used resolve() instead of
initial_solve()).

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-experimental-v2-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v2-relative-scatter-time.png

Since the number of configurations is getting confusingly large, here is a summary:

v1:
  Uses CoinShallowPackedVectors to create the matrix and resolve() to solve the LP.
experimental-v1:
  (Re-)uses vectors (elements, indices, starts, lengths) to create the matrix
  and initial_solve() to solve the LP.
experimental-v2:
  (Re-)uses vectors (elements, indices, starts, lengths) to create the matrix
  and resolve() to solve the LP.
v2:
  Uses CoinShallowPackedVectors to create the matrix and initial_solve() to
  solve the LP.

Experiments for v2 are running.
msg3960 (view) Author: florian Date: 2014-12-17.18:31:59
I added the profiles of the base and experimental version. I'll now set up
experiments for the experimental version without initial_solve() and a version
based on issue443-v1 that uses initial_solve().
msg3956 (view) Author: florian Date: 2014-12-17.12:15:04
We evaluated a first implementation, which unfortunately performed worse than
the original:

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-v1-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-v1-relative-scatter-time.png

We then experimented with different calls to the LP solver and different ways to
prepare the data before passing it to coin. Always calling initial_solve() for
new problems and only creating one solver interface recovered some but not all
of the lost performance. 

The last version we looked at reuses vectors to copy the constraint data into
the CoinPackedMatrix used by the solver. Compared to the last report, fewer
tasks are much slower (> 150% total time), but most tasks are now somewhat
slower (~110%-120%).

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-experimental-v1-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v1-relative-scatter-time.png

The next step is to create full profiles of the different versions and compare them.
msg3220 (view) Author: florian Date: 2014-07-23.18:19:17
We also want to extend the wrapper classes for the COIN interface to exit the
process with EXIT_CRITICAL_ERROR whenever a CoinError is thrown. This way the
classes using the interface avoid an identical try/catch block around every call.
msg3219 (view) Author: florian Date: 2014-07-23.16:18:47
In the code for LP heuristics we encapsulate some of the COIN classes so that
the rest of the code does not have to use COIN classes directly and compiles
even if COIN is not present. When we merge the LP code, we should use the
wrappers in the landmark code too. The goal is to have all code that uses
"#ifdef USE_LP" in one place.
History
Date                 User     Action
2015-03-15 14:03:36  malte    set messages: + msg4057
2015-03-15 14:02:22  silvan   set nosy: + silvan; messages: + msg4056
2015-03-09 17:52:12  florian  set status: chatting -> resolved; messages: + msg4044
2015-03-08 13:08:01  malte    set messages: + msg4041
2015-03-07 11:30:18  florian  set messages: + msg4040
2015-03-06 16:33:17  malte    set messages: + msg4036
2015-03-06 15:32:05  florian  set messages: + msg4033
2014-12-29 20:29:50  florian  set messages: + msg3989
2014-12-25 12:00:01  malte    set messages: + msg3987
2014-12-24 13:25:34  florian  set messages: + msg3986
2014-12-23 14:48:27  malte    set messages: + msg3985
2014-12-23 14:38:48  florian  set messages: + msg3984
2014-12-23 12:02:09  malte    set messages: + msg3983
2014-12-23 11:56:53  florian  set messages: + msg3982
2014-12-22 22:19:03  florian  set messages: + msg3981
2014-12-22 22:09:46  florian  set files: + callgrind.out.p09.experimental-v3
2014-12-22 22:09:18  florian  set files: + callgrind.out.p09.default
2014-12-22 15:50:27  florian  set messages: + msg3980
2014-12-19 21:34:14  florian  set files: + callgrind.out.experimental-v3; messages: + msg3975
2014-12-19 16:45:23  malte    set messages: + msg3972
2014-12-19 15:28:25  florian  set messages: + msg3971
2014-12-19 13:52:47  florian  set messages: + msg3970
2014-12-19 00:07:29  malte    set messages: + msg3969
2014-12-18 20:27:37  florian  set messages: + msg3968
2014-12-18 11:31:29  florian  set messages: + msg3967
2014-12-17 18:31:59  florian  set messages: + msg3960
2014-12-17 18:29:03  florian  set files: + callgrind.out.issue433-experimental
2014-12-17 18:28:51  florian  set files: + callgrind.out.default
2014-12-17 12:15:04  florian  set messages: + msg3956
2014-12-12 09:55:51  jendrik  set nosy: + jendrik
2014-12-11 18:18:06  florian  set title: Reduce number of #ifdefs in LP code -> Introduce interface class between LP user code and LP solvers
2014-07-23 18:19:17  florian  set messages: + msg3220
2014-07-23 16:18:47  florian  create