Issue443

Title: Introduce interface class between LP user code and LP solvers
Priority: wish
Status: resolved
Superseder: (none)
Nosy list: florian, jendrik, malte, silvan
Assigned to: florian
Keywords: (none)
Optional summary: (none)

Created on 2014-07-23.16:18:47 by florian, last changed by malte.

Files
File name  Uploaded  Type
callgrind.out.default florian, 2014-12-17.18:28:51 application/octet-stream
callgrind.out.experimental-v3 florian, 2014-12-19.21:34:14 application/octet-stream
callgrind.out.issue433-experimental florian, 2014-12-17.18:29:03 application/octet-stream
callgrind.out.p09.default florian, 2014-12-22.22:09:18 application/octet-stream
callgrind.out.p09.experimental-v3 florian, 2014-12-22.22:09:46 application/octet-stream
Messages
msg4057 (view) Author: malte Date: 2015-03-15.14:03:36
Hi Silvan, see issue508, especially msg4054.
msg4056 (view) Author: silvan Date: 2015-03-15.14:02:22
Hey guys, I am unable to compile the master version after the integration of
this issue.

In file included from /usr/include/c++/4.8/memory:81:0,
                 from lp_solver.h:7,
                 from lp_solver.cc:1:
/usr/include/c++/4.8/bits/unique_ptr.h: In instantiation of ‘void
std::default_delete<_Tp>::operator()(_Tp*) const [with _Tp = OsiSolverInterface]’:
/usr/include/c++/4.8/bits/unique_ptr.h:184:16:   required from
‘std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = OsiSolverInterface; _Dp =
std::default_delete<OsiSolverInterface>]’
lp_solver.h:97:5:   required from here
/usr/include/c++/4.8/bits/unique_ptr.h:65:22: error: invalid application of
‘sizeof’ to incomplete type ‘OsiSolverInterface’
  static_assert(sizeof(_Tp)>0,
                      ^
make: *** [.obj/lp_solver.release.o] Error 1
msg4044 (view) Author: florian Date: 2015-03-09.17:52:12
Merged and looking forward to the integration of more LP heuristics :-)
msg4041 (view) Author: malte Date: 2015-03-08.13:08:01
If you like the results, please merge!
msg4040 (view) Author: florian Date: 2015-03-07.11:30:18
I compared the current tip of the default branch with the issue branch. Results
still look good:

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-v4-base-issue443-v4-compare.html

Should I merge?
msg4036 (view) Author: malte Date: 2015-03-06.16:33:17
Yes, experiments before merging would be good.
msg4033 (view) Author: florian Date: 2015-03-06.15:32:05
Thank you for the review. I updated the code in the bitbucket pull request with
your suggestions. Should I run another experiment now?
msg3989 (view) Author: florian Date: 2014-12-29.20:29:50
I updated the pull request with the new code version. Could you do a review as a
next step? Of course this is not urgent, I probably will not work on this until
February.

https://bitbucket.org/flogo/downward-issues-issue443/pull-request/1/issue443/diff

The results for this revision look similar to the last experimental version (+45
coverage and almost always faster):
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-v3-compare.html

I also opened issue506 for the changes we did not want to do right now.
msg3987 (view) Author: malte Date: 2014-12-25.12:00:01
Looks very good indeed! :-) Do the crashed nodes work again, and if not, have
they been reported? (If you report them, please say that they're not urgent.)

> I agree, this sounds good but can wait until later. Should I open a separate
> issue for this change?

Yes, sounds good.

> For now, I suggest that I reimplement experimental-v4 in the main issue branch
> (without all the hacks and debug code from the experimental branch) and run
> experiments for that version. I prefer experimental-v4 over experimental-v5
> because it uses a clearer interface. The advantage that experimental-v5 has
> (10-20%) from not copying the data is probably something that we can achieve in
> the new issue when we store all data in the LP class as you suggested.

Sounds good.
msg3986 (view) Author: florian Date: 2014-12-24.13:25:34
I added a second vector<LPConstraint> to LM-count and in each round copy all
non-empty LPConstraints to this vector and use the result to generate the LP
(experimental-v4). I also tried changing the interface to accept a
vector<LPConstraint *> to avoid the copy (experimental-v5). The results of both
versions look very good:

== Comparing base to experimental-v4 ==
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-experimental-v4-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v4-v5-relative-scatter-time-v4.png

== Comparing base to experimental-v5 ==
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-experimental-v5-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v4-v5-relative-scatter-time-v5.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v4-v5-relative-scatter-time-v5-no-psr.png

== Comparing experimental-v4 to experimental-v5 ==
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-experimental-v4-issue443-experimental-v5-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v4-v5-relative-scatter-time-v4-v5.png
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v4-v5-relative-scatter-time-v4-v5-no-psr.png


Notes:
Some runs crashed on the grid because 2 nodes stopped working during the
experiment. I'm ignoring this here because we will have to run another
experiment for the final version anyway and I think the trend is clear enough
without the crashed tasks.

The scatter plots for experimental-v5 have two strongly negative points from the
domain psr-small at the very left of the plot that throw off the scale a bit. I
added alternative plots that leave out this domain.



> I still like the idea where the actual data for the constraints lives right in
> the LP class, because I think it allows some more global optimizations and
> should also make it comparatively easy to add/remove/modify individual
> constraints in any order. I think this should be a separate evolution step,
though.

I agree, this sounds good but can wait until later. Should I open a separate
issue for this change?

For now, I suggest that I reimplement experimental-v4 in the main issue branch
(without all the hacks and debug code from the experimental branch) and run
experiments for that version. I prefer experimental-v4 over experimental-v5
because it uses a clearer interface. The advantage that experimental-v5 has
(10-20%) from not copying the data is probably something that we can achieve in
the new issue when we store all data in the LP class as you suggested.
msg3985 (view) Author: malte Date: 2014-12-23.14:48:27
I still like the idea where the actual data for the constraints lives right in
the LP class, because I think it allows some more global optimizations and
should also make it comparatively easy to add/remove/modify individual
constraints in any order. I think this should be a separate evolution step, though.

For now, I suggest you go with whatever appears easiest and matches the previous
performance (or at least gets close to it). Let's try to improve it in a
separate step.
msg3984 (view) Author: florian Date: 2014-12-23.14:38:47
This was subtle. The matrix can be different in one case: if constraints *at the
end* of the matrix are empty, or the *last few* LP variables are unused, then
the old code does not create rows/columns for them. I don't think we can do
this in general, because we allow modifying an existing problem. If a variable
is unused now, but a constraint that is added later uses it, we should make it
part of the LP from the beginning.

I think a solution for LM-count is to remove empty constraints in the LM-count
code and then just pass non-empty constraints to the LP. I'm not quite sure how
to make this efficient. We could change the interface to allow passing a
vector<LPConstraint *> instead of a vector<LPConstraint> but this seems a bit
awkward if we have some constraints and just want to pass them to the LP as they
are.
msg3983 (view) Author: malte Date: 2014-12-23.12:02:09
Odd. You could dump the matrices and bounds (or a hash value of the contents?)
to see if they are really identical?
msg3982 (view) Author: florian Date: 2014-12-23.11:56:52
I ran another profile for a version that normalizes the matrix in both the
default and experimental branch before loading it into Cplex. After
normalization, the matrix is column-ordered, there are no gaps, and each column
is sorted. The only way I can see for the call to Cplex to behave differently
is if the matrix has different coefficients, but that should lead to different
heuristic values at least in some cases. Still, the profile of the experimental
version has more samples in almost all methods. This is most significant in
"CPXlpopt" (91050 vs. 52218), which is called by initial_solve(), but copying the
LP ("CPXcopylp") also takes longer (3793 vs. 2728).
msg3981 (view) Author: florian Date: 2014-12-22.22:19:03
Profiles for the opt11 task are somewhat confusing. The time spent in Cplex
methods differs significantly between the two versions. However, the number
of evaluations, expansions, etc. is the same, so I think it still calculates the
same heuristic. The new version should call the same Cplex methods with the same
parameters; the difference should only affect our wrapper and the OSI layer.
msg3980 (view) Author: florian Date: 2014-12-22.15:50:27
The results of experimental-v3 (initial_solve, vectors, reuse of temporary
objects) look better than before but are still worse than the baseline on average.

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-experimental-v3-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v3-relative-scatter-time.png

We previously profiled the sokoban task opt08/p04, which now runs faster than
the baseline. I'll redo the profile on one of the sokoban tasks that still
shows worse performance (e.g. opt11/p09) and try to find out what's going on.
msg3975 (view) Author: florian Date: 2014-12-19.21:34:14
Re-using LPConstraints and the vectors for bounds and objective shows a clear
improvement in the profile.

                      base        experimental-v3  diff
cost_sharing_h_value  73711       74196            -485   (- 0.7%)
  assign wrapper          -       14512         
    assign             7606        7485            +121   (+ 1.6%)
    matrix             5382        4193            +1189  (+22.1%)
    self(wrapper)         0        2834            -2834

  initialSolve        58139       58352            -213   (- 0.4%)

  self(cost_sharing)   2584        1332            +1252  (+48.4%)

The new version (experimental-v3) only loses time compared to the base version
while copying the data into the temporary objects. The time mentioned under
"self(wrapper)" is the time spent in the method LpSolver::assign_problem() which
prepares the data for OSI. I'll run a larger set of experiments on this version.
msg3972 (view) Author: malte Date: 2014-12-19.16:45:23
> I think using initialSolve() after assigning a new problem is a natural thing
> to do

That's true, and it's good to stick to that while we're trying to find out why
performance dropped. In the big picture, though, it may be cleaner and faster to
reuse the LP in the lmcount heuristic, so that may be something worth looking at
later.
msg3971 (view) Author: florian Date: 2014-12-19.15:28:25
I had a look at the absolute numbers in the profiles. The following table gives
an overview. The numbers are in "million profile samples" and each percentage is
the difference relative to the measurement in "base".

                      base        experimental  diff
cost_sharing_h_value  73711       84454         -10743 (-14.5%)
  assign wrapper          -       15113         
    assign             7606        7685         -79    (- 1.0%)
    matrix             5382        4179         +1203  (+22.4%)
    self(wrapper)         0        3249         -3249

  initialSolve        58139       58365         -226   (- 0.4%)

  create temporaries      -        5102         -5102 LPConstraint::insert()
  remove temporaries      -        2019         -2019 ~LPConstraint()

  self(cost_sharing)   2584        3855         -1271 (-49.2%)



Almost no change in assign() and initialSolve() makes sense, because they are
called with the same parameters.
Creating the matrix is faster in the experimental version because the parameter
"colordered" has a bad setting in the default branch. The remaining difference
seems to come from temporary objects in the assign wrapper and in the cost
partitioning (LPConstraints).

I'll try to reuse LPConstraint objects to avoid recreating them in every step.
msg3970 (view) Author: florian Date: 2014-12-19.13:52:47
> All our new versions still lose some performance in the generation of the matrix
> for the constraint solver, but we recover that (at least for score_total_time,
> not for coverage) by using "resolve" instead of "initialSolve". Does all this
> sound correct?

Using resolve() is very beneficial for some tasks and much slower for others,
but on average it seems to help, yes.

> This means that perhaps for a fair comparison to the baseline it might be worth
> using a configuration with "initialSolve" and trying to get that close in
> performance to the baseline, so that we really see our overhead in matrix
> generation. Or maybe we could look at a variant of the baseline that uses
> resolve to compare to that. What do you think?

I think using initialSolve() after assigning a new problem is a natural thing to
do, so I prefer to look for an efficient version of that. One way to get closer
to the baseline would be to generate matrix entries column-by-column instead of
row-by-row. I suggest that I try this with an interface where the user can just
add one column at a time to see if this is faster. If it is, we can think of a
way to get this performance with a nicer interface.
msg3969 (view) Author: malte Date: 2014-12-19.00:07:29
OK, so I think we can roughly summarize this as follows. We are mostly varying
two dimensions: how to solve ("resolve" vs. "initial") and how to generate the
matrix ("base matrix" vs. "COIN vectors" vs. "vectors").

Comparing our five versions and focusing on score_total_time, I think we have,
from best to worst:

experimental-v2 ("vectors/resolve"): best
baseline ("base/initial"): second-best
v1 ("COIN vectors/resolve"), experimental-v1 ("vectors/initial"): middle
category; close to each other
v2 ("COIN vectors/initial"): clearly worst

This suggests a rather clear picture with the following total orderings:
"base" better than "vectors" better than "COIN vectors"
"resolve" better than "initial"

All our new versions still lose some performance in the generation of the matrix
for the constraint solver, but we recover that (at least for score_total_time,
not for coverage) by using "resolve" instead of "initialSolve". Does all this
sound correct?

This means that perhaps for a fair comparison to the baseline it might be worth
using a configuration with "initialSolve" and trying to get that close in
performance to the baseline, so that we really see our overhead in matrix
generation. Or maybe we could look at a variant of the baseline that uses
resolve to compare to that. What do you think?
msg3968 (view) Author: florian Date: 2014-12-18.20:27:37
Results for v2 are generally worse than before:

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-v2-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-v2-relative-scatter-time.png
msg3967 (view) Author: florian Date: 2014-12-18.11:31:29
Experimental version v2 is like the experimental version v1 but always uses
resolve() instead of initial_solve(). The results compared to the base version
are better on average but the variance is much higher. This explains why the
regular version v1 had a larger variance (it also used resolve() instead of
initial_solve()).

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-experimental-v2-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v2-relative-scatter-time.png

Since the number of configurations is getting confusingly large, here is a summary:

v1:
  Uses CoinShallowPackedVectors to create the matrix and resolve() to solve the LP.
experimental-v1:
  (Re-)uses vectors (elements, indices, starts, lengths) to create the matrix
  and initial_solve() to solve the LP.
experimental-v2:
  (Re-)uses vectors (elements, indices, starts, lengths) to create the matrix
  and resolve() to solve the LP.
v2:
  Uses CoinShallowPackedVectors to create the matrix and initial_solve() to
  solve the LP.

Experiments for v2 are running.
msg3960 (view) Author: florian Date: 2014-12-17.18:31:59
I added the profiles of the base and experimental version. I'll now set up
experiments for the experimental version without initial_solve() and a version
based on issue443-v1 that uses initial_solve().
msg3956 (view) Author: florian Date: 2014-12-17.12:15:04
We evaluated a first implementation, which unfortunately performed worse than
the original:

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-v1-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-v1-relative-scatter-time.png

We then experimented with different calls to the LP solver and different ways to
prepare the data before passing it to coin. Always calling initial_solve() for
new problems and only creating one solver interface recovered some but not all
of the lost performance. 

The last version we looked at reuses vectors to copy the constraint data into
the CoinPackedMatrix used by the solver. Compared to the last report, fewer
tasks are much slower (> 150% total time), but most tasks are now somewhat
slower (~110%-120%).

http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-base-issue443-experimental-v1-compare.html
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue443-issue443-experimental-v1-relative-scatter-time.png

The next step is to create full profiles of the different versions and compare them.
msg3220 (view) Author: florian Date: 2014-07-23.18:19:17
We also want to extend the wrapper classes for the COIN interface to exit the
process with EXIT_CRITICAL_ERROR whenever a CoinError is thrown. This way the
classes using the interface avoid an identical try/catch block around every call.
msg3219 (view) Author: florian Date: 2014-07-23.16:18:47
In the code for LP heuristics we encapsulate some of the COIN classes so that
the rest of the code does not have to use COIN classes directly and compiles
even if COIN is not present. When we merge the LP code, we should use the
wrappers in the landmark code too. The goal is to have all code that uses
"#ifdef USE_LP" in one place.
History
Date                 User     Action
2015-03-15 14:03:36  malte    set messages: + msg4057
2015-03-15 14:02:22  silvan   set nosy: + silvan; messages: + msg4056
2015-03-09 17:52:12  florian  set status: chatting -> resolved; messages: + msg4044
2015-03-08 13:08:01  malte    set messages: + msg4041
2015-03-07 11:30:18  florian  set messages: + msg4040
2015-03-06 16:33:17  malte    set messages: + msg4036
2015-03-06 15:32:05  florian  set messages: + msg4033
2014-12-29 20:29:50  florian  set messages: + msg3989
2014-12-25 12:00:01  malte    set messages: + msg3987
2014-12-24 13:25:34  florian  set messages: + msg3986
2014-12-23 14:48:27  malte    set messages: + msg3985
2014-12-23 14:38:48  florian  set messages: + msg3984
2014-12-23 12:02:09  malte    set messages: + msg3983
2014-12-23 11:56:53  florian  set messages: + msg3982
2014-12-22 22:19:03  florian  set messages: + msg3981
2014-12-22 22:09:46  florian  set files: + callgrind.out.p09.experimental-v3
2014-12-22 22:09:18  florian  set files: + callgrind.out.p09.default
2014-12-22 15:50:27  florian  set messages: + msg3980
2014-12-19 21:34:14  florian  set files: + callgrind.out.experimental-v3; messages: + msg3975
2014-12-19 16:45:23  malte    set messages: + msg3972
2014-12-19 15:28:25  florian  set messages: + msg3971
2014-12-19 13:52:47  florian  set messages: + msg3970
2014-12-19 00:07:29  malte    set messages: + msg3969
2014-12-18 20:27:37  florian  set messages: + msg3968
2014-12-18 11:31:29  florian  set messages: + msg3967
2014-12-17 18:31:59  florian  set messages: + msg3960
2014-12-17 18:29:03  florian  set files: + callgrind.out.issue433-experimental
2014-12-17 18:28:51  florian  set files: + callgrind.out.default
2014-12-17 12:15:04  florian  set messages: + msg3956
2014-12-12 09:55:51  jendrik  set nosy: + jendrik
2014-12-11 18:18:06  florian  set title: Reduce number of #ifdefs in LP code -> Introduce interface class between LP user code and LP solvers
2014-07-23 18:19:17  florian  set messages: + msg3220
2014-07-23 16:18:47  florian  create