Issue 960: Add debugging methods to LP solver interface

Title	Add debugging methods to LP solver interface
Priority	feature	Status	resolved
Superseder		Nosy List	augusto, florian, jendrik, malte, rik
Assigned To	rik	Keywords
Optional summary

Created on 2020-01-24.16:59:07 by rik, last changed by jendrik.

Files
File name	Uploaded	Type	Edit	Remove
issue960-experiments.zip	rik, 2020-09-21.16:32:41	application/x-zip-compressed
issue960-v3-experiments.zip	rik, 2020-09-29.11:43:13	application/x-zip-compressed
lp_internals.cc.diff	florian, 2020-02-27.16:26:07	text/x-patch
lp_solver.cc.diff	florian, 2020-02-27.16:25:57	text/x-patch

Messages
msg9744 (view)	Author: jendrik	Date: 2020-10-23.14:07:30
I added a changelog entry for this issue on the main branch. Feel free to edit/expand.
msg9743 (view)	Author: florian	Date: 2020-10-22.15:29:35
Warnings like these depend on the compiler version, so it is normal that you don't get some warnings locally that we see on the buildbot (that is why we test the build with different compilers). Looks like the buildbot is happy now, so I'll close the issue. Thanks again.
msg9742 (view)	Author: rik	Date: 2020-10-22.14:32:36
I already noticed the same problem and alleviated the problem and pushed a minute ago. I tried compiling the code locally after fixing the style issues, but didn't receive so much as a warning on my machine.
msg9741 (view)	Author: florian	Date: 2020-10-22.14:25:04
There is another error about an unused field "infinity". I think this is about the field in the class LinearProgram that we planned to add to avoid passing it in parallel all the time. We didn't do that in the end because it would have been a larger refactoring. Please just remove the field. http://buildbot.fast-downward.org/#/builders/10/builds/598
msg9740 (view)	Author: florian	Date: 2020-10-20.21:44:43
one more problem just popped up: looks like the style test do not pass. Rik, can you run them locally with tox and fix the issue? http://buildbot.fast-downward.org/#/builders/4/builds/288
msg9739 (view)	Author: malte	Date: 2020-10-20.21:43:40
It looks like the buildbot found a style violation. I did not look at the details.
msg9738 (view)	Author: florian	Date: 2020-10-20.21:42:56
Rik merged the change. There is one last small issue: the commit message of the merge was autogenerated by GitHub and broke our "[branchname] commit message" convention. I tried to fix it and force-pushed the change. If anyone pulled between 17:50 and now, please update your local copy. Rik, could you check the history in the main repository? If everything looks good to you, please make another update here to close the issue (set the field Status above to "resolved" and submit). Thanks to everyone involved in this. It'll be nice to have these debugging features available in the future.
msg9737 (view)	Author: malte	Date: 2020-10-20.17:18:51
Sounds good to me.
msg9736 (view)	Author: rik	Date: 2020-10-20.17:17:35
We have completed what this time really should be the last round of code review and I implemented some minor stylistic changes suggested by Florian and Malte. The code should now be ready to merge.
msg9735 (view)	Author: florian	Date: 2020-10-20.11:31:53
We discussed this live and decided not to care about the change in memory usage. Malte had some additional comments on the pull request (I added them in GitHub). After they are done and reviewed by Jendrik or me, we can merge.
msg9734 (view)	Author: jendrik	Date: 2020-10-20.10:46:43
I agree.
msg9733 (view)	Author: rik	Date: 2020-10-20.10:37:29
I have also tried to replicate memory issues by running the same configuration locally. The memory usage was not consistent with the memory usage on the grid. This goes for the new as well as the base version. However, the memory usage on the grid as well as locally does seem to be deterministic. I used version 0.107.9 of Osi and 3.1.1 of Soplex on both machines. Florian and I are not quite sure what to make of this, but feel like the difference is minor enough and does not noticeably skew in one direction that this could be merged regardless. Jendrik, would you agree to this?
msg9725 (view)	Author: rik	Date: 2020-09-29.11:43:12
I have rerun the experiments after incorporating all of the changes and the results look much the same. Evaluations and expansions are the same for the base and v3 version. Coverage once again went up slightly. Memory usage and search time show the same patterns as in the previous experiment. As announced, I ran a copy of the diverse potentials heuristic with soplex and compared it with itself. The search time shows the same phenomenon where most fall within 5% but a few instances are either almost twice as fast or twice as slow, so this seems to have some cause that does not relate to this issue. The memory usage is very consistent between the two copies however. It seems like the variability is caused by the changes in this issue. The versions with cplex as an lp solver does not show this variability in memory usage, so I imagine it's something subtle that influences the memory usage of soplex. The full results can be found in issue960-v3-experiments.zip
msg9724 (view)	Author: florian	Date: 2020-09-22.18:40:02
I did another iteration through the changes and noticed a few more small things. I also saw a few comments by Jendrik that were still open. I think in the "conversation" tab they are hidden inside a "more comments" block. In the "Files changed" tab everything shows up as expected. Once you resolved all open conversations, feel free to just run the experiments. No need to wait for an OK from us.
msg9723 (view)	Author: rik	Date: 2020-09-22.17:05:28
Thanks for the review Jendrik. I have addressed everything you brought up and run all style checks. If you and Florian sign off on this version, I can rerun the experiments to make sure this has not affected performance. I will also run the soplex variant of the potential heuristic twice to see if the variation in memory usage we see there is really just noise.
msg9719 (view)	Author: jendrik	Date: 2020-09-22.10:19:30
Ah, I forgot about that incompatibility. Tieing the names to the log level sounds like a good idea.
msg9718 (view)	Author: florian	Date: 2020-09-22.00:20:42
Thanks Jendrik. I don't think having names in debug mode would work because LP builds are incompatible with the debug build since the time we added the _GLIBCXX_DEBUG flag (the LP libraries are compiled without this flag which makes them incompatible on a binary level). We could tie it to the log-level in specific LP instances and have names only in non-silent log levels. But this shouldn't affect the current code.
msg9717 (view)	Author: jendrik	Date: 2020-09-21.23:54:42
I reviewed the code and left some comments. On a more general note: it seems the current design assumes that users name LP variables and constraints only when they're debugging something and remove the names afterwards. I think we should encourage users to leave the names in permanently, but only in debug builds. On never knows when it's time for debugging again and this also helps documenting the LP. I don't think any code changes are necessary for this, but we could rephrase the comment that explicitly warns against using names in "production code". I think we should also add names (in debug mode) for at least one of the files that uses the LP code.
msg9716 (view)	Author: florian	Date: 2020-09-21.17:32:17
The memory usage looks fine for all configurations except for the SoPlex variant of potential heuristics. There, the results are a bit surprising: there are some tasks (mostly freecell and spider) where the memory usage increases a lot (e.g., 148MB -> 194MB) although the number of expansions is the same. At the same time, there are tasks where the usage decreases and there does not seem to be a trend in one direction or the other. I would have expected that the potential heuristic found in the two runs might be different but since the evaluations are the same, this cannot be the cause. The timing looks very similar for all configurations: most tasks behave as expected but then some tasks vary a lot stronger than what I would expect: their factors are in the range of 0.5 - 2.0 instead of the range of 0.95 - 1.05. Again, there are cases with positive and negative impact and no clear trend. I cannot explain the higher variance. Did someone observe this in other experiments? Augusto, do you know what's up with the translator segfaults in organic synthesis? Anyway, since the patch does not touch the translator, it seems to be unrelated. Despite the strange variations, I think we could merge the patch if we don't spot anything in the review as the results do not seem systematically worse. I already reviewed the code. Jendrik, could you do a second iteration?
msg9715 (view)	Author: rik	Date: 2020-09-21.16:33:09
The pull request has now moved to github. Here is the new link: https://github.com/aibasel/downward/pull/12 I have run experiments to compare the performance of the new implementation of the LP wrapper class to the old one when the debugging functionality is not used. I tested it with three different heuristics which utilize lp solvers both with cplex and soplex as an lp solver. There seems to be no meaningful impact on performance. Coverage actually went up slightly (this is definitely a coincidence). The number of evaluations and expansions is identical for all domains and configurations, so we can be reasonably confident that this change did not affect the calculated heuristic values. There seems to be no significant impact on search time. For any given configuration, the score_total_time is at most 0.1% lower and for some configurations the summed score was slightly higher. The relative scatter plots show that the vast majority of problems differed by less than 15% in terms of total time. The plots indicate that this is down to noise. There is a strange phenomenon which causes a significant minority of problems to either be about 1.8x faster or slower across all configurations, domains, and problem sizes. There is a small but measurable impact on memory usage. Every configurations sees an increase in memory use on the order of 0.1%. The scatter plots show that there are very few outliers with. Only the diverse potentials heuristic with soplex as an lp solver contains runs where the memory usage differed by more than 5% and all of those problems were rather small. This configurations also saw some runs where the new version used more than 10% less memory. The experiments contain 20 unexplained errors (out of over 21,000 runs), but none of these relate to this issue in any way. 12 of these errors were due to the translator segfaulting while translating p11, p12 or p18 of the organic- synthesis-opt18-strips domain and the other 8 were down to the planner producing too much output when solving p12 and p14 of the parcprinter-08-strips domain when using the lmcount heuristic. The full tables and charts can be found in issue960-experiments.zip.
msg9211 (view)	Author: rik	Date: 2020-03-12.17:14:51
I have added a few more commits to the pull request to address the issues brought up in the way Florian and I agreed upon. Here is the link once again: https://bitbucket.org/rikdegraaff/fast-downward- issues/pull-requests/1/issue960/diff
msg9204 (view)	Author: florian	Date: 2020-02-27.16:25:56
Since we offering more information on the status of the LP, we should also add methods to test for infeasible and unbounded LPs. There is also a try-catch missing around create_lp_solver that we should add now. Additionally, adding methods to dump a constraint, a variable, a set of constraints, a set of variables, or a full LP, would be helpful. And finally, there is a CPLEX error that can sneak in as an unexplained error when CPLEX runs out of memory in the wrong place, we can fix that, too. I'll attach patches for these things but please don't take the code for dumping the constraints verbatim. It should be a function that also takes a double ("infinity") and only print bounds if they are not infinite.
msg9200 (view)	Author: florian	Date: 2020-02-14.17:09:00
I think adapting the operator-counting framework and the other places where we currently use LPs should be straight-forward. Actually, we might want to take this idea a step further: instead of two classes LPVariables and LPConstraints, we could have a single class LinearProgram that contains all required data. We could generate proxies for variables and constraints from it that in turn have the necessary getters and setters. We discussed something in this direction a while ago where we actually wanted to make these objects smarter, so you could for example change a bound on a constraint object and then re-solve the LP and the classes would keep track of the required changes internally. I think all of this would be too much to do in this issue (if we want to do it, we should create a new issue for it) but adding a LinearProgram with proxies for variables and constraints would be a step in this direction. Rik, to see an example of what I mean, you can have a look at the classes TaskProxy, VariablesProxy, and VariableProxy. We can also discuss this live if you prefer.
msg9199 (view)	Author: rik	Date: 2020-02-14.14:09:57
Adding the LPVariables and LPConstraints classes seems like the best option to me. It does lose some flexibility over simply having a vector of something. Considering that I don't think what I'm working on in the context of time-unrolling (which generates and modifies an LP in multiple phases) would require too much of a rewrite to be compatible with this system, I suppose the benefits outweigh the costs. Do you think adapting the operator counting framework would present a significant challenge, or would that be very straightforward? For posterity's sake, the solution I suggested was adding a separate, overloaded load_problem() function which accepts vector<NamedLPVariable>. The upside is that this has no impact on existing code and is incredibly easy to implement. The downside is that switching from debugging to not using debugging would be cumbersome because you need to adapt any function which touches the variabkes and constraints.
msg9191 (view)	Author: florian	Date: 2020-02-08.13:21:18
Ahh, of course another option would be to hide all this code with a preprocessor switch that is off by default and you'd just enable it for debugging. But so far, we mostly avoided this. Something like this: class LPVariable { // ... #ifdef LP_DEBUG string name; #endif }
msg9190 (view)	Author: florian	Date: 2020-02-08.13:17:19
I thought the size of empty string would be smaller but it looks like it would roughly double the size of LPVariable and LPConstraint in gcc > 5. (Interesting article, by the way: https://shaharmike.com/cpp/std-string/). We could reduce the overhead to a single pointer by using a unique_ptr<string> but it would still be an overhead. We could also go for functions in the LPSolver class that forward the setRowName and setColName functions. The disadvantage I see there is that these functions only make sense after loading the LP into the solver and our code often splits the generation of constraints from the loading. So the code that generates the constraints could not easily set the names for the variables and names it creates. Maybe another option would be to generate parallel structures LPVariableNames and LPConstraintNames similar to the way we generate names for enum options in the option parser. As for the option before, this would mean that existing code is not easy to adapt because we would have to modify the types of data passed between the class that generates LP variables and constraints and the class that uses them. The main use case I have in mind is when we already have code like this: vector<LPVariable> variables = generate_variables(task); vector<LPConstraint> constraints = constraint_generator.generate(task) lp_solver.load(variables, constraints); and now want to add names inside the methods generate_variables and constraint_generator.generate for debugging. Without adding a field to LPVariable and LPConstraint, we'd have to modify the interface if we want to debug this code. How about adding a new class like this: class LPVariables { vector<LPVariable> variables; vector<string> names; public: add_variable(double lower_bound, double upper_bound, double objective, string name = ""); } and replacing the existing cases of vector<LPVariable> with it (and analogously for constraints of course)?
msg9189 (view)	Author: malte	Date: 2020-02-07.17:49:46
Makes sense. Feel free to involve me in the discussion if you cannot find something that makes you sufficiently happy, but only then. (Much on the plate at the moment.)
msg9188 (view)	Author: rik	Date: 2020-02-07.17:47:57
I initially implemented a version which had no impact when it is not used, but Florian found it too cumbersome; perhaps I could try and find a different approach which offers the best of both worlds with him?
msg9187 (view)	Author: malte	Date: 2020-02-07.17:43:46
I am not sure we want to pay for storing a string with every constraint whether or not we use the feature. Usually we try to make debug features more lightweight.
msg9186 (view)	Author: rik	Date: 2020-02-07.17:36:23
I have created a pull request for this issue: https://bitbucket.org/rikdegraaff/fast-downward-issues/pull-requests/1/issue960/diff
msg9178 (view)	Author: rik	Date: 2020-01-24.16:59:07
We want to add the ability to assign custom variable and constraint names and write out the LP to analyse errors.

History
Date	User	Action	Args
2020-10-23 14:07:30	jendrik	set	messages: + msg9744
2020-10-22 15:29:35	florian	set	status: in-progress -> resolved messages: + msg9743
2020-10-22 14:32:36	rik	set	messages: + msg9742
2020-10-22 14:25:04	florian	set	messages: + msg9741
2020-10-20 21:44:44	florian	set	messages: + msg9740
2020-10-20 21:43:40	malte	set	messages: + msg9739
2020-10-20 21:42:56	florian	set	messages: + msg9738
2020-10-20 17:18:51	malte	set	messages: + msg9737
2020-10-20 17:17:35	rik	set	messages: + msg9736
2020-10-20 11:31:53	florian	set	messages: + msg9735
2020-10-20 10:46:43	jendrik	set	messages: + msg9734
2020-10-20 10:37:30	rik	set	messages: + msg9733
2020-09-29 11:43:14	rik	set	files: + issue960-v3-experiments.zip messages: + msg9725
2020-09-22 18:40:02	florian	set	messages: + msg9724
2020-09-22 17:05:28	rik	set	messages: + msg9723
2020-09-22 10:19:30	jendrik	set	messages: + msg9719
2020-09-22 00:20:42	florian	set	messages: + msg9718
2020-09-21 23:54:42	jendrik	set	messages: + msg9717
2020-09-21 17:32:17	florian	set	messages: + msg9716
2020-09-21 16:33:09	rik	set	messages: + msg9715 summary: The pull request has now moved to github. Here is the new link: https://github.com/aibasel/downward/pull/12 I have run experiments to compare the performance of the new implementation of the LP wrapper class to the old one when the debugging functionality is not used. I tested it with three different heuristics which utilize lp solvers both with cplex and soplex as an lp solver. There seems to be no meaningful impact on performance. Coverage actually went up slightly (this is definitely a coincidence). The number of evaluations and expansions is identical for all domains and configurations, so we can be reasonably confident that this change did not affect the calculated heuristic values. There seems to be no significant impact on search time. For any given configuration, the score_total_time is at most 0.1% lower and for some configurations the summed score was slightly higher. The relative scatter plots show that the vast majority of problems differed by less than 15% in terms of total time. The plots indicate that this is down to noise. There is a strange phenomenon which causes a significant minority of problems to either be about 1.8x faster or slower across all configurations, domains, and problem sizes. There is a small but measurable impact on memory usage. Every configurations sees an increase in memory use on the order of 0.1%. The scatter plots show that there are very few outliers with. Only the diverse potentials heuristic with soplex as an lp solver contains runs where the memory usage differed by more than 5% and all of those problems were rather small. This configurations also saw some runs where the new version used more than 10% less memory. The experiments contain 20 unexplained errors (out of over 21,000 runs), but none of these relate to this issue in any way. 12 of these errors were due to the translator segfaulting while translating p11, p12 or p18 of the organic- synthesis-opt18-strips domain and the other 8 were down to the planner producing too much output when solving p12 and p14 of the parcprinter-08-strips domain when using the lmcount heuristic. The full tables and charts can be found in issue960-experiments.zip. ->
2020-09-21 16:32:42	rik	set	files: + issue960-experiments.zip summary: The pull request has now moved to github. Here is the new link: https://github.com/aibasel/downward/pull/12 I have run experiments to compare the performance of the new implementation of the LP wrapper class to the old one when the debugging functionality is not used. I tested it with three different heuristics which utilize lp solvers both with cplex and soplex as an lp solver. There seems to be no meaningful impact on performance. Coverage actually went up slightly (this is definitely a coincidence). The number of evaluations and expansions is identical for all domains and configurations, so we can be reasonably confident that this change did not affect the calculated heuristic values. There seems to be no significant impact on search time. For any given configuration, the score_total_time is at most 0.1% lower and for some configurations the summed score was slightly higher. The relative scatter plots show that the vast majority of problems differed by less than 15% in terms of total time. The plots indicate that this is down to noise. There is a strange phenomenon which causes a significant minority of problems to either be about 1.8x faster or slower across all configurations, domains, and problem sizes. There is a small but measurable impact on memory usage. Every configurations sees an increase in memory use on the order of 0.1%. The scatter plots show that there are very few outliers with. Only the diverse potentials heuristic with soplex as an lp solver contains runs where the memory usage differed by more than 5% and all of those problems were rather small. This configurations also saw some runs where the new version used more than 10% less memory. The experiments contain 20 unexplained errors (out of over 21,000 runs), but none of these relate to this issue in any way. 12 of these errors were due to the translator segfaulting while translating p11, p12 or p18 of the organic- synthesis-opt18-strips domain and the other 8 were down to the planner producing too much output when solving p12 and p14 of the parcprinter-08-strips domain when using the lmcount heuristic. The full tables and charts can be found in issue960-experiments.zip.
2020-04-15 17:17:23	augusto	set	nosy: + augusto
2020-03-12 17:14:51	rik	set	messages: + msg9211
2020-02-27 16:26:07	florian	set	files: + lp_internals.cc.diff
2020-02-27 16:25:57	florian	set	files: + lp_solver.cc.diff messages: + msg9204
2020-02-14 17:09:00	florian	set	messages: + msg9200
2020-02-14 14:09:57	rik	set	messages: + msg9199
2020-02-08 13:21:18	florian	set	messages: + msg9191
2020-02-08 13:17:20	florian	set	messages: + msg9190
2020-02-07 17:49:46	malte	set	messages: + msg9189
2020-02-07 17:47:57	rik	set	messages: + msg9188
2020-02-07 17:43:47	malte	set	messages: + msg9187
2020-02-07 17:36:23	rik	set	messages: + msg9186
2020-01-27 13:38:07	rik	set	status: unread -> in-progress
2020-01-24 16:59:07	rik	create

Issue960