Issue983

Title Add operator-counting constraints for h+
Priority wish Status resolved
Superseder Nosy List clemens, florian, jendrik, malte, remo
Assigned To florian Keywords
Optional summary
depends on issue891 (for calculating h+)

Created on 2020-10-24.01:34:57 by florian, last changed by florian.

Summary
depends on issue891 (for calculating h+)
Files
File name Uploaded Type Edit Remove
cplex_commands florian, 2022-02-05.10:17:44 application/octet-stream
problem.lp florian, 2022-02-05.10:17:36 application/octet-stream
small.sas florian, 2022-02-05.10:17:09 application/octet-stream
Messages
msg10594 (view) Author: florian Date: 2022-02-11.09:01:03
Yes, maybe it is ignored by txt2tags. It shows up fine when written as "h^+^".
msg10588 (view) Author: malte Date: 2022-02-10.19:24:49
Doesn't this pass to an internal representation (text2tags)? Or is this something that text2tags doesn't touch? I think h+ would be OK. Everything that shows up as h+ or as h with a superscript plus on the wiki works for me.
msg10584 (view) Author: florian Date: 2022-02-10.18:03:15
We used h^+^ in the options and set up a follow-up issue (issue1047) for a better error when using SoPlex as a MIP solver.

With that, I think we can close this.
msg10578 (view) Author: florian Date: 2022-02-10.15:27:02
I merged and all checks passed. Thanks everyone.

One small hitch: on the website h^+ shows up wrong (https://www.fast-downward.org/Doc/ConstraintGenerator#Delete_relaxation_constraints). The moinmoin syntax for that would be "h^+^" so with a trailing ^. Should we use this in the internal documentation or would h+ or h\^+ be better?
msg10577 (view) Author: malte Date: 2022-02-10.15:02:58
I think reporting the wrong configuration is our duty, we shouldn't leave it to OSI. So yes, it would be good to open an issue for this.

Regarding where/how to document this: I think it's best to have this as part of the automatically generated options rather than, say, a wiki page like the old LPBuildInstructions.

To me the obvious place is https://www.fast-downward.org/Doc/Evaluator#Operator_counting_heuristic, but I think this is currently a mess because the description of the heuristic presents this as an LP thing, not an LP/MIP thing, and also the option is called lpsolver, but that's not really what it is. If it weren't for these issues, I think a reasonable place would be the list of LP/MIP solvers there, where we already have one line per solver. We could add the info there which ones do or don't support IPs. But that makes little sense in other places where we use LP solvers. The option parsing code probably needs a distinction between LP and MIP solvers somewhere to generate different info for the two cases.

Regarding SCIP support: perhaps this should be connected to the idea of moving away from OSI. We already have an issue there, right? Or an LP/IP/MIP solver metaissue?
msg10570 (view) Author: florian Date: 2022-02-10.10:23:56
The code exits with a message from OSI saying that branch and bound is not available in SoPlex. Not the best way of exiting but I think it is reasonably clear. Should I open an issue for a better way of reporting this error?

I don't think our documentation mentions it. Where should we put this?

Regarding SCIP, Remo found a confusing statement in the SoPlex documentation that says that branch and bound is only available through some interfaces and mentions OSI as an example. But I don't see a way in OSI to do this. The SoPlex interface just produces the warning when you try to call branch and bound. The other OSI Interfaces are all for different solvers, none of them SCIP.
msg10564 (view) Author: malte Date: 2022-02-09.18:25:05
> We realized that SoPlex does not support IPs.

That's good to know. ;-) Does the code deal gracefully with this? Does our documentation mention this?

It's a pity because it means that the binary releases of Fast Downward won't have full functionality. Would it be possible to change this (e.g., integrate SCIP)? If yes, I would suggest opening an issue for this.
msg10563 (view) Author: malte Date: 2022-02-09.18:19:56
If you're happy with it, I don't need to look at it again. My main concern was whether the mip gap setting changed global behaviour rather than local behaviour so that one component could break another one, and from what you wrote and the previous version of the patch I looked at, this all looked good.

BTW, I wouldn't prefix change log entries with "for users:". The change log is written for users by default, and they are addressed by default. We use prefixes like "for developers" to make clear to the users that they can ignore certain entries unless they are developers.

There were already existing examples of "for users", but I think they have all been added after the last release and are hence not finalized. Our general policy is that we edit the change log entries at release time for clarity, logical grouping and similar aspects.
msg10556 (view) Author: florian Date: 2022-02-09.14:20:25
Thanks for the review Remo.

I went through Malte's comments on Github and think that Clemens' changes from this morning already fixed everything that required changing. I answered the remaining points but don't think anything needs to change there. One thing we could do as follow-up is now issue1046.

From my side, this is ready to be merged. Malte, are you OK with this?
msg10553 (view) Author: remo Date: 2022-02-09.10:36:13
We realized that SoPlex does not support IPs. Thus we don't have to check whether the tolerance problem we experienced with CPLEX also occurs with SoPlex.
msg10551 (view) Author: remo Date: 2022-02-08.22:48:26
I compared the constraints that are constructed in 
operator_counting/delete_relaxation_constraints.cc with the description of 
the basic IP formulation of a delete-free task in "Imai and Fukanaga (2015): On 
a Practical, Integer-Linear Programming Model 
for Delete-Free Tasks and its Use as a Heuristic for Cost-Optimal Planning. 
JAIR, 54:631-677". I concluded that the 
constraints in our implementation are equivalent to those presented in the 
paper.
msg10549 (view) Author: clemens Date: 2022-02-08.22:07:45
Here are the experimental results:
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue983-v2-cplex-issue983-v1-issue983-v2-compare.html

The analysis compares the two versions v1 and v2 where the main difference between the two is setting the MIP gap to 0 when using CPLEX in v2. We had the hypothesis that operator-counting configurations should not be affected negatively in most cases, since the default MIP gap is too small to make a difference in the heuristic values except in domains with huge operator costs (e.g. parcprinter where this issue originated) or really long plans.

The hypothesis is confirmed in most configurations, as the scores (especially time scores) are virtually unaffected except in the case of "post-hoc-opt-int". There, we lose 8 tasks in the overall coverage and the scores decrease between 2 and 5 points in the summary. Looking at individual tasks, we find that the main reason is indeed parcrpinter where tasks are solved slower (sometimes timing out) and require more memory, whereas before these tasks were solved within seconds. We assume the (potentially inadmissible) heuristic pruned a part of the search tree in v1 that is now fully expanded in v2 due to the exact and correct heuristic value.

After the changes towards v2, the initial heuristic values are now all admissible (except in trucks-strips where the translator is non-deterministic).

The unexplained errors are mainly due to v1 (as discussed in msg9754) and do not occur in v2 anymore.

I hope I mentioned the most relevant observations. We think the experiment data looks good enough to merge, despite the performance drop in post-hoc-opt-int as it is caused by a bug that is now fixed. We should definitely do a cleanup in the code before merging, though. Furthermore, we discussed to start an experiment checking whether SoPlex yields admissible initial heuristic values or whether there might be a similar MIP gap issue there. (Reading the documentation didn't reveal anything obvious.)
msg10534 (view) Author: malte Date: 2022-02-08.13:18:24
I looked at the parts of the pull request that are not related to h+ (i.e., the changes to existing files) and left some comments. Apologies for the many messages, I probably should have used "start review" instead.
msg10526 (view) Author: florian Date: 2022-02-07.21:26:05
I think it is a setting that affects the "CPLEX environment" and that this environment is unique to our OSICpxSolverInterface (or our wrapper around it). That is, you could theoretically have multiple different MIPs in memory at the same time through the same LPSolver instance and setting the option once would affect all of them. But it wouldn't affect MIPs created through a different LPSolver instance or MIPs created by users that use CPLEX on the command line at the same time. Since the LPSolver instance is unique to the operator-counting heuristic, I think we are safe with setting it once within this heuristic.

In my local tests, I once had a program creating an OSI interface to CPLEX and getting an "env_" pointer from there and also using the CPLEX C interface to create a second "env_" pointer. The rest of the program then was the same and I could switch behavior by using a different environment. So, yes, I'm reasonably sure the option is set only for the one solver instance.

Clemens and Remo implemented the fix and are already running experiments.
msg10522 (view) Author: malte Date: 2022-02-07.19:39:54
Is the MIP gap set locally per call, or is it a global CPLEX option such that setting it in the operator-counting heuristics would affect what happens to other customers later during the execution?

In any case, the basic idea sounds good to me. Thanks to you and the others that helped with this for the thorough analysis.

Is this ready to be tested, and do you have a pull request?
msg10513 (view) Author: florian Date: 2022-02-07.11:01:32
We had a meeting about this and decided that for now, we'd only implement the MIP gap parameter in the operator-counting heuristics. The other two heuristics that use the LP solver (potentials, lmcount) do not use integer variables and maximize, so they are not affected by a MIP gap and any suboptimal solution would not be inadmissible (just worse than the theoretical value of the heuristic).

This should hopefully fix the problem we saw in this issue. If we want to consider more tolerances, we should open a follow-up issue.
msg10512 (view) Author: florian Date: 2022-02-05.10:56:16
We were able to isolate the problem and used the minimizer to find a smaller task with the same issue. The attached task (small.sas) produces a heuristic value of 1004000 in the initial state even though h^* is 1003900.

The attached MIP (problem.lp) is the problem solved by the heuristic. Calling CPLEX directly on this problem has a different behavior because FastDownward and OSI set some parameters different from the default: Fast Downward sets the number of threads to 1 (if we don't do this, CPLEX will run a portfolio and some component will find the correct solution). OSI switches off a setting to start the solver with an existing basis [1] in the first solve. This shouldn't be related to our bug, but if we don't set this option, the bug doesn't occur (I assume the bug is rare and so small changes make it disappear).

If we set those two options in the same way, we can reproduce the problem with CPLEX directly. The commands for this are in the attached file cplex_commands. It can be run with the CPLEX commandline tool and requires problem.lp in the working directory:

------------
$ cplex -f cplex_commands
[snip]
        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer    Best Bound    ItCnt     Gap

      0     0     3900.0000     3                   3900.0000       10         
      0     0     3911.1111    13                     Cuts: 8       20         
*     0+    0                      1004000.0000     3911.1111            99.61%
      0     0     4000.0000     9  1004000.0000      Cuts: 10       28   99.60%
      0     0    45716.6667    13  1004000.0000       Cuts: 2       33   95.45%
      0     0  1003900.0000     4  1004000.0000   ZeroHalf: 3       43    0.01%

[snip]
MIP - Integer optimal, tolerance (0.0001/1e-06):  Objective =  1.0040000000e+06
Current MIP best bound =  1.0039000000e+06 (gap = 100, 0.01%)
------------

From the output, we can see that it found a lower bound (a solution to the LP relaxation of the MIP) of 1003900, but still has 4 variables that are not integers (column "IInf"). The best integer solution it found at this point has value 1004000. This is an upper bound, so there is a gap between the upper and lower bound of 0.01% (last column). This happens to be the default tolerance at which CPLEX accepts a solution as optimal.

We can change the tolerance with "set mip tolerances mipgap 0.00001". The command file already contains this line for the default value of 0.0001, so to change it, just add a 0. The MIP then finds the correct solution.

Looking through the options, I found several such tolerances and I can imagine all of them causing problems. In addition to the MIP gap [2] there is a tolerance on the absolute gap [3]. I would have thought that setting this to 1 would mean that the current gap of 100 is not possible but its default value is 1e-6. There are also tolerances of the LP solver: for example there is a tolerance when simplex will consider an LP optimal [4] or integer [5]. There are even more parameter listed here [6]. So I guess the question now is: which tolerances should we change and to what value?


[1] https://www.ibm.com/docs/en/icos/12.10.0?topic=parameters-advanced-start-switch
[2] https://www.ibm.com/docs/en/icos/12.8.0.0?topic=parameters-relative-mip-gap-tolerance
[3] https://www.ibm.com/docs/en/icos/12.8.0.0?topic=parameters-absolute-mip-gap-tolerance
[4] https://www.ibm.com/docs/en/icos/12.8.0.0?topic=parameters-optimality-tolerance
[5] https://www.ibm.com/docs/en/icos/12.8.0.0?topic=parameters-integrality-tolerance
[6] https://www.ibm.com/docs/en/icos/12.8.0.0?topic=parameters-tolerances
msg9754 (view) Author: florian Date: 2020-10-30.16:13:56
I ran an experiment with the new constraints and compared them to running LM-cut on the relaxed problem. The report can be seen here:
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue983-comparativereport.html

First, there are lots of unexplained errors. One cause was that CPLEX ran out of memory in a way that we didn't recognize so far. I updated the code, so this should not happen in the future. There are also a couple of segfaults in very large tasks (Organic Synthesis, Satellite) that we have seen before in these heuristics. Finally, the relaxed plans could not be verified and I'm not sure how to prevent Fast Downward from trying this (or to get it to verify them against the relaxed instance). I think this is it, but it's hard to tell.

What worries me more is the table "hplus". This contains the initial heuristic value for the operator-counting heuristics and the cost for the configuration optimally solving the relaxed problem. This value should be the same for all configurations but there is a single problem, where this is not the case (two actually, but the PDDL files are identical): here operator counting is too high, making it inadmissible on the relaxed problem. The strange thing is that adding LM-cut constraints fixes this. Adding constraints to an operator-counting heuristic should not decrease the heuristic value.

I tried to reproduce this locally and also get the inadmissible value. However, when I write the MIP to a file and solve that with CPLEX directly, I get the correct value. Also, solving the MIP twice in the code returns the correct heuristic value after the second solver call even though the problem is not modified in between the calls. There is no indication that anything went wrong after the first call (the solver reports that the problem was solved optimally).

The issue might be the high action costs in parcprinter leading to numerical issues in CPLEX, but I'm not sure how to check this or what to do about it.


Apart from that, everything works as expected:
Adding LM-cut constraints to the h^+ constraints computes the same values but faster on average and many more problems can be solved this way:
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue983-v1-issue983-v1-issue983-v1-total_time-opcount-hplus-opcount-hplus-lmcut.png
Searching in the relaxed problem and evaluating the MIP are orthogonal with strengths and weaknesses in different tasks:
https://ai.dmi.unibas.ch/_tmp_files/pommeren/issue983-v1-issue983-v1-issue983-v1-total_time-opcount-hplus-lmcut-relaxed-lmcut.png
msg9746 (view) Author: florian Date: 2020-10-24.02:55:55
I added a pull request with that change. I also used the opportunity to fold "infinity" into "LinearProgram" which we already considered doing in issue960.
https://github.com/aibasel/downward/pull/14/files
msg9745 (view) Author: florian Date: 2020-10-24.01:34:57
I got several requests for a way to compute the h^+ heuristic or about the related operator-counting constraints by Imai and Fukunaga. Adding the constraints would also give us a way of computing the heuristic, if we also complete issue891 to have integer-valued MIP variables.

I already have an implementation of the from my thesis. I think the main problem besides issue891 was that they require auxiliary LP variables and the operator-counting framework currently doesn't support them. This should be relatively simple to add, though.
History
Date User Action Args
2022-02-11 09:01:03floriansetmessages: + msg10594
2022-02-10 19:24:49maltesetmessages: + msg10588
2022-02-10 18:03:15floriansetstatus: chatting -> resolved
messages: + msg10584
2022-02-10 15:27:02floriansetmessages: + msg10578
2022-02-10 15:02:58maltesetmessages: + msg10577
2022-02-10 10:23:56floriansetmessages: + msg10570
2022-02-09 18:25:05maltesetmessages: + msg10564
2022-02-09 18:19:56maltesetmessages: + msg10563
2022-02-09 14:20:25floriansetmessages: + msg10556
2022-02-09 10:36:13remosetmessages: + msg10553
2022-02-08 22:49:52remosetnosy: + clemens
2022-02-08 22:48:26remosetnosy: - clemens
messages: + msg10551
2022-02-08 22:07:45clemenssetnosy: + clemens, remo
messages: + msg10549
2022-02-08 13:18:24maltesetmessages: + msg10534
2022-02-07 21:26:05floriansetmessages: + msg10526
2022-02-07 19:39:54maltesetmessages: + msg10522
2022-02-07 11:01:32floriansetmessages: + msg10513
2022-02-05 10:56:16floriansetmessages: + msg10512
2022-02-05 10:17:44floriansetfiles: + cplex_commands
2022-02-05 10:17:36floriansetfiles: + problem.lp
2022-02-05 10:17:09floriansetfiles: + small.sas
2020-10-30 16:13:57floriansetmessages: + msg9754
2020-10-24 02:55:55floriansetmessages: + msg9746
2020-10-24 01:34:57floriancreate