Message6567

Author malte
Recipients jendrik, malte, silvan
Date 2017-10-19.16:29:37
Content
> The problem is the definition of "everything succeeded". Currently, we only
> return 0 if a plan was found, and any other non-zero exit code (as given by
> the search) otherwise, e.g. if the task is unsolvable or if we reached the
> imposed time limit. Why is it an "error" if the task is unsolvable or the time
> limit was reached? If I impose a limit of 10s then I am well aware that in
> many cases, no plan will be found.

The opposite of "success" is not "error". I think it's completely within the
Unix philosophy to say that "I couldn't find a plan" means that there was no
success.

For example, "grep", "locate" or "which" return 0 if a match was found and
non-zero if that wasn't the case. This doesn't mean that there was an error
looking for the pattern; it just means that the search did not succeed.
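
To make the grep case concrete, here is a small sketch in Python. It assumes a POSIX grep is available on the PATH, and the helper name grep_status is made up for illustration. Exit status 0 means a line matched, 1 means no line matched, and values above 1 are reserved for actual errors.

```python
import subprocess


def grep_status(pattern, text):
    """Return grep's exit status when searching `text` for `pattern`.

    Hypothetical helper for illustration; assumes a POSIX grep on the PATH.
    """
    # -q suppresses output: only the exit status matters here.
    result = subprocess.run(["grep", "-q", "-e", pattern], input=text, text=True)
    return result.returncode


found = grep_status("plan", "a plan was found\n")
not_found = grep_status("plan", "search space exhausted\n")
```

Neither call hit an error; the second one simply did not succeed, and the exit status is how a calling script learns that.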

An example similar to the timeout example is rsync, which uses non-zero exit
codes to signal that not everything was synced, also in cases where there was no
error (e.g. because the command line options included a limit on how many files
were to be deleted or if the command line options included a timeout like in our
case).

Basically, the point of exit codes is to give users (especially script
writers) a signal of what the outcome of the command was, so that they can
decide what to do next. In a script calling a planner, the most basic thing
they will care about is whether a plan was found, because this will almost
always affect how the script proceeds afterwards.
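
A rough sketch of such a calling script, in Python. The exit code values and the names fake_planner and next_step are assumptions for this example only, not the planner's actual codes; a child Python process stands in for the planner so that the sketch is runnable on its own.

```python
import subprocess
import sys

# Hypothetical exit codes, chosen only for this example.
PLAN_FOUND = 0
NO_PLAN = 10
OUT_OF_TIME = 20


def fake_planner(exit_code):
    """Stand-in for a planner call: a child process exiting with `exit_code`."""
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]


def next_step(command):
    """Run `command` and decide what the calling script should do next."""
    code = subprocess.run(command).returncode
    if code == PLAN_FOUND:
        return "execute plan"
    if code == NO_PLAN:
        return "report task as unsolved"
    if code == OUT_OF_TIME:
        return "retry with a larger time limit"
    return f"abort: unexpected exit code {code}"
```

If all of these outcomes were mapped to 0, the script would have to parse the planner's output to make the same decision.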

> Maybe we can redefine "everything succeeded" to be those cases where no
> "unexplained" error occurred, where unexplained errors are those where the
> search returns "CRITICAL_ERROR", "INPUT_ERROR" or "UNSUPPORTED_ERROR". This means
> that fast-downward.py would return 0 for *all* instances if nothing went
> *really* wrong.

I think we would just lose very relevant information for no good reason. The
current approach is to tell the user something like

"A plan was found."
"Search completed and no plan was found."
"We ran out of time."
"We ran out of memory."
(plus other options)

I think this is much more useful (and much more in line with the philosophy of
exit codes) than telling them something like

"A plan was found or search completed and no plan was found or we ran out of
time or we ran out of memory."

> Well, what is a meaningful exit code? Passing on the search's exit code as we
> currently do? As explained above, I don't think that this is "meaningful"
> currently.

A meaningful exit code is one that summarizes to the user what happened, with 0
for the canonical success case. A good design could use some grouping based on
the "kind" of outcome (e.g. error outcome vs. non-error outcome) or on the
component involved (e.g. translator vs. search vs. validator vs. driver), or both.
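
One possible shape for such a grouping, as a sketch only. All names and numeric values below are hypothetical: the numeric range encodes the kind of outcome and the name prefix encodes the component.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Hypothetical exit codes, grouped by kind of outcome via ranges."""

    SUCCESS = 0
    # 10-19: the run completed without an error, but no plan was found.
    SEARCH_UNSOLVABLE = 10
    SEARCH_OUT_OF_TIME = 11
    SEARCH_OUT_OF_MEMORY = 12
    # 30-39: something actually went wrong.
    TRANSLATE_INPUT_ERROR = 30
    SEARCH_INPUT_ERROR = 31
    SEARCH_CRITICAL_ERROR = 32


def kind(code):
    """Classify an exit code by the range it falls into."""
    if code == 0:
        return "success"
    if 10 <= code < 20:
        return "no plan, no error"
    if 30 <= code < 40:
        return "error"
    return "unknown"
```

A script that only cares about "plan or no plan" can test for 0, while one that wants to separate errors from non-errors can test the range.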

I feel that both proposals go strongly against how exit codes are intended to be
used (I mean in general, not in our planner), so I would be very unhappy to see
either of these changes. But of course we can discuss this further. (Perhaps not
in the tracker, as this takes so much time. I think it's been at least an hour
for me for this message and the previous one.)

> Also because it doesn't return anything meaningful at all if the
> search is not run.

That's an issue of the current implementation (which has a lot to do with the
fact that until very recently, we had no different exit codes in the translator
other than "OK" or "not OK"), but I think we should be able to address this in a
principled way.

For example, if translator and search both use unique non-success exit codes, a
first idea could be that whenever one of the components returns non-zero, no
further components are run and that exit code becomes the exit code of
the driver. (Is there any situation where after a non-zero translator result, it
makes sense to continue with planning? Is there any situation where after a
non-zero planner result, it makes sense to continue with validation?)
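
This first idea can be sketched as follows. The function name run_driver, the component names and the exit code 12 are placeholders for illustration, and each component is modelled as a callable that returns its exit code.

```python
def run_driver(components):
    """Run components in order and stop at the first non-zero exit code.

    `components` is a list of (name, callable) pairs, where each callable
    returns an exit code. Returns the driver's exit code together with
    the names of the components that were actually run.
    """
    ran = []
    for name, run in components:
        ran.append(name)
        code = run()
        if code != 0:
            # The failing component's code becomes the driver's exit code.
            return code, ran
    return 0, ran


# Hypothetical pipeline: the translator succeeds, the search returns 12.
pipeline = [
    ("translate", lambda: 0),
    ("search", lambda: 12),
    ("validate", lambda: 0),
]
exit_code, ran = run_driver(pipeline)
```

Here the validator is never run, and the driver reports 12. This only works unambiguously if the components do not share non-zero codes.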

The portfolio code needs more care, where we need to distinguish between exit
codes of the search component that indicate failure to find a plan vs. serious
errors where the whole run should be aborted.
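
A minimal sketch of that distinction, under assumptions: the set NO_PLAN_CODES, the value PORTFOLIO_NO_PLAN and the function run_portfolio are all made up here, and what the portfolio should return when every configuration fails to find a plan is an open design choice.

```python
# Hypothetical codes meaning "no plan found, but nothing went wrong".
NO_PLAN_CODES = {10, 11, 12}
# Hypothetical code reported when no configuration finds a plan.
PORTFOLIO_NO_PLAN = 10


def run_portfolio(configs, run_search):
    """Try each configuration in turn.

    `run_search(config)` is assumed to return the search exit code.
    """
    for config in configs:
        code = run_search(config)
        if code == 0:
            return 0  # plan found, stop here
        if code not in NO_PLAN_CODES:
            return code  # serious error, abort the whole run
        # Otherwise this configuration just failed; try the next one.
    return PORTFOLIO_NO_PLAN
```

For example, with outcomes {"a": 11, "b": 0, "c": 32}, running ["a", "b"] yields 0, while running ["a", "c", "b"] aborts with 32 before "b" is tried.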

(BTW, I think it would also make more sense to change the translator to use an
"unsolvable" exit code when it detects unsolvability rather than creating a
dummy output file. The current implementation is due to the fact that it was
easier to work with back when we had much more primitive driver scripts and
tended to run the translator in separate experiments from the preprocessor and
search component. I would like to see this changed, but of course it's not very
high priority.)
History
Date                 User   Action  Args
2017-10-19 16:29:37  malte  set     messageid: <1508423377.48.0.463824524126.issue739@unibas.ch>
2017-10-19 16:29:37  malte  set     recipients: + malte, jendrik, silvan
2017-10-19 16:29:37  malte  link    issue739 messages
2017-10-19 16:29:37  malte  create