Issue739

Title Driver: return exit codes in a way that allows distinguishing between translator and search
Priority feature Status reviewing
Superseder Nosy List cedric, jendrik, malte, silvan
Assigned To silvan Keywords
Optional summary

Created on 2017-10-16.14:03:57 by silvan, last changed by cedric.

Messages
msg6753 (view) Author: malte Date: 2017-12-14.13:33:08
I'm very sorry, but I'm afraid it has to go into the queue. :-(
msg6752 (view) Author: silvan Date: 2017-12-14.11:22:35
> I can have a look outside the regular queue. If you don't get feedback by
> Sunday, feel free to remind me (or put it into the queue).
This is a friendly reminder.
msg6747 (view) Author: silvan Date: 2017-12-08.14:37:14
Thanks, done with addressing them.
msg6746 (view) Author: jendrik Date: 2017-12-08.13:26:19
I'm done with my comments.
msg6745 (view) Author: silvan Date: 2017-12-08.11:07:53
Sure :)
msg6744 (view) Author: jendrik Date: 2017-12-08.11:07:27
I could do another round of comments first, if you like.
msg6742 (view) Author: malte Date: 2017-12-07.12:30:44
> Other than that, this issue "only" needs another round of reviewing. Malte, do
> you also want to look into the code? If yes, should I queue the issue or would
> you rather take it separately?

I can have a look outside the regular queue. If you don't get feedback by
Sunday, feel free to remind me (or put it into the queue).
msg6741 (view) Author: malte Date: 2017-12-07.11:53:52
We don't need to change the hard limit. The soft limit is there to alert us of
anomalous behaviour. The hard limit is there to prevent the hard disk from going
full on a job that has gone bonkers. For this purpose, the stderr limit doesn't
have to be lower than the stdout hard limit.
msg6740 (view) Author: silvan Date: 2017-12-07.11:50:35
Soft limit 64KB, hard limit higher, e.g. 512KB, or should 64KB be the hard limit?
msg6739 (view) Author: jendrik Date: 2017-12-07.10:35:24
It can happen in the same pull request.
msg6738 (view) Author: silvan Date: 2017-12-07.10:00:00
Jendrik, should that change happen in the same lab issue or rather in a separate
(default branch?) commit?

Other than that, this issue "only" needs another round of reviewing. Malte, do
you also want to look into the code? If yes, should I queue the issue or would
you rather take it separately?
msg6734 (view) Author: malte Date: 2017-12-06.10:40:36
In that case, I think the communicate() approach is fine. I would generally not
do this kind of buffering for anything above a few KB (say, 64 KB).

Given that our policy is not to write on stderr except for final error
reporting, I suggest we set the stderr soft limit in lab drastically lower than
the stdout limit, say at 64 KB. That way, we won't accidentally miss this design
aspect if we ever make a change that leads to much more output on stderr.
msg6733 (view) Author: silvan Date: 2017-12-06.09:20:24
I currently only redirect stderr using subprocess.PIPE but not stdout, so as you
say, output looks "normal". I think that we do *not* want to use stderr for
warnings, for this reason, lab nowadays treats output to err as a failure. For
the same reason, we also changed all warning in the search to not write to
stderr but to stdout (e.g. unusual options in M&S). We only write to stderr if
exiting immediately afterwards.
msg6732 (view) Author: malte Date: 2017-12-06.08:55:55
The size is fine, but I think there is another point we did not consider: if we
use something like communicate() or check_output(), there will be no output
visible to the user until the translator terminates. If we only redirect stderr
to a pipe but not stdout, then this is probably not an issue because it is only
output on stderr that is deferred, and such output currently only happens upon
termination of the translator.

But it would perhaps still be cleaner to use something like the code from lab in
case we ever get stderr output that happens earlier in execution, e.g., some
sort of warning message. (If we want to reuse the code in lab, we should make it
its own reusable module that can be identical in both places and includes a
comment that it is used in both projects to help with keeping it in sync.)
msg6730 (view) Author: silvan Date: 2017-12-06.08:44:30
What I meant is that 100 runs together don't need more than rounded 2MB, I
thought this was small enough. But you are right, most output files are 4 KB,
but I also found some 8 KB ones. The same is true if adding the options
--keep-unimportant-variables, --keep-unreachable-facts, and --full-encoding.

http://ai.cs.unibas.ch/_tmp_files/sieverss/issue739-v4-translate-with-options.html
msg6728 (view) Author: malte Date: 2017-12-05.23:32:48
I think the limit should be much lower than 2 MB if we use something which
stores everything in memory like Popen.subprocess does, but fortunately I think
it is. I looked at the output file of issue739-v4-translate, and the largest one
is smaller than 6 KB.
msg6727 (view) Author: silvan Date: 2017-12-05.23:11:39
I implemented the suggested approach with Popen to filter stderr output. No more
output-to-error errors :-)
http://ai.cs.unibas.ch/_tmp_files/sieverss/issue739-v4-translate.html

I also checked the size of the experimental output, and no 100 runs need more
than 2 MB. However, the configuration with additional translator options was
wrong and needs to be re-run, so I don't have data for it yet.
msg6721 (view) Author: malte Date: 2017-12-05.12:08:28
I think (but someone should perhaps verify this experimentally, also for the
various translator options) that the translator never produces excessive amounts
of output. In that case, the best way is to use Popen.communicate, which is
quite easy to use.

The lab code is complicated because it wants to avoid reading everything into
one string and hence essentially reimplements Popen.communicate.
msg6720 (view) Author: jendrik Date: 2017-12-05.11:58:04
I have recently implemented output redirection and filtering in Lab (see 
lab/calls/call.py), but I'm not sure we whether we want to duplicate this code in 
the driver.
msg6719 (view) Author: silvan Date: 2017-12-05.11:44:31
Before this issue against leads to too much typing, I'm also happy to discuss
this further in person tomorrow :-)
msg6718 (view) Author: silvan Date: 2017-12-05.11:43:27
After searching a bit, from what I understand, we cannot use
subprocess.check_call or check_output to filter stderr (e.g. with a PIPE) but
would need to use POPEN instead. Not being familiar with that at all, is that
the right direction to go for? Would check_output with redirecting stderr to
stdout, then processing the result and printing it to stdout be an alterative or
doesn't this work because we would store the entire output of the translator as
a result?
msg6716 (view) Author: malte Date: 2017-12-05.10:44:15
> From within the driver, that would mean to know where the stdout/stderr is
> written. Do we have that information in the driver?

This is not how redirection works. The driver doesn't need to know which kinds
of redirections its callers perform; it can just do its thing (e.g. filter the
stderr channel of what it calls) directly.

From each process's perspective, it has a stdin stream, a stdout stream and a
stderr stream, and it doesn't have to care what kind of streams they are or what
sort of redirections and processing are performed by its calling process. This
is encapsulated. Jendrik is right: the clean way is always to do the processing
as locally as possible.
msg6715 (view) Author: silvan Date: 2017-12-05.10:33:54
I'm not sure the driver is the better place. It is the caller, yes, but lab
decides where stderr is written (or more precisely, the slurm experiment script
do), and what we would like to do in the postprocess step is to delete the
run.err file if deemed appropriate. From within the driver, that would mean to
know where the stdout/stderr is written. Do we have that information in the driver?
msg6712 (view) Author: jendrik Date: 2017-12-05.09:55:52
I think the driver would be the better place for this since it's the direct 
caller.
msg6711 (view) Author: malte Date: 2017-12-05.09:54:19
Having the calling process postprocess the output is often the cleanest way to
handle things like these. It may be cleaner to do this in the driver rather than
lab. Not sure; whatever works for you.
msg6710 (view) Author: silvan Date: 2017-12-05.09:51:44
(Although I'm not sure if all lines related to these MemoryErrors will always
contain 'MemoryError', but so far, everything I saw did.)
msg6709 (view) Author: silvan Date: 2017-12-05.09:48:42
I think it is still valuable to have the correct exit codes as we do now. Maybe
we can think about adapting lab so that it doesn't "throw" an output-to-err
exception in case each line printed to stderr contains "MemoryError". I admit
this is a bit hacky but after all, the translator exits correctly in all cases
but it just additionally writes to stderr in some cases.
msg6708 (view) Author: malte Date: 2017-12-05.09:45:27
Recovering from running out of memory is generally difficult. There may be cases
where Python doesn't manage to raise the exception cleanly, and I'm not sure how
much we can do about that. Not sure how to proceed.
msg6707 (view) Author: silvan Date: 2017-12-05.09:37:26
Thanks for all the input so far!

I finally ran some experiments. The first one only runs search configurations
and is intended to compare the before-after behavior of exit codes. I manually
translated the legacy to the new exit codes. The comparison looks fine:
http://ai.cs.unibas.ch/_tmp_files/sieverss/issue739-v2-search-issue739-base-issue739-v2-compare.html

For translator configurations that restrict time, everything is fine, too, but
for catching MemoryErrors, things are different. We successfully catch
MemoryError (in one task, the small memory limit led to a segfault) and return
the correct exit code, but python sometimes prints to stderr even though the
MemoryError is caught. When testing this locally, I cannot consistently
reproduce the behavior: sometimes, for the same task, there is output to stderr,
sometimes there isn't. The output also varies, I've seen lines like
"MemoryError" or "Exception MemoryError: MemoryError() in <object repr() failed>
ignored", and also cases where several such lines occurred (presumably due to
running out of memory again while printing, even though we reserve 10MB of
emergency memory).

http://ai.cs.unibas.ch/_tmp_files/sieverss/issue739-v2-translate.html

Any suggestions how to address this?
msg6662 (view) Author: jendrik Date: 2017-11-30.17:37:45
I'm done with this round of comments.
msg6652 (view) Author: silvan Date: 2017-11-30.12:30:11
I updated the pull request (basically a complete rework):
https://bitbucket.org/SilvanS/fd-dev/pull-requests/30/issue739/diff

The documentation of exit codes can be found here:
http://www.fast-downward.org/ExitCodes
msg6569 (view) Author: malte Date: 2017-10-24.17:17:08
I made one comment on the tracker, but I suggest that we discuss the behaviour
in person before we merge. We should also update the documentation for this.
msg6568 (view) Author: silvan Date: 2017-10-24.16:56:13
Ok, I understand that using exit code 0 for translator and search if they
succeed makes sense.

> (Perhaps not
> in the tracker, as this takes so much time. I think it's been at least an hour
> for me for this message and the previous one.)
Please don't feel urged to answer quickly here, I though it was better to post
the discussion here than leading it by email. If you think that the discussion
gets too long, you can always suggest to move it to the next Fast Downward
meeting or the next print.

> For example, if translator and search both use unique non-success exit codes, a
> first idea could be that whenever one of the components returns non-zero, no
> further components will not be run and the exit codes becomes the exit code of
> the driver. (Is there any situation where after a non-zero translator result, it
> makes sense to continue with planning? Is there any situation where after a
> non-zero planner result, it makes sense to continue with validation?)
The exit codes are unique except for SIGXCPU, which is the same for both
translator and search. I still think that not running the next component after a
non-zero exit code is a good thing to do.

I implemented this change and adapted the driver to exit with the error code of
the last component that ran or with 0 if all components succeeded. This
hopefully matches the previous behavior where the exit code of the search was
returned as long as validate succeeded (which it does if on plan is given, if I
recall correctly). Additionally, the driver now prints the exit codes of the
components after running them, so we can parse the outcome of them with lab.

Here is a pull request:
https://bitbucket.org/SilvanS/fd-dev/pull-requests/30/issue739/diff
msg6567 (view) Author: malte Date: 2017-10-19.16:29:37
> The problem is the definition of "everything succeeded". Currently, we only
> return 0 if a plan was found, and any other non-zero exit code (as given by
> the search) otherwise, e.g. if the task is unsolvable or if we reached the
> imposed time limit. Why is in an "error" if the task is unsolvable or the time
> limit was reached? If I impose a limit of 10s then I am well aware that in
> many cases, no plan will be found.

The opposite of "success" is not "error". I think it's completely within the
Unix philosophy to say that "I couldn't find a plan" means that there was no
success.

For example, "grep", "locate" or "which" return 0 if a match was found and
non-zero if that wasn't the case. This doesn't mean that there was an error
looking for the pattern; it just means that the search did not succeed.

An example similar to the timeout example is rsync, which uses non-zero exit
codes to signal that not everything was synced, also in cases where there was no
error (e.g. because the command line options included a limit on how many files
were to be deleted or if the command line options included a timeout like in our
case).

Basically, the point of exit codes is to provide users (especially script
writers) with a signal of what the outcome of the command was, so that they can
make a decision what to do next. In a script calling a planner, the most basic
thing that they will care about is whether a plan was found because this will
almost always affect how the script will proceed afterwards.

> Maybe we can redefine "everything succeeded" to be those cases where no
> "unexplained" error occurred, where unexplained errors are those where the
> search returns "CRITICAL_ERROR", "INPUT_ERROR" "UNSUPPORTED_ERROR". This means
> that fast-downward.py would return 0 for *all* instances if nothing went
> *really* wrong.

I think we would just lose very relevant information for no good reason. The
current approach is to tell the user something like

"A plan was found."
"Search completed and no plan was found."
"We ran out of time."
"We ran out of memory."
(plus other options)

I think this is much more useful (and much more in line with the philosophy of
exit codes) than telling them something like

"A plan was found or search completed and no plan was found or we ran out of
time or we ran out of memory."

> Well, what is a meaningful exit code? Passing on the search's exit code as we
> currently do? As explained above, I don't think that this is "meaningful"
> currently.

A meaningful exit code is one that summarizes to the user what happened, with 0
for the canonical success case. A good design could use some grouping based on
the "kind" of outcome (e.g. error outcome vs. non-error outcome) or on the
component involved (e.g. translator vs. search vs. validator vs. driver), or both.

I feel that both proposals go strongly against how exit codes are intended to be
used (I mean in general, not in our planner), so I would be very unhappy to see
either of these changes. But of course we can discuss this further. (Perhaps not
in the tracker, as this takes so much time. I think it's been at least an hour
for me for this message and the previous one.)

> Also because it doesn't return anything meaningful at all if the
> search is not run.

That's an issue of the current implementation (which has a lot to do with the
fact that until very recently, we had no different exit codes in the translator
other than "OK" or "not OK"), but I think we should be able to address this in a
principled way.

For example, if translator and search both use unique non-success exit codes, a
first idea could be that whenever one of the components returns non-zero, no
further components will not be run and the exit codes becomes the exit code of
the driver. (Is there any situation where after a non-zero translator result, it
makes sense to continue with planning? Is there any situation where after a
non-zero planner result, it makes sense to continue with validation?)

The portfolio code needs more care, where we need to distinguish between exit
codes of the search component that indicate failure to find a plan vs. serious
errors where the whole run should be aborted.

(BTW, I think it would also make more sense to change the translator to use an
"unsolvable" exit code when it detects unsolvability rather than creating a
dummy output file. The current implementation is due to the fact that it was
easier to work with back when we had much more primitive driver scripts and
tended to run the translator in separate experiments from the preprocessor and
search component. I would like to see this changed, but of course it's not very
high priority.)
msg6566 (view) Author: silvan Date: 2017-10-19.15:44:39
> I don't really understand the motivation in the first two paragraphs. If I
> understand correctly, you are saying that us following the universal convention
> of "0" for "everything succeeded" would be inconsistent. In what way?
The problem is the definition of "everything succeeded". Currently, we only
return 0 if a plan was found, and any other non-zero exit code (as given by the
search) otherwise, e.g. if the task is unsolvable or if we reached the imposed
time limit. Why is in an "error" if the task is unsolvable or the time limit was
reached? If I impose a limit of 10s then I am well aware that in many cases, no
plan will be found.

Maybe we can redefine "everything succeeded" to be those cases where no
"unexplained" error occurred, where unexplained errors are those where the
search returns "CRITICAL_ERROR", "INPUT_ERROR" "UNSUPPORTED_ERROR". This means
that fast-downward.py would return 0 for *all* instances if nothing went
*really* wrong.

> Having the driver output the individual exit codes of the components could be
> quite useful. 
Ok, outputting the individual exit codes and parsing them with lab seems like a
good way to go.

> But I think that doesn't imply that we shouldn't attempt to have
> it produce a meaningful exit code itself.
Well, what is a meaningful exit code? Passing on the search's exit code as we
currently do? As explained above, I don't think that this is "meaningful"
currently. Also because it doesn't return anything meaningful at all if the
search is not run.
msg6565 (view) Author: malte Date: 2017-10-19.14:12:05
I don't really understand the motivation in the first two paragraphs. If I
understand correctly, you are saying that us following the universal convention
of "0" for "everything succeeded" would be inconsistent. In what way?

Both using a non-zero exit code on success (thus preventing users from using
things like "set -e" or "subprocess.check_call") and requiring all users to
parse output to detect whether the planner succeeded (which in the worst case
requires callers to do complex dances with select etc. to avoid buffering the
whole output in memory just to determine the exit condition) don't sound good to
me. If there's a strong reason for it, we can discuss it, but I don't see it yet.

Having the driver output the individual exit codes of the components could be
quite useful. But I think that doesn't imply that we shouldn't attempt to have
it produce a meaningful exit code itself.
msg6564 (view) Author: silvan Date: 2017-10-19.13:44:43
Jendrik and I discussed this a bit further:

If we wanted to use exit codes consistently in the driver, then we should no
longer return "0" as a synonym to "search found a plan", and treat all other
results like "unsolvable", "out of time" etc (independently of the component) as
"errors" in the sense that a non-zero exit code is returned. In the current
implementation of the test module of the driver, we always expect exit code 0
when using pyhton's subprocess module, independently of which component is being
called. This makes testing of exit codes inconsistent, treating some as
exceptions and others not. (For example, also a call to ./fast-downward.py
without any arguments returns 0.) 

Changing the driver to always return a unique non-zero exit code depending on
the result of the translator and the search would make this more consistent. The
drawback of this approach would be that the driver basically never returns with
zero, hence making it more difficult to be used in other scripts, since
"non-zero" usually means "error". I personally would still prefer this solution
over the current situation, where exit codes of the search components are just
passed through the driver (and translator exit codes are not at all considered).

An alternative is to not use exit codes for the driver at all, but to write or
print the results of the components (which should still continue use different
exit codes) to some file or stdout. 

What do you think of these solutions, Malte?
msg6563 (view) Author: malte Date: 2017-10-16.14:05:52
Makes sense.
msg6562 (view) Author: silvan Date: 2017-10-16.14:03:57
Now that issue738 is integrated (which fixes translator crashs due to hitting
the memory limit), we could in principle distinguish between running out of
memory with the translator or the search. For other "results", such as hitting
the time limit (SIGXCPU, -24) or exiting with "0", we cannot make this
distinction (e.g. with lab). We would like to modify the driver to return
different exit codes for the translator and the search even if the exit code
returned by the components would be the same. This can help preventing problems
with the translator in future, such as the one we had when switching to slurm
where tasks could not be translated, instead of just assuming that "out of
memory" or "out of time" always means "search ran out time/memory".
History
Date User Action Args
2018-01-11 15:26:10cedricsetnosy: + cedric
2017-12-14 13:33:08maltesetmessages: + msg6753
2017-12-14 11:22:35silvansetmessages: + msg6752
2017-12-08 14:37:14silvansetmessages: + msg6747
2017-12-08 13:26:19jendriksetmessages: + msg6746
2017-12-08 11:07:53silvansetmessages: + msg6745
2017-12-08 11:07:27jendriksetmessages: + msg6744
2017-12-07 12:30:44maltesetmessages: + msg6742
2017-12-07 11:53:52maltesetmessages: + msg6741
2017-12-07 11:50:35silvansetmessages: + msg6740
2017-12-07 10:35:24jendriksetmessages: + msg6739
2017-12-07 10:00:00silvansetmessages: + msg6738
2017-12-06 10:40:36maltesetmessages: + msg6734
2017-12-06 09:20:24silvansetmessages: + msg6733
2017-12-06 08:55:55maltesetmessages: + msg6732
2017-12-06 08:44:30silvansetmessages: + msg6730
2017-12-05 23:32:48maltesetmessages: + msg6728
2017-12-05 23:11:39silvansetmessages: + msg6727
2017-12-05 12:08:28maltesetmessages: + msg6721
2017-12-05 11:58:04jendriksetmessages: + msg6720
2017-12-05 11:44:31silvansetmessages: + msg6719
2017-12-05 11:43:27silvansetmessages: + msg6718
2017-12-05 10:44:15maltesetmessages: + msg6716
2017-12-05 10:33:54silvansetmessages: + msg6715
2017-12-05 09:55:52jendriksetmessages: + msg6712
2017-12-05 09:54:19maltesetmessages: + msg6711
2017-12-05 09:51:44silvansetmessages: + msg6710
2017-12-05 09:48:42silvansetmessages: + msg6709
2017-12-05 09:45:27maltesetmessages: + msg6708
2017-12-05 09:37:26silvansetmessages: + msg6707
2017-11-30 17:37:45jendriksetmessages: + msg6662
2017-11-30 12:30:11silvansetstatus: chatting -> reviewing
assignedto: silvan
messages: + msg6652
2017-10-24 17:17:08maltesetmessages: + msg6569
2017-10-24 16:56:14silvansetmessages: + msg6568
2017-10-19 16:29:37maltesetmessages: + msg6567
2017-10-19 15:44:39silvansetmessages: + msg6566
2017-10-19 14:12:05maltesetmessages: + msg6565
2017-10-19 13:44:43silvansetmessages: + msg6564
2017-10-16 14:05:52maltesetmessages: + msg6563
2017-10-16 14:03:57silvancreate