Issue738

Title Translator: use distinguishable exit codes if running out of memory or time
Priority feature Status resolved
Superseder Nosy List jendrik, malte, silvan
Assigned To silvan Keywords
Optional summary

Created on 2017-10-11.14:11:01 by silvan, last changed by silvan.

Files
File name Uploaded Type Edit Remove
test.py silvan, 2017-10-13.21:33:28 text/x-python
Messages
msg6561 (view) Author: silvan Date: 2017-10-13.22:05:47
Thanks for the review! This is done.
msg6560 (view) Author: silvan Date: 2017-10-13.21:45:59
Ok, I added some separating lines to distinguish the output.

The changes can be found here:
https://bitbucket.org/SilvanS/fd-dev/pull-requests/29/issue738/diff
msg6559 (view) Author: malte Date: 2017-10-13.21:40:57
> By the way, I decided to print the traceback on stdout and not stderr so
> that lab does not complain about the error-logfile being non-empty.

I suppose that makes sense if it is an "expected condition" to some extent and
not an error. But then we should also make clear in the output that this is a
traceback we're reporting as part of normal program execution rather than an
error condition, i.e., it should not look exactly like an uncaught exception.
I'd add some lines before/after the traceback to make that clear.

> Do you have any preferences regarding which number we use as exit code?

No.
msg6558 (view) Author: silvan Date: 2017-10-13.21:35:23
By the way, I decided to print the traceback on stdout and not stderr so that
lab does not complain about the error-logfile being non-empty.

Do you have any preferences regarding which number we use as exit code?
msg6557 (view) Author: silvan Date: 2017-10-13.21:33:28
The following minimalistic solution prints the traceback (if there is any) and
the exception and then exists using a custom exit code:

if __name__ == "__main__":
    try:
        main()
    except MemoryError:
        print("Translator ran out of memory")
        traceback.print_exc(file=sys.stdout)
        exit(100)


(Updated test script attached.)
msg6556 (view) Author: malte Date: 2017-10-13.21:32:46
> 2. What do you mean by "the exception that we would otherwise get"? By catching
> only MemoryError, we only catch that exception, don't we?

I meant to write "traceback", not "exception".

> So would you be happy if we only catch the MemoryError and manage to print the
> same output as python would usually do?

Yes.
msg6555 (view) Author: silvan Date: 2017-10-13.21:24:00
1. Yes, it returns 152. No, I didn't want to output anything necessarily. The
only change that I would really like to have is to not get exit code 1 when
getting a MemoryError. If only running the translator (not the search), lab
interprets this as a "critical error". I would rather like to be able to be sure
that there was only a memory error.

2. What do you mean by "the exception that we would otherwise get"? By catching
only MemoryError, we only catch that exception, don't we? And return we a
customary exit code rather than the built in 1. Or do you refer to the missing
printing of something? So far, the usual output was not really helpful if a
MemoryError occurred, but I'll see if we can print the stack.

3. If we don't want to catch SIGXCPU in the first place, we wouldn't have this
problem.

4. Sorry, the RuntimeError is some leftover of my previous testing. In the
translator, we would have the translator's main method of course.

5. Ok. As I said, I don't even need to catch SIGXCPU necessarily, since as you
pointed out, it produces its own exit code we can use.

So would you be happy if we only catch the MemoryError and manage to print the
same output as python would usually do?
msg6554 (view) Author: malte Date: 2017-10-13.18:42:36
The goal of distinguishing different errors is good.

Regarding the details:

1. I don't see much point in catching a signal like SIGXCPU and just converting
it to a different exit code. Every signal generates its own exit code already
anyway. Does the planner really return 1 when SIGXCPU is raised? There's nothing
in the code that should do that, and I don't think Python catches the signal by
default. I would except to see an exit code of 152 instead (128 - (-24), because
SIGXCPU = 24).

The reason why we catch SIGXCPU in the search code is so that we have an
opportunity to print memory statistics before terminating. Do you want to do the
same here, too?

2. Catching MemoryError is a good idea, but if we catch the exception and exit,
we lose the exception that we would otherwise get, which I think is a useful
diagnostic. It's possible to print the stack trace for an exception manually,
and I think this would make sense here. (However, MemoryError is a bit special
because running out of memory is such a severe condition that I'm not sure
Python is able to provide a stack trace anyway.)

We might also consider doing a similar thing as we do in C++ and allocate (and
on error, release) some emergency memory to make sure this goes through. But
this is the sort of thing that needs some trial-and-error to get right.

3. Things like the signal module are not portable across all platforms we
support, so it would need to be wrapped by platform-specific blocks.

4. Parts of the test script (the attached one, not the one in your message) look
a bit strange to me. The part in the main block that is supposed to catch the
translator running out of memory and print a message will not be executed,
because memory error is already caught elsewhere, leading to termination without
a message. I don't understand the rationale between catching RuntimeError or why
it should lead to the same exit code as SIGXCPU.

5. Regarding catching more signals: I think we should only catch the ones we're
essentially creating ourselves, like SIGXCPU which we create by using ulimit in
the driver script. Catching (and hence in some sense silencing) signals widely
is usually bad policy because whoever causes them to be raised usually also
wants to see the effect of that. "Errors should never pass silently" etc.
msg6553 (view) Author: silvan Date: 2017-10-13.17:29:40
We could intercept SIGXCPU and catch MemoryErrors as follows for any python
script containing a main method:

def handle_signal(signal, stack):
    print 'Caught signal {}'.format(str(signal))
    exit(101)


if __name__ == "__main__":
    signal.signal(signal.SIGXCPU, handle_signal)
    try:
        main()
    except MemoryError:
        print("Translator ran out of memory")
        exit(100)

I couldn't think of any better exit codes than "starting" with 1000 for all
translator related errors. Find attached a complete test script to test this,
e.g. using ulimit -St 1 or ulimit -Sv 100000 to limit time to 1s or memory to 1MB.

What do you think of adding this to translate.py? Should we catch more signals?
(See dir(signal) for a list of signals in python.)
msg6549 (view) Author: silvan Date: 2017-10-11.14:11:01
When working on a lab issue to better support translator-only experiments, we
realized that the translator simply returns exit code 1 whenever it fails. We
would like to intercept MemoryError and SigXCPU and return appropriate exit
codes to allow parsing exit reasons with lab also for the translator (and so
that exit code 1 does not overlap with 'critical error' of the search).
History
Date User Action Args
2017-10-13 22:05:47silvansetstatus: reviewing -> resolved
messages: + msg6561
2017-10-13 21:45:59silvansetstatus: chatting -> reviewing
messages: + msg6560
2017-10-13 21:40:57maltesetmessages: + msg6559
2017-10-13 21:35:23silvansetmessages: + msg6558
2017-10-13 21:33:28silvansetfiles: + test.py
messages: + msg6557
2017-10-13 21:33:02silvansetfiles: - test.py
2017-10-13 21:32:46maltesetmessages: + msg6556
2017-10-13 21:24:00silvansetmessages: + msg6555
2017-10-13 18:42:36maltesetmessages: + msg6554
2017-10-13 17:29:40silvansetfiles: + test.py
assignedto: silvan
messages: + msg6553
2017-10-11 14:11:01silvancreate