Issue 738: Translator: use distinguishable exit codes if running out of memory or time

Title	Translator: use distinguishable exit codes if running out of memory or time
Priority	feature	Status	resolved
Superseder		Nosy List	jendrik, malte, silvan
Assigned To	silvan	Keywords
Optional summary

Created on 2017-10-11.14:11:01 by silvan, last changed by silvan.

Files
File name	Uploaded	Type	Edit	Remove
test.py	silvan, 2017-10-13.21:33:28	text/x-python

Messages
msg6561 (view)	Author: silvan	Date: 2017-10-13.22:05:47
Thanks for the review! This is done.
msg6560 (view)	Author: silvan	Date: 2017-10-13.21:45:59
Ok, I added some separating lines to distinguish the output. The changes can be found here: https://bitbucket.org/SilvanS/fd-dev/pull-requests/29/issue738/diff
msg6559 (view)	Author: malte	Date: 2017-10-13.21:40:57
> By the way, I decided to print the traceback on stdout and not stderr so > that lab does not complain about the error-logfile being non-empty. I suppose that makes sense if it is an "expected condition" to some extent and not an error. But then we should also make clear in the output that this is a traceback we're reporting as part of normal program execution rather than an error condition, i.e., it should not look exactly like an uncaught exception. I'd add some lines before/after the traceback to make that clear. > Do you have any preferences regarding which number we use as exit code? No.
msg6558 (view)	Author: silvan	Date: 2017-10-13.21:35:23
By the way, I decided to print the traceback on stdout and not stderr so that lab does not complain about the error-logfile being non-empty. Do you have any preferences regarding which number we use as exit code?
msg6557 (view)	Author: silvan	Date: 2017-10-13.21:33:28
The following minimalistic solution prints the traceback (if there is any) and the exception and then exists using a custom exit code: if __name__ == "__main__": try: main() except MemoryError: print("Translator ran out of memory") traceback.print_exc(file=sys.stdout) exit(100) (Updated test script attached.)
msg6556 (view)	Author: malte	Date: 2017-10-13.21:32:46
> 2. What do you mean by "the exception that we would otherwise get"? By catching > only MemoryError, we only catch that exception, don't we? I meant to write "traceback", not "exception". > So would you be happy if we only catch the MemoryError and manage to print the > same output as python would usually do? Yes.
msg6555 (view)	Author: silvan	Date: 2017-10-13.21:24:00
1. Yes, it returns 152. No, I didn't want to output anything necessarily. The only change that I would really like to have is to not get exit code 1 when getting a MemoryError. If only running the translator (not the search), lab interprets this as a "critical error". I would rather like to be able to be sure that there was only a memory error. 2. What do you mean by "the exception that we would otherwise get"? By catching only MemoryError, we only catch that exception, don't we? And return we a customary exit code rather than the built in 1. Or do you refer to the missing printing of something? So far, the usual output was not really helpful if a MemoryError occurred, but I'll see if we can print the stack. 3. If we don't want to catch SIGXCPU in the first place, we wouldn't have this problem. 4. Sorry, the RuntimeError is some leftover of my previous testing. In the translator, we would have the translator's main method of course. 5. Ok. As I said, I don't even need to catch SIGXCPU necessarily, since as you pointed out, it produces its own exit code we can use. So would you be happy if we only catch the MemoryError and manage to print the same output as python would usually do?
msg6554 (view)	Author: malte	Date: 2017-10-13.18:42:36
The goal of distinguishing different errors is good. Regarding the details: 1. I don't see much point in catching a signal like SIGXCPU and just converting it to a different exit code. Every signal generates its own exit code already anyway. Does the planner really return 1 when SIGXCPU is raised? There's nothing in the code that should do that, and I don't think Python catches the signal by default. I would except to see an exit code of 152 instead (128 - (-24), because SIGXCPU = 24). The reason why we catch SIGXCPU in the search code is so that we have an opportunity to print memory statistics before terminating. Do you want to do the same here, too? 2. Catching MemoryError is a good idea, but if we catch the exception and exit, we lose the exception that we would otherwise get, which I think is a useful diagnostic. It's possible to print the stack trace for an exception manually, and I think this would make sense here. (However, MemoryError is a bit special because running out of memory is such a severe condition that I'm not sure Python is able to provide a stack trace anyway.) We might also consider doing a similar thing as we do in C++ and allocate (and on error, release) some emergency memory to make sure this goes through. But this is the sort of thing that needs some trial-and-error to get right. 3. Things like the signal module are not portable across all platforms we support, so it would need to be wrapped by platform-specific blocks. 4. Parts of the test script (the attached one, not the one in your message) look a bit strange to me. The part in the main block that is supposed to catch the translator running out of memory and print a message will not be executed, because memory error is already caught elsewhere, leading to termination without a message. I don't understand the rationale between catching RuntimeError or why it should lead to the same exit code as SIGXCPU. 5. Regarding catching more signals: I think we should only catch the ones we're essentially creating ourselves, like SIGXCPU which we create by using ulimit in the driver script. Catching (and hence in some sense silencing) signals widely is usually bad policy because whoever causes them to be raised usually also wants to see the effect of that. "Errors should never pass silently" etc.
msg6553 (view)	Author: silvan	Date: 2017-10-13.17:29:40
We could intercept SIGXCPU and catch MemoryErrors as follows for any python script containing a main method: def handle_signal(signal, stack): print 'Caught signal {}'.format(str(signal)) exit(101) if __name__ == "__main__": signal.signal(signal.SIGXCPU, handle_signal) try: main() except MemoryError: print("Translator ran out of memory") exit(100) I couldn't think of any better exit codes than "starting" with 1000 for all translator related errors. Find attached a complete test script to test this, e.g. using ulimit -St 1 or ulimit -Sv 100000 to limit time to 1s or memory to 1MB. What do you think of adding this to translate.py? Should we catch more signals? (See dir(signal) for a list of signals in python.)
msg6549 (view)	Author: silvan	Date: 2017-10-11.14:11:01
When working on a lab issue to better support translator-only experiments, we realized that the translator simply returns exit code 1 whenever it fails. We would like to intercept MemoryError and SigXCPU and return appropriate exit codes to allow parsing exit reasons with lab also for the translator (and so that exit code 1 does not overlap with 'critical error' of the search).

History
Date	User	Action	Args
2017-10-13 22:05:47	silvan	set	status: reviewing -> resolved messages: + msg6561
2017-10-13 21:45:59	silvan	set	status: chatting -> reviewing messages: + msg6560
2017-10-13 21:40:57	malte	set	messages: + msg6559
2017-10-13 21:35:23	silvan	set	messages: + msg6558
2017-10-13 21:33:28	silvan	set	files: + test.py messages: + msg6557
2017-10-13 21:33:02	silvan	set	files: - test.py
2017-10-13 21:32:46	malte	set	messages: + msg6556
2017-10-13 21:24:00	silvan	set	messages: + msg6555
2017-10-13 18:42:36	malte	set	messages: + msg6554
2017-10-13 17:29:40	silvan	set	files: + test.py assignedto: silvan messages: + msg6553
2017-10-11 14:11:01	silvan	create

Issue738