Issue825

Title: memory limits not enforced on Mac OS; add driver exit codes
Priority: bug
Status: resolved
Superseder:
Nosy List: cedric, malte, silvan
Assigned To: silvan
Keywords: driver
Optional summary:

Created on 2018-09-17.14:57:39 by malte, last changed by silvan.

Messages
msg7574 (view) Author: silvan Date: 2018-09-19.15:07:31
Merged and pushed. The buildbot still complains because we use hard-coded time
and memory limits for running VAL. issue775 suggests adding
--validate-memory-limit and --validate-time-limit options, and I'll use that
issue for resolving this problem.
msg7497 (view) Author: malte Date: 2018-09-17.19:45:27
I left some comments on bitbucket.
msg7495 (view) Author: silvan Date: 2018-09-17.18:02:05
This turned out to be a larger issue, because our driver currently does not have
its own exit codes, which means that we cannot use our small tests to test
against a specific exit code that corresponds to trying to use some unsupported
feature of the driver, such as limiting memory on macOS.

As a first step, I added exit codes to the driver and tried to use them
everywhere (grepping for "raise" and "sys.exit"):
https://bitbucket.org/SilvanS/fd-dev/pull-requests/39/default/diff
I left one comment of my own on this pull request.

The next step then will be to use the proper exit code for "unsupported driver
request" when setting a memory limit on macOS.
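
As a rough illustration of why dedicated exit codes matter for the small tests, a test could run the driver and compare the exit code of the child process against the expected one. The sketch below is only that: the name DRIVER_UNSUPPORTED and its value are hypothetical placeholders (the real definitions are in the pull request), and a tiny Python one-liner stands in for the actual driver call.

```python
import subprocess
import sys

# Hypothetical exit code for "unsupported driver request" (placeholder value;
# the real codes are defined in the pull request).
DRIVER_UNSUPPORTED = 37


def run_and_get_exit_code(cmd):
    """Run the command given as an argument list and return its exit code."""
    return subprocess.call(cmd)


# Stand-in for a driver invocation that requests an unsupported feature.
fake_driver_cmd = [
    sys.executable, "-c", "import sys; sys.exit(%d)" % DRIVER_UNSUPPORTED]

if __name__ == "__main__":
    code = run_and_get_exit_code(fake_driver_cmd)
    print("expected exit code seen:", code == DRIVER_UNSUPPORTED)
```

Without a distinct code, such a test could not tell "unsupported feature" apart from any other failure of the driver.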
msg7485 (view) Author: malte Date: 2018-09-17.15:06:03
In practice, this means that we cannot limit memory on the Mac. The message is
quite old and our Mac buildbot is quite old, so this might conceivably have
changed in the meantime. But we also tested this on the most recent OS X as of
today, version 10.13.6 (High Sierra), and it has remained the same.

This explains why our Mac buildbot is currently failing: the tests that check
that the memory limit is enforced notice that it isn't. :-)

Suggested changes:

1) In the driver, abort with a critical error if a memory-limiting option is
used on OS X (=> sys.platform == "darwin").

2) Update the buildbot to look for these critical errors if these tests are run
on the Mac.
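
Suggested change 1 could be sketched roughly as follows. This is a minimal sketch, not the driver's actual code: the function names and the exit code DRIVER_UNSUPPORTED (with its value) are hypothetical and only illustrate the platform check.

```python
import sys

# Hypothetical exit code for "unsupported driver request" (placeholder value).
DRIVER_UNSUPPORTED = 37


def memory_limits_supported(platform=sys.platform):
    """Return False on macOS, where setrlimit-based memory limits are ignored."""
    return platform != "darwin"


def check_memory_limit_request(memory_limit, platform=sys.platform):
    """Abort if a memory limit is requested on a platform that ignores it."""
    if memory_limit is not None and not memory_limits_supported(platform):
        print("error: memory limits are not enforced on macOS", file=sys.stderr)
        sys.exit(DRIVER_UNSUPPORTED)
```

Passing the platform as a parameter keeps the check testable on any machine; for change 2, the buildbot would then expect this exit code instead of an enforced limit when running on the Mac.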
msg7484 (view) Author: malte Date: 2018-09-17.15:00:03
The corresponding information is hard to find, but Mac OS seems to
*intentionally* provide these APIs (ulimit on the shell, resource.setrlimit in
Python) but ignore them. There is an article on this on lists.apple.com, which
is currently only available on the Wayback Machine:

https://web.archive.org/web/20150908044241/https://lists.apple.com/archives/unix-porting/2005/Jun/msg00115.html

In case this will no longer be available in the future, here is the most
relevant part:

======================================================================================

The setrlimit() administrative limits are not enforced except for certain ones
which are enforcible in BSD code. Those which would require enforcement in Mach
code (VM-based enforcement or scheduler- based enforcement) are not currently
enforced at all.

These administrative limits were never intended to save the system from fragile
code; the way to do that is to make the code less fragile. They historically
date back to the days when CPU time and memory usage were billable resources,
and process accounting and system accounting were used on old timeshare machines
that were purchased by groups, such as several University departments, because
there was no way for a single University department to be able to afford its own
PDP-8. Today, they are basically knobs that have to be there so old code can be
ported, but which don't actually have to do anything, for the most part.

The limits are considered administrative, as they are not intended to permit the
creation of security policy dependencies, nor are they intended to mitigate
problems related to application memory leaks.

The following is the current(0) MacOS X resource limit support table:


    ------------- ---------    ---------
    Limit name    Standard?    Enforced?
    ------------- ---------    ---------
    RLIMIT_CPU      YES          NO
    RLIMIT_FSIZE    YES          YES(1)
    RLIMIT_DATA     YES          NO
    RLIMIT_STACK    YES          YES(1)
    RLIMIT_CORE     YES          YES
    RLIMIT_AS       YES          NO
    RLIMIT_RSS      NO(2)        NO
    RLIMIT_MEMLOCK  NO           NO
    RLIMIT_NPROC    NO           YES(3)
    RLIMIT_NOFILE   YES          YES
    ------------- ---------    ---------


(0) This table is subject to change.


(1) Not entirely correctly enforced; specifically, boundary conditions are
clipped before or after the boundary, rather than exactly at it.

(2) RLIMIT_RSS is a MACOS X-specific alias for RLIMIT_AS.


(3) Whether this limit is the upper bounds depends on current sysctl settings on
the machine.

See also:


http://www.opengroup.org/onlinepubs/009695399/functions/setrlimit.html

To determine whether or not a limit is enforced by an OS, rather than using the
return code of setrlimit(), you must first use getrlimit() and compare the
result to RLIM_INFINITY; if it's RLIM_INFINITY, then you should not set a limit
(you can, but you will confuse child processes which call getrlimit() themselves
to check for enforcement, if you do).

--


FWIW, the reason your test program bogs down the system is because it starts
swapping, so you end up limited to disk speed; the sample program you give is a
degenerate case.


If you actually *need* to kill a process when its resource usage gets to a
certain point, and you can only make minor modifications to the process, one way
to do this would be to start a watchdog thread that sleeps, wakes up
periodically, and calls kill(getpid(), SIGTERM) if any of the fields from
getrusage(RUSAGE_SELF, &rusage) end up getting "too large".

If you need an external watchdog, you'll have to get creative with something like:

    #include <signal.h>   /* kill, SIGHUP */
    #include <stdio.h>    /* popen, pclose, fscanf, snprintf */
    #include <unistd.h>   /* sleep */

    char cmd[256];
    int rss;
    FILE *fp;
    int rss_max_allowable = XXX; /* some upper bound */
    int pid_to_watch = YYY; /* pid that leaks memory */
    ...
    for (;;) {
        sleep(300); /* wake every 5 minutes */
        ...
        /* ask ps for the resident set size of the watched process */
        snprintf(cmd, sizeof(cmd), "ps -p %d -o rss | tail -1", pid_to_watch);
        if ((fp = popen(cmd, "r")) != NULL) {
            if (fscanf(fp, "%d", &rss) == 1) {
                if (rss > rss_max_allowable) {
                    kill(pid_to_watch, SIGHUP); /* "reset" process; may use other signal or SIGKILL */
                }
            }
            pclose(fp);
        }
    }
    ...

Obviously, you can use this with anything ps can find out about a process, or
any combination of things it can find out. Using the signal system this way lets
you be a lot more flexible than the process simply crashing when it hits the
limit. For example, maybe you could add a SIGHUP handler that reset the process
instead of killing it (this would work with e.g. bind or sendmail, which both
use dying children to do memory garbage collection), etc..

Hope that helps...


-- Terry
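
For a Python process, the in-process watchdog idea from the quoted message might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names, the polling interval and the on_exceed callback are all hypothetical, and ru_maxrss reports the peak (not the current) resident set size, in bytes on macOS and in kilobytes on Linux.

```python
import os
import resource
import signal
import sys
import threading


def peak_rss_bytes():
    """Return the peak resident set size of this process in bytes."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    factor = 1 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss * factor


def start_memory_watchdog(limit_bytes, interval=1.0, on_exceed=None):
    """Poll the peak RSS in a daemon thread and react once it exceeds the limit.

    Returns an Event; setting it stops the watchdog.
    """
    if on_exceed is None:
        # Default reaction: send SIGTERM to ourselves.
        def on_exceed():
            os.kill(os.getpid(), signal.SIGTERM)
    stop = threading.Event()

    def watch():
        # stop.wait() returns False on timeout, i.e. while we should keep polling.
        while not stop.wait(interval):
            if peak_rss_bytes() > limit_bytes:
                on_exceed()
                return

    threading.Thread(target=watch, daemon=True).start()
    return stop
```

Unlike an enforced limit, such a watchdog only reacts at the next poll, so the process can overshoot the bound between checks.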
msg7483 (view) Author: malte Date: 2018-09-17.14:57:39
The memory limits that we can set in the driver for the translator and search
component don't seem to be enforced on Mac OS. (More diagnosis follows.)
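
For context, a memory limit of this kind is typically requested through resource.setrlimit. The sketch below assumes the limit is applied to RLIMIT_AS in the process that runs the component; the function names are hypothetical and this is not the driver's actual code. On Linux the kernel enforces such a limit; the issue reported here is that on Mac OS it has no effect.

```python
import resource


def clamp_to_hard_limit(requested, hard):
    """Return the soft limit to request, never exceeding the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


def set_memory_limit(limit_bytes):
    """Request an address-space limit and return the limits now reported."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    soft = clamp_to_hard_limit(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    # Reading the limit back only shows what was requested,
    # not whether the OS enforces it.
    return resource.getrlimit(resource.RLIMIT_AS)
```

A test therefore has to allocate memory beyond the limit and observe the failure; checking the return value of the call is not enough.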
History
Date                 User    Action  Args
2018-09-19 15:07:31  silvan  set     status: in-progress -> resolved
                                     messages: + msg7574
                                     summary: When merging, update the documentation at www.fast-downward.org/ExitCodes. ->
2018-09-19 08:25:51  cedric  set     nosy: + cedric
2018-09-17 19:45:27  malte   set     messages: + msg7497
2018-09-17 18:03:03  silvan  set     title: memory limits not enforced on Mac OS -> memory limits not enforced on Mac OS; add driver exit codes
                                     summary: When merging, update the documentation at www.fast-downward.org/ExitCodes.
2018-09-17 18:02:05  silvan  set     status: chatting -> in-progress
                                     assignedto: silvan
                                     messages: + msg7495
2018-09-17 15:06:03  malte   set     messages: + msg7485
2018-09-17 15:00:03  malte   set     messages: + msg7484
2018-09-17 14:57:39  malte   create