Created on 2014-09-18.17:51:08 by florian, last changed by jendrik.

File name  | Uploaded                   | Type
planner.cc | malte, 2014-09-20.02:20:32 | text/x-c++src

msg3545 | Author: jendrik | Date: 2014-09-25.15:38:17

I'd say it's easiest to wait until issue414 is merged before removing
EXIT_TIMEOUT. No issue necessary for this; I'll add it to my TODO list.

msg3527 | Author: malte | Date: 2014-09-24.13:23:18

Signals are meant to live in a different parameter range than "normal" exit
codes (which are in the range 0..127); a signal suggests that the process was
terminated for external reasons that it could do nothing about. (A process can choose to
ignore a soft time limit by blocking the signal, but that will only lead to an
unblockable kill once the hard limit is reached.)
Since the portfolio code terminates itself without involving external time
limits and similar shenanigans, I think raising a signal isn't appropriate in
that case. So I'm happy with the status quo regarding the signals and exit codes
that are used.
Removing EXIT_TIMEOUT in the planner may be a good idea. People might expect it
to be used when specifying a search timeout with our new options, but it isn't.
Once the new driver script is merged, we should then adapt
http://www.fast-downward.org/PlannerUsage?highlight=%28timeout%29
since this currently documents the exit codes as part of the search executable,
but the portfolio-only ones would be part of the driver script instead.

msg3523 | Author: jendrik | Date: 2014-09-24.11:13:03

Maybe we should remove EXIT_TIMEOUT from the planner code then?
Currently, the portfolio code exits with EXIT_TIMEOUT if all configurations exited with
SIGXCPU. Should it exit with SIGXCPU instead, too?
It seems odd to have two different exit codes for the same thing, but I understand
that one is calculated and one is a signal, and probably we shouldn't mix the two.
If we don't act on the points above, I think both the portfolio code and lab already
handle SIGXCPU exit codes correctly.

msg3519 | Author: malte | Date: 2014-09-24.00:56:09

If a process receives a signal, then the default behaviour is to terminate with
that signal, so there is a certain consistency in that. If we don't have a
particular reason to change this, I think it makes sense to keep the default
behaviour. (Also, it's hard or impossible to guarantee that our signal handler
will always work as desired under all circumstances, e.g. on extremely loaded
machines, and in cases where it doesn't work as intended, we'll also receive a
SIGXCPU result that we have no control over.)

msg3517 | Author: jendrik | Date: 2014-09-23.22:09:08

In case we hit the soft time limit, why do we want the planner to "exit with"
signal SIGXCPU instead of with EXIT_TIMEOUT?

msg3515 | Author: malte | Date: 2014-09-23.19:34:28

> I don't know about any other ways we use to determine a timeout. If there are
> any in lab, I suggest we open an issue in the lab tracker. Jendrik, what do you
> think?
utilities.{h,cc} and the portfolio code define an exit code called
"EXIT_TIMEOUT", but it seems that it's currently impossible for the planner to
generate this.

msg3513 | Author: florian | Date: 2014-09-23.19:12:58

> we should now *always* receive a raw_memory value -- if we don't,
> this should count as an unexplained error.
I used a custom parser in this experiment, so a regular lab experiment will not
contain this value. I'll send a pull request to Jendrik to include it in lab.
> This would mean that there would be a new signal with
> which the code could terminate (SIGXCPU), so this should be interpreted as a
> regular error signifying "timeout" in the portfolio error code analysis and lab
> error code analysis.
As far as I know, lab already tests for the exit code 128 + 24 (signal SIGXCPU)
and considers it a timeout. Our signal handler just prints a message and then
raises the original signal again, so the exit code is not changed by the soft limit.
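
For illustration, here is a minimal sketch of the print-and-re-raise pattern
described above (not the actual utilities.cc code). When the process dies from
the re-raised signal, the parent sees termination by SIGXCPU, which shells and
lab report as exit code 128 + 24:

    #include <csignal>
    #include <unistd.h>

    void example_signal_handler(int signal_number) {
        // Keep the handler allocation-free: write() a fixed message to stderr.
        const char message[] = "caught signal -- exiting\n";
        write(STDERR_FILENO, message, sizeof(message) - 1);
        // Restore the default disposition and re-raise, so the process still
        // terminates "with" the original signal rather than a normal exit code.
        std::signal(signal_number, SIG_DFL);
        std::raise(signal_number);
    }

    int main() {
        std::signal(SIGXCPU, example_signal_handler);
        // ... run the search; a soft CPU limit delivers SIGXCPU here ...
        return 0;
    }
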
> In fact, once this works, *other* error codes should
> perhaps no longer be considered valid indications of timeout. (Don't know which
> criteria we use for this at the moment.) How do we proceed with this? Open a new
> issue, or is it simple enough that we just add the additional line?
I don't know about any other ways we use to determine a timeout. If there are
any in lab, I suggest we open an issue in the lab tracker. Jendrik, what do you
think?

msg3512 | Author: malte | Date: 2014-09-23.18:43:42

Given that this issue was already landed (I wrote my comment simultaneously with
yours), I have now gone ahead and added the additional line to the codebase. I
don't think we necessarily need to add an experiment for this, but of course it
wouldn't hurt, and lab + portfolio should be adapted for the new behaviour. This
means that we should now *always* receive a raw_memory value -- if we don't,
this should count as an unexplained error.

msg3511 | Author: malte | Date: 2014-09-23.18:41:58

Excellent!
> Reporting raw memory usage does not work if the task is killed by the grid
> engine for exhausting its time limit.
I made a little test: the planner would terminate gracefully at "soft" time
limits if we applied the following patch:
$ hg diff
diff -r 1b71442ead73 src/search/utilities.cc
--- a/src/search/utilities.cc Tue Sep 23 12:04:24 2014 +0200
+++ b/src/search/utilities.cc Tue Sep 23 18:39:13 2014 +0200
@@ -42,6 +42,7 @@
     signal(SIGTERM, signal_handler);
     signal(SIGSEGV, signal_handler);
     signal(SIGINT, signal_handler);
+    signal(SIGXCPU, signal_handler);
 }
 #if OPERATING_SYSTEM == LINUX || OPERATING_SYSTEM == OSX
IIRC, lab already uses a soft + hard limit, so I don't think we would need to
change anything in lab. This would mean that there would be a new signal with
which the code could terminate (SIGXCPU), so this should be interpreted as a
regular error signifying "timeout" in the portfolio error code analysis and lab
error code analysis. In fact, once this works, *other* error codes should
perhaps no longer be considered valid indications of timeout. (Don't know which
criteria we use for this at the moment.) How do we proceed with this? Open a new
issue, or is it simple enough that we just add the additional line?
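
For reference, the soft + hard limit scheme mentioned above can be set up as
in the following hypothetical sketch (values made up; this is not code from
lab or the driver): the kernel delivers SIGXCPU at the soft CPU limit and an
unblockable SIGKILL at the hard limit.

    #include <sys/resource.h>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        rlimit cpu_limit;
        cpu_limit.rlim_cur = 1800;       // soft limit: SIGXCPU after 30 CPU minutes
        cpu_limit.rlim_max = 1800 + 60;  // hard limit: SIGKILL one minute later
        if (setrlimit(RLIMIT_CPU, &cpu_limit) != 0) {
            perror("setrlimit");
            return EXIT_FAILURE;
        }
        // With the patch above, the planner handles SIGXCPU, prints its peak
        // memory, and then terminates with the signal (exit status 128 + 24).
        // ... run or exec the search here; the limits are inherited ...
        return EXIT_SUCCESS;
    }
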

msg3510 | Author: florian | Date: 2014-09-23.18:38:41

Merged.

msg3508 | Author: florian | Date: 2014-09-23.17:06:03

Results look good for blind search on all tasks from satisficing and optimal
suites. There are no more unexplained errors in the new version and memory is
reported correctly even if the planner runs out of memory.
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue469-issue469.html
Reporting raw memory usage does not work if the task is killed by the grid
engine for exhausting its time limit.

msg3501 | Author: florian | Date: 2014-09-22.22:38:13

For the record: we looked at the results for this experiment offline and it
showed errors for low reserves (4, 8, 16 KiB) and high reserves (128, 256, 512 KiB).
Low reserves likely cause an error because the free memory is not sufficient to
allocate the file stream's buffer. We only have half of the reserve in this
experiment and need a little over 8 KiB (msg3476).
High reserves can lead to the failed assertion that started this issue (see
point 3. in msg3473).
We are now running experiments on a version that uses no buffer for the stream
and a 32 KiB memory buffer.

msg3493 | Author: malte | Date: 2014-09-21.18:32:14

No matter what I do, I have trouble reproducing the issue with the current code.
You said that you had to play around with the revision cache in the previous
experiment. Is it possible that the compile that was ultimately used happened in
a slightly different environment than most experiments, maybe compiled on a
different machine?
The ingredients for the failure seem to be there; it just doesn't seem to arise
any more. I've pushed some changes that might make it more likely for the
problem to arise. Can you run some additional experiments with the new code?
There is a new option "--additional-memory-pressure". To minimize the number of
configurations, I recommend always setting it to min(32768, memory_reserve / 2)
in this experiment.
Let's only look at the 2 GiB memory limit, but keep the variation of the reserve
memory.

msg3492 | Author: malte | Date: 2014-09-21.17:49:29

Well, this is tricky stuff. Static vs. dynamic compilation doesn't make a
difference. What *does* seem to make a difference is *when* the memory reserve
is allocated.
With our new code, the memory reserve is no longer allocated with mmap, even if
we set it to the rather large 512 KiB. It seems that the reason is because we
allocate it later during execution. Before allocating the memory reserve, we
read the input, for which a buffer of 1 MiB is allocated but not released. It
seems that this buffer is then reused by general memory management, and hence
our reserve buffer is allocated internally rather than via mmap.
So to reproduce the problematic behaviour, we should allocate the reserve memory
before reading the input. By making this change, I have been able to see the
mmap/munmap behaviour again (which means that our released memory buffer doesn't
help), but even then I've had a hard time reproducing the problem. I think the
reason for this is that even without usable reserve memory, we can get lucky and
have 8K available memory if the last failed memory request required more memory
than this.

msg3489 | Author: malte | Date: 2014-09-21.15:13:02

> One possible explanation is that starting the task without an external address
> space limit causes the mmap area to be allocated further out than it is with
> the 2 GiB external limit, and hence we don't hit it any more. This should be
> quite easy to test with strace and some manual runs. I'll look into it once I
> know where to find the experiment.
No, that's not the case. Both with an internally and an externally set limit, at
some point we hit the mmapped area and brk() starts failing.
There's an important difference in the strace compared to the previous
experiment: once we run out of memory and release the memory buffer, there is no
more memory management activity with the OS at all until termination: no munmap
to release the memory buffer to the OS and no attempt to obtain more memory
again with brk or mmap2.
I think this means that we keep the memory from the buffer around internally
now, and hence when we need memory later for determining peak memory, it is
available in our internal memory pool.
Regarding why the behaviour differs, I think the reason is the way
we compile now: previously we used a dynamically linked debug compile. Now we
use a statically linked release compile. Presumably the memory management is
different in the two cases, perhaps because the machine on which we compile has
slightly different library versions than the ones on which we run. I'll try to
test this hypothesis.

msg3488 | Author: malte | Date: 2014-09-21.14:19:30

> Where can I find the logs on maia?
Never mind, found it. (Of course it's lying next to the other experiment.)
There seems to be not a single case where we get the warning that peak memory
could not be determined, which supports the theory that the memory environment
is different here in some way. (We had 194 such warnings in the previous
experiment.)

msg3487 | Author: malte | Date: 2014-09-21.13:45:41

Well, it's good to at least see part of what we're expecting to see. :-) The
disappearance of the errors with the 512 KiB reserve is odd indeed, though.
Where can I find the logs on maia?
Some considerations:
1. At first glance, the code looks good.
2. Apart from the failed assertions, another symptom of the same issue is when
the code emits a warning that it cannot determine the peak memory limit. We have
ignored this in the past, but it's really just a slightly different failure mode
for the same issue (running out of memory when trying to read the status file,
but failing more gracefully). In the past, I think the reports always showed the
"memory" value, while now they only seem to show them for successful tasks. So
we cannot find these "-1" cases. Would it be possible to either add a new
attribute that always shows the memory value or change the existing one to also
include failed tasks? (I think we removed the values for failed cases mainly
because of these -1 values.) In the medium term, I think it would be good to
have several attributes (which don't have to be part of the report by default,
of course). Depending on the situation, useful information could include:
- peak memory for successful tasks, nothing for others
- raw peak memory
- a binary attribute indicating if we ran out of memory (our existing
exitcode-based attributes might already serve this purpose)
[After I wrote this, I like the option of leaving the existing memory attribute
as is and adding a "raw peak memory" attribute that always contains the peak
memory info from the log, including failed runs and cases where peak memory is
reported as -1.]
3. It's possible that we're just luckier this time and our sample size of 50
tasks is too small. But if I recall correctly, we had multiple failures with
Airport before, so this does seem a bit unlikely. One possible explanation is
that starting the task without an external address space limit causes the mmap
area to be allocated further out than it is with the 2 GiB external limit, and
hence we don't hit it any more. This should be quite easy to test with strace
and some manual runs. I'll look into it once I know where to find the experiment.

msg3486 | Author: florian | Date: 2014-09-21.13:11:33

I ran experiments for all combinations and the results were a bit unexpected.
Using a memory reserve <= 8 KiB leads to the expected error in about half the
tasks but there were no assertion errors for larger reserves.
Limit     Reserve   Unexplained errors
2 GiB     8 KiB     24
2 GiB     4 KiB     27
1 GiB     8 KiB     26
1 GiB     4 KiB     29
64 MiB    8 KiB     25
64 MiB    4 KiB     31
256 MiB   8 KiB     31
256 MiB   4 KiB     32
Full report:
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue469-airport-buffered.html
I guess there should at least be errors for the original setting of 2 GiB and
512 KiB, so I think I might have introduced a bug with the parameters. I ran the
experiment with a 3 GiB limit from lab, so only the internal limit should have
mattered. Could you have a look at the diff?
https://bitbucket.org/flogo/downward-issues-issue469/branch/issue469-experimental

msg3478 | Author: jendrik | Date: 2014-09-20.11:22:16

Interesting stuff!
I wanted to work on issue414 over the weekend. Will let you know how it goes.

msg3477 | Author: malte | Date: 2014-09-20.02:43:27

PS: Disabling buffering when reading the status file (with pubsetbuf like here:
http://stackoverflow.com/questions/16605233/how-to-disable-buffering-on-a-stream) eliminated
essentially all the memory use. Strangely, though, when reporting peak memory, the
program now seems to use *less* memory when doing nothing. This seems to be
either a flaw with the methodology or the code triggers the release of some
internal buffer of the library. Whatever happens, it may be a good idea to
disable the buffering here, considering that this code is usually called in
low-memory conditions.
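
Roughly, the pubsetbuf trick from the linked answer looks like this (a sketch
along the lines of get_peak_memory_in_kb, not the exact code; note that the
buffer has to be disabled before the file is opened):

    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
        std::ifstream procfile;
        // Disable the stream's internal buffer *before* opening the file, so
        // reading /proc/self/status does not allocate a buffer on the heap.
        procfile.rdbuf()->pubsetbuf(nullptr, 0);
        procfile.open("/proc/self/status");
        std::string word;
        while (procfile >> word) {
            if (word == "VmPeak:") {   // peak virtual memory, reported in kB
                procfile >> word;
                std::cout << "Peak memory: " << word << " KB" << std::endl;
                break;
            }
        }
        return 0;
    }
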
> Regarding open/write/close: I haven't made changes to the code yet, as I have
> mostly been looking at the strace output. If you can run additional experiments
> with different memory limits and memory reserves, I can take care of this, though.
I now think that we've seen all we need to see from this output, so I'd rather
not do this and instead suggest ripping the code out again completely.
So my suggestion is to (in the experimental branch):
- remove these output lines again (or not, they don't really harm either)
- change the Makefile to remove the NDEBUG flag in release mode
- add the options for limiting memory and for setting the amount of reserve memory
and then run an experiment on the Airport tasks with blind search. At this point
I don't yet want to remove the buffering because we want to see the effect of
the different memory limits and reserves and might lose this if we remove the
buffering.
Once we have results for this, the next step could be to see what happens if we
remove the buffering. Does this make sense? Can you set up such an experiment?
(Happy to help if desired.)
Based on what we know now, I'd suggest trying the memory limits:
- memory limits: 2 GiB (careful, doesn't fit into an int -- does fit into a size_t),
1 GiB, 256 MiB, 64 MiB
- memory reserves: 2 MiB, 1 MiB, 512 KiB (our current amount), 256 KiB, 128 KiB,
64 KiB, 32 KiB, 16 KiB, 8 KiB, 4 KiB
I don't think we need all 40 combinations (= 2000 tasks), but I guess it doesn't
really hurt to use them either. The experiment should be rather quick in any case.

msg3476 | Author: malte | Date: 2014-09-20.02:20:32

OK, ltrace didn't really help (and dtrace in the other message was a typo). It
does contain much information about allocations and such, but not in the parts
we're interested in. I don't really know why; perhaps much of this stuff is
inlined, perhaps the tracing doesn't catch calls within a given library (which
makes sense from the way I understand it works).
Instead I used a different method for estimating how much memory we need:
writing a simple program that exhausts memory and checking how soon it is
exhausted if we include print_peak_memory() vs. if we don't. The program is
attached and also includes an example on how to set a memory limit from within C++.
To test it in a "realistic" setting, it is meant to replace the planner main
file. (Everything else apart from utilities.h can then be removed from the
Makefile.) I've tried it with two different kinds of allocations, and in both
cases, I came up with roughly 8.5 KiB of additional memory needed when we use
print_peak_memory.
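
The attached planner.cc is the real test program; for readers without the
attachment, a minimal sketch of the measurement idea (limits and chunk sizes
here are made up) could look like this:

    #include <sys/resource.h>
    #include <iostream>
    #include <new>
    #include <vector>

    int main() {
        // Set an address-space limit from within the process (example value).
        rlimit limit;
        limit.rlim_cur = limit.rlim_max = 64UL * 1024 * 1024;  // 64 MiB
        setrlimit(RLIMIT_AS, &limit);

        // Allocate fixed-size chunks until new fails; comparing the count with
        // and without a subsequent call to print_peak_memory() estimates how
        // much extra memory that call needs.
        std::vector<char *> chunks;
        try {
            for (;;)
                chunks.push_back(new char[1024]);  // 1 KiB per chunk
        } catch (const std::bad_alloc &) {
            if (!chunks.empty()) {
                // Free one chunk so the report below has some room to work.
                delete[] chunks.back();
                chunks.pop_back();
            }
            std::cout << "allocated roughly " << chunks.size()
                      << " KiB before running out" << std::endl;
        }
        return 0;
    }
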

msg3475 | Author: malte | Date: 2014-09-20.01:05:04

PS: Instead of hacking the revision cache, I suggest removing "-DNDEBUG" from
the Makefile in the experimental branch. Then we can use release mode directly.
(Running experiments in debug mode is also something that will become easier
once issue414 is merged and lab is adapted to use the new script.)

msg3474 | Author: malte | Date: 2014-09-20.01:03:20

> For the different memory reserves, I guess we need a parameter (or hack it
> into the code before compiling).
I think a parameter in the code is perhaps the easiest way if we only want the
functionality in the experimental branch. See for example how the random seed is
set ("--random-seed" in option_parser.cc).
> For different memory limits, there is no way around a separate experiment for
> each limit. They can be set up by adding the parameter "limits={'search_memory':
> 1024}" to the experiment.
We can support this better once issue414 is merged, which is currently waiting
for portfolio integration. (Jendrik, do you have time for this in the near
future? Otherwise I can also try to take over the issue again.)
In the meantime, for this issue, it's perhaps easiest to simply set a memory
limit from within the program based on an option inside the planner, just like
the option for the memory reserve. See the second-last message on
http://www.linuxquestions.org/questions/programming-9/setrlimit-81206/ for an
example. (We want to limit RLIMIT_AS.)
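
Following that example, a hypothetical sketch of setting both options from
within the planner (helper and option names invented for illustration, values
made up) might look like:

    #include <sys/resource.h>
    #include <cassert>
    #include <cstddef>

    // Hypothetical helpers for the experimental branch: limit the address
    // space and allocate the reserve ("memory padding") from options.
    static char *memory_padding = nullptr;

    void set_memory_limit_in_bytes(size_t bytes) {
        rlimit limit;
        limit.rlim_cur = limit.rlim_max = bytes;
        setrlimit(RLIMIT_AS, &limit);   // limit the virtual address space
    }

    void reserve_memory_padding_in_bytes(size_t bytes) {
        assert(!memory_padding);
        memory_padding = new char[bytes];
    }

    int main() {
        // In the real code these values would come from option parsing,
        // e.g. "--memory-limit" and "--memory-reserve" (invented names).
        set_memory_limit_in_bytes(size_t(2) * 1024 * 1024 * 1024);  // 2 GiB
        reserve_memory_padding_in_bytes(512 * 1024);                // 512 KiB
        // ... run the search ...
        return 0;
    }
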

msg3473 | Author: malte | Date: 2014-09-20.00:55:34

10 * 2 * 50 is quite a bit more than 384, though, while 10 * 2 * 15 would be
less. But I don't mind running all 50 airport tasks if that saves work for us.
Actually, for simplicity, for now I think it's enough to run each airport task
only once. We're no longer stabbing completely in the dark (see below), and to
test some quick hypotheses, I think running each task only once will give us
enough chances to trigger bogus behaviour. But once we have something that seems
to work more reliably, perhaps we might want to do a more thorough test (or
perhaps running all tasks would be that thorough test).
Regarding open/write/close: I haven't made changes to the code yet, as I have
mostly been looking at the strace output. If you can run additional experiments
with different memory limits and memory reserves, I can take care of this, though.
The strace output reveals some interesting things. I ran the strace call from my
previous email with memory limits (ulimit -v) of 64 MiB, 128 MiB, 256 MiB, 512
MiB, 1 GiB and 2 GiB.
Judging from the strace output, memory management on the maia machines seems to
work roughly as follows:
1. "Small" memory requests are handled by the C++ standard library internally.
It parcels the heap, and when it runs out of space, it calls brk() to extend the
heap, usually by 132-136 KiB. (Presumably it's something like: whatever we need
straight away plus 128 KiB for the future, in increments of 4 KiB, which is the
page size.) Hence, when small objects are released, their memory should be
usable directly by other small objects, without the OS being involved in any
way. (Of course, memory fragmentation is an issue here. If we release two 1 KiB
objects, this doesn't give us space for a 2 KiB object, unless they happened to
be adjacent in memory and memory management detects this, which I assume it would.)
2. When brk() fails for some reason, it attempts fallback measures for providing
memory for small objects despite being unable to increase the heap.
Specifically, mmap2() is called with various options to get chunks of 1 MiB or 2
MiB of memory. If these also fail, the out-of-memory handler is called.
3. "Large" memory requests (not sure how large they need to be to count as
large, but from the strace output, 196 KiB seems to be large enough -- also see
below) are satisfied directly by mmap2(). So for large objects, we go directly
to the OS, and when the memory for such an object is released, the memory is
directly returned to the OS with munmap(). For example, for the 1 GiB run,
roughly 48 MiB -- almost 5% of the usable memory -- seem to be needed in one
contiguous chunk at some point. This is likely the main hash table of the global
state registry. With more than 5 million evaluated states at that point, this
seems reasonable.
Here is what strace tells us about the cases when we run the task with a memory
limit of 1 GiB or below (none of these showed the error):
We see many brk() calls, using up the available memory in chunks of roughly
132-136 KiB. These are interspersed with the occasional mmap2()/munmap() for
large objects. At some point brk() cannot provide more memory, and neither can
the mmap2() fallback.
At this point, our out-of-memory handler releases our reserve memory. This is a
large object (512 KiB), so it was allocated with mmap2() rather than brk(). This
means that the memory it frees is released with munmap() and cannot be used
directly by the program; it is returned to the OS. However, this means we are
now again 512 KiB (actually 516 KiB because the mapped memory is a bit larger,
presumably for book-keeping information) below our memory limit, so the next
brk() attempt succeeds as there is available memory again.
One further brk() call providing 132 KiB is enough to gracefully take the
program to termination. In most cases, this brk() call is needed right after the
call to open the status file from which we read the used memory. In other cases,
it's right before this call. In any case, right around this call seems to be
where we need additional memory (presumably for buffering, but perhaps not
exclusively for buffering given that brk() isn't always called from exactly the
same place).
Here is what is different with the 2 GiB limit:
After using around 1.3-1.4 GiB, in other words long before we reach the memory
ulimit, the brk() calls start failing. At this point, memory management starts
using mmap2() with chunks of 1 MiB instead after each failed call to brk().
This continues until we run out of memory and release our reserved memory block.
After this, brk() still fails, and the fallback mmap2() also fails because it
wants a chunk of 1 MiB but we only released roughly half of that. We get the
recursive invocations of the handler, and at some point these lead to a
segmentation fault. (I don't really know why. Perhaps we run out of stack space,
although for this the number of recursive invocations appears low. Perhaps the
kernel detects that the SIGABRTs from the failed assertion trigger the signal
handler recursively and doesn't like this.)
So there are two main problems here:
1. We return the reserve memory directly to the OS rather than keeping it around
for our own use.
2. When we ask the OS for more memory after returning it some, we ask for a
larger chunk than what is available.
Problem #2 could presumably be fixed by reserving 1 MiB or more of memory, but
this does appear somewhat wasteful (it's probably 1000x more than what is
actually needed) and hence I think this would be a pity.
Problem #1 could perhaps be fixed by using a *smaller* reserve memory, less than
the amount where allocations go straight to mmap2(). I think that something like
32 KiB might do the trick. (Some googling shows that 128 KiB might be the magic
number. [I have since confirmed this -- "man 3 malloc", search for
"MMAP_THRESHOLD".])
None of these is a stable solution of course, as the magic constants of 1 MiB,
128 KiB or whatever can change at any time. But if they work, they would be good
enough for me. Alternatively, we could also try to implement our out-of-memory
handlers in a way that they don't need dynamic memory at all. This should be
quite possible, too, by avoiding going through C++ libraries and the iostream
library there. (It's not that we do terribly complicated processing there.)
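
As a sketch of that direction (function name invented, not the actual
utilities.cc code), peak memory can be read from /proc/self/status with plain
open/read/close into a stack buffer, so the handler never touches the heap:

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstring>

    // Read /proc/self/status without dynamic allocation and scan for the
    // "VmPeak:" line by hand. The file is only about 1 KiB, so a fixed
    // stack buffer is enough.
    long get_peak_memory_in_kb_no_alloc() {
        char buffer[4096];
        int fd = open("/proc/self/status", O_RDONLY);
        if (fd < 0)
            return -1;
        ssize_t length = read(fd, buffer, sizeof(buffer) - 1);
        close(fd);
        if (length <= 0)
            return -1;
        buffer[length] = '\0';
        const char *line = std::strstr(buffer, "VmPeak:");
        if (!line)
            return -1;
        long kb = 0;
        for (const char *c = line + std::strlen("VmPeak:"); *c; ++c) {
            if (*c >= '0' && *c <= '9')
                kb = kb * 10 + (*c - '0');
            else if (kb > 0)
                break;   // stop after the number
        }
        return kb;
    }
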
One interesting question is why brk() starts failing in the 2 GiB case. Heap
space must be contiguous, and my guess was that at some point we can't linearly
increase the heap because we hit a part of the address space that is already in
use by mmap.
Going through the strace output for the 2 GiB case, the last successful brk()
call sets the heap barrier to address 0x55536000. Extending it to 0x55557000
fails. Looking at the strace() output again, we see that address 0x55576000 and
later are used for mmap2() calls -- these addresses are correct for *all* memory
limits, not just the 2 GiB limit. This is awfully close to 0x55557000, so indeed
there is a limit there that brk() cannot go beyond. With the small memory
limits, this means that the initial brk() value (which was 0xb0c5000 or less for
all memory limits) and the mmap() area are far enough apart that there is always
enough space to grow the heap. The difference between 0x55576000 and 0xb0c5000
is more than 1 GiB, so heap and mmap area will never collide. However, with our
2 GiB memory limit, they can. (Presumably the kernel's heuristic for placing
these things is good enough for all but the most memory-hungry applications.)
Even if all this is correct, one slightly unsatisfying thing that remains is
that we have to guess how much of a memory reserve we actually need in order to
terminate the program gracefully. To gather more information about this, strace
is not good enough -- we would also need to trace the allocations within our
program that don't hit the OS to see how much memory is actually asked for. I
will see if I can find out some relevant information with dtrace.

msg3472 | Author: florian | Date: 2014-09-19.20:34:53

As far as I know, the easiest way to run a config more than once is to add it
multiple times with different config nicks. Running all airport tasks is also
slightly easier to set up than only 15 tasks, so as long as we are staying under
384 tasks per experiment, we may want to stick with the full domain.
For different memory limits, there is no way around a separate experiment for
each limit. They can be set up by adding the parameter "limits={'search_memory':
1024}" to the experiment.
For the different memory reserves, I guess we need a parameter (or hack it into
the code before compiling).
If I should do some of these changes, could you push your updated code to the
bitbucket repository?

msg3471 | Author: malte | Date: 2014-09-19.20:07:31

I agree running just Airport is enough. The results don't seem to be well
reproducible, though, so ideally I think later exploratory experiments should
perform multiple runs (10?) for each task if this feasible. (If not, perhaps we
can also just live with the randomness, but then I'd rather have a larger task
set to make sure we do catch the cases that only strike somewhat rarely.)
Only blind search and Airport seems fine, and I'd limit it even further to just
those 15 airport tasks for which we've seen problems if we can do the 10
repetitions.
Here are some things that I think might convey some useful information. (This
doesn't mean we have to run all these:)
- Trying different memory limits. We currently use 2 GiB. I'd also be interested
in results for 1 GiB, 256 MiB or 64 MiB.
- Trying different sizes for the memory reserve. I'd be interested what happens
with 1000 KiB, 1 MB, 2 MB, 256 KiB, 128 KiB, 32 KiB, or 8 KiB. (There are
reasons why the small limits might behave differently in unintuitive ways --
will elaborate if we see something interesting.)

msg3470 | Author: malte | Date: 2014-09-19.19:54:51

> Sorry about the wrong functions, I'm not at all familiar with these low-level
> functions and thought fputs would be more low-level than a file stream. However,
> there are outputs for entering the new_handler. If fputs allocates memory,
> shouldn't these immediately lead to an endless loop? Anyway, you should have
> write access to the repository, so you can push any changes there.
No problem! Just thought it would be good to clarify. (C-style file output is
generally buffered, just like C++ output is. I'm not sure if there are any
meaningful differences between C stdio and C++ iostreams apart from the fact
that one interface is procedural and the other is object-oriented.)
> I didn't try to reproduce the effects locally for this issue. Last time I looked
> at this code, I ran experiments to determine the overhead of the python wrapper.
> Back then, I tried local experiments and had no problems locally. The grid
> seemed to behave differently. Maybe this was because of the older
> compiler/library versions? Did the static build make a difference for the memory
> issues?
Memory management is part of the standard library and hence glibc-dependent. (Or
whatever the C++ equivalent is called, but I assume it defers to glibc at the
end.) So it could well be something about different library versions.
The main reason I wanted a static compile is because I wanted to work on my
machine. Only statically linked binaries from my machine work on maia. (Apart
from this point, static vs. dynamic linking may make a difference in so far as
statically linked executables will use libraries from the compiling machine,
while dynamically linked ones will use libraries from the executing machine.
Also, static linking will prevent using tools like ltrace, if we want to try
that out later.)

msg3469 | Author: florian | Date: 2014-09-19.19:48:17

About the grid experiment: from the report it looks like running one domain,
e.g. airport, should be enough. This would make the experiment much faster.

msg3468 | Author: malte | Date: 2014-09-19.19:47:31

Thanks for all the data, Florian! I had a somewhat closer look. I have been able
to reproduce this on maia, outside of lab, on the ase01 machine (not recommended
to use it directly unless we know it's not being used at the moment), as follows:
ulimit -v 2097152
/infai/pommeren/projects/downward/issues/issue469/experiments/issue469/data/issue469-blind/code-3372a2d5249d/search/downward-release
--search 'astar(blind())' <
/infai/pommeren/lab/preprocessed-tasks/3372a2d5249d-3372a2d5249d/airport/p18-airport3-p6.pddl/output
This is an Airport task, the first failing one in Florian's report.
I verified that the machine was otherwise not busy, so it cannot be an issue
with the Linux OOM killer or something similar. (I guess this would give very
different output anyway, just thought it might be good to mention this.)
However, "reproduce" is perhaps saying too much. Repeating the above command 10
times, the outcome was:
- 3x failed assertion
- 4x warning that peak memory couldn't be determined
- 3x success (including correct peak memory output)
It would be nice to reproduce it on a local machine; maybe this is possible with
a bit more persistence. I had no luck reproducing it with a lower memory limit.
I then tried tracing the program with strace. The first thing this showed is
that we call "times" all the time, making unfiltered stracing impossible. I
think it would be useful to change this behaviour. Is this perhaps the search
timeout? If yes, it might be a good idea to change the code to only look at the
time if a finite timeout has been set. I then tried a filtered strace:
strace -etrace=\!times
/infai/pommeren/projects/downward/issues/issue469/experiments/issue469/data/issue469-blind/code-3372a2d5249d/search/downward-release
--search 'astar(blind())' <
/infai/pommeren/lab/preprocessed-tasks/3372a2d5249d-3372a2d5249d/airport/p18-airport3-p6.pddl/output.
(To be continued -- waiting for some more data from a second strace run and
don't want to lose the partially typed message in the meantime if something bad
happens. :-))

msg3467 | Author: florian | Date: 2014-09-19.19:45:39

Sorry about the wrong functions, I'm not at all familiar with these low-level
functions and thought fputs would be more low-level than a file stream. However,
there are outputs for entering the new_handler. If fputs allocates memory,
shouldn't these immediately lead to an endless loop? Anyway, you should have
write access to the repository, so you can push any changes there.
I didn't try to reproduce the effects locally for this issue. Last time I looked
at this code, I ran experiments to determine the overhead of the python wrapper.
Back then, I tried local experiments and had no problems locally. The grid
seemed to behave differently. Maybe this was because of the older
compiler/library versions? Did the static build make a difference for the memory
issues?

msg3466 | Author: malte | Date: 2014-09-19.18:19:44

Can't reproduce this locally. I guess that's the same for you?
Experimenting with this is made a bit more difficult for me because the debug build
is not static. We already have an option to control whether the release build is
static -- maybe we can have a similar option for the debug build in the future,
too. For now, I've just removed "-DNDEBUG" from the release build locally so
that I can test this with a release build.

msg3465 | Author: malte | Date: 2014-09-19.18:04:48

Thanks! Had a brief look at the code. When I wrote this:
> maybe we can avoid such issues by writing out debugging information with
> low-level Unix commands (open/write/close) directly to files rather than to
> stdio/stderr and making such we're not using heap memory there.
I really meant it. fopen/fclose can interact with program-internal memory
management in the same way as the iostream library. I'll try with
open/write/close instead and see if it makes a difference. (Perhaps not, but we
should rule out our debugging statements as sources of error.)

msg3464 | Author: florian | Date: 2014-09-19.18:02:15

The debug code is in the branch issue469-experimental which I planned not to
push to the main repository.
https://bitbucket.org/flogo/downward-issues-issue469/branch/issue469-experimental
To run the experiment in debug mode, you need to switch the binary in the
revision cache before building the experiment (make debug && mv downward-debug
downward-release). Unfortunately, this is the easiest option for debug
experiments at the moment.
You should also be able to access my log files on maia at
/infai/pommeren/projects/downward/issues/issue469/experiments/issue469/data/issue469-blind
I uploaded the report that contains run_dirs for all unexplained errors to
http://ai.cs.unibas.ch/_tmp_files/pommeren/issue469-blind.html#unexplained-errors

msg3462 | Author: malte | Date: 2014-09-19.17:55:06

> I think the stream constructor is the more likely problem, but it seems strange
> that it should run out of memory.
Indeed. Even if it were fully buffered by the ifstream, /proc/self/status should
only be roughly 1K in size.
> This loop is repeated a few times (5 in the case I looked at) and then stops in
> the memory handler for no apparent reason. Maybe the process is killed by the
> system?
This appears a bit odd, too.
> It looks like the condition would always evaluate to false, but in the second
> nested call, it evaluates to true.
Maybe the confusion is about how static variables of functions work? Their
initialization (in this case to 0) is only performed *the first time* the
function is called. After that, they retain their value from function
application to function application. They are essentially global variables with
function scope and delayed initialization (only initialized when their function
is first called).
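
A tiny standalone illustration of this behaviour (hypothetical code, unrelated
to utilities.cc):

    #include <iostream>

    void handler_demo() {
        // Initialized only on the first call; the value then persists.
        static int handler_in_progress = 0;
        if (handler_in_progress) {
            std::cout << "re-entered" << std::endl;
            return;
        }
        handler_in_progress = 1;
        std::cout << "first entry" << std::endl;
    }

    int main() {
        handler_demo();  // prints "first entry"
        handler_demo();  // prints "re-entered"
        return 0;
    }
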
Can you give me access to your test code for this?

msg3460 | Author: florian | Date: 2014-09-19.17:46:12

I peppered utilities.cc with debug output and ran an experiment.
Steps 1) through 7) are called as intended; the second failed call to new comes
from one of the following two lines. The debug output above the first line was
printed, but instead of the debug output below the second line, the output for
entering the new_handler function was printed.
ifstream procfile("/proc/self/status");
string word;
I think the stream constructor is the more likely problem, but it seems strange
that it should run out of memory.
After that, the output shows we are running into a loop:
- The new_handler aborts because of the failed assertion.
- The signal_handler catches the signal and tries to output peak memory.
- get_peak_memory_in_kb tries to allocate memory and control ends up in the
  new_handler again.
This loop is repeated a few times (5 in the case I looked at) and then stops in
the memory handler for no apparent reason. Maybe the process is killed by the
system?
Another thing I found strange in the code was in the signal_handler. It seems to
work, but I do not understand why:
    static volatile sig_atomic_t handler_in_progress = 0;
    if (handler_in_progress) {
        raise(signal_number);
    }
    handler_in_progress = 1;
It looks like the condition would always evaluate to false, but in the second
nested call, it evaluates to true.

msg3446 | Author: malte | Date: 2014-09-18.18:35:25

One more thing: we cannot really trust that output on stdout/stderr corresponds
to the intended output here. Things can get lost, duplicated, or appear out of
order in interrupted code like this due to interactions with buffering. (stderr
is probably safer than stdout here since it is usually not buffered, but I don't
know how redirecting I/O to a file, as we do, affects this.)

msg3445 | Author: malte | Date: 2014-09-18.18:33:05

Signal 6 is SIGABRT, which is (among other things) raised by failed assertions.
It is also raised by our ABORT macro, but these cases should be quite obvious.
512 KB of memory is a lot for doing a tiny bit of I/O, so I think the first
thing we should do is find out where this goes wrong. The tricky bit is that
adding debug output might itself trigger an out of memory condition... maybe we
can avoid such issues by writing out debugging information with low-level Unix
commands (open/write/close) directly to files rather than to stdio/stderr and
making sure we're not using heap memory there.
I think it might be a good idea to add tracing lines, starting from right before
"handler_in_progress = 1" in signal_handler and after each following line in
this function and the functions it calls (directly or indirectly), and also
before/after every line in out_of_memory_handler and exit_with. This is a bit
awkward, but probably still one of the simplest ways to find out what is going
on here.
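
For the heap-free tracing itself, something along these lines would do (a
hypothetical macro, not existing code; it only works with string literals):

    #include <unistd.h>

    // Write a string literal straight to stderr (fd 2) with the raw write()
    // call: no heap allocation, no iostream buffering.
    #define TRACE(msg) ((void)write(STDERR_FILENO, msg "\n", sizeof(msg "\n") - 1))

    void example() {
        TRACE("entering out_of_memory_handler");  // hypothetical trace point
    }
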

msg3442 | Author: florian | Date: 2014-09-18.17:51:08

When running in debug mode on the grid, we got a lot of cases (~2500) where our
out-of-memory-handler is called more than once which violates an assertion. The
following steps are supposed to happen when the planner runs out of memory:
1) 'new' tries to reserve memory and fails.
2) Our new_handler 'out_of_memory_handler()' is called.
3) The memory handler asserts that the variable 'memory_padding' is non-zero.
4) The memory handler frees the memory reserved by 'memory_padding' and sets the
variable to zero.
5) Debug output: "Failed to allocate memory. Released memory buffer."
6) exit_with(EXIT_OUT_OF_MEMORY) outputs: "Memory limit has been reached." and
calls 'exit(EXIT_OUT_OF_MEMORY)'
7) 'exit_handler()' is called which in turn calls 'print_peak_memory()'
From the output it looks like steps 1) to 6) work:
stdout:
=======
Failed to allocate memory. Released memory buffer.
Memory limit has been reached.
Peak memory: -1 KB
caught signal 6 -- exiting
=======
stderr:
=======
downward-release: utilities.cc:91: void out_of_memory_handler(): Assertion
`memory_padding' failed.
downward-release: utilities.cc:91: void out_of_memory_handler(): Assertion
`memory_padding' failed.
Unexpected error.
warning: could not determine peak memory
../../code-a38b046fe1e5/search/downward: line 193: 61001 Aborted
=======
The output "caught signal 6 -- exiting" comes from the 'signal_handler()' which
should not be called. I'm not sure whether this is caused by
'exit(EXIT_OUT_OF_MEMORY)' (EXIT_OUT_OF_MEMORY = 6) or whether this is caused by
the failed assertion. As far as I know, exit code 6 is not the same as signal 6,
so the second option seems more likely to me.
One explanation could be that step 6) runs out of memory and starts step 1)
again. We currently have a memory padding of 512 * 1024 chars.
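
For readers unfamiliar with the scheme, here is a simplified sketch of steps
1) to 7) (not the actual utilities.cc code; error handling and the exit
handler are omitted):

    #include <cassert>
    #include <cstdlib>
    #include <iostream>
    #include <new>

    static char *memory_padding = new char[512 * 1024];  // the 512 KiB reserve

    void out_of_memory_handler() {
        // Steps 3)-4): the reserve must still exist; a second entry would
        // fail this assertion, which is exactly the bug reported above.
        assert(memory_padding);
        delete[] memory_padding;
        memory_padding = nullptr;
        // Steps 5)-6): report and exit; exit() then runs the exit handler,
        // which prints peak memory (step 7).
        std::cout << "Failed to allocate memory. Released memory buffer." << std::endl;
        std::cout << "Memory limit has been reached." << std::endl;
        std::exit(1);  // stands in for exit_with(EXIT_OUT_OF_MEMORY)
    }

    int main() {
        std::set_new_handler(out_of_memory_handler);
        // ... the search eventually exhausts memory; step 1) is a failing
        // call to new, step 2) is the call to this handler ...
        return 0;
    }
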

Date                | User    | Action | Args
2014-09-25 15:38:17 | jendrik | set    | messages: + msg3545
2014-09-24 13:23:18 | malte   | set    | messages: + msg3527
2014-09-24 11:13:03 | jendrik | set    | messages: + msg3523
2014-09-24 00:56:09 | malte   | set    | messages: + msg3519
2014-09-23 22:09:08 | jendrik | set    | messages: + msg3517
2014-09-23 19:34:28 | malte   | set    | messages: + msg3515
2014-09-23 19:12:58 | florian | set    | messages: + msg3513
2014-09-23 18:43:42 | malte   | set    | messages: + msg3512
2014-09-23 18:41:58 | malte   | set    | messages: + msg3511
2014-09-23 18:38:41 | florian | set    | status: chatting -> resolved, messages: + msg3510
2014-09-23 17:06:03 | florian | set    | messages: + msg3508
2014-09-22 22:38:49 | florian | set    | assignedto: florian
2014-09-22 22:38:14 | florian | set    | messages: + msg3501
2014-09-21 18:32:14 | malte   | set    | messages: + msg3493
2014-09-21 17:49:29 | malte   | set    | messages: + msg3492
2014-09-21 15:13:02 | malte   | set    | messages: + msg3489
2014-09-21 14:19:30 | malte   | set    | messages: + msg3488
2014-09-21 13:45:41 | malte   | set    | messages: + msg3487
2014-09-21 13:11:33 | florian | set    | messages: + msg3486
2014-09-20 11:22:17 | jendrik | set    | messages: + msg3478
2014-09-20 02:43:27 | malte   | set    | messages: + msg3477
2014-09-20 02:20:32 | malte   | set    | files: + planner.cc, messages: + msg3476
2014-09-20 01:05:04 | malte   | set    | messages: + msg3475
2014-09-20 01:03:20 | malte   | set    | messages: + msg3474
2014-09-20 00:55:34 | malte   | set    | messages: + msg3473
2014-09-19 20:34:53 | florian | set    | messages: + msg3472
2014-09-19 20:07:31 | malte   | set    | messages: + msg3471
2014-09-19 19:54:51 | malte   | set    | messages: + msg3470
2014-09-19 19:48:17 | florian | set    | messages: + msg3469
2014-09-19 19:47:31 | malte   | set    | messages: + msg3468
2014-09-19 19:45:39 | florian | set    | messages: + msg3467
2014-09-19 18:19:44 | malte   | set    | messages: + msg3466
2014-09-19 18:04:48 | malte   | set    | messages: + msg3465
2014-09-19 18:02:15 | florian | set    | messages: + msg3464
2014-09-19 17:55:06 | malte   | set    | messages: + msg3462
2014-09-19 17:46:12 | florian | set    | messages: + msg3460
2014-09-18 18:45:13 | jendrik | set    | nosy: + jendrik
2014-09-18 18:35:25 | malte   | set    | messages: + msg3446
2014-09-18 18:33:05 | malte   | set    | status: unread -> chatting, messages: + msg3445
2014-09-18 17:51:08 | florian | create |