msg8624 |
Author: jendrik |
Date: 2019-03-06.09:58:06 |
|
I addressed it and merged the branch again.
|
msg8623 |
Author: malte |
Date: 2019-03-06.09:16:08 |
|
Works for me. I left a small comment.
|
msg8622 |
Author: jendrik |
Date: 2019-03-06.09:09:13 |
|
I think your latter suggestion is a good workaround. I pushed the change to
https://bitbucket.org/jendrikseipp/downward/pull-requests/122 .
|
msg8621 |
Author: malte |
Date: 2019-03-06.00:39:26 |
|
I think that's unsafe; there's no reason why the compiler should have any
particular file name. On my machine I have g++ version 5, but it's just called
/usr/bin/g++. We could call the compiler executable with --dumpversion and hope
that doesn't do anything stupid on non-g++, but that's not perfect either.

It's unfortunate that the CMAKE_CXX_COMPILER_ID (and version) are missing. I'm
sure there would be a way to get at them programmatically if we knew what we
were doing...

It looks like "cmake -LA -N ." includes all variables that *we* set, so we might
be able to work around this by adding an assignment to the CMake files that
copies CMAKE_CXX_COMPILER_ID (and the version) to some externally visible name
of our choosing (if it's internally visible), and then just use "cmake
--get-variable=JUERGEN_POTZKOTHEN" (or whatever variable name we choose). Or use
CMAKE_COMPILER_IS_GNUCXX. (See our function "fast_downward_set_compiler_flags"
in the CMake files.)
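(For illustration: if we copied it into a hypothetical cache variable, say
PLANNER_CXX_COMPILER_ID, the query side could be as simple as
    cmake -LA -N ../../src/ | grep '^PLANNER_CXX_COMPILER_ID'
run from the build directory.)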
|
msg8619 |
Author: jendrik |
Date: 2019-03-06.00:11:35 |
|
Thanks for the link! I searched around a bit more, but didn't find anything
else. Using the command from your link I get:
builds/release$ cmake -LA -N ../../src/ | grep COMPILER
CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/g++-5
CMAKE_CXX_COMPILER_AR:FILEPATH=/usr/bin/gcc-ar-5
CMAKE_CXX_COMPILER_RANLIB:FILEPATH=/usr/bin/gcc-ranlib-5
CMAKE_C_COMPILER:FILEPATH=/usr/bin/gcc-5
CMAKE_C_COMPILER_AR:FILEPATH=/usr/bin/gcc-ar-5
CMAKE_C_COMPILER_RANLIB:FILEPATH=/usr/bin/gcc-ranlib-5
CMAKE_CXX_COMPILER_ID and CMAKE_CXX_COMPILER_VERSION are missing, however.
Are you fine with checking that the last part of CMAKE_CXX_COMPILER:FILEPATH is
"g++-5"?
|
msg8617 |
Author: malte |
Date: 2019-03-05.23:10:04 |
|
I'm reasonably sure cmake has an interface for querying these, which would be a
cleaner way. This is the first hit I found; didn't investigate more deeply:
https://stackoverflow.com/questions/8474753/how-to-get-a-cmake-variable-from-the-command-line
|
msg8616 |
Author: jendrik |
Date: 2019-03-05.17:55:08 |
|
In my setup the file builds/release/CMakeFiles/3.10.2/CMakeCXXCompiler.cmake
starts with the following lines:
set(CMAKE_CXX_COMPILER "/usr/bin/g++-5")
set(CMAKE_CXX_COMPILER_ARG1 "")
set(CMAKE_CXX_COMPILER_ID "GNU")
set(CMAKE_CXX_COMPILER_VERSION "5.5.0")
set(CMAKE_CXX_COMPILER_VERSION_INTERNAL "")
set(CMAKE_CXX_COMPILER_WRAPPER "")
What do you think about globbing
"builds/release/CMakeFiles/*/CMakeCXXCompiler.cmake"
to find the CMakeCXXCompiler.cmake file and then grepping the file for
CMAKE_CXX_COMPILER_ID and CMAKE_CXX_COMPILER_VERSION?
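Roughly like this, I mean (a sketch, assuming the usual build directory layout):
for f in builds/release/CMakeFiles/*/CMakeCXXCompiler.cmake; do
    # Print only the ID and VERSION assignments from the cached compiler info.
    grep -E 'set\(CMAKE_CXX_COMPILER_(ID|VERSION) ' "$f"
done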
|
msg8615 |
Author: malte |
Date: 2019-03-05.17:40:26 |
|
I think that's preferable. It's not great because we cannot be sure that "gcc"
is the compiler that was configured to perform this build. If we can make at
least a minimal effort to figure out which compiler was configured (by checking
environment variables like CC, or whatever CMake's most basic mechanism for
determining the compiler is), that would be better. Perhaps it's possible to
directly query CMake which compiler it uses for a given build. That would be the
cleanest solution.
|
msg8614 |
Author: jendrik |
Date: 2019-03-05.17:30:29 |
|
I agree that I was a little too quick to follow stackoverflow here. Should I
just test the GCC version with -dumpversion instead?
|
msg8613 |
Author: malte |
Date: 2019-03-05.17:17:58 |
|
I don't think using "strings" to grep through the binary for traces of GCC is
the right way to handle this.
|
msg8612 |
Author: silvan |
Date: 2019-03-05.17:11:01 |
|
From my side, yes. (The changes are technical and I can't judge if they work
just from reading the file. So if they do, it's fine.)
|
msg8611 |
Author: jendrik |
Date: 2019-03-05.15:53:04 |
|
The patch at https://bitbucket.org/jendrikseipp/downward/pull-requests/122
should fix the problem. (I reopened branch issue703 for this.) Can I merge it?
|
msg8606 |
Author: malte |
Date: 2019-03-05.10:55:29 |
|
Setting this back to in-progress since it looks like we'll want further commits.

I'm not sure what the best place is for working around bugs specific to
particular gcc and valgrind version combinations. Keeping this in the Fast
Downward repository makes it a bit more "official", and I'm not sure how easy it
is to define the suppression in such a way that we can be reasonably confident
that it will not lead to false negatives.

But I suppose either alternative works for me. If it ends up in the repository,
it should be very clear from the code which versions of the tools this is a
workaround for, and it should include the links from Jendrik's earlier message.
That way, it will be easier to take this out again once we stop supporting this
gcc version.
|
msg8602 |
Author: jendrik |
Date: 2019-03-05.08:50:11 |
|
Unfortunately, adapting the Docker containers won't help users who run the tests
locally. I therefore suggest baking the workaround from msg8587 into the
test-memory-leaks.py script.
|
msg8601 |
Author: florian |
Date: 2019-03-05.08:16:44 |
|
Adding a file to some but not all Docker containers should be easy enough. We
already do this for the cplex installer. I can help with this once I'm in the
office.
|
msg8597 |
Author: malte |
Date: 2019-03-04.22:47:33 |
|
I looked around a bit more in the gcc and valgrind trackers; looks good. Yes, a
suppression file would be nice. It would be good if we could use it *only* for
gcc5, if our setup allows that, either by customizing that docker container in
some way (probably the cleaner version), or by dynamically testing the version
using something like
if gcc -dumpversion | grep -q '^5\.'; then
do special gcc5 thing
fi
People with better shell-fu can probably replace the call to grep with some
shell-internal thing here, but I think this one wouldn't be too terrible either.
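For example, a case statement would avoid the extra grep call (still assuming
"gcc" on the PATH is the compiler we care about):
case "$(gcc -dumpversion)" in
    5.*) # do special gcc5 thing
         ;;
esac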
|
msg8596 |
Author: jendrik |
Date: 2019-03-04.22:16:53 |
|
Here is what I found:
https://stackoverflow.com/questions/30393229/new-libstdc-of-gcc5-1-may-allocate-large-heap-memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66339
It seems that GCC 5's libstdc++ allocates memory that is never freed. This was
fixed in a later compiler version.
Should I let Valgrind generate a suppression file and apply it?
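The workflow would presumably be something like this (a sketch; the planner
invocation is only a placeholder):
# Let valgrind print suppression blocks for the errors it reports:
valgrind --leak-check=full --gen-suppressions=all ./downward ... 2> valgrind.log
# Copy the emitted "{ ... }" blocks into a file, say gcc5.supp, and use it with:
valgrind --leak-check=full --suppressions=gcc5.supp ./downward ...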
|
msg8595 |
Author: malte |
Date: 2019-03-04.19:19:34 |
|
Doesn't "new" in most or at least many implementations end up calling "malloc",
so that this might well be an actual leak? So a filter that says one malloc leak
is fine doesn't seem safe to me. Are you confident of your diagnosis that
everything is alright and it isn't our code that is responsible? What is it
based on?
Even if it is the case that "our" code cannot cause complaints about malloc, our
code could trigger leaks that are reported as complaints about malloc, e.g. when
we use standard library functions that use malloc internally or when we create
memory leaks via the LP solver interface.
The clean way to handle things like these are valgrind suppression files. I hope
that it should be reasonably easy to define one here, but first I'd like to be
sure what is actually going on.
|
msg8594 |
Author: jendrik |
Date: 2019-03-04.19:11:44 |
|
The new test works fine for our Clang and GCC-8 workers, but fails for GCC-5:
http://buildbot.fast-downward.org/#/builders/17/builds/301
The error is the same for all configurations. It seems that Valgrind reports its
own malloc memory allocation. What do you think about a solution that checks the
Valgrind output and ignores the error if both " malloc " and "ERROR SUMMARY: 1
errors from 1 contexts (suppressed: 0 from 0)" occur in the output? It should be
safe to ignore errors related to malloc since we don't use it ourselves. Of
course I'd be happy if someone knows a less hacky solution :-)
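In shell terms, the check would be roughly the following (assuming the valgrind
output has been captured in a file, here called valgrind.log; the test script
would do the equivalent in Python):
if grep -q ' malloc ' valgrind.log &&
   grep -q 'ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)' valgrind.log; then
    echo "Ignoring the single malloc-related report."
fi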
|
msg8593 |
Author: malte |
Date: 2019-03-04.19:06:07 |
|
Sounds good. Since we don't have a lot of experience with these yet, it may be a
good idea for Florian to review the changes to the infrastructure repository,
but I'll leave it to the two of you whether you want to do that. No need to have
me in the loop.
|
msg8592 |
Author: jendrik |
Date: 2019-03-04.18:56:37 |
|
You're right, thanks for the heads up! I forgot to push the changes and will do
so tomorrow from the build worker.
|
msg8591 |
Author: malte |
Date: 2019-03-04.18:50:44 |
|
Doesn't this require updating the buildbot setup (docker/docker-compose recipes
etc.)?
|
msg8590 |
Author: jendrik |
Date: 2019-03-04.16:19:09 |
|
I resolved the two TODOs from the summary and merged the code (the wiki simply
links to bitbucket-pipelines.yml to document the tools for automated tests, so
there wasn't anything to update there).
|
msg8578 |
Author: jendrik |
Date: 2019-03-02.00:23:55 |
|
Awesome!
We still need to install valgrind on the buildbots, I guess, so I'll hold off on
merging until I'm back in the office. (Added this to the summary.)
|
msg8576 |
Author: malte |
Date: 2019-03-01.23:38:07 |
|
Yes, looks good to merge. Thanks!
|
msg8575 |
Author: jendrik |
Date: 2019-03-01.23:32:23 |
|
I agree that it would be nice to get rid of more raw pointers and I can see
myself looking into this when time allows. However, I don't think this should
block making progress on this issue. Do you think we can merge the code?
|
msg8573 |
Author: malte |
Date: 2019-03-01.22:34:22 |
|
No, no, in the paragraph "I think this is a useful addition, but...", focus more
on the first 7 words.
By saying that I would prefer not to rely on valgrind, I didn't mean that we
shouldn't use it, just that we should also look at memory allocation more
systematically.
> Converting all raw pointers to std::unique_ptr will require quite some work,
> I think. In my opinion, this issue should only be about releasing all memory
> before the planner exits.
If getting rid of (more) manual new/deletes is too much work for now, I'm fine
with not doing it now. It's a bit of a pity because I think we already deferred
it once or twice.
|
msg8571 |
Author: jendrik |
Date: 2019-03-01.20:19:56 |
|
I dealt with your comments on Bitbucket.
Judging by your msg8567, the issue title is wrong and we don't want to use
valgrind after all?
Converting all raw pointers to std::unique_ptr will require quite some work, I
think. In my opinion, this issue should only be about releasing all memory
before the planner exits.
|
msg8567 |
Author: malte |
Date: 2019-03-01.17:21:47 |
|
I count two leaks and one false positive, but that's still an improvement. :-)

I think this would be a useful addition, but I also want to point out that these
were both intentionally implemented this way back when we didn't yet care about
releasing objects with (essentially) global lifetime, because they caused no
harm, and before unique_ptr and friends it was hard to deal with these
gracefully. Now that we care more about this, I'd be much happier if we looked
at these legacy memory allocations systematically rather than relying on
valgrind to catch them.

Searching for the word "new" in the code finds about 30-40 uses of the operator
(unfortunately plus lots of occurrences of the word "new" in comments) and a
similar number of uses of the operator "delete". Some of these are intentional,
but it seems that quite a few are not; from a quick glance, it looks like
searching for these and using unique_ptrs etc. as appropriate would find the
same code issues and more.
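Something like the following finds the candidates (a rough search; as mentioned,
it also shows the word "new" in comments):
grep -rnw --include='*.cc' --include='*.h' -e 'new' -e 'delete' src/search/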
|
msg8566 |
Author: silvan |
Date: 2019-03-01.16:46:22 |
|
I left one comment. Otherwise, it looks fine to me.
|
msg8565 |
Author: jendrik |
Date: 2019-03-01.15:14:34 |
|
I looked into this and created a pull request at
https://bitbucket.org/jendrikseipp/downward/pull-requests/122 .
It includes a test that runs valgrind for some "standard" configs. (I created
issue897 for updating the set of tested configurations.) The test uncovered
three memory leaks that are now fixed. At least for the tested combinations of
tasks and configurations, the planner doesn't have any memory leaks anymore.
|
msg5991 |
Author: malte |
Date: 2016-12-24.15:07:39 |
|
Now that we're better about freeing memory, we should run valgrind to check that
there are no leaks. While we're at it, we should also look at other memory
issues that valgrind can detect. One problem is that any individual planner run
only covers a small part of the search algorithms, heuristics, etc., so we would
also need to think about code coverage. Perhaps this is also interesting for
other reasons (e.g. when testing whole-planner changes, where we currently test
more or less ad-hoc configs).

We might want to link dynamically for this. (See Silvan's comment at issue662.)
|
|
Date | User | Action | Args
2019-03-06 09:58:06 | jendrik | set | messages: + msg8624
2019-03-06 09:16:08 | malte | set | messages: + msg8623
2019-03-06 09:09:13 | jendrik | set | messages: + msg8622
2019-03-06 00:39:26 | malte | set | messages: + msg8621
2019-03-06 00:11:35 | jendrik | set | messages: + msg8619
2019-03-05 23:10:04 | malte | set | messages: + msg8617
2019-03-05 17:55:08 | jendrik | set | messages: + msg8616
2019-03-05 17:40:27 | malte | set | messages: + msg8615
2019-03-05 17:30:29 | jendrik | set | messages: + msg8614
2019-03-05 17:17:58 | malte | set | messages: + msg8613
2019-03-05 17:11:01 | silvan | set | messages: + msg8612
2019-03-05 15:53:04 | jendrik | set | messages: + msg8611
2019-03-05 10:55:29 | malte | set | messages: + msg8606
2019-03-05 08:50:11 | jendrik | set | messages: + msg8602
2019-03-05 08:16:44 | florian | set | messages: + msg8601
2019-03-04 22:47:33 | malte | set | messages: + msg8597
2019-03-04 22:16:53 | jendrik | set | messages: + msg8596
2019-03-04 19:19:34 | malte | set | messages: + msg8595
2019-03-04 19:11:44 | jendrik | set | messages: + msg8594
2019-03-04 19:06:07 | malte | set | messages: + msg8593
2019-03-04 18:56:37 | jendrik | set | messages: + msg8592
2019-03-04 18:50:44 | malte | set | messages: + msg8591
2019-03-04 16:19:09 | jendrik | set | status: in-progress -> resolved; messages: + msg8590; summary: "TODO: Install valgrind on buildbots. TODO: Document valgrind dependency in the wiki." -> (empty)
2019-03-02 00:23:55 | jendrik | set | status: reviewing -> in-progress; messages: + msg8578; summary: "TODO: Install valgrind on buildbots. TODO: Document valgrind dependency in the wiki."
2019-03-01 23:38:07 | malte | set | messages: + msg8576
2019-03-01 23:32:23 | jendrik | set | messages: + msg8575
2019-03-01 22:34:22 | malte | set | messages: + msg8573
2019-03-01 20:19:56 | jendrik | set | messages: + msg8571
2019-03-01 17:21:47 | malte | set | messages: + msg8567
2019-03-01 16:46:22 | silvan | set | messages: + msg8566
2019-03-01 15:14:34 | jendrik | set | status: chatting -> reviewing; assignedto: jendrik; messages: + msg8565
2017-01-09 11:05:53 | florian | set | nosy: + florian
2017-01-03 10:58:30 | silvan | set | nosy: + silvan
2016-12-27 19:33:25 | haz | set | nosy: + haz
2016-12-24 16:21:10 | jendrik | set | nosy: + jendrik
2016-12-24 15:07:40 | malte | set | messages: + msg5991; summary: (text of msg5991) -> (no value)
2016-12-24 15:06:29 | malte | create |