msg9728 (view) |
Author: florian |
Date: 2020-10-01.12:14:39 |
|
I created issue982 to deal with _GLIBCXX_DEBUG in LP builds.
|
msg9198 (view) |
Author: malte |
Date: 2020-02-13.12:56:30 |
|
I suggest we discuss the details of this live, there must be something I'm still missing.
|
msg9196 (view) |
Author: florian |
Date: 2020-02-13.09:38:52 |
|
Yes, that would be possible. I thought you wanted to detect whether the library is compatible. I think your suggestion is a bit drastic though. It would make it even more difficult to debug LP code and would make all assertions in the LP code redundant. If something breaks with an LP, I'd rather have the option to remove the _GLIBCXX_DEBUG flag or even use an OSI version compiled with the flag. Maybe we can add a CMake option instead that switches off _GLIBCXX_DEBUG and stop the build only if there is an LP solver and this option is not used?
|
msg9195 (view) |
Author: malte |
Date: 2020-02-12.16:00:02 |
|
Wouldn't it be as simple as "In a debug build, LP configurations must be disabled?" When you say "some of the library calls work without issue", it sounds like we don't know which one and we in general cannot trust LP code in a debug build. Maybe I am misunderstanding or misinterpreting something.
|
msg9194 (view) |
Author: florian |
Date: 2020-02-12.15:58:03 |
|
Not that I know of. The build does not complain and some of the library calls work without issue.
|
msg9193 (view) |
Author: malte |
Date: 2020-02-12.15:53:08 |
|
Is there any way to detect this and abort the build, give an error during runtime when LP code is used in the wrong context or something along those lines?
|
msg9192 (view) |
Author: florian |
Date: 2020-02-12.15:36:17 |
|
Roughly one year later I ran into a similar problem again and found this issue after another day of debugging. Again, the problem was that Fast Downward was compiled with _GLIBCXX_DEBUG while OSI was not. This can lead to strange and hard to debug problems like a library function returning a vector with a single element producing a vector with size 8972014882083734.
I am writing this here as a reminder: if you have to debug LP code, do *not* use a debug build! I hope I remember it this time or I will be back here next year.
|
msg8120 (view) |
Author: florian |
Date: 2018-11-28.12:46:03 |
|
Yes, this was in the middle of the ICAPS rush, so I stopped spending time on it
once I had a running solution. I'm still not completely sure what happened. Here
is what I observed:
In our adapter to OSI, we derive a class ErrorCatchingCoinMessageHandler from
the OSI class CoinMessageHandler and override a method checkSeverity to handle
CPLEX out-of-memory situations cleanly. For that, we compare the content of the
produced message to some constant strings and exit with the correct exit code if
it matches. If it doesn't match, we call checkSeverity of the base class
In a debug build, the content of the message was always empty in our overridden
method but had the correct content in the base class. The field I accessed is a
protected char[1000] and I printed it just before calling the base class
implementation and in the first line of the base class implementation (modifying
the OSI code). The first print showed it as an empty string, the second had the
message as expected. In the release build this didn't occur.
OSI was compiled without the flag and our debug build was compiled with it, so I
assume that the class is layed out differently in memory in the library code and
our code so accessing the protected field gets the wrong data. But this is the
point where I stopped looking into it.
|
msg8108 (view) |
Author: malte |
Date: 2018-11-28.08:05:03 |
|
@Florian: Gabi mentioned that the debug STL build caused you a day of work
because it messed up something. Is that right? If yes, would it be worth telling
everyone (e.g. on ai@... or downward-dev) what the problem was, so that others
won't run into the same problems? Is there a way to prevent these problems?
|
msg7869 (view) |
Author: silvan |
Date: 2018-10-03.19:09:06 |
|
I see. Then this is fine with me.
|
msg7868 (view) |
Author: malte |
Date: 2018-10-03.19:08:22 |
|
No, this is run-time checking, not compile-time checking. Compilation shouldn't
take significantly longer, although we didn't test this.
The factor 2-3 (or perhaps it's more like 5-10, we didn't really analyze it a
lot) is for *runtime*, not compile time. This should catch all out-of-bounds
errors for vectors, along with some but not all other errors regarding use of
invalidated iterators and similar. I think it is most useful in cases where we
previously did *not* get a segfault; it is not unusual for an out-of-bounds
access to just lead to a garbage result rather than any kind of immediate crash,
and this can of course be hard to detect and debug.
|
msg7867 (view) |
Author: silvan |
Date: 2018-10-03.19:04:22 |
|
But this only works for index violations that can be determined at compile time,
I suppose? If so, this doesn't sound like it would prevent getting segfaults if
you just use the wrong index into a vector. Or will the segfault that the
compiled program then also be more informative?
I also think that 2-3 times longer compilation is a very big difference; I
normally exclusively compile in debug when developing.
|
msg7866 (view) |
Author: malte |
Date: 2018-10-03.19:01:11 |
|
Many thanks!
Regarding the message to the group, I think we should generally strive to give
people enough information to make an informed opinion without spending a lot of
time on it, i.e., ideally based on the message alone.
In this case, I think it's important to give people some idea of the magnitude
of the speed difference. People might think of this differently depending on
whether the slowdown factor is 1.1, 10, 100 or 1000. (Of course everybody can do
their own research if they like, but that's not time-efficient for the group as
a whole.)
|
msg7865 (view) |
Author: jendrik |
Date: 2018-10-03.18:56:59 |
|
I merged this. As you said, this change could quickly be reverted.
|
msg7864 (view) |
Author: malte |
Date: 2018-10-03.18:51:19 |
|
I think the more comprehensive option is better. We don't need a fast debug build.
|
msg7863 (view) |
Author: jendrik |
Date: 2018-10-03.18:49:36 |
|
Quick note: There's also _GLIBCXX_ASSERTIONS, which is automatically enabled by
_GLIBCXX_DEBUG
(https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_macros.html). However,
_GLIBCXX_ASSERTIONS doesn't do container bounds checks as I just verified
locally.
|
msg7862 (view) |
Author: malte |
Date: 2018-10-03.18:44:12 |
|
Thanks, Jendrik! I've left a message over there. Sometimes bumping an issue
helps getting it resolved.
|
msg7860 (view) |
Author: jendrik |
Date: 2018-10-03.18:41:35 |
|
There's already an issue for the failing assertions: issue467.
|
msg7859 (view) |
Author: malte |
Date: 2018-10-03.18:03:24 |
|
Thanks, Jendrik! I don't think efficiency of the debug build is a big deal, and
of course there is a reason why these additional checks are disabled by default.
It looks like in some tasks runtimes grows by more than a factor of 5.
I would be in favour of enabling this flag, but again I suggest writing to
downward-dev, both to give people a chance to check if they like this despite
the runtime impact, and also to let them know what this change means for them in
practice (in terms of getting more useful error checking out of debug builds). I
think it would be fine to merge before waiting for feedback, though. We can
always undo the change if necessary.
The most interesting thing for me is that we got a bunch of assertion failures
in both experiments. It seems they are all related to landmarks, but someone
should double-check this. Can you look into these a little bit and open an issue
for them if we don't already have one?
|
msg7857 (view) |
Author: jendrik |
Date: 2018-10-03.17:47:51 |
|
Here are the results:
https://ai.dmi.unibas.ch/_tmp_files/seipp/issue837-v1-opt-issue837-base-issue837-v1-compare.html
https://ai.dmi.unibas.ch/_tmp_files/seipp/issue837-v1-sat-issue837-base-issue837-v1-compare.html
Unfortunately, coverage decreases significantly.
|
msg7843 (view) |
Author: jendrik |
Date: 2018-10-03.12:23:12 |
|
Yes, the errors are printed on stderr and Lab will report them as unexplained
errors.
Experiments are running.
|
msg7839 (view) |
Author: malte |
Date: 2018-10-02.20:02:19 |
|
Thanks, Jendrik! I would suggest running the usual experiments we do for
"whole-planner" changes here because there is a chance this will actually catch
some bugs. (It would also be interesting to see how much this slows down the code.)
We should also make sure that we actually see any warning that are produced in
the experiment reports. But I assume they are printed on stderr, and hence lab
will tell us about them, right?
I would suggest a before-after experiment, obviously only in debug mode, using a
5-minute timeout.
|
msg7838 (view) |
Author: jendrik |
Date: 2018-10-02.19:55:54 |
|
I tested Florian's suggestion locally and it works for GCC and Clang. When I
compile and run the following code
vector<int> test;
cout << test[4] << endl;
the following warning is printed
/usr/include/c++/5.4.0/debug/vector:409:error: attempt to subscript
container with out-of-bounds index 4, but container only holds 0
elements.
Objects involved in the operation:
sequence "this" @ 0x0x7ffc94412ec0 {
type = NSt7__debug6vectorIiSaIiEEE;
}
The (very small) patch is here:
https://bitbucket.org/jendrikseipp/downward/pull-requests/102
|
msg7639 (view) |
Author: malte |
Date: 2018-09-20.13:29:13 |
|
Thanks, Florian!
I think "-D" is equivalent to using #define in the code, which means that you
can define anything, and if the compiler doesn't need it, it won't care. So we
don't necessarily need a per-compiler distinction.
The compiler itself doesn't do anything with these, I think. They are solely
there to control #ifdefs etc. So whether or not this will do something in clang
depends on whether or not clang uses glibc or not. I think it does, right? I'm
not aware that there are two different stdlib implementations in common use in
Linux. But I'm not 100% sure.
|
msg7638 (view) |
Author: florian |
Date: 2018-09-20.13:26:09 |
|
It should not be hard to add to the CMake build. We could add it here if it can
be passed as a compiler flag (I saw it used with "-D", i.e., "-D_GLIBCXX_DEBUG"):
http://hg.fast-downward.org/file/f67b4916c871/src/cmake_modules/FastDownwardMacros.cmake#l21
If it doesn't work in clang, we would need another if block because the check in
line 7 treats clang and gcc the same.
|
msg7633 (view) |
Author: malte |
Date: 2018-09-20.12:44:03 |
|
vector::at should *always* do a bounds check (and raise an exception, not fail
an assert if it fails). It's part of the spec.
The only reason I could see against _GLIBCXX_DEBUG is that it might be difficult
to set up, either on the CMake side or by requiring additional installs. We'll
find out if we try it out. :-)
|
msg7631 (view) |
Author: guillem |
Date: 2018-09-20.12:33:10 |
|
Don't know why, but I had always assumed that "-DDEBUG" was enabling this type of
checks.
Ok, now I see that the confusion arose from the fact that "-DDEBUG" allows the the
asserts throughout the STL, which include a bounds check when using vector::at:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-
v3/include/bits/stl_vector.h#L1045
But not when using operator[], of course.
Looks to me like it would be good to enable _GLIBCXX_DEBUG in our debug mode, would
there be any strong argument against doing that?
|
msg7625 (view) |
Author: jendrik |
Date: 2018-09-20.11:27:26 |
|
We should think about whether we want to use the debug mode of libstdc++ in debug
builds to enable vector bounds checking among other things.
(https://stackoverflow.com/questions/38958523/safe-versions-of-stl-on-linux).
|
|
Date |
User |
Action |
Args |
2020-10-01 12:14:39 | florian | set | messages:
+ msg9728 |
2020-02-13 12:56:30 | malte | set | messages:
+ msg9198 |
2020-02-13 09:38:52 | florian | set | messages:
+ msg9196 |
2020-02-12 16:00:02 | malte | set | messages:
+ msg9195 |
2020-02-12 15:58:03 | florian | set | messages:
+ msg9194 |
2020-02-12 15:53:08 | malte | set | messages:
+ msg9193 |
2020-02-12 15:36:18 | florian | set | messages:
+ msg9192 |
2018-11-28 12:46:03 | florian | set | messages:
+ msg8120 |
2018-11-28 08:05:03 | malte | set | messages:
+ msg8108 |
2018-10-04 09:01:44 | cedric | set | nosy:
+ cedric |
2018-10-03 19:09:06 | silvan | set | messages:
+ msg7869 |
2018-10-03 19:08:22 | malte | set | messages:
+ msg7868 |
2018-10-03 19:04:22 | silvan | set | nosy:
+ silvan messages:
+ msg7867 |
2018-10-03 19:01:11 | malte | set | messages:
+ msg7866 |
2018-10-03 18:56:59 | jendrik | set | status: testing -> resolved messages:
+ msg7865 |
2018-10-03 18:51:19 | malte | set | messages:
+ msg7864 |
2018-10-03 18:49:36 | jendrik | set | messages:
+ msg7863 |
2018-10-03 18:44:12 | malte | set | messages:
+ msg7862 |
2018-10-03 18:41:35 | jendrik | set | messages:
+ msg7860 |
2018-10-03 18:03:24 | malte | set | messages:
+ msg7859 |
2018-10-03 17:47:51 | jendrik | set | messages:
+ msg7857 |
2018-10-03 12:23:12 | jendrik | set | status: reviewing -> testing messages:
+ msg7843 |
2018-10-02 20:02:19 | malte | set | messages:
+ msg7839 |
2018-10-02 19:55:54 | jendrik | set | status: chatting -> reviewing assignedto: jendrik messages:
+ msg7838 |
2018-09-20 13:29:13 | malte | set | messages:
+ msg7639 |
2018-09-20 13:26:09 | florian | set | nosy:
+ florian messages:
+ msg7638 |
2018-09-20 12:44:03 | malte | set | messages:
+ msg7633 |
2018-09-20 12:33:10 | guillem | set | status: unread -> chatting nosy:
+ guillem messages:
+ msg7631 |
2018-09-20 11:27:26 | jendrik | create | |