Summary: | sys-libs/glibc-2.8_p20080602-r1: ld-2.8.so reads from uninitialised memory regions | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Stephan Krauß <sycon42> |
Component: | [OLD] Core system | Assignee: | Gentoo Toolchain Maintainers <toolchain> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | dabbott, loki_val, math228a, mlcreech, nessuno, realnc |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | x86 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | | Runtime testing required: | --- |
Bug Depends on: | 274771 | ||
Bug Blocks: | |||
Attachments: | Test code showing some misbehaviour; Test code showing some misbehaviour; valgrind ldd; emerge --info | ||
Description
Stephan Krauß
2009-04-01 22:23:45 UTC
Do: FEATURES="nostrip" emerge -1 glibc and see if this doesn't go away.

*** This bug has been marked as a duplicate of bug 47576 ***

I had already re-emerged it with the debug USE flag, the -g CFLAG and the nostrip feature. That's why you can see the function calls and source code lines. But just in case you changed something, I'm doing it again...

I re-emerged it again and the problem is still there. Maybe this is a regression? (Sorry for not finding bug 47576.)

That's weird, reopening.

post some code that actually exhibits a problem. valgrind has a history of not being reliable with the ldso.

Here is what I found out after a day of work: The bug in ld.so seems to be triggered only under very special circumstances. My setup is as follows: I have an archive with library code, which does some calculations. This archive is linked statically into a binary created with LLVM. (LLVM links in other libraries, does some instrumentation and compiles the code to a normal object file. After that, the said archive is linked in.) When the resulting program is started, ld.so dynamically links some standard libraries, and in doing so it damages the code of the contained archive. (At least its calculations get corrupted. It's hard to track down what's actually going wrong. If I let some intermediate values be printed out, the final result changes. And no, it's not my code's fault. Under Debian and Ubuntu, this doesn't happen.) I've tried to reproduce this behaviour without LLVM being involved, without success. So it seems that ld.so has trouble handling the code generated by LLVM. Strangely, the same code works flawlessly on other distros. Well, that's not too surprising, as Valgrind does not find any reads from uninitialised memory there. I suppose you won't bother with LLVM. Unfortunately, I can't provide any non-LLVM code, as everything seems to work fine when LLVM is not involved.

I understand that you don't want to waste your time on a bug that you can't be sure is really there and that, even if it exists, won't affect most users. (At least I hope so.) I think it would be best if I test it with newer versions of the glibc package, and if that doesn't help, I'll install a different Linux distribution.

that's still a high level description, not a set of instructions like:
- download XXX files
- run XXX commands
- see XXX misbehavior

Created attachment 187298
Test code showing some misbehaviour
I finally figured out a way to reproduce it without LLVM. (It takes hours to compile LLVM. I don't want to do this to you.)
- download and extract the test archive
- run make to compile the example
- run ./test
- rename the emitted debug.log
- comment out or delete line 522 in calc.cpp (cout ...)
- re-run make and ./test
- make a diff of the old and new debug.log
- see different results at some places
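The last two steps can of course be done with plain diff. As a minimal C++ sketch of the same comparison, the helper below reports which line numbers differ between the renamed log (for example a hypothetical debug.old.log) and the new debug.log. The function name differingLines is illustrative only and is not part of the attached test code.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based numbers of the lines that differ between two logs.
// A line that exists in only one of the logs also counts as a difference.
std::vector<std::size_t> differingLines(std::istream& oldLog, std::istream& newLog) {
    std::vector<std::size_t> result;
    std::string a, b;
    std::size_t lineNo = 0;
    while (true) {
        const bool hasA = static_cast<bool>(std::getline(oldLog, a));
        const bool hasB = static_cast<bool>(std::getline(newLog, b));
        if (!hasA && !hasB) break;  // both logs exhausted
        ++lineNo;
        // Only compare contents when both logs still have a line here.
        if (hasA != hasB || a != b) result.push_back(lineNo);
    }
    return result;
}

// Convenience overload for log contents that are already in memory.
std::vector<std::size_t> differingLines(const std::string& oldText, const std::string& newText) {
    std::istringstream oldLog(oldText);
    std::istringstream newLog(newText);
    return differingLines(oldLog, newLog);
}
```

To compare the real files, open two std::ifstream objects on the two log names and pass them to the stream overload.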
Another way to get different results is to run it under valgrind (with its memcheck tool). So it seems to be memory-related somehow. (Although I've spent a great deal of time looking for reads from uninitialised memory, I couldn't find any in my code.)
By the way, it makes no difference where the cout line is put (as long as it is in the same method). It also doesn't matter what you print out, or how often.
While cutting down the code, I sometimes even saw the results of the calculation change as I removed methods that were never called in this test.
By the way, you need Boost installed.
Unfortunately, I have not yet been able to produce a smaller piece of code that shows this behaviour. The problem is that it is hard to figure out what's going on if adding some output code changes the results. The strange thing is that not every value is wrong. Thus, I'm still wondering if it is my fault. But I have no idea what I could do in the code, besides reading from uninitialised memory, that could result in such strange behaviour.
Created attachment 187299
Test code showing some misbehaviour
Sorry for the wrong content type. Fixed it.
thanks for spending the time to put that together. i really don't think these warnings from glibc have any bearing on your test case misbehavior whatsoever. they seem to be x86-specific ... or at least, valgrind doesn't whine on x86_64. and the same warning exists in glibc-2.9.

I'm so sorry. I was totally wrong. This has nothing to do with ld.so. When I tested my code on non-Gentoo machines, I also compiled it on them (and had no issues). But if I use the binary built under Gentoo, I see the same strange behaviour on those non-Gentoo machines. Thus, it is clearly a compiler issue, and ld.so is not to blame for it. I will move to the recently stabilized GCC asap. I hope that will fix it.

After an upgrade, my code:

--- program.cpp ---
int main(void) { return 0; }
--- program.cpp ---

$ g++ -g program.cpp
$ valgrind ./a.out

produced the error, until I did:

# emerge -C valgrind
# emerge =valgrind-3.3.1
# after this I get no errors
# emerge -C valgrind
# emerge valgrind
# means version 3.4.0

After the reinstall I can't reproduce it (with gcc 4.1.2 on a Core2Duo in x86 mode, not 64-bit).

yes, on my x86 systems where i could reproduce, i rebuilt glibc and no longer get the valgrind warnings

*** Bug 271596 has been marked as a duplicate of this bug. ***

I get a long list of errors with *any* executable in valgrind, for example "valgrind ls" or "valgrind ldd" (ldd is a static executable), or anything else. This is with glibc-2.10.1. Although it's a different glibc version than the one this bug is about, I'm not opening a new bug, since someone already did (bug 271596) and it got marked as a duplicate of this one. I'm attaching the errors and my emerge --info.

Created attachment 194570
valgrind ldd
Created attachment 194571
emerge --info
OK, it seems I found the solution. Valgrind can't cope with the new SSE-optimized strlen function of glibc 2.10.1 on amd64. Fedora is applying a patch for this, valgrind-3.4.1-x86_64-ldso-strlen.patch, as well as another, glibc-2.10.1-specific one, valgrind-3.4.1-glibc-2.10.1.patch. Those patches can be found at: http://cvs.fedoraproject.org/viewvc/rpms/valgrind/F-11

However, the strlen patch (the one we need) does not work on Gentoo unless glibc is emerged with debug symbols enabled:

valgrind: Fatal error at startup: a function redirection
valgrind: which is mandatory for this platform-tool combination
valgrind: cannot be set up. Details of the redirection are:
valgrind:
valgrind: A must-be-redirected function
valgrind: whose name matches the pattern: strlen
valgrind: in an object with soname matching: ld-linux-x86-64.so.2
valgrind: was not found whilst processing
valgrind: symbols from the object with soname: ld-linux-x86-64.so.2
valgrind:
valgrind: Possible fix: install glibc's debuginfo package on this machine.
valgrind:
valgrind: Cannot continue -- exiting now. Sorry.

could you file a new bug about that (as that would be assigned to the valgrind maintainer) and have it block this bug please ?

(In reply to comment #20)
> could you file a new bug about that (as that would be assigned to the valgrind
> maintainer) and have it block this bug please ?

Done.
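The redirection failure above concerns the SSE-optimized strlen in ld-linux-x86-64.so.2. As a rough illustration only (an assumption about the general technique, not glibc's actual code), vectorised strlen implementations examine memory in aligned 16-byte blocks, so they also look at bytes before the start of the string and after its terminating NUL within the same block. Memcheck can flag such reads, which is presumably why valgrind insists on replacing strlen in ld.so, and it can only locate that symbol when glibc's symbols have not been stripped. The sketch simulates the block-wise scan with indices into a std::string so that it stays well-defined C++; ScanResult and blockStrlen are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical result type: the string length plus how many bytes the
// block-wise scan inspected before it found the terminator.
struct ScanResult {
    std::size_t length;
    std::size_t bytesInspected;
};

// Finds the NUL terminator of the string that starts at buf[start] by
// scanning whole 16-byte blocks aligned relative to the start of buf,
// mimicking a vectorised strlen that compares 16 bytes at a time.
// Every access stays inside buf; a real SSE strlen reads raw memory and
// can touch bytes on either side of the string within the aligned block.
ScanResult blockStrlen(const std::string& buf, std::size_t start) {
    constexpr std::size_t kBlock = 16;
    if (start >= buf.size()) return {0, 0};
    std::size_t inspected = 0;
    for (std::size_t block = start - start % kBlock; block < buf.size(); block += kBlock) {
        const std::size_t end = std::min(block + kBlock, buf.size());
        inspected += end - block;  // the whole block is "loaded", even bytes before start
        for (std::size_t i = std::max(block, start); i < end; ++i) {
            if (buf[i] == '\0') return {i - start, inspected};
        }
    }
    return {buf.size() - start, inspected};  // no terminator inside buf
}
```

For "abc" at the start of a 32-byte buffer, the sketch reports a length of 3 but 16 bytes inspected, that is, 12 bytes beyond the terminator.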