ld-2.8.so reads from uninitialised variables while doing dynamic linking. A program of mine therefore computes wrong results. (But I suppose this affects all dynamically linked programs.) Originally, I thought my program was reading from uninitialised memory regions, but using Valgrind I found out that it is actually the fault of ld.so. (I tested it under an Ubuntu installation, where it computes everything correctly. So this is really the linker's fault.)

Reproducible: Always

Steps to Reproduce:
1. Make sure you have glibc-2.8-2.8_p20080602-r1 installed.
2. Run a program with Valgrind or gdb. For example: valgrind ls
3. Watch it make nondeterministic jumps.

Actual Results:
The relevant parts of the Valgrind output:

==15748== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)
==15748==
==15748== 1 errors in context 1 of 7:
==15748== Conditional jump or move depends on uninitialised value(s)
==15748==    at 0x400ACA4: _dl_relocate_object (do-rel.h:117)
==15748==    by 0x4003F4E: dl_main (rtld.c:2304)
==15748==    by 0x4014225: _dl_sysdep_start (dl-sysdep.c:239)
==15748==    by 0x400138E: _dl_start (rtld.c:330)
==15748==    by 0x4000986: (within /lib/ld-2.8.so)
==15748==  Uninitialised value was created by a stack allocation
==15748==    at 0x400AA66: _dl_relocate_object (dl-reloc.c:142)
==15748==
==15748== 1 errors in context 2 of 7:
==15748== Conditional jump or move depends on uninitialised value(s)
==15748==    at 0x400AB61: _dl_relocate_object (do-rel.h:68)
==15748==    by 0x4003F4E: dl_main (rtld.c:2304)
==15748==    by 0x4014225: _dl_sysdep_start (dl-sysdep.c:239)
==15748==    by 0x400138E: _dl_start (rtld.c:330)
==15748==    by 0x4000986: (within /lib/ld-2.8.so)
==15748==  Uninitialised value was created by a stack allocation
==15748==    at 0x400AA66: _dl_relocate_object (dl-reloc.c:142)
==15748==
==15748== 1 errors in context 3 of 7:
==15748== Conditional jump or move depends on uninitialised value(s)
==15748==    at 0x400AB59: _dl_relocate_object (do-rel.h:65)
==15748==    by 0x4003F4E: dl_main (rtld.c:2304)
==15748==    by 0x4014225: _dl_sysdep_start (dl-sysdep.c:239)
==15748==    by 0x400138E: _dl_start (rtld.c:330)
==15748==    by 0x4000986: (within /lib/ld-2.8.so)
==15748==  Uninitialised value was created by a stack allocation
==15748==    at 0x400AA66: _dl_relocate_object (dl-reloc.c:142)
==15748==
==15748== 1 errors in context 4 of 7:
==15748== Conditional jump or move depends on uninitialised value(s)
==15748==    at 0x400ACA4: _dl_relocate_object (do-rel.h:117)
==15748==    by 0x400410D: dl_main (rtld.c:2234)
==15748==    by 0x4014225: _dl_sysdep_start (dl-sysdep.c:239)
==15748==    by 0x400138E: _dl_start (rtld.c:330)
==15748==    by 0x4000986: (within /lib/ld-2.8.so)
==15748==  Uninitialised value was created by a stack allocation
==15748==    at 0x400AA66: _dl_relocate_object (dl-reloc.c:142)
==15748==
==15748== 1 errors in context 5 of 7:
==15748== Conditional jump or move depends on uninitialised value(s)
==15748==    at 0x400B019: _dl_relocate_object (do-rel.h:104)
==15748==    by 0x400410D: dl_main (rtld.c:2234)
==15748==    by 0x4014225: _dl_sysdep_start (dl-sysdep.c:239)
==15748==    by 0x400138E: _dl_start (rtld.c:330)
==15748==    by 0x4000986: (within /lib/ld-2.8.so)
==15748==  Uninitialised value was created by a stack allocation
==15748==    at 0x400AA66: _dl_relocate_object (dl-reloc.c:142)
==15748==
==15748== 1 errors in context 6 of 7:
==15748== Conditional jump or move depends on uninitialised value(s)
==15748==    at 0x400AB61: _dl_relocate_object (do-rel.h:68)
==15748==    by 0x400410D: dl_main (rtld.c:2234)
==15748==    by 0x4014225: _dl_sysdep_start (dl-sysdep.c:239)
==15748==    by 0x400138E: _dl_start (rtld.c:330)
==15748==    by 0x4000986: (within /lib/ld-2.8.so)
==15748==  Uninitialised value was created by a stack allocation
==15748==    at 0x400AA66: _dl_relocate_object (dl-reloc.c:142)
==15748==
==15748== 1 errors in context 7 of 7:
==15748== Conditional jump or move depends on uninitialised value(s)
==15748==    at 0x400AB59: _dl_relocate_object (do-rel.h:65)
==15748==    by 0x400410D: dl_main (rtld.c:2234)
==15748==    by 0x4014225: _dl_sysdep_start (dl-sysdep.c:239)
==15748==    by 0x400138E: _dl_start (rtld.c:330)
==15748==    by 0x4000986: (within /lib/ld-2.8.so)
==15748==  Uninitialised value was created by a stack allocation
==15748==    at 0x400AA66: _dl_relocate_object (dl-reloc.c:142)
==15748== IN SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)

Expected Results:
No errors. Neither in Valgrind output, nor in my program.

The obligatory emerge --info:

Portage 2.1.6.7 (default/linux/x86/2008.0/desktop, gcc-4.1.2, glibc-2.8_p20080602-r1, 2.6.27-gentoo-r8 i686)
=================================================================
System uname: Linux-2.6.27-gentoo-r8-i686-Intel-R-_Pentium-R-_M_processor_2.00GHz-with-glibc2.0
Timestamp of tree: Wed, 01 Apr 2009 21:00:16 +0000
app-shells/bash: 3.2_p39
dev-java/java-config: 2.1.7
dev-lang/python: 2.5.2-r7
dev-util/cmake: 2.4.8
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox: 1.2.18.1-r2
sys-devel/autoconf: 2.13, 2.63
sys-devel/automake: 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.2
sys-devel/binutils: 2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 1.5.26
virtual/os-headers: 2.6.27-r2
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=pentium-m -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-O2 -march=pentium-m -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks fixpackages parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo ftp://files.gentoo.org http://files.gentoo.org"
LANG="de_DE.UTF-8"
LC_ALL="de_DE.UTF-8"
LDFLAGS="-Wl,-O1"
LINGUAS="de"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X a52 aac acl acpi alsa berkdb bluetooth bzip2 cairo cdr cli cracklib crypt cups dbus dri dvd dvdr dvdread emboss encode exif fam firefox fortran gdbm gif gnutls gpm gstreamer gtk hal iconv ipv6 isdnlog java jpeg jpeg2k laptop libnotify mad midi mikmod mmx mp3 mpeg mudflap ncurses nls nptl nptlonly ogg opengl openmp pam pcre pdf perl png ppds pppd python qt3support quicktime readline reflection sdl session spl sse sse2 ssl startup-notification svg sysfs tcpd tiff truetype unicode usb vorbis win32codecs x264 x86 xcomposite xml xorg xulrunner xv xvid xvmc zlib" ALSA_CARDS="intel8x0 intel8x0m" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de" USERLAND="GNU" VIDEO_CARDS="fbdev radeon vesa vga vmware"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY
Do:
FEATURES="nostrip" emerge -1 glibc
and see whether this goes away.

*** This bug has been marked as a duplicate of bug 47576 ***
I already re-emerged it with the debug USE flag, -g in CFLAGS and the nostrip feature added. That's why you can see the function calls and source code lines. But just in case you changed something, I'm doing it again...
I re-emerged it again and the problem is still there. Maybe this is a regression? (Sorry for not finding bug 47576.)
That's weird, reopening.
post some code that actually exhibits a problem. valgrind has a history of not being reliable with the ldso.
Here is what I found out after a day of work:

The bug in ld.so seems to get triggered only under very special circumstances. My setting is as follows: I have an archive with library code, which does some calculation. This archive is linked statically into a binary created with LLVM. (LLVM links in other libraries, does some instrumentation and compiles the code to a normal object file. After that the said archive is linked in.) When the resulting program is started, ld.so dynamically links some standard libraries and, in doing so, damages the code of the contained archive. (At least its calculations get corrupted. It's hard to track down what's actually going wrong. If I print out some intermediate values, the final result changes. And no, it's not my code's fault. Under Debian and Ubuntu, this doesn't happen.)

I've tried to reproduce this behaviour without LLVM being involved, without success. So it seems that ld.so has problems handling the code generated by LLVM. Strangely, the same code works flawlessly when using other distros. Well, not that surprising, as Valgrind does not find any reads from uninitialised memory there.

I suppose you won't bother yourself with LLVM. Unfortunately, I can't provide any non-LLVM code, as it seems to work fine if LLVM is not involved. I understand that you don't want to waste your time on a bug when you can't be sure it is really there and which, even if it exists, won't affect most users. (At least I hope so.) I believe the best approach will be to test it with newer versions of the glibc package, and if that doesn't help, I'll install a different Linux distribution.
that's still a high level description, not a set of instructions like:
- download XXX files
- run XXX commands
- see XXX misbehavior
Created attachment 187298 [details]
Test code showing some misbehaviour

I finally figured out a way to reproduce it without LLVM. (It takes hours to compile LLVM. I don't want to do this to you.)

- download and extract the test archive
- run make to compile the example
- run ./test
- rename the emitted debug.log
- comment out or delete line 522 in calc.cpp (cout ...)
- re-run make and ./test
- make a diff of the old and new debug.log
- see different results at some places

Another way to get different results is to run it with valgrind (and its memcheck tool). So it seems to be somehow memory related. (Although I've spent a great deal of time looking for reads from uninitialised memory and couldn't find any in my code.)

By the way, it makes no difference where the cout line is put (as long as it is in the same method). It also doesn't matter what you print out, or how often. While I cut down the code, I sometimes even saw changes in the results of the calculation when I removed some methods, although they were never called in this test.

By the way, you need boost installed.

Unfortunately, I have not yet been able to produce a smaller piece of code that shows this behaviour. The problem is that it is hard to figure out what's going on if adding some output code changes the results. The strange thing is that not every value is wrong. Thus, I'm still wondering if it is my fault. But I have no idea what I could have done in the code, besides reading from uninitialised memory, that could result in such strange behaviour.
Created attachment 187299 [details]
Test code showing some misbehaviour

Sorry for the wrong content type. Fixed it.
thanks for spending the time to put that together
i really dont think these warnings from glibc have any bearing on your test case misbehavior whatsoever. they seem to be x86-specific ... or at least, valgrind doesnt whine on x86_64. and the same warning exists in glibc-2.9.
I'm so sorry. I was totally wrong. This has nothing to do with ld.so. When I tested my code on non-Gentoo machines, I also compiled it on them. (And had no issues.) But if I use the binary built under Gentoo, I see the same strange behaviour on those non-Gentoo machines. Thus, it is clearly a compiler issue and ld.so is not to blame for it. I will move to the recently stabilized GCC asap. Hope that will fix it.
After an upgrade, my code:

--- program.cpp ---
int main(void)
{
    return 0;
}
--- program.cpp ---

$ g++ -g program.cpp
$ valgrind ./a.out

produced the error, until I did:

# emerge -C valgrind
# emerge =valgrind-3.3.1 # after this I get no errors
# emerge -C valgrind
# emerge valgrind # means version 3.4.0

After the reinstall I can't reproduce it. (With gcc 4.1.2 on a Core2Duo in x86 mode (not 64 bit).)
yes, on my x86 systems where i could reproduce, i rebuilt glibc and no longer get the valgrind warnings
*** Bug 271596 has been marked as a duplicate of this bug. ***
I get a long list of errors with *any* executable in valgrind. For example "valgrind ls" or "valgrind ldd" (ldd is a static executable). Or anything else. This is with glibc-2.10.1. Although it's a different glibc version than the version this bug is about, I'm not opening a new bug, since someone already did (bug 271596) but it got marked as a duplicate of this one. I'm attaching the errors and my emerge --info.
Created attachment 194570 [details] valgrind ldd
Created attachment 194571 [details] emerge --info
OK, it seems I found the solution. Valgrind can't cope with the new, sse-optimized strlen function of glibc 2.10.1 on amd64. Fedora is applying a patch for this:

valgrind-3.4.1-x86_64-ldso-strlen.patch

as well as another, glibc 2.10.1 specific one:

valgrind-3.4.1-glibc-2.10.1.patch

Those patches can be found at:

http://cvs.fedoraproject.org/viewvc/rpms/valgrind/F-11

However, the strlen patch (the one we need) does not work with Gentoo unless glibc is emerged with debug symbols enabled:

valgrind: Fatal error at startup: a function redirection
valgrind: which is mandatory for this platform-tool combination
valgrind: cannot be set up. Details of the redirection are:
valgrind:
valgrind: A must-be-redirected function
valgrind: whose name matches the pattern: strlen
valgrind: in an object with soname matching: ld-linux-x86-64.so.2
valgrind: was not found whilst processing
valgrind: symbols from the object with soname: ld-linux-x86-64.so.2
valgrind:
valgrind: Possible fix: install glibc's debuginfo package on this machine.
valgrind:
valgrind: Cannot continue -- exiting now. Sorry.
could you file a new bug about that (as that would be assigned to the valgrind maintainer) and have it block this bug please ?
(In reply to comment #20)
> could you file a new bug about that (as that would be assigned to the valgrind
> maintainer) and have it block this bug please ?

Done.