I tried to install a new version of glibc to see what would happen if I used the nptlonly flag. This is what happened. These are the last lines of the build log, and unfortunately they are the only part I saved. I do remember that I did *not* see anything being installed in /lib/tls:

>>> /sbin/ldconfig
!!! FAILED postinst: 2816

I no longer have the full build log, because I crashed my computer without saving it. I'm on Knoppix now. (Note to future self: don't copy libc.so.6 from a different install on your box over the /lib/tls/libc.so.6 that is currently in use. It crashes your box immediately and sends it into a strange loop with constant hard disk activity.)

--- from here to the next marker, we're still in Gentoo and the system hasn't crashed yet ---

I tried to work out what was broken and what wasn't, but I didn't get a very coherent picture. xterm, mozilla, tar, bash and ls all segfaulted, but mv, cp, mpg123, which, ld, ldd and ldconfig didn't. ldd showed that all of my programs resolved "libc.so.6 => /lib/tls/libc.so.6", at least the ones I checked. I asked in #gentoo on IRC, and marienz asked me whether running ldconfig would help. I tried it, and it didn't change anything.

--- now I try to replace the running /lib/tls/libc.so.6, and the further breakage mentioned above occurs ---

From inside Knoppix, which uses a 2.6.7 kernel, I tried "strace -f chroot /mnt/hda1". /mnt/hda1 is the root of my normal Gentoo installation. It segfaulted. Log will be attached.

I got my system partly working by going into /mnt/hda1/lib, renaming tls to tls-bak and symlinking . to tls. With this, I could chroot into the system normally (*). That got at least bash, tar, xterm and ls working, but not mozilla. Portage, however, freezes when I try to use it. strace output of the last few lines of "emerge info" will be attached.
(*) I could already do this with "chroot /mnt/hda1 env LD_PRELOAD=/lib/libc.so.6 bash". However, that only got bash working. Everything else that used to segfault kept segfaulting.

Is there a way to find out exactly what the "FAILED postinst: 2816" message meant and what caused it? If there is, I'd appreciate knowing about it. If there isn't, I'd *really* appreciate it if someone added something for this to Portage. That assumes Portage didn't try to output useful debugging information and fail because glibc was broken.

I'm also thinking of getting a working glibc from http://dev.gentoo.org/~avenj/bins/. For now, though, I can do everything I *need* to do from inside Knoppix, so I'm in no hurry to risk breaking my system further while trying to fix it...

Reproducible: Didn't try
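If you want to repeat the workaround above, here is a minimal Python sketch of the same two steps: rename lib/tls to lib/tls-bak, then symlink tls to ".". It is meant to be run from a rescue system such as Knoppix against the mounted root. The helper name sideline_tls and the tls-bak default are illustrative assumptions. They are not part of any Gentoo tool.

```python
import os


def sideline_tls(lib_dir, backup_name="tls-bak"):
    """Move lib_dir/tls aside and replace it with a symlink to ".".

    Hypothetical helper mirroring the manual workaround, run against the
    mounted root's lib directory (e.g. /mnt/hda1/lib):
        mv tls tls-bak && ln -s . tls
    """
    tls = os.path.join(lib_dir, "tls")
    backup = os.path.join(lib_dir, backup_name)
    if os.path.lexists(backup):
        # Refuse to overwrite an earlier backup of the tls libraries.
        raise FileExistsError(f"backup already exists: {backup}")
    os.rename(tls, backup)
    # The relative target "." makes lib/tls/libc.so.6 resolve to
    # lib/libc.so.6, so lookups under tls land on the lib libraries.
    os.symlink(".", tls)
    return backup
```

Usage would be something like sideline_tls("/mnt/hda1/lib"). A later comment in this bug suggests copying the contents of /lib into tls instead of symlinking, to avoid problems later. The symlink variant is shown here because it is the one described above.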
Created attachment 42767: strace -f chroot .
Created attachment 42768: emerge --info

Running strace -f instead of plain strace showed that emerge info wasn't failing because of glibc. The cause was a broken /dev/null. I had shamefully ignored it for quite a while, because udev normally takes over /dev early in boot, so it never caused problems. I fixed that. Instead of a strace log of something that wasn't actually broken, I'm attaching the output of emerge --info. I seem to have two glibcs :).
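The real culprit here was a clobbered /dev/null. A quick sanity check is whether /dev/null is still a character device with the standard Linux numbers, major 1 and minor 3. The Python sketch below does just that, and the function name is hypothetical. Note that udev mounts over /dev at boot, so a broken static /dev/null on the root filesystem may only be visible from a rescue system or before udev starts.

```python
import os
import stat


def looks_like_dev_null(path="/dev/null"):
    """Return True if `path` is a character device numbered 1:3,
    which is what /dev/null is on Linux.

    A /dev/null that was clobbered into a regular file fails this test.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    if not stat.S_ISCHR(st.st_mode):
        return False
    # st_rdev holds the device number; split it into major and minor.
    return os.major(st.st_rdev) == 1 and os.minor(st.st_rdev) == 3
```

If it returns False, the usual repair (as root) is to remove the bogus file and recreate the node with mknod -m 666 /dev/null c 1 3.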
I've also fallen foul of this, but haven't yet had a chance to get my system into a diagnosable state. It's pretty critical, and the ebuild should probably be pulled until it can be fixed... Could it have anything to do with prelinking or nptl? If anyone has suggestions for recovering from a completely segfaulting system, I'd appreciate it. I'm wondering whether simply completing the installation manually would help. Perhaps moving files in a particular order breaks something? Anyway, any advice would be helpful...

Mike 5:)
I now have two broken machines (they were upgrading at the same time) thanks to this bug. Please pull the ebuild.
A quick update. I followed the earlier instructions: move /lib/tls out of the way and symlink /lib into its place. To avoid problems later, I actually copied the contents of /lib rather than symlinking. With that, I have a 99.9% working system back up. I'm currently re-emerging the same version to see whether it works better now that half of it has already been installed. Chances are it'll fail, and then I'll step back to an earlier version. Interestingly, 20041006 worked fine, but it is no longer available, so I'll probably have to go back a couple of months. I've included my emerge --info in case it helps anyone. Note that I haven't fixed things entirely, so I also have two versions of glibc:

Portage 2.0.51-r2 (default-linux/x86/2004.2, gcc-3.3.4, glibc-2.3.4.20041006-r0,glibc-2.3.4.20041021-r0, 2.6.9 i686)
=================================================================
System uname: 2.6.9 i686 AMD Athlon(tm) processor
Gentoo Base System version 1.6.4
distcc 2.18 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
Autoconf: sys-devel/autoconf-2.59-r5
Automake: sys-devel/automake-1.8.5-r1
Binutils: sys-devel/binutils-2.15.92.0.2-r1
Headers: sys-kernel/linux26-headers-2.6.8.1-r1
Libtools: sys-devel/libtool-1.5.2-r5
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-O3 -mcpu=athlon -march=athlon -funroll-loops -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER=""
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -mcpu=athlon -march=athlon -funroll-loops -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache distlocks sandbox sfperms"
GENTOO_MIRRORS="http://www.mirrorservice.org/sites/www.ibiblio.org/gentoo/"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage /usr/local/overlays/bmg-overlay /usr/local/overlays/freedesktop"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X aalib adns alsa apm avi berkdb bitmap-fonts cdr crypt cups directfb divx4linux dvd encode f77 fam flac foomaticdb gdbm gif gnome gstreamer gtk gtk2 guile ieee1394 imagemagick imlib java jpeg junit ldap libg++ libwww mad mikmod motif mozilla mpeg ncurses nls nptl nptlonly odbc oggvorbis opengl oss pam pda pdflib perl pic png postgres python quicktime readline sdl slang spell sqlite ssl svga tcpd tetex tiff truetype x86 xml xml2 xmms xprint xv xvid zlib"
The system currently hangs on boot after printing the "Booting..." line, so I can't provide any other information. I'm looking for a CD to boot from so I can fix some of this. I don't remember all my USE flags, but I do know that I had nptl and nptlonly. Interestingly, I am typing this on an affected machine. Firefox appears to be running fine because it was launched before the upgrade, but everything else seems to segfault.
I emerged the new glibc with no problem. I then noticed that it didn't fix any of the DNS-related bugs in 20041006, so I downgraded to 20040408. That failed with the same error:

>>> /sbin/ldconfig
!!! FAILED postinst: 2816

Now the system segfaults on everything.
Using Ari's /lib/tls/ trick seems to have recovered this machine. I am currently re-emerging glibc with nptlonly turned off to see whether that makes a difference. If it still breaks, I will try downgrading, though it doesn't sound like that will work either.
So, here are some more results. After replacing my /lib/tls directory with a copy of /lib, I got back up and running. I then recompiled the *same* version of glibc (20041021), and this time it seemed to emerge fine. I didn't change the nptlonly flag or anything else. It also appears that the /lib/tls directory does not belong to this new glibc package, or at least qpkg says so. I have no idea what the tls directory does or is for, but it definitely seems to be the root of the problem. qpkg has just told me that it is no longer registered with any package. I'm going to delete it and assume that fixes the problem. revdep-rebuild has not shown any dependencies on it, and my whole system seems to be working as normal.

I hope this helps some people out and lets the developers identify the problem. This build should probably still be masked. It's pretty scary to lose your system, even if a bootable CD (Gentoo LiveCD, Knoppix, whatever) can fix it quickly...
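Here, qpkg answered the question "who owns /lib/tls?". As a rough illustration of what such a lookup involves, this Python sketch scans Portage's installed-package database. It assumes the default /var/db/pkg location and CONTENTS lines of the form "type path ...", where the type is obj, dir or sym. It is a simplification, not how qpkg itself is implemented, and it does not handle paths that contain spaces.

```python
import os


def owners_of(path, vdb="/var/db/pkg"):
    """Return "category/package" names whose CONTENTS file lists `path`.

    Sketch only: assumes whitespace-separated CONTENTS lines whose second
    field is the installed path, so paths containing spaces are missed.
    """
    owners = []
    if not os.path.isdir(vdb):
        return owners
    for category in sorted(os.listdir(vdb)):
        cat_dir = os.path.join(vdb, category)
        if not os.path.isdir(cat_dir):
            continue
        for pkg in sorted(os.listdir(cat_dir)):
            contents = os.path.join(cat_dir, pkg, "CONTENTS")
            if not os.path.isfile(contents):
                continue
            with open(contents, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    fields = line.split()
                    # e.g. "sym /lib/libc.so.6 -> libc-2.3.4.so 1098673020"
                    if len(fields) >= 2 and fields[1] == path:
                        owners.append(f"{category}/{pkg}")
                        break
    return owners
```

An empty list from owners_of("/lib/tls") matches what qpkg reported here: no installed package claims the directory.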
I was about to comment that re-emerging with nptlonly off fixed it, but apparently just re-emerging would have been enough.
Created attachment 42810: glibc-ignore-nptlonly.patch

I'm going to update the ebuild with this diff once I've confirmed that nptlonly is definitely the problem. toolchain/glibc guys, sorry, this is a really messy ebuild hack. But I can tell you that being in a situation where 99% of binaries won't run really isn't much fun. And package.masking this apparently isn't an option. It would force a downgrade to the 200408xx version for people who are successfully running the 200410xx version (e.g. because they didn't use nptlonly). Hope you can understand!
Created attachment 42811: glibc-ignore-nptlonly.patch

Sigh... getting the diff the right way round would help. To clarify the package.mask point: I meant to go on to say that forcing a downgrade to a much older version would apparently break lots of things.
I've just spoken with Lv on IRC, and I won't be committing this. We don't know how nptlonly could be causing this, given how strange the /lib/tls situation is. Also, once you've hit the problem, re-emerging *will* fix it whether or not nptlonly is set. Strange stuff...
Can anyone with this problem please give me the output of `ls -lh /lib/tls/` on a broken system?
This is an ls -lh of the directory that I moved away. It is still in the state the broken emerge left it in. Hope it helps...

total 1.6M
-rwxr-xr-x 1 root root 1.2M Oct 25 05:17 libc-2.3.4.so
lrwxrwxrwx 1 root root 13 Oct 25 05:17 libc.so.6 -> libc-2.3.4.so
-rwxr-xr-x 1 root root 154K Oct 25 05:17 libm-2.3.4.so
lrwxrwxrwx 1 root root 13 Oct 25 05:23 libm.so.6 -> libm-2.3.4.so
-rwxr-xr-x 1 root root 163K Oct 25 05:17 libpthread-2.3.4.so
lrwxrwxrwx 1 root root 19 Oct 25 05:17 libpthread.so.0 -> libpthread-2.3.4.so
-rwxr-xr-x 1 root root 34K Oct 25 05:17 librt-2.3.4.so
lrwxrwxrwx 1 root root 14 Oct 25 05:17 librt.so.1 -> librt-2.3.4.so
-rwxr-xr-x 1 root root 36K Oct 25 05:17 libthread_db-1.0.so
lrwxrwxrwx 1 root root 19 Oct 25 05:17 libthread_db.so.1 -> libthread_db-1.0.so
Just to explain: /lib/tls/ shouldn't exist at all with USE=nptlonly, and I have no idea why it exists for some of you. The older glibc ebuilds didn't create it at all, regardless of USE... /lib/tls/ holds the nptl-specific libs if and only if /lib/ is currently being used for the linuxthreads-enabled versions.
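The rule above can be written down as a small lookup. This Python sketch maps a set of USE flags to where the LinuxThreads and NPTL builds are expected to live. The function name and the shape of its return value are illustrative. It treats nptlonly without nptl as plain LinuxThreads, which is an assumption and is not stated in this bug.

```python
def expected_layout(use_flags):
    """Map glibc USE flags to the directory expected to hold each build.

    Encodes the rule stated above: lib/tls holds the NPTL build only when
    lib holds the LinuxThreads build (USE="nptl" without "nptlonly").
    Assumption: "nptlonly" without "nptl" behaves like plain LinuxThreads.
    """
    use = set(use_flags)
    if "nptl" in use and "nptlonly" not in use:
        # Dual install: LinuxThreads in /lib, NPTL in /lib/tls.
        return {"/lib": "linuxthreads", "/lib/tls": "nptl"}
    if "nptl" in use:
        # NPTL only: everything in /lib, and no /lib/tls at all.
        return {"/lib": "nptl"}
    # No NPTL: LinuxThreads only, and no /lib/tls.
    return {"/lib": "linuxthreads"}
```

For example, "/lib/tls" in expected_layout(["nptl", "nptlonly"]) is False. That is why a leftover /lib/tls on such a system is suspicious.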
If it's of any help, this was the first time I'd seen the nptlonly USE flag, so I decided to set it. If it existed in the 1006 version, then it's entirely possible I compiled with dual support (possibly creating the tls directory in the process)?
OK, after some discussion on IRC, we think we have this right. This is a bug in migrating to USE="nptl nptlonly" after installing one of the newer glibc ebuilds with USE="nptl":

1) the new version, without /lib/tls, is installed to root
2) the /lib/tls libraries from the previous merge break, but are still used, since they're still there
3) postinst fails because of 2)
4) because postinst fails, we never get as far as removing /lib/tls
5) bork bork bork

Daniel is adding a check to prevent others from hitting this bug until we figure out how to prevent it properly. Many apologies to anyone who had their install hosed.
The check is now in CVS. The ebuild will simply exit if it detects an nptl -> nptlonly migration. This is only temporary, until we figure out how to solve this ugly bug properly.
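The actual check lives in the ebuild, which is a shell script, and its exact condition isn't quoted in this bug. The following is a hedged Python sketch of the idea. It refuses to proceed when USE="nptl nptlonly" is requested but a lib/tls/libc.so.6 from an earlier USE="nptl" merge is still present under the target root. Both function names are hypothetical.

```python
import os


def nptl_to_nptlonly_migration(root, use_flags):
    """Return True if a merge with these USE flags would hit this bug.

    Hypothetical condition: the new build wants USE="nptl nptlonly"
    (no lib/tls), but a lib/tls/libc.so.6 left by an earlier USE="nptl"
    merge is still present and would shadow the new libraries.
    """
    if not {"nptl", "nptlonly"} <= set(use_flags):
        return False
    stale_libc = os.path.join(root, "lib", "tls", "libc.so.6")
    return os.path.lexists(stale_libc)


def preflight(root, use_flags):
    # Abort before touching anything, in the spirit of the temporary check.
    if nptl_to_nptlonly_migration(root, use_flags):
        raise SystemExit(
            "stale lib/tls found: nptl -> nptlonly migration not supported yet"
        )
```

The check runs before any files are installed. Failing at that point leaves the old, consistent set of libraries in place, which avoids the half-installed state described in the comment above.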
I ran across this yesterday as well. Using Ari's /lib/tls trick (thank you very much), I was able to emerge an older version of glibc (20040808-r1). Today I saw the updates and read that this was a problem with nptl and nptlonly. So I did an emerge sync, took out nptlonly and ran emerge glibc. Big mistake. Here is the tail end of the emerge output:

--- !targe sym /lib/libnss_compat.so.2
--- !targe sym /lib/libnsl.so.1
--- !targe sym /lib/libm.so.6
--- !targe sym /lib/libdl.so.2
--- !targe sym /lib/libcrypt.so.1
--- !targe sym /lib/libc.so.6
--- !targe sym /lib/libanl.so.1
--- !targe sym /lib/libBrokenLocale.so.1
--- !targe sym /lib/ld-linux.so.2
>>> Regenerating /etc/ld.so.cache...
 * Caching service dependencies ...
ls: relocation error: /lib/libpthread.so.0: symbol errno, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
 * No scripts to process!
bash: /var/lib/init.d/depcache: No such file or directory
 * Failed to cache service dependencies [ !! ]
>>> Regenerating /etc/ld.so.cache...
 * Caching service dependencies ...
ls: relocation error: /lib/libpthread.so.0: symbol errno, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
 * No scripts to process!
bash: /var/lib/init.d/depcache: No such file or directory
 * Failed to cache service dependencies [ !! ]
>>> Auto-cleaning packages ...
>>> No outdated packages were found on your system.
 * Regenerating GNU info directory index...
 * Processed 325 info files.

Running things like ls or emerge gives this error:

<program name>: relocation error: /lib/libpthread.so.0: symbol errno, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference

Programs that are already running (gnome, gnome-terminal, firefox, etc.) still seem to work. I'm afraid to reboot, but I don't know how to get out of this. I'll poke around a bit and see if I can find a way out. Any suggestions would be appreciated.
One other thing. I tried moving /lib/tls away and symlinking /lib to /lib/tls, but I still get this error.
I also hit this problem with the nptlonly flag, to the point that I couldn't even boot. As comment #18 (implicitly) suggested, I simply deleted /lib/tls, which fixed all my problems.
OK folks, I have 3 broken machines now:
- the 1st boots again with a glibc copied over from another machine, but it still isn't perfect
- the 2nd is completely dead
- the 3rd mostly works, but I can't start some apps

What should I do to get fully working machines again? Should I downgrade to the previous glibc or re-emerge the current one? Should I use -nptlonly or +nptlonly? PLEASE gimme a hint! ;)
If you compiled glibc with USE=nptlonly, boot from a CD or boot floppy and run "mv /lib/tls /root/backup".
I have problems with glibc 2.3.4.20041021 (nptl): amarok and xine-lib fail with:

Inconsistency detected by ld.so: ../sysdeps/generic/dl-tls.c: 72: _dl_next_tls_modid: Assertion `result <= _rtld_local._dl_tls_max_dtv_idx' failed!

I tested both with and without nptlonly. Reverting to 2.3.4.20041006 with nptlonly solves all my problems. While downgrading, I broke my Gentoo install during the merge, so I restored a glibc backup and then re-emerged glibc-2.3.4.20041006.
Well, I managed to completely hose my system by trying to swap to a different /li. (Yes, I did mv /lib to /lib.bad. Yes, I should not have done that, since nothing works afterwards. And yes, I will never do it again.) It was a good learning experience, to say the least. I had an old /lib on my second drive. I recovered by booting off the Gentoo LiveCD and copying the old /lib from my second HD to my working system's /lib. I was trying to get a workable /lib onto my system because compiling glibc with nptl and without nptlonly had stopped many commands, including emerge, from working for me. This let me chroot and emerge an old version of glibc (20040808-r1 was the most recent that worked for me). During the reboot, I noticed an error about a missing libproc-3.2.2.so. I'm not sure where it came from. When I can afford some downtime, I'll try 20041021 with USE='nptl nptlonly', then mv /lib/tls away and reboot.
I'll make the preinst delete the /lib/tls directory, if present, to prevent any further problems. This should be fine, since there are libraries in /lib to fall back on. Hopefully fixed in glibc 2.3.4.20041102. Re-open if it is not.
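Here is a Python sketch of that preinst cleanup. It relies on the premise stated above that /lib has libraries to fall back on. The function name and the libc.so.6 guard are assumptions added for illustration. The real change is made in the ebuild's shell preinst.

```python
import os
import shutil


def remove_stale_tls(root="/"):
    """Remove <root>/lib/tls if present; return True if anything was removed.

    Hypothetical preinst-style cleanup. It refuses to act unless
    <root>/lib/libc.so.6 exists, since deleting tls is only safe when
    lib has libraries to fall back on.
    """
    lib = os.path.join(root, "lib")
    tls = os.path.join(lib, "tls")
    if not os.path.lexists(tls):
        return False
    if not os.path.lexists(os.path.join(lib, "libc.so.6")):
        raise RuntimeError(f"no libc.so.6 in {lib} to fall back on")
    if os.path.islink(tls):
        # e.g. the "ln -s . tls" workaround: drop the link only, because
        # following it would mean deleting lib itself.
        os.unlink(tls)
    else:
        shutil.rmtree(tls)
    return True
```

The symlink branch matters if you applied the "ln -s . tls" workaround from earlier in this bug. shutil.rmtree refuses to operate on a symlink, and following the link would mean deleting /lib itself, so the sketch removes only the link.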
I have run into this problem with 20041102 on one of my systems. I've checked two other machines for /lib/tls after the update to 20041102, and it's still there.
I just hit this on the Pegasos trying to upgrade to glibc-2.3.4.20041102 from glibc-2.3.3.20040420-r2. Neither nptl nor nptlonly has ever been in my USE flags, and running /lib/libc.so.6 returns:

GNU C Library 20041102 release version 2.3.4, by Roland McGrath et al.
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Compiled by GNU CC version 3.4.3 20041125 (Gentoo Linux 3.4.3-r1, ssp-3.4.3-0, pie-8.7.7).
Compiled on a Linux 2.4.22 system on 2005-01-04.
Available extensions:
    GNU libio by Per Bothner
    crypt add-on version 2.1 by Michael Glad and others
    linuxthreads-0.10 by Xavier Leroy
    The C stubs add-on version 2.1.2.
    GNU Libidn by Simon Josefsson
    BIND-8.2.3-T5B
    libthread_db work sponsored by Alpha Processor Inc
    NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
    software FPU emulation by Richard Henderson, Jakub Jelinek and others
Thread-local storage support included.
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

Also, I don't have /lib/tls or anything along those lines.
OK, ignore my comment. spanky suggested it's likely unrelated.
Closing bug ;)