Bug 69258 - glibc-2.3.4.20041021 postinst failed, now most programs segfault immediately
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High critical
Assignee: Gentoo Toolchain Maintainers
Reported: 2004-10-28 06:25 UTC by Ari Rahikkala
Modified: 2005-01-04 18:05 UTC
CC List: 8 users



Attachments
strace -f chroot . (strace--f-chroot-..log, 9.34 KB, text/plain)
2004-10-28 06:27 UTC, Ari Rahikkala
emerge --info (emerge--info.log, 1.89 KB, text/plain)
2004-10-28 06:38 UTC, Ari Rahikkala
glibc-ignore-nptlonly.patch (glibc-ignore-nptlonly.patch, 2.67 KB, patch)
2004-10-28 15:51 UTC, Daniel Drake (RETIRED)
glibc-ignore-nptlonly.patch (glibc-ignore-nptlonly.patch, 2.67 KB, patch)
2004-10-28 15:53 UTC, Daniel Drake (RETIRED)

Description Ari Rahikkala 2004-10-28 06:25:55 UTC
I installed a new version of glibc to see what would happen if I used the nptlonly USE flag. This is what happened (these are the last lines of the build log, unfortunately the only part I saved; I do remember that I did *not* see anything getting installed in /lib/tls):

>>> /sbin/ldconfig
!!! FAILED postinst: 2816

I don't have the full build log anymore since I crashed my computer without saving it, and am on Knoppix now (hint for the future me: Don't copy libc.so.6 from a different install on your box over the /lib/tls/libc.so.6 that's currently in use. It makes your box crash immediately and go in a strange HD-using loop). 

--- from here to the next --- we're still in Gentoo and the system hasn't crashed yet

I tried to check out what was broken and what wasn't. I didn't get a very coherent picture. xterm, mozilla, tar, bash and ls all segfaulted, but mv, cp, mpg123, which, ld, ldd and ldconfig didn't. Checking with ldd showed that all of my programs (the ones I checked out, anyway) did resolve "libc.so.6 => /lib/tls/libc.so.6".
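
The ldd check above can be repeated across several binaries with a small filter. This is only a sketch: the function names resolved_libc and uses_tls_libc are made up for illustration, and the /lib/tls/ prefix matches the layout described in this report, not every system.

```shell
#!/bin/sh
# Print the path that libc.so.6 resolves to, given ldd output on stdin.
resolved_libc() {
    awk '$1 == "libc.so.6" && $2 == "=>" { print $3 }'
}

# Hypothetical helper: succeed if the given binary resolves libc.so.6
# to a copy under /lib/tls (the layout described in this report).
uses_tls_libc() {
    case "$(ldd "$1" 2>/dev/null | resolved_libc)" in
        /lib/tls/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `uses_tls_libc /bin/ls && echo "ls uses the /lib/tls libc"` would report whether ls is affected. Note that ldd itself has to be runnable for this to work, which was the case here.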

On IRC, I went to #gentoo, where marienz asked me if running ldconfig would help. I tried it, and it didn't change anything.

--- now I try to replace the running /lib/tls/libc.so.6 and the aforementioned further breakage occurs ---

I tried out "strace -f chroot /mnt/hda1" (where /mnt/hda1 is the root of my normal Gentoo installation) from inside Knoppix (which is using a 2.6.7 kernel). It segfaulted. Log will be attached.

I got my system to work partly by going into /mnt/hda1/lib, renaming tls to tls-bak and symlinking . to tls. With this, I could chroot into the system normally (*). It got at least bash, tar, xterm and ls to work - not mozilla, though. Portage, however, freezes when I try to use it. strace output of the last few lines of "emerge info" will be attached.

(*) I could already do it with "chroot /mnt/hda1 env LD_PRELOAD=/lib/libc.so.6 bash", though that made only bash work; everything else that used to segfault kept segfaulting.
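
The rename-and-symlink workaround described above could be sketched roughly as follows, run from a rescue environment. The function name sideline_tls is hypothetical, and the root path is whatever the broken install is mounted at (/mnt/hda1 in this report). Treat it as a sketch under those assumptions, not a tested recovery tool.

```shell
#!/bin/sh
# Hypothetical helper: move a stale lib/tls out of the way under ROOT
# and leave a "tls -> ." symlink so lookups fall through to lib itself.
sideline_tls() {
    root=${1:?usage: sideline_tls ROOT}
    libdir="$root/lib"
    # Only act on a real directory, not an existing symlink.
    if [ ! -d "$libdir/tls" ] || [ -L "$libdir/tls" ]; then
        echo "no real tls directory under $libdir" >&2
        return 1
    fi
    # Refuse to overwrite an earlier backup.
    if [ -e "$libdir/tls-bak" ]; then
        echo "$libdir/tls-bak already exists" >&2
        return 1
    fi
    mv "$libdir/tls" "$libdir/tls-bak" || return 1
    ln -s . "$libdir/tls"
}
```

Usage would be `sideline_tls /mnt/hda1` from the rescue system, followed by a normal chroot. The backup in tls-bak is kept so the original state can still be inspected later.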

If there's a way to gather more information about exactly what that "FAILED postinst: 2816" message meant and what caused it, I'd appreciate knowing about it, and if there isn't, I'd *really* appreciate it if someone added something for it into Portage... (assuming, of course, that what happened wasn't that Portage tried to output useful debugging information but couldn't because glibc was broken). Other than that, I'm thinking of getting a working glibc from http://dev.gentoo.org/~avenj/bins/ but since I can do everything I *need* to do with my computer from inside Knoppix at the moment, I'm not in a hurry to break my system further by trying to fix it...

Reproducible: Didn't try
Steps to Reproduce:
Comment 1 Ari Rahikkala 2004-10-28 06:27:29 UTC
Created attachment 42767
strace -f chroot .
Comment 2 Ari Rahikkala 2004-10-28 06:38:15 UTC
Created attachment 42768
emerge --info

It turned out from using strace -f instead of just strace that the reason why
emerge info failed wasn't the glibc but a broken /dev/null (which I'd
shamefully ignored for quite a while since udev would normally take over /dev
early in the bootup anyway so it didn't cause any problems). I fixed that, so
instead of attaching a strace log of what supposedly was broken, I'm attaching
the output of emerge info. I seem to have two glibcs :).
Comment 3 Mike Auty (RETIRED) gentoo-dev 2004-10-28 10:51:17 UTC
I've also fallen foul of this, but haven't yet had a chance to try getting my system to a diagnosable state.  It's pretty critical, and the ebuild should probably be pulled until it can be fixed...

Could it have anything to do with prelinking or nptl?

If anyone has any suggestions for attempting to recover from a completely segfaulting system, I'd appreciate it.  I'm wondering whether simply manually completing the installation would help?  Perhaps moving files in a particular order breaks something?  Anyway, any advice anyone can offer would be helpful...

Mike  5:)
Comment 4 Myles Grant 2004-10-28 12:05:33 UTC
I now have two broken machines (they were upgrading at the same time) thanks to this bug.  Please pull the ebuild.
Comment 5 Mike Auty (RETIRED) gentoo-dev 2004-10-28 12:07:49 UTC
A quick update: by following the earlier instructions to move /lib/tls out of the way and symlink /lib into its place (or actually, to avoid problems later, copying the contents of /lib there), I've got a 99.9% working system back up.  I'm currently re-emerging the same version to see if it works better with half of it having been preinstalled.  Chances are it'll fail, and I'll step back to an earlier version (interestingly the 20041006 worked fine, but is now unavailable, so I guess I'm gonna have to step back a couple of months).  I've included my emerge --info in case it will help anyone (note, since I haven't fixed it entirely, I also have two versions of glibc):

Portage 2.0.51-r2 (default-linux/x86/2004.2, gcc-3.3.4, glibc-2.3.4.20041006-r0,glibc-2.3.4.20041021-r0, 2.6.9 i686)
=================================================================
System uname: 2.6.9 i686 AMD Athlon(tm) processor
Gentoo Base System version 1.6.4
distcc 2.18 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
Autoconf: sys-devel/autoconf-2.59-r5
Automake: sys-devel/automake-1.8.5-r1
Binutils: sys-devel/binutils-2.15.92.0.2-r1
Headers:  sys-kernel/linux26-headers-2.6.8.1-r1
Libtools: sys-devel/libtool-1.5.2-r5
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-O3 -mcpu=athlon -march=athlon -funroll-loops -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER=""
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -mcpu=athlon -march=athlon -funroll-loops -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache distlocks sandbox sfperms"
GENTOO_MIRRORS="http://www.mirrorservice.org/sites/www.ibiblio.org/gentoo/"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage /usr/local/overlays/bmg-overlay /usr/local/overlays/freedesktop"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X aalib adns alsa apm avi berkdb bitmap-fonts cdr crypt cups directfb divx4linux dvd encode f77 fam flac foomaticdb gdbm gif gnome gstreamer gtk gtk2 guile ieee1394 imagemagick imlib java jpeg junit ldap libg++ libwww mad mikmod motif mozilla mpeg ncurses nls nptl nptlonly odbc oggvorbis opengl oss pam pda pdflib perl pic png postgres python quicktime readline sdl slang spell sqlite ssl svga tcpd tetex tiff truetype x86 xml xml2 xmms xprint xv xvid zlib"
Comment 6 Myles Grant 2004-10-28 12:27:17 UTC
The system currently hangs on boot after printing the "Booting..." line, so I can't provide any other information.  I'm looking around for a cd to boot from to fix some of this.  I don't remember all my USE flags, but I do know that I had nptl and nptlonly.  Interestingly I am typing this on a machine that this affects.  Firefox appears to be running fine (it was launched before this), but everything else seems to segfault.
Comment 7 Harris Landgarten 2004-10-28 12:35:48 UTC
I emerged the new glibc with no problem. I then noticed that it didn't fix any of the DNS related bugs in 20041006 so I regressed to 20040408 which failed with the same:

>>> /sbin/ldconfig
!!! FAILED postinst: 2816

now the system segfaults on everything.
Comment 8 Myles Grant 2004-10-28 12:49:09 UTC
Using Ari's /lib/tls/ trick seems to have recovered this machine.  I am currently re-emerging glibc with nptlonly turned off to see if it's different.  If it still breaks, I will try downgrading -- though that doesn't sound like it'll work either.
Comment 9 Mike Auty (RETIRED) gentoo-dev 2004-10-28 15:21:01 UTC
So, just some more results to post back.  Having replaced my /lib/tls directory with a copy of /lib, I got back up and working, then recompiled the *same* version of glibc (20041021), and it seemed to emerge fine this time, without changes to the nptlonly flag or anything.  It also appears that the /lib/tls directory does not belong to this new glibc package (or at least so says qpkg).

I have no idea what the tls directory does or is for, but it does definitely seem to be the root of the problem.  Qpkg has just told me that it's now not registered with any package, so I'm going to delete it and assume that it is a fix for the problem.  Revdep-rebuild has not shown any dependencies on it, and all of my system seems to be working as normal.  I hope this helps some people out, and allows the developers to identify what the problem is.  This build should probably still be masked.  It's pretty scary losing your system, even if it can be quickly fixed with a bootable cd (gentoo livecd, knoppix, whatever)...
Comment 10 Myles Grant 2004-10-28 15:48:39 UTC
I was about to comment that re-emerging with nptlonly off fixed it, but apparently just re-emerging would have been enough.
Comment 11 Daniel Drake (RETIRED) gentoo-dev 2004-10-28 15:51:24 UTC
Created attachment 42810
glibc-ignore-nptlonly.patch

I'm going to update the ebuild with this diff once I've confirmed that nptlonly
is definitely the problem.

toolchain/glibc guys, sorry, this is a really messy ebuild hack. But I can tell
you that being in the situation where 99% of binaries won't run really isn't
much fun. And package.masking this apparently isn't an issue, that would force
a downgrade to the 200408xx version for people who are successfully running the
200410xx version (e.g. they didn't use nptlonly). Hope you can understand!
Comment 12 Daniel Drake (RETIRED) gentoo-dev 2004-10-28 15:53:50 UTC
Created attachment 42811
glibc-ignore-nptlonly.patch

Sigh..getting the diff the right way round would help.
And to clarify about the package.mask thing, I meant to go on to say that
forcing a downgrade to a version that is so much older would apparently break
lots of things.
Comment 13 Daniel Drake (RETIRED) gentoo-dev 2004-10-28 15:58:27 UTC
Just spoken with Lv on irc, I won't be committing this. We don't know how it could be nptlonly causing this (due to strangeness of /lib/tls). Plus, after you hit the problem, remerging *will* fix it no matter if nptlonly is set or not. Strange stuff..
Comment 14 Travis Tilley (RETIRED) gentoo-dev 2004-10-28 16:05:38 UTC
can anyone with this problem please give me an `ls -lh /lib/tls/` on a broken system?
Comment 15 Mike Auty (RETIRED) gentoo-dev 2004-10-28 16:08:07 UTC
This is an ls -lh of the directory that I moved away, it is in the same state it was after the broken emerge.  Hope it helps...

total 1.6M
-rwxr-xr-x  1 root root 1.2M Oct 25 05:17 libc-2.3.4.so
lrwxrwxrwx  1 root root   13 Oct 25 05:17 libc.so.6 -> libc-2.3.4.so
-rwxr-xr-x  1 root root 154K Oct 25 05:17 libm-2.3.4.so
lrwxrwxrwx  1 root root   13 Oct 25 05:23 libm.so.6 -> libm-2.3.4.so
-rwxr-xr-x  1 root root 163K Oct 25 05:17 libpthread-2.3.4.so
lrwxrwxrwx  1 root root   19 Oct 25 05:17 libpthread.so.0 -> libpthread-2.3.4.so
-rwxr-xr-x  1 root root  34K Oct 25 05:17 librt-2.3.4.so
lrwxrwxrwx  1 root root   14 Oct 25 05:17 librt.so.1 -> librt-2.3.4.so
-rwxr-xr-x  1 root root  36K Oct 25 05:17 libthread_db-1.0.so
lrwxrwxrwx  1 root root   19 Oct 25 05:17 libthread_db.so.1 -> libthread_db-1.0.so
Comment 16 Travis Tilley (RETIRED) gentoo-dev 2004-10-28 16:14:11 UTC
just to explain, /lib/tls/ shouldnt exist at all with USE=nptlonly, and i have no idea why it would exist at all for some of you. the older glibc ebuilds didnt create it at all, regardless of USE...

/lib/tls/ is for the nptl-specific libs if and only if /lib/ is currently being used for linuxthreads-enabled versions.
Comment 17 Mike Auty (RETIRED) gentoo-dev 2004-10-28 16:20:07 UTC
If it's of any help, this was the first time I'd seen the nptlonly USE flag, so I decided to set it.  If it existed in the 1006 version, then it's entirely possible I compiled with dual support (possibly creating the tls directory in the process)?
Comment 18 Travis Tilley (RETIRED) gentoo-dev 2004-10-28 16:24:02 UTC
ok, so if we have this right after some discussion on irc, it seems that this is a bug in migrating from USE="nptl" after installing one of the newer glibc ebuilds to USE="nptl nptlonly" with those same ebuilds.

1) the new version without /lib/tls is installed to root
2) the /lib/tls stuff from the previous merge breaks, but is still used... since it's still there
3) postinst fails due to 2
4) due to postinst failing, we never make it as far as removing /lib/tls
5) bork bork bork

Daniel is adding a check to prevent others from hitting this bug until we figure out how to prevent it. Many apologies to anyone who had their install hosed.
Comment 19 Daniel Drake (RETIRED) gentoo-dev 2004-10-28 16:33:53 UTC
Check is now in CVS. The ebuild will simply exit if a nptl --> nptlonly migration is detected. This is only a temporary thing, until we figure out how we can solve this ugly bug properly.
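
For readers wondering what such a guard might look like, here is a minimal sketch of the condition described in comment 18: nptlonly requested while a /lib/tls from an earlier nptl merge is still present. This is NOT the check that went into CVS. The function name nptlonly_migration_risk and its arguments are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical guard: return 0 (risk) when "nptlonly" appears in the
# given USE flag string while ROOT/lib/tls still exists as a directory.
nptlonly_migration_risk() {
    root=$1
    use_flags=$2
    # Match nptlonly only as a whole word in the space-separated list.
    case " $use_flags " in
        *" nptlonly "*) ;;
        *) return 1 ;;
    esac
    [ -d "$root/lib/tls" ]
}
```

A caller could abort before merging with something like `nptlonly_migration_risk "" "$USE" && { echo "stale /lib/tls found, aborting" >&2; exit 1; }`, where an empty ROOT means the live filesystem.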
Comment 20 Bill Krueger 2004-10-29 14:46:15 UTC
Well, I ran across this yesterday as well, and using the /lib/tls trick from Ari (thank you very much) I was able to emerge an older version (20040808-r1) of glibc.  Saw the updates today and, reading that as a problem with nptl and nptlonly, I did an emerge sync, took out nptlonly and did an emerge glibc. Big mistake. Here is the tail end of the emerge output:


--- !targe sym /lib/libnss_compat.so.2
--- !targe sym /lib/libnsl.so.1
--- !targe sym /lib/libm.so.6
--- !targe sym /lib/libdl.so.2
--- !targe sym /lib/libcrypt.so.1
--- !targe sym /lib/libc.so.6
--- !targe sym /lib/libanl.so.1
--- !targe sym /lib/libBrokenLocale.so.1
--- !targe sym /lib/ld-linux.so.2
>>> Regenerating /etc/ld.so.cache...
 * Caching service dependencies ...
ls: relocation error: /lib/libpthread.so.0: symbol errno, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
 * No scripts to process!
bash: /var/lib/init.d/depcache: No such file or directory
 * Failed to cache service dependencies                                                                                 [ !! ]
>>> Regenerating /etc/ld.so.cache...
 * Caching service dependencies ...
ls: relocation error: /lib/libpthread.so.0: symbol errno, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
 * No scripts to process!
bash: /var/lib/init.d/depcache: No such file or directory
 * Failed to cache service dependencies                                                                                 [ !! ]
>>> Auto-cleaning packages ...

>>> No outdated packages were found on your system.


 * Regenerating GNU info directory index...
 * Processed 325 info files.

Running things like ls or emerge gives the error:

<program name>: relocation error: /lib/libpthread.so.0: symbol errno, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference

Things currently running (gnome, gnome-terminal, firefox, etc) seem to still work. I'm afraid to reboot but I don't know how to get out of this. I'll poke around a bit and see if I come across an out. Any suggestions would be appreciated. 
Comment 21 Bill Krueger 2004-10-29 14:51:01 UTC
One other thing. I tried moving /lib/tls away and symlinking /lib to /lib/tls, but I still have this error.
Comment 22 Aard Keimpema 2004-10-30 00:02:11 UTC
I also experienced this problem using the nptlonly flag, to the point I couldn't even boot. As (implicitly) suggested by comment #18 I simply deleted /lib/tls, which fixed all my problems.
Comment 23 Stefan Briesenick (RETIRED) gentoo-dev 2004-10-30 12:35:13 UTC
ok folks, I have 3 broken machines now.

- 1st is booting now again with a copied over glibc from another machine, but still not perfect
- 2nd is fully dead
- 3rd is working mostly, but I can't start some apps.

what should I do, to get fully working machines again?

downgrade to previous glibc, remerging current glibc, using -nptlonly or +nptlonly? PLEASE gimme a hint! ;)
Comment 24 Aard Keimpema 2004-10-31 05:56:32 UTC
If you compiled glibc with "USE=nptlonly" then boot from CD/bootfloppy and "mv /lib/tls /root/backup".
Comment 25 Guillaume Castagnino 2004-10-31 17:07:04 UTC
I have problems with glibc 2.3.4.20041021 (nptl) :

amarok and xine-lib fail with :
Inconsistency detected by ld.so: ../sysdeps/generic/dl-tls.c: 72: _dl_next_tls_modid: Assertion `result <= _rtld_local._dl_tls_max_dtv_idx' failed!
Tested with nptlonly and without

Reverting to 2.3.4.20041006 with nptlonly solves all my problems (I have broken my gentoo while downgrading, during the merge, so I used a glibc backup then reemerge glibc-2.3.4.20041006)
Comment 26 Bill Krueger 2004-11-01 09:32:46 UTC
Well, I managed to completely hose my system by trying to swap to a different /lib (yes, I did mv /lib to /lib.bad and yes, I should not have done that since nothing works after that and yes I will never do that again). A good learning experience to say the least. I had an old /lib on my second drive so I recovered by booting off the Gentoo livecd and copying the old /lib off my second HD to my working system /lib (remember, compiling glibc with nptl and without nptlonly caused many commands, including emerge, to not work for me, so I was trying to get a workable /lib on my system). This allowed me to chroot and emerge an old version of glibc (20040808-r1 was the most current that worked for me). Note, during the reboot I noticed an error about libproc-3.2.2.so missing. Not sure where this came from.

When I get to a point where I can afford some down time I'll try 20041021 with use='nptl nptlonly' and then mv /lib/tls away and reboot.
Comment 27 Travis Tilley (RETIRED) gentoo-dev 2004-11-04 18:56:55 UTC
i'll make the preinst delete the /lib/tls directory if present to prevent any further problems. (this should be fine, since there are libraries in /lib to fall back on)

hopefully fixed in glibc 2.3.4.20041102, re-open if it is not
Comment 28 Paul Slinski 2004-11-12 06:38:26 UTC
I have run into this problem with 20041102 on one of my systems.
I've checked two other machines for the existence of /lib/tls after the update to 20041102 and it's still there.
Comment 29 Donnie Berkholz (RETIRED) gentoo-dev 2005-01-04 17:47:41 UTC
I just hit this on the Pegasos trying to upgrade to glibc-2.3.4.20041102 from glibc-2.3.3.20040420-r2. Neither nptl nor nptlonly has ever been in my USE flags, and running /lib/libc.so.6 returns:

GNU C Library 20041102 release version 2.3.4, by Roland McGrath et al.
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 3.4.3 20041125 (Gentoo Linux 3.4.3-r1, ssp-3.4.3-0, pie-8.7.7).
Compiled on a Linux 2.4.22 system on 2005-01-04.
Available extensions:
        GNU libio by Per Bothner
        crypt add-on version 2.1 by Michael Glad and others
        linuxthreads-0.10 by Xavier Leroy
        The C stubs add-on version 2.1.2.
        GNU Libidn by Simon Josefsson
        BIND-8.2.3-T5B
        libthread_db work sponsored by Alpha Processor Inc
        NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
        software FPU emulation by Richard Henderson, Jakub Jelinek and others
Thread-local storage support included.
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

Also, I don't have /lib/tls or anything along those lines.
Comment 30 Donnie Berkholz (RETIRED) gentoo-dev 2005-01-04 17:57:43 UTC
OK, ignore my comment -- spanky suggested it's likely unrelated.
Comment 31 SpanKY gentoo-dev 2005-01-04 18:05:18 UTC
closing bug ;)