Bug 627378 - sys-libs/glibc with /usr -> . symlink - libm.so gets clobbered
Summary: sys-libs/glibc with /usr -> . symlink - libm.so gets clobbered
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Depends on: 628652
Blocks:
Reported: 2017-08-09 08:48 UTC by Duncan
Modified: 2017-12-31 21:26 UTC

See Also:
Package list:
Runtime testing required: ---


Description Duncan 2017-08-09 08:48:44 UTC
Filing this from lynx because little that wasn't already running will start, so I'll be brief:

System's broken due to bad glibc-2.25-r2 install -- cyclical symlinks.

Almost certainly due to usr/root unification with /usr -> . symlink.

Example cyclical symlink:

lrwxrwxrwx 1 root root 24 Aug  9 00:19 /lib64/libm-2.25.so -> ../../lib64/libm-2.25.so

Presumably the former was supposed to be /usr/lib64 instead of simply /lib64, but since /usr -> . the /lib64 is canonical, and because the symlinker blindly created the /usr/lib64 symlink without checking/dereferencing the existing one...

There are of course several similar cyclicals for different libs...   Now I gotta decide whether I try to fix it manually without rebooting or boot to a backup to fix it...

(Hope this formats reasonably, I'm not used to filing bugs from lynx...)
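
A quick way to spot links like the one above is to list every symlink in a directory that no longer resolves. The sketch below is only an illustration: the function name list_unresolvable_symlinks is made up here, and it reports dangling links as well as self-referential ones, since the shell test cannot tell the two apart.

```sh
# Print every symlink directly inside directory $1 that cannot be resolved.
# This covers links that point back at themselves as well as dangling links.
list_unresolvable_symlinks() {
    for entry in "$1"/*; do
        # -L is true when the entry itself is a symlink; -e follows the link
        # and is false when resolution fails (loop or missing target).
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
}
```

For example, `list_unresolvable_symlinks /lib64` would print one path per unresolvable link in that directory.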
Comment 1 Duncan 2017-08-09 09:24:56 UTC
(In reply to Duncan from comment #0)

> There are of course several similar cyclicals for different libs...

Taking that back now that I can use familiar tools again.  I got lucky this time and it was only libm.

(I mixed myself up by listing the /proc/$PID/maps for still-running apps, which of course had lots of deleted glibc-2.24-r-whatever files listed.  Only libm-2.25 was missing from 2.25, however.)
Comment 2 Duncan 2017-08-09 10:02:40 UTC
This is probably it (tabs to two-spaces for posting, the dosym line is wrapping here):

toolchain-glibc.eclass @ toolchain-glibc_do_src_install(), line 1148 is the culprit dosym:

  # Newer versions get fancy with libm linkage to include vectorized support.
  # While we don't really need a ldscript here, portage QA checks get upset.
  if [[ -e ${ED}$(alt_usrlibdir)/libm-${upstream_pv}.a ]] ; then
    dosym ../../$(get_libdir)/libm-${upstream_pv}.so $(alt_usrlibdir)/libm-${upstream_pv}.so
  fi


That needs a check to be sure it's not going to be overwriting its actual target.
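
As an illustration of the kind of check meant here, the following sketch compares the physical location where the link would be created with the resolved location of its target. The helper name link_would_clobber_target is hypothetical; it is not part of the eclass or of portage. It only tells the truth about the tree it is pointed at, so it would have to run against a tree where the /usr symlink actually exists.

```sh
# Hypothetical helper: exit status 0 if creating a symlink at $1 that points
# to $2 would replace the very file the link is supposed to reach.
link_would_clobber_target() {
    link=$1
    target=$2
    # Physical directory that would really hold the link (symlinked parent
    # directories such as "usr -> ." are resolved by pwd -P).
    link_dir=$(cd "$(dirname "$link")" 2>/dev/null && pwd -P) || return 1
    # A relative target is resolved from that physical directory.
    case $target in
        /*) target_path=$target ;;
        *)  target_path=$link_dir/$target ;;
    esac
    target_dir=$(cd "$(dirname "$target_path")" 2>/dev/null && pwd -P) || return 1
    [ "$link_dir/$(basename "$link")" = "$target_dir/$(basename "$target_path")" ]
}
```

A caller could skip the dosym whenever the helper succeeds. The directories involved must already exist, because the helper resolves them with cd.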
Comment 3 Mike Gilbert gentoo-dev 2017-08-09 13:40:41 UTC
Reducing the severity due to the exotic system config here.
Comment 4 Sergei Trofimovich (RETIRED) gentoo-dev 2017-08-09 21:15:33 UTC
AFAIU it's not the new code and was there for quite a while.
Comment 5 Duncan 2017-08-10 22:11:59 UTC
(In reply to Sergei Trofimovich from comment #4)
> AFAIU it's not the new code and was there for quite a while.

Maybe it's triggering differently now, perhaps due to the recent migration away from eblits?  (I hadn't upgraded/reinstalled glibc since then.)

It's worth noting that portage normally "does the right thing" with symlinks created in the "fake install" pre-qmerge, ensuring they don't overwrite their targets regardless of what sort of symlinked tree exists on the live filesystem, and has for quite some time.  Breakage is thus rare and tends to happen when ebuilds/eclasses try to manage their own symlinks directly on the live filesystem instead of in the fake install, where portage catches it and does the right thing at qmerge time.  I've very occasionally seen it before, but not with anything near as critical as glibc.

So maybe recent changes moved what was an old-code symlink in the fake install that portage handled correctly, to the live filesystem, where portage doesn't catch and prevent it by dropping the symlink?

I can test a bit now that I know how to fix it if things go bad again, but glibc isn't the shortest to build and install, so it could be awhile before I can report anything useful.
Comment 6 Mike Gilbert gentoo-dev 2017-08-10 22:52:53 UTC
(In reply to Duncan from comment #5)
> It's worth noting that portage normally "does the right thing" with symlinks
> created in the "fake install" pre-qmerge, ensuring they don't overwrite
> their targets regardless of what sort of symlinked tree exists on the live
> filesystem, and has for quite some time.

That doesn't seem to be true.

% cat foo-0.ebuild
...
src_install() {
        touch x
        insinto a
        doins x
        dosym ../a/x b/x
}


% ls -ld /a /b
drwxr-xr-x 1 root root 2 Aug 10 18:43 /a
lrwxrwxrwx 1 root root 1 Aug 10 18:40 /b -> a

>>> Merging app-misc/foo-0 to /
--- /a/
>>> /a/x
--- /b/
>>> /b/x -> ../a/x
>>> app-misc/foo-0 merged.

% ls -l /a/
lrwxrwxrwx 1 root root 6 Aug 10 18:43 x -> ../a/x

Conclusion: /a/x becomes a circular symlink instead of an empty file.
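
The same outcome can be imitated outside portage in a scratch directory. This sketch uses plain `ln -sf` to stand in for the merge step; that is an assumption about the effect only, not a description of how portage installs files, and reproduce_clobber is a made-up name.

```sh
# Rebuild the layout from the ebuild above inside directory $1:
# a/ is a real directory and b is a symlink to it.
reproduce_clobber() {
    root=$1
    mkdir "$root/a"
    ln -s a "$root/b"
    # The regular file lands first ...
    touch "$root/a/x"
    # ... then the symlink is written through b/, which replaces a/x itself.
    ln -sf ../a/x "$root/b/x"
    # Show what a/x has become: its stored link target.
    readlink "$root/a/x"
}

reproduce_clobber "$(mktemp -d)"   # prints ../a/x
```

After the call, a/x is a symlink whose target resolves back to a/x, so nothing can open it.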
Comment 7 Mike Gilbert gentoo-dev 2017-08-11 01:33:26 UTC
Also, if /a/x and /b/x are both regular files, the end result is pretty much random depending on which order portage merges them.
Comment 8 Sergei Trofimovich (RETIRED) gentoo-dev 2017-08-11 08:18:32 UTC
> So maybe recent changes moved what was an old-code symlink in the fake
> install that portage handled correctly, to the live filesystem, where
> portage doesn't catch and prevent it by dropping the symlink?
> 
> I can test a bit now that I know how to fix it if things go bad again

Please do.

> but glibc isn't the shortest to build and install, so it could be awhile before

Takes 4 minutes here.

You can check if it's a new breakage by setting up a chroot:
- with your '/usr -> .' symlinks set
- with pre-eblits glibc portage tree by rewinding git ::gentoo into past
- install any glibc version from there to check if the breakage is new

I suspect it's not a new breakage, because the glibc ebuild does not check
symlink state on a live system (and it should not). Portage performs the
merge into the live system and writes two files to the same path.
Comment 9 Duncan 2017-08-13 08:00:13 UTC
To my pleasant but slightly frustrated surprise, when I updated tonite and pulled in portage-2.3.7 and glibc-2.25-r3, glibc merged just fine, including libm-2.25.so, and emerge continued on with the next packages. =:^)

I had a couple mcs and spare firefox tabs open so when libm was killed I could easily copy it out of the new binpkg and keep working, but it turned out that wasn't necessary.

So the bug seems fixed, but I honestly don't know if it was something in the new glibc-2.25-r3 revision-bump, or that pending eclass change that was posted to -dev (which I'm not sure whether it went in yet or not), or the portage update that happened earlier in the emerge, or something else, and I'm always a bit suspicious of fixes I can't pin down as that means I don't know what was fixed or what to look for if it breaks again, so despite the relief, I remain a bit frustrated.

Anyway, resolved/fixed for now. =:^) Just hoping it stays that way since I don't know what fixed it.

Thanks for the support, and the fix, even if I'm not sure what it was. =:^)
Comment 10 Sergei Trofimovich (RETIRED) gentoo-dev 2017-08-13 08:42:31 UTC
(In reply to Duncan from comment #9)
> To my pleasant but slightly frustrated surprise, when I updated tonite and
> pulled in portage-2.3.7 and glibc-2.25-r3, glibc merged just fine, including
> libm-2.25.so, and emerge continued on with the next packages. =:^)
> 
> I had a couple mcs and spare firefox tabs open so when libm was killed I
> could easily copy it out of the new binpkg and keep working, but it turned
> out that wasn't necessary.
> 
> So the bug seems fixed, but I honestly don't know if it was something in the
> new glibc-2.25-r3 revision-bump, or that pending eclass change that was
> posted to -dev (which I'm not sure whether it went in yet or not), or the
> portage update that happened earlier in the emerge, or something else, and
> I'm always a bit suspicious of fixes I can't pin down as that means I don't
> know what was fixed or what to look for if it breaks again, so despite the
> relief, I remain a bit frustrated.
> 
> Anyway, resolved/fixed for now. =:^) Just hoping it stays that way since I
> don't know what fixed it.
> 
> Thanks for the support, and the fix, even if I'm not sure what it was. =:^)

I don't think anything was fixed, at least on the glibc side. My suspicion is that
you got lucky: this time the symlink was installed first and was then overwritten
by the shared library itself.

You can check glibc's binpkg and the merge log to see the order in which files
were installed.
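
To make that check concrete, the sketch below reads a list of installed paths in merge order and reports every location that is written twice once /usr/... and /... are treated as the same place. The input format (one absolute path per line) and the function name report_merged_usr_collisions are assumptions; a real merge log or binpkg listing would have to be reduced to that form first.

```sh
# Read paths on stdin, one per line, in the order they were merged.
# Fold /usr/X onto /X (as on a system where /usr -> .) and report each
# location that is written more than once, with the entry that came last.
report_merged_usr_collisions() {
    awk '
        {
            key = $0
            sub(/^\/usr\//, "/", key)       # fold /usr/lib64/... onto /lib64/...
            if ((key in last) && !(key in seen_twice)) {
                seen_twice[key] = 1
                keys[++n] = key             # remember collisions in input order
            }
            last[key] = $0                  # most recent writer of this location
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s: last written from %s\n", keys[i], last[keys[i]]
        }
    '
}
```

The script cannot tell which entry is the symlink; that still has to be looked up in the package contents.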
Comment 11 Duncan 2017-08-13 09:35:55 UTC
(In reply to Sergei Trofimovich from comment #10)
> (In reply to Duncan from comment #9)
> > To my pleasant but slightly frustrated surprise, when I updated tonite and
> > pulled in portage-2.3.7 and glibc-2.25-r3, glibc merged just fine, including
> > libm-2.25.so, and emerge continued on with the next packages. =:^)

> I don't think anything was fixed at least on glibc side. My suspicion is
> that you got lucky and this time symlink was installed first and was 
> overwritten by shared library itself.
> 
> You can check glibc's binpkg and merge log order to check for sequence of
> files installed.

The order does appear to be as you said, the symlink first.  But if it's pure luck, and the code hasn't changed for some time, then I'm pretty lucky, because it has been working for years and multiple glibc upgrades and reinstalls, now, save 2.25-r2, but working again with -r3.

And there's a *LOT* of symlinks to libs between /lib and /usr/lib, many with glibc, but some with other packages as well, for it to be pure luck all this time!

Several years ago, before I did the usr merge, I asked the portage devs whether portage would handle merges correctly when a symlink and the file it points to dereference to the same path. They said portage handled it automatically, and it has indeed seemed to. Perhaps what they meant is that portage merges the symlinks first, so the actual files overwrite them.

In that case, not doing so in this case would have been a portage bug, and given that I had merged a portage update earlier in the same emerge session that saw the glibc merge work correctly again, maybe...

I should ask again on portage-dev, pointing them at this bug.
Comment 12 Michael Hofmann 2017-08-22 17:58:58 UTC
This doesn't seem to be fixed! 

I just upgraded to glibc-2.25-r4 (from 2.23-r4). emerge showed some errors but exited successfully. Unfortunately, my system was borked: /usr/lib/libm-2.25.so was linked to itself, so I couldn't start any programs that depend on libm.

I guess I ran into this problem because on my server /bin, /sbin, /lib, /lib32 and /lib64 are links to /usr/bin, /usr/sbin, /usr/lib, /usr/lib32 and /usr/lib64. I don't think that's an "exotic" configuration; it's what RHEL does...
Comment 13 Mike Gilbert gentoo-dev 2017-08-22 18:27:24 UTC
(In reply to Michael Hofmann from comment #12)
> I don't think that's an "exotic" configuration; it's what RHEL does...

It is not currently a "supported" configuration on Gentoo Linux.
Comment 14 Sergei Trofimovich (RETIRED) gentoo-dev 2017-08-22 21:05:02 UTC
I see the following ways forward from the current situation:

1. fix portage to prevent system breakage when a package overwrites its own files

2. have something that indicates the /lib & /usr/lib merge.
  Say, a profile variable USR_IS_NO_MORE=yes/no (similar to SYMLINK_LIB=yes/no).
  That way more ebuilds will be able to work around the collisions.
Comment 15 Sergei Trofimovich (RETIRED) gentoo-dev 2017-08-22 21:20:16 UTC
> 1. fix portage to prevent system breakage when a package overwrites its own
> files

filed bug #628652
Comment 16 Mike Gilbert gentoo-dev 2017-12-31 17:19:01 UTC
It looks like we have had a PR open for this since March. Does the solution presented there seem ok?
Comment 17 Sergei Trofimovich (RETIRED) gentoo-dev 2017-12-31 18:44:22 UTC
(In reply to Mike Gilbert from comment #16)
> It looks like we have had a PR open for this since March. Does the solution
> presented there seem ok?

Looks good. I'll try to pull it in.
Comment 18 Larry the Git Cow gentoo-dev 2017-12-31 21:26:40 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=0e95698ff9584aa8045275705c0247a77264d044

commit 0e95698ff9584aa8045275705c0247a77264d044
Author:     Matija Skala <mskala@gmx.com>
AuthorDate: 2017-11-09 16:49:37 +0000
Commit:     Sergei Trofimovich <slyfox@gentoo.org>
CommitDate: 2017-12-31 21:26:29 +0000

    sys-libs/glibc: avoid libm-2.26.so symlink clash on merged /usr, bug #627378
    
    The problem in bug #627378 manifests as libm-2.26.so file corruption:
    
    Before the change glibc package contained a 'libm-2.26.so'
    symlink from '/usr/lib64' to '/lib64':
    
        $ equery f sys-libs/glibc | sed 's@usr/lib@lib@g' | sort | uniq -d
        /lib64/libm-2.26.so
    
    When both are the same directory all depends on the merge order:
    - symlink first, then real file. real file overwrites symlink, all is good
    - real file first, then symlink. symlink overwrites the file and points to
      itself. Binaries linked against libm fail to start.
    
    The change is to get rid of symlink (symlink was a workaround to portage's
    QA check) and move 'libm-2.26.a' from '/usr/lib64' to '/usr/lib64/glibc-<pv>'.
    
    Reported-by: Duncan
    Fixed-by: Matija Skala
    Closes: https://bugs.gentoo.org/627378
    Closes: https://github.com/gentoo/gentoo/pull/4268

 sys-libs/glibc/{glibc-2.26-r4.ebuild => glibc-2.26-r5.ebuild} | 11 +++++++----
 sys-libs/glibc/glibc-9999.ebuild                              | 11 +++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)