Bug 687648 - app-portage/unsymlink-lib-13 : emerge --info hangs after unsymlink-lib --migrate, orphan entries reported wrongly by unsymlink-lib --analyze
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Michał Górny
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-08 16:25 UTC by Alexander Bezrukov
Modified: 2019-06-09 18:28 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Attachments
strace -f emerge --info (strace.log,137.40 KB, text/plain)
2019-06-09 12:05 UTC, Alexander Bezrukov

Description Alexander Bezrukov 2019-06-08 16:25:31 UTC
Following the instructions in the news item "2019-06-05  amd64 17.1 profiles are now stable", I started the migration to the 17.1 profiles. I had already completed the procedure on a handful of Gentoo installations, so I may not have been as attentive as at the beginning. In particular, unsymlink-lib --analyze listed some files as orphans that apparently are not (for example libdb*, which belongs to sys-libs/db).

Item 4 of the instructions says to check that important commands (particularly emerge --info) still work after unsymlink-lib --migrate. In my case, emerge --info (and many other commands that invoke Python) hang. Even worse, unsymlink-lib --rollback doesn't fix the situation. I have no idea how to debug the issue.

The output of unsymlink-lib --analyze:
----------------------------------------------------------------------
Analyzing files installed into lib & lib64...

directories that will be moved to /lib/:
	rc
	modprobe.d
	udev
	netifrc
	systemd
	gentoo
	firmware
	(+ 0 files)

directories whose contents will be split between /lib/ and /lib64/:
	dhcpcd

orphan files (not owned by any package) that will be moved to /lib/:
	modules
	cpp

orphan files (not owned by any package) that will be kept in /lib64/:
	libgcc_s.so.1
	libunwind.so.8.0.1
	libunwind.so.8

directories that will be moved to /usr/lib/:
	dracut
	geeqie
	portage
	gcc
	crda
	python-exec
	kernel
	llvm
	telegram-desktop-bin
	clang
	systemd
	jvm
	tmpfiles.d
	polkit-1
	upower
	(+ 1 files)

directories whose contents will be split between /usr/lib/ and /usr/lib64/:
	cmake
	ConsoleKit

orphan files (not owned by any package) that will be moved to /usr/lib/:
	cracklib_dict.pwd
	cracklib_dict.hwm
	cracklib_dict.pwi
	ntfs-3g
	libxslt-plugins

orphan files (not owned by any package) that will be kept in /usr/lib64/:
	libdb_cxx.so
	libdb_stl.a
	libdb_stl.so
	libdb_sql.a
	libXvMCgallium.so
	libdb_cxx.a
	libfreebl3.chk
	libnssdbm3.chk
	libsoftokn3.chk
	libdb.so
	libdb.a
	libdb_sql.so

directories that will be moved to /usr/local/lib/:
	(+ 0 files)

directories whose contents will be split between /usr/local/lib/ and /usr/local/lib64/:

orphan files (not owned by any package) that will be kept in /usr/local/lib64/:
	libi2c.so.0.1.0
	libi2c.so.0
	libi2c.so
	libi2c.a


One or more package files are missing from the system. This should not
cause any problems but you may want to reinstall the packages
that installed them. The missing files are:

	/lib/firmware/intel/dsp_fw_bxtn.bin
	/lib/firmware/intel/dsp_fw_kbl.bin
	/lib/firmware/intel/dsp_fw_release.bin
	/usr/lib64/vdpau/libvdpau_gallium.so


The state has been saved and the migration is ready to proceed.
To initiate it, please run:

	/usr/lib/python-exec/python3.6/unsymlink-lib --migrate

Please do not perform any changes to the system at this point.
If you performed any changes, please rerun the analysis.

----------------------------------------------------------------------


The 17.0 profile is:
# ls -l /etc/portage/make.profile
etc/portage/make.profile -> ../../usr/portage/profiles/default/linux/amd64/17.0/desktop

I would appreciate any ideas on how to debug the issue. I do have a fresh backup and may restore the installation if I fail to fix the issue in a more intelligent way. For obvious reasons, I cannot attach the output of emerge --info right now.
Comment 1 Alexander Bezrukov 2019-06-09 08:36:43 UTC
I have only gcc:8.3.0 installed. I don't know what exactly this changes, but I got emerge --info and many other commands (including env-update) to stop hanging with
> CHOST="x86_64-pc-linux-gnu" eselect gcc 1
Interestingly, gcc-config (instead of eselect gcc) didn't help.

It seems I now have a functional installation with a 17.0 profile.
Comment 2 Alexander Bezrukov 2019-06-09 09:05:13 UTC
What frustrates me is:

orphan dirs/files (not owned by any package) that will be moved to /lib/:
	cpp

> equery b /lib/cpp
 * Searching for /lib/cpp ... 
sys-devel/gcc-8.3.0-r1 (/usr/x86_64-pc-linux-gnu/gcc-bin/8.3.0/x86_64-pc-linux-gnu-cpp)

orphan dirs/files (not owned by any package) that will be kept in /usr/lib64/:
	libXvMCgallium.so
	libdb.a
	libdb.so
	libdb_cxx.a
	libdb_cxx.so
	libdb_sql.a
	libdb_sql.so
	libdb_stl.a
	libdb_stl.so
	libfreebl3.chk
	libnssdbm3.chk
	libsoftokn3.chk

> equery b /usr/lib64/{libXvMCgallium.so,libdb.a,libdb.so,libdb_cxx.a,libdb_cxx.so,libdb_sql.a,libdb_sql.so,libdb_stl.a,libdb_stl.so,libfreebl3.chk,libnssdbm3.chk,libsoftokn3.chk}

media-libs/mesa-18.3.6 (/usr/lib64/libXvMCnouveau.so)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb-5.3.so)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_cxx-5.3.so)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_sql-5.3.so)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_stl-5.3.so)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb-5.3.a)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_sql-5.3.a)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_cxx-5.3.a)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_stl-5.3.a)

orphan dirs/files (not owned by any package) that will be moved to /usr/lib/:
<...>
	libxslt-plugins
	ntfs-3g

> equery b /usr/lib64/libxslt-plugins
 * Searching for /usr/lib64/libxslt-plugins ... 
dev-libs/libxslt-1.1.33-r1 (/usr/lib64/libxslt-plugins)

> equery b /usr/lib64/ntfs-3g
 * Searching for /usr/lib64/ntfs-3g ... 
sys-fs/ntfs3g-2017.3.23-r2 (/usr/lib64/ntfs-3g)

I don't understand why these entries are called orphans that do not belong to any package.
Comment 3 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2019-06-09 10:39:25 UTC
They are orphan.  Note that the path in 'equery b' output is different than the one you're checking.

Please strace(1) invocation of hanging tools.
Comment 4 Alexander Bezrukov 2019-06-09 11:52:36 UTC
(In reply to Michał Górny from comment #3)
> They are orphan.  Note that the path in 'equery b' output is different than
> the one you're checking.

I may be blind, or may not understand which paths are actually checked, but

1. I don't see difference in paths:
<...> that will be kept in /usr/lib64/:
and
equery b /usr/lib64/{libXvMCgallium.so,libdb.a,libdb.so,libdb_cxx.a,libdb_cxx.so,libdb_sql.a,libdb_sql.so,libdb_stl.a,libdb_stl.so,libfreebl3.chk,libnssdbm3.chk,libsoftokn3.chk}

2. If (in that particular case) the path is not /usr/lib64, then what is the other path?

# updatedb
# locate libXvMCgallium.so libdb.a libdb.so libdb_cxx.a libdb_cxx.so libdb_sql.a libdb_sql.so libdb_stl.a libdb_stl.so libfreebl3.chk libnssdbm3.chk libsoftokn3.chk
/usr/lib64/libXvMCgallium.so
/usr/lib64/libdb.a
/usr/lib64/libdb.so
/usr/lib64/libdb_cxx.a
/usr/lib64/libdb_cxx.so
/usr/lib64/libdb_sql.a
/usr/lib64/libdb_sql.so
/usr/lib64/libdb_stl.a
/usr/lib64/libdb_stl.so
/usr/lib64/libfreebl3.chk
/usr/lib64/libnssdbm3.chk
/usr/lib64/libsoftokn3.chk

There are no files with these names on the system, except in /usr/lib64


> Please strace(1) invocation of hanging tools.

I will create an attachment. What I see is getrandom() seems to get blocked. I have nothing special about randomness sources on this particular laptop except
$ grep RANDOM_TRUST_CPU /usr/src/linux/.config
# CONFIG_RANDOM_TRUST_CPU is not set
Previously this has never been a problem (/dev/urandom was a source of cryptographically strong random numbers, which isn't blocking).

At the moment of hang:
$ cat /proc/sys/kernel/random/entropy_avail
82

I am wondering what has changed so that Python tools started to block in getrandom() after the migration.
Comment 5 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2019-06-09 11:57:26 UTC
(In reply to Alexander Bezrukov from comment #4)
> (In reply to Michał Górny from comment #3)
> > They are orphan.  Note that the path in 'equery b' output is different than
> > the one you're checking.
> 
> I may be blind, or may not understand which paths are actually checked, but
> 
> 1. I don't see difference in paths:
> <...> that will be kept in /usr/lib64/:
> and
> equery b
> /usr/lib64/{libXvMCgallium.so,libdb.a,libdb.so,libdb_cxx.a,libdb_cxx.so,
> libdb_sql.a,libdb_sql.so,libdb_stl.a,libdb_stl.so,libfreebl3.chk,libnssdbm3.
> chk,libsoftokn3.chk}

Look at output, not input:

|  * Searching for /lib/cpp ... 
| sys-devel/gcc-8.3.0-r1 (/usr/x86_64-pc-linux-gnu/gcc-bin/8.3.0/x86_64-pc-linux-gnu-cpp)

It reports ownership of symlink *target*, not the symlink itself.

> 2. If (in that particular case) the path is not /usr/lib64, then what is the
> other path?

Since /usr/lib is a symlink to /usr/lib64, the file might have been intended to be installed in either of those directories.  Since orphan files were not monitored properly, we need to guess what the intent was.
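
The distinction between a symlink and its target can be checked from the shell. The following is a minimal sketch, not part of unsymlink-lib or gentoolkit: `describe_path` is a hypothetical helper name, and it assumes GNU coreutils `readlink -f`. It prints whether a path is itself a symlink and what it resolves to; per the explanation above, the resolved path is the one whose owner `equery b` reports.

```shell
#!/bin/sh
# describe_path (hypothetical helper): report whether PATH is a symlink
# and, if so, the fully resolved target that an ownership query would see.
describe_path() {
    if [ -L "$1" ]; then
        printf '%s is a symlink -> %s\n' "$1" "$(readlink -f "$1")"
    elif [ -e "$1" ]; then
        printf '%s is not a symlink\n' "$1"
    else
        printf '%s does not exist\n' "$1"
    fi
}

# Example: check two of the entries reported as orphans in this bug.
for p in /lib/cpp /usr/lib64/libdb.so; do
    describe_path "$p"
done
```

If the first form is printed, the name listed by unsymlink-lib --analyze is only a link, and the package database knows the file under the resolved name.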

> > Please strace(1) invocation of hanging tools.
> 
> I will create an attachment. What I see is getrandom() seems to get blocked.
> I have nothing special about randomness sources on this particular laptop
> except
> $ grep RANDOM_TRUST_CPU /usr/src/linux/.config
> # CONFIG_RANDOM_TRUST_CPU is not set
> Previously this has never been a problem (/dev/urandom was a source of
> cryptographically strong random numbers, which isn't blocking).
> 
> At the moment of hang:
> $ cat /proc/sys/kernel/random/entropy_avail
> 82
> 
> I am wondering what has changed so that Python tools started to block
> in getrandom() after the migration.

I suppose it hangs on something that's not monitored by strace(1).  gdb(1) is your next choice.
Comment 6 Alexander Bezrukov 2019-06-09 12:05:16 UTC
Created attachment 579318 [details]
strace -f emerge --info

This is the strace log for emerge --info. I see the same hang in other Python applications (getrandom() has not returned). The flags are 0, so it should not block but it does.
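
For context, the getrandom(2) man page documents that with flags set to 0 the call reads from the urandom source and blocks until that source has been initialized; only GRND_NONBLOCK makes it return immediately (failing with EAGAIN). A blocked call with flags 0 is therefore consistent with an uninitialized pool, though that remains a hypothesis for this system. The sketch below probes this from the shell. `crng_state` is a hypothetical helper name, and it assumes GNU `timeout` and a python3 (3.6 or later, which provides os.getrandom and os.GRND_NONBLOCK) that itself starts normally.

```shell
#!/bin/sh
# crng_state (hypothetical helper): ask the kernel for one random byte
# without blocking. Prints "initialized" if getrandom() succeeds and
# "not-ready-or-unknown" otherwise (pool not initialized, python3 missing,
# or the probe itself timed out after 5 seconds).
crng_state() {
    if timeout 5 python3 -c 'import os; os.getrandom(1, os.GRND_NONBLOCK)' \
        2>/dev/null; then
        echo "initialized"
    else
        echo "not-ready-or-unknown"
    fi
}

crng_state
# The kernel's own entropy estimate, the same file read in comment 4:
cat /proc/sys/kernel/random/entropy_avail 2>/dev/null || true
```

The probe cannot distinguish an uninitialized pool from a missing interpreter, so a "not-ready-or-unknown" result is only a hint to look further, not a diagnosis.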
Comment 7 Alexander Bezrukov 2019-06-09 13:01:31 UTC
(In reply to Michał Górny from comment #5)
> (In reply to Alexander Bezrukov from comment #4)
> > (In reply to Michał Górny from comment #3)

> It reports ownership of symlink *target*, not the symlink itself.

Ah, I see now. Thank you for the explanation. Since qfile broke
several years ago (the -f option stopped being recognized, although
it was documented at the time and is still mentioned in the qfile(1)
man page), I stopped tracking orphan files on the system.

> I suppose it hangs on something that's not monitored by strace(1).  gdb(1)
> is your next choice.

It is monitored. getrandom() in its non-blocking variant (default) has not returned. I am wondering what may have changed so that getrandom() with zero flags started to block after unsymlink-lib --migrate. By the way, the hangs disappeared after about an hour of idle system usage (the amount of entropy is still low). I was reasonably patient (I waited 10 minutes) while the hang was manifesting. I will try to debug the issue when I have more spare time (I am afraid not within the next couple of weeks). I am closing the bug for now. The problem is readily reproducible on at least one system, but it is probably related to glibc. I will add a comment or open a new bug if I find something worth mentioning about the issue.
Comment 8 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2019-06-09 15:51:32 UTC
Oh, sorry, I misunderstood you. That's weird indeed. Did you have any custom library override maybe?
Comment 9 Alexander Bezrukov 2019-06-09 16:34:12 UTC
I investigated a little more.

First of all, a reproducer as trivial as the one below demonstrates the hang:

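#include <sys/random.h>
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>

/* Request buflen bytes from getrandom(2) with flags == 0 and report the result. */
int main(int argc, char *argv[]) {
  size_t buflen = 256;
  if (argc == 2) {
    /* strtoul instead of atoi so a large size is not truncated to int */
    buflen = strtoul(argv[1], NULL, 10);
    printf("Requesting %zu bytes of random data\n", buflen);
  }
  if (buflen == 0) return EXIT_FAILURE;
  void *const buf = malloc(buflen);
  if (!buf) {
    fprintf(stderr, "Could not allocate %zu bytes\n", buflen);
    return EXIT_FAILURE;
  }
  /* flags == 0: neither GRND_NONBLOCK nor GRND_RANDOM is set */
  const ssize_t result = getrandom(buf, buflen, 0);
  printf("Resulted in %zd bytes of random data\n", result);
  free(buf);
  /* exit status 0 only if the full request was satisfied */
  return (result < 0 || (size_t)result != buflen) ? EXIT_FAILURE : EXIT_SUCCESS;
}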

$ ./reproduce 1
Requesting 1 bytes of random data
^C
(hung)
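
When a hang has to be checked repeatedly, for example across reboots, wrapping the command in coreutils `timeout` avoids waiting at the terminal. A minimal sketch, assuming GNU `timeout` (which exits with status 124 when the time limit is hit); `hangs` is a hypothetical helper name and `./reproduce` is the binary built from the program above.

```shell
#!/bin/sh
# hangs SECONDS COMMAND...: succeed (exit 0) if COMMAND is still running
# after SECONDS and had to be killed by timeout, fail (exit 1) otherwise.
hangs() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    [ $? -eq 124 ]
}

# Example with the reproducer from this comment (skipped if it is not built).
if [ -x ./reproduce ]; then
    if hangs 10 ./reproduce 1; then
        echo "reproducer did not finish within 10 seconds"
    else
        echo "reproducer finished within 10 seconds"
    fi
fi
```

The 10-second limit is arbitrary; comment 7 mentions waiting 10 minutes by hand, so a longer limit may be needed to separate a slow start from a real hang.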

$ /etc/init.d/urandom status
 * status: started

$ stat /var/lib/misc/random-seed
  File: /var/lib/misc/random-seed
  Size: 512       	Blocks: 8          IO Block: 4096   regular file
Device: 2ah/42d	Inode: 596713      Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-06-09 19:06:52.693999951 +0300
Modify: 2019-06-09 19:06:52.693999951 +0300
Change: 2019-06-09 19:06:52.693999951 +0300
 Birth: -

But if I do
# /etc/init.d/urandom restart
the hang goes away.

I think the real cause is that, because of some incidental changes made to the system by unsymlink-lib, /etc/init.d/urandom now starts too early, before /dev/urandom has been created (asynchronously). On the other hand, I cannot confirm this by experiment: adding even a little debug output to /etc/init.d/urandom, even a single echo at the beginning of the init script, makes the hang disappear. I am not sure what creates /dev/urandom, the kernel on its own or (e)udev. I worked around the issue for myself by sleeping 1 second in /etc/init.d/urandom on start (not the most elegant solution, of course).

Please advise whether I should file a bug on this finding, or whether keeping track of such things is the user's duty rather than openrc's.
Comment 10 Alexander Bezrukov 2019-06-09 16:36:19 UTC
(In reply to Michał Górny from comment #8)
> Oh, sorry, I misunderstood you. That's weird indeed. Did you have any custom
> library override maybe?

No, this is a very standard installation with no exotic software. It has a relatively fast SSD, and the whole system boots in about a couple of seconds. This was probably the culprit: /etc/init.d/urandom was started too early.
Comment 11 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2019-06-09 17:26:07 UTC
Is /dev/urandom a custom script or sth installed by a Gentoo package?  Maybe it needs 'after udev-settle' or sth like that (presuming you're using udev).
Comment 12 Alexander Bezrukov 2019-06-09 17:51:02 UTC
(In reply to Michał Górny from comment #11)
> Is /dev/urandom a custom script or sth installed by a Gentoo package?  Maybe
> it needs 'after udev-settle' or sth like that (presuming you're using udev).

/etc/init.d/urandom is the script installed by openrc, nothing special. Now I added a single line at the very beginning of the start() function

[ -c /dev/urandom ] || sleep 1

and this helped for several reboots. But after half a dozen successful reboots the hang manifested again, and manual re-initialization of /dev/urandom no longer helps. With the original /etc/init.d/urandom (that is, without the sleep line), the hang was very reproducible: I checked 5 times in a row. It looks like the issue is intermittent. I will try to investigate more.
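
The one-line workaround above waits a fixed second whether or not the device ever appears. A bounded polling loop is a slightly more robust form of the same idea. This is only a sketch: `wait_for_chardev` is a hypothetical helper, not part of openrc, and since the hang later recurred even with the sleep in place, the missing-device explanation is itself unconfirmed.

```shell
#!/bin/sh
# wait_for_chardev PATH TIMEOUT: poll once per second until PATH exists as
# a character device. Returns 0 as soon as it does, 1 after TIMEOUT seconds.
wait_for_chardev() {
    path=$1
    remaining=$2
    while [ ! -c "$path" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Example: give /dev/urandom up to 5 seconds to appear.
if wait_for_chardev /dev/urandom 5; then
    echo "/dev/urandom is present"
else
    echo "/dev/urandom did not appear within 5 seconds" >&2
fi
```

Whole-second sleeps are used because fractional arguments to sleep are not guaranteed by POSIX.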
Comment 13 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2019-06-09 18:28:40 UTC
I'm CC-ing openrc@ in case they have any idea what could be wrong with the script.