Summary: | app-portage/unsymlink-lib-13 : emerge --info hangs after unsymlink-lib --migrate, orphan entries reported wrongly by unsymlink-lib --analyze | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Alexander Bezrukov <phmagic> |
Component: | Current packages | Assignee: | Michał Górny <mgorny> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | openrc |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | strace -f emerge --info |
Description
Alexander Bezrukov
2019-06-08 16:25:31 UTC
I have only gcc:8.3.0 installed. I don't know what exactly changed, but I got emerge --info and many other commands (including env-update) to stop hanging with
> CHOST="x86_64-pc-linux-gnu" eselect gcc 1
Interestingly, gcc-config (instead of eselect gcc) didn't help.
It seems I have now a functional installation with a 17.0 profile.
What frustrates me is:

orphan dirs/files (not owned by any package) that will be moved to /lib/:
  cpp

> equery b /lib/cpp
 * Searching for /lib/cpp ...
sys-devel/gcc-8.3.0-r1 (/usr/x86_64-pc-linux-gnu/gcc-bin/8.3.0/x86_64-pc-linux-gnu-cpp)

orphan dirs/files (not owned by any package) that will be kept in /usr/lib64/:
  libXvMCgallium.so
  libdb.a
  libdb.so
  libdb_cxx.a
  libdb_cxx.so
  libdb_sql.a
  libdb_sql.so
  libdb_stl.a
  libdb_stl.so
  libfreebl3.chk
  libnssdbm3.chk
  libsoftokn3.chk

> equery b /usr/lib64/{libXvMCgallium.so,libdb.a,libdb.so,libdb_cxx.a,libdb_cxx.so,libdb_sql.a,libdb_sql.so,libdb_stl.a,libdb_stl.so,libfreebl3.chk,libnssdbm3.chk,libsoftokn3.chk}
media-libs/mesa-18.3.6 (/usr/lib64/libXvMCnouveau.so)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb-5.3.so)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_cxx-5.3.so)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_sql-5.3.so)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_stl-5.3.so)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb-5.3.a)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_sql-5.3.a)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_cxx-5.3.a)
sys-libs/db-5.3.28-r2 (/usr/lib64/libdb_stl-5.3.a)

orphan dirs/files (not owned by any package) that will be moved to /usr/lib/:
  <...>
  libxslt-plugins
  ntfs-3g

> equery b /usr/lib64/libxslt-plugins
 * Searching for /usr/lib64/libxslt-plugins ...
dev-libs/libxslt-1.1.33-r1 (/usr/lib64/libxslt-plugins)

> equery b /usr/lib64/ntfs-3g
 * Searching for /usr/lib64/ntfs-3g ...
sys-fs/ntfs3g-2017.3.23-r2 (/usr/lib64/ntfs-3g)

I don't understand why these entries are called orphans and reported as not belonging to any package.

They are orphan. Note that the path in 'equery b' output is different than the one you're checking.

Please strace(1) invocation of hanging tools.

(In reply to Michał Górny from comment #3)
> They are orphan. Note that the path in 'equery b' output is different than
> the one you're checking.

I may be blind or not understanding which paths are actually checked, but

1. I don't see a difference in paths:
<...> that will be kept in /usr/lib64/:
and
equery b /usr/lib64/{libXvMCgallium.so,libdb.a,libdb.so,libdb_cxx.a,libdb_cxx.so,libdb_sql.a,libdb_sql.so,libdb_stl.a,libdb_stl.so,libfreebl3.chk,libnssdbm3.chk,libsoftokn3.chk}

2. If (in that particular case) the path is not /usr/lib64, then what is the other path?

# updatedb
# locate libXvMCgallium.so libdb.a libdb.so libdb_cxx.a libdb_cxx.so libdb_sql.a libdb_sql.so libdb_stl.a libdb_stl.so libfreebl3.chk libnssdbm3.chk libsoftokn3.chk
/usr/lib64/libXvMCgallium.so
/usr/lib64/libdb.a
/usr/lib64/libdb.so
/usr/lib64/libdb_cxx.a
/usr/lib64/libdb_cxx.so
/usr/lib64/libdb_sql.a
/usr/lib64/libdb_sql.so
/usr/lib64/libdb_stl.a
/usr/lib64/libdb_stl.so
/usr/lib64/libfreebl3.chk
/usr/lib64/libnssdbm3.chk
/usr/lib64/libsoftokn3.chk

There are no files with these names on the system, except in /usr/lib64.

> Please strace(1) invocation of hanging tools.

I will create an attachment. What I see is that getrandom() seems to get blocked. I have nothing special about randomness sources on this particular laptop except

$ grep RANDOM_TRUST_CPU /usr/src/linux/.config
# CONFIG_RANDOM_TRUST_CPU is not set

Previously this has never been a problem (/dev/urandom was a source of cryptographically strong random numbers, which isn't blocking).

At the moment of the hang:
$ cat /proc/sys/kernel/random/entropy_avail
82

I am wondering what has changed so that Python tools started to block in getrandom() after the migration.

(In reply to Alexander Bezrukov from comment #4)
> (In reply to Michał Górny from comment #3)
> > They are orphan. Note that the path in 'equery b' output is different than
> > the one you're checking.
>
> I may be blind or not understanding which paths are actually checked, but
>
> 1. I don't see a difference in paths:
> <...> that will be kept in /usr/lib64/:
> and
> equery b
> /usr/lib64/{libXvMCgallium.so,libdb.a,libdb.so,libdb_cxx.a,libdb_cxx.so,
> libdb_sql.a,libdb_sql.so,libdb_stl.a,libdb_stl.so,libfreebl3.chk,libnssdbm3.
> chk,libsoftokn3.chk}

Look at output, not input:

|  * Searching for /lib/cpp ...
| sys-devel/gcc-8.3.0-r1 (/usr/x86_64-pc-linux-gnu/gcc-bin/8.3.0/x86_64-pc-linux-gnu-cpp)

It reports ownership of symlink *target*, not the symlink itself.

> 2. If (in that particular case) the path is not /usr/lib64, then what is the
> other path?

Since /usr/lib is a symlink to /usr/lib64, the file might have been intended to be installed in either of those directories. Since orphan files were not monitored properly, we need to guess what the intent was.

> > Please strace(1) invocation of hanging tools.
>
> I will create an attachment. What I see is that getrandom() seems to get blocked.
> I have nothing special about randomness sources on this particular laptop
> except
> $ grep RANDOM_TRUST_CPU /usr/src/linux/.config
> # CONFIG_RANDOM_TRUST_CPU is not set
> Previously this has never been a problem (/dev/urandom was a source of
> cryptographically strong random numbers, which isn't blocking).
>
> At the moment of the hang:
> $ cat /proc/sys/kernel/random/entropy_avail
> 82
>
> I am wondering what has changed so that Python tools started to block
> in getrandom() after the migration.

I suppose it hangs on something that's not monitored by strace(1). gdb(1) is your next choice.

Created attachment 579318 [details]
strace -f emerge --info
This is the strace log for emerge --info. I see the same hang in other Python applications (getrandom() has not returned). The flags are 0, so it should not block but it does.
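For reference, getrandom(2) documents that a call with flags set to 0 blocks until the kernel's urandom pool has been initialized, and that passing GRND_NONBLOCK makes the call fail with EAGAIN instead of blocking. The following is a minimal sketch of a probe that tells the two states apart without hanging, assuming Linux with glibc 2.25 or later (for <sys/random.h>). The function name crng_ready is made up for illustration and is not part of any library.

```c
#include <errno.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper, not part of glibc or the kernel API.
 * Returns  1 if the kernel can hand out random bytes right now,
 *          0 if the pool is not initialized yet (a flags=0 call would block),
 *         -1 on any other failure (errno as set by getrandom()). */
int crng_ready(void)
{
    unsigned char byte;
    /* GRND_NONBLOCK: fail with EAGAIN instead of waiting for initialization. */
    ssize_t n = getrandom(&byte, sizeof byte, GRND_NONBLOCK);

    if (n == (ssize_t)sizeof byte)
        return 1;
    if (n == -1 && errno == EAGAIN)
        return 0;
    return -1;
}
```

Calling crng_ready() right before a getrandom(buf, buflen, 0) call and printing the result would show whether a hang coincides with an uninitialized pool.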
(In reply to Michał Górny from comment #5)
> (In reply to Alexander Bezrukov from comment #4)
> > (In reply to Michał Górny from comment #3)
> It reports ownership of symlink *target*, not the symlink itself.

Ah, I see now. Thank you for the explanation. Since qfile broke several years ago (the -f option stopped being recognized, although it was documented at the time and is still mentioned in the qfile(1) man page), I stopped tracking orphan files on the system.

> I suppose it hangs on something that's not monitored by strace(1). gdb(1)
> is your next choice.

It is monitored. getrandom() in its non-blocking variant (default) has not returned. I am wondering what may have changed so that getrandom() with zero flags started to block after unsymlink-lib --migrate.

By the way, the hangs disappeared after about an hour of the system sitting idle (the amount of entropy is still low). I was reasonably patient (waited 10 minutes) while the hang was manifesting.

I will try to debug the issue when I have more spare time (not within the next couple of weeks, I am afraid).

I am closing the bug for now. The problem is well reproducible on at least one system, but it is probably related to glibc. I will append to this bug or open a new one if I find something worth mentioning about the issue.

Oh, sorry, I misunderstood you. That's weird indeed. Did you have any custom library override maybe?

I investigated a little more.
First of all, a reproducer as trivial as the one below demonstrates the hang:

#include <sys/random.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    size_t buflen = 256;    /* default request size */

    if (argc == 2) {
        buflen = atoi(argv[1]);
        printf("Requesting %zu bytes of random data\n", buflen);
    }
    if (buflen == 0)
        return EXIT_FAILURE;

    void *const buf = malloc(buflen);
    if (!buf) {
        fprintf(stderr, "Could not allocate %zu bytes\n", buflen);
        return EXIT_FAILURE;
    }

    /* flags = 0: this is the call that never returns */
    const ssize_t result = getrandom(buf, buflen, 0);
    printf("Resulted in %zd bytes of random data\n", result);
    return result != (ssize_t)buflen;
}

$ ./reproduce 1
Requesting 1 bytes of random data
^C (hung)

$ /etc/init.d/urandom status
 * status: started

$ stat /var/lib/misc/random-seed
  File: /var/lib/misc/random-seed
  Size: 512  Blocks: 8  IO Block: 4096  regular file
Device: 2ah/42d  Inode: 596713  Links: 1
Access: (0600/-rw-------)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2019-06-09 19:06:52.693999951 +0300
Modify: 2019-06-09 19:06:52.693999951 +0300
Change: 2019-06-09 19:06:52.693999951 +0300
 Birth: -

But if I do

# /etc/init.d/urandom restart

the hang goes away.

I think the real cause of the issue is that, because of some changes made to the system by unsymlink-lib, /etc/init.d/urandom now starts too early, before /dev/urandom has been created (asynchronously). On the other hand, I cannot support this view by experiment: adding even a little debug output to /etc/init.d/urandom makes the hang disappear, even a single echo at the beginning of the init script. I am not sure who creates /dev/urandom, the kernel on its own or (e)udev.

I worked around the issue for myself by sleeping 1 second in /etc/init.d/urandom on start (not the most elegant solution, of course).

Please advise whether I should file a bug on this finding, or whether it is the user's duty to keep track of such things, not openrc's.

(In reply to Michał Górny from comment #8)
> Oh, sorry, I misunderstood you. That's weird indeed. Did you have any custom
> library override maybe?
No, this is a very standard installation with no exotic software. It has a relatively fast SSD; the whole system boots in about a couple of seconds. This was probably the culprit: /etc/init.d/urandom was started too early.

Is /dev/urandom a custom script or something installed by a Gentoo package? Maybe it needs 'after udev-settle' or something like that (presuming you're using udev).

(In reply to Michał Górny from comment #11)
> Is /dev/urandom a custom script or something installed by a Gentoo package? Maybe
> it needs 'after udev-settle' or something like that (presuming you're using udev).

/etc/init.d/urandom is the script installed by openrc, nothing special.

Now I have added a single line at the very beginning of the start() function:

[ -c /dev/urandom ] || sleep 1

and this helped for several reboots. But after half a dozen successful reboots the hang manifested again, and manual re-initialization of /dev/urandom doesn't help anymore. With the original /etc/init.d/urandom (that is, without the sleep line), however, the hang was very reproducible; I checked 5 times in a row.

It looks like the issue is intermittent. I will try to investigate more.

I'm CC-ing openrc@ in case they have any idea what could be wrong with the script.
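The `[ -c /dev/urandom ]` check from the init script can also be repeated from C, for example inside the reproducer, to test the hypothesis that the device node is missing at the moment of a hang. What follows is only a sketch; is_char_device is a made-up name, not an existing function. Note that getrandom() is a system call and does not open /dev/urandom, so this check says something about the device node only, not about whether the kernel's pool has been initialized.

```c
#include <sys/stat.h>

/* Hypothetical helper mirroring the shell test `[ -c PATH ]`.
 * Returns  1 if PATH exists and is a character device,
 *          0 if PATH exists but is something else,
 *         -1 if stat() fails (for example, PATH does not exist). */
int is_char_device(const char *path)
{
    struct stat st;

    /* stat() follows symlinks, as the shell's -c test does. */
    if (stat(path, &st) != 0)
        return -1;
    return S_ISCHR(st.st_mode) ? 1 : 0;
}
```

Printing is_char_device("/dev/urandom") next to the getrandom() result would show whether the two symptoms actually occur together.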