I've lately been experimenting with doing some development in Podman containers, but I keep hitting a hang related to newconfd/doins. The following is in process of building dev-lang/python-3.10.1 but I have hit this hang with other packages. Running portage processes: 6e4871941b40 / # ps aux | grep portage root 41010 0.0 0.0 2592 1760 pts/0 S+ 15:48 0:00 [dev-lang/python-3.10.1] sandbox /usr/lib/portage/python3.9/ebuild.sh install root 41011 0.0 0.0 10952 8216 pts/0 S+ 15:48 0:00 /bin/bash /usr/lib/portage/python3.9/ebuild.sh install root 41027 0.0 0.0 11216 7600 pts/0 S+ 15:48 0:00 /bin/bash /usr/lib/portage/python3.9/ebuild.sh install root 46673 0.0 0.0 7524 4892 pts/0 S+ 15:49 0:00 /bin/bash /usr/lib/portage/python3.9/ebuild-helpers/newconfd /var/tmp/portage/dev-lang/python-3.10.1/files/pydoc.conf pydoc-3.10 root 46676 0.0 0.0 7524 4620 pts/0 S+ 15:49 0:00 /bin/bash /usr/lib/portage/python3.9/ebuild-helpers/doins /var/tmp/portage/dev-lang/python-3.10.1/temp/.newins_tmp_YVWNlp/pydoc-3.10 root 46677 92.1 0.0 24788 19800 pts/0 R+ 15:49 28:29 /usr/bin/python3.9 /usr/lib/portage/python3.9/doins.py --preserve_symlinks --helpers_can_die --insoptions=-m0644 --diroptions=-m0755 --distdir=/var/cache/distfiles --enable_copy_xattr --xattr_exclude=btrfs.* security.evm security.ima .security.selinux system.nfs4_acl user.apache_handler .user.Beagle.* user.dublincore.* user.mime_encoding user.xdg.* --helper=doins --dest=/var/tmp/portage/dev-lang/python-3.10.1/image/etc/conf.d/ -- /var/tmp/portage/dev-lang/python-3.10.1/temp/.newins_tmp_YVWNlp/pydoc-3.10 The doins.py process seems to be infinitely looping (calling strace with the PID from the host's namespace as the container doesn't have strace): ~ # strace -p 3891753 2>&1 | head -30 strace: Process 3891753 attached lseek(3, 0, SEEK_HOLE) = 169 copy_file_range(3, [0], 4, [0], 169, 0) = 0 lseek(3, 0, SEEK_DATA) = 0 lseek(3, 0, SEEK_HOLE) = 169 copy_file_range(3, [0], 4, [0], 169, 0) = 0 lseek(3, 0, SEEK_DATA) = 0 lseek(3, 0, SEEK_HOLE) = 169 copy_file_range(3, [0], 4, [0], 169, 0) = 0 lseek(3, 0, SEEK_DATA) = 0 lseek(3, 0, SEEK_HOLE) = 169 copy_file_range(3, [0], 4, [0], 169, 0) = 0 lseek(3, 0, SEEK_DATA) = 0 lseek(3, 0, SEEK_HOLE) = 169 copy_file_range(3, [0], 4, [0], 169, 0) = 0 lseek(3, 0, SEEK_DATA) = 0 lseek(3, 0, SEEK_HOLE) = 169 copy_file_range(3, [0], 4, [0], 169, 0) = 0 lseek(3, 0, SEEK_DATA) = 0 lseek(3, 0, SEEK_HOLE) = 169 copy_file_range(3, [0], 4, [0], 169, 0) = 0 lseek(3, 0, SEEK_DATA) = 0 lseek(3, 0, SEEK_HOLE) = 169 copy_file_range(3, [0], 4, [0], 169, 0) = 0 lseek(3, 0, SEEK_DATA) = 0 lseek(3, 0, SEEK_HOLE) = 169 copy_file_range(3, [0], 4, [0], 169, 0) = 0 lseek(3, 0, SEEK_DATA) = 0 lseek(3, 0, SEEK_HOLE) = 169 copy_file_range(3, [0], 4, [0], 169, 0) = 0 Checking the process's file descriptors: 6e4871941b40 / # ls -l /proc/46677/fd total 0 lr-x------ 1 root root 64 Dec 10 15:56 0 -> /dev/null lrwx------ 1 root root 64 Dec 10 16:37 1 -> /dev/pts/2 lrwx------ 1 root root 64 Dec 10 16:37 2 -> /dev/pts/2 lr-x------ 1 root root 64 Dec 10 16:37 3 -> /var/tmp/portage/dev-lang/python-3.10.1/temp/.newins_tmp_YVWNlp/pydoc-3.10 l-wx------ 1 root root 64 Dec 10 16:37 4 -> /var/tmp/portage/dev-lang/python-3.10.1/image/etc/conf.d/pydoc-3.10 6e4871941b40 / # cat /var/tmp/portage/dev-lang/python-3.10.1/temp/.newins_tmp_YVWNlp/pydoc-3.10 # /etc/init.d/pydoc.conf # This file contains the configuration for pydoc's internal webserver. # Default port for Python's pydoc server. @PYDOC_PORT_VARIABLE@="7464" 6e4871941b40 / # cat /var/tmp/portage/dev-lang/python-3.10.1/image/etc/conf.d/pydoc-3.10 [empty] 6e4871941b40 / # du -b /var/tmp/portage/dev-lang/python-3.10.1/temp/.newins_tmp_YVWNlp/pydoc-3.10 169 /var/tmp/portage/dev-lang/python-3.10.1/temp/.newins_tmp_YVWNlp/pydoc-3.10 I don't think there's anything special about the filesystem; it's just ext4 under podman (no SELinux or anything, either). Attaching to the doins.py process with gdb and running py-bt isn't helpful; the plugin can't make sense of the backtrace, but: (gdb) bt #0 0x00007f6072296667 in lseek64 () from /lib64/libc.so.6 #1 0x00007f607121b562 in ?? () from /usr/lib/python3.9/site-packages/portage/util/file_copy/reflink_linux.cpython-39-x86_64-linux-gnu.so #2 0x00007f60724763c4 in ?? () from /usr/lib64/libpython3.9.so.1.0 #3 0x00007f607243b1cd in _PyObject_MakeTpCall () from /usr/lib64/libpython3.9.so.1.0 #4 0x00007f60723fe859 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0 #5 0x00007f60723f79db in ?? () from /usr/lib64/libpython3.9.so.1.0 #6 0x00007f60723fdd05 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0 #7 0x00007f60723f79db in ?? () from /usr/lib64/libpython3.9.so.1.0 #8 0x00007f60723fed9b in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0 #9 0x00007f60723f79db in ?? () from /usr/lib64/libpython3.9.so.1.0 #10 0x00007f60723fed9b in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0 #11 0x00007f60723f79db in ?? () from /usr/lib64/libpython3.9.so.1.0 #12 0x00007f60723fdd05 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0 #13 0x00007f60723f79db in ?? () from /usr/lib64/libpython3.9.so.1.0 #14 0x00007f60723fdd05 in _PyEval_EvalFrameDefault () from /usr/lib64/libpython3.9.so.1.0 #15 0x00007f607250064d in ?? () from /usr/lib64/libpython3.9.so.1.0 #16 0x00007f607250097e in _PyEval_EvalCodeWithName () from /usr/lib64/libpython3.9.so.1.0 #17 0x00007f60725009cb in PyEval_EvalCodeEx () from /usr/lib64/libpython3.9.so.1.0 #18 0x00007f60725009fb in PyEval_EvalCode () from /usr/lib64/libpython3.9.so.1.0 #19 0x00007f607253c993 in ?? () from /usr/lib64/libpython3.9.so.1.0 #20 0x00007f607253cba7 in ?? () from /usr/lib64/libpython3.9.so.1.0 #21 0x00007f607253f6cc in ?? () from /usr/lib64/libpython3.9.so.1.0 #22 0x00007f607253f86b in PyRun_SimpleFileExFlags () from /usr/lib64/libpython3.9.so.1.0 #23 0x00007f607255af17 in Py_RunMain () from /usr/lib64/libpython3.9.so.1.0 #24 0x00007f607255b1ca in Py_BytesMain () from /usr/lib64/libpython3.9.so.1.0 #25 0x00007f60721c61ca in ?? () from /lib64/libc.so.6 #26 0x00007f60721c6278 in __libc_start_main () from /lib64/libc.so.6 #27 0x000055f0dc830081 in _start () I'm at a loss for how to debug this further, but this seems to be easy to reproduce so I'm happy to add anything that would help debugging.
Hacking in a `-m trace -t` to the Python invocation in /usr/lib/portage/python3.9/ebuild-helpers/doins and running a triggering build eventually hangs here: --- modulename: doins, funcname: _is_install_allowed doins.py(228): try: doins.py(229): dest_lstat = os.lstat(dest) doins.py(230): except OSError as e: doins.py(233): if e.errno == errno.ENOENT: doins.py(234): return True doins.py(187): try: doins.py(188): os.unlink(dest) doins.py(189): except OSError as e: doins.py(192): if e.errno != errno.ENOENT: doins.py(194): try: doins.py(195): copyfile(source, dest) --- modulename: __init__, funcname: _optimized_copyfile __init__.py(28): with open(src, "rb", buffering=0) as src_file, open( __init__.py(29): dst, "wb", buffering=0 __init__.py(28): with open(src, "rb", buffering=0) as src_file, open( __init__.py(30): ) as dst_file: __init__.py(31): _file_copy(src_file.fileno(), dst_file.fileno())
Could you try turning off the native extensions flag on Portage?
(In reply to Sam James from comment #2) > Could you try turning off the native extensions flag on Portage? This works!
Perhaps being triggered by this while loop: https://github.com/gentoo/portage/blob/master/src/portage_util_file_copy_reflink_linux.c#L246
(In reply to John Helmert III from comment #4) > Perhaps being triggered by this while loop: > > https://github.com/gentoo/portage/blob/master/src/ > portage_util_file_copy_reflink_linux.c#L246 Indeed: (gdb) bt #0 0x00007ffb4b8f6667 in lseek64 () from /lib64/libc.so.6 #1 0x00007ffb4a885385 in do_lseek_data (fd_out=4, fd_in=3, off_out=0x7ffd333fca20) at src/portage_util_file_copy_reflink_linux.c:137 #2 0x00007ffb4a8855a6 in _reflink_linux_file_copy (self=<module at remote 0x7ffb4a8ab9a0>, Python Exception <class 'gdb.error'>: There is no member named ob_item. args=) at src/portage_util_file_copy_reflink_linux.c:247
(In reply to John Helmert III from comment #0) > ~ # strace -p 3891753 2>&1 | head -30 > strace: Process 3891753 attached > lseek(3, 0, SEEK_HOLE) = 169 > copy_file_range(3, [0], 4, [0], 169, 0) = 0 > lseek(3, 0, SEEK_DATA) = 0 > lseek(3, 0, SEEK_HOLE) = 169 > copy_file_range(3, [0], 4, [0], 169, 0) = 0 > lseek(3, 0, SEEK_DATA) = 0 > lseek(3, 0, SEEK_HOLE) = 169 It looks like copy_file_range copies zero bytes, and then we continuously retry without making any progress because copy_file_range "successfully" copies zero bytes on each try. > I don't think there's anything special about the filesystem; it's just ext4 > under podman (no SELinux or anything, either). Is /var/tmp/portage a bind mount to a plain ext4 filesystem? If not, I would expect it to using one of podmans filesystem drivers such as overlayfs or fuse-overlayfs. Anyway, I'll look into handling the case where copy_file_range copies zero bytes somehow (we can fallback to sendfile or a copy with a byte array in this case).
Created attachment 758621 [details, diff] When copy_file_range copied zero bytes, fall back to sendfile (untested) Please test this patch.
(In reply to Zac Medico from comment #7) > Created attachment 758621 [details, diff] [details, diff] > When copy_file_range copied zero bytes, fall back to sendfile (untested) > > Please test this patch. This seems to work! Tested-by: John Helmert III <ajak@gentoo.org>
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/proj/portage.git/commit/?id=fe2e58325ffd1d4424564998f64bed4cb4ab8ffa commit fe2e58325ffd1d4424564998f64bed4cb4ab8ffa Author: Zac Medico <zmedico@gentoo.org> AuthorDate: 2021-12-11 20:40:04 +0000 Commit: Zac Medico <zmedico@gentoo.org> CommitDate: 2021-12-12 01:14:13 +0000 file_copy: handle zero bytes copied by copy_file_range (bug 828844) When copy_file_range copied zero bytes, fall back to sendfile, so that we don't call copy_file_range in an infinite loop. Bug: https://bugs.gentoo.org/828844 Tested-by: John Helmert III <ajak@gentoo.org> Signed-off-by: Zac Medico <zmedico@gentoo.org> src/portage_util_file_copy_reflink_linux.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
The bug has been closed via the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=82fde90c8a9f82d2f5a18424100b0676fcb71f73 commit 82fde90c8a9f82d2f5a18424100b0676fcb71f73 Author: Sam James <sam@gentoo.org> AuthorDate: 2021-12-12 08:09:20 +0000 Commit: Sam James <sam@gentoo.org> CommitDate: 2021-12-12 08:13:31 +0000 sys-apps/portage: add 3.0.30 Closes: https://bugs.gentoo.org/828844 Closes: https://bugs.gentoo.org/828966 Signed-off-by: Sam James <sam@gentoo.org> sys-apps/portage/Manifest | 1 + sys-apps/portage/portage-3.0.30.ebuild | 267 +++++++++++++++++++++++++++++++++ 2 files changed, 268 insertions(+)