Summary: | sys-apps/portage: newconfd/doins hang when running in podman | ||
---|---|---|---|
Product: | Portage Development | Reporter: | John Helmert III <ajak> |
Component: | Core | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | sam |
Priority: | Normal | Keywords: | InVCS |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://github.com/gentoo/portage/pull/778 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 635020 | ||
Attachments: | When copy_file_range copied zero bytes, fall back to sendfile (untested) |
Description
John Helmert III
Hacking in a `-m trace -t` to the Python invocation in /usr/lib/portage/python3.9/ebuild-helpers/doins and running a triggering build eventually hangs here:

     --- modulename: doins, funcname: _is_install_allowed
    doins.py(228): try:
    doins.py(229): dest_lstat = os.lstat(dest)
    doins.py(230): except OSError as e:
    doins.py(233): if e.errno == errno.ENOENT:
    doins.py(234): return True
    doins.py(187): try:
    doins.py(188): os.unlink(dest)
    doins.py(189): except OSError as e:
    doins.py(192): if e.errno != errno.ENOENT:
    doins.py(194): try:
    doins.py(195): copyfile(source, dest)
     --- modulename: __init__, funcname: _optimized_copyfile
    __init__.py(28): with open(src, "rb", buffering=0) as src_file, open(
    __init__.py(29): dst, "wb", buffering=0
    __init__.py(28): with open(src, "rb", buffering=0) as src_file, open(
    __init__.py(30): ) as dst_file:
    __init__.py(31): _file_copy(src_file.fileno(), dst_file.fileno())

Could you try turning off the native extensions flag on Portage?

(In reply to Sam James from comment #2)
> Could you try turning off the native extensions flag on Portage?

This works!

Perhaps being triggered by this while loop:

https://github.com/gentoo/portage/blob/master/src/portage_util_file_copy_reflink_linux.c#L246

(In reply to John Helmert III from comment #4)
> Perhaps being triggered by this while loop:
>
> https://github.com/gentoo/portage/blob/master/src/
> portage_util_file_copy_reflink_linux.c#L246

Indeed:

    (gdb) bt
    #0 0x00007ffb4b8f6667 in lseek64 () from /lib64/libc.so.6
    #1 0x00007ffb4a885385 in do_lseek_data (fd_out=4, fd_in=3, off_out=0x7ffd333fca20) at src/portage_util_file_copy_reflink_linux.c:137
    #2 0x00007ffb4a8855a6 in _reflink_linux_file_copy (self=<module at remote 0x7ffb4a8ab9a0>, Python Exception <class 'gdb.error'>: There is no member named ob_item.
    args=) at src/portage_util_file_copy_reflink_linux.c:247

(In reply to John Helmert III from comment #0)
> ~ # strace -p 3891753 2>&1 | head -30
> strace: Process 3891753 attached
> lseek(3, 0, SEEK_HOLE) = 169
> copy_file_range(3, [0], 4, [0], 169, 0) = 0
> lseek(3, 0, SEEK_DATA) = 0
> lseek(3, 0, SEEK_HOLE) = 169
> copy_file_range(3, [0], 4, [0], 169, 0) = 0
> lseek(3, 0, SEEK_DATA) = 0
> lseek(3, 0, SEEK_HOLE) = 169

It looks like copy_file_range copies zero bytes, and then we retry forever without making any progress, because copy_file_range "successfully" copies zero bytes on each try.

> I don't think there's anything special about the filesystem; it's just ext4
> under podman (no SELinux or anything, either).

Is /var/tmp/portage a bind mount to a plain ext4 filesystem? If not, I would expect it to be using one of podman's filesystem drivers, such as overlayfs or fuse-overlayfs.

Anyway, I'll look into handling the case where copy_file_range copies zero bytes somehow (we can fall back to sendfile or to a copy with a byte array in this case).

Created attachment 758621 [details, diff]
When copy_file_range copied zero bytes, fall back to sendfile (untested)
Please test this patch.
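
For readers following along, here is a minimal sketch of the zero-progress guard discussed above. It is written in Python, not the C of portage_util_file_copy_reflink_linux.c, and it is not the attached patch: the function name `copy_range_with_fallback` and its `fast_copy` parameter are hypothetical, and the fallback used here is a plain read/write loop (the "copy with a byte array" option) rather than sendfile. The only point it illustrates is that a fast-path call which returns 0 must not be retried unchanged.

```python
import os


def copy_range_with_fallback(fd_in, fd_out, length, fast_copy=None):
    """Copy up to `length` bytes from fd_in to fd_out at their current offsets.

    Hypothetical helper for illustration only. `fast_copy` defaults to
    os.copy_file_range where the platform provides it.
    """
    if fast_copy is None:
        fast_copy = getattr(os, "copy_file_range", None)

    remaining = length
    while remaining > 0:
        copied = 0
        if fast_copy is not None:
            try:
                copied = fast_copy(fd_in, fd_out, remaining)
            except OSError:
                copied = 0

        if copied == 0:
            # Zero bytes copied (or an error): retrying the same call could
            # loop forever, so stop using the fast path and copy by hand.
            fast_copy = None
            chunk = os.read(fd_in, min(remaining, 65536))
            if not chunk:
                break  # genuine end of file, nothing left to copy
            view = memoryview(chunk)
            while view:
                written = os.write(fd_out, view)
                view = view[written:]
            copied = len(chunk)

        remaining -= copied

    return length - remaining
```

With a `fast_copy` that always reports 0 bytes, as copy_file_range did in the strace output above, the loop still terminates because the fast path is tried once and then abandoned.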
(In reply to Zac Medico from comment #7)
> Created attachment 758621 [details, diff]
> When copy_file_range copied zero bytes, fall back to sendfile (untested)
>
> Please test this patch.

This seems to work!

Tested-by: John Helmert III <ajak@gentoo.org>

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=fe2e58325ffd1d4424564998f64bed4cb4ab8ffa

commit fe2e58325ffd1d4424564998f64bed4cb4ab8ffa
Author: Zac Medico <zmedico@gentoo.org>
AuthorDate: 2021-12-11 20:40:04 +0000
Commit: Zac Medico <zmedico@gentoo.org>
CommitDate: 2021-12-12 01:14:13 +0000

    file_copy: handle zero bytes copied by copy_file_range (bug 828844)

    When copy_file_range copied zero bytes, fall back to sendfile,
    so that we don't call copy_file_range in an infinite loop.

    Bug: https://bugs.gentoo.org/828844
    Tested-by: John Helmert III <ajak@gentoo.org>
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 src/portage_util_file_copy_reflink_linux.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=82fde90c8a9f82d2f5a18424100b0676fcb71f73

commit 82fde90c8a9f82d2f5a18424100b0676fcb71f73
Author: Sam James <sam@gentoo.org>
AuthorDate: 2021-12-12 08:09:20 +0000
Commit: Sam James <sam@gentoo.org>
CommitDate: 2021-12-12 08:13:31 +0000

    sys-apps/portage: add 3.0.30

    Closes: https://bugs.gentoo.org/828844
    Closes: https://bugs.gentoo.org/828966
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/portage/Manifest | 1 +
 sys-apps/portage/portage-3.0.30.ebuild | 267 +++++++++++++++++++++++++++++++++
 2 files changed, 268 insertions(+)