I have a system with multiple chroots. When app-forensics/aflplusplus is running in one of the chroots, emerge starts to report everywhere: "Unable to unshare: ENOSPC" (for FEATURES="ipc-sandbox network-sandbox pid-sandbox"), but 'df -h' shows no space-related problem. The message appears to come from process.py, here: https://github.com/gentoo/portage/blob/7ee654ca628d0b018b781b0efba0f455d04f0a44/lib/portage/process.py#L721 How can we debug this issue?
The error comes from the unshare() system call. According to unshare(2), it may set errno to ENOSPC under any of the following conditions:

ENOSPC (since Linux 3.7)
       CLONE_NEWPID was specified in flags, but the limit on the nesting depth of PID namespaces would have been exceeded; see pid_namespaces(7).

ENOSPC (since Linux 4.9; beforehand EUSERS)
       CLONE_NEWUSER was specified in flags, and the call would cause the limit on the number of nested user namespaces to be exceeded. See user_namespaces(7). From Linux 3.11 to Linux 4.8, the error diagnosed in this case was EUSERS.

ENOSPC (since Linux 4.9)
       One of the values in flags specified the creation of a new user namespace, but doing so would have caused the limit defined by the corresponding file in /proc/sys/user to be exceeded. For further details, see namespaces(7).

My guess is that you are somehow not cleaning up your chroots properly and are leaking PID namespaces.
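Since the third ENOSPC case is governed by the per-user caps under /proc/sys/user, a first debugging step is to compare those caps with reality. A minimal sketch (assuming a kernel >= 4.9, where that directory exists) to dump the current limits:

```python
# Dump the per-user namespace caps from /proc/sys/user (kernel >= 4.9).
# unshare() fails with ENOSPC once one of these caps is reached.
import glob


def namespace_limits():
    """Return {filename: limit} for each max_*_namespaces sysctl."""
    limits = {}
    for path in sorted(glob.glob("/proc/sys/user/max_*_namespaces")):
        with open(path) as f:
            limits[path.rsplit("/", 1)[-1]] = int(f.read())
    return limits


for name, value in namespace_limits().items():
    print(f"{name} = {value}")
```

If one of these values is suspiciously low, or if the count of live namespaces (e.g. from lsns(8)) is close to it, that would explain the ENOSPC.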
Or possibly aflplusplus does something weird with PID namespaces.
06:17 <@ajak> ago: what environment produced it? Are these "plain" chroots? Please share anything interesting about your environment related to namespaces, mounts, etc.
When afl++ runs in a chroot (didn't try outside) the ENOSPC issue also affects the host, so I doubt it is anything related to the chroots. I will try to reproduce on another machine without chroots to see what happens.
(In reply to Agostino Sarubbo from comment #4)
> when afl++ runs in chroot (didn't try outside) the ENOSPC issue affects also
> the host, so I doubt it is anything related to the chroots.

That would seem to imply that afl++ is changing some global system setting and your chroot does not protect against that.
Or possibly it is exhausting some namespace-related resource.
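One way to check the resource-exhaustion theory is to watch how many distinct namespaces exist while afl++ runs. A sketch of a hypothetical helper that counts them by walking /proc/<pid>/ns/* (each namespace has a unique inode number, so distinct inodes = distinct namespaces):

```python
# Count distinct namespaces of a given kind by collecting the inode
# numbers of /proc/<pid>/ns/<kind> across all visible processes.
# A count that keeps growing while afl++ runs would indicate a leak.
import os


def count_namespaces(kind="pid"):
    """Return the number of distinct <kind> namespaces in use."""
    seen = set()
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            seen.add(os.stat(f"/proc/{pid}/ns/{kind}").st_ino)
        except OSError:
            # Process exited, or we lack permission; skip it.
            continue
    return len(seen)


for kind in ("pid", "user", "net", "ipc"):
    print(kind, count_namespaces(kind))
```

Running this in a loop alongside the fuzzer, and comparing against the caps in /proc/sys/user, would show whether afl++ (or portage's sandbox) is leaking namespaces.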
When afl++ is running I also get failures like this:

sandbox:setup_sandbox  could not read fd path: /proc/self/fd: No such file or directory
/usr/lib/portage/python3.11/ebuild.sh: line 628: /usr/portage/dev-python/click/click-8.1.3.ebuild: No such file or directory
 * ERROR: dev-python/click-8.1.3::gentoo failed (unpack phase):
 *   error sourcing ebuild
 *
 * Call stack:
 *   ebuild.sh, line 628:  Called die
 * The specific snippet of code:
 *       source "${EBUILD}" || die "error sourcing ebuild"
 *
 * If you need support, post the output of `emerge --info '=dev-python/click-8.1.3::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=dev-python/click-8.1.3::gentoo'`.
/usr/lib/portage/python3.11/isolated-functions.sh: line 207: /var/tmp/portage/dev-python/click-8.1.3/.die_hooks: No such file or directory
 * The complete build log is located at '/var/log/emerge-log/build/dev-python/click-8.1.3:20230109-190612.log'.
 * For convenience, a symlink to the build log is located at '/var/tmp/portage/dev-python/click-8.1.3/temp/build.log'.
 * Working directory: '/usr/lib/python3.11/site-packages'
 * S: '/var/tmp/portage/dev-python/click-8.1.3/work/click-8.1.3'
Traceback (most recent call last):
  File "/usr/lib/portage/python3.11/ebuild-ipc.py", line 319, in <module>
    sys.exit(ebuild_ipc_main(sys.argv[1:]))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/portage/python3.11/ebuild-ipc.py", line 315, in ebuild_ipc_main
    return ebuild_ipc.communicate(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/portage/python3.11/ebuild-ipc.py", line 158, in communicate
    lock_obj = portage.locks.lockfile(self.ipc_lock_file, unlinkfile=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/portage/locks.py", line 167, in lockfile
    lock = _lockfile_iteration(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/portage/locks.py", line 242, in _lockfile_iteration
    raise DirectoryNotFound(os.path.dirname(mypath))
portage.exception.DirectoryNotFound: /var/tmp/portage/dev-python/click-8.1.3/.ipc
 * The ebuild phase 'unpack' has exited unexpectedly. This type of behavior
 * is known to be triggered by things such as failed variable assignments
 * (bug #190128) or bad substitution errors (bug #200313). Normally, before
 * exiting, bash should have displayed an error message above. If bash did
 * not produce an error message above, it's possible that the ebuild has
 * called `exit` when it should have called `die` instead. This behavior
 * may also be triggered by a corrupt bash binary or a hardware problem
 * such as memory or cpu malfunction. If the problem is not reproducible or
 * it appears to occur randomly, then it is likely to be triggered by a
 * hardware problem. If you suspect a hardware problem then you should try
 * some basic hardware diagnostics such as memtest. Please do not report
 * this as a bug unless it is consistently reproducible and you are sure
 * that your bash binary and hardware are functioning properly.
This really sounds more like an afl++ issue than a Portage bug. Please elaborate on how you have afl++ set up in case someone wants to try and reproduce the issue.
I opened an upstream discussion here: https://github.com/AFLplusplus/AFLplusplus/discussions/1613

I installed a new virtual machine for this purpose and everything is directly in the system (no chroots).

It looks like I can reproduce the issue when running =sys-kernel/gentoo-kernel-bin-5.15.85-r1. I do not get the issue with a custom 4.19 kernel.

To start fuzzing you need to:
1) emerge app-forensics/aflplusplus
2) recompile pax-utils with CC="afl-gcc" CXX="afl-g++" (no need to compile with ASan for this reproduction)
3) mkdir -p /tmp/afl/scanelf/in /tmp/afl/scanelf/out
4) cp /bin/ls /tmp/afl/scanelf/in/
5) afl-fuzz -i /tmp/afl/scanelf/in -o /tmp/afl/scanelf/out -t 9000 -Z /usr/bin/scanelf -a @@ (maybe under tmux)
6) emerge whatever; I use: while true ; do emerge -v1O push ; done
7) You should immediately hit the ENOSPC issue
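While the reproduction above goes through emerge, the failing operation can also be exercised directly, which separates portage from the kernel behavior. A minimal sketch (not portage's actual code; CLONE_* values are taken from <linux/sched.h>, and calling unshare() via ctypes is an assumption about a convenient way to reach the syscall from Python):

```python
# Call unshare() with the same kinds of namespace flags that
# FEATURES="ipc-sandbox network-sandbox pid-sandbox" implies,
# and report which errno comes back (ENOSPC, EPERM, ...).
import ctypes
import errno

# Flag values from <linux/sched.h>.
CLONE_NEWIPC = 0x08000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000


def try_unshare(flags):
    """Return 0 on success, or the errno from a failed unshare()."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags) != 0:
        return ctypes.get_errno()
    return 0


err = try_unshare(CLONE_NEWIPC | CLONE_NEWNET | CLONE_NEWPID)
print("ok" if err == 0 else errno.errorcode.get(err, str(err)))
```

Running this in a tight loop while afl-fuzz is active should show ENOSPC appearing without any portage involvement, which would confirm the problem is at the kernel/afl++ level.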
(In reply to Agostino Sarubbo from comment #9)
> It looks like I can reproduce the issue when running
> =sys-kernel/gentoo-kernel-bin-5.15.85-r1
>
> I do not get the issue with a custom 4.19 kernel.

UPDATE: this is not about how gentoo-kernel-bin is built; it looks like it is the kernel itself. I can reproduce with a custom 5.15 and cannot reproduce with 5.10, but it is hard to know exactly when it broke.