Bug 890305 - =sys-apps/portage-3.0.41-r2: Unable to unshare: ENOSPC (for FEATURES="ipc-sandbox network-sandbox pid-sandbox")
Status: CONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core
Hardware: All Linux
Importance: Normal normal
Assignee: Portage team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-09 19:24 UTC by Agostino Sarubbo
Modified: 2023-01-11 18:15 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Description Agostino Sarubbo gentoo-dev 2023-01-09 19:24:59 UTC
I have a system with multiple chroots.

When app-forensics/aflplusplus is running in one of the chroots, every emerge invocation starts reporting:
Unable to unshare: ENOSPC (for FEATURES="ipc-sandbox network-sandbox pid-sandbox")

but 'df -h' shows no shortage of disk space.

It looks like the message comes from process.py here:
https://github.com/gentoo/portage/blob/7ee654ca628d0b018b781b0efba0f455d04f0a44/lib/portage/process.py#L721
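
To check whether the error comes from the kernel call itself rather than from Portage, the call can be exercised on its own. The following is a minimal sketch, not Portage's actual code: it calls unshare(2) through ctypes in a forked child and formats the resulting errno in the style of the message above. The helper names `format_unshare_error` and `try_unshare` are hypothetical, and the CLONE_* values are assumed from <linux/sched.h>. Without root (or CAP_SYS_ADMIN) the call is expected to fail with EPERM instead.

```python
import ctypes
import ctypes.util
import errno
import os

# Flag values as defined in <linux/sched.h>.
CLONE_NEWIPC = 0x08000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000


def format_unshare_error(errno_value, features):
    """Format an errno in the style of the emerge message (hypothetical helper)."""
    name = errno.errorcode.get(errno_value, "?")
    return 'Unable to unshare: %s (for FEATURES="%s")' % (name, " ".join(features))


def try_unshare(flags):
    """Call unshare(2) in a forked child; return the errno it saw, or 0 on success."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    pid = os.fork()
    if pid == 0:
        # Child: unshare here so the parent's namespaces stay untouched.
        rc = libc.unshare(flags)
        os._exit(ctypes.get_errno() if rc != 0 else 0)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Calling `try_unshare(CLONE_NEWIPC | CLONE_NEWNET | CLONE_NEWPID)` as root while afl-fuzz is running, and passing a non-zero result to `format_unshare_error`, would show whether ENOSPC is reproducible without emerge involved.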


How can we debug the issue?
Comment 1 Mike Gilbert gentoo-dev 2023-01-09 19:36:01 UTC
The error comes from the 'unshare' system call. According to unshare(2) it may set errno to ENOSPC under any of the below conditions.

       ENOSPC (since Linux 3.7)
              CLONE_NEWPID was specified in flags, but the limit on the
              nesting depth of PID namespaces would have been exceeded; see
              pid_namespaces(7).

       ENOSPC (since Linux 4.9; beforehand EUSERS)
              CLONE_NEWUSER was specified in flags, and the call would cause
              the limit on the number of nested user namespaces to be
              exceeded. See user_namespaces(7).

              From Linux 3.11 to Linux 4.8, the error diagnosed in this case
              was EUSERS.

       ENOSPC (since Linux 4.9)
              One of the values in flags specified the creation of a new user
              namespace, but doing so would have caused the limit defined by
              the corresponding file in /proc/sys/user to be exceeded. For
              further details, see namespaces(7).

My guess is that you are not cleaning up your chroots properly and are somehow leaking PID namespaces.
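
The limits mentioned in the last ENOSPC case can be read directly. A minimal sketch, assuming the `max_*_namespaces` files under /proc/sys/user that namespaces(7) describes (present since Linux 4.9); the function name `read_namespace_limits` is hypothetical, and the directory is a parameter only so the function can be pointed at a test directory.

```python
import glob
import os


def read_namespace_limits(sysctl_dir="/proc/sys/user"):
    """Return {file name: integer limit} for every max_*_namespaces file found."""
    limits = {}
    for path in sorted(glob.glob(os.path.join(sysctl_dir, "max_*_namespaces"))):
        with open(path) as f:
            limits[os.path.basename(path)] = int(f.read().strip())
    return limits
```

Comparing the output with and without afl++ running might show whether one of these limits is being changed.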
Comment 2 Mike Gilbert gentoo-dev 2023-01-09 19:37:32 UTC
Or possibly aflplusplus does something weird with PID namespaces.
Comment 3 John Helmert III archtester Gentoo Infrastructure gentoo-dev Security 2023-01-09 19:49:54 UTC
06:17 <@ajak> ago: what environment produced it?

Are these "plain" chroots? Please share anything interesting about your environment related to namespaces, mounts, etc.
Comment 4 Agostino Sarubbo gentoo-dev 2023-01-09 21:39:38 UTC
When afl++ runs in a chroot (I didn't try outside one), the ENOSPC issue also affects the host, so I doubt it is related to the chroots.

I will try to reproduce on another machine without chroots to see what happens.
Comment 5 Mike Gilbert gentoo-dev 2023-01-09 21:58:05 UTC
(In reply to Agostino Sarubbo from comment #4)
> When afl++ runs in a chroot (I didn't try outside one), the ENOSPC issue also
> affects the host, so I doubt it is related to the chroots.

That would seem to imply that afl++ is changing some global system setting and your chroot does not protect against that.
Comment 6 Mike Gilbert gentoo-dev 2023-01-09 22:06:20 UTC
Or possibly it is exhausting some namespace-related resource.
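
One way to look for exhaustion is to count how many distinct namespaces running processes refer to. A minimal sketch, assuming the /proc/<pid>/ns/<type> symlinks described in namespaces(7); the function name `count_namespaces` is hypothetical. It only sees namespaces that still have at least one visible process, so namespaces kept alive by bind mounts or open file descriptors are not counted, and it needs root to inspect every process.

```python
import collections
import os


def count_namespaces(proc_dir="/proc", ns_type="pid"):
    """Count processes per namespace by reading the <proc_dir>/<pid>/ns/<type> symlinks."""
    counts = collections.Counter()
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue
        try:
            target = os.readlink(os.path.join(proc_dir, entry, "ns", ns_type))
        except OSError:
            # The process exited, or we lack permission to inspect it.
            continue
        counts[target] += 1
    return counts
```

`len(count_namespaces(ns_type="pid"))` gives the number of distinct PID namespaces in use; the same call with "ipc" or "net" covers the other two sandboxes.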
Comment 7 Agostino Sarubbo gentoo-dev 2023-01-11 05:19:39 UTC
When afl++ is running I also get failures like this:

sandbox:setup_sandbox  could not read fd path: /proc/self/fd: No such file or directory

/usr/lib/portage/python3.11/ebuild.sh: line 628: /usr/portage/dev-python/click/click-8.1.3.ebuild: No such file or directory
 * ERROR: dev-python/click-8.1.3::gentoo failed (unpack phase):
 *   error sourcing ebuild
 * 
 * Call stack:
 *   ebuild.sh, line 628:  Called die
 * The specific snippet of code:
 *                      source "${EBUILD}" || die "error sourcing ebuild"
 * 
 * If you need support, post the output of `emerge --info '=dev-python/click-8.1.3::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=dev-python/click-8.1.3::gentoo'`.
/usr/lib/portage/python3.11/isolated-functions.sh: line 207: /var/tmp/portage/dev-python/click-8.1.3/.die_hooks: No such file or directory
 * The complete build log is located at '/var/log/emerge-log/build/dev-python/click-8.1.3:20230109-190612.log'.
 * For convenience, a symlink to the build log is located at '/var/tmp/portage/dev-python/click-8.1.3/temp/build.log'.
 * Working directory: '/usr/lib/python3.11/site-packages'
 * S: '/var/tmp/portage/dev-python/click-8.1.3/work/click-8.1.3'
Traceback (most recent call last):
  File "/usr/lib/portage/python3.11/ebuild-ipc.py", line 319, in <module>
    sys.exit(ebuild_ipc_main(sys.argv[1:]))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/portage/python3.11/ebuild-ipc.py", line 315, in ebuild_ipc_main
    return ebuild_ipc.communicate(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/portage/python3.11/ebuild-ipc.py", line 158, in communicate
    lock_obj = portage.locks.lockfile(self.ipc_lock_file, unlinkfile=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/portage/locks.py", line 167, in lockfile
    lock = _lockfile_iteration(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/portage/locks.py", line 242, in _lockfile_iteration
    raise DirectoryNotFound(os.path.dirname(mypath))
portage.exception.DirectoryNotFound: /var/tmp/portage/dev-python/click-8.1.3/.ipc
 * The ebuild phase 'unpack' has exited unexpectedly. This type of behavior
 * is known to be triggered by things such as failed variable assignments
 * (bug #190128) or bad substitution errors (bug #200313). Normally, before
 * exiting, bash should have displayed an error message above. If bash did
 * not produce an error message above, it's possible that the ebuild has
 * called `exit` when it should have called `die` instead. This behavior
 * may also be triggered by a corrupt bash binary or a hardware problem
 * such as memory or cpu malfunction. If the problem is not reproducible or
 * it appears to occur randomly, then it is likely to be triggered by a
 * hardware problem. If you suspect a hardware problem then you should try
 * some basic hardware diagnostics such as memtest. Please do not report
 * this as a bug unless it is consistently reproducible and you are sure
 * that your bash binary and hardware are functioning properly.
Comment 8 Mike Gilbert gentoo-dev 2023-01-11 16:17:04 UTC
This really sounds more like an afl++ issue than a Portage bug.

Please elaborate on how you have afl++ set up in case someone wants to try and reproduce the issue.
Comment 9 Agostino Sarubbo gentoo-dev 2023-01-11 16:48:16 UTC
I opened an upstream discussion here:
https://github.com/AFLplusplus/AFLplusplus/discussions/1613

I installed a new virtual machine for this purpose, and everything runs directly on the system (no chroots).
It looks like I can reproduce the issue when running =sys-kernel/gentoo-kernel-bin-5.15.85-r1

I do not get the issue with a custom 4.19 kernel.


To start fuzzing you need to:
1) emerge app-forensics/aflplusplus
2) recompile pax-utils with CC="afl-gcc" CXX="afl-g++" (no need to compile with asan for this reproduction)
3) mkdir -p /tmp/afl/scanelf/in /tmp/afl/scanelf/out
4) cp /bin/ls /tmp/afl/scanelf/in/
5) afl-fuzz -i /tmp/afl/scanelf/in -o /tmp/afl/scanelf/out -t 9000 -Z /usr/bin/scanelf -a @@ (maybe under tmux)
6) emerge anything; I use: while true ; do emerge -v1O push ; done
7) You should immediately hit the ENOSPC issue
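
Steps 3 to 5 can be scripted so the reproduction is easier to repeat on different kernels. A minimal sketch, using only the paths and afl-fuzz options listed in the steps; the function names `prepare_fuzz_dirs` and `afl_fuzz_command` are hypothetical, and the script assumes aflplusplus and the instrumented pax-utils are already installed.

```python
import os
import shutil


def prepare_fuzz_dirs(base="/tmp/afl/scanelf", seed="/bin/ls"):
    """Steps 3 and 4: create the in/out directories and copy one seed file."""
    in_dir = os.path.join(base, "in")
    out_dir = os.path.join(base, "out")
    os.makedirs(in_dir, exist_ok=True)
    os.makedirs(out_dir, exist_ok=True)
    shutil.copy(seed, in_dir)
    return in_dir, out_dir


def afl_fuzz_command(in_dir, out_dir, target="/usr/bin/scanelf"):
    """Step 5: the afl-fuzz command line with the options from the steps."""
    return ["afl-fuzz", "-i", in_dir, "-o", out_dir, "-t", "9000", "-Z", target, "-a", "@@"]
```

Passing the result of `afl_fuzz_command(*prepare_fuzz_dirs())` to `subprocess.run` starts the fuzzer; the emerge loop from step 6 then runs in another terminal.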
Comment 10 Agostino Sarubbo gentoo-dev 2023-01-11 18:15:11 UTC
(In reply to Agostino Sarubbo from comment #9)
> It looks like I can reproduce the issue when running
> =sys-kernel/gentoo-kernel-bin-5.15.85-r1
> 
> I do not get the issue with a custom 4.19 kernel.

UPDATE:

This is not about how gentoo-kernel-bin is built; it looks like the kernel itself. I can reproduce with a custom 5.15 but not with 5.10, though it is a bit hard to know exactly when it broke.