925031 – sys-apps/sandbox: makes unpacking much slower


Status:	CONFIRMED

Alias:	None

Product:	Portage Development
Classification:	Unclassified
Component:	Sandbox
Hardware:	All Linux

Importance:	Normal critical
Assignee:	Sandbox Maintainers

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2024-02-20 02:17 UTC by Sam James
Modified:	2025-02-23 03:21 UTC
CC List:	10 users

See Also:	910273 925032 447970
Package list:
Runtime testing required:	---

Attachments
emerge --info (file_925031.txt, 10.00 KB, text/plain) 2024-02-20 02:37 UTC, Sam James

Description Sam James archtester

2024-02-20 02:17:27 UTC

For a while now, I've noticed Portage hanging when unpacking GCC:
```
portage  1495795  2.6  1.5 1178528 1014644 pts/27 SNl+ 22:31   0:04  |                       |                   \_ xz -T32 -d -c -- /var/tmp/portage/cross-aarch64_be-unknown-linux-gnu/gcc-14.0.1_pre20240218/distdir/gcc-14-20240218.tar.xz
portage  1495796 83.5  0.0   8400  3376 pts/27   RN+  22:31   2:11  |                       |                   \_ tar xof -
```

It often takes several minutes.

Running things manually:
(no pipes)
```
# time xz -T32 -d -c /var/tmp/portage/cross-aarch64_be-unknown-linux-gnu/gcc-14.0.1_pre20240218/distdir/gcc-14-20240218.tar.xz > /var/tmp/portage/test
real    0m3.926s
user    0m2.937s
sys     0m3.603s

# time tar xof /var/tmp/portage/test > /var/tmp/portage/test2

real    0m10.285s
user    0m0.357s
sys     0m8.036s
```
(pipe)
```
/var/tmp/portage # time xz -T32 -d -c -- /var/tmp/portage/cross-aarch64_be-unknown-linux-gnu/gcc-14.0.1_pre20240218/distdir/gcc-14-20240218.tar.xz | tar xof -
real    0m7.706s
user    0m3.364s
sys     0m7.865s
```

Then finally:
(pipe + sandbox)
```
/var/tmp/portage # time xz -T32 -d -c -- /var/tmp/portage/cross-aarch64_be-unknown-linux-gnu/gcc-14.0.1_pre20240218/distdir/gcc-14-20240218.tar.xz | tar xof -
real    0m25.906s
user    0m5.380s
sys     0m23.741s
```
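As a rough check of the overhead, the wall-clock and system times from the two piped runs above can be compared directly. The following is a minimal Python sketch; the helper name `to_seconds` is made up for illustration and only handles the `NmS.SSSs` format that `time` printed here.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time`-style value such as '0m25.906s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Values copied from the "pipe" and "pipe + sandbox" runs above.
plain = {"real": "0m7.706s", "sys": "0m7.865s"}
sandboxed = {"real": "0m25.906s", "sys": "0m23.741s"}

# Ratio of sandboxed time to plain time for each measurement.
ratios = {
    key: to_seconds(sandboxed[key]) / to_seconds(plain[key])
    for key in plain
}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x slower under sandbox")
```

On these single runs that works out to roughly 3.4x in wall-clock time and 3.0x in system time.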

Running perf (perf record --call-graph dwarf -g -- sandbox 'xz -T32 -d -c -- /var/tmp/portage/cross-aarch64_be-unknown-linux-gnu/gcc-14.0.1_pre20240218/distdir/gcc-14-20240218.tar.xz | tar xof -'), the third entry in the profile is:
```
    47.03%     0.22%  tar      libsandbox.so         [.] before_syscall.localalias
             --46.81%--before_syscall.localalias
                       |
                       |--12.71%--free.localalias
[...]
                       |--11.95%--realpath
                       |          |
                       |          |--10.94%--readlink
                       |          |          |
                       |          |          |--10.17%--entry_SYSCALL_64_after_hwframe
                       |          |          |          |
                       |          |          |           --10.04%--do_syscall_64
                       |          |          |                     |
                       |          |          |                      --9.66%--__x64_sys_readlink
                       |          |          |                                |
                       |          |          |                                 --9.55%--do_readlinkat
                       |          |          |                                           |
                       |          |          |                                            --9.13%--user_path_at_empty
                       |          |          |                                                      |
                       |          |          |                                                      |--6.44%--filename_lookup
                       |          |          |                                                      |          |
                       |          |          |                                                      |           --6.22%--path_lookupat
[...]
                       |--11.81%--__xmalloc
                       |          |
                       |           --11.80%--malloc.localalias
                       |                     |
                       |                     |--6.74%--__mmap
                       |                     |          |
                       |                     |           --6.14%--entry_SYSCALL_64_after_hwframe
                       |                     |                     |
                       |                     |                      --6.09%--do_syscall_64
                       |                     |                                |
                       |                     |                                 --5.91%--vm_mmap_pgoff
                       |                     |                                           |
                       |                     |                                            --5.61%--do_mmap
                       |                     |                                                      |
                       |                     |                                                       --5.33%--mmap_region
[...]
                        --9.20%--canonicalize_filename_mode.localalias
                                  |
                                  |--6.04%--fstatat64
                                  |          |
                                  |           --5.63%--entry_SYSCALL_64_after_hwframe
                                  |                     |
                                  |                      --5.57%--do_syscall_64
                                  |                                |
                                  |                                 --5.35%--__do_sys_newfstatat
                                  |                                           |
                                  |                                            --5.04%--vfs_fstatat
                                  |                                                      |
                                  |                                                      |--3.67%--vfs_statx
[...]
```
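The profile suggests that much of the time in `before_syscall` goes to resolving the path argument in userspace (`realpath`, `canonicalize_filename_mode`), which costs syscalls for every path component. The sketch below is not libsandbox code; it is a simplified Python model, with the hypothetical name `resolve_counting`, that walks an absolute path one component at a time and counts the `lstat` and `readlink` calls it makes.

```python
import os
import stat


def resolve_counting(path):
    """Resolve symlinks component by component, counting syscalls.

    Simplified model: expects an absolute path and does not guard
    against symlink loops.
    """
    calls = {"lstat": 0, "readlink": 0}
    # Stack of components still to process; pop() yields the next one.
    pending = [part for part in path.split("/") if part][::-1]
    resolved = "/"
    while pending:
        part = pending.pop()
        if part == ".":
            continue
        if part == "..":
            resolved = os.path.dirname(resolved)
            continue
        candidate = os.path.join(resolved, part)
        calls["lstat"] += 1
        try:
            info = os.lstat(candidate)
        except FileNotFoundError:
            # A file about to be created: keep the name unresolved.
            resolved = candidate
            continue
        if stat.S_ISLNK(info.st_mode):
            calls["readlink"] += 1
            target = os.readlink(candidate)
            if target.startswith("/"):
                resolved = "/"  # absolute target restarts from the root
            # Queue the link target's components ahead of the rest.
            pending.extend([p for p in target.split("/") if p][::-1])
        else:
            resolved = candidate
    return resolved, calls
```

In this model a path with N components costs at least N `lstat` calls before the intercepted operation itself runs, and a tarball such as GCC's contains many thousands of such paths.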

Comment 1 Sam James archtester

2024-02-20 02:35:19 UTC

With /var/tmp/portage on zram (w/ zstd), I get:
```
$ hyperfine \
        "sandbox 'xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof -'" \
        'xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof -'
Benchmark 1: sandbox 'xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof -'
  Time (mean ± σ):     36.998 s ± 23.093 s    [User: 5.517 s, System: 33.107 s]
  Range (min … max):   27.402 s … 102.629 s    10 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (102.629 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 2: xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof -
  Time (mean ± σ):      9.193 s ±  1.781 s    [User: 3.641 s, System: 9.261 s]
  Range (min … max):    8.366 s … 14.213 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof - ran
    4.02 ± 2.63 times faster than sandbox 'xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof -'
```

The first run was much slower, which is consistent with what I've seen in the wild, presumably because of caches. (I could use the hyperfine options for that, but I don't really think it would demonstrate anything here.)

Comment 2 Sam James archtester

2024-02-20 02:36:57 UTC

While I appreciate that some slowdown is kind of unavoidable for what sandbox does, it feels like a huge waste, and an overhead of ~2-4x seems excessive. (In reality it's closer to 4x, given that nobody unpacks the same tarball repeatedly, and it seems to consistently be around that for me on cold runs.)

Comment 3 Sam James archtester

2024-02-20 02:37:34 UTC

Created attachment 885479 [details]
emerge --info

Comment 4 Sam James archtester

2024-02-20 10:34:19 UTC

I think I might be hitting two distinct issues here, actually.

1. sandbox adds some not-insignificant amount of time to tar extraction
2. ext4 on zram tmpfs has some sort of pathological inode allocation problem?

When the profile is good, the hot paths are all in liblzma, as you'd expect. When it's bad, it's in tar in ext4's find_inode_bit/find_get_block/ext4_create/ext4_new_inode.

If I change the zram tmpfs to xfs, things become way happier, the peak is way lower, but I also can't reproduce the serious spikes either.

So, I think we might be able to put 2. down to a kernel bug?

But for 1., it's still there, just less significant than before:
```
# hyperfine                "sandbox 'xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof -'"                    'xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof -'
Benchmark 1: sandbox 'xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof -'
  Time (mean ± σ):     12.101 s ±  1.613 s    [User: 4.886 s, System: 9.578 s]
  Range (min … max):    7.578 s … 13.040 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof -
  Time (mean ± σ):      4.271 s ±  0.095 s    [User: 3.430 s, System: 3.210 s]
  Range (min … max):    4.158 s …  4.514 s    10 runs

Summary
  xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof - ran
    2.83 ± 0.38 times faster than sandbox 'xz -T32 -d -c -- /var/cache/distfiles/gcc-14-20240218.tar.xz | tar xof -'
```

2x is still a pretty big difference, but there's not this huge range of several minutes anymore.

Comment 5 Sam James archtester

2024-02-20 10:36:39 UTC

Also, the profile w/ the xfs zram tmpfs is way more like you'd expect - some in the kernel, but a healthy mix of liblzma + tar + sandbox in tar.

Comment 6 Matt Whitlock 2024-11-27 23:46:40 UTC

A question that has been nagging at the back of my mind about this for some time now: why are we still implementing sandbox via an LD_PRELOAD wedge in the age of mount namespaces? LD_PRELOAD="libsandbox.so" isn't airtight in the face of statically linked executables or code that directly invokes filesystem syscalls without going through libc wrapper functions. We could plug those holes and eliminate all of the performance issues that have vexed sandbox by switching to an in-kernel solution in the form of mount namespaces. (Ironically, sandbox supports initializing a new mount namespace with a minimal /dev, but it totally ignores the killer namespace feature that would eliminate the need for LD_PRELOAD.)

In short, it is possible to bind mount "/" on top of itself in a new mount namespace and then remount the bind mount as read-only. The read-only flag applies only to the bind mount, not to the original mount (or to the filesystem instance to which it refers). The effect is that processes in the new mount namespace behave as though the file system mounted at "/" is read-only, even though the underlying filesystem instance and the mount referencing it in the outer namespace are read-write. Exceptions for subtrees that should allow write access from inside the sandbox can be created by similarly bind mounting those subtrees in place but without remounting the bind mounts as read-only. After all exceptions have been added, the entire hierarchy can be locked down — preventing even privileged processes from changing the read-only flags of the bind mounts or unmounting any of them — by switching to a new user namespace and then switching to another new mount namespace (conceptually a grandchild of the original mount namespace). The kernel then will disallow any modifications to the mounts propagated from the "more privileged mount namespace" to the "less privileged mount namespace." See mount_namespaces(7) for more details.

There are a couple of drawbacks I can see with this approach:

* It would no longer be possible for processes running inside the sandbox to add new exceptions on the fly by modifying environment variables. This, however, is arguably a benefit, not a drawback.

* It becomes probably impossible (or at least very gross) to allow write access but not read access to a given subtree. (The possible bind mount modes are no-access, read-only, or read-write. There is no write-only mode.) I suspect, however, that essentially all use cases for sandbox are concerned only about restricting write access and do not care about restricting read access.

If there is interest in redesigning sandbox to make use of mount namespaces to implement access control (and in sunsetting the LD_PRELOAD wedge), I would be willing to work up a proof of concept.

Comment 7 Matt Whitlock 2024-11-28 10:34:40 UTC

(In reply to Matt Whitlock from comment #6)
> In short, it is possible to bind mount "/" on top of itself in a new mount
> namespace and then remount the bind mount as read-only.

Experimentation reveals that the bind mounting step is superfluous. All of the mounts in a new mount namespace are already independent copies of the mounts in the original namespace, so the read-only flag can be set on them right away, without the need for any bind mounting.
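The approach from comment 6, with the simplification noted here, might be sketched as follows. This is a minimal Python sketch using `ctypes`, not a proposal for sandbox itself: the function name `enter_readonly_namespace` and its `writable` parameter are hypothetical, it needs CAP_SYS_ADMIN, and it leaves out the user-namespace lock-down step. It creates the writable bind mounts while the root mount is still read-write, then flips only the root mount to read-only; other submounts keep their own flags.

```python
import ctypes
import os

# Flag values from <sched.h> and <sys/mount.h> on Linux.
CLONE_NEWNS = 0x00020000
MS_RDONLY = 0x1
MS_REMOUNT = 0x20
MS_BIND = 0x1000
MS_REC = 0x4000
MS_PRIVATE = 0x40000

libc = ctypes.CDLL(None, use_errno=True)
libc.mount.argtypes = [
    ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
    ctypes.c_ulong, ctypes.c_void_p,
]


def _check(ret):
    """Raise OSError with errno if a libc call reported failure."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def _mount(source, target, flags):
    _check(libc.mount(source, target, None, flags, None))


def enter_readonly_namespace(writable=()):
    """Move the calling process into a mount namespace with "/" read-only."""
    _check(libc.unshare(CLONE_NEWNS))
    # Keep our changes from propagating back to the original namespace.
    _mount(None, b"/", MS_REC | MS_PRIVATE)
    # Exceptions: bind each writable subtree onto itself first.
    for path in writable:
        encoded = os.fsencode(path)
        _mount(encoded, encoded, MS_BIND)
    # MS_REMOUNT | MS_BIND changes the per-mount flag only, not the
    # underlying filesystem. Not recursive: submounts are unaffected.
    _mount(None, b"/", MS_REMOUNT | MS_BIND | MS_RDONLY)
```

A caller running as root might invoke `enter_readonly_namespace(["/var/tmp/portage"])` and then `exec` the build command; writes to the root mount outside the listed subtrees should then fail with EROFS.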

> There are a couple of drawbacks I can see with this approach

Thinking about it further, I can see some more drawbacks:

* There is no longer any provision for generating a log of sandbox violations or for failing an ebuild phase function that has violated the sandbox at some point during its execution. Sandbox violations would manifest solely as errors returned from system calls, and it would be up to the build process to translate those errors into failures (or not).

* Writable device nodes are still writable even when they exist beneath a read-only mountpoint. Sandbox currently implements support for substituting the contents of /dev with a minimal set of "safe" device nodes (full, null, ptmx, tty, urandom, zero, and the "pts" subdirectory), but if ebuilds expect to be able to find other device nodes beneath /dev, then this can't be used. I can think of no way, aside from the LD_PRELOAD wedge, to allow a privileged process to read from a device node but to prohibit it from writing to it. (Unprivileged processes respect the file permissions of device nodes, but privileged processes ignore them.)

If I can find some way to make open("/dev/foo", O_RDWR) fail for a privileged user while O_RDONLY succeeds, then I do think there could be a way for mount namespaces to replace all of the functionality of the LD_PRELOAD wedge except for the violation log. I'll have to ruminate on it.

Comment 8 YiFei Zhu 2024-11-28 15:46:07 UTC

> Unprivileged processes respect the file permissions of device nodes, but privileged processes ignore them.

How privileged? If you don't have CAP_DAC_OVERRIDE, the DAC is respected:

```
zhuyifei1999@zhuyifei1999-ThinkPad-P14s-Gen-4a ~ $ sudo -i
zhuyifei1999-ThinkPad-P14s-Gen-4a ~ # ls -l /dev/console
crw--w---- 1 root tty 5, 1 Nov 28 07:42 /dev/console
zhuyifei1999-ThinkPad-P14s-Gen-4a ~ # echo 1 > /dev/console
zhuyifei1999-ThinkPad-P14s-Gen-4a ~ # capsh --drop=cap_dac_override --
zhuyifei1999-ThinkPad-P14s-Gen-4a ~ # echo 1 > /dev/console
zhuyifei1999-ThinkPad-P14s-Gen-4a ~ # chown nobody:nogroup /dev/console
zhuyifei1999-ThinkPad-P14s-Gen-4a ~ # echo 1 > /dev/console
bash: /dev/console: Permission denied
```

Alternatively, since you are using userns anyways, CAP_DAC_OVERRIDE is useless if the IDs of the file are not mapped in the userns:

```
zhuyifei1999@zhuyifei1999-ThinkPad-P14s-Gen-4a ~ $ ls -l /dev/console
crw--w---- 1 root tty 5, 1 Nov 28 07:42 /dev/console
zhuyifei1999@zhuyifei1999-ThinkPad-P14s-Gen-4a ~ $ unshare -r
zhuyifei1999-ThinkPad-P14s-Gen-4a ~ # ls -l /dev/console
crw--w---- 1 nobody nobody 5, 1 Nov 28 07:42 /dev/console
zhuyifei1999-ThinkPad-P14s-Gen-4a ~ # echo 1 > /dev/console
-bash: /dev/console: Permission denied
```

Comment 9 Mike Gilbert gentoo-dev

2024-11-28 16:04:57 UTC

I think trying to hack together a "sandbox" using mount namespaces will be troublesome.

From a security perspective, I think it makes more sense to build binpkgs inside a full fledged container managed by docker, podman, systemd-nspawn, etc.

Comment 10 Michał Górny archtester

2024-11-28 16:52:00 UTC

> If there is interest in redesigning sandbox to make use of mount namespaces to
> implement access control (and in sunsetting the LD_PRELOAD wedge), I would be
> willing to work up a proof of concept.

The interest for a new and better sandbox has always been there (see, e.g. fusebox GSoC project).  However, it's a lot of work, and needs someone dedicated to do it all, integrate with Portage and maintain afterwards.

Also, note that sandbox is *not a security* thing.  It's a QA thing.  While I don't mind extending it, you need to make sure that either everything keeps working, or can be reasonably easily fixed.  Furthermore, having access violations reported rather than silently squandered is also a very useful feature.

Comment 11 Arsen Arsenović gentoo-dev

2024-11-28 18:30:21 UTC

FWIW I've already implemented part of that, but never got it integrated into portage: https://git.sr.ht/~arsen/sandbox2

In retrospect I might've overthought some issues here.

The implementation is rather simplistic but also had a goal of increasing build reliability by masking off parts of the environment not owned by a dependency package.  This led to it generating a whole new directory structure in a tmpfs.

(In reply to Matt Whitlock from comment #6)
> There are a couple of drawbacks I can see with this approach:
> 
> * It would no longer be possible for processes running inside the sandbox to
> add new exceptions on the fly by modifying environment variables. This,
> however, is arguably a benefit, not a drawback.
I don't see why - just needs some IPC.  Sure, it won't work by editing env, but we have a wrapper around the env editing anyway.
 
> * It becomes probably impossible (or at least very gross) to allow write
> access but not read access to a given subtree. (The possible bind mount
> modes are no-access, read-only, or read-write. There is no write-only mode.)
> I suspect, however, that essentially all use cases for sandbox are concerned
> only about restricting write access and do not care about restricting read
> access.
I see no reason why that'd be useful, anyway

> * There is no longer any provision for generating a log of sandbox violations or for failing an ebuild phase function that has violated the sandbox at some point during its execution. Sandbox violations would manifest solely as errors returned from system calls, and it would be up to the build process to translate those errors into failures (or not).
I think this is fine, personally.  Getting a more controlled build environment is good.  However, it is worth testing how fanotify would interact with this (I had intended to do that in sandbox2, but never got around to it).

Heck, I wouldn't mind if portage started using ostree.

Comment 12 Sam James archtester

2024-11-28 18:32:35 UTC

(In reply to Arsen Arsenović from comment #11) 
> > * There is no longer any provision for generating a log of sandbox violations or for failing an ebuild phase function that has violated the sandbox at some point during its execution. Sandbox violations would manifest solely as errors returned from system calls, and it would be up to the build process to translate those errors into failures (or not).
> I think this is fine, personally.  Getting a more controlled build
> environment is good.  However, it is worth testing how fanotify would
> interact with this (I had intended to do that in sandbox2, but never got
> around to it).

I think it's more of a problem through the lens of sandbox being a QA tool. Not all platforms (e.g. prefix) can use the sandbox(es). So if stuff is silently falling back, that might be a problem.

But it's also possible that it's not a real problem in reality and we're overthinking it.

I definitely do not want to discourage work on doing something better, as I think we're all agreed the status quo is poor.

Comment 13 Mike Gilbert gentoo-dev

2024-11-28 20:26:50 UTC

To satisfy the QA aspect, I wonder if it would be more efficient to utilize fanotify, or even dtrace.

We couldn't use that to block the system calls, but we could at least do some log analysis after each ebuild phase is complete.

Comment 14 Mike Gilbert gentoo-dev

2024-11-28 20:30:34 UTC

Oh, I see fanotify has already been mentioned. It looks like that also allows file access to be controlled via permission events.

Comment 15 Matt Whitlock 2024-11-28 20:45:11 UTC

(In reply to YiFei Zhu from comment #8)
> > Unprivileged processes respect the file permissions of device nodes, but privileged processes ignore them.
> 
> How privileged? If you don't have CAP_DAC_OVERRIDE, the DAC is respected

Right, but I'm not sure we can drop that capability since the principal use of sandbox is by root running the src_install phase. We need to be root (or at least have a complete set of filesystem capabilities) so we can manipulate file ownerships and permissions, set trusted extended attributes, create device nodes, etc. I could foresee some src_install functions failing if they don't have CAP_DAC_OVERRIDE.

> Alternatively, since you are using userns anyways, CAP_DAC_OVERRIDE is
> useless if the IDs of the file are not mapped in the userns

Yes, but root has to be mapped in the user namespace for all the aforementioned reasons. We need the user that is executing src_install to have all filesystem capabilities within ${D}, but we want it to have no filesystem capabilities outside of ${D}. Unfortunately, capabilities are a bitmask, not a rule set.

Very thoughtful comment, though. Thank you.

Comment 16 Matt Whitlock 2024-11-28 21:41:23 UTC

(In reply to Mike Gilbert from comment #14)
> Oh, I see fanotify has already been mentioned. It looks like that also
> allows file access to be controlled via permission events.

fanotify looks like it provides functionality that could exactly match the features of the current sandbox when combined with a mount namespace to constrain the events to those generated by the sandboxed processes only. The only hole I see is: if a sandboxed process creates a new mount, then we won't receive events for objects accessed via that new mount. fanotify does not allow us to deny creating new mounts or even to detect when a new mount is created. If we are willing to drop CAP_SYS_ADMIN inside the sandbox, then creating new mounts would be disallowed. (I don't see any reason why any ebuild would need CAP_SYS_ADMIN, so I think this is feasible.)

Comment 17 Sam James archtester

2024-11-28 22:58:44 UTC

(In reply to Mike Gilbert from comment #13)
> To satisfy the QA aspect, I wonder if it would be more efficient to utilize
> fanotify, or even dtrace.
> 
> We couldn't use that to block the system calls, but we could at least do
> some log analysis after each ebuild phase is complete.

Interesting: https://github.com/oracle/solaris-userland/blob/f9be47ec2478f540cd09daa3e083f94355999703/tools/build-watch.d.

Comment 18 Sam James archtester

2024-11-28 23:03:14 UTC

(In reply to Matt Whitlock from comment #16)

This sounds OK to me.

Comment 19 Matt Whitlock 2024-11-28 23:31:28 UTC

(In reply to Matt Whitlock from comment #16)
> fanotify looks like it provides functionality that could exactly match the
> features of the current sandbox

I may have spoken too soon. I'm not sure fanotify can provide notification of metadata changes. For example, if the `install` recipe of a well-meaning Makefile does `chmod a+rw /dev/dri/card[0-9]*`, I'm not sure fanotify would let us intercept that and deny it or even notice that it happened. We can use the read-only mount trick to prevent metadata changes outside of allowed subtrees, but that won't provide any audit log for QA purposes. Dtrace may be the only way to go.

Comment 20 Eli Schwartz gentoo-dev

2024-11-28 23:33:53 UTC

It bears repeating, as many times as conceivably possible, that sandbox is not a security function; it is a QA function.

Any and all proposals to implement security functions are at best something that has to demonstrate it doesn't decrease the usefulness of a QA tool, and at worst a strawman waste of time.

There are many things that could be done, undoubtedly, to increase the usefulness as a QA tool and which have a side effect of having useful security properties. No one is saying that the current design of sandbox is the best possible world.

I just don't get why people keep on harping about security. It's always a distracting time sink away from being productive about the topic. Taking away the ability to define certain access patterns as acceptable ***from within a so-called "untrusted" ebuild*** doesn't seem like a reasonable objective to me. And removing the ability to generate violation logs renders the entire thing *completely* pointless. Not mostly pointless -- completely pointless. If I just wanted to stop malicious ebuilds from attacking my system I would simply build packages in docker and wait to get hacked by pkg_postinst, or, you know, have it compile a malicious executable and install it to /usr/bin (can be combined with a pkg_postinst that dodges suspicion from casual readers by claiming it needs to rebuild an index the way XDG desktop files or vim help docs do).

If that's not good enough for you exherbo is right around the corner waiting for you. I hear they have a very nice OCI runtime alternative that you can use with docker/podman to replace runc/crun, which they use to replace `sandbox` as well. Literally docker for your ebuilds. Enjoy.

Comment 21 Eli Schwartz gentoo-dev

2024-11-28 23:43:33 UTC

On the topic of running src_install as root, you could always just not do that. There are other package managers that routinely use `fakeroot` to run ALL package builds where the install phase is bamboozled into thinking it runs as root, and metadata such as file permissions and whatnot are tracked by the fakeroot environment and passed through to tar at the time of creating a binpackage.

That's actually the only reason you need root at install time by the way -- so that you can translate Makefiles that want to set file ownership etc into a permission set that portage can apply itself e.g. when writing out tarfile metadata or reading it in. (All processes running under fakeroot remember the bamboozled set of file permissions which bamboozled processes are creating or modifying.)
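The bookkeeping idea described here, recording the ownership a build asked for and applying it only when the archive is written, can be sketched without root. The Python below is a hedged illustration and not how fakeroot is implemented: the `FakeOwnership` class and `build_archive` helper are hypothetical names, and only the final "pass it through to tar" step is modelled.

```python
import io
import tarfile


class FakeOwnership:
    """Remember requested ownership and modes, keyed by archive path."""

    def __init__(self):
        self._db = {}

    def chown(self, arcname, uid, gid):
        self._db.setdefault(arcname, {}).update(uid=uid, gid=gid)

    def chmod(self, arcname, mode):
        self._db.setdefault(arcname, {})["mode"] = mode

    def tar_filter(self, info):
        """Apply the recorded metadata to a TarInfo before it is written."""
        record = self._db.get(info.name, {})
        if "uid" in record:
            info.uid, info.gid = record["uid"], record["gid"]
            # Clear the names so the numeric IDs are what gets stored.
            info.uname = info.gname = ""
        if "mode" in record:
            info.mode = record["mode"]
        return info


def build_archive(files, ownership):
    """Pack (source_path, archive_name) pairs into an in-memory tarball."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for source, arcname in files:
            tar.add(source, arcname=arcname, filter=ownership.tar_filter)
    return buffer.getvalue()
```

An unprivileged build could then call `ownership.chown("usr/bin/tool", 0, 0)` and `ownership.chmod("usr/bin/tool", 0o4755)`, and the resulting archive would carry root ownership and the setuid bit even though the file on disk never did.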

Root processes can of course get permission errors -- and so can fakeroot. A fakeroot install that tries to install directly to /usr will fail with permission denied, whether or not it tries to escape the sandbox at all.

Comment 22 Matt Whitlock 2024-11-29 00:01:15 UTC

Alright, Eli. You've successfully killed my interest in contributing my sandbox improvements back to Gentoo. I'll shut up and go away after this comment.

If Gentoo's "sandbox" is only meant as a QA tool, then it's misleadingly named. A "sandbox," as other applications (e.g., Chromium) use the term, *is* a security container first and foremost, meant to prevent untrusted application code from escaping the container.

It's not that I don't trust the ebuilds or their authors' intentions; it's that I don't trust the upstream projects not to introduce subtle, *well-meaning* changes in their build scripts that will nevertheless compromise the cleanliness of my system, and I don't trust the Gentoo package maintainers to notice when such subtle changes are introduced. (Of course sandbox cannot thwart an upstream project that introduces *malicious* code, but that's not the threat model that I expect sandbox to address.)

Comment 23 Sam James archtester

2024-11-29 00:10:04 UTC

(In reply to Matt Whitlock from comment #22)
> 
> If Gentoo's "sandbox" is only meant as a QA tool, then it's misleadingly
> named. A "sandbox," as other applications (e.g., Chromium) use the term,
> *is* a security container first and foremost, meant to prevent untrusted
> application code from escaping the container.
> [...]

That is what we're saying when we call it a QA tool though. We're just saying it's not supposed to be robust against malicious upstream projects.

That said, I'm not sure why Eli commented that -- to me, it looked like the discussion wasn't fixating on it being a security tool at all. I assume he was just responding to Mike's earlier remark.

Comment 24 Sam James archtester

2024-11-29 00:10:59 UTC

(I quoted the wrong part there -- obviously I meant your last sentence. Anyway, I don't really think there's any actual disagreement here.)

Comment 25 Sam James archtester

2024-11-29 00:18:05 UTC

(In reply to Matt Whitlock from comment #22)
> Alright, Eli. You've successfully killed my interest in contributing my
> sandbox improvements back to Gentoo. I'll shut up and go away after this
> comment.

I hope you can reconsider. The discussion was taking a productive direction.

Comment 26 Eli Schwartz gentoo-dev

2024-11-29 00:41:25 UTC

(In reply to Matt Whitlock from comment #22)
> If Gentoo's "sandbox" is only meant as a QA tool, then it's misleadingly
> named. A "sandbox," as other applications (e.g., Chromium) use the term,
> *is* a security container first and foremost, meant to prevent untrusted
> application code from escaping the container.

It is not misleadingly named! Your definition of sandbox isn't the common definition of sandbox.


https://en.wikipedia.org/wiki/Sandbox_(software_development)

"Not to be confused with Sandbox (computer security)"

> A sandbox is a testing environment that isolates untested code changes
> and outright experimentation from the production environment or [...]

> Sandboxing (see also 'soft launching') is often considered a best practice
> when making any changes to a system, regardless of whether that change is
> considered 'development', a modification of configuration state, or updating the system.

> Wikis also typically employ a shared sandbox model of testing, though it is
> intended principally for learning and outright experimentation with features

https://en.wikipedia.org/wiki/Sandbox_(computer_security)

"This article is about the computer security mechanism. For the software testing environment, see Sandbox (software development)."

> In computer security, a sandbox is a security mechanism for separating running
> programs, usually in an effort to mitigate system failures and/or software
> vulnerabilities from spreading.

And as a matter of curiosity, consider that the sandbox package in gentoo is a very old project. It was started in 2002, and already had the name "sandbox". It was intended to test software. It's half a decade older than Chromium, and used the term "sandbox" long before Chromium did.

And that's the thing. A "sandbox" in computer science is just like a sandbox in children's playtime: a place where you can play and experiment. Sandboxes can be *used* for security, and security is one of the most interesting things you can do with a sandbox, to most people, but a sandbox isn't about security.

A "security sandbox" is about security. You *have* to use both words together.



(In reply to Matt Whitlock from comment #22)
> It's not that I don't trust the ebuilds or their authors' intentions; it's
> that I don't trust the upstream projects not to introduce subtle,
> *well-meaning* changes in their build scripts that will nevertheless
> compromise the cleanliness of my system, and I don't trust the Gentoo
> package maintainers to notice when such subtle changes are introduced. (Of
> course sandbox cannot thwart an upstream project that introduces *malicious*
> code, but that's not the threat model that I expect sandbox to address.)


And yet, the sandbox as it is today does a pretty good job of handling the cases where subtle but well meaning changes are introduced! It's just really slow at it sometimes.

How do you circumvent the gentoo sandbox? Well, there are two main ways:

- delete LD_PRELOAD
- set variables named SANDBOX_WRITE etc.

No well meaning build script will do the latter, and generally no well-meaning build script will do the former either (and it will also break other distributions that use fakeroot).

- statically linked executables

QA violation in its own right. Also, probably no well meaning build script will compile throwaway helper scripts and take care to statically link them just to throw them away after the build completes.

- or code that directly invokes filesystem syscalls without going through libc wrapper functions

Is this rooted in a practical scenario? Because there's a pretty good reason even every brand new language that bundles all its dependencies and compiles down to static binaries still dynamically links to libc. Especially, once again, which well meaning build script compiles its own throwaway helper programs that directly invoke filesystem syscalls like that during the build?

The cause for concern here isn't that this is going to circumvent the sandbox, the cause for concern here is that it's just not at all portable and will break horribly on anything other than an amd64 linux.


(In reply to Matt Whitlock from comment #6)
> There are a couple of drawbacks I can see with this approach:
> 
> * It would no longer be possible for processes running inside the sandbox to
> add new exceptions on the fly by modifying environment variables. This,
> however, is arguably a benefit, not a drawback.


Well clearly if it's a benefit to deny the ability for foo-1.0.ebuild to add new exceptions on the fly as the package maintainer sees fit, this is an argument that the focus is on security against malicious software, not QA.

Also, PMS appears to require that code running inside/as a child of bash when invoking src_install() shall be permitted to do this, so violating PMS is definitely a drawback. I suppose one could propose to have it removed in EAPI 9 and then use different sandboxes depending on EAPI.

All this notwithstanding, it is indeed very useful to have a better, more capable sandbox that allows code inside of an ebuild to more accurately communicate its PMS-granted intent to the parent sandbox, in ways that are both faster and more robust to strange environments.

Comment 27 Sam James archtester

2025-02-23 03:21:44 UTC

commit 8b95a0077380fc24dfd3839a3eec586a0b2609d9
Author: Mike Gilbert <floppym@gentoo.org>
Date:   Wed Jan 22 20:37:22 2025 -0500

    Rework path manipulation code

    Drop erealpath, canonicalize, and all of gnulib.

    New functions:

    sb_abspathat computes an absolute path logically with no symlink
    handling.

    sb_realpathat uses openat(..., O_PATH) and /proc/self/fd to resolve
    paths using the kernel.

    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

That commit, which in part solved bug 925032 (and the point it was making about ideally dropping gnulib), may help a bit here.
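To illustrate the two strategies named in the commit message, here is a rough Python analogue. It is not the sandbox implementation: `abspath_logical` and `realpath_via_kernel` are hypothetical names, and the sketch assumes Linux with /proc mounted.

```python
import os


def abspath_logical(dirpath, path):
    """Join and normalise lexically, never touching the filesystem.

    Symlinks are not consulted, so 'link/..' simply cancels out.
    """
    return os.path.normpath(os.path.join(dirpath, path))


def realpath_via_kernel(path, dir_fd=None):
    """Let the kernel resolve the path, then read the result back.

    Opening with O_PATH resolves symlinks without opening the file
    for I/O; /proc/self/fd/<n> is then a link to the resolved path.
    """
    fd = os.open(path, os.O_PATH, dir_fd=dir_fd)
    try:
        return os.readlink(f"/proc/self/fd/{fd}")
    finally:
        os.close(fd)
```

The second function replaces a per-component userspace walk with one `open` and one `readlink`. It fails with ENOENT on a path that does not exist yet, which a real implementation would have to handle separately.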