Summary: | sys-apps/sandbox: rust's direct use of clone3 syscall leads to internal sandbox deadlocks | ||
---|---|---|---|
Product: | Portage Development | Reporter: | 12101111 <w12101111> |
Component: | Sandbox | Assignee: | Sandbox Maintainers <sandbox> |
Status: | CONFIRMED --- | ||
Severity: | normal | CC: | flow, jistone, O01eg, rust, sam, toolchain |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://github.com/rust-lang/rust/issues/89522 | ||
See Also: | https://github.com/rust-lang/rust/issues/89522 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 803482 |
Description
12101111
2021-10-05 09:45:07 UTC
As far as I can tell, this happens because the sandbox tool has a global lock, and hooks fork to acquire that lock before forking and drop it afterwards (so that fork doesn't happen while another thread holds the lock), but does not have a similar hook for clone or clone3. It's possible to create another process using clone or clone3 (not just a thread), if the flags do not include CLONE_VM. I think the right fix is to hook clone and clone3, and if the flags do *not* include CLONE_VM, use the same lock/unlock logic. (Note that since clone3 is a syscall and not a library call, library interposing wouldn't help here. It may be necessary to do external ptracing on programs that use clone3.) (In reply to Josh Triplett from comment #1) yes, sandbox relies on the program not making syscalls directly and forking its own processes. if it does that, all bets are off. considering how painful it has been to use the clone syscall, no one has bothered, especially when portable code already has access to everything it needs via the fork() & pthread C library APIs. (In reply to Josh Triplett from comment #2) we currently only ptrace static programs. as painful as interposing is, it's still less flaky than ptrace, especially on some architectures. while detecting if a particular ELF is static is trivial, detecting if it uses a specific syscall is practically impossible. it'd prob help to have someone dig a bit into rust/cargo source to see where the clone3 call is coming from. if it were glibc, our existing symbol interposition should have handled it. there is a clone C library wrapper we could/should interpose and check the flags, but i suspect that isn't the source of the trouble here. the clone function still hits the clone syscall only, not clone3. the only way glibc itself hits clone3 is via __clone_internal, and that's only used with pthread & spawn APIs. so if rust itself is hitting clone3 syscall directly via its internal code, we're a bit buggered. we could maybe add an env var hook to sandbox to force it to use its ptrace mode, and rust ebuilds/eclasses could set that. rust 1.56 is indeed doing a direct clone3 syscall: https://github.com/rust-lang/rust/pull/81825/files#diff-7015a38ee6056bbfa832b33281ffeaad5531c4dbfaff60ddfce0934475e040f4R163 Also, while I'm not too familiar with Rust, this change appears to be in the rust standard library (or whatever they call it) - so it would affect anything built with rust, not just cargo. I've already run into a related issue trying to do profile-guided optimization in firefox (which runs firefox in a virtual X server during build, to generate profile data - it hangs forever, just like cargo). I did manage to "fix" cargo by patching this line to default "HAS_CLONE3" to false rather than true: https://github.com/rust-lang/rust/blob/5e02151318ddd431aea6d58e23948246c1446044/library/std/src/sys/unix/process/process_unix.rs#L147 Looks like they disabled this feature (in most cases) in 1.56 final, so hopefully it won't be an issue: https://github.com/rust-lang/rust/pull/89924 There's still a lot of discussion in the original bug though ( https://github.com/rust-lang/rust/issues/89522 ), so it's not clear what's going to happen in future versions. we'll have to rework the sandbox startup logic so we only run in ptrace mode. i'm not sure how much cutting this will take though. first we'll have to fix a long standing bug where ptrace only works one-level deep. if the traced process forks, we don't keep track of any of those descendants. i have this somewhat working. FWIW, Rust's "direct" clone3 syscall is actually using libc's syscall (2), so in theory you could interpose that and decode number==SYS_clone3. I don't know if it's worth the effort though, because I think Rust is going to have to stop using clone3 anyway, for all the libc arguments against it. (In reply to Josh Stone from comment #8) thanks ... i had been including syscall() here, but you're certainly right we could interpose that interface too. a cursory scan of my system shows that not too many programs go through that, so it shouldn't (hopefully) be too much overhead if we want to go that route. looks like workaround made in in 1.56.0 i have seccomp+ptrace working locally, and i'll prob ship it in sandbox-2.28. waiting to hear back on the new 2.27 release which includes NNP enabled (and is a requirement for seccomp usage). however, ptrace is still only used when running static or set*id programs, it isn't used all the time. will have to do more work to pull the ptrace logic out into the main code path as an option (vs using LD_PRELOAD at all). we also have the problem that the ptrace support hasn't been ported to all architectures. i did it for a bunch that i had easy access to via Gentoo dev machines, but that's not all of them. so i think the LD_PRELOAD logic will be with us for quite some time. also need to see what kind of perf hit this takes overall. or maybe it's a win. (In reply to SpanKY from comment #11) > i have seccomp+ptrace working locally, and i'll prob ship it in > sandbox-2.28. waiting to hear back on the new 2.27 release which includes > NNP enabled (and is a requirement for seccomp usage). > yeah, it's been fine so far, but giving it a bit longer is a good idea. > we also have the problem that the ptrace support hasn't been ported to all > architectures. i did it for a bunch that i had easy access to via Gentoo > dev machines, but that's not all of them. so i think the LD_PRELOAD logic > will be with us for quite some time. > which are you missing? sandbox arch list: https://gitweb.gentoo.org/proj/sandbox.git/tree/libsandbox/trace/linux?h=v2.27 Gentoo arch list: https://gitweb.gentoo.org/repo/gentoo.git/tree/profiles/arch.list at this point, i'm considering implementing generic layers in the kernel and sending those upstream. linux-5.3 has PTRACE_GET_SYSCALL_INFO which generalizes the reading side of the equation, but there's no generic "set" operation for forcing an error. then i wouldn't have to fight with sandbox always being behind the curve with newer Linux ports. updated port status: * i got an arm64 laptop got that mostly working (i think i hit a kernel bug) * i got access to the sparc dev box and fixed the sparc32 & sparc64 ports * i fixed the ppc64 ptrace logic the big ones we're still missing are mips and maybe riscv. also m68k & sh, but we know those are lesser used. talking with some folks about getting a common set ptrace API in the kernel. optimistic that it'll happen, but it'll be a while before we can rely on it. relevant ptrace discussion in case people are interested: https://lists.strace.io/pipermail/strace-devel/2021-October/010745.html but at this point, i don't think it's a blocker for progress with most arches. sandbox-3.0 is out w/support for tracing of children of static programs. that's going to need a while to bake i think. rust thing ported to beta branches, so it's not only in 1.56* now. https://github.com/rust-lang/rust/pull/90938 so what's the status here regarding glibc-2.34? pretty sure they reverted all clone3 related changes, so it shouldn't be a blocker anymore. but i don't know what rust releases, if any, it's part of. rust is ok, all affected releases are gone. one currently in the tree is not affected. (In reply to SpanKY from comment #19) > pretty sure they reverted all clone3 related changes, so it shouldn't be a > blocker anymore. but i don't know what rust releases, if any, it's part of. (In reply to Georgy Yakovlev from comment #20) > rust is ok, all affected releases are gone. one currently in the tree is not > affected. OK unblocking then, thanks |