Summary: | sys-apps/portage uses /var/tmp insecurely | ||
---|---|---|---|
Product: | Gentoo Security | Reporter: | herypt |
Component: | Vulnerabilities | Assignee: | Gentoo Security <security> |
Status: | IN_PROGRESS --- | ||
Severity: | normal | CC: | 1i5t5.duncan, ago, dev-portage, floppym, mgorny, mjo, sam, toralf |
Priority: | Normal | Keywords: | PullRequest |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: |
https://github.com/gentoo/gentoo/pull/27871 https://github.com/gentoo/portage/pull/926 https://bugs.gentoo.org/show_bug.cgi?id=891863 https://bugs.gentoo.org/show_bug.cgi?id=603222 |
||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
herypt
2022-06-21 07:15:52 UTC
Adding another Portage maintainer I think this behavior is pretty well known, and I see no reason to treat this bug as "confidential". The permissions/owner get reset whenever portage.util.ensure_dirs() is called with PORTAGE_TMPDIR as an argument.
For example, this happens in _emerge.EbuildBuild.async_lock().
Example call trace:
(Pdb) bt
/home/floppym/src/portage/bin/emerge(60)<module>()
-> retval = emerge_main()
/home/floppym/src/portage/lib/_emerge/main.py(1294)emerge_main()
-> return run_action(emerge_config)
/home/floppym/src/portage/lib/_emerge/actions.py(4065)run_action()
-> retval = action_build(emerge_config, spinner=spinner)
/home/floppym/src/portage/lib/_emerge/actions.py(677)action_build()
-> retval = mergetask.merge()
/home/floppym/src/portage/lib/_emerge/Scheduler.py(1149)merge()
-> rval = self._merge()
/home/floppym/src/portage/lib/_emerge/Scheduler.py(1599)_merge()
-> self._main_loop()
/home/floppym/src/portage/lib/_emerge/Scheduler.py(1551)_main_loop()
-> self._event_loop.run_until_complete(self._main_exit)
/home/floppym/src/portage/lib/portage/util/_eventloop/asyncio_event_loop.py(129)_run_until_complete()
-> return self._loop.run_until_complete(future)
/usr/lib/python3.11/asyncio/base_events.py(637)run_until_complete()
-> self.run_forever()
/usr/lib/python3.11/asyncio/base_events.py(604)run_forever()
-> self._run_once()
/usr/lib/python3.11/asyncio/base_events.py(1909)_run_once()
-> handle._run()
/usr/lib/python3.11/asyncio/events.py(80)_run()
-> self._context.run(self._callback, *self._args)
/home/floppym/src/portage/lib/_emerge/AsynchronousTask.py(210)_exit_listener_cb()
-> listener(self)
/home/floppym/src/portage/lib/_emerge/EbuildBuild.py(122)_start_with_metadata()
-> self._prefetch_exit(prefetcher)
/home/floppym/src/portage/lib/_emerge/EbuildBuild.py(210)_prefetch_exit()
-> AsyncTaskFuture(future=self._build_dir.async_lock()), self._start_pre_clean
/home/floppym/src/portage/lib/_emerge/EbuildBuildDir.py(107)async_lock()
-> portage.util.ensure_dirs(
/home/floppym/src/portage/lib/portage/util/__init__.py(1616)ensure_dirs()
-> perms_modified = apply_permissions(dir_path, **kwargs)
> /home/floppym/src/portage/lib/portage/util/__init__.py(1221)apply_permissions()
-> modified = False
(Pdb) p filename
'/x/portage'
We would probably need to make some adjustment to ensure_dirs() or apply_permissions().
(In reply to Mike Gilbert from comment #2) > I think this behavior is pretty well known, and I see no reason to treat > this bug as "confidential". Unchecking this bug as private then. An easy workaround for this would be to install a tmpfiles snippet to create /var/tmp/portage on boot and prevent it from being cleaned. (In reply to Mike Gilbert from comment #5) > An easy workaround for this would be to install a tmpfiles snippet to create > /var/tmp/portage on boot and prevent it from being cleaned. I don't think the PR is enough. The directory /var/tmp/portage is still subject to being cleaned by systemd, or anything else really -- the FHS explicitly allows it. And as soon as the directory goes away, the exploit is available again. Note also that running tmpfiles_process() during (re)installs might not be safe here. I don't remember any of the details, but it's working in a directory with unusual security properties, and the systemd docs specifically warn about it: (You can get away with using guessable names, if you pre-create subdirectories below /tmp/ for them, like X11 does with /tmp/.X11-unix/ through tmpfiles.d/ drop-ins. However this is not recommended, as it is fully safe only if these directories are pre-created during early boot, and thus problematic if package installation during runtime is permitted.) (being cheeky and CCing you as I want your input on the bug generally ;)) (In reply to Michael Orlitzky from comment #6) > (In reply to Mike Gilbert from comment #5) > > An easy workaround for this would be to install a tmpfiles snippet to create > > /var/tmp/portage on boot and prevent it from being cleaned. > > I don't think the PR is enough. The directory /var/tmp/portage is still > subject to being cleaned by systemd It's not subject to cleaning by systemd if we install a tmpfiles snippet that disables said cleaning. > > Note also that running tmpfiles_process() during (re)installs might not be > safe here. If the user is currently running emerge to install sys-apps/portage, and /var/tmp/portage has already been compromised, they are screwed anyway and running tmpfiles_process will not hurt them any further. (In reply to Mike Gilbert from comment #8) > (In reply to Michael Orlitzky from comment #6) > > Note also that running tmpfiles_process() during (re)installs might not be > > safe here. > > If the user is currently running emerge to install sys-apps/portage, and > /var/tmp/portage has already been compromised, they are screwed anyway and > running tmpfiles_process will not hurt them any further. That said, running tmpfiles_process is also rather pointless, since /var/tmp/portage must already exist if we are executing ebuild code. I will adjust the PR to remove it. (In reply to Mike Gilbert from comment #8) > > It's not subject to cleaning by systemd if we install a tmpfiles snippet > that disables said cleaning. Yep, I didn't read the tmpfiles spec closely enough. But now I'm realizing another problem: this directory isn't fixed, it's controlled by $PORTAGE_TMPDIR at runtime. So, for example, the people who put it under /tmp won't be helped by hard-coding a fix for /var/tmp/portage. Obviously we can't fix all security issues that result from a poor (user) choice of PORTAGE_TMPDIR, but we should probably aim to make it safe in any situation that is not already inherently insecure. At the simple-but-disruptive end of the spectrum, we could give up on the idea of using a single hierarchical directory for all builds. If instead we had one temporary directory for each build, portage could use the python facilities for creating that temporary directory on the fly, passing dir=PORTAGE_TMPDIR. That should be secure anywhere that anything is secure. I'm not quite sure what exists at the other end. With a great deal of effort, it's possible in C and on Linux (systemd-tmpfiles does it) to create/chmod/chown things safely in a hostile directory. But I don't think it can be done in python, and (if my knowledge is not outdated) it relies on the particulars of the Linux /proc implementation as well as several non-default kernel options. I think it might make sense to update the ensure_dirs function to raise an error if the target directory already exists and the current owner is neither root (uid 0) nor the uid passed to the function via kwargs. The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=d9eeba71d686e10711cba09e547ed5e20d8c6bfb commit d9eeba71d686e10711cba09e547ed5e20d8c6bfb Author: Mike Gilbert <floppym@gentoo.org> AuthorDate: 2022-10-20 15:45:51 +0000 Commit: Mike Gilbert <floppym@gentoo.org> CommitDate: 2022-10-21 14:32:08 +0000 sys-apps/portage: install tmpfiles.d to create /var/tmp/portage This ensures the directory is created at boot, and prevents other users from creating it before emerge is run. Bug: https://bugs.gentoo.org/853283 Closes: https://github.com/gentoo/gentoo/pull/27871 Signed-off-by: Mike Gilbert <floppym@gentoo.org> sys-apps/portage/files/portage-tmpdir.conf | 1 + .../portage/{portage-3.0.38.1.ebuild => portage-3.0.38.1-r1.ebuild} | 2 +- sys-apps/portage/portage-9999.ebuild | 2 +- 3 files changed, 3 insertions(+), 2 deletions(-) (In reply to Mike Gilbert from comment #11) > I think it might make sense to update the ensure_dirs function to raise an > error if the target directory already exists and the current owner is > neither root (uid 0) nor the uid passed to the function via kwargs. That's part of what I'm talking about, but you have to be careful not to e.g. follow symlinks because I can create a symlink that makes /var/tmp/portage look good for a split second, and then delete it and recreate it as directory that I own. Enumerating badness is hard to get right, and relies on the cleverest hacker being on your team. Another thing to think about is, when portage crashes, what are you going to tell people to do to fix the issue? Is *that* safe? (In reply to Michael Orlitzky from comment #13) > Another thing to think about is, when portage crashes, what are you going to > tell people to do to fix the issue? Is *that* safe? If a random user owns PORTAGE_TMPDIR and the sysadmin doesn't know how that happened, they probably need to quarantine and audit the system. If they created the directory with the wrong owner by accident, it should be safe to just remove it. I'm not sure it's really feasible make ensure_dirs() "safe" while still allowing random unprivileged users to run commands like ebuild(1). It makes my head hurt. Short of revamping PORTAGE_TMPDIR entirely, perhaps we might document that changing PORTAGE_TMPDIR carries security implications and advise the user to take care. (In reply to Mike Gilbert from comment #15) > > Short of revamping PORTAGE_TMPDIR entirely How annoying would this really be? It's easy, "obviously correct," and would simplify portage a bit. When I need to inspect a failed build, it's usually interactively, so personally I wouldn't mind portage telling me to look in /var/tmp/portage-wwgpae9g or whatever. And PORTAGE_LOGDIR let's you put the build logs, at least, somewhere predictable. Mainly I'd be worried about annoying people like Agostino and Toralf who (in an automated fashion) need to retrieve things like config.log. But I also don't think it would be too huge of a hurdle to parse the output from portage to figure out where to look for them. > perhaps we might document that changing PORTAGE_TMPDIR carries security implications and advise the user to take care. Even the default is unsafe if something other than systemd is doing the clean up, such as app-admin/tmpreaper. Using a random temp directory for every portage invocation makes resuming builds impossible. Another possible solution would be to move the default PORTAGE_TMPDIR out of /var/tmp and into a location where random users don't have write access. (In reply to Mike Gilbert from comment #17) > Using a random temp directory for every portage invocation makes resuming > builds impossible. I think we would need to provide some additional command line option to manually specify the actual build directory to support resuming. (In reply to Mike Gilbert from comment #17) > Using a random temp directory for every portage invocation makes resuming > builds impossible. This could be worked around by (say) saving a package -> directory map outside of /var/tmp, but raises a good point: how long do we expect these directories to stick around? For ebuild(1) to work as promised, we likely shouldn't be treating them as temporary in the first place... > Another possible solution would be to move the default PORTAGE_TMPDIR out of > /var/tmp and into a location where random users don't have write access. ...making this a good idea. Specifically, we will run into trouble in any situation where ebuild(1) wants to reuse a directory but enough time has passed that the directory could have been cleaned. Ok, so where exactly to put it? :) Somewhere under /var makes sense. Nothing in FHS really fits perfectly thoguh. https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05.html (In reply to Mike Gilbert from comment #20) > Ok, so where exactly to put it? :) > > Somewhere under /var makes sense. Nothing in FHS really fits perfectly > thoguh. > > https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05.html /var/lib/portage/tmp would be my first instinct. But that's the easy part. Are we going to remove PORTAGE_TMPDIR, or will we still allow people to set it to, say, /tmp where it will be insecure? The docs allude to that in various places, and there are good reasons to do it. There's ultimately a conflict between two goals: 1. Being able to re-access the temporary build directory at a later time based on the package name alone. 2. Being able (and having a good reason to) put PORTAGE_TMPDIR under /tmp or /var/tmp. While (1) is important, I think it's only developers who use it regularly. And those developers would be relatively well equipped to jump through some hoops if we have to break things to fix this. On the other hand (2) is an end-user feature, likely used all over the place, and is the security-sensitive bit. The goals are mutually exclusive: you can't re-access a path by name under /tmp after a "significant" amount of time has passed, even if the name was originally chosen randomly. So, for example, we can't allow developers to put PORTAGE_TMPDIR=/tmp and somehow still use `ebuild compile` safely. Should we make people choose between (1) and (2)? One option that comes to mind is to have the default be PORTAGE_TMPDIR=<unset>, which would use /var/lib/portage/tmp with a predictable naming scheme. Then if PORTAGE_TMPDIR is set, we'd use a secure naming scheme, breaking `ebuild` in the process. In effect we'd be making developers use the default if they want `ebuild` to be able to pick up where it left off, but making sure that end-users get security out of the box with the new version, regardless of their existing configuration. I'm just thinking out loud, this will probably have to rattle around in the back of my head for a while longer before it all makes sense. (In reply to Michael Orlitzky from comment #21) > The goals are mutually exclusive: you can't re-access a path by name under > /tmp after a "significant" amount of time has passed, even if the name was > originally chosen randomly. So, for example, we can't allow developers to > put PORTAGE_TMPDIR=/tmp and somehow still use `ebuild compile` safely. > Should we make people choose between (1) and (2)? If the goal is to use a tmpfs for building, users could mount a separate tmpfs with a more secure mode. For example, I have PORTAGE_TMPDIR="/x" and this entry in fstab on my main Gentoo system: > tmpfs /x tmpfs mode=0770,uid=root,gid=portage,size=20g 0 0 (In reply to Mike Gilbert from comment #22) > > If the goal is to use a tmpfs for building, users could mount a separate > tmpfs with a more secure mode. For example, I have PORTAGE_TMPDIR="/x" and > this entry in fstab on my main Gentoo system: In concert with a free-form PORTAGE_TMPDIR, I feel like this is a cop-out, because what tempfile security issues can't be avoided by giving each program its own dedicated tmpfs? We're still placing the pitfall at users' feet and expecting them to know that (a) there is a problem and (b) how to avoid it. But, as a response to "I want to use tmpfs instead of the secure default," I think it's great. Could we get away with deprecating PORTAGE_TMPDIR, switching the default to /var/lib/portage/tmp, and documenting how to mount a tmpfs at that location for the people who want a tmpfs? The same approach should be able to handle network drives, and with bind mounts, almost anything else. There are still some questions, * How and when does /var/lib/portage/tmp get cleaned? * Is this flexible enough for ccache, prefix, and everything else that PORTAGE_TMPDIR gets used for? But I think I like the idea. It's secure out of the box, and applies retroactively to people who have PORTAGE_TMPDIR already set. We could emit a warning if PORTAGE_TMPDIR is defined, so that no one is left with a silently broken config. The code changes should be easy enough and actually simplify portage. And if you really want to, the door is still open to shoot yourself in the foot with bind mounts or symlinks (or simply by hacking the source). I hate that I'm even commenting this, but it's worth keeping in mind that /var/lib/portage/tmp is longer than /var/tmp/portage, so we'll hit even more of those annoying path limit length bugs in test suites with socket names, etc. (In reply to Sam James from comment #24) > I hate that I'm even commenting this, but it's worth keeping in mind that > /var/lib/portage/tmp is longer than /var/tmp/portage, so we'll hit even more > of those annoying path limit length bugs in test suites with socket names, > etc. As payment for this, you would receive consistency: the path will always be /var/lib/portage/tmp, and not whatever monster the user has entered in PORTAGE_TMPDIR. I'm not on board with removing the ability for users to override PORTAGE_TMPDIR. We can document the security implication, and if they choose to shoot themselves in the foot that's on them. Anyway, I'm getting kind of tired of going back and forth on this bug. The issue originally reported has been addressed via the tmpfiles.d drop-in. sys-apps/portage is no longer vulnerable in its default configuration. Patches to make Portage more secure are always welcome, provided they don't remove existing flexibility. My suggestion to the security team for any potential GLSA would be: 1. Advise users not to not remove /var/tmp/portage via automated cleanup scripts like tmpreaper. 2. Advise users take care when overriding PORTAGE_TMPDIR. The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c046ee45bcc9d2e6164b70f73acf7074e03d31f1 commit c046ee45bcc9d2e6164b70f73acf7074e03d31f1 Author: Mike Gilbert <floppym@gentoo.org> AuthorDate: 2022-10-26 00:18:33 +0000 Commit: Mike Gilbert <floppym@gentoo.org> CommitDate: 2022-10-26 00:20:24 +0000 sys-apps/portage: update tmpfiles.d to create /tmp/portage This is common alternate location for PORTAGE_TMPDIR. Bug: https://bugs.gentoo.org/853283 Signed-off-by: Mike Gilbert <floppym@gentoo.org> sys-apps/portage/files/portage-tmpdir.conf | 1 + .../portage/{portage-3.0.38.1-r1.ebuild => portage-3.0.38.1-r2.ebuild} | 0 2 files changed, 1 insertion(+) (In reply to Mike Gilbert from comment #27) > Anyway, I'm getting kind of tired of going back and forth on this bug. > > The issue originally reported has been addressed via the tmpfiles.d drop-in. > sys-apps/portage is no longer vulnerable in its default configuration. > It's still vulnerable when something other than systemd is doing the cleanup. (In reply to Michael Orlitzky from comment #29) > It's still vulnerable when something other than systemd is doing the cleanup. We can't control the actions of users. (In reply to Mike Gilbert from comment #22) > If the goal is to use a tmpfs for building, users could mount a separate > tmpfs with a more secure mode. For example, I have PORTAGE_TMPDIR="/x" and > this entry in fstab on my main Gentoo system: > > > tmpfs /x tmpfs mode=0770,uid=root,gid=portage,size=20g 0 0 Doesn't that just open up a possible memory-exhaustion DoS due to multiple tmpfs with a total capacity greater than RAM (likely plus swap), that's arguably way less obscure and way easier to trigger, even inadvertently on a single-human-user system? Or I can hope there's a total-tmpfs-limit kernel knob I missed, that can limit say 20 gig /tmp tmpfs plus 20 gig portage tmpfs plus misc other tmpfs to say 25 gig total? (In reply to Duncan from comment #31) The sysadmin can override the size of any tmpfs mounts on the system by specifying them in fstab. (In reply to Mike Gilbert from comment #32) > (In reply to Duncan from comment #31) > > The sysadmin can override the size of any tmpfs mounts on the system by > specifying them in fstab. Per-tmpfs max size doesn't help. Verbosity++ (There's a TLDR stop about 1/3 of the way thru, below.) Using my (hopefully to be replaced this year) system numbers... * tmpfs on /tmp: 5 gig size (min) to deal with a DVD image * tmpfs for portage: 12 gig size to prevent ENOSPC when building * 16 gig physical RAM system * 64 gig swap as four 16 swap partitions at equal swap priority on separate ssds for effective striping (system's too old for cost effective NVME). General scenario: Forgetting I had been working with a DVD image that's still in /tmp when I try to merge something big. Scenario 1: Single big 12 gig /tmp tmpfs with $PORTAGE_TMPDIR pointed into it, and various other tmpfs (/dev/shm, /run, ...) with a total configured size under a gig, so (under) 13 gig of tmpfs total. Scenario 2: 5 gig /tmp, separate 12 gig portage tmpfs, other misc tmpfs totalling (under) a gig, so 18 gig of tmpfs total... on a 16 gig physical-RAM system you already know where I'm headed. (TLDR stop) Scenario 3: As 2, but with a single legacy spinning rust swap (or possibly worse, swap on a new tiled segment drive), or no configured swap (16 gig should be enough). Scenario 1 results: ENOSPC when the build runs out of room in the 12 gig tmpfs due to that DVD image taking near 5 gig. Total tmpfs usage cannot exceed 13 gig (12+misc), still well under the 16 gig system RAM, so I'll likely swap some, but with the sdd-striped-swap it'll stay within reason and while it might be slightly slow deleting that DVD image after being prompted by the ENOSPC, I'll still be able to do it without /too/ much trouble, after which a manual ebuild command can pick up where emerge ENOSPCed. Scenario 2 results: Without the ENOSPC protection of exceeding the limits of the single large tmpfs, system RAM is far exceeded and I'm multiple gigs into swap, definitely uncomfortable even on the stripe-RAIDed SSD swap. Without that ENOSPC error the thing could be dragging for hours without much progress due to thrashing given the depth into swap even with it being striped-raid-ssd-swap, tho given enough time (days?) it should eventually finish. I'd likely remember the DVD image when I noticed the dragging and could delete it, but it'd take some patience. With the striped-RAID ssd swap it'd recover reasonably after that, tho. Scenario 3 results: BRB time?? (Umm... "big red button", since "be right back"... wouldn't be!) In practice, this (general) scenario seems far more likely to trigger, even accidentally as presented, than obscure perms issues which are both unlikely to trigger accidentally and unlikely to be much of an issue on the single-human-user systems that are the worry for many gentooers. Even on multi-human-user-systems with a hostile user as a given, it seems to me the space issue is likely to be easier to trigger, almost certainly so if said hostile user wishes to maintain plausible innocence. While the consequences of a memory DoS may appear lower, given the knowledge required to successfully execute the perms attack, I'd suggest such a user has a reasonable chance at using the memory DoS as an escalatory attack, meaning the worst-cases aren't so far apart after all, and given the far higher chance of the memory DoS... Conclusion: This is an explicit counter to the comment #22 dedicated portage tmpfs idea. While comment #22 demonstrated that works for systems with the memory to handle it, this should demonstrate it's not a general solution appropriate to more memory-constrained systems that never-the-less still have enough memory to do tmpfs-based $PORTAGE_TMPDIR if they're carefully configured (and of course now, assuming this perms-based attack is considered low risk enough by the admin). Unless as suggested/hoped, there's a kernel knob somewhere I missed to set the global combined tmpfs usage max, such that it can be lower than the sum of the fstab-configurable maxes of each individual tmpfs. Would be nice... Sorry to be rude, but I have marked comment 33 as "obsolete" to hide it from view by default. It's a long-winded explanation of how an admin could hypothetically misconfigure their system. I don't see how it is relevant to this bug. (In reply to Mike Gilbert from comment #34) > Sorry to be rude, but I have marked comment 33 as "obsolete" to hide it from > view by default. It's a long-winded explanation of how an admin could > hypothetically misconfigure their system. I don't see how it is relevant to > this bug. Agreed. |