I have been using an automounted tmp directory for portage for years, but lately it stopped working. Portage bails out with:

>>> Running pre-merge checks for www-client/google-chrome-109.0.5414.74
/var/tmp/portage is not writable.
Likely cause is that you've mounted it as readonly.

# mount|grep tmp/portage
systemd-1 on /var/tmp/portage type autofs (rw,relatime,fd=55,pgrp=1,timeout=60,minproto=5,maxproto=5,direct,pipe_ino=3457)

I'm not sure what caused it to stop working. Several components were updated at the same time, e.g. systemd (probably some 251.x) and the kernel (from 5.15 to 6.1), but also portage. However, a downgrade of portage didn't seem to fix it.

Changing to the /var/tmp/portage directory before invoking emerge (thus forcing the automounter to mount the directory) works around it.

Reproducible: Always

Steps to Reproduce:
1. Set up a tmpfs automount for portage as listed above
2. Ensure that only the automount is active and the tmpfs is currently not engaged
3. Run emerge and start building packages

Actual Results:
Error output:
/var/tmp/portage is not writable.
Likely cause is that you've mounted it as readonly.

Expected Results:
The packages should build.

It looks like portage looking up the permissions of the directory no longer triggers the automounter, so it ends up reading the permissions of the autofs mount point instead of those of the tmpfs. Portage should maybe first try opening the directory, or try to actually create a file. I'm not sure where the new behavior actually comes from, so it may affect other software, too.
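For step 2 of the reproduction, one way to tell whether only the autofs trigger is present or the tmpfs is already mounted on top of it is to look at the mount table. The following is a minimal sketch, not anything portage does; the helper names `filesystems_at` and `tmpfs_engaged` are made up for illustration, and it assumes the fstab-like field layout of /proc/self/mounts (device, mount point, filesystem type, options) and a mount point path without escaped characters such as spaces.

```python
def filesystems_at(mounts_text, mountpoint):
    """Return the filesystem types mounted at `mountpoint`, in mount order."""
    types = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Assumed layout: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[1] == mountpoint:
            types.append(fields[2])
    return types


def tmpfs_engaged(mounts_text, mountpoint):
    """True if a tmpfs is mounted at `mountpoint` (hypothetical helper)."""
    return "tmpfs" in filesystems_at(mounts_text, mountpoint)


if __name__ == "__main__":
    with open("/proc/self/mounts") as mounts:
        text = mounts.read()
    print(filesystems_at(text, "/var/tmp/portage"))
```

With only the automount unit active, the list should contain just the autofs entry; once something has triggered the mount, a tmpfs entry should appear after it.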
Does the behavior change if you set FEATURES="-mount-sandbox -pid-sandbox" in make.conf?
No, it doesn't... (tried every combination of disabling sandboxes) I don't think this is caused by anything that portage changed but rather what the kernel or systemd changed. And portage probably needs to adapt.
If you call "stat /var/tmp/portage", does that trigger the automount?
Could you attach the automount and mount units you are using for this? That would help in attempting to reproduce the problem.
Running stat doesn't trigger the automount:

# LANG=C stat /var/tmp/portage
  File: /var/tmp/portage
  Size: 0  Blocks: 0  IO Block: 1024  directory
Device: 0,50  Inode: 5416  Links: 2
Access: (0755/drwxr-xr-x)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2023-01-15 00:16:54.037823778 +0100
Modify: 2023-01-14 23:06:06.359999729 +0100
Change: 2023-01-14 23:06:06.359999729 +0100
 Birth: -

After manually triggering the automount:

# LANG=C stat /var/tmp/portage
  File: /var/tmp/portage
  Size: 40  Blocks: 0  IO Block: 4096  directory
Device: 0,375  Inode: 1  Links: 2
Access: (0770/drwxrwx---)  Uid: ( 250/ portage)  Gid: ( 250/ portage)
Access: 2023-01-15 11:49:49.220744200 +0100
Modify: 2023-01-15 11:49:48.974069700 +0100
Change: 2023-01-15 11:49:48.974069700 +0100
 Birth: 2023-01-15 11:49:48.974069700 +0100

# /run/systemd/generator/var-tmp-portage.automount
# Automatically generated by systemd-fstab-generator

[Unit]
SourcePath=/etc/fstab
Documentation=man:fstab(5) man:systemd-fstab-generator(8)

[Automount]
Where=/var/tmp/portage
TimeoutIdleSec=1min

# /run/systemd/generator/var-tmp-portage.mount
# Automatically generated by systemd-fstab-generator

[Unit]
Documentation=man:fstab(5) man:systemd-fstab-generator(8)
SourcePath=/etc/fstab
Before=local-fs.target

[Mount]
What=tmpfs
Where=/var/tmp/portage
Type=tmpfs
Options=noauto,x-systemd.automount,x-systemd.idle-timeout=60,size=12G,mode=770,uid=portage,gid=portage

These are generated from the fstab entry:

# grep portage /etc/fstab
tmpfs /var/tmp/portage tmpfs noauto,x-systemd.automount,x-systemd.idle-timeout=60,size=12G,mode=770,uid=portage,gid=portage

Additionally, I'm using tmpfiles to ensure the directory exists:

# cat /etc/tmpfiles.d/portage.conf
D! /var/tmp/portage 0775 portage portage

So after some fiddling around: After I stopped the automount unit, the directory remains owned by root although tmpfiles sets it to owner portage during boot.
I manually changed the owner back to portage, verified this using `ls -al`, then started the automount unit again, which changed the owner back to root. According to `man systemd.automount`, there's no way of setting the owner using systemd parameters in the unit.

I'm guessing that since some systemd update, the automount point no longer inherits the original inode owner. OTOH, I'm not sure whether the behavior was the same before it started to fail for me.

After some research, it looks like the mount point being owned by root may date back to at least 2020: https://bbs.archlinux.org/viewtopic.php?id=251882
So I found out that the systems that still run 5.15 LTS are not affected by this. Something seems to have changed in the kernel between 5.15 and 6.1.
(In reply to Kai Krakow from comment #6)
> So I found out that the systems that still run 5.15 LTS are not affected by
> this. Something seems to have changed in the kernel between 5.15 and 6.1.

Could you elaborate on the differing behavior between 5.15 and 6.1?

A kernel bug seems much more likely than blaming this on Portage.
(In reply to Mike Gilbert from comment #7)
> Could you elaborate on the differing behavior between 5.15 and 6.1?

I'd need to bisect this, or at least find a commit which changes the behavior. Or systemd handles that differently between both kernel versions.

> A kernel bug seems much more likely than blaming this on Portage.

Sounds reasonable. A work-around would probably be to use a portage tmp directory one level below the mount point, or just statically mount the tmpfs. But I wanted to avoid that and unmount on idle so it will clean up memory usage even if some files are left over in the tmpfs.
I found some commits between 5.15..6.1 which may have changed behavior around inode caching and inode lookup of autofs mount points, probably for better correctness. The only possibly relevant thread I found: https://lore.kernel.org/all/165724445154.30914.10970894936827635879.stgit@donald.themaw.net/

I think the behavior may be intentional: If you just stat directory entries from the parent, you're not supposed to open the directories (portage calls stat for `/var/tmp/portage`). Otherwise, scanning directories could trigger mounts unintentionally and could cause large amounts of I/O or network overhead, or even timeouts in case of connectivity problems.

Thus, if I explicitly stat `/var/tmp/portage/.`, I get the stat data of the mounted directory instead of the mount point, which is what portage really wants to ask for: We don't need the stat data of the parent device but of the device inside the mount point.

So I think the correct way for portage to check for available space is to actually trigger a mount and ask for that information by adding `/.` to the path for the stat call.

What do you think?
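The difference between the two lookups can be observed from Python with a small sketch like the one below. It only restates the observation from this comment; `resolves_differently` is a name made up for illustration, and whether the second lookup actually triggers the automount depends on the kernel behavior discussed here.

```python
import os


def resolves_differently(path):
    """Compare stat(path) with stat(path + "/.").

    The plain path is looked up first, because the "/." lookup may
    trigger an automount and change what later lookups return.
    """
    outer = os.stat(path)
    inner = os.stat(os.path.join(path, "."))
    return (outer.st_dev, outer.st_ino) != (inner.st_dev, inner.st_ino)


if __name__ == "__main__":
    import sys

    for candidate in sys.argv[1:]:
        print(candidate, resolves_differently(candidate))
```

For an ordinary directory both lookups refer to the same inode, so the function returns False. On an untriggered automount point as described above, the two results would be expected to differ on the first call.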
I'm not sure I buy into the idea that stat /var/tmp/portage should not trigger an automount.

I don't think we want to introduce an extra "/." in every place we refer to $PORTAGE_TMPDIR/portage.

As a workaround, you could set up an automount at some other location (/mnt/foo), and make portage a directory within the automounted filesystem. You would then set PORTAGE_TMPDIR=/mnt/foo, or set up a symlink to it at /var/tmp/portage. That way, attempts to access /mnt/foo/portage should trigger an automount, since they access a child path instead of the mount point itself.
Created attachment 851118
trigger automounts properly (hacky)

I understand your argument, and the proposed work-around is what I was going to implement. But historically, it looks like there have been some changes around that, and actually stat() should not trigger automounts: https://patchwork.kernel.org/comment/20796411/

The thing is, Python's os.access() (which is used in portage) does not use stat(). According to the Python docs it uses "man 2 access", but that system call should be avoided anyway according to its man page (because it opens up a race condition, which is probably completely unimportant here).

The difference between stat() and access() does not matter here. The essence of the linked discussion is that stat() is not supposed to trigger automounts, and the point of the patch was that automount triggering had previously always been suppressed without user-space being able to request "follow automounts". So user-space should probably flag explicitly if it wants to follow automounts, which Python probably doesn't do (and there's no flag for it).

But then again, this test isn't really testing for write permissions in a directory; the test would pass even for files:

```
# doebuild.py
# as some people use a separate PORTAGE_TMPDIR mount
# we prefer that as the checks below would otherwise be pointless
# for those people.
checkdir = first_existing(os.path.join(settings["PORTAGE_TMPDIR"], "portage"))

if not os.access(checkdir, os.W_OK):
    writemsg(
        _(
            "%s is not writable.\n"
            "Likely cause is that you've mounted it as readonly.\n"
        )
        % checkdir,
        noiselevel=-1,
    )
    return 1
```

I didn't check whether it tests for being a directory earlier in the code, but this part of the code is certainly incomplete for what it should test (portage.util.path.first_existing doesn't care about the inode type at all, so it may return something that is not a directory, and there's probably no check for being a directory anywhere earlier in the code either).

For now, I'm using the attached patch because I think the test isn't doing what it's supposed to do. But I'm not even sure that using the result of first_existing() is correct for checking permissions here, because if the first existing directory is just "/var", you'll never have write permissions there. Unless portage is supposed to recreate the complete directory structure, but then the "correct fix" is most probably not adjusting the permissions as the error message would suggest.

BTW, according to git history, using a symlink here could violate sandbox expectations (commit be2312f4f9bf854897431440734a765f5279c7d1). In light of this commit, the current check for write permissions using first_existing() also doesn't seem to be correct.

So consider my patch quick and dirty. It fixes my specific use case; it doesn't address the semantic flaws that seem to be in this logic.
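The claim that the check passes even for regular files can be illustrated outside of portage. This is only a sketch: `passes_access_check` mirrors the `os.access(checkdir, os.W_OK)` call from the excerpt, while `is_writable_dir` is a hypothetical stricter variant, not something taken from the attached patch. Note that the stricter variant still relies on os.access() and therefore would not help with the automount problem.

```python
import os
import tempfile


def passes_access_check(path):
    """Same question the excerpt asks: is `path` writable at all?"""
    return os.access(path, os.W_OK)


def is_writable_dir(path):
    """Hypothetical stricter variant: `path` must also be a directory."""
    return os.path.isdir(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as regular_file:
        # A regular file satisfies the first check but not the second.
        print(passes_access_check(regular_file.name))
        print(is_writable_dir(regular_file.name))
```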
It looks like we already use a NamedTemporaryFile to check if PORTAGE_TMPDIR has noexec set. https://gitweb.gentoo.org/proj/portage.git/tree/lib/portage/package/ebuild/doebuild.py?h=portage-3.0.48.1#n1622 I suppose we could just drop the os.access() check, and move the messaging down to the NamedTemporaryFile check.
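The idea of probing by actually creating a file could look roughly like the following. This is a minimal sketch of the approach, not the code from doebuild.py; `can_create_file` is a made-up name, and errors other than PermissionError (for example a missing directory) are deliberately left to propagate.

```python
import tempfile


def can_create_file(directory):
    """Return True if a temporary file can be created inside `directory`.

    Creating a file is a lookup below the mount point, which is the
    access pattern expected to trigger a configured automount.
    """
    try:
        # The temporary file is removed again when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except PermissionError:
        return False


if __name__ == "__main__":
    print(can_create_file(tempfile.gettempdir()))
```

Compared to os.access(), this asks the filesystem to perform the operation instead of predicting whether it would succeed, which also avoids the race condition mentioned in the access(2) man page.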
Please give this PR a spin. https://github.com/gentoo/portage/pull/1051
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=37b108a7a4630583921484c0b5f513a7e384d851

commit 37b108a7a4630583921484c0b5f513a7e384d851
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2023-06-06 18:03:50 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2023-06-14 19:21:01 +0000

    doebuild: do not rely on os.access() for PORTAGE_TMPDIR write check

    Calling os.access() on ${PORTAGE_TMPDIR}/portage will not trigger any
    automount that the user may have configured there.

    Instead, just try to create a file and catch PermissionError.

    Bug: https://bugs.gentoo.org/890812
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 lib/portage/package/ebuild/doebuild.py | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=9502761c5bef818dbec90f062909d46dc22289df

commit 9502761c5bef818dbec90f062909d46dc22289df
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-06-21 19:09:31 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-06-21 19:11:05 +0000

    sys-apps/portage: add 3.0.49

    Closes: https://bugs.gentoo.org/485100
    Cloess: https://bugs.gentoo.org/592880
    Closes: https://bugs.gentoo.org/596664
    Closes: https://bugs.gentoo.org/631490
    Closes: https://bugs.gentoo.org/764365
    Closes: https://bugs.gentoo.org/793992
    Closes: https://bugs.gentoo.org/890812
    Closes: https://bugs.gentoo.org/905660
    Closes: https://bugs.gentoo.org/907949
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/portage/Manifest              |   1 +
 sys-apps/portage/portage-3.0.49.ebuild | 296 +++++++++++++++++++++++++++++++++
 2 files changed, 297 insertions(+)