Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 890812 - portage fails to build packages on automounted tmp directories
Summary: portage fails to build packages on automounted tmp directories
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core (show other bugs)
Hardware: All Linux
: Normal minor
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS
Depends on: 907949
Blocks:
  Show dependency tree
 
Reported: 2023-01-14 12:22 UTC by Kai Krakow
Modified: 2023-06-21 19:12 UTC (History)
2 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
trigger automounts properly (hacky) (automount_fix.patch,657 bytes, patch)
2023-02-14 16:59 UTC, Kai Krakow
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kai Krakow 2023-01-14 12:22:09 UTC
I have been using an automounted tmp directory for portage since years but lately it stopped working, portage bails out with:

>>> Running pre-merge checks for www-client/google-chrome-109.0.5414.74
/var/tmp/portage is not writable.
Likely cause is that you've mounted it as readonly.

# mount|grep tmp/portage
systemd-1 on /var/tmp/portage type autofs (rw,relatime,fd=55,pgrp=1,timeout=60,minproto=5,maxproto=5,direct,pipe_ino=3457)

I'm not sure what caused it to stop working, several components have been updated at the same time, e.g. systemd (probably some 251.x) and the kernel (from 5.15 to 6.1) but also portage. However, a downgrade of portage didn't seem to fix it.

When changing to the /var/tmp/portage directory (thus forcing the automounter to mount the directory) before invoking emerge successfully works around it.

Reproducible: Always

Steps to Reproduce:
1. Setup a tmpfs automount for portage as listed above
2. Ensure that only automount is active and tmpfs is currently not engaged
3. Run emerge and start building packages
Actual Results:  
Error output:

/var/tmp/portage is not writable.
Likely cause is that you've mounted it as readonly.


Expected Results:  
The packages should build.

It looks like portage looking up the permissions of the directory no longer triggers the automounter, so it ends up reading wrong permissions. Portage should maybe first try opening the directory then, or try to actually create a file.

I'm not sure where the new behavior actually comes from, so it may affect other software, too.
Comment 1 Mike Gilbert gentoo-dev 2023-01-15 00:51:12 UTC
Does the behavior change if you set FEATURES="-mount-sandbox -pid-sandbox" in make.conf?
Comment 2 Kai Krakow 2023-01-15 01:12:05 UTC
No, it doesn't... (tried every combination of disabling sandboxes)

I don't think this is caused by anything that portage changed but rather what the kernel or systemd changed. And portage probably needs to adapt.
Comment 3 Mike Gilbert gentoo-dev 2023-01-15 01:29:00 UTC
If you call "stat /var/tmp/portage", does that trigger the automount?
Comment 4 Mike Gilbert gentoo-dev 2023-01-15 02:21:06 UTC
Could you attach the automount and mount units you are using for this? That would help in attempting to reproduce the problem.
Comment 5 Kai Krakow 2023-01-15 11:09:33 UTC
Running stat doesn't trigger the automount:

# LANG=C stat /var/tmp/portage
  File: /var/tmp/portage
  Size: 0               Blocks: 0          IO Block: 1024   directory
Device: 0,50    Inode: 5416        Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-01-15 00:16:54.037823778 +0100
Modify: 2023-01-14 23:06:06.359999729 +0100
Change: 2023-01-14 23:06:06.359999729 +0100
 Birth: -

After manually triggering the automount:

# LANG=C stat /var/tmp/portage
  File: /var/tmp/portage
  Size: 40              Blocks: 0          IO Block: 4096   directory
Device: 0,375   Inode: 1           Links: 2
Access: (0770/drwxrwx---)  Uid: (  250/ portage)   Gid: (  250/ portage)
Access: 2023-01-15 11:49:49.220744200 +0100
Modify: 2023-01-15 11:49:48.974069700 +0100
Change: 2023-01-15 11:49:48.974069700 +0100
 Birth: 2023-01-15 11:49:48.974069700 +0100


# /run/systemd/generator/var-tmp-portage.automount
# Automatically generated by systemd-fstab-generator

[Unit]
SourcePath=/etc/fstab
Documentation=man:fstab(5) man:systemd-fstab-generator(8)

[Automount]
Where=/var/tmp/portage
TimeoutIdleSec=1min

# /run/systemd/generator/var-tmp-portage.mount
# Automatically generated by systemd-fstab-generator

[Unit]
Documentation=man:fstab(5) man:systemd-fstab-generator(8)
SourcePath=/etc/fstab
Before=local-fs.target

[Mount]
What=tmpfs
Where=/var/tmp/portage
Type=tmpfs
Options=noauto,x-systemd.automount,x-systemd.idle-timeout=60,size=12G,mode=770,uid=portage,gid=portage

These are generated from the fstab entry:

# grep portage /etc/fstab
tmpfs /var/tmp/portage tmpfs noauto,x-systemd.automount,x-systemd.idle-timeout=60,size=12G,mode=770,uid=portage,gid=portage

Additionally, I'm using tmpfiles to ensure the directory exists:

# cat /etc/tmpfiles.d/portage.conf
D! /var/tmp/portage 0775 portage portage


So after some fiddling around:

After I stopped the automount unit, the directory remains owned by root although tmpfiles sets it to owner portage during boot. I manually changed the owner back to portage, verified this using `ls -al`, then started the automount unit again which changes the owner back to root.

According to the `man systemd.automount`, there's no way of setting the owner using systemd parameters in the unit. I'm guessing that since some systemd update, the automount point no longer inherits the original inode owner. OTOH, I'm not sure if this was the same behavior before it started to fail for me. After some research, it looks like the behavior for the mount point owned by root may date back to at least 2020: https://bbs.archlinux.org/viewtopic.php?id=251882
Comment 6 Kai Krakow 2023-02-02 22:40:00 UTC
So I found out that the systems that still run 5.15 LTS are not affected by this. Something seems to have changed in the kernel between 5.15 and 6.1.
Comment 7 Mike Gilbert gentoo-dev 2023-02-03 04:11:56 UTC
(In reply to Kai Krakow from comment #6)
> So I found out that the systems that still run 5.15 LTS are not affected by
> this. Something seems to have changed in the kernel between 5.15 and 6.1.

Could you elaborate on the differing behavior between 5.15 and 6.1?

A kernel bug seems much more likely than blaming this on Portage.
Comment 8 Kai Krakow 2023-02-05 15:44:56 UTC
(In reply to Mike Gilbert from comment #7)
> Could you elaborate on the differing behavior between 5.15 and 6.1?

I'd need to find bi-sect this or at least find a commit which changes the behavior. Or systemd handles that differently between both kernel versions.

> A kernel bug seems much more likely than blaming this on Portage.

Sounds reasonable.

A work-around would probably be to use a portage tmp directly one level below the mount point, or just statically mount the tmpfs. But I wanted to avoid that and unmount on idle so it will clean up memory usage even if some files are left over in the tmpfs.
Comment 9 Kai Krakow 2023-02-14 15:52:45 UTC
I found some commits between 5.15..6.1 which may have changed behavior around inode caching and inode lookup of autofs mount points, probably for better correctness.

Only maybe relevant thread found:
https://lore.kernel.org/all/165724445154.30914.10970894936827635879.stgit@donald.themaw.net/

I think the behavior may be intentional:

If you just stat directory entries from the parent, you're not supposed to open the directories (portage calls stat for `/var/tmp/portage`). Otherwise, scanning directories could trigger mounts unintentionally and could cause amounts of IO or network overhead, or even timeouts in case of connectivity problems.

Thus, if I explicitly stat `/var/tmp/portage/.`, I get the stat data of the mounted directory instead of the mount point - which is what portage really wants to ask for: We don't need the stat data of the parent device but the device inside the mount point.

So I think the correct way for portage to check for available space is to actually trigger a mount and ask for that information by adding `/.` to the path for the stat call.

What do you think?
Comment 10 Mike Gilbert gentoo-dev 2023-02-14 16:04:32 UTC
I'm not sure I buy into the idea that stat /var/tmp/portage should not trigger an automount. I don't think we want to introduce an extra "/." in every place we refer to $PORTAGE_TMPDIR/portage.

As a workaround, you set up an automount at some other location (/mnt/foo), and make portage a directory within the automounted filesystem.

You would then set PORTAGE_TMPDIR=/mnt/foo, or set up a symlink to it at /var/tmp/portage.

That way, attempts to access /mnt/foo/portage should trigger an automount since it is accessing a child path instead of the mount point itself.
Comment 11 Kai Krakow 2023-02-14 16:59:11 UTC
Created attachment 851118 [details, diff]
trigger automounts properly (hacky)

I understand your argument, and the proposed work-around is what I was going to implement.

But historically, it looks like there have been some changes around that - and actually stat() should not trigger automounts:

https://patchwork.kernel.org/comment/20796411/

The thing is, Python's os.access() (which is used in portage) does not use stat(), it uses "man 2 access" according to the Python docs but that system call should be avoided anyways according to the man page (because it opens chances for a race condition, which is probably completely unimportant here).

The difference between stat() and access() does not matter here, the essence of the linked discussion is that stat() is not supposed to trigger automounts, and the point of the patch was that automount triggering has been always suppressed previously without user-space being able to use "follow automounts". So user-space should probably flag if they want to explicitly follow automounts which Python probably doesn't do (and there's no flag for it).

But then again, this test isn't really testing for write permissions in a directory, the test would pass even for files:
```
# doebuild.py
1607     # as some people use a separate PORTAGE_TMPDIR mount
1608     # we prefer that as the checks below would otherwise be pointless
1609     # for those people.
1610     checkdir = first_existing(os.path.join(settings["PORTAGE_TMPDIR"], "portage"))
1611
1612     if not os.access(checkdir, os.W_OK):
1613         writemsg(
1614             _(
1615                 "%s is not writable.\n"
1616                 "Likely cause is that you've mounted it as readonly.\n"
1617             )
1618             % checkdir,
1619             noiselevel=-1,
1620         )
1621         return 1
```

I didn't check if it tests for being a directory earlier in the code but certainly this part of the code is incomplete for what it should test (and portage.util.path.first_existing doesn't care for the inode type at all, so we get what we should from that function, and there's probably also no check for being a directory somewhere earlier in the code).

For now, I'm using the attached patch because I think the test isn't doing what it's supposed to do.

But I'm not even sure if using the result of first_existing() is the correct result for checking permissions here because if the first existing directory is just "/var" you'll never have write permissions there. Unless portage is supposed to recreate the complete directory structure - but then, the "correct fix" is most probably not adjusting the permissions as the error message would suggest.

BTW, according to git history, using a symlink here could violate sandbox expectations (commit be2312f4f9bf854897431440734a765f5279c7d1). In light of this commit, the current check for write permissions using first_existing() also doesn't seem to be correct.

So consider my patch quick and dirty. It fixes my specific use case, it doesn't care about the semantic flaws that seem to be in this logic.
Comment 12 Mike Gilbert gentoo-dev 2023-06-06 17:55:01 UTC
It looks like we already use a NamedTemporaryFile to check if PORTAGE_TMPDIR has noexec set.

https://gitweb.gentoo.org/proj/portage.git/tree/lib/portage/package/ebuild/doebuild.py?h=portage-3.0.48.1#n1622

I suppose we could just drop the os.access() check, and move the messaging down to the NamedTemporaryFile check.
Comment 13 Mike Gilbert gentoo-dev 2023-06-06 18:23:36 UTC
Please give this PR a spin.

https://github.com/gentoo/portage/pull/1051
Comment 14 Larry the Git Cow gentoo-dev 2023-06-14 19:23:25 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=37b108a7a4630583921484c0b5f513a7e384d851

commit 37b108a7a4630583921484c0b5f513a7e384d851
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2023-06-06 18:03:50 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2023-06-14 19:21:01 +0000

    doebuild: do not rely on os.access() for PORTAGE_TMPDIR write check
    
    Calling os.access() on ${PORTAGE_TMPDIR}/portage will not trigger any
    automount that the user may have configured there.
    
    Instead, just try to create a file and catch PermissionError.
    
    Bug: https://bugs.gentoo.org/890812
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 lib/portage/package/ebuild/doebuild.py | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)
Comment 15 Larry the Git Cow gentoo-dev 2023-06-21 19:12:58 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=9502761c5bef818dbec90f062909d46dc22289df

commit 9502761c5bef818dbec90f062909d46dc22289df
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-06-21 19:09:31 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-06-21 19:11:05 +0000

    sys-apps/portage: add 3.0.49
    
    Closes: https://bugs.gentoo.org/485100
    Cloess: https://bugs.gentoo.org/592880
    Closes: https://bugs.gentoo.org/596664
    Closes: https://bugs.gentoo.org/631490
    Closes: https://bugs.gentoo.org/764365
    Closes: https://bugs.gentoo.org/793992
    Closes: https://bugs.gentoo.org/890812
    Closes: https://bugs.gentoo.org/905660
    Closes: https://bugs.gentoo.org/907949
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/portage/Manifest              |   1 +
 sys-apps/portage/portage-3.0.49.ebuild | 296 +++++++++++++++++++++++++++++++++
 2 files changed, 297 insertions(+)