Bug 831014

Summary: net-fs/autofs-5.1.8 has trouble mounting /net shares
Product: Gentoo Linux Reporter: Fabian Groffen <grobian>
Component: Current packages Assignee: Yixun Lan <dlan>
Status: CONFIRMED ---    
Severity: normal CC: grobian, musl, sam
Priority: Normal    
Version: unspecified   
Hardware: PPC64   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: autofs-revert-nonstrict-offset-mount

Description Fabian Groffen gentoo-dev 2022-01-11 14:07:36 UTC
I upgraded to 5.1.8 on a musl system and noticed all kinds of weird issues with automounted volumes.  Mostly, mountpoints could not be created and exports were missing.

Unfortunately old versions were wiped from the tree, but I reverted to 5.1.6-r2 and immediately got everything in working order again.

I understand this is very little detail, but this system needs to work, so I had no time to investigate at this point.

Things from the log I observed:

Jan 11 13:53:09 khnum automount[3614]: mount_autofs_offset: failed to mount offset trigger  at 
Jan 11 13:56:24 khnum automount[3614]: set_tsd_user_vars: failed to get buffer size for getpwuid_r
Jan 11 14:56:52 khnum kernel: autofs4:pid:10191:validate_dev_ioctl: invalid path supplied for cmd(0xc018937e)

Is there a reason why 5.1.6-r2 was dropped (other than cleanup)?
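
For reference, the set_tsd_user_vars message in the log above would be consistent with a buffer-size lookup through sysconf(_SC_GETPW_R_SIZE_MAX) returning -1, which POSIX permits when there is no fixed limit. That autofs fails for this reason is an assumption, not something confirmed in this report. A minimal sketch of a lookup that tolerates the -1 case follows; pw_buffer_size, lookup_user_name, PW_BUF_FALLBACK and PW_BUF_MAX are hypothetical names, not autofs code.

```c
#if !defined(_GNU_SOURCE) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L  /* expose getpwuid_r under strict -std=c11 */
#endif
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical values: starting size when libc reports no limit, and a growth cap. */
#define PW_BUF_FALLBACK 1024
#define PW_BUF_MAX      (1u << 20)

/* Initial buffer size for getpwuid_r(); never returns 0. */
static size_t pw_buffer_size(void)
{
    long n = sysconf(_SC_GETPW_R_SIZE_MAX);

    /* -1 means "no fixed limit", not necessarily an error. */
    return n > 0 ? (size_t)n : PW_BUF_FALLBACK;
}

/*
 * Copy the login name of uid into out.
 * Returns 0 on success, ENOENT if there is no passwd entry,
 * or another errno value on failure.
 */
static int lookup_user_name(uid_t uid, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);

    size_t len = pw_buffer_size();
    char *buf = NULL;
    struct passwd pw, *res = NULL;
    int rc;

    for (;;) {
        char *tmp = realloc(buf, len);
        if (tmp == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = tmp;

        rc = getpwuid_r(uid, &pw, buf, len, &res);
        if (rc != ERANGE || len >= PW_BUF_MAX)
            break;
        len *= 2;  /* buffer too small: grow and retry */
    }

    if (rc == 0 && res == NULL)
        rc = ENOENT;  /* call succeeded but no matching entry exists */
    if (rc == 0) {
        if (strlen(pw.pw_name) >= outlen)
            rc = ERANGE;
        else
            strcpy(out, pw.pw_name);
    }

    free(buf);
    return rc;
}
```

A caller would pass a uid and a destination buffer, for example lookup_user_name(getuid(), name, sizeof name), and treat a non-zero return as an errno value.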
Comment 1 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-01-11 14:10:37 UTC
Is 5.1.8-r1 broken? I've sent a bunch of the patches upstream but I wasn't able to runtime test any of it.

dlan gave it a go and it worked for him on musl though.

I'm not super familiar with autofs but is there anything special about your setup that I can use to reproduce? 

Anyway, I'll let dlan comment re cleanup etc
Comment 2 Fabian Groffen gentoo-dev 2022-01-11 19:15:19 UTC
2022-01-11T09:06:11 >>> net-fs/autofs-5.1.8-r1

ppc64-musl

autofs works for "some" cases, e.g. auto.home works, but auto.net doesn't.  It is the net one (/net/host/exports...) that I saw issues with.

showmount is working fine (for the auto.net case), and I don't quite see how home and net differ.  /home/me was mounted fine, for example, while /net/server/export/home (where me comes from) didn't mount.  The invalid path errors suggest that something goes wrong, perhaps around fsid, which is what breaks when it is left unset in /etc/exports on musl (it is needed to allow other machines to mount from the musl machine).
Comment 3 Yixun Lan archtester gentoo-dev 2022-01-12 01:25:49 UTC
I've tested on an amd64-musl machine (on qemu-kvm hw), with a simple case of local mount and nfs; both work fine for me


(In reply to Fabian Groffen from comment #0)
> I upgraded to 5.1.8 on a musl system and noticed all kinds of weird issues
> with automounted volumes.  Mostly that the mountpoints could not be made,
> and missing exports.
> 
> Unfortunately old versions were wiped from the tree, but I reverted to
> 5.1.6-r2 and immediately got everything in working order again.
> 
> I understand this is very little detail, but this system needs to work, so
> had no time to investigate at this point.
> 
> Things from the log I observed:
> 
> Jan 11 13:53:09 khnum automount[3614]: mount_autofs_offset: failed to mount
> offset trigger  at 
> Jan 11 13:56:24 khnum automount[3614]: set_tsd_user_vars: failed to get
> buffer s
> ize for getpwuid_r
> Jan 11 14:56:52 khnum kernel: autofs4:pid:10191:validate_dev_ioctl: invalid
> path
>  supplied for cmd(0xc018937e)

I need more info to reproduce this.

> 
> Is there a reason why 5.1.6-r2 was dropped (other than cleanup)?

it was just a regular cleanup after 5.1.8 was stabilized; I will restore the old 5.1.6-r2 until we fix this.
Comment 4 Larry the Git Cow gentoo-dev 2022-01-12 01:46:35 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=f56f52579b14217aecb24c2165cc317ec3b92c04

commit f56f52579b14217aecb24c2165cc317ec3b92c04
Author:     Yixun Lan <dlan@gentoo.org>
AuthorDate: 2022-01-12 01:45:19 +0000
Commit:     Yixun Lan <dlan@gentoo.org>
CommitDate: 2022-01-12 01:46:13 +0000

    net-fs/autofs: restore 5.1.6 due to musl breakage
    
    5.1.8 break net mount on ppc64-musl,
    let's temporarily restore 5.1.6 for now
    
    this effectively revert part of:  30f36210abdf
    
    Bug: https://bugs.gentoo.org/831014
    Package-Manager: Portage-3.0.30, Repoman-3.0.3
    RepoMan-Options: --force
    Signed-off-by: Yixun Lan <dlan@gentoo.org>

 net-fs/autofs/Manifest                       |   1 +
 net-fs/autofs/autofs-5.1.6-r2.ebuild         | 128 +++++++++++++++++++++++++++
 net-fs/autofs/files/autofs-5.1.6-glibc.patch | 110 +++++++++++++++++++++++
 net-fs/autofs/files/autofs-5.1.6-musl.patch  |  12 +++
 net-fs/autofs/files/autofs-5.1.6-pid.patch   |  14 +++
 5 files changed, 265 insertions(+)
Comment 5 Fabian Groffen gentoo-dev 2022-01-12 07:15:44 UTC
thanks for restoring 5.1.6-r2.

The set of messages that seem to hint at a problem (from logwatch):

 mount_subtree: parse(sun): failed to mount offset triggers: 1 Time(s)
 mount_autofs_offset: failed to mount offset trigger  at : 21 Time(s)

the whitespace here (an empty name before and after "at") seems to indicate that it didn't parse the file correctly, perhaps?

Just to be clear, I use the exact same config with 5.1.6 which works, and it doesn't seem like the auto.net file changed between 5.1.6 and 5.1.8.  I see 5.1.8 is stable now on amd64 too, so will try that today and see if I get the same kind of problems, just to be sure.
Comment 6 Fabian Groffen gentoo-dev 2022-01-12 18:35:29 UTC
just upgraded on an amd64-glibc machine, and I see the exact same problems on /net shares.

shares become unmountable, seem absent, or appear empty
Comment 7 Fabian Groffen gentoo-dev 2022-01-12 18:46:32 UTC
perhaps relevant, this is all nfs4; opts from auto.net:

opts="-fstype=nfs4,hard,sec=sys,nodev,nosuid,wsize=32768,rsize=32768"

(same options seem to work for auto.home)
Comment 8 Yixun Lan archtester gentoo-dev 2022-01-13 02:23:16 UTC
so here is my test, and it seems to work fine for me

# cat /etc/autofs/auto.nfs
gentoo -fstype=nfs4,hard,sec=sys,nodev,nosuid,wsize=32768,rsize=32768  10.0.63.25:/home/gentoo

# mount |grep "/nfs/gentoo"
10.0.63.25:/home/gentoo on /nfs/gentoo type nfs4 (rw,nosuid,nodev,relatime,vers=4.1,rsize=32768,wsize=32768,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.63.80,local_lock=none,addr=10.0.63.25)
Comment 9 Yixun Lan archtester gentoo-dev 2022-01-13 03:33:10 UTC
Created attachment 762004
autofs-revert-nonstrict-offset-mount

https://git.kernel.org/pub/scm/linux/storage/autofs/autofs.git/commit/?id=386bfd65589869d3b83228f5c0e68caf99251097

The problem was probably introduced by this commit. Could you apply the attached patch (as a user patch) and re-test?
Comment 10 Fabian Groffen gentoo-dev 2022-01-13 09:40:54 UTC
will try, thanks for digging into this
Comment 11 Fabian Groffen gentoo-dev 2022-01-16 11:28:49 UTC
unfortunately that makes no change, thanks anyway
Comment 12 Yixun Lan archtester gentoo-dev 2022-01-18 08:16:04 UTC
(In reply to Fabian Groffen from comment #11)
> unfortunately that makes no change, thanks anyway

Fabian, could you give an example conf, so that I can try to reproduce this issue myself?

Also, I'd like to report this to autofs upstream, will CC you.
Comment 13 Fabian Groffen gentoo-dev 2022-01-19 07:34:11 UTC
my auto.net:

key="$1"

# add "nosymlink" here if you want to suppress symlinking local filesystems
# add "nonstrict" to make it OK for some filesystems to not mount
opts="-fstype=nfs4,hard,sec=sys,nodev,nosuid,wsize=32768,rsize=32768"

SMNT=/usr/sbin/showmount

[ -x $SMNT ] || exit 1

# Newer distributions get this right
SHOWMOUNT="$SMNT --no-headers -e $key"

$SHOWMOUNT | LC_ALL=C cut -d' ' -f1 | LC_ALL=C sort -u | \
    awk -v key="$key" -v opts="$opts" -- '
    BEGIN   { ORS=""; first=1 }
        { if (first) { print opts; first=0 }; print " \\\n\t" $1, key ":" $1 }
    END { if (!first) print "\n"; else exit 1 }
    ' | sed 's/#/\\#/g'

auto.master:

/home   /etc/autofs/auto.home
/net    /etc/autofs/auto.net

autofs.conf:

[ autofs ]
timeout = 300
browse_mode = no
[ amd ]
dismount_interval = 300
Comment 14 Fabian Groffen gentoo-dev 2022-01-30 15:03:36 UTC
This problem seems to be limited to ppc64 (big-endian) musl, and may be related to the GNU-ism of assuming strerror_r returns a pointer (instead of an int status, as POSIX specifies).

The automount binary actually crashes during startup on this system.

Program received signal SIGSEGV, Segmentation fault.
0x00003ffff7fab738 in tss_get () from /lib/ld-musl-powerpc64.so.1
(gdb) bt
#0  0x00003ffff7fab738 in tss_get () from /lib/ld-musl-powerpc64.so.1
#1  0x00003ffff7ec78f0 in prepare_attempt_prefix (
    msg=0x100036d38 "Starting automounter version %s, master map %s")
    at log.c:41
#2  0x00003ffff7ec7c14 in log_info (logopt=1,
    msg=0x100036d38 "Starting automounter version %s, master map %s")
    at log.c:103
#3  0x00000001000113b4 in main (argc=0, argv=0x3fffffffef88)
    at automount.c:2635

(gdb) down
#1  0x00003ffff7ec78f0 in prepare_attempt_prefix (
    msg=0x100036d38 "Starting automounter version %s, master map %s")
    at log.c:41
41              attempt_id = pthread_getspecific(key_thread_attempt_id);
(gdb) l
36      {
37              unsigned long *attempt_id;
38              char buffer[ATTEMPT_ID_SIZE + 1];
39              char *prefixed_msg = NULL;
40
41              attempt_id = pthread_getspecific(key_thread_attempt_id);
42              if (attempt_id) {
43                      int len = sizeof(buffer) + 1 + strlen(msg) + 1;
44
45                      snprintf(buffer, ATTEMPT_ID_SIZE, "%02lx", *attempt_id);
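
To illustrate the strerror_r difference mentioned above: the GNU variant returns a char * that may or may not point into the caller's buffer, while the POSIX variant returns an int and always writes into the buffer. On glibc, defining _GNU_SOURCE selects the GNU variant; musl only provides the POSIX one. A minimal sketch of a wrapper that compiles against either, using C11 _Generic to dispatch on the declared return type; portable_strerror, msg_from_xsi and msg_from_gnu are hypothetical names, and this is not the fix that went upstream.

```c
#if !defined(_GNU_SOURCE) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L  /* expose strerror_r under strict -std=c11 */
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* POSIX variant: 0 means the message was written into buf. */
static const char *msg_from_xsi(int rc, char *buf, size_t len, int errnum)
{
    if (rc != 0)
        snprintf(buf, len, "Unknown error %d", errnum);
    return buf;
}

/* GNU variant: the returned pointer is the message; buf may be untouched. */
static const char *msg_from_gnu(const char *ret, char *buf, size_t len, int errnum)
{
    (void)buf;
    (void)len;
    (void)errnum;
    return ret;
}

/* Return a printable message for errnum, whichever strerror_r is declared. */
static const char *portable_strerror(int errnum, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    /*
     * The controlling expression of _Generic is not evaluated; it only
     * selects the helper matching the declared return type. strerror_r
     * is called once, as the first argument of the selected helper.
     */
    return _Generic(strerror_r(errnum, buf, len),
                    int:    msg_from_xsi,
                    char *: msg_from_gnu)(strerror_r(errnum, buf, len),
                                          buf, len, errnum);
}
```

Code that treats the int return value as a pointer, or the other way round, compiles with at most a warning on some toolchains, which is how this class of bug tends to go unnoticed until it runs against a different libc.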
Comment 15 Fabian Groffen gentoo-dev 2022-02-01 19:41:00 UTC
update: been triaging this with upstream, some observations

1. musl support was a bit opportunistic; a bunch of stuff is missing
2. upstream already has a bulk of fixes post 5.1.8; those, in conjunction with addressing 1., seem to result in a working setup