Summary: | net-fs/autofs-5.1.8 has trouble mounting /net shares | | |
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Fabian Groffen <grobian> |
Component: | Current packages | Assignee: | Yixun Lan <dlan> |
Status: | CONFIRMED | | |
Severity: | normal | CC: | grobian, musl, sam |
Priority: | Normal | | |
Version: | unspecified | | |
Hardware: | PPC64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Package list: | | Runtime testing required: | --- |
Attachments: | autofs-revert-nonstrict-offset-mount | | |
Description
Fabian Groffen
2022-01-11 14:07:36 UTC
Is 5.1.8-r1 broken? I've sent a bunch of the patches upstream but I wasn't able to runtime test any of it. dlan gave it a go and it worked for him on musl though. I'm not super familiar with autofs, but is there anything special about your setup that I can use to reproduce? Anyway, I'll let dlan comment re cleanup etc.

2022-01-11T09:06:11 >>> net-fs/autofs-5.1.8-r1
ppc64-musl

autofs works for "some" cases: auto.home works, but auto.net doesn't. It is the net one (/net/host/exports...) that I saw issues with. showmount is working fine (for the auto.net case), and I don't quite see how home and net differ. /home/me was mounted fine, for example, while /net/server/export/home (where me comes from) didn't mount. The invalid path errors suggest that something goes wrong, perhaps in the area of fsid, which is what goes wrong when it is unset in /etc/exports on musl (to allow other machines to mount from the musl machine).

I've tested on an amd64-musl machine (on qemu-kvm hw) with a simple case of a local mount and NFS; both work fine for me.

(In reply to Fabian Groffen from comment #0)
> I upgraded to 5.1.8 on a musl system and noticed all kinds of weird issues
> with automounted volumes. Mostly that the mountpoints could not be made,
> and missing exports.
>
> Unfortunately old versions were wiped from the tree, but I reverted to
> 5.1.6-r2 and immediately got everything in working order again.
>
> I understand this is very little detail, but this system needs to work, so
> had no time to investigate at this point.
>
> Things from the log I observed:
>
> Jan 11 13:53:09 khnum automount[3614]: mount_autofs_offset: failed to mount
> offset trigger at
> Jan 11 13:56:24 khnum automount[3614]: set_tsd_user_vars: failed to get
> buffer size for getpwuid_r
> Jan 11 14:56:52 khnum kernel: autofs4:pid:10191:validate_dev_ioctl: invalid
> path supplied for cmd(0xc018937e)

need more info to reproduce this..

> Is there a reason why 5.1.6-r2 was dropped (other than cleanup)?
it's just a regular cleanup after 5.1.8 stabilized. I will restore the old 5.1.6-r2 until we fix this..

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=f56f52579b14217aecb24c2165cc317ec3b92c04

commit f56f52579b14217aecb24c2165cc317ec3b92c04
Author:     Yixun Lan <dlan@gentoo.org>
AuthorDate: 2022-01-12 01:45:19 +0000
Commit:     Yixun Lan <dlan@gentoo.org>
CommitDate: 2022-01-12 01:46:13 +0000

    net-fs/autofs: restore 5.1.6 due to musl breakage

    5.1.8 break net mount on ppc64-musl,
    let's temporarily restore 5.1.6 for now
    this effectively revert part of: 30f36210abdf

    Bug: https://bugs.gentoo.org/831014
    Package-Manager: Portage-3.0.30, Repoman-3.0.3
    RepoMan-Options: --force
    Signed-off-by: Yixun Lan <dlan@gentoo.org>

 net-fs/autofs/Manifest                       |   1 +
 net-fs/autofs/autofs-5.1.6-r2.ebuild         | 128 +++++++++++++++++++++++++++
 net-fs/autofs/files/autofs-5.1.6-glibc.patch | 110 +++++++++++++++++++++++
 net-fs/autofs/files/autofs-5.1.6-musl.patch  |  12 +++
 net-fs/autofs/files/autofs-5.1.6-pid.patch   |  14 +++
 5 files changed, 265 insertions(+)

thanks for restoring 5.1.6-r2.

The set of messages that seem to hint at a problem (from logwatch):

mount_subtree: parse(sun): failed to mount offset triggers: 1 Time(s)
mount_autofs_offset: failed to mount offset trigger at : 21 Time(s)

The whitespace here seems to indicate it didn't parse the file correctly, perhaps? Just to be clear, I use the exact same config with 5.1.6, which works, and it doesn't seem like the auto.net file changed between 5.1.6 and 5.1.8.

I see 5.1.8 is stable now on amd64 too, so will try that today and see if I get the same kind of problems, just to be sure.

just upgraded on an amd64-glibc machine, and I see the exact same problems on /net shares.
Shares become unmountable, seem absent, or appear empty.

perhaps relevant, this is all nfs4; opts from auto.net:
opts="-fstype=nfs4,hard,sec=sys,nodev,nosuid,wsize=32768,rsize=32768"
(same options seem to work for auto.home)

so here is my test, and it seems to work fine for me

# cat /etc/autofs/auto.nfs
gentoo -fstype=nfs4,hard,sec=sys,nodev,nosuid,wsize=32768,rsize=32768 10.0.63.25:/home/gentoo

# mount |grep "/nfs/gentoo"
10.0.63.25:/home/gentoo on /nfs/gentoo type nfs4 (rw,nosuid,nodev,relatime,vers=4.1,rsize=32768,wsize=32768,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.63.80,local_lock=none,addr=10.0.63.25)

Created attachment 762004
autofs-revert-nonstrict-offset-mount

https://git.kernel.org/pub/scm/linux/storage/autofs/autofs.git/commit/?id=386bfd65589869d3b83228f5c0e68caf99251097

The problem was probably introduced by this commit. Could you try applying the patch (as a user patch) and re-test?

will try, thanks for digging into this

unfortunately that makes no change, thanks anyway

(In reply to Fabian Groffen from comment #11)
> unfortunately that makes no change, thanks anyway

Fabian, could you give an example conf, so I can try to reproduce this issue myself? Also, I'd like to report this to autofs upstream, will CC you.
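For anyone repeating the user-patch test requested above, here is a minimal sketch of how a patch can be staged for Portage. It assumes the attachment has been saved locally under its attachment name and that the ebuild applies user patches (the default in recent EAPIs). The function name `stage_user_patch` is made up for illustration and is not part of Portage.

```shell
#!/bin/sh
# Hypothetical helper: copy a patch into Portage's user-patch directory
# for net-fs/autofs. PORTAGE_CONFIGROOT defaults to / ; pointing it at a
# scratch directory allows a dry run without touching the live system.
stage_user_patch() {
	src="$1"
	root="${PORTAGE_CONFIGROOT:-/}"
	dest="${root%/}/etc/portage/patches/net-fs/autofs"
	mkdir -p "$dest" || return 1
	# Portage only picks up files ending in .patch or .diff; the
	# attachment in this report has no suffix, so add one if needed.
	name="$(basename "$src")"
	case "$name" in
		*.patch|*.diff) ;;
		*) name="$name.patch" ;;
	esac
	cp "$src" "$dest/$name"
}

# Example (not run here):
#   stage_user_patch ./autofs-revert-nonstrict-offset-mount
#   emerge --oneshot net-fs/autofs
```

The build log should then mention the user patch being applied during the prepare phase; if it does not, the patch was not picked up.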
my auto.net:

key="$1"

# add "nosymlink" here if you want to suppress symlinking local filesystems
# add "nonstrict" to make it OK for some filesystems to not mount
opts="-fstype=nfs4,hard,sec=sys,nodev,nosuid,wsize=32768,rsize=32768"

SMNT=/usr/sbin/showmount
[ -x $SMNT ] || exit 1

# Newer distributions get this right
SHOWMOUNT="$SMNT --no-headers -e $key"

$SHOWMOUNT | LC_ALL=C cut -d' ' -f1 | LC_ALL=C sort -u | \
	awk -v key="$key" -v opts="$opts" -- '
	BEGIN	{ ORS=""; first=1 }
		{ if (first) { print opts; first=0 }; print " \\\n\t" $1, key ":" $1 }
	END	{ if (!first) print "\n"; else exit 1 }
	' | sed 's/#/\\#/g'

auto.master:

/home /etc/autofs/auto.home
/net /etc/autofs/auto.net

autofs.conf:

[ autofs ]
timeout = 300
browse_mode = no
[ amd ]
dismount_interval = 300

This problem seems to be limited to ppc64 (big-endian) musl, and may be related to the GNU-ism of assuming strerror_r returns a pointer (instead of an int error status, as POSIX specifies). The automount binary actually crashes during startup on this system.

Program received signal SIGSEGV, Segmentation fault.
0x00003ffff7fab738 in tss_get () from /lib/ld-musl-powerpc64.so.1
(gdb) bt
#0  0x00003ffff7fab738 in tss_get () from /lib/ld-musl-powerpc64.so.1
#1  0x00003ffff7ec78f0 in prepare_attempt_prefix (
    msg=0x100036d38 "Starting automounter version %s, master map %s")
    at log.c:41
#2  0x00003ffff7ec7c14 in log_info (logopt=1,
    msg=0x100036d38 "Starting automounter version %s, master map %s")
    at log.c:103
#3  0x00000001000113b4 in main (argc=0, argv=0x3fffffffef88)
    at automount.c:2635
(gdb) down
#1  0x00003ffff7ec78f0 in prepare_attempt_prefix (
    msg=0x100036d38 "Starting automounter version %s, master map %s")
    at log.c:41
41		attempt_id = pthread_getspecific(key_thread_attempt_id);
(gdb) l
36	{
37		unsigned long *attempt_id;
38		char buffer[ATTEMPT_ID_SIZE + 1];
39		char *prefixed_msg = NULL;
40
41		attempt_id = pthread_getspecific(key_thread_attempt_id);
42		if (attempt_id) {
43			int len = sizeof(buffer) + 1 + strlen(msg) + 1;
44
45			snprintf(buffer, ATTEMPT_ID_SIZE, "%02lx", *attempt_id);

update: been triaging this with upstream, some observations:
1. musl support was a bit opportunistic, there's a bunch of stuff missing
2. there's a bulk of fixes from upstream already post 5.1.8; these, in conjunction with 1., seem to result in a working setup
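
To compare behaviour between 5.1.6 and 5.1.8 it helps to see exactly what map entry the auto.net script quoted earlier in this report hands to automount. The sketch below runs the same cut/sort/awk/sed pipeline against canned input, so it needs no NFS server. The function names `fake_showmount` and `gen_entry`, the host name `server`, and the export paths are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for `showmount --no-headers -e HOST`; the export
# paths and client lists are invented for this example.
fake_showmount() {
	printf '%s\n' \
		'/export/home *' \
		'/export/data 10.0.0.0/8'
}

# Same pipeline as the auto.net script in this report, reading from the
# stand-in instead of the real showmount. $1 is the map key (the host).
gen_entry() {
	key="$1"
	opts="-fstype=nfs4,hard,sec=sys,nodev,nosuid,wsize=32768,rsize=32768"
	fake_showmount | LC_ALL=C cut -d' ' -f1 | LC_ALL=C sort -u | \
		awk -v key="$key" -v opts="$opts" -- '
		BEGIN	{ ORS=""; first=1 }
			{ if (first) { print opts; first=0 }; print " \\\n\t" $1, key ":" $1 }
		END	{ if (!first) print "\n"; else exit 1 }
		' | sed 's/#/\\#/g'
}

gen_entry server
```

The result is a single multi-mount entry spread over three lines: the options first, then one tab-indented "path host:path" pair per export, with a backslash continuing each line but the last. Each of those paths is an offset under /net/server, which is presumably what the "failed to mount offset trigger at" messages refer to. Since the offset path printed in the log is empty, comparing this output with the real script's output on an affected machine may help tell a map-generation problem from a parsing problem.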