Bug 168170 - net-fs/nfs-utils-1.0.10 intermittently breaks NFS3 mounts
Summary: net-fs/nfs-utils-1.0.10 intermittently breaks NFS3 mounts
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Server
Hardware: x86 Linux
Importance: High critical
Assignee: Network Filesystems
Reported: 2007-02-23 18:01 UTC by Arthur Hagen
Modified: 2007-03-25 06:50 UTC
CC List: 8 users



Description Arthur Hagen 2007-02-23 18:01:55 UTC
After installing nfs-utils-1.0.10 on both clients and servers, stopping NFS, cleaning out /var/lib/nfs/etab and /var/lib/nfs/rmtab, and starting NFS again, NFS v3 mounts fail intermittently.

In addition, and more seriously, existing NFS v3 mounts intermittently go stale instantly on the client side and are unmounted on the server side.

Reproducible: Sometimes

Steps to Reproduce:
1. mount -t nfs -o nfsvers=3 server:/path/to/folder /path/to/folder
Actual Results:  
Client reports:
NFSv3 not supported!

Server reports:
rpc.mountd: /var/lib/nfs/etab:1: unknown keyword "acl"

Expected Results:  
Directory mounted with nfsv3 and acls

Kernel:  gentoo-sources 2.6.19-r5

Both client and server have ACL support enabled in the kernel, in the file system, and as a USE flag, and both have NFS3 and NFS3 ACL support enabled in the kernel.

No problems until upgrading to nfs-utils 1.0.10; the mounts are used hundreds of times daily (mostly through automount).
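rpc.mountd reported an unknown "acl" keyword on line 1 of /var/lib/nfs/etab. As a rough diagnostic, the Python sketch below assumes the usual etab layout of one export per line, written as `path client(opt,opt,...)`, and lists option keywords that are missing from a set the caller supplies. The `find_unknown_options` name and the `hypothetical_known` set are illustrative only; the set is not the option list of any real rpc.mountd version.

```python
import re

# One etab entry: export path, whitespace, client spec, then "(opt,opt,...)".
ENTRY_RE = re.compile(r"^(\S+)\s+(\S*?)\((.*)\)$")


def find_unknown_options(etab_text, known):
    """Return (line_number, keyword) pairs for options not in `known`.

    `known` is supplied by the caller; nothing here knows which keywords a
    given rpc.mountd version actually accepts.
    """
    problems = []
    for lineno, line in enumerate(etab_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = ENTRY_RE.match(line)
        if match is None:
            problems.append((lineno, "<unparsable line>"))
            continue
        for option in match.group(3).split(","):
            keyword = option.split("=", 1)[0].strip()  # "anonuid=65534" -> "anonuid"
            if keyword and keyword not in known:
                problems.append((lineno, keyword))
    return problems


sample = "/srv/share\t192.168.1.0/24(rw,sync,acl,anonuid=65534)\n"
hypothetical_known = {"rw", "sync", "anonuid"}
print(find_unknown_options(sample, hypothetical_known))  # [(1, 'acl')]
```

Run against the real file (for example, the contents of /var/lib/nfs/etab) with a keyword set taken from the exports(5) man page of the installed version, this would only point at suspicious options; it does not reproduce rpc.mountd's own parser.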
Comment 1 Dale Pontius 2007-03-02 19:14:15 UTC
(In reply to comment #0)
> After installing nfs-utils-1.0.10 on both clients and servers, stopping nfs,
> cleaning out /var/lib/nfs/[e|rm]tab, starting nfs, nfs v3 mounts intermittently
> fail.

I also see this problem, but with a little more detail. I have 1 x86 (hardened) server, 1 x86 client, and 1 amd64 client. Both x86 machines were upgraded from 1.0.6 to 1.0.10 as it hit stable on x86, and the amd64 machine has remained at 1.0.6. I didn't fiddle when building, so I got the default "-nonfsv4".

Everything built OK and came up OK. But my mounts on both clients dropped after 15-20 minutes. Whatever hoops I went through to remount a client, it would still only stay connected for 15-20 minutes.

I took both machines back to 1.0.6, and all was well. This morning I took the x86 client back to 1.0.10, and the mount has remained solid all day. So the 1.0.10 problem appears to be at the server.

Incidentally, when I took the server back to 1.0.6 I didn't reboot; I just ran /etc/init.d/nfs restart. I tried some queries, and it appears that parts of NFSv4 are still running. I can still see the NFSv4 root with "showmount -a" (but not with "showmount -e"), but I cannot mount it ("mount: permission denied").

(nfs newbie)
Comment 2 ProTech 2007-03-02 19:28:34 UTC
I have the same problem with an x86 server (1.0.10) and an amd64 client (1.0.6). I cannot even mount the shares; I always get the following error:

mount: stale NFS file handle 

After downgrading to 1.0.6 everything works as before.
Comment 3 SpanKY gentoo-dev 2007-03-03 08:24:35 UTC
1.0.12 is in portage; it'd be trivial to simply test that
Comment 4 Arthur Hagen 2007-03-03 10:34:02 UTC
Neither the portage ChangeLog nor the SourceForge release notes for 1.0.12 mention anything that seems relevant to this problem. Taking all NFS and autofs servers and clients offline, cleaning out /var/lib/nfs/*tab on both clients and servers, and then restarting NFS and every application that depends on NFS/autofs is a rather large "simple" operation (double that for the probable backout), so I'd rather wait.
If anyone else feels like crapshoot testing on a system with ACLs and NFS3, feel free.  :-)
Comment 5 ProTech 2007-03-03 22:08:20 UTC
I tried 1.0.12, and it's working correctly so far. I have SELinux installed on the server, but not in strict mode.
Comment 6 Holger Hoffstätte 2007-03-03 22:41:16 UTC
I can also confirm that my NFS locking problems (which only seem to have appeared in 2.6.20?) went away with 1.0.12 - that's all the crapshoot feedback I can contribute. :)
Comment 7 Arthur Hagen 2007-03-04 00:06:45 UTC
(In reply to comment #5)
> I tried 1.0.12, and it's working correctly so far. I have selinux installed in
> the server, but not in strict mode.
> 

If you have SELinux, you almost certainly don't have ACLs, because the SELinux profile masks out the acl USE flag. Thus you won't be able to test this particular problem.
Comment 8 Arthur Hagen 2007-03-04 00:09:23 UTC
(In reply to comment #6)
> I can also confirm that my NFS locking problems (which only seem to have
> appeared in 2.6.20?) went away with 1.0.12 - that's all the crapshoot feedback
> I can contribute. :)
> 

Yeah, and that's good. It's a different problem though, I think. The one in this ticket is with mountd, not with kernel NFS locks (which take over for rpc.lockd, which is gone with the latest Linux version).
Comment 9 Dale Pontius 2007-03-04 00:31:27 UTC
I brought a spare machine running 2.6.18-hardened and GRSecurity (not enforcing yet) up to nfs-utils-1.0.12, and it serves mounts solidly. My problems appear to occur only with the server at nfs-utils-1.0.10.

Is there more diagnostic information I can provide? I can take the test machine back to nfs-utils-1.0.12 if it would help. I will say that when my main server was there and failing, there wasn't spit for diagnostic info, at least not that I could see. Just a pile of "???" in every field except the server-name/file-export when running "df".
Comment 10 ProTech 2007-03-04 07:39:42 UTC
(In reply to comment #7)
> If you have selinux, you almost certainly don't have acl, cause the selinux
> profile masks out the acl keyword.  Thus you won't be able to test this
> particular problem.
> 

Then I have another problem with nfs-utils, which makes my nfs shares not mountable with the error I wrote before:

mount: stale NFS file handle

It is similar to your problem, and it is solved by either downgrading to 1.0.6 or upgrading to 1.0.12.
Comment 11 SpanKY gentoo-dev 2007-03-04 20:46:46 UTC
so is there anyone that 1.0.12 does not work for ?
Comment 12 Caleb Cushing 2007-03-05 13:05:47 UTC
(In reply to comment #11)
> so is there anyone that 1.0.12 does not work for ?
> 

I'm affected. Will test this evening after work. 
Comment 13 Arthur Hagen 2007-03-06 15:42:14 UTC
Looks good here, with one caveat: the new 1.0.12 init.d script fails to bring down the old rpc.mountd, so running /etc/init.d/nfs stop and /etc/init.d/nfs start after upgrading is not enough. Either stop NFS before emerging, or kill rpc.mountd manually before starting it again.

(There's also still a problem with getfacl not always returning the acls for very busy NFS mounts, but I believe that was present with 1.0.6 too, and might be due to the recent NFS kernel changes and locking issues, and not nfs-utils.)

Summary: 1.0.12 appears to have fixed the problem with mounting NFS v3, and it no longer falls back to v2 and fails when v3-specific features are used.

The stop part of the script should probably be fixed, though: using the full path and --oknodo "to smooth things over" is not a good idea for rpc.mountd. It really MUST be killed, even (or especially) after version changes.
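Comment 13 reports that the 1.0.12 init script, which stops rpc.mountd by its full path with --oknodo, did not bring down the old daemon after the upgrade, and that --oknodo (exit 0 when nothing was stopped) hid the failure. A minimal Python sketch of the alternative, assuming a Linux /proc where the second field of /proc/<pid>/stat holds the process name in parentheses, finds running processes by name rather than by executable path. The `find_pids_by_name` helper is hypothetical; it is not what the Gentoo init script does.

```python
import os


def find_pids_by_name(name, proc_root="/proc"):
    """Return the sorted PIDs of processes whose name equals `name`.

    The name is read from /proc/<pid>/stat, so a daemon still running from a
    binary that an upgrade has since replaced on disk is found as well.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path) as handle:
                stat = handle.read()
            # The name sits between the first "(" and the last ")".
            comm = stat[stat.index("(") + 1:stat.rindex(")")]
        except (OSError, ValueError):
            continue  # process exited meanwhile, or the file was malformed
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)
```

The kernel truncates this name to 15 characters, which does not matter for "rpc.mountd". The PIDs returned by `find_pids_by_name("rpc.mountd")` could then be passed to `os.kill(pid, signal.SIGTERM)` before the new daemon is started.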
Comment 14 Guillermo M. Narvaja 2007-03-12 22:19:52 UTC
I think I have the same problem on two of my Gentoo boxes. I use NFS as the root fs for LTSP thin clients, and they couldn't mount.
I tried mounting locally:
# mount -t nfs 192.168.1.2:/opt/ltsp-4.1r1/i386 dir
RPC: Timed out

(192.168.1.2 is the IP of my server and I run this command there, so there can't be a network problem).

One is x86, the other amd64. Both were on nfs-utils-1.0.6-r6 and running fine.
The last thing I did before the problem began was upgrade portage and baselayout.

I tried upgrading to nfs-utils-1.0.10 (x86 box) and 1.0.12 (amd64), but the problem remains.

I've found a strange workaround. Both servers have 2 network interfaces, one connected to the LAN, the other to the Internet. I've found that running

/etc/init.d/net.eth0 stop
/etc/init.d/portmap restart
sleep 120
/etc/init.d/net.eth0 start

(where eth0 is the interface connected to the Internet)
During those 120 seconds (i.e. while the Internet connection is down), mounts from the LAN work.
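The failure in comment 14 is an "RPC: Timed out" on a mount issued on the server itself. One way to check whether the portmapper answers on a given address at all is an RPC ping of procedure 0 (NULL), similar in spirit to rpcinfo's UDP ping. The Python sketch below encodes such a call by hand following the ONC RPC message format (RFC 5531) for the portmapper (program 100000, version 2, UDP port 111) and checks for an accepted, successful reply. The function names and the fixed xid are hypothetical, and this is a diagnostic aid, not a fix for the bug.

```python
import socket
import struct

PMAP_PROG = 100000  # portmapper program number
PMAP_VERS = 2       # portmapper protocol version 2
PMAP_PORT = 111     # well-known portmapper port


def build_null_call(xid, prog=PMAP_PROG, vers=PMAP_VERS):
    """Encode an RPC CALL to procedure 0 (NULL) with AUTH_NULL cred and verifier."""
    # Fields: xid, msg_type=CALL(0), rpcvers=2, prog, vers, proc=0,
    # cred flavor=AUTH_NULL(0), cred length=0, verf flavor=0, verf length=0.
    return struct.pack(">10I", xid, 0, 2, prog, vers, 0, 0, 0, 0, 0)


def reply_is_success(data, xid):
    """Return True if `data` is an accepted RPC reply to `xid` with SUCCESS."""
    if len(data) < 24:
        return False
    rxid, msg_type, reply_stat, _verf_flavor, verf_len = struct.unpack(">5I", data[:20])
    if rxid != xid or msg_type != 1 or reply_stat != 0:  # REPLY, MSG_ACCEPTED
        return False
    offset = 20 + (verf_len + 3) // 4 * 4  # skip the verifier body, padded to 4 bytes
    if len(data) < offset + 4:
        return False
    (accept_stat,) = struct.unpack(">I", data[offset:offset + 4])
    return accept_stat == 0  # SUCCESS


def ping_portmapper(host, timeout=2.0, xid=0x1234ABCD):
    """Send one NULL call over UDP; True only if a successful reply arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_null_call(xid), (host, PMAP_PORT))
            data, _addr = sock.recvfrom(4096)
        except OSError:  # includes socket.timeout
            return False
    return reply_is_success(data, xid)
```

For example, `ping_portmapper("192.168.1.2")` could be run on the server once with the Internet-facing interface up and once with it down, to see whether the portmapper's reachability on the LAN address changes along with the mount behaviour.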

Portage 2.1.2.2 (default-linux/x86/2006.1/desktop, gcc-4.1.1, glibc-2.4-r3, 2.6.19-gentoo-r5 i686)
=================================================================
System uname: 2.6.19-gentoo-r5 i686 AMD Athlon(tm) 64 Processor 3000+
Gentoo Base System release 1.12.9
Timestamp of tree: Sat, 10 Mar 2007 23:30:10 +0000
dev-java/java-config: 1.2.11-r1
dev-lang/python:     2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=athlon-xp -O2 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/lib/mozilla/defaults/pref /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-march=athlon-xp -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig buildpkg distlocks metadata-transfer paralell-fetch sandbox sfperms strict"
GENTOO_MIRRORS="http://mirror.datapipe.net/gentoo http://mirror.datapipe.net/gentoo http://open-systems.ufl.edu/mirrors/gentoo http://adelie.polymtl.ca/"
LANG="es_AR"
LC_ALL="es_AR"
LINGUAS="es es_AR"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X alsa arts avi berkdb bitmap-fonts cairo cdr cli cracklib crypt cups dbus dlloader dri dvd dvdr eds emboss encode esd fam firefox fortran gdbm gif gnome gpm gstreamer gtk gtk2 hal iconv ipv6 isdnlog jpeg kde ldap libg++ mad midi mikmod mp3 mpeg ncurses nls nptl nptlonly ogg opengl oss pam pcre pdflib perl png ppds pppd python qt3 qt4 quicktime readline reflection sdl session spell spl ssl tcpd truetype truetype-fonts type1-fonts udev unicode vorbis win32codecs x86 xml xorg xv zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="es es_AR" USERLAND="GNU" VIDEO_CARDS="apm ark ati chips cirrus cyrix dummy fbdev glint i128 i740 i810 imstt mga neomagic nsc nv rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS


emerge --info (amd64 box):
Portage 2.1.2.2 (default-linux/amd64/2006.1, gcc-3.4.4, glibc-2.3.5-r2, 2.6.16-gentoo-r9 x86_64)
=================================================================
System uname: 2.6.16-gentoo-r9 x86_64 AMD Athlon(tm) 64 Processor 3000+
Gentoo Base System release 1.12.9
Timestamp of tree: Sun, 11 Mar 2007 03:00:01 +0000
dev-java/java-config: 1.2.11-r1
dev-lang/python:     2.3.5, 2.4.2
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1, 1.10
sys-devel/binutils:  2.16.1
sys-devel/gcc-config: 1.3.12-r6
sys-devel/libtool:   1.5.20
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr//lib/mozilla/defaults/pref /usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown /usr/lib/X11/xkb /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/bind"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/terminfo"
CXXFLAGS="-march=k8 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="http://mirror.datapipe.net/gentoo http://mirror.datapipe.net/gentoo http://open-systems.ufl.edu/mirrors/gentoo http://adelie.polymtl.ca/"
LANG="es_AR"
LC_ALL="es_AR"
LINGUAS="es"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 berkdb bitmap-fonts cli cracklib crypt cups doc dri evo fortran gdbm gnome gpm gtk iconv ipv6 isdnlog libg++ midi ncurses nls nptl nptlonly pam pcre perl ppds pppd python readline reflection session spl ssl tcpd truetype-fonts type1-fonts unicode xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="es" USERLAND="GNU" VIDEO_CARDS="apm ark ati chips cirrus cyrix dummy fbdev glint i128 i810 mga neomagic nv rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

Comment 15 Ruslan N. Marchenko 2007-03-22 13:25:12 UTC
I have the same problem: I got "Stale NFS file handle" with 1.0.10.
After downgrading, it works fine.
Maybe the stable keyword should be removed from the ebuild?
Comment 16 SpanKY gentoo-dev 2007-03-25 06:50:49 UTC
answer: upgrade to 1.0.12

if you want to track stabilization of that, see Bug 172133