I'm running an nfsroot environment on amd64. The client kernel loads from a local hard disk, then mounts its root filesystem via NFS. The latest updates to util-linux and nfs-utils horribly broke my installation. Depending on the versions of both packages in use, I either cannot mount any further shares after nfsroot is mounted, or I can't even mount root at all. See "Steps to Reproduce" for details.

Reproducible: Always

Steps to Reproduce:
1. Update nfs-utils to any version above 1.1 (tried 1.1.0-r1 and 1.1.1), keep util-linux-2.12r-r8.
2. Reboot.
3. Result: Kernel boots up but init stops at "Remounting root filesystem read/write ...". 40 seconds later I get a "Root filesystem could not be mounted read/write :( [!!] Give root password for maintenance (or type Control-D to continue):"
4. Manually mounting the share results in an error message claiming that rpc.statd wasn't started yet.
5. /etc/init.d/rpc.statd start
6. Result: "ERROR: cannot run rpc.statd until sysinit completes, rpc.statd will be started in the boot runlevel"
   Funny. Sysinit can't complete because root can't be mounted r/w, mounting fails because sysinit can't complete.
7. Downgrade to nfs-utils-1.0.12-r1.
8. Reboot.
9. Result: Everything works again as expected.
10. Upgrade to util-linux-2.13-r2.
11. Reboot.
12. Result: Kernel boots, init completes, all seems fine. But where are my other shares? Turns out that while root now gets mounted without problems, I can't mount any other NFS shares (see bug 200307).
13. As suggested in that bug, upgrade to nfs-utils-1.1.0-r1 (or 1.1.1, results are the same).
14. Reboot.
15. Result: Nothing works anymore. Again init stops at "Remounting root filesystem read/write ..."
16. Downgrade to nfs-utils-1.0.12-r1 and util-linux-2.12r-r8.
17. Reboot.
18. Result: Everything works as expected.

Actual Results:
To summarize my tested combinations of nfs-utils and util-linux:
1. nfs-utils-1.0.12-r1 + util-linux-2.12r-r8: Works perfectly.
2. nfs-utils-1.1.x + util-linux-2.12r-r8: init hangs when trying to remount root fs r/w.
3. nfs-utils-1.0.12-r1 + util-linux-2.13-r2: Root mounts fine, other shares fail.
4. nfs-utils-1.1.x + util-linux-2.13-r2: init hangs when trying to remount root fs r/w.

Expected Results:
nfs-utils-1.1.x and util-linux-2.13-r2 should not break my installation.

emerge --info:
Portage 2.1.3.19 (default-linux/amd64/2006.1/desktop, gcc-4.1.2, glibc-2.6.1-r0, 2.6.23-gentoo-r3-smp x86_64)
=================================================================
System uname: 2.6.23-gentoo-r3-smp x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
Timestamp of tree: Sun, 13 Jan 2008 15:16:01 +0000
app-shells/bash:     3.2_p17-r1
dev-java/java-config: 1.3.7, 2.0.33-r1
dev-lang/python:     2.4.4-r6
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.10-r5
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.4_p6, 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.24
virtual/os-headers:  2.6.23-r2
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=athlon64 -msse3 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-march=athlon64 -msse3 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://mirrors.sec.informatik.tu-darmstadt.de/gentoo/ ftp://ftp.gentoo.mesh-solutions.com/gentoo/ ftp://212.219.56.152/sites/www.ibiblio.org/gentoo/ http://212.219.56.162/sites/www.ibiblio.org/gentoo/"
LANG="de_DE@euro"
LC_ALL="de_DE@euro"
LINGUAS="de"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X Xaw3d a52 aac aalib alsa amd64 berkdb bitmap-fonts bzip2 cairo cdr cli cracklib crypt cups dbus dri dts dvd dvdr dvdread eds emboss encode exif fam ffmpeg firefox flac fortran gdbm gif glut gpm gstreamer gtk gtk2 gtkhtml hal iconv imagemagick ipv6 isdnlog jack joystick jpeg jpeg2k kde ladspa ldap libsamplerate lm_sensors mad midi mikmod mng mp3 mpeg mudflap mysql ncurses nls nptl nptlonly nsplugin ogg openal opengl openmp oss pam pcre perl png pppd python qt3 quicktime readline reflection sdl session spell spl ssl svg tcpd threads tidy tk truetype truetype-fonts type1-fonts v4l vcd vorbis xine xinerama xml xml2 xorg xv xvid xvmc zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CAMERAS="canon" ELIBC="glibc" INPUT_DEVICES="evdev joystick keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de" USERLAND="GNU" VIDEO_CARDS="nvidia nv vesa fbdev"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
you need to set the nolock option on your rootfs, either in your /etc/fstab or on your kernel cmdline
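For illustration, a minimal sketch of what the two variants might look like. The server address 192.168.0.1, the export path /export/nfsroot, and the ip=dhcp setting are placeholders and not taken from this report; the remaining mount options are assumptions and should be adapted to the existing setup.

```text
# /etc/fstab: root filesystem over NFS with locking disabled
# (server address and export path are hypothetical)
192.168.0.1:/export/nfsroot   /   nfs   rw,nolock   0 0

# kernel command line: NFS options follow the export path, separated by commas
# (server address, export path and ip=dhcp are hypothetical)
root=/dev/nfs nfsroot=192.168.0.1:/export/nfsroot,nolock ip=dhcp
```

Only one of the two should be needed; which one takes effect for the read/write remount depends on how the init scripts remount root.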
Setting nolock in /etc/fstab works fine. But is it a good idea to disable locking, or may this lead to problems when multiple clients try to access the same files on a share? Does upstream care about this issue, or is the only "real" solution a switch to NFSv4, which seems to perform locking at the protocol level, without the need for extra programs running?
i highly doubt they'll add locking to the protocol itself ... they split it off into a separate daemon on purpose

NFS has always been silently doing no locking, you just never noticed until now, when the NFS userspace started failing early on with an error ...

i'm not suggesting that nolock is a fix, just something to get you going until we figure out how to get rpc.statd running early on ... this most likely will require baselayout-2 to get fixed properly though
I thought I had read about that protocol merge somewhere, but maybe I just mixed something up. So am I getting the situation right? Locking works when rpc.statd is running, but doesn't work and never has worked for root NFS? Would it be possible to somehow "attach" locking to root NFS later, or is there no way to get that running without an explicit remount of the affected (root) NFS? I'm sorry for those stupid questions, but most of the NFS documentation one can find on the internet is... weird. :-)
you'd have to ask those questions on the nfs sourceforge mailing list ... i really don't know the answers ;)