Using Gentoo in our university's computer science labs, we've observed that if a machine running Gentoo is shut down while someone else has open files on an NFS-mounted directory on that computer (e.g. someone else is SSH'd into the machine and vimming a file), the machine hangs at "Unmounting filesystems ...". Curiously, it gets past the "Unmounting network filesystems ...", "Unmounting NFS filesystems ...", and various other NFS-related things just fine.

We are using the following mount command:

mount -t nfs peter:/export/users/users9 /home/users9 -o rw,soft,tcp,nolock,rsize=4096,wsize=4096,retrans=30,addr=10.120.1.15

If you add the intr option then the system instead hangs at the "Remounting remaining filesystems read only ..." message.

We are not doing any sort of root over NFS stuff here, just mounting users' home directories over NFS. We had a problem like this before where the system would hang if anybody was even cd'd into an NFS-mounted directory, but that was fixed by upgrading to =nfs-utils-1.1.2 ~x86.

Steps to reproduce:
1-Mount an NFS filesystem with the following options: rw,soft,tcp,nolock,rsize=4096,wsize=4096,retrans=30,addr=10.120.1.15
2-Log in via SSH from a remote host and start up something which holds a file open in the NFS mounted directory (e.g. vim a file there, write a quick perl script which opens a file and then goes into an infinite loop without closing the file). Or, do a similar thing inside a screen session or with nohup on the system itself.
3-Shut down the system.

What happens: The system hangs at "Unmounting filesystems ..." and has to be cold reset.

What should happen: The system should successfully shut down or reboot.
emerge --info:

Portage 2.1.4.4 (unavailable, gcc-4.1.2, glibc-2.6.1-r0, 2.6.25-gentoo-r7 i686)
=================================================================
System uname: 2.6.25-gentoo-r7 i686 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Timestamp of tree: Unknown
ccache version 2.4 [enabled]
dev-lang/python: 2.4.4-r9, 2.5.2-r7
sys-devel/autoconf: 2.13, 2.61-r2
sys-devel/automake: 1.4_p6, 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1-r1
sys-devel/binutils: 2.18-r3
sys-devel/libtool: 1.5.26
virtual/os-headers: 2.6.23-r3
ACCEPT_KEYWORDS="x86"
CFLAGS="-O2 -march=i686 -pipe -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-O2 -march=i686 -pipe -fomit-frame-pointer"
DISTDIR="/exclude/distfiles"
FEATURES="ccache distlocks fixpackages metadata-transfer prelink sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://gentoo.mirrors.tds.net/gentoo http://gentoo.mirrors.tds.net/gentoo/ ftp://ftp.ussg.iu.edu/pub/linux/gentoo"
PKGDIR="/exclude/packages"
PORTAGE_TMPDIR="/exclude/port-tmp"
PORTDIR="/exclude/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="X acpi alsa arts audiofile bash-completion bzip2 cdparanoia cdr cups curl dbus dvd dvdr dvdread emacs esd exif ffmpeg flac ftp gcj gif gnome gphoto2 gpm gstreamer gtk hal ieee1394 java javascript jpeg kde lame ldap mad mp3 mpeg mplayer mysql ncurses nsplugin ogg opengl pcre pdf perl png python qt3 qt4 readline ruby samba scanner sdl slang sndfile spell ssl svg symlink truetype usb vim-syntax vorbis win32codecs wxwindows xine xinerama xml xscreensaver xulrunner zlib"
Unset: EMERGE_DEFAULT_OPTS
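Step 2 of the reproduction can also be approximated without vim or perl. The following is only a minimal sketch: `hold_open` is a hypothetical helper name, and the path and duration are placeholders to be chosen by whoever runs it.

```shell
#!/bin/sh
# Sketch: keep a file open for a while, standing in for the vim/perl example.
# hold_open is a hypothetical helper, not part of any Gentoo package.
hold_open() {
	file="$1"
	seconds="${2:-3600}"
	# Open the file for appending on descriptor 3 inside a subshell and
	# keep it open while sleeping; the descriptor closes when sleep ends.
	( exec 3>>"$file" && sleep "$seconds" )
}
```

Running something like `hold_open /home/users9/<user>/held.txt &` from an SSH session, or under nohup or screen, and then shutting the machine down should exercise the same situation as the steps above.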
Confirmed.
Here the very same happens for our corporate network where we have two gentoo-based linux servers with home drives automatically being mounted via autofs/nfs. However, after waiting 20+ minutes the system finally reboots. As soon as we shut down the system, it starts hanging after printing some output about unsuccessful umount.nfs executions:

-- cut here --
...
* Unmounting network filesystems ...
umount.nfs: /mntfwb/mnt/data: not mounted
umount.nfs: /home/langner: device is busy
umount.nfs: /home/hofheinz: device is busy
* Failed to simply unmount filesystems
* Unmounting /home/langner ... [ !! ]
* Unmounting /home/hofheinz ... [ !! ]
[ ok ]
* Unmounting NFS filesystems ...
umount.nfs: /home/langner: device is busy
umount.nfs: /home/hofheinz: device is busy
[ !! ]
* ERROR: nfsmount failed to stop
... then nfs daemon and the network interfaces are stopped ...
* Unmounting filesystems
* Unmounting /home/hofheinz/pub ...
* in use but fuser finds nothing [ !! ]
* Unmounting /var/lib/nfs/rpc_pipefs ... [ ok ]
* Unmounting /home/hofheinz/ftp ...
* in use but fuser finds nothing [ !! ]
* Unmounting /home ...
* in use but fuser finds nothing [ !! ]
* Setting hardware clock using the system clock [UTC] ... [ ok ]
* Remounting remaining filesystems read-only ...
* Remounting /home/hofheinz/pub ...
* in use but fuser finds nothing [ !! ]
* Remounting /home/hofheinz/ftp ...
* in use but fuser finds nothing [ !! ]
* Remounting /home ...
* in use but fuser finds nothing [ !! ]
* Remounting / ... [ ok ]
[ ok ]
Restarting system.
-- cut here --

For each of the errors above the system waits a long time (several minutes). The longest wait is right after "Unmounting filesystems" (e.g. 20 minutes). So the whole shutdown process took more than 30 minutes, which of course is very inconvenient. This is with nfs-utils 1.1.4 installed and the rest of the system completely up to date (including portage).
> Here the very same happens for our corporate network where we have two gentoo-based linux servers Since you mentioned servers, I want to make it clear that the bug I am reporting is a CLIENT issue, NOT a server issue. I'm not sure if you were reporting a server issue or just mentioning the fact that you have gentoo servers, but the problem in this case is with the client. At my university lab we have an RHEL5 server with gentoo clients. I apologize if I did not make this clear in my initial report.
Of course I meant to say that the problem is with the client part of NFS. Sorry for the confusion, but this machine is acting as an NFS server as well as an NFS client. And here I was supposed to report on the NFS client problems only.
It seems to me that the gentoo shutdown sequence assumes that all NFS file systems are successfully unmounted after the invocation of "umount -a -t nfs,nfs4" in /etc/init.d/nfsmount.sh stop(). However, this is not the case when the file system is currently in use. Then umount fails with a "device busy" and the NFS volume remains mounted.

This is a problem later in /etc/init.d/halt.sh when, during "Unmounting filesystems" and "mount_readonly()", the "fuser" command is executed. If at that time there's still an NFS volume present in /proc/mounts, the fuser command hangs, causing the whole system to hang.

So until someone fixes the underlying problem of safely unmounting a busy NFS mount, I've worked around this issue by skipping all unsuccessfully unmounted NFS volumes in /etc/init.d/halt.sh:

--- halt.sh.orig 2008-12-21 04:49:42.677779805 +0100
+++ halt.sh 2008-12-21 13:17:21.937607946 +0100
@@ -13,7 +13,23 @@
 # livecd-functions.sh should _ONLY_ set this differently if CDBOOT is
 # set, else the default one should be used for normal boots.
 # say: RC_NO_UMOUNTS="/mnt/livecd|/newroot"
-RC_NO_UMOUNTS=${RC_NO_UMOUNTS:-^(/|/dev|/dev/pts|/lib/rcscripts/init.d|/proc|/proc/.*|/sys)$}
+RC_NO_UMOUNTS=${RC_NO_UMOUNTS:-^(/|/dev|/dev/pts|/lib/rcscripts/init.d|/proc|/proc/.*|/sys|/sys/kernel/security)$}
+
+for x in $(awk '{print $3, $2}' /proc/mounts | sort -ur) ; do
+    x=${x//\\040/ }
+
+    if [[ " ${x} " == " nfs " || " ${x} " == " nfs4 " ]] ; then
+        nfsnext=1
+        continue
+    fi
+
+    if [[ ${nfsnext} -eq 1 ]] ; then
+        ewarn "NFS unmount failed for ${x}"
+        ewarn "Skipping ..."
+        RC_NO_UMOUNTS="${RC_NO_UMOUNTS}|${x}"
+        nfsnext=0
+    fi
+done
 
 # Reset pam_console permissions if we are actually using it
 if [[ -x /sbin/pam_console_apply && ! -c /dev/.devfsd && \
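The loop in the patch above picks out NFS mount points by walking word-split "type mountpoint" pairs. As a rough illustration only, the same selection can be keyed directly on the filesystem-type field of /proc/mounts. The function name `list_nfs_mounts` and its optional file argument are assumptions made for this sketch and are not part of baselayout.

```shell
#!/bin/sh
# Sketch: print the mount point of every nfs/nfs4 entry in a mounts table.
# list_nfs_mounts and its optional file argument are illustrative only.
list_nfs_mounts() {
	mounts_file="${1:-/proc/mounts}"
	# Fields in /proc/mounts: device, mount point, fs type, options, dump, pass.
	# Spaces inside a mount point are encoded as \040, so decode them.
	awk '$3 == "nfs" || $3 == "nfs4" { gsub(/\\040/, " ", $2); print $2 }' "$mounts_file"
}
```

Each printed path could then be appended to RC_NO_UMOUNTS in the same way the patch does; unlike the word-splitting loop, this variant does not break a mount point containing spaces into separate words.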
Hello,

I hit this problem too when I shut down from GNOME with /home mounted via NFS. In my opinion the problem is in /etc/init.d/netmount.

In the stop() section, the umount command with option -r on a busy device does not remount the share read-only; it simply removes the entry from /etc/mtab, and netmount thinks that everything went OK. But it didn't.

I simply removed the read-only option; then the script can go further and TERM or KILL the processes using the share. And it works.

My new /etc/init.d/netmount:

  stop() {
      local rcfilesystems=${NET_FS_LIST}

      rcfilesystems=${rcfilesystems// /,} # convert to comma-separated

      ebegin "Unmounting network filesystems"
-     [[ -z $(umount -art ${rcfilesystems} 2>&1) ]]
+     [[ -z $(umount -at ${rcfilesystems} 2>&1) ]]
      eend $? "Failed to simply unmount filesystems"
      [[ $? == 0 ]] && return 0
(In reply to comment #6)

Thomas H., your fix does the trick. Thanks much! /etc/init.d/netmount belongs to sys-apps/baselayout. Is this change something we should try to get merged into that package? If so, how should I or someone else go about that? Thanks again.
Added vapier to CC. This is a real problem with baselayout, and should be fixed. I have tested baselayout 1.12.12 and the same problem exists there. I can't see why we should remount ro if we can't umount. Seconds after this we will lose networking anyway...
*** Bug 250920 has been marked as a duplicate of this bug. ***
I CC'd base-system@ and removed vapier@, because otherwise he would receive every mail twice.
I can add that baselayout 2 does _not_ seem to have this problem.
I confirm the same problem, except that I'm mounted "hard,intr" instead of "soft". One evening I hit "shutdown", turned off the monitor, and went to bed without waiting. The next morning the system was still hung at "Remounting remaining filesystems read only ..." My most common problem is that Thunderbird starts gpg-agent, then leaves the daemon running after it exits. I've set things up so that gpg-agent uses files on a local filesystem instead of my NFS home, and that has been a good workaround.
I am also affected by this. Maybe another option would be to wait for baselayout-2/openrc to be stabilized, but I don't know when that will be done :-/ Thanks a lot!
The same problem occurs when the remote NFS filesystem is mounted via autofs. The content of my /etc/auto.nfs file:

nfs -rw,soft,intr,rsize=8192,wsize=8192 192.168.1.100:/srv/nfs
(In reply to comment #6)
> Hello
> i got this problem too when i shutdown from gnome and have mounted /home via
> nfs
> in my opinion the problem is in /etc/init.d/netmount
>
> in the stop() section the umount command with option -r on a busy device causes
> not to remount the share readonly it kills simply the entry in /etc/mtab and
> netmount thinks that everything got ok. but it didn't
>
> i simply removed read only option then the script can go further and term or
> kill the process using the share. and it works
>
> my new /etc/init.d/netmount
>
> stop() {
>     local rcfilesystems=${NET_FS_LIST}
>
>     rcfilesystems=${rcfilesystems// /,} # convert to comma-separated
>
>     ebegin "Unmounting network filesystems"
> -   [[ -z $(umount -art ${rcfilesystems} 2>&1) ]]
> +   [[ -z $(umount -at ${rcfilesystems} 2>&1) ]]
>     eend $? "Failed to simply unmount filesystems"
>     [[ $? == 0 ]] && return 0
>

In my case I simply added "-f" to the options. According to "man umount" it seems to be meant for NFS:

-f Force unmount (in case of an unreachable NFS system). (Requires kernel 2.1.116 or later.)

and it works OK for me.
One of the NFS servers went flaky. It's mounted automatically on the clients via nfsmount and fstab. When the clients started, nfsmount would start mounting NFS shares, hit the flaky server, and fail. However, it had already mounted the shares from all of the other NFS servers. On shutdown, nfsmount stop was never called. The service manager didn't call it because, as far as it was concerned, it was never started. The clients would hang indefinitely at "Unmounting filesystems" because the NFS shares were still mounted when the system tried to unmount the local filesystems and the network had already been brought down.

My solution was to add the following to /etc/conf.d/net:

predown() {
    # The default in the script is to test for NFS root and disallow
    # downing interfaces in that case. Note that if you specify a
    # predown() function you will override that logic. Here it is, in
    # case you still want it...
    if is_net_fs /; then
        eerror "root filesystem is network mounted -- can't stop ${IFACE}"
        return 1
    fi

    #Make sure network filesystems are umounted
    if grep 'nfs' /proc/mounts; then
        ewarn "NFS Mounts Failed to Unmount."
        umount -aft nfs
    fi

    # Remember to return 0 on success
    return 0
}

This ensured that NFS shares were unmounted, even if the service scripts failed, before the network went down. It did take about 1 minute 20 seconds for the command to complete. However, the systems stopped hanging at shutdown. I later fixed the flaky NFS server. This bit of code could be expanded to include other network filesystems.
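One caveat about the check in predown(): a plain grep for 'nfs' also matches lines that merely contain that string, such as the /var/lib/nfs/rpc_pipefs mount that appears in an earlier shutdown log in this bug. A stricter test could look only at the filesystem-type field. This is a minimal sketch, and the helper name `has_nfs_mounts` and its optional file argument are made up for the illustration.

```shell
#!/bin/sh
# Sketch: succeed only if the mounts table has an entry of type nfs or nfs4.
# has_nfs_mounts is a hypothetical helper; it is not provided by baselayout.
has_nfs_mounts() {
	mounts_file="${1:-/proc/mounts}"
	# Field 3 of /proc/mounts is the filesystem type; exit status 0 means found.
	awk '$3 == "nfs" || $3 == "nfs4" { found = 1 } END { exit !found }' "$mounts_file"
}
```

Inside predown() the condition would then read `if has_nfs_mounts; then ... fi`, with the rest of the function unchanged.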
Maybe adding "-f" to

stop() {
    ebegin "Unmounting NFS filesystems"
    umount -a -t nfs,nfs4
    eend $?
}

would also solve this :-/
Why is NFS unmounting done by both the nfsmount and netmount initscripts? I think it would be better if nfsmount alone did the work.
More news: I added the following to my local.stop:

umount -a -f -t nfs,nfs4

and it also hangs when the server is unreachable :-(, but manually running it seems to work :-/
I think I've found where the problem is: when a file has been WRITTEN on the NFS-mounted filesystem, "umount -f" fails to work. On the other hand, if the file is simply read, there is no problem and "umount -f" works fine when the server goes down.

It seems that, to work around this, umount needs to be run with the "-l" option:

-l Lazy unmount. Detach the file system from the file system hierarchy now, and cleanup all references to the file system as soon as it is not busy anymore.

But I'm not sure whether this is the expected behavior or whether I should report this problem upstream.

Thanks for your help on this :-)
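One way to combine these observations is to try a plain unmount first, then "-f", and fall back to "-l" only as a last resort. This is only a sketch under assumptions: `umount_nfs_escalating` and the `UMOUNT_CMD` override are hypothetical names introduced so the logic can be dry-run. It also does not remove the risk, reported earlier in this bug, that the "-f" attempt itself blocks when the server is unreachable.

```shell
#!/bin/sh
# Sketch: escalate from a plain unmount to a forced one, then a lazy one.
# UMOUNT_CMD is a hypothetical override so the logic can be tested dry.
UMOUNT_CMD="${UMOUNT_CMD:-umount}"

umount_nfs_escalating() {
	for flags in "" "-f" "-l"; do
		# $flags is intentionally unquoted so the empty string vanishes.
		if $UMOUNT_CMD -a $flags -t nfs,nfs4; then
			return 0
		fi
	done
	# Every attempt reported a failure.
	return 1
}
```

Note that a successful lazy unmount only detaches the mount point; the filesystem is not actually released until it is no longer busy.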
I asked upstream also: http://www.spinics.net/lists/linux-nfs/msg09317.html
I have the same problem with baselayout-2.0.1. A directory on a remote computer is mounted by the automounter. The directory is accessed during root's login. Shutdown hangs at "unmounting network filesystems". I have linux-2.6.3[01]-gentoo, nfs-utils-1.2.0 and libnfsidmap-0.21-r1. Is there any recommended solution?
I have added: umount -a -l -f -t nfs,nfs4 to my local.stop ;-)
The hint of comment 23 seems to work for me too.
*** Bug 268844 has been marked as a duplicate of this bug. ***
I stumbled into this one, too. I get an error at "Remounting remaining filesystems readonly" every now and then, saying the hostname of the NFS server could not be found. Weird thing: sometimes it just reboots / halts normally. Maybe I'll find the time to provide more information later.
Putting umount -a -l -f -t nfs,nfs4 in my local.stop does not work for me. The problem is still present with net-fs/nfs-utils-1.2.2-r1
Any update on this? When can we use baselayout-2? I have the same problem at my workplace: the system hangs at "Unmounting filesystems" when a user has open files on an NFS mount. I tried the things suggested here, and they fail to work most of the time.
Indeed. Gentoo also has other problems rebooting while a network filesystem is mounted.
With baselayout-2, this doesn't seem to be an issue anymore.
I've been using baselayout2 since before its stabilization.
*** Bug 445036 has been marked as a duplicate of this bug. ***