Bug 239347 - hang on shutdown at "Unmounting filesystems" when user has open files on an NFS mount
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High major with 4 votes
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Duplicates: 250920 268844
Depends on:
Blocks:
 
Reported: 2008-10-02 16:55 UTC by Noah Sheppard
Modified: 2022-04-26 00:01 UTC
CC List: 25 users

See Also:
Package list:
Runtime testing required: ---


Description Noah Sheppard 2008-10-02 16:55:14 UTC
Using Gentoo in our university's computer science labs, we've observed that if a machine running Gentoo is shut down while someone else has open files on an NFS-mounted directory on that computer (e.g. someone else is SSH'd into the machine and vimming a file), the machine hangs at "Unmounting filesystems ...".  Curiously, it gets past the "Unmounting network filesystems ...", "Unmounting NFS filesystems ...", and various other NFS-related things just fine.

We are using the following mount command:
mount -t nfs peter:/export/users/users9 /home/users9 -o rw,soft,tcp,nolock,rsize=4096,wsize=4096,retrans=30,addr=10.120.1.15

If you add the intr option then the system instead hangs at the "Remounting remaining filesystems read only ..." message.

We are not doing any sort of root over NFS stuff here, just mounting users' home directories over NFS. We had a problem like this before where the system would hang if anybody was even cd'd into an NFS-mounted directory, but that was fixed by upgrading to =nfs-utils-1.1.2 ~x86.

Steps to reproduce:
1-Mount an NFS filesystem with the following options: rw,soft,tcp,nolock,rsize=4096,wsize=4096,retrans=30,addr=10.120.1.15
2-Log in via SSH from a remote host and start up something which holds a file open in the NFS mounted directory (e.g. vim a file there, write a quick perl script which opens a file and then goes into an infinite loop without closing the file).  Or, do a similar thing inside a screen session or with nohup on the system itself.
3-Shut down the system.
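
Step 2 can also be scripted without vim or perl. The following is a minimal sketch, assuming a POSIX shell; hold_open is a hypothetical helper name, and the path in the usage comment is only an illustration:

```sh
# hold_open FILE: keep FILE open for appending in a background process
# and print that process's PID (hypothetical helper, not a Gentoo tool).
hold_open() {
    # fd 9 is inherited by sleep, so FILE stays open until sleep exits or
    # is killed. stdout/stderr are redirected so that callers using $(...)
    # do not block waiting for the background process.
    sleep 3600 9>>"$1" >/dev/null 2>&1 &
    echo $!
}

# Example (illustrative path on the NFS mount):
#   pid=$(hold_open /home/users9/someuser/held.txt)
#   ... shut down the machine, or release the file with: kill "$pid"
```

Run from an SSH session, inside screen, or under nohup, this should leave one open file on the NFS mount at shutdown time.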

What happens:
The system hangs at "Unmounting filesystems ..." and has to be cold reset.

What should happen:
The system should successfully shut down or reboot.

emerge --info:
Portage 2.1.4.4 (unavailable, gcc-4.1.2, glibc-2.6.1-r0, 2.6.25-gentoo-r7 i686)
=================================================================
System uname: 2.6.25-gentoo-r7 i686 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Timestamp of tree: Unknown
ccache version 2.4 [enabled]
dev-lang/python:     2.4.4-r9, 2.5.2-r7
sys-devel/autoconf:  2.13, 2.61-r2
sys-devel/automake:  1.4_p6, 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1-r1
sys-devel/binutils:  2.18-r3
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="x86"
CFLAGS="-O2 -march=i686 -pipe -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-O2 -march=i686 -pipe -fomit-frame-pointer"
DISTDIR="/exclude/distfiles"
FEATURES="ccache distlocks fixpackages metadata-transfer prelink sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://gentoo.mirrors.tds.net/gentoo http://gentoo.mirrors.tds.net/gentoo/ ftp://ftp.ussg.iu.edu/pub/linux/gentoo"
PKGDIR="/exclude/packages"
PORTAGE_TMPDIR="/exclude/port-tmp"
PORTDIR="/exclude/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="X acpi alsa arts audiofile bash-completion bzip2 cdparanoia cdr cups curl dbus dvd dvdr dvdread emacs esd exif ffmpeg flac ftp gcj gif gnome gphoto2 gpm gstreamer gtk hal ieee1394 java javascript jpeg kde lame ldap mad mp3 mpeg mplayer mysql ncurses nsplugin ogg opengl pcre pdf perl png python qt3 qt4 readline ruby samba scanner sdl slang sndfile spell ssl svg symlink truetype usb vim-syntax vorbis win32codecs wxwindows xine xinerama xml xscreensaver xulrunner zlib"
Unset:  EMERGE_DEFAULT_OPTS
Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2008-10-02 19:55:30 UTC
Confirmed.
Comment 2 Jens Maus 2008-11-21 16:36:34 UTC
Here the very same happens for our corporate network where we have two gentoo-based linux servers with home drives automatically being mounted via autofs/nfs. However, after waiting 20+ minutes the system finally reboots.

As soon as we shut down the system, it starts hanging after printing messages about some unsuccessful umount.nfs executions:

-- cut here --

...

 * Unmounting network filesystems ...
umount.nfs: /mntfwb/mnt/data: not mounted
umount.nfs: /home/langner: device is busy
umount.nfs: /home/hofheinz: device is busy
 * Failed to simply unmount filesystems
 *   Unmounting /home/langner ...
 [ !! ]
 *   Unmounting /home/hofheinz ...
 [ !! ]
 [ ok ]
 * Unmounting NFS filesystems ...
umount.nfs: /home/langner: device is busy
umount.nfs: /home/hofheinz: device is busy
 [ !! ]
 * ERROR: nfsmount failed to stop

... then the NFS daemon and the network interfaces are stopped ...

 * Unmounting filesystems
 *   Unmounting /home/hofheinz/pub ...
 *   in use but fuser finds nothing
 [ !! ]
 *   Unmounting /var/lib/nfs/rpc_pipefs ...
 [ ok ]
 *   Unmounting /home/hofheinz/ftp ...
 *   in use but fuser finds nothing
 [ !! ]
 *   Unmounting /home ...
 *   in use but fuser finds nothing
 [ !! ]
 * Setting hardware clock using the system clock [UTC] ...
 [ ok ]
 * Remounting remaining filesystems read-only ...
 *   Remounting /home/hofheinz/pub ...
 *   in use but fuser finds nothing
 [ !! ]
 *   Remounting /home/hofheinz/ftp ...
 *   in use but fuser finds nothing
 [ !! ]
 *   Remounting /home ...
 *   in use but fuser finds nothing
 [ !! ]
 *   Remounting / ...
 [ ok ]
 [ ok ]
Restarting system.

-- cut here --

After each of the errors above the system waits for a long time (several minutes). The longest wait is right after "Unmounting filesystems" (e.g. 20 minutes).

So the whole shutdown process took more than 30 minutes, which of course is very inconvenient.

This is with nfs-utils 1.1.4 installed and the rest of the system (including portage) completely up to date.
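
The log above reports "in use but fuser finds nothing". As a cross-check, the PIDs whose working directory or open files lie under a mount point can be read from the symlinks in /proc. This is only a sketch: holders_under is a hypothetical helper name, it assumes a Linux /proc and a readlink utility, and it assumes (not verified in this report) that reading these symlinks does not contact the NFS server. Without root it only sees the caller's own processes.

```sh
# holders_under DIR: print the PIDs that have their cwd or an open file
# at or below DIR (hypothetical helper; Linux /proc only).
# The body runs in a subshell so its variables do not leak to the caller.
holders_under() (
    dir=${1%/}
    for link in /proc/[0-9]*/cwd /proc/[0-9]*/fd/*; do
        # Unreadable or vanished links are skipped silently.
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            "$dir" | "$dir"/*)
                pid=${link#/proc/}      # strip the leading /proc/
                echo "${pid%%/*}"       # keep only the PID component
                ;;
        esac
    done | sort -un
)

# Example (mount point taken from the log above):
#   holders_under /home/hofheinz
```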
Comment 3 Noah Sheppard 2008-11-21 16:41:30 UTC
> Here the very same happens for our corporate network where we have two gentoo-based linux servers

Since you mentioned servers, I want to make it clear that the bug I am reporting is a CLIENT issue, NOT a server issue.  I'm not sure if you were reporting a server issue or just mentioning the fact that you have gentoo servers, but the problem in this case is with the client.  At my university lab we have an RHEL5 server with gentoo clients.  I apologize if I did not make this clear in my initial report.
Comment 4 Jens Maus 2008-11-21 17:10:40 UTC
Of course I meant to say that the problem is with the client part of NFS. Sorry for the confusion, but this machine acts as both an NFS server and an NFS client, and I meant to report on the NFS client problems only.
Comment 5 no_bs 2008-12-21 12:44:41 UTC
It seems to me that the gentoo shutdown sequence assumes that all NFS file systems are successfully unmounted after the invocation of "umount -a -t nfs,nfs4" in /etc/init.d/nfsmount.sh stop().
However, this is not the case when the file system is currently in use: umount then fails with "device busy" and the NFS volume remains mounted. This becomes a problem later in /etc/init.d/halt.sh, when the "fuser" command is executed during "Unmounting filesystems" and in "mount_readonly()". If an NFS volume is still present in /proc/mounts at that time, the fuser command hangs, causing the whole system to hang.

So until someone fixes the underlying problem of safely unmounting a busy nfs mount I've worked around this issue by skipping all unsuccessfully unmounted nfs volumes in /etc/init.d/halt.sh:

--- halt.sh.orig        2008-12-21 04:49:42.677779805 +0100
+++ halt.sh     2008-12-21 13:17:21.937607946 +0100
@@ -13,7 +13,23 @@
 # livecd-functions.sh should _ONLY_ set this differently if CDBOOT is
 # set, else the default one should be used for normal boots.
 # say:  RC_NO_UMOUNTS="/mnt/livecd|/newroot"
-RC_NO_UMOUNTS=${RC_NO_UMOUNTS:-^(/|/dev|/dev/pts|/lib/rcscripts/init.d|/proc|/proc/.*|/sys)$}
+RC_NO_UMOUNTS=${RC_NO_UMOUNTS:-^(/|/dev|/dev/pts|/lib/rcscripts/init.d|/proc|/proc/.*|/sys|/sys/kernel/security)$}
+
+for x in $(awk '{print $3, $2}' /proc/mounts | sort -ur) ; do
+       x=${x//\\040/ }
+
+       if [[ " ${x} " == " nfs " || " ${x} " == " nfs4 " ]] ; then
+               nfsnext=1
+               continue
+       fi
+
+       if [[ ${nfsnext} -eq 1 ]] ; then
+               ewarn "NFS unmount failed for ${x}"
+               ewarn "Skipping ..."
+               RC_NO_UMOUNTS="${RC_NO_UMOUNTS}|${x}"
+               nfsnext=0
+       fi
+done
 
 # Reset pam_console permissions if we are actually using it
 if [[ -x /sbin/pam_console_apply && ! -c /dev/.devfsd && \
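
Note that the patch appends each mount point after the closing "$" of the default pattern, so the added alternatives are not anchored and also match as substrings. A sketch of collecting the still-mounted NFS mount points into a single anchored alternation follows. nfs_skip_pattern is a hypothetical helper name; how halt.sh consumes RC_NO_UMOUNTS is not shown in this report, so merging the result into it is left out, and mount points containing regex metacharacters would need escaping first.

```sh
# nfs_skip_pattern [MOUNTS_FILE]: print one anchored pattern such as
# ^(/home/a|/home/b)$ covering every mount of type nfs or nfs4.
# Prints nothing when no such mount exists. Hypothetical helper.
nfs_skip_pattern() {
    awk '
        $3 == "nfs" || $3 == "nfs4" {
            mp = $2
            gsub(/\\040/, " ", mp)      # /proc/mounts encodes spaces as \040
            list = list (list != "" ? "|" : "") mp
        }
        END { if (list != "") print "^(" list ")$" }
    ' "${1:-/proc/mounts}"
}
```

Matching on the third field avoids the word-splitting loop and only reacts to the exact filesystem types nfs and nfs4.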

Comment 6 Thomas H. 2009-01-08 10:13:47 UTC
Hello,
I hit this problem too when I shut down from GNOME with /home mounted via NFS.
In my opinion the problem is in /etc/init.d/netmount.

In the stop() section, running umount with the -r option on a busy device does not remount the share read-only; it simply removes the entry from /etc/mtab, so netmount thinks everything went OK, but it didn't.

I simply removed the read-only option; the script can then go further and TERM or KILL the processes using the share, and it works.

My new /etc/init.d/netmount:

stop() {
        local rcfilesystems=${NET_FS_LIST}

        rcfilesystems=${rcfilesystems// /,}   # convert to comma-separated

        ebegin "Unmounting network filesystems"
-         [[ -z $(umount -art ${rcfilesystems} 2>&1) ]]
+        [[ -z $(umount -at ${rcfilesystems} 2>&1) ]]
        eend $? "Failed to simply unmount filesystems"
        [[ $? == 0 ]] && return 0



 
Comment 7 Noah Sheppard 2009-01-10 20:53:35 UTC
(In reply to comment #6)

Thomas H., your fix does the trick. Thanks much!

/etc/init.d/netmount belongs to sys-apps/baselayout.  Is this change something we should try to get merged into that package?  If so, how should I or someone else go about that?

Thanks again.
Comment 8 Johan Ymerson 2009-01-15 12:02:37 UTC
Added vapier to CC.

This is a real problem with baselayout, and it should be fixed.
I have tested baselayout 1.12.12 and the same problem exists there.

I can't see why we should remount ro if we can't umount. Seconds after this we will lose networking anyway...
Comment 9 Panagiotis Christopoulos (RETIRED) gentoo-dev 2009-01-15 12:15:32 UTC
*** Bug 250920 has been marked as a duplicate of this bug. ***
Comment 10 Panagiotis Christopoulos (RETIRED) gentoo-dev 2009-01-15 12:18:20 UTC
I CC'd base-system@ and removed vapier@, because otherwise he would receive every mail twice.
Comment 11 Johan Ymerson 2009-01-15 12:42:10 UTC
I can add that baselayout 2 does _not_ seem to have this problem.
Comment 12 Dale Pontius 2009-01-23 18:23:24 UTC
I confirm the same problem, except that I'm mounted "hard,intr" instead of "soft".  One evening I hit "shutdown", turned off the monitor, and went to bed without waiting.  The next morning the system was still hung at "Remounting remaining filesystems read only ..."

My most common problem is that Thunderbird starts gpg-agent, then leaves the daemon running after it exits.  I've set things up so that gpg-agent uses files on a local filesystem instead of my NFS home, and that has been a good workaround.
Comment 13 Pacho Ramos gentoo-dev 2009-03-08 09:57:41 UTC
I am also affected by this. Maybe another option would be to wait for baselayout-2/openrc to be stabilized, but I don't know when that will happen :-/

Thanks a lot!
Comment 14 Eggert 2009-03-18 22:46:13 UTC
The same problem occurs when the remote NFS filesystem is mounted via autofs. The content of my /etc/auto.nfs file:
nfs  -rw,soft,intr,rsize=8192,wsize=8192   192.168.1.100:/srv/nfs
Comment 15 Pacho Ramos gentoo-dev 2009-04-01 22:55:34 UTC
(In reply to comment #6)
> Hello,
> I hit this problem too when I shut down from GNOME with /home mounted via
> NFS.
> In my opinion the problem is in /etc/init.d/netmount.
> 
> In the stop() section, running umount with the -r option on a busy device
> does not remount the share read-only; it simply removes the entry from
> /etc/mtab, so netmount thinks everything went OK, but it didn't.
> 
> I simply removed the read-only option; the script can then go further and
> TERM or KILL the processes using the share, and it works.
> 
> My new /etc/init.d/netmount:
> 
> stop() {
>         local rcfilesystems=${NET_FS_LIST}
> 
>         rcfilesystems=${rcfilesystems// /,}   # convert to comma-separated
> 
>         ebegin "Unmounting network filesystems"
> -         [[ -z $(umount -art ${rcfilesystems} 2>&1) ]]
> +        [[ -z $(umount -at ${rcfilesystems} 2>&1) ]]
>         eend $? "Failed to simply unmount filesystems"
>         [[ $? == 0 ]] && return 0
> 
In my case I simply added "-f" to the options. According to "man umount" it seems to be meant for NFS:
 -f     Force unmount (in case of an unreachable NFS system).  (Requires
              kernel 2.1.116 or later.)
and it works OK for me.


Comment 16 Christopher Hogan 2009-06-30 09:18:40 UTC
One of the NFS servers went flaky. It's mounted automatically on the clients via nfsmount and fstab. When the clients started, nfsmount would start mounting nfs shares, hit the flaky server, and fail. However, it had already mounted all of the other nfs servers.

On shutdown, nfsmount stop was never called. The service manager didn't call it because, as far as it was concerned, it was never started. The clients would hang indefinitely at "Unmounting filesystems" because the NFS shares were still mounted when the system tried to unmount the local filesystems and the network had already been brought down.

My solution was to add the following to /etc/conf.d/net:

predown() {
        # The default in the script is to test for NFS root and disallow
        # downing interfaces in that case.  Note that if you specify a
        # predown() function you will override that logic.  Here it is, in
        # case you still want it...
        if is_net_fs /; then
                eerror "root filesystem is network mounted -- can't stop ${IFACE}"
                return 1
        fi

        #Make sure network filesystems are umounted
        if grep 'nfs' /proc/mounts; then
                ewarn "NFS Mounts Failed to Unmount."
                umount -aft nfs
        fi

        # Remember to return 0 on success
        return 0
}

This ensured that NFS shares were unmounted, even if the service scripts failed, before the network went down. It did take about 1 minute 20 seconds for the command to complete. However, the systems stopped hanging at shutdown. I later fixed the flaky NFS server.

This bit of code could be expanded to include other network filesystems.
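
One caveat when expanding it: grep 'nfs' matches any line of /proc/mounts that contains the string, which would include entries such as rpc_pipefs on /var/lib/nfs/rpc_pipefs (seen in comment 2). A sketch that matches the filesystem type exactly and accepts a list of types follows. net_mounts is a hypothetical helper name and the type list in the usage comment is only an example.

```sh
# net_mounts TYPES [MOUNTS_FILE]: print the mount points whose filesystem
# type is listed in the comma-separated TYPES, deepest paths first so that
# nested mounts come before their parents. Hypothetical helper.
net_mounts() {
    awk -v types="$1" '
        BEGIN {
            n = split(types, t, ",")
            for (i = 1; i <= n; i++) want[t[i]] = 1
        }
        ($3 in want) {
            mp = $2
            gsub(/\\040/, " ", mp)      # /proc/mounts encodes spaces as \040
            print mp
        }
    ' "${2:-/proc/mounts}" | LC_ALL=C sort -r
}

# Possible use inside predown(), replacing the grep test:
#   if [ -n "$(net_mounts nfs,nfs4,cifs)" ]; then
#       ewarn "Network filesystems are still mounted."
#   fi
```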
Comment 17 Pacho Ramos gentoo-dev 2009-07-12 14:46:38 UTC
Maybe adding "-f" to
stop() {
        ebegin "Unmounting NFS filesystems"
        umount -a -t nfs,nfs4
        eend $?
}

would also solve this :-/
Comment 18 Pacho Ramos gentoo-dev 2009-07-20 15:40:55 UTC
Why is NFS unmounting done by both the nfsmount and netmount init scripts? I think it would be better if nfsmount alone did the work.
Comment 19 Pacho Ramos gentoo-dev 2009-09-05 07:45:08 UTC
More news:

I added the following to my local.stop:
umount -a -f -t nfs,nfs4

and it also hangs when the server is unreachable :-(, but running it manually seems to work :-/
Comment 20 Pacho Ramos gentoo-dev 2009-09-05 08:26:11 UTC
I think I've found where the problem is:

When a file has been WRITTEN on the NFS-mounted filesystem, "umount -f" fails to work; on the other hand, if the file is only read, there is no problem and "umount -f" works OK when the server goes down.

It seems that, to work around this, umount needs to be run with the "-l" option:
-l     Lazy unmount. Detach the file system from the file system hierarchy
              now, and cleanup all references to the file system as soon
              as it is not busy anymore.

But I'm not sure whether this is the expected behavior or whether I should report the problem upstream.

Thanks for your help on this :-)
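
The options discussed so far can be tried in order of increasing force. This is a sketch only: unmount_escalating is a hypothetical helper name, the UMOUNT variable is a hypothetical hook for substituting another command, and the 30-second limit is arbitrary. It assumes the timeout utility from coreutils. SIGKILL may not free a process stuck in an uninterruptible wait, and a lazy unmount only detaches the mount point rather than flushing anything.

```sh
# unmount_escalating MOUNTPOINT: try a plain umount, then -f, then -l -f,
# each limited to 30 seconds. Returns 0 as soon as one attempt succeeds.
# Hypothetical helper; set UMOUNT to use a command other than umount.
unmount_escalating() (
    mp=$1
    for flags in "" "-f" "-l -f"; do
        # $flags is left unquoted on purpose: "" must add no argument
        # and "-l -f" must split into two.
        if timeout -s KILL 30 "${UMOUNT:-umount}" $flags "$mp" 2>/dev/null; then
            echo "unmounted $mp (flags: ${flags:-none})"
            return 0
        fi
    done
    echo "could not unmount $mp" >&2
    return 1
)

# Example (mount point from the original report):
#   unmount_escalating /home/users9
```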
Comment 21 Pacho Ramos gentoo-dev 2009-09-08 17:55:44 UTC
I asked upstream also:
http://www.spinics.net/lists/linux-nfs/msg09317.html
Comment 22 Juergen Rose 2009-09-22 10:00:21 UTC
I have the same problem with baselayout-2.0.1.
A directory on a remote computer is mounted by the automounter. The directory is accessed during root's login. Shutdown hangs at "unmounting network filesystems".

I have linux-2.6.3[01]-gentoo, nfs-utils-1.2.0 and libnfsidmap-0.21-r1. Is there any recommended solution?
Comment 23 Pacho Ramos gentoo-dev 2009-09-22 10:20:03 UTC
I have added:
umount -a -l -f -t nfs,nfs4

to my local.stop ;-)
Comment 24 Juergen Rose 2009-09-22 16:29:58 UTC
The hint in comment 23 seems to work for me too.
Comment 25 SpanKY gentoo-dev 2009-10-10 22:55:36 UTC
*** Bug 268844 has been marked as a duplicate of this bug. ***
Comment 26 Aljoscha Vollmerhaus 2009-12-29 03:53:07 UTC
I stumbled into this one, too. Every now and then I get an error at "Remounting remaining filesystems readonly", saying the hostname of the NFS server could not be found.

Weird thing: sometimes it just reboots / halts normally.
Maybe I'll find the time to provide more information later.
Comment 27 Chris Paras 2010-06-20 10:17:46 UTC
Putting umount -a -l -f -t nfs,nfs4 in my local.stop does not work for me.

The problem is still present with net-fs/nfs-utils-1.2.2-r1 
Comment 28 Shrinidhi Rao 2010-08-06 04:11:05 UTC
Any update on this? When can we use baselayout-2? I have the same problem at my workplace: the system hangs at "Unmounting filesystems" when a user has open files on an NFS mount. I tried the things suggested here, and most of the time they fail to work.
Comment 29 dE 2011-05-11 14:51:29 UTC
Indeed. Gentoo has other problems too when rebooting with a network FS mounted.
Comment 30 Chris Paras 2011-11-26 10:21:04 UTC
With baselayout-2, this doesn't seem to be an issue anymore.
Comment 31 dE 2011-11-26 15:38:04 UTC
I've been using baselayout-2 since before its stabilization.
Comment 32 Jeroen Roovers (RETIRED) gentoo-dev 2012-11-28 15:59:15 UTC
*** Bug 445036 has been marked as a duplicate of this bug. ***