Bug 455458 - sys-process/psmisc: fuser hangs if processes have files open on dead remotes
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords: PATCH
Depends on:
Blocks:
 
Reported: 2013-02-04 15:26 UTC by Frieder Bürzele
Modified: 2013-08-17 12:53 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
lazy umount remote filesystems (etc_init.d_netmount_umount_atl_0.11.8.patch,371 bytes, patch)
2013-02-05 13:19 UTC, Frieder Bürzele
add timeout to fuser command (lib_rc_sh_rc-mount-sh_0.11.8.patch,685 bytes, patch)
2013-02-05 13:22 UTC, Frieder Bürzele
add timeout to fuser command binary "timeout" from coreutils (lib_rc_sh_rc-mount-sh_alternative_usr-bin-timeout_0.11.8.patch,516 bytes, patch)
2013-02-05 13:26 UTC, Frieder Bürzele
add timeout to fuser command binary "timeout" from coreutils -- fixed (lib_rc_sh_rc-mount-sh_alternative_usr-bin-timeout_0.11.8.patch,519 bytes, patch)
2013-02-06 12:18 UTC, Frieder Bürzele

Description Frieder Bürzele 2013-02-04 15:26:17 UTC
When the netmount script is stopped on shutdown, CIFS shares won't unmount if the remote server is dead. I've added "umount -atl" so that the unmount succeeds, but the script then hangs on fuser.

--- a/etc/init.d/netmount
+++ b/etc/init.d/netmount
@@ -80,7 +80,7 @@ stop()
 		fs="$fs${fs:+,}$x"
 	done
 	if [ -n "$fs" ]; then
-		umount -at $fs || eerror "Failed to simply unmount filesystems"
+		umount -at $fs || umount -atl $fs || eerror "Failed to simply unmount filesystems"
 	fi
 
 	eindent


I've added a timeout to kill fuser after 60 seconds.

+++ /lib/rc/sh/rc-mount.sh   2013-02-04 15:06:45.761257432 +0100
@@ -41,6 +41,17 @@
                retry=4 # Effectively TERM, sleep 1, TERM, sleep 1, KILL, sleep 1
                while ! LC_ALL=C $cmd "$mnt" 2>/dev/null; do
                        if type fuser >/dev/null 2>&1; then
+                               timeout=60
+                               while true;do
+                                       sleep 3s;
+                                       if [ "$timeout" -le 0 ];then
+                                               pid_of_user="`ps -A -o pid,comm,args|grep "fuser $f_opts "$mnt""|awk '$2 !~ /grep/ {print $1}'`"
+                                               [ -n "$pid_of_user" ] && kill -KILL "$pid_of_user"
+                                               break
+                                       fi
+                                       let timeout-=3
+                               done &
+                                               [[ $SPAMD_OPTS =~ \-u( |)([^\ ]*)  ]] && USER=${BASH_REMATCH[2]}
                                pids="$(fuser $f_opts "$mnt" 2>/dev/null)"
                        fi
                        case " $pids " in
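
The watchdog idea in the patch above can also be written without searching the process list. The following is a minimal sketch, not the attached patch: the helper name run_with_watchdog is hypothetical, and it assumes the command reacts to SIGKILL (a process stuck in uninterruptible I/O may not). It starts the command in the background, remembers its PID, and kills exactly that PID if the limit expires.

```shell
# Hypothetical helper: run "$@" and kill it after $1 seconds if still running.
run_with_watchdog() {
    limit="$1"; shift
    "$@" &
    cmd_pid=$!
    # Watchdog: after $limit seconds, kill the command by its recorded PID.
    # Output is redirected so the watchdog never holds a caller's pipe open.
    ( sleep "$limit"; kill -KILL "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    dog_pid=$!
    wait "$cmd_pid"
    status=$?
    # The command has finished or was killed: stop the watchdog.
    kill "$dog_pid" 2>/dev/null
    wait "$dog_pid" 2>/dev/null
    return "$status"
}

# Possible use in the unmount loop (variables as in the patch above):
# pids="$(run_with_watchdog 60 fuser $f_opts "$mnt" 2>/dev/null)"
```

Recording the PID with $! avoids the ps/grep lookup, which can match the wrong process or miss the right one.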


The whole point is that, no matter what, I need to be able to shut down the server when a power outage occurs. All servers should shut down regardless of whether the remote server is still running or already down.

Reproducible: Always

Steps to Reproduce:
1. mount remote cifs share
2. edit files in mounted share eg with vim
3. patch netmount with the above "umount -atl" change so it no longer hangs on umount
4. /etc/init.d/netmount stop --debug


Expected Results:  
stop the service and continue to shut down.
Comment 1 Frieder Bürzele 2013-02-04 15:28:59 UTC
Sorry, this line obviously went into the patch accidentally:
[[ $SPAMD_OPTS =~ \-u( |)([^\ ]*)  ]] && USER=${BASH_REMATCH[2]}
Comment 2 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-02-04 19:47:58 UTC
Please attach the patches so that they can be downloaded, and verify that they apply so that they don't have to be corrected manually.
Comment 3 Frieder Bürzele 2013-02-05 13:19:56 UTC
Created attachment 337998
lazy umount remote filesystems

add umount -atl to /etc/init.d/netmount
Comment 4 Frieder Bürzele 2013-02-05 13:22:22 UTC
Created attachment 338000
add timeout to fuser command

timeout to fuser command
Comment 5 Frieder Bürzele 2013-02-05 13:26:58 UTC
Created attachment 338002
add timeout to fuser command binary "timeout" from coreutils

(In reply to comment #4)
> Created attachment 338000
> add timeout to fuser command
> 
> timeout to fuser command

This is an alternative to the patch mentioned in comment #4.

This patch uses timeout from coreutils to accomplish the fuser timeout.
This is a much simpler solution but depends on coreutils.
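
As a rough illustration of that approach (a sketch only, not the contents of the attachment): coreutils timeout sends TERM once the limit expires and, with -k, follows up with KILL after a grace period. The helper name run_bounded, the 5 second grace period, and the mount point /mnt/share are assumptions made for this example.

```shell
# Hypothetical helper: run "$@" under coreutils timeout.
# TERM is sent after $1 seconds, KILL 5 seconds later if still running.
run_bounded() {
    limit="$1"; shift
    timeout -k 5 "$limit" "$@"
}

# Possible use (mount point is an example):
# pids="$(run_bounded 60 fuser -m /mnt/share 2>/dev/null)"
```

timeout exits with status 124 when the limit was reached, so the caller can tell a timed-out fuser from one that simply found no processes.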
Comment 6 William Hubbs gentoo-dev 2013-02-05 16:33:58 UTC
I was just informed that we should encourage users to add "nofail" to
the mount options in fstab for network file systems.

If you do this, how does that affect this bug?
Comment 7 Frieder Bürzele 2013-02-05 19:28:45 UTC
(In reply to comment #6)
> I was just informed that we should encourage users to add "nofail" to
> the mount options in fstab for network file systems.
> 
> If you do this, how does that affect this bug?


The mounted share is a cifs share. I mounted it with the nofail option, then tested it again -- same problem.
Comment 8 William Hubbs gentoo-dev 2013-02-05 20:34:13 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > I was just informed that we should encourage users to add "nofail" to
> > the mount options in fstab for network file systems.
> > 
> > If you do this, how does that affect this bug?
> 
> 
> The share mounted is a cifs share. I mounted it with the nofail option then
> tested it again -- same problem.

Sorry, let me rephrase the question.

If you mount all of your network file systems with the nofail option and remove the lazy unmount option you added to netmount, the netmount script should terminate successfully regardless of the status of the remote host. Does this happen?
Comment 9 Frieder Bürzele 2013-02-05 22:07:57 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > I was just informed that we should encourage users to add "nofail" to
> > > the mount options in fstab for network file systems.
> > > 
> > > If you do this, how does that affect this bug?
> > 
> > 
> > The share mounted is a cifs share. I mounted it with the nofail option then
> > tested it again -- same problem.
> 
> Sorry, let me rephrase the question.
> 
> If you mount all of your network file systems with the nofail option and
> remove the lazy unmount option you added to netmount, the netmount script
> should terminate successfully regardless of the status of the remote host.
> Does this happen?

It will terminate with or without nofail. But if there are open files on the share, fuser tries to terminate the processes owning those files and gets stuck, so the script never finishes.
Comment 10 Frieder Bürzele 2013-02-06 12:18:47 UTC
Created attachment 338070
add timeout to fuser command binary "timeout" from coreutils -- fixed

fixed missing value for -k
Comment 11 William Hubbs gentoo-dev 2013-03-12 16:47:04 UTC
I spoke to Mike Frysinger, our base system lead, and he seems to think
the cleanest solution would be to add timeout functionality to
fuser, and I agree with him, so I am assigning this to base-system.
Comment 12 William Hubbs gentoo-dev 2013-03-12 17:01:34 UTC
I will, however, add a patch to OpenRC that is similar to the one above
but allows the user to configure the length of the timeout.
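
One way such a configurable length could look, as a sketch under assumptions rather than the actual OpenRC change: the setting name FUSER_TIMEOUT and the helper fuser_timeout_value are hypothetical, and the 60 second default is taken from the earlier patch in this bug.

```shell
# Hypothetical setting: FUSER_TIMEOUT, in seconds, set by the administrator.
fuser_timeout_value() {
    # Fall back to 60 seconds when the setting is unset, empty, or non-numeric.
    case "${FUSER_TIMEOUT:-}" in
        ''|*[!0-9]*) echo 60 ;;
        *) echo "$FUSER_TIMEOUT" ;;
    esac
}

# Possible use together with coreutils timeout (variables as in rc-mount.sh):
# pids="$(timeout -k 5 "$(fuser_timeout_value)" fuser $f_opts "$mnt" 2>/dev/null)"
```

Validating the value before passing it on keeps a typo in the configuration from turning into a timeout error during shutdown.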
Comment 13 William Hubbs gentoo-dev 2013-03-12 18:12:56 UTC
I have added a patch in commit 6794441 of OpenRC to handle this
temporarily. However, the real fix should go in fuser; maybe adding some
kind of timeout capability so we don't have to use an external program
to time it out.