
Bug 31125

Summary: named runscript restart sometimes fails
Product: Gentoo Linux
Reporter: phceac
Component: Current packages
Assignee: Brandon Low (RETIRED) <lostlogic>
Status: RESOLVED FIXED    
Severity: critical
CC: mholzer
Priority: High    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---

Description phceac 2003-10-14 08:39:39 UTC
Race condition: stop() does not wait for named to finish, so a new instance is
started while the old one is still tidying up. This happens on my machine under
heavy load, with lots of swapping.
Akin to bugs 31103, 29932, 28345.

Reproducible: Sometimes
Steps to Reproduce:
1. run memory_eater &  (emerge kdebase will do as well...)
2. /etc/init.d/named restart
3.
Actual Results:  
* Stopping named...           [ ok ]
 * Starting named...          [ !! ]

Expected Results:  
* Stopping named...           [ ok ]
 * Starting named...          [ ok ]

A trivial patch to fix:
--- _named.orig 2003-10-14 12:09:59.000000000 +0200
+++ named       2003-10-14 17:07:15.000000000 +0200
@@ -40,7 +40,7 @@
 stop() {
        ebegin "Stopping named"
        checkconfig || return 2
-       start-stop-daemon --stop --quiet --pidfile $PIDFILE
+       start-stop-daemon  --quiet --retry -TERM/60 --stop  --pidfile $PIDFILE
        eend $?
}
Comment 1 Martin Holzer (RETIRED) gentoo-dev 2004-01-12 13:38:16 UTC
What does -TERM/60 do?
Comment 2 Martin Holzer (RETIRED) gentoo-dev 2004-01-12 13:48:06 UTC
I've added some options to the runscript.

It's now in CVS.
Comment 3 phceac 2004-01-12 15:44:54 UTC
It's been a while since I worked on this bug, and I found it confusing. I think I ended up using strace to fully understand what was going on.
In brief, the "--retry -TERM/60" option sends SIGTERM and then waits up to 60 seconds for the process to finish. If it reaches the 60-second timeout, it returns an error.
If you just use "--retry 60", then I *think* the process just gets SIGKILL before the wait, which isn't very graceful for some daemons, as they don't get a chance to clean up.
It's possible to append more SIGNAL/timeout pairs to the sequence if you need to.
Like I said, my memory is hazy, but "man start-stop-daemon" explains it.
Many thanks...
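
For readers following along, the behaviour described above (send SIGTERM, then wait up to a timeout for the process to exit, and report an error if it is still running) can be sketched roughly in plain shell. This is only an illustration of the idea, not how start-stop-daemon is implemented; the function name term_and_wait and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the named runscript or start-stop-daemon.
# Sends SIGTERM to a PID, then polls once per second until the process has
# gone away or the timeout (in seconds, default 60) has elapsed.
# Returns 0 if the process exited, 1 if it was still running at the timeout.
term_and_wait() {
    pid=$1
    timeout=${2:-60}

    # If the signal cannot be delivered, treat the process as already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    while [ "$timeout" -gt 0 ]; do
        # kill -0 delivers no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done

    # Final check after the last sleep.
    kill -0 "$pid" 2>/dev/null || return 0
    return 1
}
```

Called as term_and_wait "$(cat "$PIDFILE")" 60, this would only return once the old process is gone or the timeout has expired, which is the property the restart needs before a new instance is started.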
Comment 4 Corey Betka 2004-01-22 09:25:44 UTC
Sorry to reopen, but /etc/init.d/named stop now fails if you are running in chroot. Adding the PIDFILE and KEY path correction logic to the stop function seems to fix this for me, YMMV. Not sure if this needs to be added in other functions or not.

--- /usr/portage/net-dns/bind/files/named.rc6   2004-01-12 16:07:45.000000000 -0
+++ named       2004-01-22 11:13:31.000000000 -0600
@@ -40,6 +40,13 @@
 stop() {
        ebegin "Stopping ${CHROOT:+chrooted }named"
        checkconfig || return 2
+       if [ $CHROOT -a -d $CHROOT ] ; then
+               PIDFILE="${CHROOT}/var/run/named/named.pid"
+               KEY="${CHROOT}/etc/bind/rndc.key"
+       else
+               PIDFILE="/var/run/named/named.pid"
+               KEY="/etc/bind/rndc.key"
+       fi
        start-stop-daemon --stop --quiet --pidfile $PIDFILE \
                --exec /usr/sbin/named -- stop
        eend $?
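
The path selection in the patch above can also be written with quoted expansions, so that an empty or unusual CHROOT value cannot change how the test is parsed. The sketch below is only an illustration under that assumption; the function name named_pidfile is hypothetical and does not appear in the runscript.

```shell
#!/bin/sh
# Hypothetical helper, not part of named.rc6: prints the pid file path for a
# given chroot directory. Falls back to the non-chroot path when the argument
# is empty or is not an existing directory.
named_pidfile() {
    chroot=$1
    if [ -n "$chroot" ] && [ -d "$chroot" ]; then
        printf '%s\n' "${chroot}/var/run/named/named.pid"
    else
        printf '%s\n' "/var/run/named/named.pid"
    fi
}
```

With this, stop() could set PIDFILE="$(named_pidfile "$CHROOT")"; the KEY path could be derived in the same way.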
Comment 5 Ole Tange 2004-06-15 01:45:35 UTC
I applied the fix, but running
date;/etc/init.d/named restart;date;/etc/init.d/named restart;date;
still gives errors.