Bug 31125 - named runscript restart sometimes fails
Summary: named runscript restart sometimes fails
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High critical
Assignee: Brandon Low (RETIRED)
Reported: 2003-10-14 08:39 UTC by phceac
Modified: 2004-06-15 01:45 UTC

Description phceac 2003-10-14 08:39:39 UTC
A race condition occurs because stop() does not wait for named to finish: a new
instance is started while the old one is still tidying up. It happens on my
machine under heavy load, with lots of swapping.
Akin to bugs 31103, 29932, 28345.

Reproducible: Sometimes
Steps to Reproduce:
1. run memory_eater &  (emerge kdebase will do as well...)
2. /etc/init.d/named restart
Actual Results:
 * Stopping named...          [ ok ]
 * Starting named...          [ !! ]

Expected Results:
 * Stopping named...          [ ok ]
 * Starting named...          [ ok ]

A trivial patch to fix:
--- _named.orig 2003-10-14 12:09:59.000000000 +0200
+++ named       2003-10-14 17:07:15.000000000 +0200
@@ -40,7 +40,7 @@
 stop() {
        ebegin "Stopping named"
        checkconfig || return 2
-       start-stop-daemon --stop --quiet --pidfile $PIDFILE
+       start-stop-daemon  --quiet --retry -TERM/60 --stop  --pidfile $PIDFILE
        eend $?
 }
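To make the intent of the patch concrete, here is a minimal sketch of "send SIGTERM, then wait for the process to really exit" in plain shell. The function name term_and_wait and its arguments are my own invention for illustration; this is not how start-stop-daemon is implemented, only the behaviour the --retry option is meant to provide.

```shell
#!/bin/sh
# Hypothetical helper, not part of the named init script.
# Sends SIGTERM to a PID, then polls until it exits or the timeout expires.
term_and_wait() {
    pid=$1
    timeout=${2:-60}    # seconds; mirrors the 60 in -TERM/60

    # If the signal cannot be delivered, assume the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    while [ "$timeout" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1            # still running after the timeout
}
```

With something like this in stop(), a following start() would only run once the old named had gone away, or stop() would report failure after the timeout.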
Comment 1 Martin Holzer (RETIRED) gentoo-dev 2004-01-12 13:38:16 UTC
What does -TERM/60 do?
Comment 2 Martin Holzer (RETIRED) gentoo-dev 2004-01-12 13:48:06 UTC
I've added some options to the runscript.

Now in CVS.
Comment 3 phceac 2004-01-12 15:44:54 UTC
It's been a while since I worked on this bug, and I found it confusing. I think I ended up using strace to fully understand what was going on.
In brief, the "--retry -TERM/60" option sends SIGTERM and then waits up to 60 seconds for the process to finish. If it reaches the 60 second timeout, it returns an error.
If you just use "--retry 60" then I *think* the process just gets SIGKILL before the wait, which isn't very graceful for some daemons, as they don't get a chance to clean up.
It's possible to add more SIGNAL/timeout pairs onto the sequence if you need to.
Like I said, my memory is hazy, but "man start-stop-daemon" explains.
Many thanks...
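
For anyone trying to picture how a sequence of SIGNAL/timeout pairs plays out, here is a rough sketch that walks a schedule such as TERM/60/KILL/5 in plain shell. The function run_schedule is hypothetical and only imitates the idea; it is not the start-stop-daemon code. As far as I can tell, the dpkg man page describes a bare timeout as shorthand for signal/timeout/KILL/timeout, so it is worth checking "man start-stop-daemon" on the installed version before relying on either form.

```shell
#!/bin/sh
# Hypothetical sketch of a --retry style schedule; not the real implementation.
# Usage: run_schedule PID SCHEDULE   e.g. run_schedule 1234 TERM/60/KILL/5
run_schedule() {
    pid=$1
    # Split the schedule on "/" into positional parameters.
    old_ifs=$IFS
    IFS=/
    set -- $2
    IFS=$old_ifs

    for item in "$@"; do
        case $item in
            *[!0-9]*)
                # Non-numeric item: a signal name, with or without a leading "-".
                # Simplification: a failed kill is treated as "already gone".
                kill -s "${item#-}" "$pid" 2>/dev/null || return 0
                ;;
            *)
                # Numeric item: wait up to that many seconds for the exit.
                remaining=$item
                while [ "$remaining" -gt 0 ]; do
                    kill -0 "$pid" 2>/dev/null || return 0
                    sleep 1
                    remaining=$((remaining - 1))
                done
                ;;
        esac
    done

    # Schedule exhausted: succeed only if the process is finally gone.
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

A schedule of TERM/60 gives the daemon a chance to clean up and fails after a minute; appending /KILL/5 escalates only after the graceful attempt has timed out.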
Comment 4 Corey Betka 2004-01-22 09:25:44 UTC
Sorry to reopen, but /etc/init.d/named stop now fails if named is running in a chroot. Adding the PIDFILE and KEY path correction logic to the stop function seems to fix this for me, YMMV. Not sure whether this needs to be added to other functions as well.

--- /usr/portage/net-dns/bind/files/named.rc6   2004-01-12 16:07:45.000000000 -0
+++ named       2004-01-22 11:13:31.000000000 -0600
@@ -40,6 +40,13 @@
 stop() {
        ebegin "Stopping ${CHROOT:+chrooted }named"
        checkconfig || return 2
+       if [ $CHROOT -a -d $CHROOT ] ; then
+               PIDFILE="${CHROOT}/var/run/named/named.pid"
+               KEY="${CHROOT}/etc/bind/rndc.key"
+       else
+               PIDFILE="/var/run/named/named.pid"
+               KEY="/etc/bind/rndc.key"
+       fi
        start-stop-daemon --stop --quiet --pidfile $PIDFILE \
                --exec /usr/sbin/named -- stop
        eend $?
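
The path selection in the patch above can be tried out on its own with a small sketch like the following. The function name named_pidfile is made up for illustration and is not in the init script; the paths are the ones from the patch. The test is written with quoted variables so that an empty or unset CHROOT is handled explicitly rather than relying on how [ parses missing arguments.

```shell
#!/bin/sh
# Hypothetical helper mirroring the path logic in the patch above.
# Prints the pidfile path for a given chroot directory (may be empty).
named_pidfile() {
    chroot_dir=$1
    if [ -n "$chroot_dir" ] && [ -d "$chroot_dir" ]; then
        # Chroot configured and present: the pidfile lives inside it.
        printf '%s\n' "${chroot_dir}/var/run/named/named.pid"
    else
        # No chroot (or directory missing): fall back to the system path.
        printf '%s\n' "/var/run/named/named.pid"
    fi
}
```

Calling it as named_pidfile "$CHROOT" in both start() and stop() would keep the two functions looking at the same file.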
Comment 5 Ole Tange 2004-06-15 01:45:35 UTC
I applied the fix, but running
date;/etc/init.d/named restart;date;/etc/init.d/named restart;date;
still gives errors.