When Nagios is running, it consists of a few dozen processes working together (plugins, notification commands and the main nagios processes). Unfortunately, after issuing /etc/init.d/nagios stop, some of them are still running:

    nagios 9065 0.0 0.2 4584 2916 ? Ss Nov03 1:05 /usr/nagios/bin/nagios -d /etc/nagios/nagios.cfg
    nagios 22868 0.0 0.2 4588 3008 ? S 14:48 0:00 /usr/nagios/bin/nagios -d /etc/nagios/nagios.cfg
    nagios 22869 0.0 0.0 1344 476 ? S 14:48 0:00 /usr/nagios/libexec//check_ping -H a.b.c.d -w 50.0,5% -c 100.0,20% -p 20
    nagios 22874 0.0 0.0 1496 464 ? S 14:48 0:00 /bin/ping -n -U -c 20 a.b.c.d
    nagios 22913 0.0 0.2 4588 3008 ? S 14:48 0:00 /usr/nagios/bin/nagios -d /etc/nagios/nagios.cfg
    nagios 22914 0.0 0.0 1344 476 ? S 14:48 0:00 /usr/nagios/libexec//check_ping -H a.b.c.d -w 100.0,20% -c 300.0,50% -p 20
    nagios 22916 0.0 0.0 1492 460 ? S 14:48 0:00 /bin/ping -n -U -c 20 a.b.c.d

And so on... After waiting a while, only one process is left:

    nagios 9065 0.0 0.2 4584 2916 ? Ss Nov03 1:05 /usr/nagios/bin/nagios -d /etc/nagios/nagios.cfg

And then it spawns the plugins again. Shouldn't all processes be killed after some time? Let's say: wait 10 seconds, then kill everything belonging to nagios.
No. Stopping has to be tightly controlled, so killall can't be used. Can you make this change in /etc/init.d/nagios: insert this line between lines 41 and 42 (after stop()):

    einfo "Nagios PID: $(< /var/nagios/nagios.lock)"

This will show you the PID it is stopping. Then, when you stop Nagios, update the report with whether that PID corresponds to the process that is still running. I've had this problem myself with the original /etc/init.d/ script, but not on Gentoo.
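The check requested above (print the PID from the lock file, then see whether that process is still alive) can be sketched as a small standalone shell function. This is a hypothetical helper, not part of the Gentoo init script; the name report_lock_pid and the lock file path are assumptions. It relies on kill -0, which sends no signal and only tests whether the PID exists and can be signalled.

```sh
# Hypothetical diagnostic helper (not part of the Gentoo init script):
# print the PID recorded in the lock file and say whether it still exists.
# Run it as root or as the nagios user; otherwise a permission error from
# kill -0 would be reported as "not running".
report_lock_pid() {
    lockfile=$1    # e.g. /var/nagios/nagios.lock; the path is site-specific

    if [ ! -r "$lockfile" ]; then
        echo "lock file $lockfile is missing or unreadable" >&2
        return 2
    fi
    read -r pid < "$lockfile"
    echo "Nagios PID: $pid"

    # kill -0 sends no signal; it only checks that the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running"
        return 0
    fi
    echo "process $pid is not running"
    return 1
}
```

Running report_lock_pid /var/nagios/nagios.lock right after /etc/init.d/nagios stop shows whether the leftover main process is the one the script tried to stop.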
It was lines 21 and 22 :) I had the lock file for Nagios in a different location; I've changed that now and am watching what happens:

    root@gollum alchemyx # /etc/init.d/nagios stop
     * Nagios PID: 30077
     * Stopping nagios... [ ok ]
    root@gollum alchemyx # dmesg
    root@gollum alchemyx # ps aux | grep nagios
    nagios 773 0.0 0.3 4592 3112 ? S 23:12 0:00 /usr/nagios/bin/nagios -d /etc/nagios/nagios.cfg
    nagios 777 0.0 0.3 4592 3112 ? S 23:12 0:00 /usr/nagios/bin/nagios -d /etc/nagios/nagios.cfg

And a few more running processes. After a few moments they disappear, so now everything is fine. My fault then, sorry! By the way, why wasn't the script complaining about the missing lock file?
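A stop routine could complain about a missing lock file instead of silently reporting success, while still signalling only the recorded PID rather than everything owned by nagios. The sketch below is a hypothetical POSIX sh helper, not the Gentoo script and not necessarily what the bug report proposes; the name stop_from_lock, the lock path and the 10-second default are assumptions.

```sh
# Hypothetical stop helper (not the Gentoo init script): signal only the PID
# recorded in the lock file and wait a bounded time for it to exit.
stop_from_lock() {
    lockfile=$1          # lock file path is site-specific (assumption)
    timeout=${2:-10}     # seconds to wait; the default of 10 is an assumption

    # Fail loudly instead of "succeeding" when the lock file is absent.
    if [ ! -r "$lockfile" ]; then
        echo "lock file $lockfile not found" >&2
        return 1
    fi
    read -r pid < "$lockfile"
    if [ -z "$pid" ]; then
        echo "lock file $lockfile is empty" >&2
        return 1
    fi
    echo "Nagios PID: $pid"

    # SIGTERM goes to this one process only, not to every nagios process.
    if ! kill -TERM "$pid" 2>/dev/null; then
        echo "process $pid is not running" >&2
        return 1
    fi

    # Poll once per second until the process is gone or the timeout expires.
    while [ "$timeout" -gt 0 ]; do
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "process $pid has exited"
            return 0
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    echo "process $pid still running after the timeout" >&2
    return 1
}
```

Called as stop_from_lock /var/nagios/nagios.lock 10, it returns non-zero when the lock file is missing, so the caller can print an error instead of [ ok ]. It deliberately does not wait for short-lived forked check processes, which finish on their own as seen above.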
See bug 72145, which I've just submitted...
Great, good idea. Thank you!