If start-stop-daemon is called with a non-existent pid file (which can happen if the service fails during startup without creating the expected pid file), start-stop-daemon aborts with SIGTERM, as in the following example:

# start-stop-daemon --stop --retry 10 --progress --pidfile /run/no-such-file.pid
 * start-stop-daemon: caught SIGTERM, aborting

This then causes any 'stop' or 'restart' of the affected service to fail. Because the service will typically be marked as crashed, it cannot be started again before being stopped, and it cannot be restarted either, since a restart is a stop followed by a start. Thus a crashed service without a pid file can never be restarted.

This is ugly.
(In reply to Andreas Steinmetz from comment #0)
> If start-stop-daemon is called with a non-existing pid file (may happen if
> the service started fails without creating the expected pid file)
> start-stop-daemon aborts with SIGTERM like in the following example:
>
> # start-stop-daemon --stop --retry 10 --progress --pidfile
> /run/no-such-file.pid
> * start-stop-daemon: caught SIGTERM, aborting
>
> This then causes any 'stop' or 'restart' of the affected service to fail.
> And as the service will typically be marked as crashed it can't be started
> again before being stopped, or, as a sequence of both, be restarted. Thus a
> crashed service without pid file can never be restarted.
>
> This is ugly.

If a service is saying that it is started before the pid file is created, this is an issue in the service script. Be sure you are using the --wait option when starting the service with start-stop-daemon. That will cause start to fail if the pid file is not written in the specified time.

I just wrote a service that starts successfully, then tries to stop by pointing to an invalid pid file. The stop fails in this case, so the service does not show crashed, it shows started.
I have the same problem, but more fatal: with 32.1, start-stop-daemon on a nonexistent pid file kills random processes.

# strace start-stop-daemon --stop --pidfile dsdsfsfsd 2>test
# grep kill test
kill(32468, SIGTERM) = 0
kill(22198, SIGTERM) = 0
kill(22141, SIGTERM) = 0
kill(22113, SIGTERM) = 0
kill(22012, SIGTERM) = 0
kill(21959, SIGTERM) = 0
kill(20699, SIGTERM) = 0
kill(10976, SIGTERM) = 0
kill(9924, SIGTERM) = 0
...

I found commit https://gitweb.gentoo.org/proj/openrc.git/commit/?id=36a0ab9054512ade413226fb8e8b28060045e9a4 which removed the check for a correct pid file read:

...
@@ -328,12 +327,6 @@ int run_stop_schedule(const char *applet,
 		}
 	}
 
-	if (pidfile) {
-		pid = get_pid(applet, pidfile);
-		if (pid == -1)
-			return 0;
-	}
-

If I add the following patch, everything works correctly:

--- src/rc/rc-schedules.c.org 2017-11-08 16:43:34.973227732 +0100
+++ src/rc/rc-schedules.c 2017-11-08 16:46:16.572146038 +0100
@@ -307,6 +307,10 @@
 	const char *const *p;
 	bool progressed = false;
 
+	// regression - if we had error during pidfile read, don't kill anything
+	if (pid == -1)
+		return 0;
+
 	if (exec)
 		einfov("Will stop %s", exec);
 	if (pid > 0)
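
For readers who want the idea of that guard in isolation, here is a minimal standalone sketch. It is not the OpenRC code: read_pidfile() and stop_from_pidfile() are hypothetical names made up for illustration, and the only assumption carried over from the patch above is that a failed pid file read is reported as -1 and must stop the caller before anything is signalled.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not an OpenRC function): read a PID from a pid file.
 * Returns the PID, or -1 if the file cannot be opened or does not contain
 * a positive number. */
static pid_t read_pidfile(const char *path)
{
	FILE *fp = fopen(path, "r");
	long value;

	if (fp == NULL)
		return -1;
	if (fscanf(fp, "%ld", &value) != 1 || value <= 0) {
		fclose(fp);
		return -1;
	}
	fclose(fp);
	return (pid_t)value;
}

/* Hypothetical caller: signal only the process named in the pid file.
 * Returns the number of processes signalled (0 or 1). */
static int stop_from_pidfile(const char *path, int sig)
{
	pid_t pid = read_pidfile(path);

	/* Bail out on a failed read, so that -1 is never passed on as if it
	 * were a usable process selector. */
	if (pid == -1)
		return 0;

	return kill(pid, sig) == 0 ? 1 : 0;
}
```

With this shape, stop_from_pidfile("/run/no-such-file.pid", SIGTERM) returns 0 without sending any signal. Passing 0 as the signal makes kill() perform only its existence and permission checks, which is handy for trying the sketch safely.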
The stabilized openrc-0.34.7 still has that behavior; start-stop-daemon still kills itself:

# start-stop-daemon --stop --pidfile noexistingfile -v
 * start-stop-daemon: fopen `noexistingfile': No such file or directory
 * Sending signal 15 to PID 16609 ...
 * start-stop-daemon: caught SIGTERM, aborting

# strace start-stop-daemon --stop --pidfile noexistingfile -v 2>&1 | grep kill
read(4, "grep\0--colour=auto\0kill\0", 4096) = 24
kill(16615, SIGTERM) = 0
Note that this problem has manifested itself sporadically since at least 2009: <https://groups.google.com/forum/#!topic/linux.gentoo.user/THkuFO-e5AU> So it has become something of a running sore.
Same here. Command from comment #3 with --test option shows that start-stop-daemon tries to kill all the processes:

$ /sbin/start-stop-daemon --stop --pidfile noexistingfile -v --test
 * start-stop-daemon: fopen `noexistingfile': No such file or directory
 * Would send signal 15 to PID 27291
 * Would send signal 15 to PID 27277
 * Would send signal 15 to PID 27270
 * Would send signal 15 to PID 27259
 * Would send signal 15 to PID 27177
 * Would send signal 15 to PID 27138
 ...
 * Would send signal 15 to PID 11
 * Would send signal 15 to PID 10
 * Would send signal 15 to PID 9
 * Would send signal 15 to PID 8
 * Would send signal 15 to PID 7
 * Would send signal 15 to PID 6
 * Would send signal 15 to PID 2
 * Would send signal 15 to PID 1
This is fixed in 0.34.9.