Manpage start-stop-daemon(8): -p, --pidfile pidfile ... When stopping we only stop the *pid(s)* listed in the pidfile. Emphasis mine, for multiple PIDs. This is crucial for haproxy with the nbprocs option, where multiple PIDs are written to the pidfile (see bug 584410). static pid_t get_pid(const char *pidfile) only returns a SINGLE pid, instead of all of the PIDs. I think it should be modified to return RC_PIDLIST instead, and all of the usages of it should operate on all elements of the list.
How are the PIDs delimited? Newlines? Spaces? Both? Is there any documented format for that?
haproxy writes one pid per line. If you don't think it's appropriate, please correct the documentation to take out the plural form, and I'll find another solution for haproxy (probably using the systemd wrapper's generic support).
It just seems a little risky to make any assumptions about a file format that is not well-defined beyond a single pid at offset 0.
While there are other programs that have multiple pids, it seems that the --pidfile implementations NEVER supported multiple PIDs. The documentation however has used the plural form for 15+ years. Debian's original start-stop-daemon.pl circa 1996 only did a single pidfile, but the early docs used the plural form for processes from the pidfile, possibly as a linguistic byproduct (possibly not english-first-language documentation author). It got rewritten as public-domain C, and later imported to Gentoo in 2001. https://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-src/rc-scripts/sbin/start-stop-daemon.c?hideattic=0&revision=1.1&view=markup OpenRC did lose the original attribution at some point. Proposed resolution: change the manpage to use the singular form and description that only the first number will be used.
https://github.com/openrc/openrc/commit/3552f0a This commit fixes the man page. The attribution of start-stop-daemon.c as originally imported into OpenRC is two-clause bsd (C) Roy Marples. https://github.com/openrc/openrc/commit/ac21d7
Shouldn't the implementation be fixed to follow the 15-year old documentation instead? PID separators would "evidently" the shell's word separators. (ASCII whitespace) I propose the same behaviour as this (with the default IFS): for pid in $(<pidfile);do kill "$pid"; done
(In reply to calimeroteknik from comment #6) > Shouldn't the implementation be fixed to follow the 15-year old > documentation instead? How is that preferable to updating the documentation to reflect reality? I can only reiterate comment 3: it is risky to change the behavior here, since there is no documented standard to following regarding the format of PID files.
(In reply to Mike Gilbert from comment #7) > (In reply to calimeroteknik from comment #6) > > Shouldn't the implementation be fixed to follow the 15-year old > > documentation instead? > > How is that preferable to updating the documentation to reflect reality? Indeed, changes in default behaviour must be considered with extreme caution. We have two options: Documentation updated, all service file maintainers need to be informed (sometimes the service file is even from upstream), those who need multiple PID support each roll their own implementation for the stop() function. Implementation updated to match the documentation, no need to inform maintainers (who referred to the docs to write their service files), and apparently no breakage to anticipate (see below), only desirable 'fixage'. I do believe that this second option is preferrable, considering the following: > I can only reiterate comment 3: it is risky to change the behavior here, > since there is no documented standard to following regarding the format of > PID files. Precisely, and that means there is no documented standard to determine what the "first" PID in the pidfile is, either. We can dig out the current de-facto format. It turns out fscanf(f, "%d", &pid) is used: https://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-src/rc-scripts/sbin/start-stop-daemon.c?hideattic=0&revision=1.1&view=markup#l599 fscanf will skip any whitespace (which means the PID does not actually have to be at offset 0); it will accept a minus sign; and it will consider it has found its integer when it sees anything else than a digit. If I'm not mistaken, pidfiles usually contain one PID, so for most services it would change nothing to support several. For services that write several PIDs, supporting them (say, separated by whitespace) would only cause 'fixage' if anything for e.g. haproxy. If some weird service tries to use commas or whatever, it will stay broken the same as before, since fscanf() will stop finding integers as soon as it encounters garbage. Is there any case of possible breakage this could logically introduce?
As an example of something using multiple PIDS and could be problematic, see: https://issues.apache.org/jira/browse/FLINK-2302 The format of each line is: "PID description\n" The description example given is 'tm1', 'tm2'. If the description is just a number, say '1', we must be VERY VERY careful not to parse that part (sending SIGTERM to PID 1 would be bad...). If there is an app that writes an integer other than a PID to the second line, that could also cause problems. Furthermore, I think documentation is the correct answer, because Debian's start-stop-daemon, that the Gentoo one is based on, never actually implemented multiple PIDs in a single pidfile, AND fixed their documentation to use the singular format. If a single app wants to support multiple PIDs in a pidfile, they can implement it themselves like this (this wouldn't work for the description case above): t=$(mktemp) for pid in $(cat $PIDFILE) ; do echo $pid > $t start-stop-daemon --pidfile $t ... done rm -f $t Similarly, if there
(In reply to Robin Johnson from comment #9) > As an example of something using multiple PIDS and could be problematic, see: > https://issues.apache.org/jira/browse/FLINK-2302 > > The format of each line is: > "PID description\n" > > If the description is just a number, say '1', we must be VERY VERY careful > not to parse that part (sending SIGTERM to PID 1 would be bad...). That was an excellent counter-example, which is sufficient in itself. This could give inspiration for a standard that I think should be made; more on this later in this post. > Furthermore, I think documentation is the correct answer, because Debian's > start-stop-daemon, that the Gentoo one is based on, never actually > implemented multiple PIDs in a single pidfile, AND fixed their documentation > to use the singular format. Agreed. If we want to reach a standard, everybody needs to do the same thing. However, I find the current resolution of this bug unsatisfactory. Just fixing the docs is something nobody will notice, and trying to tell maintainers to check their services is tedious. Therefore: ## Proposal 1/2 ## start-stop-deamon must not change its behaviour at all (what it *does*). However, it should *print a warning* if the pidfile contains anything else than a PID and ASCII whitespace, instead of silently ignoring any garbage (as it currently does). ## That way, users and maintainers will notice the warning and fix the services. I believe this alone is entirely satisfactory, as opposed to . > > If a single app wants to support multiple PIDs in a pidfile, they can > implement it themselves like this (this wouldn't work for the description > case above): [redacted for concision] > > Similarly, if there [assuming this was what you meant to reference:] > If there is an app that writes an integer other than a PID to the second > line, that could also cause problems. ## Proposal 2/2: ## Document a standard for pidfiles with multiple PIDs, and implement it in start-stop-daemon as a new option. In order for most services to not have to roll their own implementation, and since pidfiles with one PID per line are common, the following sounds sane: One PID at the beginning of each line (after possible ASCII horizontal whitespace), and an optional description after it. The description is separated from the PID by ASCII horizontal whitespace. Print an error and exit with an error code if the pidfile is not in that format. ## pidfiles not following that format will require manual implementation, but that should hopefully weed out the vast majority of multi-PID pidfiles. I want to put emphasis on proposal 1/2. (warn on fishy pidfile formats)
(In reply to calimeroteknik from comment #10) I think you are turning a molehill into a mountain here. We already know that PID files containing multiple PIDs are quite rare, and altering the OpenRC code to check for it seems excessive to me.
I think the most this change warrants is an entry in any release notes that are published.
Sorry, I tend to overdo things. Printing a warning when the pid files contains more than just the expected PID sounds good, though. Can we focus on that?
(In reply to calimeroteknik from comment #13) There are cases where it is appropriate to grab the first PID only, and discard the rest. For example, postgresql writes the PID if its main daemon first, followed by a bunch of other postgres-related information. There is no need to warn about this.
(In reply to Mike Gilbert from comment #14) > (In reply to calimeroteknik from comment #13) > > There are cases where it is appropriate to grab the first PID only, and > discard the rest. > > For example, postgresql writes the PID if its main daemon first, followed by > a bunch of other postgres-related information. > > There is no need to warn about this. That's an argument for providing a way to disable the warning, for valid usecases. But a warning presumably would've helped Robin identify his problem sooner.