Bug 639218 - sys-apps/openrc: start-stop-daemon killed all processes matching the name if the pidfile was specified and missing
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: OpenRC
Hardware: All Linux
Importance: Normal normal
Assignee: OpenRC Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-29 19:04 UTC by Patrick Lauer
Modified: 2017-11-29 22:48 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Description Patrick Lauer gentoo-dev 2017-11-29 19:04:11 UTC
With haproxy-1.7.9, using 3 instances (symlinked init.d/haproxy -> init.d/haproxy-1 etc.)

haproxy is configured to use multiple processes ('nbproc 64' in the haproxy config to use 64 processes)

When stopping e.g. haproxy-1, all haproxy processes die. (This only happens if nbproc > 1; with nbproc 1 it only stops the one process we care about.)

strace creates a verbose dump of ~100MB size, the obvious part is:

# grep -c kill\( strace-verbose
192

So openrc really actively killed all 3*64 processes!
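
For anyone repeating this check: `grep -c` counts matching lines, so it would also count any kill() call that is not a SIGTERM (Comment 1 shows a signal 0 probe being sent as well). A minimal sketch that collects the distinct PIDs that were successfully sent SIGTERM, assuming the strace line layout quoted in this report; the function name `sigterm_targets` is made up for illustration:

```python
import re

# Matches strace lines such as:
#   [pid  1645] kill(1572, SIGTERM)         = 0
# Only calls that returned 0 (success) are matched.
KILL_RE = re.compile(r"\bkill\((\d+), (\w+)\)\s+= 0")


def sigterm_targets(lines):
    """Return the set of PIDs that were successfully sent SIGTERM."""
    targets = set()
    for line in lines:
        m = KILL_RE.search(line)
        if m and m.group(2) == "SIGTERM":
            targets.add(int(m.group(1)))
    return targets


# Sample lines copied from the trace in this report.
sample = [
    "[pid  1645] kill(1572, SIGTERM)         = 0",
    "[pid  1645] kill(1307, SIGTERM)         = 0",
    "[pid  1645] close(4)                    = 0",
]
print(sorted(sigterm_targets(sample)))  # prints [1307, 1572]
```

On a real dump the lines could be streamed from the file, e.g. `sigterm_targets(open("strace-verbose"))`.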


It seems to be the first iteration of the loop in the haproxy init script - 
strace says:
+ pidfile=/tmp/tmp.XJvxcaOt06

which matches `pidfile="${_t}" default_stop`
then there's a few calls to service_get_value, and then:


+ start-stop-daemon --stop --exec /usr/sbin/haproxy --pidfile /run/haproxy-1.pid
which ends up as pid 1645.

Now we see:
[pid  1645] readlink("/proc/1/ns/pid", "pid:[4026531836]", 30) = 16
[...]
[pid  1645] stat("/proc/2/ns/pid", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0

a sequential scan over all PIDs, followed by

[pid  1645] fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid  1645] read(4, "32451 (kworker/4:1) S 2 0 0 0 -1"..., 1024) = 166
[pid  1645] close(4)                    = 0
[pid  1645] getdents(3, /* 0 entries */, 32768) = 0
[pid  1645] close(3)                    = 0
[pid  1645] kill(1572, SIGTERM)         = 0

then 192 kill(), and

[pid  1645] kill(1307, SIGTERM)         = 0
[pid  1645] open("/proc", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
[pid  1645] fstat(3, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[pid  1645] stat("/proc/self/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid  1645] openat(AT_FDCWD, "/proc/self/status", O_RDONLY) = 4
[pid  1645] fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid  1645] read(4, "Name:\tstart-stop-daem\nUmask:\t002"..., 1024) = 1024
[pid  1645] read(4, "00000000,00000000,00000000,00000"..., 1024) = 270
[pid  1645] read(4, "", 1024)           = 0
[pid  1645] close(4)                    = 0
[pid  1645] stat("/proc/self/ns/pid", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
[pid  1645] readlink("/proc/self/ns/pid", "pid:[4026531836]", 30) = 16
[pid  1645] getdents(3, /* 345 entries */, 32768) = 9040
[pid  1645] stat("/proc/1/ns/pid", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0


another sequential scan of pids.

So for some reason, somehow, start-stop-daemon scans over /proc and grabs all processes with name similar to 'haproxy' and kills them.
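
The `read(4, "32451 (kworker/4:1) S 2 ...")` call in the trace looks like the contents of a /proc/<pid>/stat file, which is where the process name is available for comparison. As a rough illustration of what such a line contains (this is not OpenRC's code, and `parse_stat` is a hypothetical helper), the first few fields can be pulled apart like this:

```python
def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so use the first '(' and the last ')' as delimiters.
    start = line.index("(")
    end = line.rindex(")")
    pid = int(line[:start])
    comm = line[start + 1:end]
    rest = line[end + 1:].split()
    return pid, comm, rest[0], int(rest[1])


# The truncated line seen in the strace output above.
print(parse_stat("32451 (kworker/4:1) S 2 0 0 0 -1"))
# prints (32451, 'kworker/4:1', 'S', 2)
```

A scan that compares only this name (or the executable path) against the target, with no PID to narrow it down, will match every haproxy process on the box regardless of which instance started it.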

I'm still trying to narrow it down, but it's very easy to reproduce with haproxy, nbproc > 1 in the config, and multiple instances.
Comment 1 Patrick Lauer gentoo-dev 2017-11-29 19:24:31 UTC
EINFO_VERBOSE=yes RC_VERBOSE=yes /etc/init.d/haproxy-2 restart
 * Checking /etc/haproxy/haproxy-2.cfg ... [ ok ]
 * Stopping haproxy-2 ...
 * Will stop /usr/sbin/haproxy
 * Will stop PID 8981
 * Will stop processes of `/usr/sbin/haproxy'

 * Sending signal 15 to PID 8981 ... [ ok ]
 * Sending signal 0 to PID 8981 ... [ ok ]
 [ ok ]
 * Stopping haproxy-2 ...
 * start-stop-daemon: fopen `/run/haproxy-2.pid': No such file or directory
 * Will stop /usr/sbin/haproxy
 * Will stop processes of `/usr/sbin/haproxy'

 * Sending signal 15 to PID 9243 ... [ ok ]
 * Sending signal 15 to PID 9242 ... [ ok ]
[... and all 192 processes die here]
Comment 2 William Hubbs gentoo-dev 2017-11-29 22:47:40 UTC
The issue happened if the pidfile was missing.
This is fixed in OpenRC 0.34.11.
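
To make the failure mode concrete, here is a toy model of the target selection, not the actual start-stop-daemon source. It assumes that an unreadable pidfile left the PID filter unset, so only the `--exec` match remained. The `fixed=True` branch shows one plausibly safe behaviour (select nothing); this report does not say how 0.34.11 actually handles the case. All names and the process table are hypothetical:

```python
def select_targets(processes, exec_path, pidfile_pid=None,
                   pidfile_given=False, fixed=True):
    """Pick the PIDs to signal.

    processes:     dict of pid -> executable path (a stand-in for /proc)
    pidfile_pid:   PID read from the pidfile, or None if it was unreadable
    pidfile_given: True if --pidfile was passed on the command line
    fixed:         False models the behaviour described in this bug
    """
    if pidfile_given and pidfile_pid is None and fixed:
        # Assumed safe behaviour: a pidfile was requested but is missing,
        # so do not fall back to matching by executable alone.
        return []
    return sorted(
        pid for pid, exe in processes.items()
        if exe == exec_path and (pidfile_pid is None or pid == pidfile_pid)
    )


# Hypothetical process table: three haproxy processes plus init.
procs = {
    1: "/sbin/init",
    8981: "/usr/sbin/haproxy",
    9242: "/usr/sbin/haproxy",
    9243: "/usr/sbin/haproxy",
}
exe = "/usr/sbin/haproxy"

print(select_targets(procs, exe, 8981, True))               # [8981]
print(select_targets(procs, exe, None, True, fixed=False))  # [8981, 9242, 9243]
print(select_targets(procs, exe, None, True, fixed=True))   # []
```

The second call mirrors the second "Stopping haproxy-2" pass in Comment 1, where the fopen of /run/haproxy-2.pid fails and every /usr/sbin/haproxy process is signalled.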