Created attachment 495198
Wait for multiple children in SIGCHLD handler

I'm using netplugd and teamd with over 20 interfaces. On boot, while the bundle is being configured over all interfaces, netplugd leaves some of the interfaces unconfigured. I also noticed that some zombies are left over from netplug, and their number matches the number of unconfigured interfaces:

ps -A f
 4620 ?        Ss     0:00 /sbin/netplugd -D -c /etc/netplug/netplugd.conf -p /tmp/netplug.pid
 5081 ?        Z      0:00  \_ [netplug] <defunct>
 5082 ?        Z      0:00  \_ [netplug] <defunct>
 5111 ?        Z      0:00  \_ [netplug] <defunct>

After some digging in the netplugd code, it turned out that it forks the configuration script (/etc/netplug.d/netplug) and waits for its completion using the SIGCHLD signal. netplugd uses a state machine to track the state of every interface, and the exit of the configuration script is required to be able to get to the inning state of the interface and actually configure it.

The problem with the zombie processes is caused by the way the SIGCHLD handler is currently implemented: it waits only for the child whose death generated the signal. The catch is that SIGCHLD is blocked while the handler is being executed, and pending signals of the same type are not queued. If many SIGCHLD signals are delivered to netplugd during that time, all except one are discarded, and the children behind the discarded ones are never waited for. This leaves those children as zombies, and the matching interfaces are not configured.

I made a minimal patch which fixes the problem using the implementation at https://www.gnu.org/software/libc/manual/html_node/Merged-Signals.html

Briefly, the patch changes the SIGCHLD handler to wait for all children. This way, if more children die while the handler is being executed, they will be waited for as well. If a child dies after we have stopped waiting but before we exit the handler, the new signal is left pending, and on the next call of the handler the process(es) will be waited for. This way all children should be reaped.
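For readers without the attachment at hand, the approach described above can be sketched roughly as follows. This is not the attached patch, only a minimal illustration of the waitpid() loop from the glibc "Merged Signals" page. The names sigchld_handler, install_sigchld_handler and reaped_children are made up for this example; the real netplugd handler would also have to map each reaped pid back to its interface, which is omitted here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter, for illustration only: how many children
 * the handler has reaped so far. */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler that reaps every child that has already exited,
 * not just the one whose death raised the signal. */
static void sigchld_handler(int signo)
{
    int saved_errno = errno;   /* waitpid() may clobber errno */
    int status;
    pid_t pid;

    (void)signo;

    /* WNOHANG makes waitpid() return 0 instead of blocking when
     * children exist but none has exited, and -1 when there are no
     * children left. Either result ends the loop. */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        /* Assumption: a real daemon would look up the interface
         * that belongs to pid here and advance its state. */
        reaped_children++;
    }

    errno = saved_errno;
}

/* Install the handler. Returns 0 on success, -1 on error. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    /* SA_NOCLDSTOP: only notify on exit, not on stop/continue.
     * SA_RESTART: restart interrupted system calls where possible. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;

    return sigaction(SIGCHLD, &sa, NULL);
}
```

With a handler of this shape, a burst of 20 children exiting at nearly the same time should all be reaped even if only a few SIGCHLD signals are actually delivered, because each invocation drains everything that is waitable at that moment.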
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=ca3c5c6b52e60bea1eab05a2b5bc97942aa7dc66

commit ca3c5c6b52e60bea1eab05a2b5bc97942aa7dc66
Author:     Lars Wendler <polynomial-c@gentoo.org>
AuthorDate: 2019-04-20 23:02:50 +0000
Commit:     Lars Wendler <polynomial-c@gentoo.org>
CommitDate: 2019-04-20 23:06:17 +0000

    sys-apps/netplug: Attempt to fix zombie creation

    Thanks-to: Lev Danilski <8o55kd+1v8xnjsby8b9k@pokemail.net>
    Closes: https://bugs.gentoo.org/631316
    Package-Manager: Portage-2.3.64, Repoman-2.3.12
    Signed-off-by: Lars Wendler <polynomial-c@gentoo.org>

 .../netplug-1.2.9.2-multi-waitpid-sigchld.patch    | 65 +++++++++++++++++++
 sys-apps/netplug/netplug-1.2.9.2-r3.ebuild         | 73 ++++++++++++++++++++++
 2 files changed, 138 insertions(+)