For a while now I have seen that a SIGSTOP of an emerge process fails to stop all children - various new jobs are still spawned despite the parent being stopped. When I set FEATURES=-pid-sandbox, the problem goes away.
It's triggered by this commit, since there's no way to propagate SIGSTOP to the new session that's created: https://gitweb.gentoo.org/proj/portage.git/commit/?id=37e4dc5ae842afa03849a47b123345906fdd81a2 commit 37e4dc5ae842afa03849a47b123345906fdd81a2 Author: Zac Medico <zmedico@gentoo.org> AuthorDate: 2019-01-22 07:17:18 +0000 Commit: Zac Medico <zmedico@gentoo.org> CommitDate: 2019-01-23 04:47:25 +0000 pid-sandbox: pid-ns-init setsid support (bug 675870) Use setsid to isolate the parent process from signals sent to the process group, and forward signals to the entire process group with kill(0, signum). Bug: https://bugs.gentoo.org/675870 Signed-off-by: Zac Medico <zmedico@gentoo.org> bin/pid-ns-init | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
(In reply to Zac Medico from comment #1) > It's triggered by this commit, since there's no way to propagate SIGSTOP to > the new session that's created: > > https://gitweb.gentoo.org/proj/portage.git/commit/ > ?id=37e4dc5ae842afa03849a47b123345906fdd81a2 Ah yes. I found that code to be suspect, particularly because it mentions SIGSTOP. :) The comments there had me confused as to which process the parent process is (emerge, the ebuild's phases, something else?).
In that commit message, parent process refers to the emerge process.
*** Bug 751493 has been marked as a duplicate of this bug. ***
The same applies to SIGTSTP, which means Ctrl+Z doesn’t properly suspend an emerge run.
(In reply to Christopher Head from comment #5) > The same applies to SIGTSTP, which means Ctrl+Z doesn’t properly suspend an > emerge run. We can fix it to forward SIGTSTP to the process group created by the setsid call.
Patch posted for review: https://archives.gentoo.org/gentoo-portage-dev/message/e475a85b045961346ac5032249eac7c2 https://github.com/gentoo/portage/pull/631
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/proj/portage.git/commit/?id=8b7edb648814cc53774c5841e45d8cc325bcef6e commit 8b7edb648814cc53774c5841e45d8cc325bcef6e Author: Zac Medico <zmedico@gentoo.org> AuthorDate: 2020-10-28 08:34:51 +0000 Commit: Zac Medico <zmedico@gentoo.org> CommitDate: 2020-11-01 21:45:01 +0000 pid-sandbox: Forward SIGTSTP and SIGCONT (bug 704498) For correct operation of Ctrl+Z, forward SIGTSTP and SIGCONT to all sandboxed pids. Fixes: 37e4dc5ae842 ("pid-sandbox: pid-ns-init setsid support (bug 675870)") Bug: https://bugs.gentoo.org/704498 Signed-off-by: Zac Medico <zmedico@gentoo.org> bin/pid-ns-init | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-)
Since it's impossible to handle SIGSTOP, I suppose we can consider this fixed by the SIGTSTP handling.
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=43cc36f055d6c4fd9c8761ecb50f57027180a1dd commit 43cc36f055d6c4fd9c8761ecb50f57027180a1dd Author: Zac Medico <zmedico@gentoo.org> AuthorDate: 2020-11-02 01:57:51 +0000 Commit: Zac Medico <zmedico@gentoo.org> CommitDate: 2020-11-02 01:58:00 +0000 sys-apps/portage: Bump to version 3.0.9 #199722 Enable QA Notice for deprecated hasq/useq in *rm phases #704498 Fix pid-sandbox to handle Ctrl+Z correctly #752066 Add emerge --quickpkg-direct-root option #752147 Fix make.conf to expand special *ROOT variables #752153 Add @golang-rebuild package set Bug: https://bugs.gentoo.org/752168 Bug: https://bugs.gentoo.org/199722 Bug: https://bugs.gentoo.org/704498 Bug: https://bugs.gentoo.org/752066 Bug: https://bugs.gentoo.org/752147 Bug: https://bugs.gentoo.org/752153 Package-Manager: Portage-3.0.9, Repoman-3.0.2 Signed-off-by: Zac Medico <zmedico@gentoo.org> sys-apps/portage/Manifest | 1 + sys-apps/portage/portage-3.0.9.ebuild | 267 ++++++++++++++++++++++++++++++++++ 2 files changed, 268 insertions(+)
Are you sure this is fully fixed? I was using Portage 3.0.9 to build dev-lang/rust-1.47.0-r2. I pressed Ctrl+Z and got back to a shell, but there were still rustc processes running, using 100% CPU, much later. I do run with EMERGE_DEFAULT_OPTS="--jobs 4 --load-average 8" and MAKEOPTS="--jobs 4 --load-average 8" in case that makes a difference.
(In reply to Christopher Head from comment #11) > Are you sure this is fully fixed? I was using Portage 3.0.9 to build > dev-lang/rust-1.47.0-r2. I pressed Ctrl+Z and got back to a shell, but there > were still rustc processes running, using 100% CPU, much later. I do run > with EMERGE_DEFAULT_OPTS="--jobs 4 --load-average 8" and MAKEOPTS="--jobs 4 > --load-average 8" in case that makes a difference. I can see that the first pid-ns-init process does not enter the "stopped by job control signal" state, nor do any of the processes below it in the pid namespace.
FWIW, I still get this on Portage 3.0.38.1
*** Bug 913487 has been marked as a duplicate of this bug. ***
I'm a bit confused by the fix above - it only forwards to one process (the main one). Perhaps it should forward to the PG?
well, in fairness, the most reliable approach would send to the whole session or cgroup (e.g. by using freezing/thawing, if available). I'm not sure if we need to open a new PTY to do that.. I always get confused by the session/process/tty/... interactions.
(In reply to Arsen Arsenović from comment #15) > I'm a bit confused by the fix above - it only forwards to one process (the > main one). Perhaps it should forward to the PG? on arsen's suggestion, tried: ``` diff --git a/bin/pid-ns-init b/bin/pid-ns-init index 4ea234d3a..d8d337136 100644 --- a/bin/pid-ns-init +++ b/bin/pid-ns-init @@ -41,7 +41,7 @@ def forward_sigtstp_sigcont(pid, signum, frame): # being called recursively, since the signal will also be sent # to the current process. handler = signal.signal(signum, signal.SIG_DFL) - os.kill(pid, signum) + os.killpg(pid, signum) if handler is not None: signal.signal(signum, handler) ``` which didn't help. nor did os.killpg(os.getpgid(pid), signum).
Interesting. In fact, I don't seem to be able to stop subprocesses of emerge via SIGTSTP via htop(1) either. SIGSTOP works to stop them but it obviously can't be forwarded.
(In reply to Arsen Arsenović from comment #15) > I'm a bit confused by the fix above - it only forwards to one process (the > main one). Perhaps it should forward to the PG? It's actually a bit more complex than that. We have 2 pid-ns-init processes. Lets say PID 123 is the parent, and PID 124 is the "main child" (indicated by main_child_pid). Here's what is supposed to happen given the current code: PID 123 receives SIGTSTP from the kernel when ^Z is pressed. PID 123 calls kill(124, SIGTSTP). PID 124 received SIGTSTP from PID 123. PID 124 calls kill(0, SIGTSTP). Reading the kill(2) man page, passing 0 as the first parameter should cause the signal to be sent to all processes in the current process group. I actually think this sequence could be simplified to this: PID 123 receives SIGTSTP from the kernel when ^Z is pressed. PID 123 calls killpg(124, SIGTSTP). We know that PID 124 must be the leader of a process group because it has called setsid and is acting as PID 1 in the new PID namespace. However, I'm not sure this change would fix the original issue of not all child processes receiving the signal. I still need to debug it further to figure out is going on there.