Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 704498 - FEATURES=pid-sandbox - SIGTSTP does not stop all children
Summary: FEATURES=pid-sandbox - SIGTSTP does not stop all children
Status: CONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core (show other bugs)
Hardware: All Linux
: Normal normal with 1 vote (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords:
: 751493 913487 (view as bug list)
Depends on: 675870
Blocks: 910332
  Show dependency tree
 
Reported: 2020-01-01 20:34 UTC by Jeroen Roovers (RETIRED)
Modified: 2024-03-26 16:18 UTC (History)
9 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jeroen Roovers (RETIRED) gentoo-dev 2020-01-01 20:34:41 UTC
For a while now I have seen that a SIGSTOP of an emerge process fails to stop all children - various new jobs are still spawned despite the parent being stopped. When I set FEATURES=-pid-sandbox, the problem goes away.
Comment 1 Zac Medico gentoo-dev 2020-01-01 23:41:08 UTC
It's triggered by this commit, since there's no way to propagate SIGSTOP to the new session that's created:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=37e4dc5ae842afa03849a47b123345906fdd81a2

commit 37e4dc5ae842afa03849a47b123345906fdd81a2
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2019-01-22 07:17:18 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2019-01-23 04:47:25 +0000

    pid-sandbox: pid-ns-init setsid support (bug 675870)
    
    Use setsid to isolate the parent process from signals sent
    to the process group, and forward signals to the entire
    process group with kill(0, signum).
    
    Bug: https://bugs.gentoo.org/675870
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 bin/pid-ns-init | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
Comment 2 Jeroen Roovers (RETIRED) gentoo-dev 2020-01-02 08:41:43 UTC
(In reply to Zac Medico from comment #1)
> It's triggered by this commit, since there's no way to propagate SIGSTOP to
> the new session that's created:
> 
> https://gitweb.gentoo.org/proj/portage.git/commit/
> ?id=37e4dc5ae842afa03849a47b123345906fdd81a2

Ah yes. I found that code to be suspect, particularly because it mentions SIGSTOP. :)

The comments there had me confused as to which process the parent process is (emerge, the ebuild's phases, something else?).
Comment 3 Zac Medico gentoo-dev 2020-01-02 10:23:47 UTC
In that commit message, parent process refers to the emerge process.
Comment 4 Zac Medico gentoo-dev 2020-10-27 21:24:49 UTC
*** Bug 751493 has been marked as a duplicate of this bug. ***
Comment 5 Christopher Head 2020-10-28 01:29:24 UTC
The same applies to SIGTSTP, which means Ctrl+Z doesn’t properly suspend an emerge run.
Comment 6 Zac Medico gentoo-dev 2020-10-28 07:56:49 UTC
(In reply to Christopher Head from comment #5)
> The same applies to SIGTSTP, which means Ctrl+Z doesn’t properly suspend an
> emerge run.

We can fix it to forward SIGTSTP to the process group created by the setsid call.
Comment 8 Larry the Git Cow gentoo-dev 2020-11-01 21:46:15 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=8b7edb648814cc53774c5841e45d8cc325bcef6e

commit 8b7edb648814cc53774c5841e45d8cc325bcef6e
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-10-28 08:34:51 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-11-01 21:45:01 +0000

    pid-sandbox: Forward SIGTSTP and SIGCONT (bug 704498)
    
    For correct operation of Ctrl+Z, forward SIGTSTP and SIGCONT
    to all sandboxed pids.
    
    Fixes: 37e4dc5ae842 ("pid-sandbox: pid-ns-init setsid support (bug 675870)")
    Bug: https://bugs.gentoo.org/704498
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 bin/pid-ns-init | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)
Comment 9 Zac Medico gentoo-dev 2020-11-01 21:51:29 UTC
Since it's impossible to handle SIGSTOP, I suppose we can consider this fixed by the SIGTSTP handling.
Comment 10 Larry the Git Cow gentoo-dev 2020-11-02 02:01:29 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=43cc36f055d6c4fd9c8761ecb50f57027180a1dd

commit 43cc36f055d6c4fd9c8761ecb50f57027180a1dd
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-11-02 01:57:51 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-11-02 01:58:00 +0000

    sys-apps/portage: Bump to version 3.0.9
    
     #199722 Enable QA Notice for deprecated hasq/useq in *rm phases
     #704498 Fix pid-sandbox to handle Ctrl+Z correctly
     #752066 Add emerge --quickpkg-direct-root option
     #752147 Fix make.conf to expand special *ROOT variables
     #752153 Add @golang-rebuild package set
    
    Bug: https://bugs.gentoo.org/752168
    Bug: https://bugs.gentoo.org/199722
    Bug: https://bugs.gentoo.org/704498
    Bug: https://bugs.gentoo.org/752066
    Bug: https://bugs.gentoo.org/752147
    Bug: https://bugs.gentoo.org/752153
    Package-Manager: Portage-3.0.9, Repoman-3.0.2
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/Manifest             |   1 +
 sys-apps/portage/portage-3.0.9.ebuild | 267 ++++++++++++++++++++++++++++++++++
 2 files changed, 268 insertions(+)
Comment 11 Christopher Head 2020-12-01 20:43:25 UTC
Are you sure this is fully fixed? I was using Portage 3.0.9 to build dev-lang/rust-1.47.0-r2. I pressed Ctrl+Z and got back to a shell, but there were still rustc processes running, using 100% CPU, much later. I do run with EMERGE_DEFAULT_OPTS="--jobs 4 --load-average 8" and MAKEOPTS="--jobs 4 --load-average 8" in case that makes a difference.
Comment 12 Zac Medico gentoo-dev 2020-12-02 00:58:13 UTC
(In reply to Christopher Head from comment #11)
> Are you sure this is fully fixed? I was using Portage 3.0.9 to build
> dev-lang/rust-1.47.0-r2. I pressed Ctrl+Z and got back to a shell, but there
> were still rustc processes running, using 100% CPU, much later. I do run
> with EMERGE_DEFAULT_OPTS="--jobs 4 --load-average 8" and MAKEOPTS="--jobs 4
> --load-average 8" in case that makes a difference.

I can see that the first pid-ns-init process does not enter the "stopped by job control signal" state, nor do any of the processes below it in the pid namespace.
Comment 13 Arsen Arsenović gentoo-dev 2022-11-21 09:32:05 UTC
FWIW, I still get this on Portage 3.0.38.1
Comment 14 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-09-08 21:41:04 UTC
*** Bug 913487 has been marked as a duplicate of this bug. ***
Comment 15 Arsen Arsenović gentoo-dev 2023-09-08 21:49:36 UTC
I'm a bit confused by the fix above - it only forwards to one process (the main one).  Perhaps it should forward to the PG?
Comment 16 Arsen Arsenović gentoo-dev 2023-09-08 22:02:28 UTC
well, in fairness, the most reliable approach would send to the whole session or cgroup (e.g. by using freezing/thawing, if available).  I'm not sure if we need to open a new PTY to do that..  I always get confused by the session/process/tty/... interactions.
Comment 17 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-09-08 22:21:12 UTC
(In reply to Arsen Arsenović from comment #15)
> I'm a bit confused by the fix above - it only forwards to one process (the
> main one).  Perhaps it should forward to the PG?

on arsen's suggestion, tried:
```
diff --git a/bin/pid-ns-init b/bin/pid-ns-init
index 4ea234d3a..d8d337136 100644
--- a/bin/pid-ns-init
+++ b/bin/pid-ns-init
@@ -41,7 +41,7 @@ def forward_sigtstp_sigcont(pid, signum, frame):
         # being called recursively, since the signal will also be sent
         # to the current process.
         handler = signal.signal(signum, signal.SIG_DFL)
-    os.kill(pid, signum)
+    os.killpg(pid, signum)
     if handler is not None:
         signal.signal(signum, handler)

```

which didn't help. nor did os.killpg(os.getpgid(pid), signum).
Comment 18 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2023-09-09 02:16:36 UTC
Interesting.  In fact, I don't seem to be able to stop subprocesses of emerge via SIGTSTP via htop(1) either.  SIGSTOP works to stop them but it obviously can't be forwarded.
Comment 19 Mike Gilbert gentoo-dev 2023-09-09 14:37:06 UTC
(In reply to Arsen Arsenović from comment #15)
> I'm a bit confused by the fix above - it only forwards to one process (the
> main one).  Perhaps it should forward to the PG?

It's actually a bit more complex than that. We have 2 pid-ns-init processes.

Lets say PID 123 is the parent, and PID 124 is the "main child" (indicated by main_child_pid).

Here's what is supposed to happen given the current code:

PID 123 receives SIGTSTP from the kernel when ^Z is pressed.
PID 123 calls kill(124, SIGTSTP).
PID 124 received SIGTSTP from PID 123.
PID 124 calls kill(0, SIGTSTP).

Reading the kill(2) man page, passing 0 as the first parameter should cause the signal to be sent to all processes in the current process group.

I actually think this sequence could be simplified to this:

PID 123 receives SIGTSTP from the kernel when ^Z is pressed.
PID 123 calls killpg(124, SIGTSTP).

We know that PID 124 must be the leader of a process group because it has called setsid and is acting as PID 1 in the new PID namespace.

However, I'm not sure this change would fix the original issue of not all child processes receiving the signal. I still need to debug it further to figure out is going on there.