Bug 675312

Summary: sys-apps/portage: pid-sandbox does not reap orphaned descendant processes, leaves zombies
Product: Portage Development
Component: Core - Ebuild Support
Reporter: Toralf Förster <toralf>
Assignee: Portage team <dev-portage>
Status: RESOLVED FIXED
Severity: normal
Priority: Normal
CC: slyfox
Keywords: InVCS, REGRESSION
Version: unspecified
Hardware: All
OS: Linux
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:
Bug Blocks: 671498
Attachments: 29.20190113-040328, pstree-a.txt

Description Toralf Förster gentoo-dev 2019-01-13 09:18:36 UTC
Observed for 2 days on the tinderbox (pstree will be attached).
Comment 1 Toralf Förster gentoo-dev 2019-01-13 09:19:15 UTC
Created attachment 560826 [details]
29.20190113-040328
Comment 2 Toralf Förster gentoo-dev 2019-01-13 12:57:35 UTC
Created attachment 560838 [details]
pstree-a.txt
Comment 3 Toralf Förster gentoo-dev 2019-01-13 13:59:28 UTC
FWIW this issue/regression of https://gitweb.gentoo.org/proj/sandbox.git/commit/?id=f3e51a930312422cc78b693a247b7c5704ac90a2 seems to be package specific.
Comment 4 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-13 15:19:57 UTC
Do I understand correctly that the unexpected threads here are only the 'erl_child_setup' and 'bash' ones, and that those are zombies? Or is there something else as well?

The patch you linked (barring bugs) should not change the behaviour of thread creation/shutdown.

SIGCHLD handling might have been changed by 'pid-ns-init' if, for whatever reason, 'pid-ns-init' disables SIGCHLD and sandbox inherits that.

If you can reproduce it easily, can you also try FEATURES=-pid-sandbox to see if it helps?

AFAIU known reproducers are:
    dev-java/openjdk
    net-analyzer/tsung
Comment 5 Toralf Förster gentoo-dev 2019-01-13 16:11:54 UTC
(In reply to Sergei Trofimovich from comment #4)
IMO they are not zombies, at least they are reaped at the end of the emerge process, but they do quickly fill up the process ID table.

FEATURES=-pid-sandbox helped, reproducer was net-analyzer/tsung
Comment 6 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-13 19:15:42 UTC
(In reply to Toralf Förster from comment #5)
> (In reply to Sergei Trofimovich from comment #4)
> IMO they are not zombies, at least they are reaped at the end of the emerge
> process, but they do fill up quickly the process id table
> 
> FEATURES=-pid-sandbox helped, reproducer was net-analyzer/tsung

Thanks for the info!

I reproduced one locally and those are zombies at least for me:

...
21637 root       20   0 13760  7760  4736 S  0.0  0.0  0:00.02     └─ /usr/bin/python3.6m /usr/lib/portage/python3.6/pid-ns-init 11683
21638 portage    20   0  2384  1704  1552 S  0.0  0.0  0:00.00          └─ [dev-lang/erlang-21.1.1] sandbox /usr/lib/portage/python3.6/ebuild.sh compile
32507 portage    20   0     0     0     0 Z  0.0  0.0  0:00.00           ├─ erl_child_setup 1024
32465 portage    20   0     0     0     0 Z  0.0  0.0  0:00.00           ├─ erl_child_setup 1024
32438 portage    20   0     0     0     0 Z  0.0  0.0  0:00.00           ├─ erl_child_setup 1024
32436 portage    20   0     0     0     0 Z  0.0  0.0  0:00.00           ├─ erl_child_setup 1024
...

Thus I suspect something in the reparenting handling is way off in the pid-sandbox case. It's a bit unclear what is pid 1 in the new pid namespace. It ought to be pid-ns-init, but children are reparented to sandbox.
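
The 'Z' state shown in the listing above can also be checked directly from /proc. The following is only a sketch: the helper name zombies_of is made up for illustration, and it assumes a Linux system with /proc mounted. It lists the processes in state 'Z' whose parent is a given PID, which is one way to confirm that the erl_child_setup entries hang off sandbox.

```python
import os


def zombies_of(parent_pid):
    """Return PIDs in state 'Z' whose parent is parent_pid, read from /proc."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat", errors="replace") as stat_file:
                stat = stat_file.read()
        except OSError:
            continue  # the process went away while we were scanning
        # The command name is parenthesised and may itself contain spaces
        # or ')', so split after the last ')' rather than on whitespace.
        fields = stat[stat.rindex(")") + 2:].split()
        state, ppid = fields[0], int(fields[1])
        if state == "Z" and ppid == parent_pid:
            found.append(int(entry))
    return found
```

Calling zombies_of() with the PID of the sandbox process (as seen from the namespace the caller lives in) should return the defunct children if the behaviour described in this bug is present.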
Comment 7 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-13 19:27:59 UTC
sandbox is really pid=1. I've checked it by tweaking the ebuild slightly:

--- a/net-analyzer/tsung/tsung-1.7.0.ebuild
+++ b/net-analyzer/tsung/tsung-1.7.0.ebuild
@@ -33,2 +33,4 @@ src_compile() {
        emake || die "Failed building"
+       pstree -a -p -s -S
+       die "enough"
 }

The output is:

sandbox,1 /usr/lib/portage/python3.6/ebuild.sh compile
  ├─ebuild.sh,4 /usr/lib/portage/python3.6/ebuild.sh compile
  │   └─ebuild.sh,21 /usr/lib/portage/python3.6/ebuild.sh compile
  │       └─pstree,4738 -a -p -s -S
  ├─(erl_child_setup,96)
  ├─(erl_child_setup,150)
Comment 8 Toralf Förster gentoo-dev 2019-01-13 19:32:54 UTC
FWIW the effect is much stronger if I start the tinderbox chroot with nice (due to the longer emerge time). The number of (zombie) processes grows constantly with each second.
Comment 9 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-13 19:39:17 UTC
Without looking at the code (sorry), my guess is that portage calls exec() into 'pid-ns-init' without fork()ing a new process after unshare(CLONE_NEWPID). As a result, only the next fork()ed process enters the new namespace.

From 'man unshare':
"""
CLONE_NEWPID (since Linux 3.8)
  This flag has the same effect as the clone(2) CLONE_NEWPID flag.
  Unshare the PID namespace, so that the calling process has a new
  PID namespace for its children which is not shared with any
  previously existing process.  The calling process is not moved
  into the new namespace. ...
"""
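
To illustrate the man page excerpt, here is a small sketch (not portage code) that calls unshare(CLONE_NEWPID) through ctypes and then forks. The function name is hypothetical, the CLONE_NEWPID value is taken from <sched.h>, and the call needs CAP_SYS_ADMIN, so the sketch returns None when it is not permitted. The unshare happens in a throwaway helper process so that the caller's own namespace setup is left alone.

```python
import ctypes
import os

CLONE_NEWPID = 0x20000000  # value from <sched.h>


def first_child_pid_after_unshare():
    """Return the PID seen by the first child forked after unshare(CLONE_NEWPID).

    Returns None if unshare() is refused (it requires CAP_SYS_ADMIN).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        # Helper: unshare here so the original caller is unaffected.
        try:
            os.close(read_fd)
            if libc.unshare(CLONE_NEWPID) != 0:
                os._exit(1)  # typically EPERM when not running as root
            # The helper itself is NOT moved; only its children are.
            child = os.fork()
            if child == 0:
                # First fork after unshare: this is the namespace's init.
                os.write(write_fd, str(os.getpid()).encode())
                os._exit(0)
            os.waitpid(child, 0)
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        data = pipe.read()
    os.waitpid(helper, 0)
    return int(data) if data else None
```

If the calling process execs instead of forking at the point where this helper forks, it keeps its old PID and stays outside the new namespace, which matches the guess in this comment.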
Comment 10 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-13 19:47:15 UTC
Here is our unshare()/execv() same-process sequence:
    https://gitweb.gentoo.org/proj/portage.git/tree/lib/portage/process.py#n552
Comment 11 Zac Medico gentoo-dev 2019-01-13 22:00:46 UTC
According to the pid_namespaces(7) man page, we need to arrange it so that pid-ns-init is the first process forked and execed immediately after the unshare call:

> The namespace init process
> 
> The first process created in a new namespace (i.e., the
> process created using clone(2) with the CLONE_NEWPID flag,
> or the first child created by a process after a call to
> unshare(2) using the CLONE_NEWPID flag) has the PID 1,
> and is the "init" process for the namespace (see init(1)).
> A child process that is orphaned within the namespace
> will be reparented to this process rather than init(1)
> (unless one of the ancestors of the child in the same PID
> namespace employed the prctl(2) PR_SET_CHILD_SUBREAPER
> command to mark itself as the reaper of orphaned descendant
> processes).
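
The reaping duty described in this excerpt can be tried without root by using the PR_SET_CHILD_SUBREAPER escape hatch it mentions as a stand-in for being pid 1. This is only an illustrative sketch: the function names are invented, the constant 36 is PR_SET_CHILD_SUBREAPER from <linux/prctl.h>, and it is not how pid-ns-init is written. A process marks itself as subreaper, lets a grandchild get orphaned, and then collects everything with waitpid(-1).

```python
import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>


def reap_all():
    """Wait for every child, including adopted orphans, until none remain."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, 0)
        except ChildProcessError:
            return reaped  # ECHILD: nothing left to wait for
        reaped.append(pid)


def count_reaped_with_orphan():
    """Return how many processes a subreaper collects for one orphaned grandchild."""
    libc = ctypes.CDLL(None, use_errno=True)
    read_fd, write_fd = os.pipe()
    reaper = os.fork()
    if reaper == 0:
        try:
            os.close(read_fd)
            # Stand-in for being pid 1: orphaned descendants come to us.
            libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0)
            middle = os.fork()
            if middle == 0:
                if os.fork() == 0:
                    os._exit(0)  # grandchild, orphaned once 'middle' is gone
                os._exit(0)  # 'middle' exits without waiting for its child
            count = len(reap_all())
            os.write(write_fd, str(count).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        data = pipe.read()
    os.waitpid(reaper, 0)
    return int(data)
```

With the subreaper flag set, the loop is expected to collect both the direct child and the orphaned grandchild. A pid 1 that never runs such a loop, like sandbox in this bug, leaves the adopted orphans as zombies instead.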
Comment 13 Larry the Git Cow gentoo-dev 2019-01-15 06:22:36 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4c7debeccab9d1ea546ffbd6cbb9ff352aba8f63

commit 4c7debeccab9d1ea546ffbd6cbb9ff352aba8f63
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2019-01-15 06:13:54 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2019-01-15 06:22:23 +0000

    sys-apps/portage: version bump to 2.3.56
    
     #675284 restore canonicalize func
     #675312 pid-sandbox: execute pid-ns-init as pid 1
    
    Bug: https://bugs.gentoo.org/670484
    Bug: https://bugs.gentoo.org/671498
    Bug: https://bugs.gentoo.org/675312
    Package-Manager: Portage-2.3.56, Repoman-2.3.12
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/Manifest              |   1 +
 sys-apps/portage/portage-2.3.56.ebuild | 271 +++++++++++++++++++++++++++++++++
 2 files changed, 272 insertions(+)
Comment 14 Larry the Git Cow gentoo-dev 2019-01-15 06:22:50 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=fb406579b1d13c1ba23b28e0bb794c22878a58c0

commit fb406579b1d13c1ba23b28e0bb794c22878a58c0
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2019-01-13 23:06:35 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2019-01-15 05:45:55 +0000

    pid-sandbox: execute pid-ns-init as pid 1 (bug 675312)
    
    Execute pid-ns-init as the first fork after unshare, as
    required for it to have pid 1 and become the default reaper
    of orphaned descendant processes. In _exec, exec a separate
    pid-ns-init process to behave as a supervisor which will
    forward signals to init and forward exit status to the parent
    process.
    
    Fixes: a75d5546e3a4 ("Introduce a tiny init replacement for inside pid namespace")
    Bug: https://bugs.gentoo.org/675312
    Reviewed-by: Brian Dolbec <dolsen@gentoo.org>
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 bin/pid-ns-init        | 44 ++++++++++++++++++++++++++++++++++++++++----
 lib/portage/process.py | 25 +++++++++++++++++++------
 2 files changed, 59 insertions(+), 10 deletions(-)
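
For readers unfamiliar with the supervisor role mentioned in the commit message above, the general pattern (forward signals to a child, then pass its exit status on) can be sketched as follows. This is not the code from bin/pid-ns-init: the function name supervise, the choice of SIGTERM and SIGINT, and the 128 + signal exit convention are all assumptions made for illustration.

```python
import os
import signal


def supervise(argv):
    """Run argv as a child, forward SIGTERM/SIGINT to it, return its exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    def forward(signum, _frame):
        try:
            os.kill(pid, signum)  # pass the signal on to the child
        except ProcessLookupError:
            pass  # child already exited and was reaped

    # Install the forwarders, remembering the old handlers for cleanup.
    previous = {
        sig: signal.signal(sig, forward)
        for sig in (signal.SIGTERM, signal.SIGINT)
    }
    try:
        _, status = os.waitpid(pid, 0)
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)

    # Mirror the child's fate: shell-style 128+N for a fatal signal.
    if os.WIFSIGNALED(status):
        return 128 + os.WTERMSIG(status)
    return os.WEXITSTATUS(status)
```

A caller would typically end with something like sys.exit(supervise(command)), so that the parent of the supervisor sees the same result it would have seen from the supervised process.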