On one of my machines with `openrc-0.61` I've discovered these log entries:

```
Mar 26 16:15:42 queen supervise-daemon[2075]: Supervisor command line: supervise-daemon user.freddie --start /usr/libexec/rc/bin/openrc-user -- freddie
Mar 26 16:15:42 queen supervise-daemon[2078]: Child command line: /usr/libexec/rc/bin/openrc-user freddie
Mar 26 16:15:42 queen supervise-daemon[2076]: /usr/libexec/rc/bin/openrc-user, pid 2078, exited with return code 255
Mar 26 16:15:42 queen supervise-daemon[2088]: Child command line: /usr/libexec/rc/bin/openrc-user freddie
Mar 26 16:15:42 queen supervise-daemon[2076]: /usr/libexec/rc/bin/openrc-user, pid 2088, exited with return code 255
Mar 26 16:15:42 queen supervise-daemon[2094]: Child command line: /usr/libexec/rc/bin/openrc-user freddie
Mar 26 16:15:42 queen supervise-daemon[2076]: /usr/libexec/rc/bin/openrc-user, pid 2094, exited with return code 255
Mar 26 16:15:42 queen supervise-daemon[2100]: Child command line: /usr/libexec/rc/bin/openrc-user freddie
Mar 26 16:15:42 queen supervise-daemon[2076]: /usr/libexec/rc/bin/openrc-user, pid 2100, exited with return code 255
Mar 26 16:15:42 queen supervise-daemon[2107]: Child command line: /usr/libexec/rc/bin/openrc-user freddie
Mar 26 16:15:42 queen supervise-daemon[2076]: /usr/libexec/rc/bin/openrc-user, pid 2107, exited with return code 255
Mar 26 16:15:42 queen supervise-daemon[2111]: Child command line: /usr/libexec/rc/bin/openrc-user freddie
Mar 26 16:15:42 queen supervise-daemon[2076]: /usr/libexec/rc/bin/openrc-user, pid 2111, exited with return code 255
Mar 26 16:15:42 queen supervise-daemon[2115]: Child command line: /usr/libexec/rc/bin/openrc-user freddie
Mar 26 16:15:42 queen supervise-daemon[2076]: /usr/libexec/rc/bin/openrc-user, pid 2115, exited with return code 255
Mar 26 16:15:42 queen supervise-daemon[2119]: Child command line: /usr/libexec/rc/bin/openrc-user freddie
Mar 26 16:15:42 queen supervise-daemon[2076]: /usr/libexec/rc/bin/openrc-user, pid 2119, exited with return code 255
Mar 26 16:15:42 queen supervise-daemon[2123]: Child command line: /usr/libexec/rc/bin/openrc-user freddie
Mar 26 16:15:42 queen supervise-daemon[2076]: /usr/libexec/rc/bin/openrc-user, pid 2123, exited with return code 255
Mar 26 16:15:42 queen supervise-daemon[2127]: Child command line: /usr/libexec/rc/bin/openrc-user freddie
Mar 26 16:15:42 queen supervise-daemon[2076]: /usr/libexec/rc/bin/openrc-user, pid 2127, exited with return code 255
Mar 26 16:15:42 queen supervise-daemon[2131]: Child command line: /usr/libexec/rc/bin/openrc-user freddie
Mar 26 16:15:42 queen supervise-daemon[2076]: /usr/libexec/rc/bin/openrc-user, pid 2131, exited with return code 255
```

`rc-status` shows:

```
Dynamic Runlevel: hotplugged
 user.freddie                                    [  failed  ]
Dynamic Runlevel: needed/wanted
Dynamic Runlevel: manual
 user.freddie                                    [  failed  ]
```

It's an unstable Gentoo box:

```
Portage 3.0.67 (python 3.12.9-final-0, default/linux/amd64/23.0/split-usr, gcc-14, glibc-2.41-r1, 6.6.83-gentoo x86_64)
=================================================================
System uname: Linux-6.6.83-gentoo-x86_64-Intel-R-_Xeon-R-_CPU_E5-2620_v3_@_2.40GHz-with-glibc2.41
KiB Mem:     4896064 total,   4515020 free
KiB Swap:    3071996 total,   3071996 free
Timestamp of repository gentoo: Wed, 26 Mar 2025 11:45:00 +0000
Head commit of repository gentoo: 26ca07d4504245e52b3a4b1545c125a0b5d86bd7
sh bash 5.2_p37
ld GNU ld (Gentoo 2.44 p1) 2.44.0
app-misc/pax-utils:        1.3.8::gentoo
app-shells/bash:           5.2_p37::gentoo
dev-build/autoconf:        2.72-r1::gentoo
dev-build/automake:        1.16.5-r2::gentoo, 1.17-r2::gentoo
dev-build/cmake:           3.31.6-r1::gentoo
dev-build/libtool:         2.5.4::gentoo
dev-build/make:            4.4.1-r100::gentoo
dev-build/meson:           1.7.0::gentoo
dev-lang/perl:             5.40.1::gentoo
dev-lang/python:           3.11.9-r1::gentoo, 3.12.9::gentoo, 3.13.2::gentoo
sys-apps/baselayout:       2.17::gentoo
sys-apps/openrc:           0.61::gentoo
sys-apps/sandbox:          2.46::gentoo
sys-devel/binutils:        2.44::gentoo
sys-devel/binutils-config: 5.5.2::gentoo
sys-devel/gcc:             14.2.1_p20250301::gentoo
sys-devel/gcc-config:      2.12.1::gentoo
sys-kernel/linux-headers:  6.13::gentoo (virtual/os-headers)
sys-libs/glibc:            2.41-r1::gentoo
```
ping
This is in 0.62, released yesterday:

From 6411b4b0cce1781da20ec0a6c0babc4bd30d1f3f Mon Sep 17 00:00:00 2001
From: NRK <nrk@disroot.org>
Date: Fri, 11 Apr 2025 06:15:58 +0000
Subject: [PATCH] init.d/user: quit fast if it's failing rapidly

Ref: https://bugs.gentoo.org/952108
Ref: https://github.com/OpenRC/openrc/issues/817
With 0.62 it's even worse. Running `ssh server` hangs and never connects. A second attempt from another terminal does log in, but rc-status still shows user.${USER} as failed. As a bonus, every login attempt leaves behind a supervise-daemon process eating 100% CPU, together with a zombie child:

root      1780 99.9  0.0   2792  2048 ?        R    13:56  40:49 supervise-daemon user.freddie --start --respawn-max 3 --respawn-period 1 --notify fd:3 /usr/libexec/rc/bin/openrc-user -- freddie
root      1781  0.0  0.0      0     0 ?        Z    13:56   0:00 [supervise-daemo] <defunct>
root      3413 99.9  0.0   2792  2048 ?        R    14:34   2:56 supervise-daemon user.freddie --start --respawn-max 3 --respawn-period 1 --notify fd:3 /usr/libexec/rc/bin/openrc-user -- freddie
root      3414  0.0  0.0      0     0 ?        Z    14:34   0:00 [supervise-daemo] <defunct>
root      3811 99.9  0.0   2792  2048 ?        R    14:36   0:52 supervise-daemon user.freddie --start --respawn-max 3 --respawn-period 1 --notify fd:3 /usr/libexec/rc/bin/openrc-user -- freddie
root      3812  0.0  0.0      0     0 ?        Z    14:36   0:00 [supervise-daemo] <defunct>
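For anyone who wants to check whether they are hitting the same state, here is a minimal sketch that picks zombies and busy supervise-daemon processes out of `ps` output. The function name `filter_suspects` and the 90% CPU threshold are assumptions made for this example, not anything that ships with OpenRC.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -eo pid,stat,pcpu,args` output on stdin
# (columns: PID STAT %CPU COMMAND...) and prints only
#   - processes whose state starts with Z (zombies), and
#   - supervise-daemon processes at or above 90% CPU.
# The 90% cut-off is an arbitrary choice for this sketch.
filter_suspects() {
    awk 'NR > 1 && ($2 ~ /^Z/ || ($4 ~ /supervise-daemon/ && $3 + 0 >= 90)) { print }'
}

# Usage on a live system:
#   ps -eo pid,stat,pcpu,args | filter_suspects
```

Note that the `pcpu` column in `ps` is CPU time divided by process lifetime, so a supervisor that has only just started spinning may take a little while to cross the threshold.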
could you try https://github.com/OpenRC/openrc/pull/848? (patch https://patch-diff.githubusercontent.com/raw/OpenRC/openrc/pull/848.patch)
(In reply to Anna from comment #4)
> could you try https://github.com/OpenRC/openrc/pull/848? (patch
> https://patch-diff.githubusercontent.com/raw/OpenRC/openrc/pull/848.patch)

Thanks!!! That fixes v0.62.
UPD: Only partially fixed. I can log in over ssh, but the zombie (Z) processes and the 100% CPU load are still present.
(In reply to Yuriy Dmitriev from comment #6)
> UPD:
> Partial fixed. Is able to ssh login, but Z processes and 100% CPU load is
> present.

question, did you reboot or otherwise kill those processes after applying the patch? just making sure they're not left over from before the patch, since i can't yet reproduce the issue
(In reply to Anna from comment #7)
> (In reply to Yuriy Dmitriev from comment #6)
> > UPD:
> > Partial fixed. Is able to ssh login, but Z processes and 100% CPU load is
> > present.
>
> question, did you reboot or otherwise kill those processes after applying
> the patch? just making sure they're not lasting from before the patch, since
> i can't yet reproduce the issue

Of course, a complete reboot: I rebooted both the LXC containers and the host.

For now I've gone back to disabling user services by simply renaming /etc/init.d/user to something else.

Maybe this is LXC-specific? These are unprivileged Gentoo systems without nested cgroups.
> I revert to disable user services. Simply rename /etc/init.d/user to other
> name.

FYI, you can simply install-mask the PAM module to keep it from auto-starting. This is what I have in my make.conf:

INSTALL_MASK="/lib64/security/pam_openrc.so"
(In reply to Anna from comment #4)
> could you try https://github.com/OpenRC/openrc/pull/848? (patch
> https://patch-diff.githubusercontent.com/raw/OpenRC/openrc/pull/848.patch)

Thanks, tried the patch with openrc-0.62, I'm now able to log in. I don't see any zombie processes.

However, rc-status still looks like:

Dynamic Runlevel: hotplugged
 user.freddie                                    [  failed  ]
Dynamic Runlevel: needed/wanted
Dynamic Runlevel: manual
 user.freddie                                    [  failed  ]
(In reply to Tomáš Mózes from comment #10)
> (In reply to Anna from comment #4)
> > could you try https://github.com/OpenRC/openrc/pull/848? (patch
> > https://patch-diff.githubusercontent.com/raw/OpenRC/openrc/pull/848.patch)
>
> Thanks, tried the patch with openrc-0.62, I'm now able to log in. I don't
> see any zombie processes.

great! seems like the only issue remaining is the lxc edge case that i can't reproduce yet then.

> However, rc-status still looks like:
> Dynamic Runlevel: hotplugged
>  user.freddie                                    [  failed  ]
> Dynamic Runlevel: needed/wanted
> Dynamic Runlevel: manual
>  user.freddie                                    [  failed  ]

if XDG_RUNTIME_DIR is unset, this is expected, for now at least (it shouldn't be listed twice though, that's another bug). we could add a check so that this exit state is considered clean (so it's marked as stopped instead of failed), though it technically is still a 'failure' condition
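
To see which case applies on a given login, a small sketch like the following can be run from the affected session. The helper name `check_runtime_dir` is made up for this example, and it only inspects the value it is given; it does not reproduce whatever check openrc-user performs internally.

```shell
#!/bin/sh
# Hypothetical helper: classify a candidate XDG_RUNTIME_DIR value.
# Prints "unset" for an empty value, "missing" if the path is not an
# existing directory, and "ok" otherwise.
check_runtime_dir() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "unset"
    elif [ ! -d "$dir" ]; then
        echo "missing"
    else
        echo "ok"
    fi
}

# Check the current session; ${VAR:-} expands to "" when the variable is unset.
check_runtime_dir "${XDG_RUNTIME_DIR:-}"
```

If this prints "unset" in the ssh session, the failed user.freddie entry would match the expected behaviour described above.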