Bug 916964 - OpenRC cgroups init script has inconsistent default and doesn't enable controllers (sys-cluster/k3s-1.25.4_p1 fails to start after install)
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: OpenRC
Hardware: All Linux
Importance: Normal critical
Assignee: OpenRC Team
URL:
Whiteboard:
Keywords: PATCH
Depends on:
Blocks:
 
Reported: 2023-11-06 15:29 UTC by acab
Modified: 2023-11-17 05:35 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge info (emerge-info,6.66 KB, text/plain)
2023-11-06 15:31 UTC, acab
logfile (k3s.log,25.01 KB, text/plain)
2023-11-06 15:31 UTC, acab

Description acab 2023-11-06 15:29:42 UTC
Hi,
I've installed =sys-cluster/k3s-1.25.4_p1, but it enters a crash loop as soon as I start it.


Reproducible: Always

Steps to Reproduce:
1. emerge sys-cluster/k3s
2. /etc/init.d/k3s start

Actual Results:  
note the crash loop in /var/log/k3s/k3s.log

Expected Results:  
the cluster should start

attaching the relevant files
Comment 1 acab 2023-11-06 15:31:14 UTC
Created attachment 874153 [details]
emerge info
Comment 2 acab 2023-11-06 15:31:43 UTC
Created attachment 874154 [details]
logfile
Comment 3 acab 2023-11-06 18:24:07 UTC
Update:
setting rc_cgroup_mode="hybrid" in /etc/rc.conf makes the problem go away
Comment 4 acab 2023-11-08 11:07:08 UTC
Hi, a quick update.

I've traced the root issue to the cgroup v2 cpu controller, which cannot be enabled.

root@box /sys/fs/cgroup # echo "+cpuset +io +memory +hugetlb +pids +rdma +misc" > cgroup.subtree_control 
root@box /sys/fs/cgroup # echo "+cpu" > cgroup.subtree_control
-bash: echo: write error: Invalid argument

With cgroup v1 it starts, but since k8s is phasing out cgroup v1 support, that's not a good solution.

I have zero understanding of cgroup and there's no trace of the reason in the kernel logs. I'm taking hints. Thanks a lot!
Comment 5 acab 2023-11-08 11:27:45 UTC
One further finding, sorry for the spam.

I CAN enable "+cpu" in cgroup.subtree_control if I do that via tty1 BEFORE LOGGING IN via lxdm. But afterwards the write always fails with EINVAL.

The following might be related (from Documentation/admin-guide/cgroup-v2.rst):
WARNING: cgroup2 doesn't yet support control of realtime processes and the cpu controller can only be enabled when all RT processes are in the root cgroup. Be aware that system management software may already have placed RT processes into nonroot cgroups during the system boot process, and these processes may need to be moved to the root cgroup before the cpu controller can be enabled.
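To check whether the warning above applies, one could list realtime processes together with their cgroup v2 path. This is only a rough sketch: the helper names filter_rt and list_rt_cgroups are made up for illustration, it assumes the procps "ps" with its "cls" column (FF and RR being the realtime classes), and it looks at processes only, not individual threads.

```shell
#!/bin/sh
# Hypothetical helper: keep only "pid class command" lines whose
# scheduling class is realtime (FF = SCHED_FIFO, RR = SCHED_RR).
filter_rt() {
    awk '$2 == "FF" || $2 == "RR"'
}

# Hypothetical helper: print each realtime process with its cgroup v2
# entry from /proc/PID/cgroup. An entry of "0::/" means the root cgroup.
list_rt_cgroups() {
    ps -eo pid=,cls=,comm= | filter_rt | while read -r pid cls comm; do
        cg=$(grep '^0::' "/proc/${pid}/cgroup" 2>/dev/null)
        echo "${pid} ${cls} ${comm} ${cg:-unknown}"
    done
}
```

Calling list_rt_cgroups as root before and after logging in would show whether any realtime process ended up outside the root cgroup.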
Comment 6 acab 2023-11-08 12:30:30 UTC
ok i've figured this out: the bug is in openrc

The /etc/init.d/cgroups script has:

start()
{
        # set up kernel support for cgroups
        if [ -d /sys/fs/cgroup ]; then
                mount_cgroups
                restorecon_cgroups
        fi
        return 0
}


The "mount_cgroups" func handles the 3 different cases (legacy, hybrid and unified). Important! Note that "unified" is the default case here.

mount_cgroups()
{
        case "${rc_cgroup_mode:-unified}" in
        hybrid) cgroups_hybrid ;;
        legacy) cgroups_legacy ;;
        unified) cgroups_unified ;;
        esac
        return 0
}


In turn the "cgroups_unified" func reads:

cgroups_unified()
{
        cgroup2_base
        cgroup2_controllers
        return 0
}


The cgroup2_base just mounts /sys/fs/cgroup; the interesting one is "cgroup2_controllers". A snip of the func follows. Note how the case statement tests "$rc_cgroup_mode" with no fallback, so "unified" is NOT the default anymore.

cgroup2_controllers()
[...]
        read -r active < "${cgroup_path}/cgroup.controllers"
        for x in ${active}; do
        case "$rc_cgroup_mode" in
                unified)
                        echo "+${x}"  > "${cgroup_path}/cgroup.subtree_control"
                        ;;
                hybrid)
[...]


This leads to the issue I was experiencing: when rc_cgroup_mode is left unset (the default) in /etc/rc.conf, the "unified" mode should be used (cgroup v2 only).
However, due to the inconsistency in the /etc/init.d/cgroups script presented above, the cgroup v2 fs gets mounted but the controllers are not enabled. And once the system boot progresses, it becomes too late for the "cpu" controller to be enabled manually.
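The mismatch can be reproduced in isolation. The following is a minimal sketch, not the actual init script: mount_mode and controllers_mode are hypothetical stand-ins that only mirror the two case statements quoted above, one with the ":-unified" fallback and one without.

```shell
#!/bin/sh
# Simulate the default configuration: rc_cgroup_mode is not set.
unset rc_cgroup_mode

# Stand-in for mount_cgroups: falls back to "unified" when unset.
mount_mode() {
    case "${rc_cgroup_mode:-unified}" in
    hybrid) echo "hybrid" ;;
    legacy) echo "legacy" ;;
    unified) echo "unified" ;;
    *) echo "none" ;;
    esac
}

# Stand-in for cgroup2_controllers: no fallback, so an unset
# variable expands to an empty string and matches no branch.
controllers_mode() {
    case "$rc_cgroup_mode" in
    unified) echo "unified" ;;
    hybrid) echo "hybrid" ;;
    *) echo "none" ;;
    esac
}

echo "mount: $(mount_mode)"
echo "controllers: $(controllers_mode)"
```

With the variable unset, the first function picks "unified" while the second matches nothing, which corresponds to the fs being mounted without any controller being enabled. Using the same "${rc_cgroup_mode:-unified}" expansion in both places would presumably make them agree.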

The workaround is to explicitly set rc_cgroup_mode="unified" in /etc/rc.conf. A proper fix belongs in /etc/init.d/cgroups. Shall I file a bug? Where?
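For reference, the workaround amounts to a single line in /etc/rc.conf (a config fragment, shown here under the assumption that no other rc_cgroup_mode setting is present in the file):

```shell
# /etc/rc.conf
# Set the mode explicitly so that both case statements in
# /etc/init.d/cgroups see the same value.
rc_cgroup_mode="unified"
```

A reboot is likely the simplest way to apply it, since the cpu controller can no longer be enabled once boot has progressed.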

Now back to this very bug, it can be closed.
Sorry for the noise, it took a while to figure this out.
Comment 7 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-11-12 05:41:20 UTC
(In reply to acab from comment #6)
> ok i've figured this out: the bug is in openrc
> 

I think it should be okay to just rename this bug / move it into the OpenRC component, unless I'm misunderstanding the issue.

> [...]
Comment 8 acab 2023-11-12 06:01:53 UTC
(In reply to Sam James from comment #7)
> (In reply to acab from comment #6)
> > ok i've figured this out: the bug is in openrc
> > 
> 
> I think it should be okay to just rename this bug / move it into the OpenRC
> component, unless I'm misunderstanding the issue.
> 
> > [...]

No, that's correct. Go ahead.

Thanks
Comment 9 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-11-12 06:50:53 UTC
Does https://github.com/OpenRC/openrc/pull/669 look sufficient? I've not tested it yet.
Comment 10 acab 2023-11-13 09:52:53 UTC
(In reply to Sam James from comment #9)
> Does https://github.com/OpenRC/openrc/pull/669 look sufficient? I've not
> tested it yet.

Just tested it and it works perfectly.
Thanks!
Comment 11 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-11-13 10:32:13 UTC
Thanks!
Comment 12 William Hubbs gentoo-dev 2023-11-17 05:35:25 UTC
This will be fixed in 0.52.