Summary: | booting using initramfs generated using >=sys-kernel/genkernel-4.1.0-r2 in SELinux context causes issues with at least xorg and NetworkManager | ||
---|---|---|---|
Product: | Gentoo Hosted Projects | Reporter: | Fredrik Eriksson <gentoo> |
Component: | genkernel | Assignee: | Gentoo Genkernel Maintainers <genkernel> |
Status: | RESOLVED DUPLICATE | ||
Severity: | normal | CC: | bkohler, info, juippis, maracay, matthias.grobarek, StormByte |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
genkernel.conf used
log from "genkernel initramfs"
log from starting xorg when no input devices are found
diff of listings of /run for working and not working environment
diff between working (583) and non-working (585) initramfs
emerge --info (as an attachment because it’s too large) |
Description
Fredrik Eriksson
2020-08-28 16:02:02 UTC
Created attachment 657304 [details]
genkernel.conf used
Created attachment 657306 [details]
log from "genkernel initramfs"
Created attachment 657308 [details]
log from starting xorg when no input devices are found
This sounds similar to another issue I heard about. Can you see if you are missing /run/utmp? Do your /run contents look the same on working vs non-working boot?

Created attachment 657336 [details, diff]
diff of listings of /run for working and not working environment
I generated a listing of /run in both a working and a broken environment using find /run | sort > list and created the attached diff.
utmp is present in both environments, but you're probably on to something, because the broken environment is missing lots of files compared to the working environment. Most of them seem to be udev-related.
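The comparison described above can be sketched as two small shell helpers. This is only an illustration: the function names `list_tree` and `only_in_working` are hypothetical and not part of genkernel or OpenRC, and the listing file paths in the usage note are arbitrary.

```shell
#!/bin/sh
# Hypothetical helpers for comparing /run between a working and a broken boot.

# list_tree DIR OUT: write a sorted listing of everything under DIR to OUT.
list_tree() {
    find "$1" 2>/dev/null | sort > "$2"
}

# only_in_working WORKING_LIST BROKEN_LIST: print the entries that exist
# only in the working listing. comm needs sorted input; -23 suppresses
# lines unique to the second file and lines common to both.
only_in_working() {
    comm -23 "$1" "$2"
}
```

On the working boot one would run `list_tree /run /root/run.working`, on the broken boot `list_tree /run /root/run.broken`, and then `only_in_working /root/run.working /root/run.broken` to see what the broken environment lacks.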
Looks like udev isn't starting at all... can you try to start the udev service on the "broken" setup and see if it starts up successfully?

Getting closer, but unfortunately it looks like udev is running; 'rc-service udev status' reports the service as started and there is a systemd-udevd process in the process list. Trying to restart udev curiously leaves me with two systemd-udevd processes, but /run is no more populated than before and the problem persists.

For the record: when I noticed the problem I was using sys-fs/udev-245.5-r1, but right now I use sys-fs/udev-246-r1 as I unmasked it during troubleshooting.

I tried to upgrade another machine which uses pretty much the same setup, except for using sys-fs/eudev instead of sys-fs/udev. The problem does not appear on that machine, so it's probably restricted to systems using sys-fs/udev.

I have exactly the same problem on my system. Network devices won’t come up and X.org won’t start due to missing devices. With initramfs images from earlier genkernel versions, booting works fine.

I diff’ed the two initramfs and noticed that the newer ones lack mdev-related stuff. Maybe genkernel omits something important?

Created attachment 659024 [details]
diff between working (583) and non-working (585) initramfs
Created attachment 659026 [details]
emerge --info (as an attachment because it’s too large)
(In reply to matthias.grobarek from comment #8)
> I diff’ed the two initramfs and noticed that the newer ones lack
> mdev-related stuff. Maybe genkernel omits something important?

No :) We switched to udev with v4.1.0+.

I can reproduce this, and I am having problems with wifi not being enabled (and impossible to enable) in NetworkManager:

nmcli says:

(...)
wlo1: unmanaged
"Intel Wireless-AC 9560"
wifi (iwlwifi), 30:24:32:69:B7:84, plugin missing, hw, mtu 1500

That "plugin missing" is strange, since I compiled it with the wifi USE flag:

Installed versions: 1.26.2^t(19:29:37 16/09/20)(dhcpcd elogind introspection iwd json ncurses nss policykit resolvconf wext wifi -audit -bluetooth -connection-sharing -consolekit -dhclient -gnutls -modemmanager -ofono -ovs -ppp -selinux -systemd -teamd -test -vala ABI_MIPS="-n32 -n64 -o32" ABI_RISCV="-ilp32 -ilp32d -lp64 -lp64d" ABI_S390="-32 -64" ABI_X86="64 -32 -x32" KERNEL="linux")

But according to equery files networkmanager I can see that the plugin is in fact there:

/usr/lib64/NetworkManager/1.26.2/libnm-device-plugin-wifi.so

So I don't think it is a NetworkManager ebuild problem; it rather seems related to this genkernel update.

I have not tried reverting to the previous version without any other change to double-check, though.

(In reply to David Carlos Manuelda from comment #12)
> I can reproduce this and having problems with wifi not being enabled (and
> impossible to) in NetworkManager:
>
> nmcli says:
>
> (...)
> wlo1: unmanaged
> "Intel Wireless-AC 9560"
> wifi (iwlwifi), 30:24:32:69:B7:84, plugin missing, hw, mtu 1500
>
> That "plugin missing" is strange, since I compiled it with wifi USE flag:
>
> Installed versions: 1.26.2^t(19:29:37 16/09/20)(dhcpcd elogind
> introspection iwd json ncurses nss policykit resolvconf wext wifi -audit
> -bluetooth -connection-sharing -consolekit -dhclient -gnutls -modemmanager
> -ofono -ovs -ppp -selinux -systemd -teamd -test -vala ABI_MIPS="-n32 -n64
> -o32" ABI_RISCV="-ilp32 -ilp32d -lp64 -lp64d" ABI_S390="-32 -64" ABI_X86="64
> -32 -x32" KERNEL="linux")
>
> But according to equery files networkmanager I can see that the plugin is in
> fact there:
>
> /usr/lib64/NetworkManager/1.26.2/libnm-device-plugin-wifi.so
>
> So I don't think it is a NetworkManager ebuild problem but rather seems
> related to this genkernel update.
>
> Did not tried though reverting back to previous version without any other
> change to double confirm.

It is important to say that this problem is not related to the kernel or other software, because wifi can be correctly associated by hand using wpa_supplicant.

I have reproduced this problem on every kernel build I have done since I upgraded genkernel to 4.1.2-r3 on Sep 28, namely gentoo-sources 5.8.12, 5.8.13, and 5.8.14. I could successfully start Xorg and use NetworkManager on 5.8.11 (built on Sep 26).

I have downgraded genkernel to 4.0.10 and rebuilt the initramfs of 5.8.14. This boots successfully, I get into X, and NetworkManager brings up my network connections, so the problem has to be with what genkernel is doing when it builds the initramfs.

Bug 740576 looks like it is related. Is everyone on this bug running selinux? It looks to me like this issue might only affect selinux systems booted with the new genkernel release.

(In reply to Eddie Chapman from comment #15)
> Bug 740576 looks like it is related. Is everyone on this bug running
> selinux? It looks to me like it could be this issue only affects selinux
> systems booted with the new genkernel release.

Yes... and no... or maybe just a little?
Both my systems (one working and one not) use the hardened/selinux profile, but both are configured to run in permissive mode.
The difference between the two systems regarding selinux is that the system that does not work has had selinux in enforcing mode previously, while the system that works has never been in enforcing mode. The labelling is probably in various states of broken on both systems...

(In reply to Fredrik Eriksson from comment #16)
> Both my systems (one working and one not) use the hardened/selinux profile,
> but both are configured to run in permissive mode.

Both my affected systems are selinux permissive but, if the problem is what I think it is, it doesn't matter which selinux mode you are running.

From my own observations and testing, and what the OP of the other bug suggests, I believe the problem exists because in the initramfs /run gets mounted by the linuxrc script *without* these options needed for all selinux systems:

rootcontext=system_u:object_r:var_run_t,seclabel

After the initramfs, OpenRC then mounts on top of /run with the selinux options (from the /run entry in the system's /etc/fstab). So on an selinux system you will have 2 /run mounts showing in /proc/mounts. But the selinux options used on the 2nd mount have no effect; your system runs as if you did not have them (which is correct and intended, I believe). This causes all kinds of problems on an selinux system, e.g. for X, NetworkManager and OpenRC itself.

I'm working on a change to the linuxrc script to try out myself, to mount /run correctly on selinux systems inside the initramfs.

I modified the linuxrc script in the initramfs to mount /run with rootcontext=system_u:object_r:var_run_t,seclabel on selinux systems, and tried it on one such system of mine.
But unfortunately the mount command fails when you ask it to use rootcontext=.

So I enabled the recently added Dropbear SSH feature - very nice, works really well by the way :-) - and SSH'd into the initrd environment while it was paused on bootup to poke around.

The mount command is busybox's, not util-linux's, so it just complains of an invalid argument if you try to mount anything using rootcontext=. I tried both remounting /run and mounting a separate tmpfs on a made-up directory; both fail.

Then I copied /bin/mount (util-linux) plus its needed /lib64 libraries (including libselinux.so.1) from my rootfs, which was already mounted at /newroot, into the initramfs environment. I was then able to run the util-linux mount command using rootcontext= and it did not emit any error, either with the remount option for /run or when mounting a separate new tmpfs.

However, although appearing to work, the resulting mountpoints did not show rootcontext= either in the output from the mount command or in /proc/mounts, so it simply did not take effect. Then I realised that of course it won't work, as no selinux policies are loaded yet. At that point I realised I'd opened a can of worms and that getting selinux support into the initramfs environment is not going to be straightforward :-)

Unless I'm barking up the wrong tree completely, I've concluded that >=sys-kernel/genkernel-4.1.0 is unusable for selinux systems and should not be used on them until we are able to mount /run using rootcontext= at the beginning inside the initramfs.

Alternatively, which might be a better approach, on selinux systems at least, perhaps clean up the environment completely, including unmounting /run (maybe in the "cleanup" function in initrd.scripts?), before running switch_root. Then /run can be freshly mounted by OpenRC with the rootcontext= option.
I say unusable as even on selinux systems without X or NetworkManager the /run situation still messes up OpenRC, which tries to run services multiple times on boot as a result (I'm seeing this on multiple selinux systems). I don't run any systemd systems, so maybe there are no problems there. But running a Gentoo selinux system without the correct rootcontext= on /run is by definition a broken selinux system anyway.

Workarounds that I can think of: disable selinux completely until this is fixed (my systems are permissive anyway, though there are still security benefits to lose by disabling a permissive system), use another initramfs solution, or, one hack I might try now, insert a very early OpenRC service which unmounts /run and remounts it again after selinux policies are loaded.

An update: I'm happy and relieved as I finally have a correctly booting selinux machine using sys-kernel/genkernel-4.1.2-r3 and its nice new features :-) Albeit with a nasty but simple hack.

First I tried inserting a command to unmount /run in one of the early scripts in the OpenRC sysinit runlevel that runs right after genkernel, but umount failed (busy error), as it turns out OpenRC puts stuff in /run to track bootup as soon as it is launched from init.

Then I tried a /bin/sh wrapper around the "/sbin/openrc sysinit" command that init launches as its first child. umount failed here as well, as it turns out, from what I could gather from debugging, that /sbin/init itself pops a single file into /run as soon as it is launched by genkernel.

So I used a simple wrapper for /sbin/init containing just:

    #!/bin/sh
    echo "Trying to unmount leftover /run from genkernel"
    umount -n /run
    # sleep briefly so we can see any output from umount
    sleep 5
    exec /sbin/init

and added init=/sbin/myinit to the kernel command line in grub (genkernel picks that up and launches that instead of /sbin/init).
Now umount succeeded in unmounting /run and OpenRC booted perfectly without any of the mess created by the leftover /run, and I have a fully working selinux system again. OpenRC handled mounting /run itself as it had done before, and /proc/mounts now has only a single /run mounted with the correct rootcontext= option.

This confirms to me without a doubt that the problem in this bug report is simply caused by the /run mountpoint leftover by genkernel without the needed rootcontext= on selinux systems. Simple solution IMHO is to simply unmount that in genkernel linuxrc script before running /sbin/init - but I'll leave that to genkernel devs.

One quick thing I forgot to add in case anyone tries what I did, make sure you set correct context on your wrapper:

    chcon --reference /sbin/init /sbin/myinit

and of course don't forget:

    chmod 0755 /sbin/myinit

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/genkernel.git/commit/?id=ab6d73225f21be7d55649363ceb460d91270638d

commit ab6d73225f21be7d55649363ceb460d91270638d
Author: Thomas Deutschmann <whissi@gentoo.org>
AuthorDate: 2021-02-08 01:25:50 +0000
Commit: Thomas Deutschmann <whissi@gentoo.org>
CommitDate: 2021-02-08 21:20:28 +0000

    linuxrc: Add gk.preserverun.disabled

    When this boolean option is set and enabled, genkernel initramfs will
    unmount /run before calling switch_root.

    This can help in SELinux context for example where labeling is required
    which is not supported by genkernel.

    Bug: https://bugs.gentoo.org/739424
    Bug: https://bugs.gentoo.org/740576
    Signed-off-by: Thomas Deutschmann <whissi@gentoo.org>

 defaults/initrd.defaults | 1 +
 defaults/linuxrc | 15 +++++++++++
 doc/genkernel.8.txt | 6 +++++
 ....1-switch_root-check-if-mountpoint-exists.patch | 31 ++++++++++++++++++++++
 4 files changed, 53 insertions(+)

Merging with bug 740576.
*** This bug has been marked as a duplicate of bug 740576 ***

I’m sorry to bother you, but when I saw this ticket being closed and genkernel-4.2.1-r1 being stable in Portage, I gave it another try. Sadly, that version also creates a non-working initramfs for me, just as 4.1.2-r3 did. When I go back to 4.0.10, I get a working initramfs.

The symptoms haven’t changed:
• X.org does not detect any input devices
• the network interfaces are not started properly (especially IPv4)

Should this ticket be reopened (because it seems to be closely related), or should I open another one? 740576 doesn’t really tackle my problem either, because I don’t run SELinux.

Have you booted with gk.preserverun.disabled=yes?

Ahhh, I missed that option. With gk.preserverun.disabled=yes, booting from the initramfs created by genkernel >=4.1.2-r3 works again. Thank you!

But since I don’t use SELinux, why is it that I need that flag at all? You can find my emerge --info in comment #10.
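For anyone verifying the workaround discussed in the last comments, the following is a minimal sketch of two checks on a booted system: whether gk.preserverun.disabled was actually passed on the kernel command line, and how many mounts sit on /run. The helper names `preserverun_value` and `count_run_mounts` are hypothetical, not part of genkernel; the sketch only reads /proc/cmdline and /proc/mounts and makes no assumption about which values genkernel accepts for the option.

```shell
#!/bin/sh
# Hypothetical helpers; both take an optional file argument so they can be
# tried against a saved copy instead of the live /proc files.

# preserverun_value [CMDLINE_FILE]: print the value given to
# gk.preserverun.disabled on the kernel command line, or nothing if the
# parameter is absent.
preserverun_value() {
    cmdline_file=${1:-/proc/cmdline}
    # One parameter per line, then strip the key and print only on a match.
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^gk\.preserverun\.disabled=//p'
}

# count_run_mounts [MOUNTS_FILE]: print how many entries are mounted on /run.
# The second field of /proc/mounts is the mount point.
count_run_mounts() {
    mounts_file=${1:-/proc/mounts}
    awk '$2 == "/run" { n++ } END { print n + 0 }' "$mounts_file"
}
```

Based on the earlier comments, an affected boot would be expected to report two /run mounts, and a boot where the initramfs /run was unmounted before switch_root only one.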