| Summary: | sys-kernel/genkernel-4.1.2-r3 building kernel with broken lvm | ||
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | r7l <gentoo> |
| Component: | Current packages | Assignee: | Gentoo Genkernel Maintainers <genkernel> |
| Status: | RESOLVED INVALID | ||
| Severity: | normal | CC: | gentoo, jstein |
| Priority: | Normal | ||
| Version: | unspecified | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Package list: | | Runtime testing required: | --- |
| Attachments: | Genkernel 4.0.10 log, Genkernel 4.1.2-r3 log, init.log, udev.log | ||
Description
r7l
2020-11-02 14:12:08 UTC
Could you please try to collect more information with the help of our IRC channel #gentoo? Thanks.

Are you using SELinux?

What kind of information should be collected? I am currently rebuilding the kernel using sys-kernel/genkernel-4.1.2-r3 in order to be able to provide a log file from it. I will also rebuild it with 4.0.10 later on and provide a log file from that. But it will take some time, as the system is really slow (just some Celeron) and I can't work on it too much because it is still in productive use during the daytime. I am not sure how to obtain anything from booting other than rc.log or dmesg, but I might check in to IRC.

It's not running SELinux. It's running a pretty simple OpenRC-based Gentoo with RAID and Crypt. I am also using genkernel's SSH remote unlock feature, which also still works as expected.

In Genkernel >=4.1 we are now using UDEV in initramfs to initialize devices. UDEV will create /run/udev/data. This data must be preserved so that UDEV from the real system can continue to use the devices. Whenever this is not possible (for example, there is currently a known problem with SELinux where /run becomes unusable due to missing labels), you will see problems like this. So start with debugging the mount/udev service from the real system and verify that /run/udev/data is preserved, available and gets loaded on the real system.

Created attachment 669992 [details]
Genkernel 4.0.10 log
This is the log of the working 4.0.10 version.
Created attachment 669995 [details]
Genkernel 4.1.2-r3 log
This is the log of the broken 4.1.2-r3 version.
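The earlier advice to verify that /run/udev/data is preserved can be checked from the booted system. The following is only a minimal sketch, assuming a POSIX shell; the helper names `count_udev_entries` and `count_mounts_at` are hypothetical and are not part of genkernel or OpenRC.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting /run on the booted system.

# Print how many entries a udev database directory contains.
# Defaults to /run/udev/data, the path named in the comment above.
count_udev_entries() {
    dir="${1:-/run/udev/data}"
    set -- "$dir"/*
    # If the glob matched nothing, "$1" is the literal pattern and does not exist.
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}

# Print how many mounts in a mount table use the given mount point.
# Defaults to /proc/mounts; the second field of each line is the mount point.
count_mounts_at() {
    mount_point="$1"
    table="${2:-/proc/mounts}"
    awk -v mp="$mount_point" '$2 == mp { n++ } END { print n + 0 }' "$table"
}
```

Running `count_udev_entries` with no argument after boot should print a non-zero number if the udev data survived. If `count_mounts_at /run` prints more than 1, that would suggest something was mounted on top of the /run handed over from the initramfs.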
I've just added the logs of both versions. Both of them are running with pretty much the same configuration, and I've activated the cleanup options in order to not have anything being reused. I hope it does what it says. Other than that, I guess your last comment seems to point out the issue. While not using SELinux, all the errors I see are related to either /run or udev. As said, I can't really check for errors right now, as I've just finished rebuilding the kernel with 4.0.10 and I would need to reinstall 4.1.2-r3 and redo the kernel there. But from what I can recall, there is an error about OpenRC complaining about something in /run, and another error about a single service not being able to create a PID file in /run while all other services seem to be able to. Other than that, it's mostly udev issues, as said initially. It will take a bit, as the kernel compiles for hours and I need the system to run right now. Is there any easy way to debug the mount? I never did debugging for the boot process. The system is a real system, btw.

Start with checking /etc/fstab and also compare enabled services. If your installation matches a stage3 system (=strictly following *official* handbook), you shouldn't see such an issue. Add some debug code to /etc/init.d/bootmisc to list /run content before and after bootmisc service.

The system has been running for a number of years. I can't really say how long, and I've also migrated it from one hardware to another. So I am not entirely sure, but it's been a while, and I am pretty certain I followed the install guide during the initial setup for the most part. But I might have tampered with it here and there over the years.

I've tried a couple of things and added some debug code to /etc/init.d/bootmisc to list /run. There is a message right before OpenRC starts, after unlocking the partition and resuming boot: Something about "udevd still running! Trying to kill it". The error message I see during boot is: fopen(/run/openrc/rc.log) failed: No such file or directory. Other than that, it's errors about missing kernel modules (which are there once booted with the older genkernel) and this PID creation error of a single service. I've added some debugging to /etc/init.d/bootmisc as you said, but I am not sure what to look for. There is something in /run prior to the first error, and it gets called right after that error with different output.

Not sure if I've found something. At some point I added tmpfs mounts for /run and /tmp to fstab. I think I found it in some forum, and it never was an issue. But I am going to remove it and rerun the genkernel build. I might report back once that's done and say whether that's what caused the issues.

> Something about "udevd still running! Trying to kill it"
This sounds like https://gitweb.gentoo.org/proj/genkernel.git/tree/defaults/linuxrc?h=v4.1.2#n1317 which would already indicate some kind of abnormal behavior. Do you see /run/initramfs with content at all?

It works now, and the reason for it to fail was that line in /etc/fstab that mounted tmpfs on /run. As said before, I am not sure how long it's been since I added it there. But once it was removed and after rerunning genkernel, it boots fine now. I am still seeing this warning, and yes, it's this exact message you've linked to in git. I'm not sure how to look into this issue, as it occurs even prior to OpenRC or any service being started. It's right after entering "resume-boot" in SSH. Other than that, I guess this LVM issue is resolved for me. No more errors and warnings, and the system is running fine again. Thanks a lot for your help!

OK, you should now see /run/initramfs after boot. Please add "gk.udev.debug=yes udev.children-max=1" to kernel command-line and reboot. Please share /run/initramfs/init.log and /run/initramfs/udevd.log afterwards.

Created attachment 671827 [details]
init.log
Created attachment 671830 [details]
udev.log
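Before collecting the logs, it can help to confirm that the requested arguments actually reached the running kernel. This is only a sketch under the assumption that the arguments are separated by single spaces in /proc/cmdline; `cmdline_has` is a hypothetical helper name, not a genkernel tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an exact argument appears on the kernel command line.
# The second parameter allows pointing at a different file than /proc/cmdline.
cmdline_has() {
    wanted="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put one argument per line, then look for an exact, whole-line match.
    tr ' ' '\n' < "$cmdline_file" | grep -Fxq -- "$wanted"
}
```

A possible use after rebooting with the debug arguments: `cmdline_has gk.udev.debug=yes && cmdline_has udev.children-max=1 && ls -l /run/initramfs/init.log /run/initramfs/udevd.log`.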
This last remaining error message seems to be less consistent after fixing the issue with /run. I did a number of kernel rebuilds and restarts, and it appears every now and then, but I wasn't able to make it appear with the debug options attached to the kernel parameters. I've not seen the message in the debug logs either. But I'm submitting them anyway, as there might be a clue to what is wrong. Besides the message itself, I can't see any issue. The system boots fine now and works as expected.

Well, these logs are looking good. But this is not surprising given that they belong to a successful run.
Maybe keep these kernel command-line arguments set for a while and see if you can catch the error and report back. Would be interesting to understand why
> udevadm control --exit
fails for you sometimes.
I am closing this issue as INVALID for now because the reported issue was caused by your /etc/fstab overmounting /run. As said, keep posting to this bug in case you get new logs showing an error.
Thanks!
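Since the root cause here was an /etc/fstab line mounting over /run, the same pattern can be looked for on other systems. A rough sketch, assuming the usual whitespace-separated fstab layout with the mount point in the second field; `fstab_entries_for` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print active (non-comment) fstab lines for a given mount point.
# The second parameter allows checking a file other than /etc/fstab.
fstab_entries_for() {
    mount_point="$1"
    fstab_file="${2:-/etc/fstab}"
    # Skip lines whose first field starts with '#'; match the mount point exactly.
    awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp' "$fstab_file"
}
```

If `fstab_entries_for /run` prints anything on a system booted with genkernel >=4.1, that line is a candidate for the overmount described in this bug.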