I tried a ZFS-enabled initramfs built by genkernel-4.1.0_rc1. After rebooting, the initramfs loads and then just sits at "Importing ZFS pools" for a very long time. I gave it 10+ minutes before rebooting into a rescue image and switching back to the older initramfs built by genkernel-4.0.10, which loads the root ZFS pool and boots into multiuser fine. If I had to take a guess, it looks like the switch to udev-managed devices might be to blame. Since this was added for systemd purposes, can the mdev functionality be added back and put behind a command-line argument? I run sysvinit/OpenRC, so the udev change doesn't bring me any benefit on the affected system. Plus the mdev-based initramfs is almost 1.5MB smaller. I'm still getting used to genkernel, so I did not attempt any debugging. If there are any debugging tips I can try, let me know. For now, I'm sticking with the older, mdev-based initramfs.
I'm running a very similar setup (musl profile, ZFS on root, OpenRC) and likewise had to revert from 4.1.0-rc1 to 4.0.10. The only difference is that, after loading modules, my boot hangs at "Activating udev ..." rather than at "Importing ZFS pool".
(In reply to Joshua Kinard from comment #0)
> If I had to take a guess, it looks like the switch to udev-managed devices
> might be to blame. Since this was added for systemd purposes, can the mdev
> functionality be added back and put behind a command-line argument? I run
> sysvinit/OpenRC, so the udev change doesn't bring me any benefit on the
> affected system. Plus the mdev-based initramfs is almost 1.5MB smaller.

Don't give up so early -- it's still in development. Also, udev has some advantages for you -- like udevsettle, which can avoid race conditions.

The only ZFS-related change I can think of is https://gitweb.gentoo.org/proj/genkernel.git/commit/?id=73689f82a7ef090c4d8c22eced7a56471be14156 -- but this should be unrelated.

For debugging: Enable SSH feature and try to see if you can connect and see what's failing/hanging. There's also /tmp/init.log in initramfs...

CC'ing ZFS maintainers for input/debug request.
(In reply to Thomas Deutschmann from comment #2)
> Don't give up so early -- it's still in development. Also, udev has some
> advantages for you -- like udevsettle, which can avoid race conditions.
>
> For debugging: Enable SSH feature and try to see if you can connect and see
> what's failing/hanging. There's also /tmp/init.log in initramfs...

Haven't given up; I just needed a working dev machine :) udev might actually help with Linux's habit of detecting my drives out of order (even though they're on an HBA card with dedicated port assignments, but I digress). I wrote some udev rules for persistent disk names, but mdev doesn't appear to support anything similar, so I'll need to test that again once the ZFS issue is hunted down. I'll also try the SSH approach later. One suspicion I have is that udev may not be detecting the disks the same way mdev does, causing the pool import to fail.
I hit the same issue with root on ZFS. Genkernel 4.0.10 works fine. I tried kernels 4.19.129, 4.19.133 and 4.19.138, with ZFS compiled into the kernel. Debugging the initramfs scripts, I found that udev is not the issue here: the script hangs in the 'start_volumes' function. I didn't dig any deeper.

I reproduced this in qemu/kvm, since I usually test a new kernel there before booting my system with it. Here is my terminal-only script for testing a ZFS kernel, in case it helps:

    #!/bin/bash
    default_version=$(cd /usr/src/linux && make -s kernelversion)
    version=${1:-$default_version}
    echo "Testing kernel version: $version"

    kernel=/boot/vmlinuz-$version-x86_64
    initrd=/boot/initramfs-$version-x86_64.img

    qemu-system-x86_64 -kernel "$kernel" \
        -initrd "$initrd" \
        -cpu host \
        -smp 8 \
        -enable-kvm \
        -m 16G \
        -curses \
        -drive file=/ssd/temp/gentoo-special1.img,format=raw,id=disk-s1,if=none,cache=none,aio=threads,discard=unmap \
        -device ide-hd,drive=disk-s1,bus=ide.0 \
        -drive file=/ssd/temp/gentoo-special2.img,format=raw,id=disk-s2,if=none,cache=none,aio=threads,discard=unmap \
        -device ide-hd,drive=disk-s2,bus=ide.0 \
        -drive file=/ssd/temp/gentoo-zfs1.img,format=raw,id=disk1,if=none,cache=none,aio=threads,discard=unmap \
        -device ide-hd,drive=disk1,bus=ide.1 \
        -drive file=/ssd/temp/gentoo-zfs2.img,format=raw,id=disk2,if=none,cache=none,aio=threads,discard=unmap \
        -device ide-hd,drive=disk2,bus=ide.1 \
        -append "dozfs root=ZFS"
Maybe we need to get rid of

> export ZPOOL_IMPORT_UDEV_TIMEOUT_MS=0

in start_volumes() now that we are using udev. Could you please try that change? Just edit /usr/share/genkernel/defaults/initrd.scripts, regenerate the initramfs and reboot. If this doesn't fix the problem, please add "set -x" / "set +x" around start_volumes() to see where it stops.
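A minimal sketch of the "set -x" / "set +x" suggestion, assuming a POSIX shell such as busybox ash. The start_volumes body below is a placeholder (the real function lives in /usr/share/genkernel/defaults/initrd.scripts) and TRACE_LOG is a hypothetical path. Redirecting the group's stderr also captures the "+ ..." trace lines, so they can still be read after a hang, for example over the SSH feature mentioned earlier:

```shell
#!/bin/sh
# Placeholder for genkernel's start_volumes(); in the real initramfs the
# function comes from initrd.scripts and only the wrapper below is needed.
start_volumes() {
    echo "importing pools..."
}

# Hypothetical log location; any writable path inside the initramfs works.
TRACE_LOG=${TRACE_LOG:-/tmp/start_volumes.trace}

# Enable xtrace only around the suspect call. xtrace writes to stderr, so
# redirecting the group's stderr sends every "+ command" line to the log.
{
    set -x
    start_volumes
    set +x
} 2>>"$TRACE_LOG"
```

After a hang, the last line of the log is the last command that started, which is where the function stopped. Note that any ordinary error messages start_volumes prints to stderr end up in the same file.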
Yes, genkernel needs to get rid of

export ZPOOL_IMPORT_UDEV_TIMEOUT_MS=0

This is a magic variable needed for mdev/static /dev setups: it makes ZFS effectively ignore libudev, because the calls time out immediately. Here's the commit that touched it: https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults?id=94edd477e491da2b900e6d1b1d71884081cf09ab

Just don't set it at all, in any file, when using udev. I'm testing now to confirm.
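A minimal sketch of that rule: export the variable only for mdev or static /dev setups and leave it unset under udev. The function name and its "udev"/"mdev"/"static" argument are hypothetical, not genkernel options; only ZPOOL_IMPORT_UDEV_TIMEOUT_MS itself is a real OpenZFS environment variable:

```shell
#!/bin/sh
# Hypothetical helper; $1 names the device manager in use ("udev", "mdev"
# or "static"). This is an illustrative input, not a genkernel setting.
configure_zpool_udev_timeout() {
    case "$1" in
        udev)
            # udev is running: let "zpool import" wait on libudev as usual.
            unset ZPOOL_IMPORT_UDEV_TIMEOUT_MS
            ;;
        *)
            # mdev or a static /dev: there is no udev to wait for, so make
            # libudev lookups give up immediately (the behavior described above).
            export ZPOOL_IMPORT_UDEV_TIMEOUT_MS=0
            ;;
    esac
}
```

An init script that has started udevd would call `configure_zpool_udev_timeout udev` before running `zpool import`; an mdev-based one would pass `mdev`.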
It hangs in start_volumes, near 'if call_func_timeout waitForZFS 5':

> + call_func_timeout waitForZFS 5
> + local 'func=waitForZFS 'timeout=5' pid watcher
> + pid=3875
> + waitForZFS
> + watcher=3876+
> '[' '!'+ -cwait /dev/zfs 3875 ]
>
> + exit 1
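The trace shows the shape of call_func_timeout: the function runs as a background job (pid), a second background job acts as a timer (watcher), and the caller waits on pid. The following is a minimal sketch of that pattern reconstructed from the trace, not genkernel's actual implementation; wait_for_zfs_node is a hypothetical stand-in for waitForZFS:

```shell
#!/bin/sh
# Sketch of a "run a function with a timeout" helper, modeled on the trace.
# Runs shell function $1 in the background and gives it $2 seconds.
# Returns 0 only if the function finished in time and returned 0.
# ('local' is not POSIX, but dash, busybox ash and bash all support it.)
call_func_timeout() {
    local func=$1 timeout=$2 pid watcher

    # Worker: run the function in a background subshell.
    ( "$func" ) &
    pid=$!

    # Watcher: a timer that kills the worker once the timeout expires.
    ( sleep "$timeout" && kill "$pid" 2>/dev/null ) &
    watcher=$!

    if wait "$pid" 2>/dev/null; then
        # Finished in time with status 0: cancel the timer.
        kill "$watcher" 2>/dev/null
        wait "$watcher" 2>/dev/null
        return 0
    fi

    # The worker failed or was killed by the watcher.
    kill "$watcher" 2>/dev/null
    wait "$watcher" 2>/dev/null
    return 1
}

# Hypothetical probe in the spirit of waitForZFS: succeeds only once
# /dev/zfs exists as a character device.
wait_for_zfs_node() {
    [ -c /dev/zfs ]
}
```

Used as `call_func_timeout wait_for_zfs_node 5`, this returns 1 if /dev/zfs is still not a character device after five seconds. The helper itself is bounded, so a caller that retries it without an upper limit would keep spinning for as long as the device node never appears.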
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/genkernel.git/commit/?id=5c2d7fc5bbc302c6ad6eac111e375709d7031187

commit 5c2d7fc5bbc302c6ad6eac111e375709d7031187
Author:     Thomas Deutschmann <whissi@gentoo.org>
AuthorDate: 2020-08-09 20:02:07 +0000
Commit:     Thomas Deutschmann <whissi@gentoo.org>
CommitDate: 2020-08-09 20:02:07 +0000

    defaults/initrd.scripts: start_volumes(): Don't wait for /dev/zfs

    This is not needed anymore and also not working.

    Bug: https://bugs.gentoo.org/736084
    Signed-off-by: Thomas Deutschmann <whissi@gentoo.org>

 defaults/initrd.scripts | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)
(In reply to Larry the Git Cow from comment #8)
> defaults/initrd.scripts: start_volumes(): Don't wait for /dev/zfs

This fixes the problem for me as well.
Fix released with >=genkernel-4.1.0 final.