Bug 736084 - sys-kernel/genkernel-4.1.0_rc1: fails to import ZFS root pool
Summary: sys-kernel/genkernel-4.1.0_rc1: fails to import ZFS root pool
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: genkernel
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Genkernel Maintainers
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks:
 
Reported: 2020-08-05 21:04 UTC by Joshua Kinard
Modified: 2020-08-10 09:51 UTC

See Also:
Package list:
Runtime testing required: ---


Description Joshua Kinard gentoo-dev 2020-08-05 21:04:11 UTC
Tried using a ZFS-enabled initramfs built by genkernel-4.1.0_rc1, and after rebooting, the initramfs will load and then it just sits on "Importing ZFS pools" for a very long time.  I gave it about 10+ minutes, before rebooting into a rescue image and switching back to the older genkernel-4.0.10-built initramfs, which works fine to load the root ZFS pool and boot into multiuser.

If I had to take a guess, it looks like the switch to udev-managed devices might be to blame.  Since this was added for systemd purposes, can the mdev functionality be added back and put behind a command-line argument?  I run sysvinit/OpenRC, so the udev change doesn't bring me any benefit on the affected system.  Plus the mdev-based initramfs is almost 1.5MB smaller.

Still getting used to genkernel, so I did not attempt to do any debugging.  If there are any debugging tips I can try, let me know.  For now, sticking with the older, mdev-based initramfs.
Comment 1 asx 2020-08-06 18:24:55 UTC
I'm running a very similar setup (musl profile, ZFS on root, OpenRC) and similarly had to revert from 4.1.0-rc1 to 4.0.10.

The only difference is that, after loading modules, the boot hangs while displaying "Activating udev ..." where it would usually show "Importing ZFS pool".
Comment 2 Thomas Deutschmann (RETIRED) gentoo-dev 2020-08-06 19:02:43 UTC
(In reply to Joshua Kinard from comment #0)
> If I had to take a guess, it looks like the switch to udev-managed devices
> might be to blame.  Since this was added for systemd purposes, can the mdev
> functionality be added back and put behind a command-line argument?  I run
> sysvinit/OpenRC, so the udev change doesn't bring me any benefit on the
> affected system.  Plus the mdev-based initramfs is almost 1.5MB smaller.

Don't give up so early -- it's still in development. Also, udev has some advantages for you -- like udevsettle, which can avoid race conditions.

The only ZFS-related change I can think of is https://gitweb.gentoo.org/proj/genkernel.git/commit/?id=73689f82a7ef090c4d8c22eced7a56471be14156 -- but this should be unrelated.

For debugging: Enable SSH feature and try to see if you can connect and see what's failing/hanging. There's also /tmp/init.log in initramfs...

CC'ing ZFS maintainers for input/debug request.
Comment 3 Joshua Kinard gentoo-dev 2020-08-06 19:44:02 UTC
(In reply to Thomas Deutschmann from comment #2)
> (In reply to Joshua Kinard from comment #0)
> > If I had to take a guess, it looks like the switch to udev-managed devices
> > might be to blame.  Since this was added for systemd purposes, can the mdev
> > functionality be added back and put behind a command-line argument?  I run
> > sysvinit/OpenRC, so the udev change doesn't bring me any benefit on the
> > affected system.  Plus the mdev-based initramfs is almost 1.5MB smaller.
> 
> Don't give up so early -- it's still in development. Also, udev has some
> advantages for you -- like udevsettle, which can avoid race conditions.
> 
> The only ZFS-related change I can think of is
> https://gitweb.gentoo.org/proj/genkernel.git/commit/
> ?id=73689f82a7ef090c4d8c22eced7a56471be14156 -- but this should be unrelated.
> 
> For debugging: Enable SSH feature and try to see if you can connect and see
> what's failing/hanging. There's also /tmp/init.log in initramfs...
> 
> CC'ing ZFS maintainers for input/debug request.

Haven't given up, just needed a working dev machine :)

udev might actually help with Linux's obsession with detecting my drives out of order (despite the fact they're on an HBA card with dedicated port assignments, but I digress).  I wrote some udev rules for persistent disk names, but it doesn't look like mdev supports anything similar.  So I'll need to test that again once the ZFS issue is hunted down.  I'll also have to try the SSH thing later on.  One suspicion I have is udev may not be detecting the disks the same way mdev does, causing the pool import to fail.
Comment 4 Ivan Volosyuk 2020-08-09 08:39:22 UTC
I hit the same issue with root on ZFS. Genkernel 4.0.10 works fine.
I tried with kernels: 4.19.129, 4.19.133, 4.19.138. ZFS compiled into the kernel.

While debugging the initrc, I found that udev is not the issue here. The initrc hangs in the 'start_volumes' function. I didn't investigate any further.

I reproduced this in qemu/kvm, where I usually test a new kernel before booting my system with it.

Here is my script for testing the ZFS kernel, in case it is helpful. It works in a terminal only:

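#!/bin/bash
# Boot-test a kernel/initramfs pair in QEMU. The kernel version comes from the
# first argument and defaults to the version of the tree in /usr/src/linux.
default_version=$(cd /usr/src/linux && make -s kernelversion)
version=${1:-$default_version}
echo "Testing kernel version: $version"
kernel=/boot/vmlinuz-$version-x86_64
initrd=/boot/initramfs-$version-x86_64.img

qemu-system-x86_64 -kernel "$kernel" \
                   -initrd "$initrd" \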
                   -cpu host \
                   -smp 8 \
                   -enable-kvm \
                   -m 16G \
                   -curses \
                        \
                   -drive file=/ssd/temp/gentoo-special1.img,format=raw,id=disk-s1,if=none,cache=none,aio=threads,discard=unmap \
                   -device ide-hd,drive=disk-s1,bus=ide.0 \
                        \
                   -drive file=/ssd/temp/gentoo-special2.img,format=raw,id=disk-s2,if=none,cache=none,aio=threads,discard=unmap \
                   -device ide-hd,drive=disk-s2,bus=ide.0 \
                        \
                   -drive file=/ssd/temp/gentoo-zfs1.img,format=raw,id=disk1,if=none,cache=none,aio=threads,discard=unmap \
                   -device ide-hd,drive=disk1,bus=ide.1 \
                        \
                   -drive file=/ssd/temp/gentoo-zfs2.img,format=raw,id=disk2,if=none,cache=none,aio=threads,discard=unmap \
                   -device ide-hd,drive=disk2,bus=ide.1 \
                   -append "dozfs root=ZFS"
Comment 5 Thomas Deutschmann (RETIRED) gentoo-dev 2020-08-09 10:10:15 UTC
Maybe we need to get rid of

> 		export ZPOOL_IMPORT_UDEV_TIMEOUT_MS=0

in start_volumes() now that we are using udev.

Could you please try that change? Just edit /usr/share/genkernel/defaults/initrd.scripts, re-generate the initramfs and reboot.

If this doesn't fix the problem, please add "set -x" / "set +x" around start_volumes() to see where it stops.
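
As a rough illustration of the "set -x" / "set +x" suggestion, the sketch below traces a single function call and turns tracing off again afterwards. The start_volumes body here is a hypothetical stand-in, not the real genkernel function, and trace_call is a helper name made up for this example.

```shell
#!/bin/sh
# Hypothetical stand-in for the real start_volumes(); the body is illustrative only.
start_volumes() {
    echo "importing pools"
}

# trace_call is a made-up helper: run a command with shell tracing enabled,
# switch tracing off again, and pass the command's exit status through.
trace_call() {
    set -x
    "$@"
    _rc=$?
    set +x
    return "$_rc"
}

trace_call start_volumes
```

The trace lines go to stderr, so inside an initramfs they would show up on the console next to the normal output. The last traced command before a hang is the place to look.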
Comment 6 Georgy Yakovlev archtester gentoo-dev 2020-08-09 17:17:08 UTC
yes, genkernel needs to get rid of 

export ZPOOL_IMPORT_UDEV_TIMEOUT_MS=0

this is a magic variable needed for mdev/static dev setups. it makes zfs effectively ignore libudev: the calls time out immediately.


here's the commit that touched it:

https://gitweb.gentoo.org/proj/genkernel.git/commit/defaults?id=94edd477e491da2b900e6d1b1d71884081cf09ab


just don't set it at all, in any file, when udev is in use.

I'm testing now to confirm.
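
A minimal sketch of that idea, assuming the init script knows which device manager it is running under. USE_UDEV and setup_zfs_env are hypothetical names for this example, not genkernel variables or functions; only ZPOOL_IMPORT_UDEV_TIMEOUT_MS comes from the discussion above.

```shell
#!/bin/sh
# Sketch only: USE_UDEV is a hypothetical flag, not a genkernel variable.
setup_zfs_env() {
    if [ "${USE_UDEV:-1}" = "1" ]; then
        # udev manages /dev: leave the variable unset so the default timeout applies.
        unset ZPOOL_IMPORT_UDEV_TIMEOUT_MS
    else
        # mdev or static /dev: make the libudev lookups time out immediately.
        export ZPOOL_IMPORT_UDEV_TIMEOUT_MS=0
    fi
}
```

Calling setup_zfs_env before the pool import would keep the old behaviour for mdev/static setups while leaving the variable unset under udev.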
Comment 7 Georgy Yakovlev archtester gentoo-dev 2020-08-09 19:44:10 UTC
it hangs in start_volumes

close to 'if call_func_timeout waitForZFS 5':

> + call_func_timeout waitForZFS 5
> + local 'func=waitForZFS 'timeout=5' pid watcher
> + pid=3875
> + waitForZFS
> + watcher=3876+
> '[' '!'+ -cwait /dev/zfs 3875 ]
> 
> + exit 1
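
The trace above looks interleaved, which is what one would expect if a worker and a watcher process both write "set -x" output at the same time. For readers unfamiliar with the pattern, here is a generic sketch of a shell timeout wrapper with a worker and a watcher. It is not genkernel's call_func_timeout; run_with_timeout is a hypothetical helper written for this example.

```shell
#!/bin/sh
# Illustrative only: run_with_timeout is a hypothetical helper,
# not genkernel's call_func_timeout.
run_with_timeout() {
    _timeout=$1
    shift

    # Worker: run the command in the background and remember its PID.
    "$@" &
    _pid=$!

    # Watcher: kill the worker if it is still running after the timeout.
    ( sleep "$_timeout"; kill "$_pid" 2>/dev/null ) >/dev/null 2>&1 &
    _watcher=$!

    # Wait for the worker; a killed worker yields a non-zero status.
    wait "$_pid"
    _rc=$?

    # The worker is done, so the watcher is no longer needed.
    kill "$_watcher" 2>/dev/null
    wait "$_watcher" 2>/dev/null
    return "$_rc"
}
```

With this sketch, `run_with_timeout 5 some_command` returns the command's own status if it finishes within five seconds and a non-zero status if the watcher had to kill it.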
Comment 8 Larry the Git Cow gentoo-dev 2020-08-09 20:04:12 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/genkernel.git/commit/?id=5c2d7fc5bbc302c6ad6eac111e375709d7031187

commit 5c2d7fc5bbc302c6ad6eac111e375709d7031187
Author:     Thomas Deutschmann <whissi@gentoo.org>
AuthorDate: 2020-08-09 20:02:07 +0000
Commit:     Thomas Deutschmann <whissi@gentoo.org>
CommitDate: 2020-08-09 20:02:07 +0000

    defaults/initrd.scripts: start_volumes(): Don't wait for /dev/zfs
    
    This is not needed anymore and also not working.
    
    Bug: https://bugs.gentoo.org/736084
    Signed-off-by: Thomas Deutschmann <whissi@gentoo.org>

 defaults/initrd.scripts | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)
Comment 9 Joshua Kinard gentoo-dev 2020-08-10 05:12:34 UTC
(In reply to Larry the Git Cow from comment #8)
> The bug has been referenced in the following commit(s):
> 
> https://gitweb.gentoo.org/proj/genkernel.git/commit/
> ?id=5c2d7fc5bbc302c6ad6eac111e375709d7031187
> 
> commit 5c2d7fc5bbc302c6ad6eac111e375709d7031187
> Author:     Thomas Deutschmann <whissi@gentoo.org>
> AuthorDate: 2020-08-09 20:02:07 +0000
> Commit:     Thomas Deutschmann <whissi@gentoo.org>
> CommitDate: 2020-08-09 20:02:07 +0000
> 
>     defaults/initrd.scripts: start_volumes(): Don't wait for /dev/zfs
>     
>     This is not needed anymore and also not working.
>     
>     Bug: https://bugs.gentoo.org/736084
>     Signed-off-by: Thomas Deutschmann <whissi@gentoo.org>
> 
>  defaults/initrd.scripts | 13 +------------
>  1 file changed, 1 insertion(+), 12 deletions(-)

This fixes the problem for me as well.
Comment 10 Thomas Deutschmann (RETIRED) gentoo-dev 2020-08-10 09:51:13 UTC
Fix released with >=genkernel-4.1.0 final.