Bug 787872 - zfs module fails to load in recent admincd
Summary: zfs module fails to load in recent admincd
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: Catalyst
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Catalyst Developers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-03 08:58 UTC by Ben Cordero
Modified: 2021-12-17 12:26 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Attachments
admincd patch (0001-releases-specs-amd64-hardened-admincd-stage2-always-.patch,1021 bytes, patch)
2021-12-16 02:44 UTC, Georgy Yakovlev

Description Ben Cordero 2021-05-03 08:58:29 UTC
In recent autobuilds, the amd64 admincd includes sys-fs/zfs-kmod compiled for 5.4.97-gentoo-x86_64, but the installed kernel is 5.10.27-gentoo-x86_64.

This mismatch means that tools like "zfs list" and "zpool status" no longer work in the livecd.

```bash
livecd ~ # zpool import
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
```

Running `modprobe zfs` returns the following error:
```bash
livecd ~ # modprobe zfs
modprobe: FATAL: Module zfs not found in directory /lib/modules/5.10.27-gentoo-x86_64
```

And indeed, we can see that this livecd is using the 5.10.27 kernel, and while the sys-fs/zfs-kmod package is installed, the modules were compiled against the wrong kernel version.

```bash
livecd ~ # ls /lib/modules
5.10.27-gentoo-x86_64 5.4.97-gentoo-x86_64
livecd ~ # tree /lib/modules/5.4.97-gentoo-x86_64
/lib/modules/5.4.97-gentoo-x86_64/
├── extra
│   ├── avl
│   │   └── zavl.ko
│   ├── icp
│   │   └── icp.ko
│   ├── lua
│   │   └── zlua.ko
│   ├── nvpair
│   │   └── znvpair.ko
│   ├── spl
│   │   └── spl.ko
│   ├── unicode
│   │   └── zunicode.ko
│   ├── zcommon
│   │   └── zcommon.ko
│   ├── zfs
│   │   └── zfs.ko
│   └── zstd
│       └── zzstd.ko
├── modules.alias
├── modules.alias.bin
├── modules.builtin.alias.bin
├── modules.builtin.bin
├── modules.dep
├── modules.dep.bin
├── modules.devname
├── modules.softdep
├── modules.symbols
└── modules.symbols.bin

10 directories, 19 files
```
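To check a booted image for this mismatch without reading the tree by hand, a small diagnostic along the following lines could help. It is a hypothetical helper, not something shipped on the admincd; the function name `check_zfs_module` and its arguments are assumptions, and it only looks for a `zfs.ko` (possibly compressed) under each kernel release directory.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic (not shipped on the admincd): list the kernel
# release directories under a modules root that contain a zfs.ko, and report
# whether the given kernel release (default: the running one) is among them.
check_zfs_module() {
    local root=${1:-/lib/modules}
    local kver=${2:-$(uname -r)}
    local dir rel rc=1
    for dir in "$root"/*/; do
        # Skip the literal pattern if the glob matched nothing.
        [[ -d $dir ]] || continue
        rel=$(basename "$dir")
        # zfs-kmod installs under extra/zfs/, but search the whole release tree;
        # the trailing * also matches compressed modules such as zfs.ko.xz.
        if [[ -n $(find "$dir" -name 'zfs.ko*' -print -quit) ]]; then
            echo "zfs.ko found for ${rel}"
            if [[ $rel == "$kver" ]]; then
                rc=0
            fi
        fi
    done
    if (( rc == 0 )); then
        echo "OK: zfs module available for kernel ${kver}"
    else
        echo "MISMATCH: no zfs module for kernel ${kver}"
    fi
    return "$rc"
}
```

On the affected ISO, calling `check_zfs_module` with no arguments would be expected to report a zfs.ko only for 5.4.97-gentoo-x86_64 and return non-zero, since the running kernel is 5.10.27-gentoo-x86_64.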

For reference, I am using the admincd-amd64-20210502T214503Z.iso, which is the current autobuild available from https://distfiles.gentoo.org/releases/amd64/autobuilds/current-admincd-amd64/

This can be verified by running this ISO in a VM:

```bash
$ qemu-system-x86_64 -enable-kvm -m 2G -smp 2 -cdrom ./admincd-amd64-20210502T214503Z.iso
```

---

This appears to be a regression: I also have an archived copy of the admincd, admincd-amd64-20200618T170443Z.iso, which uses the 5.4.38-gentoo-x86_64 kernel and has the zfs modules compiled correctly for it.

In QEMU, the module loads and tools like "zpool import" run without error (note: a ZFS pool is not required to re-test this).

```bash
livecd ~ # modprobe zfs
livecd ~ # zpool import
no pools available to import
```

---

I looked into why catalyst does this and ran catalyst twice locally.

First run - From a relatively clean system that has a livecd-stage1, run catalyst to create the kerncache and binpkgs.

Here's a copy of the livecd-stage2 spec that I was using.
https://github.com/bencord0/etc-catalyst/blob/6a89fc15e8af607c8e6561289aef50897e906efa/specs/livecd-stage2.spec

The important lines are:
```
boot/kernel/gentoo/sources: gentoo-sources
boot/kernel/gentoo/packages:
        sys-fs/zfs
```

For me, this generates a working livecd with the zfs modules built against the 5.10.27-gentoo-x86_64 kernel.

Second run - Run the same build again, but this time with a different kernel version selected.

Here's a copy of the new spec file that I used. The difference is that I selected `boot/kernel/gentoo/sources: gentoo-sources:5.4.109`, the previous stable version, which is still in the tree.
https://github.com/bencord0/etc-catalyst/blob/6a89fc15e8af607c8e6561289aef50897e906efa/specs/livecd-stage2.spec.broken

Note: these spec files work with the current 3.0.x branch of catalyst; I'm using 3.0.17.

This forces genkernel to build the newly selected kernel (5.4.109) from source, but the callback still pulls in the prebuilt sys-fs/zfs-kmod binpkg, which was compiled against 5.10.27.

Booting this second image reproduces the bug for me.
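The mismatch can also be caught before booting the image at all. A module's `vermagic` string, which `modinfo -F vermagic` prints, starts with the kernel release the module was built for, so comparing that first field with the release of the freshly built kernel flags a stale binpkg. The sketch below relies on that convention; `vermagic_matches` is a hypothetical helper name, and the release string in the usage comment is only an example.

```bash
#!/usr/bin/env bash
# Hypothetical post-build check: compare the kernel release recorded in a
# module's vermagic string with the kernel the module is supposed to load into.
# Example (release name assumed):
#   vermagic_matches "$(modinfo -F vermagic /path/to/zfs.ko)" 5.4.109-gentoo-x86_64
vermagic_matches() {
    local vermagic=$1 kver=$2
    # The first whitespace-separated field of vermagic is the kernel release.
    local built_for=${vermagic%% *}
    if [[ $built_for == "$kver" ]]; then
        return 0
    fi
    printf 'module built for %s, kernel is %s\n' "$built_for" "$kver" >&2
    return 1
}
```

Run over the modules of the second build, this check would fail for zfs.ko, because the module's vermagic names 5.10.27 while the image boots 5.4.109.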

---

I've worked around the problem by applying the following patch to the current catalyst-9999 ebuild:

```diff
diff --git a/targets/support/kmerge.sh b/targets/support/kmerge.sh
index fb67aba6..2eaa16bb 100755
--- a/targets/support/kmerge.sh
+++ b/targets/support/kmerge.sh
@@ -52,7 +52,7 @@ genkernel_compile() {
 	else
 		gk_callback_opts=(-qN)
 	fi
-	if [[ -n ${clst_KERNCACHE} ]]; then
+	if [[ -n ${clst_KERNCACHE} && ${cached_kernel_found} = "true" ]]; then
 		gk_callback_opts+=(-kb)
 	fi
 	if [[ -n ${clst_FETCH} ]]; then
```

Note: because `kmerge.sh` was refactored in e96ef61854ae6f85f90dc9f5e01b5e1743c8a6f6, this patch does not apply to the current 3.0.x releases.

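This works by ignoring the prebuilt sys-fs/zfs-kmod package (and all other prebuilt packages listed in boot/kernel/gentoo/packages) whenever genkernel has to rebuild the kernel too: the `-kb` options (for emerge, `-k` is `--usepkg` and `-b` is `--buildpkg`) are only added to the callback when a cached kernel was found.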
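The effect of the patched condition can be modelled in isolation. The function below is a hypothetical stand-alone reduction of the hunk, not catalyst code: it starts from the `-qN` options of the `else` branch shown in the diff, ignores the branches the hunk does not show, and takes the values of `clst_KERNCACHE` and `cached_kernel_found` as arguments.

```bash
#!/usr/bin/env bash
# Hypothetical stand-alone model of the patched hunk in targets/support/kmerge.sh.
# Arguments: value of clst_KERNCACHE, value of cached_kernel_found.
patched_callback_opts() {
    local clst_KERNCACHE=$1 cached_kernel_found=$2
    # Assumed starting point: the -qN options set in the else branch of the hunk.
    local -a gk_callback_opts=(-qN)
    # Patched condition: only use (-k, --usepkg) and build (-b, --buildpkg)
    # binary packages when the kernel itself came from the kerncache.
    if [[ -n ${clst_KERNCACHE} && ${cached_kernel_found} = "true" ]]; then
        gk_callback_opts+=(-kb)
    fi
    printf '%s\n' "${gk_callback_opts[*]}"
}
```

With the kerncache enabled but no cached kernel found, the callback gets only `-qN`, so zfs-kmod is compiled against the kernel genkernel has just built instead of being taken from a binpkg.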
Comment 1 Ben Kohler gentoo-dev 2021-05-05 19:06:01 UTC
I have pruned the zfs-kmod binpkgs, so the next admincd build should be a good one, but yes, we need to fix this properly in catalyst eventually.
Comment 2 Georgy Yakovlev archtester gentoo-dev 2021-12-16 02:40:42 UTC
I took care of this issue on ppc64 CDs by specifying this:

```
boot/kernel/4K_PAGESZ/packages: --usepkg n zfs zfs-kmod
```

catalyst passes '--usepkg n' verbatim to the portage/genkernel invocation that installs zfs, which prevents it from using binpkgs at all.

admincd could use that too.
Comment 3 Georgy Yakovlev archtester gentoo-dev 2021-12-16 02:44:11 UTC
Created attachment 759260
admincd patch

here's the patch.
Comment 4 Georgy Yakovlev archtester gentoo-dev 2021-12-16 02:45:15 UTC
Note that I specified both zfs and zfs-kmod: it seems you were relying on zfs to pull in zfs-kmod, but zfs-kmod has to be listed explicitly for portage to rebuild it.
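Comments 2 and 4 amount to two rules for any spec that puts zfs on the livecd: pass `--usepkg n`, and list zfs-kmod explicitly. A rough lint for those rules could look like the following. It is a hypothetical helper, not part of catalyst or the releng repository; it uses plain substring matching, so other spellings such as `--usepkg=n` would be reported as missing, and it treats indented lines after a `boot/kernel/<name>/packages:` key as part of that value, as in the livecd-stage2 spec excerpt quoted in the description.

```bash
#!/usr/bin/env bash
# Hypothetical lint (not part of catalyst): warn about kernel package lists
# that mention zfs but could still pick up a stale zfs-kmod binpkg.

# Check one collected packages value; print warnings and return 1 on problems.
_check_zfs_value() {
    local value=$1 rc=0
    [[ $value == *zfs* ]] || return 0
    if [[ $value != *"--usepkg n"* ]]; then
        echo "missing '--usepkg n':${value}"
        rc=1
    fi
    if [[ $value != *zfs-kmod* ]]; then
        echo "zfs-kmod not listed explicitly:${value}"
        rc=1
    fi
    return "$rc"
}

lint_zfs_packages() {
    local spec=$1 line value="" collecting=0 rc=0
    while IFS= read -r line || [[ -n $line ]]; do
        # Indented continuation lines belong to the current packages value.
        if (( collecting )) && [[ $line =~ ^[[:space:]]+[^[:space:]] ]]; then
            value+=" $line"
            continue
        fi
        # Any other line ends the value being collected.
        if (( collecting )); then
            _check_zfs_value "$value" || rc=1
            collecting=0
        fi
        if [[ $line =~ ^boot/kernel/[^/]+/packages:(.*)$ ]]; then
            value=${BASH_REMATCH[1]}
            collecting=1
        fi
    done < "$spec"
    # Handle a packages value that runs to the end of the file.
    if (( collecting )); then
        _check_zfs_value "$value" || rc=1
    fi
    return "$rc"
}
```

Run against the description's excerpt (`boot/kernel/gentoo/packages:` followed by an indented `sys-fs/zfs`), it would print both warnings and return non-zero; against the ppc64 line from comment 2 it would print nothing and return 0.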
Comment 5 Larry the Git Cow gentoo-dev 2021-12-17 12:26:46 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/proj/releng.git/commit/?id=1d44a214557ad6d8ddaf2ab1c579c51fcb57065e

commit 1d44a214557ad6d8ddaf2ab1c579c51fcb57065e
Author:     Georgy Yakovlev <gyakovlev@gentoo.org>
AuthorDate: 2021-12-16 02:42:00 +0000
Commit:     Ben Kohler <bkohler@gentoo.org>
CommitDate: 2021-12-17 12:26:24 +0000

    releases/specs/amd64/hardened/admincd-stage2: always build fresh zfs
    
    Closes: https://bugs.gentoo.org/787872
    Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
    Signed-off-by: Ben Kohler <bkohler@gentoo.org>

 releases/specs/amd64/hardened/admincd-stage2.spec | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)