In recent autobuilds, the amd64 admincd includes sys-fs/zfs-kmod compiled for 5.4.97-gentoo-x86_64, but the installed kernel is 5.10.27-gentoo-x86_64.

This mismatch means that tools like "zfs list" and "zpool status" in the livecd are no longer working.

```bash
livecd ~ # zpool import
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
```

Running `modprobe zfs` returns the following error:

```bash
livecd ~ # modprobe zfs
modprobe: FATAL: Module zfs not found in directory /lib/modules/5.10.27-gentoo-x86_64
```

And indeed, we can see that this livecd is using the 5.10.27 kernel, and while the sys-fs/zfs-kmod package is installed, the modules were compiled against the wrong kernel version.

```bash
livecd ~ # ls /lib/modules
5.10.27-gentoo-x86_64  5.4.97-gentoo-x86_64
livecd ~ # tree /lib/modules/5.4.97-gentoo-x86_64
/lib/modules/5.4.97-gentoo-x86_64/
├── extra
│   ├── avl
│   │   └── zavl.ko
│   ├── icp
│   │   └── icp.ko
│   ├── lua
│   │   └── zlua.ko
│   ├── nvpair
│   │   └── znvpair.ko
│   ├── spl
│   │   └── spl.ko
│   ├── unicode
│   │   └── zunicode.ko
│   ├── zcommon
│   │   └── zcommon.ko
│   ├── zfs
│   │   └── zfs.ko
│   └── zstd
│       └── zzstd.ko
├── modules.alias
├── modules.alias.bin
├── modules.builtin.alias.bin
├── modules.builtin.bin
├── modules.dep
├── modules.dep.bin
├── modules.devname
├── modules.softdep
├── modules.symbols
└── modules.symbols.bin

10 directories, 19 files
```

For reference, I am using the admincd-amd64-20210502T214503Z.iso, which is the current autobuild available from https://distfiles.gentoo.org/releases/amd64/autobuilds/current-admincd-amd64/

This can be verified by running this ISO in a VM,

$ qemu-system-x86_64 -enable-kvm -m 2G -smp 2 -cdrom ./admincd-amd64-20210502T214503Z.iso

---

This appears to be a regression, as I also have an archive version of the admincd, admincd-amd64-20200618T170443Z.iso, which was using the 5.4.38-gentoo-x86_64 kernel and has the zfs modules compiled correctly.
In qemu, it is possible to load the module, and use tools like "zpool import" without error (note, it's not required to have a zfs pool to re-test this).

```bash
livecd ~ # modprobe zfs
livecd ~ # zpool import
no pools available to import
```

---

I've tried to look into why catalyst is doing this, and prepared two runs of catalyst locally.

First run - From a relatively clean system that has a livecd-stage1, run catalyst to create the kerncache and binpkgs.

Here's a copy of the livecd-stage2 spec that I was using.
https://github.com/bencord0/etc-catalyst/blob/6a89fc15e8af607c8e6561289aef50897e906efa/specs/livecd-stage2.spec

Important lines are:

```
boot/kernel/gentoo/sources: gentoo-sources
boot/kernel/gentoo/packages: sys-fs/zfs
```

For me, this generates a working livecd with the zfs modules installed against the 5.10.27-gentoo-x86_64 kernel.

Second run - Run the same build again, but this time with a different kernel version selected.

Here's a copy of the new spec file that I was using, the difference is that I've selected `boot/kernel/gentoo/sources: gentoo-sources:5.4.109`, which is the previously stable version that is still in-tree.
https://github.com/bencord0/etc-catalyst/blob/6a89fc15e8af607c8e6561289aef50897e906efa/specs/livecd-stage2.spec.broken

Note: These spec files work against the current 3.0.x branch of catalyst, I'm using 3.0.17.

This forces genkernel to build the new kernel (5.4.109) from source, but the callback will pull in the prebuilt sys-fs/zfs-kmod package (5.10.27).

Booting this second image reproduces the bug for me.

---

I've managed to remediate the problem by applying the following patch against the current catalyst-9999 ebuild.
diff --git a/targets/support/kmerge.sh b/targets/support/kmerge.sh
index fb67aba6..2eaa16bb 100755
--- a/targets/support/kmerge.sh
+++ b/targets/support/kmerge.sh
@@ -52,7 +52,7 @@ genkernel_compile() {
 	else
 		gk_callback_opts=(-qN)
 	fi
-	if [[ -n ${clst_KERNCACHE} ]]; then
+	if [[ -n ${clst_KERNCACHE} && ${cached_kernel_found} = "true" ]]; then
 		gk_callback_opts+=(-kb)
 	fi
 	if [[ -n ${clst_FETCH} ]]; then

Note: since e96ef61854ae6f85f90dc9f5e01b5e1743c8a6f6, this patch is incompatible with the current 3.0.x releases, since `kmerge.sh` has been refactored.

This works by ignoring the prebuilt sys-fs/zfs-kmod package (and all other prebuilt packages listed in boot/kernel/gentoo/packages) whenever genkernel has to rebuild the kernel as well.
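
To make the effect of the patched condition easier to see, here is a minimal standalone sketch of the decision it makes. The function name `callback_opts` and its two positional arguments are hypothetical and are not part of catalyst; the `-qN` default is simplified from the `else` branch visible in the hunk, and the kerncache path is only an example value.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the patched condition in kmerge.sh.
# This is an illustration only, not catalyst's actual code.
#   $1: kerncache path (empty string when kerncache is disabled)
#   $2: "true" if a cached kernel was found, anything else otherwise
callback_opts() {
	local kerncache=$1 cached_kernel_found=$2
	local opts=(-qN)
	# Only let emerge use (-k) and build (-b) binary packages when the
	# kernel itself came from the cache, so that prebuilt modules are
	# guaranteed to match the kernel they will be loaded into.
	if [[ -n ${kerncache} && ${cached_kernel_found} = "true" ]]; then
		opts+=(-kb)
	fi
	echo "${opts[*]}"
}

callback_opts /var/tmp/kerncache true    # prints: -qN -kb
callback_opts /var/tmp/kerncache false   # prints: -qN
callback_opts "" false                   # prints: -qN
```

The second case is the one from the second catalyst run above: the kerncache is enabled, but the kernel is rebuilt, so the prebuilt packages are skipped and rebuilt against the new kernel.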
I have pruned zfs-kmod binpkgs so the next admincd build should be a good one, but yes we need this fixed in catalyst properly, eventually.
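
To confirm whether a given build is a good one, a check along these lines could be run inside the booted image. It is only a sketch: the function name `check_zfs_module` is made up for illustration, it assumes GNU find (for `-quit`), and it matches `zfs.ko*` on the assumption that the module may be stored compressed.

```bash
#!/usr/bin/env bash
# check_zfs_module MODULES_DIR KERNEL_RELEASE
# Hypothetical helper: succeeds if a zfs.ko (optionally compressed) exists
# under MODULES_DIR/KERNEL_RELEASE. On failure it lists every zfs module
# found elsewhere under MODULES_DIR, to show which kernel it was built for.
check_zfs_module() {
	local modules_dir=$1 release=$2 found
	found=$(find "${modules_dir}/${release}" -name 'zfs.ko*' -print -quit 2>/dev/null)
	if [[ -n ${found} ]]; then
		echo "ok: ${found}"
		return 0
	fi
	echo "missing: no zfs module for ${release}" >&2
	find "${modules_dir}" -name 'zfs.ko*' 2>/dev/null | sed 's/^/  built for: /' >&2
	return 1
}

# Usage on the livecd itself:
# check_zfs_module /lib/modules "$(uname -r)"
```

On the broken 20210502 image this would be expected to fail for 5.10.27-gentoo-x86_64 and point at the module under 5.4.97-gentoo-x86_64.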
I took care of this issue on ppc64 CDs by specifying this:

boot/kernel/4K_PAGESZ/packages: --usepkg n zfs zfs-kmod

catalyst passes '--usepkg n' verbatim to the portage/genkernel invocation that installs zfs, which prevents it from using binpkgs at all.

admincd could use that too.
Created attachment 759260
admincd patch

Here's the patch.
Note that I specified both zfs and zfs-kmod: it seems you were relying on zfs pulling in zfs-kmod, but we need to mention zfs-kmod explicitly so that portage rebuilds it.
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/proj/releng.git/commit/?id=1d44a214557ad6d8ddaf2ab1c579c51fcb57065e

commit 1d44a214557ad6d8ddaf2ab1c579c51fcb57065e
Author:     Georgy Yakovlev <gyakovlev@gentoo.org>
AuthorDate: 2021-12-16 02:42:00 +0000
Commit:     Ben Kohler <bkohler@gentoo.org>
CommitDate: 2021-12-17 12:26:24 +0000

    releases/specs/amd64/hardened/admincd-stage2: always build fresh zfs

    Closes: https://bugs.gentoo.org/787872
    Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
    Signed-off-by: Ben Kohler <bkohler@gentoo.org>

 releases/specs/amd64/hardened/admincd-stage2.spec | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)