Summary: sys-kernel/genkernel sometimes creates initramfs that is missing busybox causing request_module: runaway loop modprobe binfmt-0000
Product: Gentoo Hosted Projects
Component: genkernel
Severity: major
Reporter: Tom Dickson <gentoo>
Assignee: Gentoo Genkernel Maintainers <genkernel>
CC: gentoo.bugs, gordonp, sping, webmaster
Runtime testing required: ---
Attachments: the emerge info output; output of genkernel
Description Tom Dickson 2008-11-11 03:08:36 UTC
Using genkernel to compile gentoo-sources-2.6.25-r9 causes the boot to fail with "request_module: runaway loop modprobe binfmt-0000", repeated five times. This was caused by a zero-length busybox. Verification of the error:
  cd tmp
  cp /boot/initramfs-genkernel-x86-2.6.22-gentoo-r2 init.cpio.gz
  gunzip init.cpio.gz
  cpio -i -d -H newc -F init.cpio --no-absolute-filenames
  cd bin
  ls -l
  total 4
  -rwxr-xr-x 8 root root   0 Nov 10 06:33 [
  -rwxr-xr-x 8 root root   0 Nov 10 06:33 ash
  -rwxr-xr-x 8 root root   0 Nov 10 06:33 busybox
  -rwxr-xr-x 8 root root   0 Nov 10 06:33 cut
  -rwxr-xr-x 8 root root   0 Nov 10 06:33 echo
  -rwxr-xr-x 8 root root   0 Nov 10 06:33 mount
  -rwxr-xr-x 8 root root   0 Nov 10 06:33 sh
  -rwxr-xr-x 1 root root 396 Nov 10 06:33 udhcpc.scripts
  -rwxr-xr-x 8 root root   0 Nov 10 06:33 uname
All those zero-length files should be hard-linked copies of busybox.
Reproducible: Sometimes
Steps to Reproduce:
1. Run genkernel with the defaults from the LiveCD (via zcat /proc/config.gz > /etc/kernels/kernelfile).
2. Configure GRUB and boot.
3. The error above appears.
Actual Results:
The initramfs was created incorrectly, but genkernel reported no errors. At boot, because important parts are zero-length, it fails with binfmt-0000 (it attempts to read the header and gets nulls).
Expected Results:
It should either create the initramfs correctly or check whether it failed and warn the user that booting will fail.
I was unable to boot a previous kernel because all of them exhibited this behavior (the working one had obviously been deleted). To recover, I booted from the LiveCD, chrooted into the install, and ran genkernel there. It also failed. The only way I could get it to work was to chroot into the install from the LiveCD, run emerge --emptytree genkernel, and then run genkernel all. After rebooting into the working system, I ran genkernel again, and it again produced a bad initramfs.
Because genkernel uses the same default file names when rebuilding the same kernel version, the bad build replaced the working one, so I had to use the chroot and --emptytree again to get it working. I'm testing again to narrow down what's happening, with a copy of the working kernel safely renamed. It also failed on a 2.6.22-r2 kernel, and it failed with both the LiveCD config and a custom --menuconfig containing only the drivers I needed. Genkernel produced no error messages that I could see. The hint toward a solution came from this thread: http://firstname.lastname@example.org/msg00979.html, which implied that cpio was failing silently for some reason. Genkernel is version 3.4.10; emerge --info and other support files will be attached. Of note: the filesystem is XFS on a 2.5 TB 3ware array using a GPT partition table, but the boot doesn't even get far enough for those to matter.
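The manual inspection in the description (unpack the image, then look for 0-byte entries in bin/) can be scripted. A minimal sketch, assuming the archive has already been extracted with the gunzip/cpio commands shown in the description; `check_initramfs_bin` is a hypothetical name and is not part of genkernel.

```shell
#!/bin/sh
# Hypothetical helper, not part of genkernel: report zero-length regular
# files under bin/ of an initramfs tree that has already been extracted.
# Returns 0 if none are found, 1 if any are found, 2 if bin/ is missing.
check_initramfs_bin() {
    root=$1
    if [ ! -d "$root/bin" ]; then
        echo "no bin/ directory under $root" >&2
        return 2
    fi
    # find's -size 0 counts 512-byte blocks rounded up, so it only
    # matches files that are completely empty.
    empty=$(find "$root/bin" -type f -size 0 -print)
    if [ -n "$empty" ]; then
        echo "zero-length files in $root/bin:" >&2
        printf '%s\n' "$empty" >&2
        return 1
    fi
    return 0
}
```

Run from the extraction directory as `check_initramfs_bin .`. On an image like the one listed above it would flag busybox, ash, sh, mount and the other applets, but not udhcpc.scripts.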
Comment 1 Tom Dickson 2008-11-11 03:21:51 UTC
Created attachment 171391: the emerge info output. I just confirmed that it made a bad initramfs, which I'll attach next. This is the emerge --info output.
Comment 2 Tom Dickson 2008-11-11 03:33:47 UTC
http://schnecke.bombcar.com/random/initramfs-genkernel-x86-2.6.25-gentoo-r9 is the bad initramfs (it will be hosted for a decent amount of time). I just created it with genkernel all. However, rerunning genkernel --loglevel=5 initrd made a good initramfs!
Comment 3 Tom Dickson 2008-11-11 04:06:52 UTC
Created attachment 171393: output of genkernel. Here is the --loglevel=5 output from genkernel. This run created a bad initramfs. Rerunning "genkernel initrd" right afterwards made another bad initramfs, even with --loglevel=5. It seems to fail sometimes and succeed other times.
Comment 4 Andrew Gaffney (RETIRED) 2008-11-11 15:14:22 UTC
Can you add something like || gen_die "Failed to append busybox to cpio" to the end of the line that calls cpio in append_busybox() in gen_initramfs.sh? I've never been able to reproduce this problem, so I don't know whether it causes cpio to return a non-zero exit code. If that gen_die triggers when you get a bad build, I'll add similar code in git so we can at least *detect* when things go wrong.
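The `|| gen_die` idiom aborts the build whenever the command before it exits non-zero. A minimal sketch of the pattern follows; `gen_die` here is a stand-in (genkernel ships its own), `run_or_die` is a hypothetical wrapper so the pattern can be exercised without cpio, and the cpio flags in the comment are an assumption, not copied from gen_initramfs.sh. It only catches failures that cpio actually reports through its exit status.

```shell
#!/bin/sh
# Stand-in for genkernel's own gen_die; this version only prints and exits.
gen_die() {
    echo "ERROR: $*" >&2
    exit 1
}

# Hypothetical wrapper: run a command and abort with gen_die if its exit
# status is non-zero. In gen_initramfs.sh the same effect comes from
# appending "|| gen_die ..." to the cpio line, e.g. (flags assumed):
#   find . -print | cpio -o -H newc --append -F "$CPIO" \
#       || gen_die "Failed to append busybox to cpio"
run_or_die() {
    msg=$1
    shift
    "$@" || gen_die "$msg"
}
```

For a pipeline such as `find ... | cpio ...`, the `||` tests the exit status of the last command in the pipeline, i.e. cpio's, unless `set -o pipefail` is in effect.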
Comment 5 Tom Dickson 2008-11-11 16:51:55 UTC
I added that to the appropriate line, but it didn't die and produced a bad initramfs.
  * Gentoo Linux Genkernel; Version 3.4.10
  * Running with options: all
  * Linux Kernel 2.6.25-gentoo-r9 for x86...
  * kernel: >> Running mrproper...
  * config: Using config from /etc/kernels/kernel-config-x86-2.6.25-gentoo-r9
  * Previous config backed up to .config.bak
  * >> Running oldconfig...
  * kernel: >> Cleaning...
  * >> Compiling 2.6.25-gentoo-r9 bzImage...
  * >> Compiling 2.6.25-gentoo-r9 modules...
  * Copying config for successful build to /etc/kernels/kernel-config-x86-2.6.25-gentoo-r9
  * busybox: >> Using cache
  * initramfs: >> Initializing...
  * >> Appending base_layout cpio data...
  * >> Appending auxilary cpio data...
  * >> Appending busybox cpio data...
  * >> Appending modules cpio data...
  *
  * Kernel compiled successfully!
  *
  * Required Kernel Parameters:
  * real_root=/dev/$ROOT
  *
  * Where $ROOT is the device node for your root partition as the
  * one specified in /etc/fstab
  *
  * If you require Genkernel's hardware detection features; you MUST
  * tell your bootloader to use the provided INITRAMFS file. Otherwise;
  * substitute the root argument for the real_root argument if you are
  * not planning to use the initrd...
  * WARNING... WARNING... WARNING...
  * Additional kernel cmdline arguments that *may* be required to boot properly...
  * Do NOT report kernel bugs as genkernel bugs unless your bug
  * is about the default genkernel configuration...
  *
  * Make sure you have the latest genkernel before reporting bugs.
Comment 6 Dallas 2009-07-15 19:15:35 UTC
I have the same issue as described by Tom. Has this been worked on since then?
Comment 7 Andrew Gaffney (RETIRED) 2009-07-15 19:50:08 UTC
It was a "random" issue that I could never reproduce myself. Sometimes cpio just "failed" to append the busybox stuff without any indication why (or even a non-zero exit code, iirc).
Comment 8 Tom Dickson 2009-07-15 19:52:24 UTC
Yeah, it only happened 2 out of 4 times for me. I'll try again tonight and see if it happens again. Perhaps we could put in a check that unrolls the CPIO bundle and verifies that it was appended correctly?
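The proposed check could look roughly like this: after extracting the finished archive (with the gunzip/cpio steps from the description), confirm that bin/busybox is non-empty and that each applet is byte-identical to it. Extraction matters here, since later comments note that `cpio -t` listings are misleading for hard links. Comparing contents with `cmp` works whether the applets are hard links or symlinks. `verify_busybox_applets` is a hypothetical name, not genkernel code.

```shell
#!/bin/sh
# Hypothetical post-build check on an extracted initramfs: bin/busybox
# must be non-empty and every listed applet must have identical contents.
# Returns 0 if everything matches, 1 otherwise (details on stderr).
verify_busybox_applets() {
    bindir=$1
    shift
    bb="$bindir/busybox"
    if [ ! -s "$bb" ]; then
        echo "busybox missing or empty: $bb" >&2
        return 1
    fi
    status=0
    for applet in "$@"; do
        # cmp -s is silent; it returns non-zero if the files differ or
        # if the applet does not exist.
        if ! cmp -s "$bb" "$bindir/$applet"; then
            echo "applet differs from busybox: $applet" >&2
            status=1
        fi
    done
    return $status
}
```

With the applet names from the listing in the description: `verify_busybox_applets bin '[' ash cut echo mount sh uname`.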
Comment 9 Andrew Gaffney (RETIRED) 2009-07-15 20:08:32 UTC
Instead of applying a bandaid when the festering sore isn't even ours, maybe the issue should be taken upstream to the cpio people to figure out why the hell it doesn't realize itself that it hasn't done anything...
Comment 10 Tom Dickson 2009-07-15 21:38:16 UTC
There is a new version of cpio (2.10) which lists this change: "Fix exit codes to reliably indicate success or failure of the operation." I'll try to reproduce this issue with that version.
Comment 11 Dallas 2009-07-16 15:51:37 UTC
I currently have this bug all of the time, not just randomly. In the past, it was random, but now seems to be happening no matter what I try. I'm not a developer, but let me know if I can try anything on my end since I have a reproducible setup.
Comment 12 Tom Dickson 2009-07-16 15:56:09 UTC
I don't have a guaranteed reproducible setup. However, I compiled 2.6.29 with both cpio 2.9 and cpio 2.10 (using ~x86) - both produced zero length files in the bin directory. So it looks like cpio 2.10 at least doesn't fix the issue.
Comment 13 Dallas 2009-07-16 21:35:21 UTC
I just ran a couple of tests:
1. Compile 2.6.29-r3 and get a working initramfs.
2. Recompile 2.6.29-r3 and get a missing busybox.
3. Switch to 2.6.29-r2, compile, and get a working initramfs.
4. Recompile 2.6.29-r2 and get a missing busybox.
5. Recompile 2.6.29-r2 and get a missing busybox.
6. Switch to 2.6.29-r3, compile, and get a working initramfs.
7. Recompile 2.6.29-r3 and get a missing busybox.
8. Recompile 2.6.29-r3 and get a missing busybox.
etc.
Does this indicate some kind of stale cache after each "success" case that can also be triggered by switching kernel versions?
Comment 14 Andrew Gaffney (RETIRED) 2009-07-19 22:13:22 UTC
The busybox caching code should only be enabled if it's specifically requested. We use this functionality within catalyst for building CDs, and I've never seen an issue with it.
Comment 15 Michael Weissenbacher 2009-11-05 16:51:14 UTC
I have one box here that exhibits this behavior at random. It's really driving me crazy. This box also has a 3ware RAID controller, uses XFS as the root filesystem, and the root filesystem is 1.5 TB. Maybe the big filesystem is part of the problem. I've checked the filesystem numerous times, tested the hardware thoroughly, and tried all combinations of (cpio 2.9-r2, cpio 2.10) and (genkernel 126.96.36.1994, genkernel 188.8.131.526). I've noticed that the problem seems to disappear if I first emerge anything or delete /var/tmp/genkernel. UPDATE: I've put /var/tmp/genkernel on a 500 MiB loopback filesystem (ext2) and the problem is now GONE! So it really looks like some problem with XFS + cpio.
Comment 16 Robin Johnson 2009-12-06 07:07:44 UTC
Ok, I'd like to thank <email@example.com> for giving me access to his system, where he had reproduced the problem. The problem is specifically:
- on XFS only
- when appending hard links into a cpio archive.
Additionally, note that 'cpio -t' output is not accurate, in that it shows hard-linked files as zero bytes. You have to actually extract to check.
agaffney: Can we make do with symlinks instead?
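With symlinks, each applet becomes its own small archive entry whose data is the link target, so cpio no longer has to match archive members up by inode number. A minimal sketch of building such a bin/ directory; `make_applet_symlinks` is hypothetical and is not the code genkernel adopted.

```shell
#!/bin/sh
# Hypothetical helper: create applet symlinks pointing at busybox in the
# given bin/ directory, instead of hard links. Returns 1 if busybox is
# missing or a link cannot be created.
make_applet_symlinks() {
    bindir=$1
    shift
    if [ ! -f "$bindir/busybox" ]; then
        echo "missing $bindir/busybox" >&2
        return 1
    fi
    for applet in "$@"; do
        # A relative target keeps the link valid wherever the tree is
        # unpacked, including at / inside the initramfs.
        ln -sf busybox "$bindir/$applet" || return 1
    done
}
```

For example: `make_applet_symlinks "$tmp/bin" '[' ash cut echo mount sh uname`, where `$tmp` is a hypothetical staging directory.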
Comment 17 Andrew Gaffney (RETIRED) 2009-12-06 15:44:39 UTC
I've switched the code to use symlinks in git. However, your conclusion doesn't entirely make sense. If it were the hard links, shouldn't the busybox binary be >0 since it's straight copied? Also, how can cpio tell the difference between a hard link and a "normal" file? Don't they look the same, except for a >1 reference count on the inode?
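As the question suggests, a hard link is only visible through metadata: a link count greater than 1, with every name sharing one (device, inode) pair. In the newc format, hard-linked members are matched up by that inode number and the file data is stored only once, which fits the zero-byte listings noted above. A sketch for seeing what an archiver would see; `list_hardlinks` is a hypothetical name built on POSIX `find -links` and `ls -di`.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory that have more
# than one hard link, as "INODE PATH" lines sorted by inode number, so
# names that share an inode end up next to each other.
list_hardlinks() {
    find "$1" -type f -links +1 -exec ls -di {} + | sort -n
}
```

Running it on a genkernel staging tree that uses hard links would show busybox and all of its applets sharing a single inode number.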
Comment 18 Andrew Gaffney (RETIRED) 2009-12-06 15:46:55 UTC
This page does seem to corroborate your findings: http://firstname.lastname@example.org/msg00298.html
Comment 19 Robin Johnson 2009-12-06 17:13:45 UTC
Doh, 64-bit inodes; I should have remembered that from all my past dealings with XFS. Its inode allocation policy means you might get inodes that are identical in the lower 32 bits but differ in the upper 32. If you're only looking at the lower 32, you're in trouble. I think we need to get our cpio bumped to include that patch (nice find, btw).
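The collision described here can be shown with plain shell arithmetic. The newc cpio header stores the inode number in an 8-hex-digit (32-bit) field, so two 64-bit inode numbers that differ only above bit 31 produce the same header value. `newc_ino` is a hypothetical helper for illustration; it does not claim to reproduce cpio 2.9's internal code path.

```shell
#!/bin/sh
# Hypothetical helper: reduce an inode number to what fits in a newc
# header field, i.e. keep only the low 32 bits and print 8 hex digits.
# Relies on the shell doing 64-bit arithmetic (true for bash and dash).
newc_ino() {
    printf '%08x\n' $(( $1 & 0xFFFFFFFF ))
}
```

`newc_ino 100` and `newc_ino 4294967396` (that is, 2^32 + 100) both print `00000064`, so two distinct files would look like links to the same inode.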
Comment 20 Tom Dickson 2009-12-11 06:18:49 UTC
Confirmed: I am using XFS, and once I symlinked /var/tmp/genkernel to a directory on my ext3 /boot partition, it worked correctly.
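Since the workaround here amounts to keeping genkernel's temporary directory (/var/tmp/genkernel) off XFS, a pre-flight check can report which filesystem a directory lives on. This sketch relies on GNU coreutils `stat -f -c %T`; the function names are hypothetical, and the warning is only relevant with a cpio that lacks the fix discussed in this bug.

```shell
#!/bin/sh
# Hypothetical helper: print the type name of the filesystem containing
# a path (GNU coreutils: stat -f reports on the filesystem, %T its type).
fs_type_of() {
    stat -f -c %T "$1"
}

# Returns 0 if the directory is not on XFS, 1 (with a warning) if it is,
# and 2 if the filesystem type could not be determined.
warn_if_xfs() {
    fstype=$(fs_type_of "$1") || return 2
    if [ "$fstype" = xfs ]; then
        echo "warning: $1 is on XFS; older cpio may drop hard-linked data" >&2
        return 1
    fi
    return 0
}
```

Usage: `warn_if_xfs /var/tmp/genkernel` before running genkernel.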
Comment 21 Michael Weissenbacher 2009-12-11 07:46:10 UTC
While this problem really does seem to be linked to XFS, I have to add that my system runs a 32-bit kernel and I am NOT using the inode64 mount option. Maybe it also happens with 32-bit inodes when they are above 2^31 and some signed integer wraps around. Or XFS really uses 64-bit inodes even though the inode64 mount option is not set. Strange, to say the least. Is there any (unstable?) cpio version in the Portage tree that incorporates the 64-bit inode patch?
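To check this hypothesis on a given system, one can look at the actual inode numbers of the files genkernel packs, for example under /var/tmp/genkernel. The sketch flags numbers above 2^31 - 1 (where a signed 32-bit value would wrap) and above 2^32 - 1 (which cannot fit an unsigned 32-bit field). The function names and category labels are illustrative, not taken from cpio.

```shell
#!/bin/sh
# Hypothetical helper: classify an inode number by the smallest 32-bit
# integer type it overflows.
classify_ino() {
    if [ "$1" -gt 4294967295 ]; then
        echo over-uint32
    elif [ "$1" -gt 2147483647 ]; then
        echo over-int32
    else
        echo fits-int32
    fi
}

# Print "CLASS INODE PATH" for each regular file under $1 whose inode
# number does not fit in a signed 32-bit integer.
scan_inodes() {
    find "$1" -type f -exec ls -di {} + |
    while read -r ino path; do
        class=$(classify_ino "$ino")
        if [ "$class" != fits-int32 ]; then
            printf '%s %s %s\n' "$class" "$ino" "$path"
        fi
    done
}
```

Usage: `scan_inodes /var/tmp/genkernel`; empty output means every file there has an inode number below 2^31.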
Comment 22 Richard Scott 2009-12-11 17:27:53 UTC
I'm having this prob and I'm using 64bit with XFS
Comment 23 Robin Johnson 2010-01-20 07:31:23 UTC
cpio-2.10-r1 now contains the patch from the upstream mailing list for the inode issue. I'm not sure when upstream plans to release 2.11, however.
Comment 24 Michael Weissenbacher 2010-01-20 14:01:19 UTC
Thanks for the info Robin. I can confirm that unmasking and installing cpio-2.10-r1 indeed solved the problem on my system. I hope that this can be pushed to stable soon.
Comment 25 Sebastian Pipping 2011-01-11 16:44:52 UTC
If I understood all of this correctly, the problem involved:
1) usage of XFS
2) usage of cpio prior to 2.11
3) missing checks of return codes
As cpio 2.11 is now stable, I suppose (1) and (2) are done. For (3), I have added these two commits to the experimental branch exposed by genkernel-99999 (five nines):
http://git.overlays.gentoo.org/gitweb/?p=proj/genkernel.git;a=commitdiff;h=fcdece1b0e232b02bcfdfab884b64d4e1ad1cfd3
http://git.overlays.gentoo.org/gitweb/?p=proj/genkernel.git;a=commitdiff;h=398daeb3b375b539d0ec37f90a0b32728fa48653
I assume we can close this bug. Please report back how genkernel-99999 works for you. If I do not hear anything in two weeks, I may close this bug.