Using genkernel to compile gentoo-sources-2.6.25-r9 causes the boot to fail with
request_module: runaway loop modprobe binfmt-0000
repeated five times. This was caused by a zero-length busybox in the initramfs, which can be verified as follows:
cp /boot/initramfs-genkernel-x86-2.6.22-gentoo-r2 init.cpio.gz
gunzip init.cpio.gz
cpio -i -d -H newc -F init.cpio --no-absolute-filenames
-rwxr-xr-x 8 root root 0 Nov 10 06:33 [
-rwxr-xr-x 8 root root 0 Nov 10 06:33 ash
-rwxr-xr-x 8 root root 0 Nov 10 06:33 busybox
-rwxr-xr-x 8 root root 0 Nov 10 06:33 cut
-rwxr-xr-x 8 root root 0 Nov 10 06:33 echo
-rwxr-xr-x 8 root root 0 Nov 10 06:33 mount
-rwxr-xr-x 8 root root 0 Nov 10 06:33 sh
-rwxr-xr-x 1 root root 396 Nov 10 06:33 udhcpc.scripts
-rwxr-xr-x 8 root root 0 Nov 10 06:33 uname
All those zero length files should be hard linked copies of busybox.
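For reference, hard links are additional directory entries for a single inode, which is why each applet in the listing shows a link count of 8. A minimal sketch of how that relationship can be confirmed on a healthy tree, assuming GNU coreutils `stat`; the helper name `same_inode` and the scratch file names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if both paths refer to the same inode
# on the same device, i.e. they are hard links to one file.
same_inode() {
    [ "$(stat -c '%d:%i' "$1")" = "$(stat -c '%d:%i' "$2")" ]
}

# Demonstration in a scratch directory (names are illustrative only).
demo_dir=$(mktemp -d)
printf 'payload' > "$demo_dir/busybox"
ln "$demo_dir/busybox" "$demo_dir/sh"      # hard link, not a copy

# Both names report the same inode, the same size, and a link count of 2.
stat -c '%h links, inode %i, %s bytes: %n' "$demo_dir/busybox" "$demo_dir/sh"
if same_inode "$demo_dir/busybox" "$demo_dir/sh"; then
    echo "sh is a hard link to busybox"
fi
```

On a correctly built initramfs, every applet name should report the same inode and the same non-zero size as busybox itself.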
Steps to Reproduce:
1. run genkernel with defaults from liveCD (via zcat /proc/config.gz > /etc/kernels/kernelfile)
2. configure grub and boot
3. error as above
The initramfs was created incorrectly, but genkernel reported no errors. When it attempts to boot, it fails with the binfmt-0000 error because important parts are zero length (it attempts to read the header and gets nulls).
It should either create the initramfs correctly or check to see if it failed and warn the user that trying to boot will fail.
I was unable to boot to a previous kernel as all of them exhibited this behavior (obviously the working one had been deleted). To resolve the situation, I booted from the LiveCD, chrooted to the install, and tried to run genkernel there. It also failed. The only way I could get it to work was to chroot to the install from the LiveCD, run
emerge --emptytree genkernel
and then run genkernel all.
After a reboot into the working system, I tried to run genkernel again, and it created a bad kernel. Because genkernel uses the same default file names for the same kernel, the bad build replaced the working one, so I had to use the chroot and --emptytree to get it working again. I'm testing it again to narrow down what's happening, with a copy of the working kernel safely renamed.
It also failed on a 2.6.22-r2 kernel, and failed both on the config from the LiveCD and a custom --menuconfig with only the drivers I needed.
Genkernel produced no error messages that I could see. The hints to the solution were found by reading this thread:
which implied that cpio was failing silently for some reason.
Genkernel is version 3.4.10
emerge --info and other support files will be attached.
Of note: the filesystem is xfs on a 2.5 TB 3ware array using a GPT partition table, but it doesn't even get far enough for those to be an issue.
Created attachment 171391
the emerge info output
I just confirmed that it made a bad initramfs, which I'll attach next.
This is the emerge --info output.
http://schnecke.bombcar.com/random/initramfs-genkernel-x86-2.6.25-gentoo-r9 is the bad initramfs (it will be hosted for a decent amount of time).
I just created it with genkernel all. However, rerunning genkernel --loglevel=5 initrd made a good initramfs!
Created attachment 171393
output of genkernel
Here is the --loglevel=5 output from genkernel. This run created a bad initramfs. Rerunning "genkernel initrd" right afterwards made another bad initramfs, even with --loglevel=5. It is as if it fails sometimes, succeeds sometimes.
Can you add something like:
|| gen_die "Failed to append busybox to cpio"
to the end of the line calling cpio in append_busybox() in gen_initramfs.sh?
I've never been able to reproduce this problem, so I don't know if this problem causes cpio to return a non-zero exit code. If that gen_die triggers in the case of a bad generation for you, I'll add code like that in git so we can at least *detect* when things go wrong.
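To illustrate why the suggested `|| gen_die` can only help if cpio actually returns a non-zero status, here is a minimal sketch. `die_with` is a hypothetical stand-in for genkernel's `gen_die`, and the two `fake_cpio_*` functions are invented placeholders for a cpio run that reports its failure versus one that fails silently; none of this is genkernel's actual code.

```shell
#!/bin/sh
# Hypothetical stand-in for genkernel's gen_die: print a message and exit.
die_with() {
    echo "ERROR: $1" >&2
    exit 1
}

# Placeholder for a cpio run that fails AND reports it via its exit status.
fake_cpio_loud() { return 2; }

# Placeholder for a cpio run that writes bad data but still exits 0.
fake_cpio_silent() { return 0; }

# Each case runs in a subshell so die_with's exit does not end this script.
loud_status=0
( fake_cpio_loud || die_with "Failed to append busybox to cpio" ) 2>/dev/null || loud_status=$?

silent_status=0
( fake_cpio_silent || die_with "Failed to append busybox to cpio" ) 2>/dev/null || silent_status=$?

echo "loud failure   -> status $loud_status (caught)"
echo "silent failure -> status $silent_status (not caught)"
```

If the real failure behaves like the silent case, the guard never fires and the output itself has to be inspected instead.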
I added that to the appropriate line, but it didn't die and produced a bad initramfs.
* Gentoo Linux Genkernel; Version 3.4.10
* Running with options: all
* Linux Kernel 2.6.25-gentoo-r9 for x86...
* kernel: >> Running mrproper...
* config: Using config from /etc/kernels/kernel-config-x86-2.6.25-gentoo-r9
* Previous config backed up to .config.bak
* >> Running oldconfig...
* kernel: >> Cleaning...
* >> Compiling 2.6.25-gentoo-r9 bzImage...
* >> Compiling 2.6.25-gentoo-r9 modules...
* Copying config for successful build to /etc/kernels/kernel-config-x86-2.6.25-gentoo-r9
* busybox: >> Using cache
* initramfs: >> Initializing...
* >> Appending base_layout cpio data...
* >> Appending auxilary cpio data...
* >> Appending busybox cpio data...
* >> Appending modules cpio data...
* Kernel compiled successfully!
* Required Kernel Parameters:
* Where $ROOT is the device node for your root partition as the
* one specified in /etc/fstab
* If you require Genkernel's hardware detection features; you MUST
* tell your bootloader to use the provided INITRAMFS file. Otherwise;
* substitute the root argument for the real_root argument if you are
* not planning to use the initrd...
* WARNING... WARNING... WARNING...
* Additional kernel cmdline arguments that *may* be required to boot properly...
* Do NOT report kernel bugs as genkernel bugs unless your bug
* is about the default genkernel configuration...
* Make sure you have the latest genkernel before reporting bugs.
I have the same issue as described by Tom. Has this been worked on since then?
It was a "random" issue that I could never reproduce myself. Sometimes cpio just "failed" to append the busybox stuff without any indication why (or even a non-zero exit code, iirc).
Yeah, it only happened 2 out of 4 times for me. I'll try again tonight and see if it happens again.
Perhaps we could put in a check that unrolls the CPIO bundle and verifies that it was appended correctly?
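A minimal sketch of such a check, assuming the initramfs has already been unpacked into a scratch directory (for example with the `cpio -i` command from the original report). The function name `check_unpacked_initramfs` and the choice to inspect only `bin/` are assumptions for illustration, not genkernel code.

```shell
#!/bin/sh
# Hypothetical check: fail if any regular file under <dir>/bin is empty.
# Expects the path of an already-extracted initramfs tree as $1.
check_unpacked_initramfs() {
    tree=$1
    if [ ! -d "$tree/bin" ]; then
        echo "no bin/ directory in $tree" >&2
        return 2
    fi
    # -size 0 matches zero-length files; hard links to an empty busybox
    # are listed once per name.
    empty=$(find "$tree/bin" -type f -size 0)
    if [ -n "$empty" ]; then
        echo "zero-length files found:" >&2
        echo "$empty" >&2
        return 1
    fi
    return 0
}
```

It could be used as `check_unpacked_initramfs /path/to/extracted || echo "initramfs is broken"`. Extracting first matters because, as noted later in this report, the `cpio -t` listing alone is not a reliable indicator.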
Instead of applying a bandaid when the festering sore isn't even ours, maybe the issue should be taken upstream to the cpio people to figure out why the hell it doesn't realize itself that it hasn't done anything...
There is a new version of cpio (2.10) which reports:
Fix exit codes to reliably indicate success or failure of the operation.
I'll try to reproduce this issue with that version.
I currently have this bug all of the time, not just randomly. In the past, it was random, but now seems to be happening no matter what I try. I'm not a developer, but let me know if I can try anything on my end since I have a reproducible setup.
I don't have a guaranteed reproducible setup. However, I compiled 2.6.29 with both cpio 2.9 and cpio 2.10 (using ~x86) - both produced zero length files in the bin directory.
So it looks like cpio 2.10 at least doesn't fix the issue.
I just ran a couple of tests:
1. Compile 2.6.29-r3 and get a working initramfs.
2. Recompile 2.6.29-r3 and get a missing busybox.
3. Switch to 2.6.29-r2, compile, and get a working initramfs.
4. Recompile 2.6.29-r2 and get a missing busybox.
5. Recompile 2.6.29-r2 and get a missing busybox.
6. Switch to 2.6.29-r3, compile, and get a working initramfs.
7. Recompile 2.6.29-r3 and get a missing busybox.
8. Recompile 2.6.29-r3 and get a missing busybox.
Does this indicate some kind of stale cache after each "success" case that can also be triggered by switching kernel versions?
The busybox caching code should only be enabled if it's specifically requested. We use this functionality within catalyst for building CDs, and I've never seen an issue with it.
I have one box here that exhibits this behavior at random. It's really driving me crazy. This box also has a 3ware RAID controller, uses XFS as the root filesystem, and the root filesystem is 1.5TB. Maybe the big filesystem could be part of the problem.
I've checked the filesystem numerous times, tested the hardware thoroughly and tried all combinations of (cpio 2.9-r2, cpio 2.10) and (genkernel 220.127.116.114, genkernel 18.104.22.1686).
I've noticed that the problem seems to disappear if I first emerge anything or delete /var/tmp/genkernel.
UPDATE: I've put /var/tmp/genkernel on a 500MiB loopback filesystem (ext2) and the problem is now GONE! So it really looks like some problem with XFS + cpio.
Ok, I'd like to thank <email@example.com> for giving me access to his system where he had reproduced the problem.
The problem is specifically the following:
- On XFS only
- appending hardlinks into a cpio archive.
Additionally, please note that the 'cpio -t' output is not accurate in that it shows hardlinked files as zero-bytes. You have to actually extract to check.
Can we make do with symlinks instead?
I've switched the code to use symlinks in git. However, your conclusion doesn't entirely make sense. If it was the hardlinks, shouldn't the busybox binary be >0 since it's straight copied? Also, how can cpio tell the difference between a hardlink and a "normal" file? Don't they look the same, except for a >1 reference count on the inode?
This page does seem to corroborate your findings:
Doh, 64-bit inodes, I should have remembered that from all my past dealings with XFS. Its inode allocation policy means you might get inodes that are identical in the lower 32 bits, but differ in the upper 32. If you're only looking at the lower 32, you're in trouble then.
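The collision described here can be sketched with plain shell arithmetic. The two inode numbers below are made-up values chosen to differ only in the upper 32 bits; anything that keeps just the lower 32 bits can no longer tell them apart. This assumes a shell with 64-bit arithmetic, which is typical on current Linux systems.

```shell
#!/bin/sh
# Made-up inode numbers: same lower 32 bits, different upper 32 bits.
ino_a=$(( (1 << 32) + 4242 ))
ino_b=$(( (2 << 32) + 4242 ))

# Keep only the lower 32 bits, as a 32-bit inode field would.
low_a=$(( ino_a & 0xFFFFFFFF ))
low_b=$(( ino_b & 0xFFFFFFFF ))

echo "full:      $ino_a vs $ino_b"
echo "truncated: $low_a vs $low_b"
```

The full values differ while the truncated ones are equal, so two unrelated files would look like links to the same inode to a tool that compares only the truncated value.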
I think we need to get our cpio bumped to include that patch (nice find btw).
Confirmed; I am using XFS, once I symlinked /var/tmp/genkernel to a directory on my ext3 /boot partition, it worked correctly.
While this problem really seems to be linked with XFS, I have to add that my system runs a 32-bit kernel and I am NOT using the inode64 mount option. Maybe it also happens with 32-bit inodes when they are above 2^31 and some signed integer wraps around. Or XFS is really using 64-bit inodes even though the inode64 mount option is not used. Strange to say the least.
Is there any (unstable?) cpio version in the portage tree which incorporates the 64-bit inode patch?
I'm having this prob and I'm using 64bit with XFS
cpio-2.10-r1 now contains the patch from the upstream mailing list for the inode creation. Not sure when upstream is planning on releasing a 2.11 however.
Thanks for the info Robin. I can confirm that unmasking and installing cpio-2.10-r1 indeed solved the problem on my system. I hope that this can be pushed to stable soon.
If I understood all of this correctly, the problem involved
1) usage of XFS
2) usage of cpio prior to 2.11
3) missing checks of return codes
As cpio 2.11 is in stable now, I suppose (1) and (2) are done.
For (3) I have added these two commits to the experimental branch exposed by genkernel-99999 (five nines):
I assume we can close this bug. Please report back how genkernel-99999 works for you. If I do not hear anything in two weeks, I may close this bug.