Summary: | sys-kernel/genkernel sometimes creates initramfs that is missing busybox causing request_module: runaway loop modprobe binfmt-0000 | ||
---|---|---|---|
Product: | Gentoo Hosted Projects | Reporter: | Tom Dickson <gentoo> |
Component: | genkernel | Assignee: | Gentoo Genkernel Maintainers <genkernel> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | gentoo.bugs, gordonp, sping, webmaster |
Priority: | High | Keywords: | InVCS |
Version: | unspecified | ||
Hardware: | x86 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 351772 | ||
Attachments: |
the emerge info output
output of genkernel |
Description
Tom Dickson
2008-11-11 03:08:36 UTC
Created attachment 171391
the emerge info output
I just confirmed that it made a bad initramfs, which I'll attach next.
This is the emerge --info output.
http://schnecke.bombcar.com/random/initramfs-genkernel-x86-2.6.25-gentoo-r9 is the bad initramfs (it will be hosted for a decent amount of time). I just created it with genkernel all; however, rerunning genkernel --loglevel=5 initrd made a good initramfs!

Created attachment 171393
output of genkernel
Here is the --loglevel=5 output from genkernel. This run created a bad initramfs. Rerunning "genkernel initrd" right afterwards made another bad initramfs, even with --loglevel=5. It seems to fail some of the time and succeed the rest of the time.
Can you add something like || gen_die "Failed to append busybox to cpio" to the end of the line calling cpio in append_busybox() in gen_initramfs.sh? I've never been able to reproduce this problem, so I don't know whether it causes cpio to return a non-zero exit code. If that gen_die triggers when a bad initramfs is generated for you, I'll add similar code in git so we can at least *detect* when things go wrong.

I added that to the appropriate line, but it didn't die, and it produced a bad initramfs:

    * Gentoo Linux Genkernel; Version 3.4.10
    * Running with options: all
    * Linux Kernel 2.6.25-gentoo-r9 for x86...
    * kernel: >> Running mrproper...
    * config: Using config from /etc/kernels/kernel-config-x86-2.6.25-gentoo-r9
    * Previous config backed up to .config.bak
    * >> Running oldconfig...
    * kernel: >> Cleaning...
    * >> Compiling 2.6.25-gentoo-r9 bzImage...
    * >> Compiling 2.6.25-gentoo-r9 modules...
    * Copying config for successful build to /etc/kernels/kernel-config-x86-2.6.25-gentoo-r9
    * busybox: >> Using cache
    * initramfs: >> Initializing...
    * >> Appending base_layout cpio data...
    * >> Appending auxilary cpio data...
    * >> Appending busybox cpio data...
    * >> Appending modules cpio data...
    *
    * Kernel compiled successfully!
    *
    * Required Kernel Parameters:
    * real_root=/dev/$ROOT
    *
    * Where $ROOT is the device node for your root partition as the
    * one specified in /etc/fstab
    *
    * If you require Genkernel's hardware detection features; you MUST
    * tell your bootloader to use the provided INITRAMFS file. Otherwise;
    * substitute the root argument for the real_root argument if you are
    * not planning to use the initrd...
    * WARNING... WARNING... WARNING...
    * Additional kernel cmdline arguments that *may* be required to boot properly...
    * Do NOT report kernel bugs as genkernel bugs unless your bug
    * is about the default genkernel configuration...
    *
    * Make sure you have the latest genkernel before reporting bugs.

I have the same issue as described by Tom.
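The suggested || gen_die check can be sketched as below. This is a simplified, hypothetical version of append_busybox(), not genkernel's actual code: gen_die is a local stand-in, and the argument layout and cpio flags are assumptions. Note that the check only fires when cpio (or the cd before it) reports failure through its exit status; in this report cpio exited successfully, which is why the check never triggered.

```shell
#!/bin/bash
# Hypothetical stand-in for genkernel's gen_die(): report the error and abort.
gen_die() {
    echo "ERROR: $*" >&2
    exit 1
}

# Hypothetical simplification of append_busybox(); the arguments are assumptions.
# $1: directory holding the busybox tree
# $2: existing cpio archive to append to (must be an absolute path, because
#     cpio runs after the cd into $1)
append_busybox() {
    local src_dir=$1 archive=$2
    # cpio is the last command in the pipeline, so without `set -o pipefail`
    # the || below sees cpio's exit status (or cd's, if cd fails first).
    ( cd "$src_dir" && find . -print | cpio --quiet -o -H newc --append -F "$archive" ) \
        || gen_die "Failed to append busybox to cpio"
}
```

A silent failure like the one in this bug passes straight through this check, so it detects only the failures cpio actually reports.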
Has this been worked on since then?

It was a "random" issue that I could never reproduce myself. Sometimes cpio just "failed" to append the busybox files without any indication why (not even a non-zero exit code, iirc).

Yeah, it only happened 2 out of 4 times for me. I'll try again tonight and see if it happens again. Perhaps we could put in a check that unpacks the cpio bundle and verifies that it was appended correctly?

Instead of applying a band-aid when the festering sore isn't even ours, maybe the issue should be taken upstream to the cpio people to figure out why the hell it doesn't realize that it hasn't done anything.

There is a new version of cpio (2.10) whose release notes say: "Fix exit codes to reliably indicate success or failure of the operation." I'll try to reproduce this issue with that version.

I currently have this bug all of the time, not just randomly. In the past it was random, but now it seems to happen no matter what I try. I'm not a developer, but let me know if I can try anything on my end, since I have a reproducible setup.

I don't have a guaranteed reproducible setup. However, I compiled 2.6.29 with both cpio 2.9 and cpio 2.10 (using ~x86); both produced zero-length files in the bin directory. So cpio 2.10, at least, doesn't fix the issue.

I just ran a couple of tests:

1. Compile 2.6.29-r3 and get a working initramfs.
2. Recompile 2.6.29-r3 and get a missing busybox.
3. Switch to 2.6.29-r2, compile, and get a working initramfs.
4. Recompile 2.6.29-r2 and get a missing busybox.
5. Recompile 2.6.29-r2 and get a missing busybox.
6. Switch to 2.6.29-r3, compile, and get a working initramfs.
7. Recompile 2.6.29-r3 and get a missing busybox.
8. Recompile 2.6.29-r3 and get a missing busybox.

And so on. Does this indicate some kind of stale cache after each "success" case that can also be triggered by switching kernel versions?

The busybox caching code should only be enabled if it's specifically requested.
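The proposed "unpack and verify" check could look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical (not part of genkernel), the archive is assumed to be an uncompressed cpio file (a gzip-compressed image would need decompressing to a temporary file first), and the pass/fail rule simply encodes the symptom reported here, zero-length files in bin/. It extracts rather than lists because, as was found later in this thread, cpio -t shows hardlinked files as zero bytes.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if bin/busybox is non-empty and no regular
# file under bin/ is empty (the symptom described in this bug).
check_extracted_tree() {
    local root=$1
    [ -s "$root/bin/busybox" ] || return 1
    [ -z "$(find "$root/bin" -type f -empty -print -quit)" ]
}

# Hypothetical helper: extract an uncompressed cpio archive into a temporary
# directory, run the check above, then clean up.
verify_initramfs() {
    local archive=$1 tmp rc
    tmp=$(mktemp -d) || return 1
    # The redirect opens the archive before the subshell changes directory,
    # so a relative archive path still works.
    ( cd "$tmp" && cpio --quiet -idm --no-absolute-filenames ) < "$archive"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        check_extracted_tree "$tmp"
        rc=$?
    fi
    rm -rf "$tmp"
    return "$rc"
}
```

Returning non-zero instead of exiting leaves it to the caller (for example a gen_die-style wrapper) to decide how to abort.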
We use the busybox caching functionality within catalyst for building CDs, and I've never seen an issue with it.

I have one box here that exhibits this behavior at random. It's really driving me crazy. This box also has a 3ware RAID controller, uses XFS as the root filesystem, and the root filesystem is 1.5TB. Maybe the big filesystem is part of the problem. I've checked the filesystem numerous times, tested the hardware thoroughly, and tried all combinations of (cpio 2.9-r2, cpio 2.10) and (genkernel 3.4.10.904, genkernel 3.4.10.906). I've noticed that the problem seems to disappear if I first emerge anything or delete /var/tmp/genkernel.

UPDATE: I've put /var/tmp/genkernel on a 500MiB loopback filesystem (ext2) and the problem is now GONE! So it really looks like a problem with XFS + cpio.

OK, I'd like to thank <gordonp@sfu.ca> for giving me access to his system, where he had reproduced the problem. The problem is specifically the following:

- On XFS only
- Appending hardlinks into a cpio archive

Additionally, please note that the 'cpio -t' output is not accurate, in that it shows hardlinked files as zero bytes. You have to actually extract to check.

agaffney: Can we make do with symlinks instead?

I've switched the code to use symlinks in git. However, your conclusion doesn't entirely make sense. If it was the hardlinks, shouldn't the busybox binary be >0 bytes, since it's straight copied? Also, how can cpio tell the difference between a hardlink and a "normal" file? Don't they look the same, except for a >1 reference count on the inode?

This page does seem to corroborate your findings: http://www.mail-archive.com/bug-cpio@gnu.org/msg00298.html

Doh, 64-bit inodes; I should have remembered that from all my past dealings with XFS. Its inode allocation policy means you might get inodes that are identical in the lower 32 bits but differ in the upper 32. If you're only looking at the lower 32, you're in trouble.
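The 64-bit inode explanation can be made concrete with shell arithmetic. This is an illustration only: the inode numbers below are made up, and the sketch assumes, as the linked cpio discussion suggests, that an affected cpio keeps only the low 32 bits of the inode number when deciding which files are hard links of each other (on the same device). Two distinct files whose inode numbers differ only above bit 31 then look like the same inode.

```shell
#!/bin/bash
# Made-up 64-bit inode numbers that differ only in their upper 32 bits.
ino_a=$(( (1 << 32) + 131 ))
ino_b=$(( (2 << 32) + 131 ))

# Keep only the low 32 bits, as a 32-bit inode field would.
low32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

# Succeeds when two distinct inode numbers collide after truncation to 32 bits.
collides_after_truncation() {
    [ "$1" -ne "$2" ] && [ "$(low32 "$1")" -eq "$(low32 "$2")" ]
}

if collides_after_truncation "$ino_a" "$ino_b"; then
    # prints: distinct inodes 4294967427 and 8589934723 both truncate to 131
    echo "distinct inodes $ino_a and $ino_b both truncate to $(low32 "$ino_a")"
fi
```

A tool that compares the full 64-bit value (plus the device number) does not confuse the two files, which is what the upstream cpio patch discussed below is about.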
I think we need to get our cpio bumped to include that patch (nice find, btw).

Confirmed; I am using XFS. Once I symlinked /var/tmp/genkernel to a directory on my ext3 /boot partition, it worked correctly.

While this problem really seems to be linked with XFS, I have to add that my system runs a 32-bit kernel and I am NOT using the inode64 mount option. Maybe it also happens with 32-bit inodes when they are above 2^31 and some signed integer wraps around. Or XFS really is using 64-bit inodes even though the inode64 mount option is not used. Strange, to say the least. Is there any (unstable?) cpio version in the Portage tree that incorporates the 64-bit inode patch?

I'm having this problem too, and I'm using 64-bit with XFS.

cpio-2.10-r1 now contains the patch from the upstream mailing list for the inode creation. Not sure when upstream is planning on releasing 2.11, however.

Thanks for the info, Robin. I can confirm that unmasking and installing cpio-2.10-r1 indeed solved the problem on my system. I hope that this can be pushed to stable soon.

If I understood all of this correctly, the problem involved:

1) usage of XFS
2) usage of cpio prior to 2.11
3) missing checks of return codes

As cpio 2.11 is in stable now, I suppose (1) and (2) are done. For (3), I have added these two commits to the experimental branch exposed by genkernel-99999 (five nines):

http://git.overlays.gentoo.org/gitweb/?p=proj/genkernel.git;a=commitdiff;h=fcdece1b0e232b02bcfdfab884b64d4e1ad1cfd3
http://git.overlays.gentoo.org/gitweb/?p=proj/genkernel.git;a=commitdiff;h=398daeb3b375b539d0ec37f90a0b32728fa48653

I assume we can close this bug. Please report back how genkernel-99999 works for you. If I do not hear anything in two weeks, I may close this bug.
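For systems still on an unpatched cpio, the workaround reported in this thread was to keep /var/tmp/genkernel off XFS. A pre-flight check along these lines is one way to catch that. It is a minimal sketch: the function names are hypothetical (not part of genkernel), and it relies on GNU coreutils stat, where -f -c %T prints the filesystem type (for example "xfs").

```shell
#!/bin/bash
# Hypothetical helper: print the type of the filesystem holding a path.
# Relies on GNU coreutils `stat -f -c %T` (e.g. "xfs", "ext2/ext3", "tmpfs").
fs_type() {
    stat -f -c %T "$1"
}

# Hypothetical helper: return 0 if the work directory is not on XFS,
# 1 (with a warning) if it is, and 2 if its filesystem cannot be determined.
check_tmpdir_fs() {
    local dir=${1:-/var/tmp/genkernel}
    local type
    type=$(fs_type "$dir") || return 2
    if [ "$type" = "xfs" ]; then
        echo "WARNING: $dir is on XFS; use a patched cpio (2.10-r1 or later) or move it to another filesystem" >&2
        return 1
    fi
    return 0
}
```

This only flags the risky setup; the actual fix in this bug was the patched cpio plus return-code checks in genkernel.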