Bug 571182 - sys-kernel/dracut-044 creates "garbage" archives, says kernel
Summary: sys-kernel/dracut-044 creates "garbage" archives, says kernel
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Normal normal
Assignee: Patrick McLean
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-07 17:38 UTC by Duncan
Modified: 2016-01-12 14:16 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info dracut (dracut.emerge.info, 6.89 KB, text/plain)
2016-01-07 17:38 UTC, Duncan

Description Duncan 2016-01-07 17:38:02 UTC
Created attachment 422216
emerge --info dracut

I was up all night tracking this down among the changed packages (dracut, gcc5, a new kernel)... and it's dracut-044!

At least here, it consistently produces cpio archives that the kernel refuses to boot, with a "junk in compressed archive" message.

I have dracut set up to create uncompressed cpio archives, which the kernel build is pointed at via CONFIG_INITRAMFS_SOURCE. The kernel build then compresses the cpio and builds it into vmlinuz, so each kernel comes complete with its own initramfs in the same vmlinuz file. That way, should a kernel or its initramfs be broken, I can simply boot an older kernel along with its tested-working initramfs.

This has worked fine for some time now, across multiple dracut and kernel versions.

But with yesterday's upgrade of both dracut (with a new cpio build afterward) and the kernel, I got the dreaded "junk in compressed archive" error and the new kernel refused to boot.

Note for anyone trying to reproduce this: I originally had trouble tracing it down because the kernel's INITRAMFS_SOURCE option actually points to a symlink, so I can switch between cpio builds more easily. Unfortunately, simply changing the symlink apparently didn't trigger make's change detection, and I eventually found I needed to clean up the kernel build so it redid everything, in order to get it to build with the new cpio archive after I changed the symlink. (Changing the config would probably have done it too, but I found that simply wiping my output dir of everything but .config* worked.) Also, I was naming my old archives *.cpio.old, though the symlink still said *.cpio, and apparently the kernel saw that the dereferenced file extension wasn't .cpio and built a junk vmlinuz from it, apparently without the cpio attached at all (the file was too small). So I had to figure out both /those/ issues before I could trace down the real problem.
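One plausible explanation for the symlink behavior, assuming the kernel build decides what to rebuild the way GNU make does by default, is that stat() follows symlinks: only the mtime of the file the link points at is compared, not the link's own mtime. The Python sketch below (file names and mtimes are hypothetical) retargets a symlink to an older archive and shows that the output then looks up to date by the target's mtime, while the freshly created link's own mtime would have flagged it.

```python
import os
import tempfile


def looks_stale(output: str, prereq: str) -> bool:
    """Mimic make's default check: rebuild if the prerequisite is newer.

    os.stat() follows symlinks, so for a symlinked prerequisite only the
    mtime of the file the link currently points at is compared.
    """
    return os.stat(prereq).st_mtime > os.stat(output).st_mtime


def retarget_demo() -> tuple[bool, bool]:
    """Point a symlink at an older archive and see what each check says."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical names standing in for two cpio builds and the
        # kernel image that was last built from the newer one.
        older = os.path.join(d, "initramfs-043.cpio")
        newer = os.path.join(d, "initramfs-044.cpio")
        image = os.path.join(d, "vmlinuz")
        link = os.path.join(d, "initramfs.cpio")
        for path, mtime in ((older, 1000), (newer, 2000), (image, 3000)):
            open(path, "wb").close()
            os.utime(path, (mtime, mtime))
        os.symlink(newer, link)
        # Switch the symlink to the older build, as when testing a rollback.
        os.remove(link)
        os.symlink(older, link)
        by_target = looks_stale(image, link)  # compares 1000 vs 3000
        # The link itself was just created, so its own mtime is "now".
        by_link = os.lstat(link).st_mtime > os.stat(image).st_mtime
        return by_target, by_link


if __name__ == "__main__":
    print(retarget_demo())  # → (False, True)
```

GNU make's `-L` (`--check-symlink-times`) option uses the later of the link and target mtimes; whether that is enough for a kernel tree is untested here, and a clean rebuild remains the blunt but reliable fix.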

But once I did, it was entirely repeatable. dracut-044 builds a cpio that I can browse in mc, but having the kernel compress and attach it results in a vmlinuz image that is far too small, as if nothing is being attached.

Meanwhile, the old dracut-043-r2 repeatedly produced a working cpio that could be attached to a kernel build and booted. That held both for the original binpkg I had built back in July with gcc-4.x and for a new one I just built with gcc5, to confirm that gcc5 wasn't the culprit.

So dracut-043-r2 works, and dracut-044 is broken and produces garbage archives, at least from the kernel's perspective. That's entirely dependable and repeatable with my configuration at least, as I've now eliminated all the other variables.

I'm attaching emerge --info dracut.

Note that given the possibility of unbootable systems, I set this to critical (and considered blocker, but I'll leave that for you to decide). If you consider that incorrect, please set it as you believe it should be. For now, I'd recommend urgent masking (as I've done locally) due to the danger of broken systems, until the problem can be resolved.
Comment 1 Mike Gilbert gentoo-dev 2016-01-07 17:58:12 UTC
Could you attach one of the "garbage" cpio archives?
Comment 2 Mike Gilbert gentoo-dev 2016-01-07 18:08:12 UTC
It would also be useful to know if you encounter any issue when loading the initramfs via your bootloader rather than baking it into the kernel image.
Comment 3 Mike Gilbert gentoo-dev 2016-01-07 18:37:14 UTC
I was able to successfully boot a kernel in qemu with an initramfs built in.

Here are some relevant packages:

app-arch/cpio-2.12
sys-boot/dracut-044
linux-4.3.2

My kernel config has this:

CONFIG_INITRAMFS_SOURCE="initramfs.cpio"

I built the initramfs like this:

sudo dracut --no-compress --no-kernel initramfs.cpio

systemd started in the initramfs successfully with the following qemu command:

qemu-system-x86_64 -machine accel=kvm -nographic -kernel arch/x86/boot/bzImage -append console=ttyS0
Comment 4 Alexander Tsoy 2016-01-07 19:28:10 UTC
Please try with --no-reproducible --no-early-microcode
Comment 5 Duncan 2016-01-08 02:08:01 UTC
(In reply to Mike Gilbert from comment #1)
> Could you attach one of the "garbage" cpio archives?

Good idea... but it's over the 1000 KB attachment limit. Even after xz compression, the first one (I was going to post a good and a bad one, maybe the diff...) is ~4.5 MiB, and the second would be similar. (The bad cpio is about the same size as the good one and is browsable in mc, so it's gotta be just not quite the correct format or something. I'd suspect a bad cpio executable if I had rebuilt it with gcc5 (I'm actually not sure whether I did), but then it would produce bad cpios with both dracut versions, and it doesn't.)

Meanwhile, some more info. FWIW, my dracut helper script already records the versions of a few packages in an additional text file it puts in the same subdir. Here are its contents for the "good" one:

app-shells/bash-4.3_p42-r1
sys-fs/btrfs-progs-4.3.1
sys-kernel/dracut-043-r2
sys-apps/systemd-228-r1

The bad one would be the same except of course dracut-044 instead of 043-r2.

Second, here are my non-default settings in /etc/dracut.conf.d/*.conf (each setting is in its own file; they're listed together here). Obviously these apply to both the good and the bad cpios. (I tried host-only when I originally set up dracut and hit a bug, now long fixed AFAIK; if you look in my bugzilla history you'll see I filed it. So I set hostonly=no and simply specified the modules I wanted added and omitted. By the time the bug was fixed I had a working config I was happy with, so I left it as it was.)

add_dracutmodules="
	btrfs
"
hostonly_cmdline=no
hostonly=no
install_items+="
	/bin/most
	/bin/nano
"
no_kernel=yes
# caps doesn't work with systemd
omit_dracutmodules="caps"
persistent_policy=by-label
ro_mnt=yes
show_modules=yes

A couple of other notes, in case they matter:
1: (Reminder, IIRC it's in the USE flags) dracut with systemd, not openrc.
2: On my main system I have /usr unified into root via a /usr -> . symlink, and bin/sbin unified via a sbin -> bin symlink as well. That does throw an occasional bug my way.


After working on that most of the night and putting in some hours at work today, I've now been up over 30 hours. I was going to try to attach those before I went to bed, but... Anybody got a good pastebin to recommend while I'm out? It'll save me some time looking. I'll look at the other suggestions when I can actually think straight... which isn't now.

Thanks for the ideas and encouragement. =:^)
Comment 6 Duncan 2016-01-12 14:16:54 UTC
TL;DR: Heisenbug!  Works fine now.  <shrug>

Found somewhere (load.to) to put the cpios, remerged dracut-44 to build a bad version again... and this time it works just fine. My first build must have been messed up somehow, producing bad cpios that I could still browse in mc but that the kernel didn't like. Either that, or the cpios were fine and something weird was going on with the kernel build. But back then I rebuilt several times with different kernel version checkouts, over several reboots, and the behavior was consistently bad with cpios built with 44 and good with those built with 43-r2. Now 44 builds good ones that the kernel has no problem with, so I simply don't know. Maybe it was messed-up hardware state that survived the warm reboots and that only the 44-built cpio was sensitive to. Whatever.

When it worked without issue this time, I even doubted that the kernel was actually pulling in the 44-built cpio, so I set the rd.break kernel command-line option to break inside the initr* environment and checked /lib64/os-release. Sure enough, it said dracut 44 there too, so it _couldn't_ be the kernel still using the known-working 43-r2 version.

I guess I'll reopen if it starts occurring again, but for now the heisenbug is gone, so I'm setting this to RESOLVED INVALID.