Created attachment 350116 [details] livecd-test.png
Something seems to be going wrong with the module dependency information in the initramfs generated by the newest genkernel-3.4.46.1 (also affects 3.4.46, AFAIK). It seems that the pata_* modules are being loaded before libata is available, pata_pcmcia is being loaded before the pcmcia "ds" module is available, and maybe more. Here is a screenshot of early boot with a minimal install CD ISO generated by catalyst with genkernel-3.4.46.1. It is using the current amd64 stable gentoo-sources-3.8.13. It still boots successfully because I don't personally depend on either of those broken modules for boot.
Created attachment 350118 [details] dmesg.log I've also built vanilla-sources-3.9.4 independent of catalyst, and while I was unable to capture the actual screen output during initramfs boot, I do see the same kind of messages in dmesg, see attached log.
Created attachment 350134 [details] another livecd test screenshot
Here's another screenshot which seems to sum up the root of the problem. To reproduce:
1. Boot a new "genkernel all" kernel built by >=genkernel-3.4.46
2. Use parameters "nodetect debug", i.e. I'm booting my minimal CD with "gentoo nodetect debug"
3a. Try to "modprobe pata_mpiix" OR
3b. Try to "modprobe pcmcia" for another fun error
I see now. New genkernel uses busybox modprobe (again) where older versions used the genkernel-provided modprobe script. See bug #197730 for the background on this script. Should we be fixing up our modules_load files to get the load order right, so we can use busybox's modprobe directly, or go back to the genkernel modprobe script?
Working (3.4.45) setup:
# ls -l $(which modprobe)
-rwxr-xr-x 1 0 0 2576 May 9 06:41 /sbin/modprobe
#
Non-working (3.4.46) setup:
# ls -l $(which modprobe)
lrwxrwxrwx 1 0 0 14 Jun 4 04:23 /sbin/modprobe -> ../bin/busybox
#
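The symlink check above can be scripted. What follows is only a minimal sketch for telling the two setups apart from inside an unpacked initramfs or a debug shell; the helper name is_busybox_link is hypothetical and not part of genkernel.

```shell
# Hypothetical helper (not part of genkernel): succeed if the given path
# is a symlink whose target is the busybox binary.
is_busybox_link() {
    # $1: path to inspect, e.g. /sbin/modprobe
    [ -L "$1" ] || return 1          # a regular file or script is not a link
    case "$(readlink "$1")" in
        busybox|*/busybox) return 0 ;;
        *) return 1 ;;
    esac
}
```

In a debug shell this could be used as `is_busybox_link /sbin/modprobe && echo "busybox modprobe"`, which distinguishes the 3.4.45 layout (a standalone script) from the 3.4.46 layout (a link to ../bin/busybox).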
(In reply to Ben Kohler from comment #2)
> 2. Use parameters "nodetect debug", ie I'm booting my minimal cd with
> "gentoo nodetect debug"
For the record, what happens if you don't use nodetect?
Created attachment 350154 [details, diff] genkernel-modules_load-fix.patch
Without nodetect, I get the output in the first attachment. Basically, due to busybox's funky module dependency handling, the first pata_* driver will always fail because the required libata module gets loaded immediately AFTER the first requested pata_* driver. There is a similar situation with pata_pcmcia and some required pcmcia modules. I've attached a patch to work around the errors if we are going to keep busybox modprobe: it just shuffles libata to the beginning of the module list (which is logical anyway) and moves pata_pcmcia into the pcmcia modules, since it depends on most of those. I've only patched x86_64; I imagine most other arches would need the very same fix. But it seems like busybox's modprobe dependency handling is just a mess, so we should probably look into when the change (from genkernel modprobe back to busybox modprobe) was made and make sure it's intentional.
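The reordering idea can be illustrated with a small shell function. This is a sketch under assumptions, not the attached patch: the helper name move_to_front is made up, and the variable name MODULES_PATA in the usage note is assumed to match what the modules_load file uses.

```shell
# Hypothetical helper: print a space-separated module list with one
# module moved to the front, so it is requested before its dependents.
move_to_front() {
    # $1: module to put first; remaining arguments: the current list
    first=$1
    shift
    out=$first
    for m in "$@"; do
        # skip the original occurrence so the module is not listed twice
        [ "$m" = "$first" ] || out="$out $m"
    done
    printf '%s\n' "$out"
}
```

Usage would look like `MODULES_PATA=$(move_to_front libata $MODULES_PATA)`. Note that the function puts the module first even if it was not in the list to begin with.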
I have to admit, I am a little curious why we need our own modprobe handling here... So the system boots up, uses genkernel modprobe, while busybox-modprobe is available, then we have kmod after that. Really, three versions of modprobe on one boot? Can someone please tell us why genkernel has its very own modprobe? Certainly there is a better solution than this...
Here's the commit where we moved back to busybox modprobe, FWIW: http://git.overlays.gentoo.org/gitweb/?p=proj/genkernel.git;a=commit;h=3a054014e880e5b1ff28e3d87767c45a073da6b5
We don't need that. We need to fix the underlying bugs caused by having had our own crufty version of modprobe there for ages. I believe that either modules.dep is not where it is expected to be, or the module dependencies are not stated correctly, or (I can hardly believe this) busybox modprobe doesn't consider module dependencies. If the last one is true, it adds to the arguments against using busybox modprobe, busybox mdev and friends in the initramfs.
http://git.busybox.net/busybox/tree/modutils/Config.in?h=1_15_stable#n8
It says that it automagically handles module dependencies. Thus, I believe that the module is declaring wrong dependencies, which means that the bug is in the kernel.
I can confirm that running busybox depmod before any module loads creates the required modules.dep.bb and resolves the issue. This should happen @ build time, not initramfs runtime, right?
CONFIG_MODPROBE_SMALL=y (from defaults/busy-config) should save us from doing that. Did you change the busybox config?
I have not made a single change to busybox's config, but my genkernel initramfs busybox does not seem to match the config from /usr/share/genkernel/defaults/busy-config. As I read defaults/busy-config, it shouldn't have insmod/rmmod/lsmod/depmod, but mine does:
# ~/initramfs-unpacked/bin/busybox
BusyBox v1.20.2 (2013-06-04 14:25:25 CDT) multi-call binary.
Copyright (C) 1998-2011 Erik Andersen, Rob Landley, Denys Vlasenko and others. Licensed under GPLv2.
See source distribution for full notice.
Usage: busybox [function] [arguments]...
   or: busybox --list[-full]
   or: busybox --install [-s] [DIR]
   or: function [arguments]...
BusyBox is a multi-call binary that combines many common Unix utilities into a single executable. Most people will create a link to busybox for each function they wish to use and BusyBox will act like whatever it was invoked as.
Currently defined functions:
[, [[, acpid, ash, awk, base64, basename, beep, blkid, blockdev, bootchartd, brctl, bzip2, cat, chat, chgrp, chmod, chown, chpasswd, chroot, chvt, clear, cp, cut, date, dd, depmod, devmem, df, dhcprelay, dirname, dmesg, dnsdomainname, du, dumpkmap, dumpleases, echo, env, false, fbsplash, fgconsole, fgrep, find, findfs, flock, free, freeramdisk, fsync, ftpd, grep, groups, gunzip, gzip, halt, hd, head, hexdump, hostid, hostname, id, ifconfig, ifenslave, ifplugd, init, insmod, ionice, iostat, kbd_mode, kill, killall, linuxrc, ln, loadfont, loadkmap, losetup, lpd, lpq, lpr, ls, lsmod, lsof, lspci, lsusb, lzop, lzopcat, makedevs, makemime, man, mdev, mdstart, mesg, microcom, mkdir, mkdosfs, mke2fs, mkfs.ext2, mkfs.vfat, mknod, mktemp, modinfo, modprobe, more, mount, mpstat, mv, nbd-client, ntpd, pgrep, ping, pivot_root, pkill, pmap, popmaildir, poweroff, powertop, ps, pstree, pwd, pwdx, rdate, rdev, readlink, reboot, reformime, reset, rev, rm, rmdir, rmmod, route, rtcwake, script, scriptreplay, sed, sendmail, setfont, setserial, sh, sha256sum, sha512sum, showkey, sleep, smemcap, sort, stty, swapoff, swapon, switch_root, sync, sysctl, tac, tail, tar, test, timeout, touch, true, tty, tunctl, ubiattach, ubidetach, ubimkvol, ubirmvol, ubirsvol, ubiupdatevol, udhcpc, udhcpd, umount, uname, uniq, unlzop, unxz, uptime, volname, wget, which, whoami, whois, xargs, xz, xzcat, yes, zcat
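Rather than reading the applet list by eye, the comparison against the expected config could be scripted. The sketch below relies on `busybox --list`, which appears in the usage text above and is assumed here to print one applet name per line; the helper name has_applet is hypothetical.

```shell
# Hypothetical helper: succeed if the applet name given as $1 appears in
# a one-name-per-line applet list read from standard input.
has_applet() {
    # -x: match the whole line, -F: treat the name literally (e.g. "[")
    grep -qxF -- "$1"
}
```

For example, `~/initramfs-unpacked/bin/busybox --list | has_applet depmod && echo "depmod is built in"` would flag a binary that was not built from the expected busy-config.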
(In reply to Rick Farina (Zero_Chaos) from comment #6)
> I have to admit, I am a little curious why we need our own modprobe handling
> here...
>
> So the system boots up, uses genkernel modprobe, while busybox-modprobe is
> available, then we have kmod after that. Really, three versions of modprobe
> on one boot? Can someone please tell us why genkernel has its very own
> modprobe? Certainly there is a better solution than this...
As far as I know, we had our own modprobe for historical reasons. Anyway, there seems to be a bug in busybox modprobe. This needs to be fixed ASAP.
(In reply to Ben Kohler from comment #10)
> I can confirm that running busybox depmod before any module loads creates
> the required modules.dep.bb and resolves the issue. This should happen @
> build time, not initramfs runtime, right?
This sounds reasonable. We can do it when running the genkernel command.
(In reply to Fabio Erculiani from comment #11)
> CONFIG_MODPROBE_SMALL=y (from defaults/busy-config) should save us from
> doing that. Did you change the busybox config?
The busybox documentation states that CONFIG_MODPROBE_SMALL makes module loading slower:
http://git.busybox.net/busybox/tree/modutils/Config.in?h=1_15_stable
The switch to busybox modprobe was done for speed, so it would make sense to go with Ben's suggestion to create modules.dep.bb when generating the initramfs and disable CONFIG_MODPROBE_SMALL. The documentation also states that the code is experimental, which might explain why it is doing something wrong. I am sure that the busybox developers would appreciate a patch to fix that, but it seems to me that the best thing is to create modules.dep.bb.
Created attachment 350256 [details, diff] Experimental patch to fix issue
I wrote a bunch of patches that together appear to fix the issue, but they need testing. I have attached a monolithic patch. I would appreciate it if someone affected by this would test it.
(In reply to Richard Yao from comment #14)
It will break cross compilation even more (which is a good excuse for getting rid of it :)) due to the chroot call. Also, I'd place the code in a separate function. And I don't know if we already depend on fakeroot; if not, with that patch we should. I realize that busybox depmod has broken -b support, but maybe it would be easier to fix it. Or would it be possible to use the modules.dep generated by the kmod/module-init-tools depmod? Is the syntax different?
The syntax is quite different, modules.dep.bb actually has info on each module's provided symbols plus their immediate module deps. Running busybox depmod is pretty costly from my booted desktop (~3.5 seconds, my disk is slow!) but from initramfs, it might not be more than a tiny fraction of a second. Takes only ~0.065s on the same desktop when the stuff is in FS cache. I guess I should add 'time' to busybox so I can find out for sure what the impact would be of generating it @ boot time.
If the cost is not so high, that workaround would be much better than the hackish (no offense!) chroot in gen_initramfs.sh. Surely, the best would be having depmod -b eventually fixed upstream.
I added the 'time' applet to busy-config and built a 'genkernel all' and booted it with "nodetect debug" params to dump right to a shell:
# time busybox depmod
real 0m 0.03s
user 0m 0.02s
sys 0m 0.01s
# modprobe pata_mpiix
ACPI: bus type ATA registered
libata version 3.00 loaded
#
(added the modprobe to make sure the depmod was accomplishing our end goal)
This is on a relatively modern i5-2540m, but I can't imagine the impact is too huge even on much older hardware.
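The runtime approach measured above could look roughly like the following in an initramfs init script. This is a sketch only and not the patch attached to this bug: the function name ensure_moddep is hypothetical, and the assumption that modules.dep.bb lives directly in the module directory for the running kernel is mine.

```shell
# Hypothetical initramfs step: make sure modules.dep.bb exists before the
# first modprobe call, generating it with depmod when it is missing.
ensure_moddep() {
    # $1: module directory, e.g. /lib/modules/$(uname -r) (assumed location)
    # $2: depmod command line; defaults to "busybox depmod" and is only a
    #     parameter so the function can be exercised without busybox
    moddir=$1
    depmod_cmd=${2:-busybox depmod}
    # nothing to do if the dependency file is already present
    [ -f "$moddir/modules.dep.bb" ] && return 0
    # intentionally unquoted so "busybox depmod" splits into command + arg
    $depmod_cmd
}
```

Called as `ensure_moddep "/lib/modules/$(uname -r)"` before any module loading, this costs one depmod run per boot, which the timing above puts at roughly 0.03s on that machine.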
Then I think that we should take that route to deal with the interim.
(In reply to Fabio Erculiani from comment #19)
> Then I think that we should take that route to deal with the interim.
I agree, I'll wait .03s for a proper modules.dep file.
Created attachment 350298 [details, diff] call depmod at runtime to workaround busybox depmod -b bug
(In reply to Fabio Erculiani from comment #21)
> Created attachment 350298 [details, diff]
> call depmod at runtime to workaround busybox depmod -b bug
Mostly works, except it seems -v is not implemented, so this breaks debug mode:
# busybox depmod -v
depmod: invalid option -- 'v'
BusyBox v1.21.0 (2013-02-16 16:13:42) multi-call binary.
Usage: depmod [-qfwrsv] MODULE [symbol=value]...
 -r Remove MODULE (stacks) or do autoclean
 -q Quiet
 -v Verbose
 -f Force
 -w Wait for unload
 -s Report via syslog instead of stderr
#
This added depmod call does resolve the libata/pata_* dependency issue. There is still something going on with pcmcia, but that's another issue, I think.
WTF? depmod --help declares flags that it doesn't implement? Busybox depmod seems to be a bad joke at this point :( Anyway, that can be easily fixed...
It looks like the same --help output is shared between insmod/modprobe/depmod/rmmod/..., which isn't ideal, but that explains it. But yeah, that's easy enough to fix and the patch works great otherwise.
Created attachment 350320 [details, diff] call depmod at runtime to workaround busybox depmod -b bug v2
Tested new patch, success
I hate to add yet another facet to this bug, but-- Even with a proper busybox-depmod-generated modules.dep.bb, I still see module load problems with pcmcia.ko, and there may be others. But if I turn off MODPROBE_SMALL and go back to regular MODPROBE, *everything* seems to fall into place. We no longer need the depmod call, and modules like pcmcia.ko show no failures. I think MODPROBE_SMALL is causing more trouble than it's worth.
> I think MODPROBE_SMALL is causing more trouble than it's worth.
What is the advantage of MODPROBE_SMALL? Is it just a few kB? I'm inclined to agree that it's more trouble than it's worth.
(In reply to Ben Kohler from comment #27)
Did you enable CONFIG_INSMOD, CONFIG_LSMOD and CONFIG_RMMOD? See:
http://git.busybox.net/busybox/tree/modutils/Config.src?h=1_21_stable
In particular, I've just stumbled upon this:
"""
At the first attempt to load a module by alias modprobe will try to generate modules.dep.bb file in order to speed up future loads by alias. Failure to do so (read-only /lib/modules, etc) is not reported, and future modprobes will be slow too.
"""
Now I am even more confused... I didn't expect busybox modutils to be such a mess.
Here are the changes I made. It would probably be a good idea to do a full menuconfig run (based on the existing genkernel/defaults/busy-config) to get a config more properly aligned with current busybox.
 # Linux Module Utilities
 #
 CONFIG_MODINFO=y
-CONFIG_MODPROBE_SMALL=y
-CONFIG_FEATURE_MODPROBE_SMALL_OPTIONS_ON_CMDLINE=y
-CONFIG_FEATURE_MODPROBE_SMALL_CHECK_ALREADY_LOADED=y
-# CONFIG_INSMOD is not set
-# CONFIG_RMMOD is not set
-# CONFIG_LSMOD is not set
+# CONFIG_MODPROBE_SMALL is not set
+# CONFIG_FEATURE_MODPROBE_SMALL_OPTIONS_ON_CMDLINE is not set
+# CONFIG_FEATURE_MODPROBE_SMALL_CHECK_ALREADY_LOADED is not set
+CONFIG_INSMOD=y
+CONFIG_RMMOD=y
+CONFIG_LSMOD=y
 # CONFIG_FEATURE_LSMOD_PRETTY_2_6_OUTPUT is not set
-# CONFIG_MODPROBE is not set
+CONFIG_MODPROBE=y
 # CONFIG_FEATURE_MODPROBE_BLACKLIST is not set
-# CONFIG_DEPMOD is not set
+CONFIG_DEPMOD=y
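To check which variant a given busybox config selects, a grep over the config file is enough. A minimal sketch follows; the helper name uses_modprobe_small is hypothetical, and the path in the usage note is the busy-config location mentioned earlier in this bug.

```shell
# Hypothetical helper: succeed if a busybox config file enables the
# MODPROBE_SMALL implementation, fail if it is unset or absent.
uses_modprobe_small() {
    # $1: path to a busybox config file
    # "# CONFIG_MODPROBE_SMALL is not set" does not match this pattern
    grep -q '^CONFIG_MODPROBE_SMALL=y$' "$1"
}
```

Usage: `uses_modprobe_small /usr/share/genkernel/defaults/busy-config && echo "MODPROBE_SMALL is enabled"`.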
Created attachment 350410 [details, diff] switch away from MODPROBE_SMALL
This disables MODPROBE_SMALL. Can you test it?
Looks good. Builds fine, boots fine with zero module loading problems.
(In reply to Fabio Erculiani from comment #15)
> (In reply to Richard Yao from comment #14)
>
> It will break cross compilation even more (which is a good excuse for
> getting rid of it :)) due to the chroot call.
> Then, I'd place the code in a separate function.
> Then, I don't know if we already depend against fakeroot, if not, with that
> patch we should.
> I realize that busybox depmod has a broken -b support, but maybe it would be
> easier to fix it. Or, would it be possible to use the kmod/module-init-tools
> depmod generated modules.dep? Is the syntax different?
Nice catch. I am in agreement with you and Ben. Until option -b is implemented, the initramfs should invoke depmod at runtime. I will commit a patch to master to resolve this as soon as I have finished testing.
On an unrelated topic, it appears that the switch to busybox modprobe causes a race condition on systems using ZFS, where pool import might be attempted before the modules are initialized. I am looking into fixes for both this bug and the ZFS regression concurrently.
Please read the other comments. Disabling MODPROBE_SMALL is much better than any other solution. If other bugs arose in other areas, then they should be fixed there. Using depmod just because of a race in ZFS is not a solution to either problem.
(In reply to Fabio Erculiani from comment #34)
> Please read the other comments.
> Disabling MODPROBE_SMALL is much better than any other solution.
> If other bugs arose in other areas, then they should be fixed there. Using
> depmod just because of a race in ZFS is not a solution to either problems.
The race involving ZFS is entirely separate from depmod. The only reason I mentioned it is because they are both regressions triggered by the busybox modprobe patch and I am looking into fixes for both of them in parallel.
(In reply to Fabio Erculiani from comment #34)
> Please read the other comments.
> Disabling MODPROBE_SMALL is much better than any other solution.
> If other bugs arose in other areas, then they should be fixed there. Using
> depmod just because of a race in ZFS is not a solution to either problems.
I should probably also state that I had planned to review your MODPROBE_SMALL patch after I wrote the other comment. It did not occur to me that MODPROBE_SMALL was the trigger for this. Anyway, this solution looks much nicer to me. I am all for adopting it after I do some additional testing. :)
(In reply to Fabio Erculiani from comment #31)
> Created attachment 350410 [details, diff]
> switch away from MODPROBE_SMALL
>
> This disables MODPROBE_SMALL.
> Can you test it?
This patch causes problems when loading ZFS. I am debugging the problem now.
(In reply to Richard Yao from comment #37)
> (In reply to Fabio Erculiani from comment #31)
> > Created attachment 350410 [details, diff]
> > switch away from MODPROBE_SMALL
> >
> > This disables MODPROBE_SMALL.
> > Can you test it?
>
> This patch causes problems when loading ZFS. I am debugging the problem now.
To be clear, I mean that the zfs module is not loaded at all with this patch applied. I had to run `modprobe zfs` three times in a debug shell before busybox modprobe would load it.
(In reply to Richard Yao from comment #37)
> (In reply to Fabio Erculiani from comment #31)
> > Created attachment 350410 [details, diff]
> > switch away from MODPROBE_SMALL
> >
> > This disables MODPROBE_SMALL.
> > Can you test it?
>
> This patch causes problems when loading ZFS. I am debugging the problem now.
If I `modprobe -r zavl`, which removes the last module loaded before something went wrong, and then `modprobe zfs`, I see:
# modprobe -r zavl
[ 965.633012] SPL: Unloaded module v0.6.1-1 (DEBUG mode)
# modprobe zfs
[ 974.821043] SPL: Loaded module v0.6.1-1 (DEBUG mode)
modprobe: can't load module zcommon (extra/zcommon/zcommon.ko): No such file or directory
Doing `modprobe zfs` three additional times will make ZFS load:
# modprobe zfs
modprobe: can't load module zunicode (extra/unicode/zunicode.ko): No such file or directory
# modprobe zfs
modprobe: can't load module zfs (extra/zfs/zfs.ko): No such file or directory
# modprobe zfs
[ 1039.751476] ZFS: Loaded module v0.6.1-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
[ 1048.193017] zd0: unknown partition table
The zfs and zpool commands will invoke modprobe if the zfs module is not loaded. This masked the problem when the original patch to switch to the busybox modprobe was applied. I wrote a patch to fix the race condition that I cited earlier, which caused me to uncover the problem.
The failure to load ZFS is clearly a separate problem, despite also being triggered by the busybox modprobe patch. I think Fabio Erculiani's latest patch is okay to merge independently of a fix for the other issue. However, I am inclined to break it into two patches, one by Ben Kohler and one by Fabio Erculiani. I plan to commit it to git master tomorrow unless I catch another problem in my testing.
I have broken Fabio's patch into two separate patches, one by Ben Kohler and another by Fabio Erculiani. Both have been committed to master. I have also written workarounds for the ZFS regressions that were introduced by the switch to busybox modprobe, which have also been committed.
Fixed in 3.4.47, just released. Closing.