Summary: | linux-sources: cached /proc/cpuinfo is invalidated by updated microcode (which can break with glibc-2.20) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Vlad Horko <scjthm> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Miscellaneous <kernel-misc> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | alexander, billie, dflogeras2, flyser42, james05+gentoo, kernel, krinpaus, kripton, lionel-dev, lists, marecki, microcai, necheffa.misc, realnc, rhill, ryao, saintdev, toolchain, uwe, wschlich |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://bugzilla.kernel.org/show_bug.cgi?id=88001 | ||
See Also: |
https://bugzilla.kernel.org/show_bug.cgi?id=88001 https://bugzilla.redhat.com/show_bug.cgi?id=1083716 |
||
Whiteboard: | 3.19 | ||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | 557278 | ||
Bug Blocks: | |||
Attachments: | glibc 2.20 does not respect --enable-lock-elision=no without this patch |
Description
Vlad Horko
2014-11-09 08:29:16 UTC
Output of...

# emerge -pv sys-apps/microcode-ctl sys-apps/microcode-data

As well as `emerge --info` output...

Are required.

Changing subject, since the bug seems to be in the microcode, rather than the loader, and in any case, it's no longer the job of udev to load any kind of firmware, with USE=firmware-loader being obsolete and removed in the latest version.

Rebuilding glibc with --enable-lock-elision=no. Thanks for the rules and regulations, but I am happy now. If anyone else wants their system to boot, has systemd, and finds that they cannot because this crashes quite early on, or god forbid they use a ramfs that has glibc in it, then they should build glibc with --enable-lock-elision=no

(In reply to Samuli Suominen from comment #1)
> Output of...
>
> # emerge -pv sys-apps/microcode-ctl sys-apps/microcode-data
>
> As well as `emerge --info` output...
>
> Are required.

What's your address? I can send you some fries with that.

> Changing subject, since the bug seems to be in the microcode, rather than
> the loader, and in any case, it's no longer the job of an udev to load
> anykind of firmwares, with USE=firmware-loader being obsolete and removed in
> latest version

Kernel at least 3.7 and built with CONFIG_FW_LOADER_USER_HELPER=n and systemd built with USE="-firmware-loader" or systemd at least 217 where the userspace loader has been removed? Just trying to establish if it's the kernel, or the systemd(-udevd) that's failing to load the firmware.

You should be migrating to the kernelspace loader, the userspace loader is obsolete.

Not sure if it has any impact if glibc is 2.20 or not when using the kernelspace loader...

my guess is you're running into this bug:
https://sourceware.org/ml/libc-alpha/2014-11/msg00110.html

which is where the kernel checks functionality, caches the result, then loads the new firmware which changes the functionality, but doesn't update the cache.
glibc then checks cpuinfo (which reports the stale cache) to see which functionality to enable, sees that HLE is available, and tries to use it. then everything falls down.

i.e. it's a bug in the kernel.

(In reply to SpanKY from comment #4)
> my guess is you're running into this bug:
> https://sourceware.org/ml/libc-alpha/2014-11/msg00110.html
>
> which is where the kernel checks functionality, caches the result, then
> loads the new firmware which changes the functionality, but doesn't update
> the cache.
>
> glibc then checks cpuinfo (which reports the stale cache) to see which
> functionality to enable, sees that HLE is available, and tries to use it.
> then everything falls down.
>
> i.e. it's a bug in the kernel.

It might be. It's a vanilla kernel. But, in order for this machine to boot I had to completely remove any code which contained the _xbegin transaction instructions from libpthread. objdump -D on the libpthread was showing that the instruction was still there even after adding --enable-lock-elision=no. I had to make certain that the #define was set to zero. I think that the CPU that I have (a desktop version of Haswell) doesn't even support these instructions. (I think that I might be due a refund from Intel if these were supported on the processor.) I wasn't able to downgrade glibc and didn't want to go down that path, but I believe that the most recent change was the upgrade from glibc 2.19 to glibc 2.20. In order for the systemd based machine to boot, I had to completely remove the code from glibc. Also, I think that the cpu microcode/firmware changes are permanent. I hope that this helps anyone else out there.

disabling microcode update resolved the systemd-udevd invalid op problem.

(In reply to microcai from comment #6)
> disabling microcode update resovled the systemd-udevd invalid op problem.

How did you do that? Does systemd update the microcode or do you have a systemd service that runs microcode_ctl?
systemd does not need microcode_ctl, it loads microcode.ko and microcode.ko loads microcode.dat itself. I just blacklisted microcode.ko

(In reply to SpanKY from comment #4)
> my guess is you're running into this bug:
> https://sourceware.org/ml/libc-alpha/2014-11/msg00110.html
>
> which is where the kernel checks functionality, caches the result, then
> loads the new firmware which changes the functionality, but doesn't update
> the cache.
>
> glibc then checks cpuinfo (which reports the stale cache) to see which
> functionality to enable, sees that HLE is available, and tries to use it.
> then everything falls down.
>
> i.e. it's a bug in the kernel.

If another user was able to reproduce this by stopping the microcode from being downloaded, and I had to remove all tsx instructions, then I would disagree. I think that glibc bypasses the kernel and doesn't check for the instructions that are supported. I don't believe that you need to be running in ring level 0 to do this.

Hi, I'm experiencing a similar issue, although I'm using OpenRC, not systemd. For me, lightdm would fail to start, not the entire init system failing. Looking in dmesg, I would have

traps: lightdm[3009] trap invalid opcode ip:7f4257b5895a sp:7fff24779ce8 error:0 in libpthread-2.20.so[7f4257b4d000+16000]
traps: console-kit-dae[2844] trap invalid opcode ip:7f5d382cdc52 sp:7fff3c837878 error:0 in libpthread-2.20.so[7f5d382c2000+16000]

If I attempted to restart lightdm, dbus would then crash again with an invalid opcode.

traps: dbus-daemon[2592] trap invalid opcode ip:7f20a5f0c95a sp:7fffda2e6478 error:0 in libpthread-2.20.so[7f20a5f01000+16000]

After that, if I restart dbus, consolekit and lightdm, everything works great.

After finding the launchpad bug in comment #1, I did a couple of tests. Disabling the microcode_ctl service and everything started up fine. Moving the microcode_ctl service to boot instead of default runlevel and again, everything started fine.
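The trap lines quoted above carry enough information to tell whether different daemons die at the same instruction: subtracting the mapping base (the first number in the square brackets) from `ip` gives the offset inside libpthread. A minimal Python sketch of that arithmetic follows; the regular expression and the helper name `fault_offset` are illustrative assumptions, not part of any tool mentioned in this thread.

```python
import re

# Matches the body of a kernel "trap invalid opcode" line as pasted in this bug.
# The pattern is an assumption derived from those pasted lines only.
TRAP_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\] trap invalid opcode "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:\d+ "
    r"in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (command, library, offset of the faulting ip inside the library).

    Returns None when the line is not an invalid-opcode trap line.
    """
    match = TRAP_RE.search(line)
    if match is None:
        return None
    offset = int(match.group("ip"), 16) - int(match.group("base"), 16)
    return match.group("comm"), match.group("lib"), offset


if __name__ == "__main__":
    lines = [
        "traps: lightdm[3009] trap invalid opcode ip:7f4257b5895a "
        "sp:7fff24779ce8 error:0 in libpthread-2.20.so[7f4257b4d000+16000]",
        "traps: dbus-daemon[2592] trap invalid opcode ip:7f20a5f0c95a "
        "sp:7fffda2e6478 error:0 in libpthread-2.20.so[7f20a5f01000+16000]",
    ]
    for line in lines:
        comm, lib, offset = fault_offset(line)
        print(f"{comm}: {lib}+{offset:#x}")
```

For the lightdm and dbus-daemon lines the offset comes out identical, which is consistent with both processes hitting the same instruction in the same libpthread build. The offset could then be looked up in a disassembly of the library, as was done with objdump earlier in this thread.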
What I suspect is going on is each of these binaries is starting up, initializing libpthread for its threads. libpthread's CPU dispatcher checks if the CPU supports HLE (this is done by calling the CPUID instruction). It determines that it does support HLE and uses the HLE codepath for locking. Now, microcode_ctl starts and updates the microcode, disabling the HLE instruction. libpthread goes to lock something (using HLE) and crashes, because that instruction is no longer supported. After restarting the service, libpthread again checks CPUID and now determines that HLE is not supported, so it doesn't attempt to use the HLE codepath.

In the case of systemd most likely a similar issue is happening. systemd starts, libpthread sees HLE support, microcode is loaded by systemd, libpthread attempts to lock, systemd crashes.

*** Bug 531026 has been marked as a duplicate of this bug. ***

Since this seems to be the parent bug now, could the summary be adjusted to be something that makes sense? Bug 531528 or Bug 531026 seem to have the best summaries of what is actually happening. Also, what needs to be done to get the patch in Bug 531528 included in portage so this stops breaking systems?

(In reply to Nathan Caldwell from comment #12)

since the bug isn't in glibc, i have no plans to include patches for it. not sure the patch is even correct, although i haven't looked too closely at it and comparing working-vs-broken hardware.

The upstream tracker bug contains a patch that seems to have made it to the latest kernels.
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fb86b97300d930b57471068720c52bfa8622eab7
However it seems the stable kernels haven't picked it up yet (at least 3.14 seems to not contain this patch), so maybe worth applying this patch to our gentoo-sources? I have cc'd the kernel@ team for that.

IIUC that patch just covers a corner case for resuming from suspend. It doesn't solve the root problem.
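The suspected sequence (a process checks for lock elision once at startup, the microcode update removes the instruction, and a later lock attempt faults) can be modeled with a toy simulation. This is only a sketch of the reasoning in this thread, not of glibc or kernel code: the classes `ToyCpu` and `ToyProcess`, the feature name `"tsx"`, and the `IllegalInstruction` exception are all hypothetical stand-ins.

```python
class IllegalInstruction(Exception):
    """Stands in for the invalid opcode trap (SIGILL) seen in dmesg."""


class ToyCpu:
    """A CPU whose feature set can shrink when microcode is loaded."""

    def __init__(self):
        self.features = {"tsx"}

    def load_microcode(self):
        # Models the TSX-disabling microcode update discussed in this bug.
        self.features.discard("tsx")

    def execute(self, insn):
        if insn == "xbegin" and "tsx" not in self.features:
            raise IllegalInstruction(insn)
        return "elided" if insn == "xbegin" else "fallback"


class ToyProcess:
    """A process that decides once, at startup, whether to use elision."""

    def __init__(self, cpu):
        self.cpu = cpu
        # The check runs a single time; the answer is never refreshed.
        self.use_elision = "tsx" in cpu.features

    def lock(self):
        insn = "xbegin" if self.use_elision else "plain_lock"
        return self.cpu.execute(insn)


if __name__ == "__main__":
    cpu = ToyCpu()
    early = ToyProcess(cpu)      # started before the microcode update
    print(early.lock())          # works: the instruction still exists
    cpu.load_microcode()
    try:
        early.lock()             # stale decision, instruction is gone
    except IllegalInstruction as exc:
        print("crash on", exc)
    late = ToyProcess(cpu)       # restarted after the update
    print(late.lock())           # takes the fallback path
```

In this model, restarting the process after the update is what clears the stale decision, which matches the observation that restarting dbus, consolekit and lightdm made things work again.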
If we're not going to patch glibc or the kernel then we need to start loading microcode earlier in the boot process. To do that we would need to enable MICROCODE_EARLY in the kernel, which requires MICROCODE to be built-in (not a module) and BLK_DEV_INITRD. Every Haswell system would then need to use an initrd.

On my system it wasn't enough to take microcode_ctl out of the runlevels. It seems udev happily loads it anyways. I had to uninstall it completely.

This sucks. I really hope there's a kernel/glibc solution.

the elision code, iirc, ran into issues in older versions independent of the TSX microcode issue. i can add --enable-lock-elision=no to <=glibc-2.20 to keep stable limping along, but that doesn't help with newer versions where the code is doing the right thing and the microcode/kernel handling is still broken.

can you verify that flag helps w/glibc-2.20 ? the configure code seems to already disable things by default:

AC_ARG_ENABLE([lock-elision],
              AC_HELP_STRING([--enable-lock-elision[=yes/no]],
                             [Enable lock elision for pthread mutexes by default]),
              [enable_lock_elision=$enableval],
              [enable_lock_elision=no])

so i can't see how --enable-lock-elision=no is any different from not setting it at all -- enable_lock_elision is getting set to no either way.

*** Bug 544498 has been marked as a duplicate of this bug. ***

We just upgraded one of our servers to glibc-2.20-r2 and started to get similar errors. We're pretty sure the glibc upgrade triggered this : zabbix polls our ceph cluster every 20s and according to emerge.log and our kernel logs we saw the first invalid opcode message less than 10s after the glibc was installed.
Mar 27 22:44:10 virtcluster-02-a kernel: traps: ceph[4275] trap invalid opcode ip:7f7e0503e492 sp:7f7dfe11c218 error:0 in libpthread-2.20.so[7f7e05032000+16000]

We run the latest stable gentoo-sources (3.18.9) and tried several configurations without success from a base configuration where microcode was loaded by initramfs :
* loading the latest (~20150121) microcode-data manually,
* loading microcode update early,
* disabling microcode update support in the kernel.

Nothing worked. Fortunately the build chain still works so I could upgrade/downgrade (unfortunately downgrading glibc isn't allowed).

Is there any safe way to fix this short of a full chroot reinstall with package.mask >=glibc-2.20? I found a Gentoo glibc downgrade guide on the wiki but it doesn't really seem safe...

(In reply to Lionel Bouton from comment #18)

uninstall microcode packages

(In reply to SpanKY from comment #19)
> (In reply to Lionel Bouton from comment #18)
>
> uninstall microcode packages

I did, they weren't installed in the first place.

(In reply to Lionel Bouton from comment #20)
> (In reply to SpanKY from comment #19)
> > (In reply to Lionel Bouton from comment #18)
> >
> > uninstall microcode packages
>
> I did, they weren't installed in the first place.

By the way I was under the impression that microcode_ctl was using the kernel interface, so I don't see how they could work without kernel support :
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/5.6_Technical_Notes/microcode_ctl.html

If there is a way for microcode_ctl to work without kernel support I'll have to test this later : the server is still usable and used in a Ceph cluster under heavy rebalancing right now (additional load would bring it to its knees).
(In reply to Lionel Bouton from comment #21)
> (In reply to Lionel Bouton from comment #20)
> > (In reply to SpanKY from comment #19)
> > > (In reply to Lionel Bouton from comment #18)
> > >
> > > uninstall microcode packages
> >
> > I did, they weren't installed in the first place.
>
> By the way I was under the impression that microcode_ctl was using the
> kernel interface, so I don't see how they could work without kernel support :
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/5.6_Technical_Notes/microcode_ctl.html
>
> If there is a way for microcode_ctl to work without kernel support I'll have
> to test this later : the server is still usable and used in a Ceph cluster
> under heavy rebalancing right now (additional load would bring it to its
> knees).

I don't see how it would still cause issues without the microcode update in the kernel. You can try using epatch_user to apply glibc-2.20-blacklist_HLERTM_Haswell.patch from Bug 531520. That way glibc never sees the instruction available even if it is loaded before the microcode.

>
> I don't see how it would still cause issues without the microcode update in
> the kernel. You can try using epatch_user to apply
> glibc-2.20-blacklist_HLERTM_Haswell.patch from Bug 531520.

Bug 531528, sorry.

(In reply to Nathan Caldwell from comment #23)
> >
> > I don't see how it would still cause issues without the microcode update in
> > the kernel. You can try using epatch_user to apply
> > glibc-2.20-blacklist_HLERTM_Haswell.patch from Bug 531520.
>
> Bug 531528, sorry.

Unfortunately the traps are still here (I didn't reboot yet but glibc is installed with the patch and if I understand the problem correctly it should have been fixed).

Reviewing the patch, it only deals with processor models 60, 63, 69 and 70. Our processor is a Xeon E5-2650, family 6, model 45, stepping 7. Are models missing from the patch or is there a separate bug?
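Whether a given CPU falls under the models named above can be checked by decoding its signature, the value the kernel prints as `sig=` in its microcode messages. The sketch below uses the standard x86 layout of that value (stepping in bits 0-3, model in bits 4-7, family in bits 8-11, extended model in bits 16-19, extended family in bits 20-27). The model set is taken only from the description of the patch in this thread, and the function names are illustrative, not from the patch itself.

```python
# Models the blacklist patch is said (in this thread) to cover, family 6 only.
PATCHED_MODELS = {60, 63, 69, 70}


def decode_signature(sig):
    """Decode an x86 CPUID signature into (family, model, stepping)."""
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF
    # The extended model only counts for families 6 and 15.
    if family in (0x6, 0xF):
        model |= ext_model << 4
    # The extended family only counts for family 15.
    if family == 0xF:
        family += ext_family
    return family, model, stepping


def covered_by_patch(sig):
    """True if the signature is a family 6 CPU with a model in PATCHED_MODELS."""
    family, model, _ = decode_signature(sig)
    return family == 6 and model in PATCHED_MODELS


if __name__ == "__main__":
    # 0x306c3 is a Haswell desktop signature; 0x206d7 is constructed here from
    # "family 6, model 45, stepping 7" and is an assumption, not a logged value.
    for sig in (0x306C3, 0x206D7):
        print(hex(sig), decode_signature(sig), covered_by_patch(sig))
```

Under these assumptions a family 6, model 45 part is not in the set, which agrees with the observation that the patch does not touch that processor.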
(In reply to Lionel Bouton from comment #24)

sounds like a different issue. file a different bug please and we'll triage it there.

*** Bug 546666 has been marked as a duplicate of this bug. ***

*** Bug 547166 has been marked as a duplicate of this bug. ***

*** Bug 547164 has been marked as a duplicate of this bug. ***

Meanwhile, sys-apps/microcode-data-20150121 - which includes the TSX-disabling erratum - has been marked stable on both amd64 and x86 lately. In other words, as of now ANY Haswell box running Gentoo + "emerge -uD @world" + reboot or microcode reload = unusable system. Fun.

Sorry for the delay, I'll grab the patch for gentoo-sources and start rolling out releases.

Thanks, Markos. For the record, the referenced patch from comment #14 is already in the following kernels:
>= v3.18
>= v3.19
v4.0

If 3.18 and newer already contain this patch then unfortunately it does NOT fix the libpthread problem - the Haswell system I managed to kill with TSX-disabling microcode last Saturday runs the latest stable version of hardened-sources i.e. 3.18. Looks like what I've read in the Debian bug report on the matter is indeed correct - glibc identifies CPU capabilities without involving the Linux kernel, so patching the problem with /proc/cpuinfo does not help.

Given SpanKY seems to dislike the idea of patching glibc to blacklist Haswell from hardware lock elision and that as a result of the erratum no existing CPUs actually support it, looks like we should indeed disable it at build time (assuming it is really possible in 2.20; apparently in 2.19 the relevant configure option failed to fully disable HLE) and hope that by the time CPUs with working TSX reach the market the blacklist patch will have made it upstream.

I too am running 3.18.11 (stable). I checked my local 3.18.11 source tree and the file from the patch: /usr/src/linux/arch/x86/kernel/cpu/microcode/core.c does not seem to have the added lines from said patch.

I hit this as well.
Spent my time on restoring my system from a two-month-old backup and some additional work before I stumbled upon some invalid opcode errors in my dmesg that I could finally Google, and that led me here. It's been fun to say the least. I too run a 3.18.11 kernel, with genpatches (and others), and I hit this nevertheless. Have now a kernel with early microcode update support, and a dracut initramfs with the updated microcode embedded. So 'tis all good now. But I, as others above, would of course wish for smoother procedures.

(In reply to Tamas Jantvik from comment #34)
> I hit this as well. Spent my time on restoring my system

Exact same story here. I've done a world update last friday and got an unbootable laptop. Since i was upgrading both glibc and gcc (4.8.4 required by glibc), i thought that was a build problem. I've solved it only today (tuesday, after much sleep deprivation) by just masking the latest microcode-data and going back to microcode-data-20140430. Maybe =sys-apps/microcode-data-20150121 should NOT be marked as stable yet ?? Personally, i would prefer a system that MAY, in some remote cases, hit a subtle cpu bug, rather than a system that won't boot, period.

Does this patch from fedora fix this issue?
http://pkgs.fedoraproject.org/cgit/kernel.git/commit/?h=f21&id=a357c223627a0fe704e38e5c4eb4aa39869cd114

-CONFIG_MICROCODE=m
+CONFIG_MICROCODE=y
+CONFIG_MICROCODE_EARLY=y
 CONFIG_MICROCODE_INTEL=y
 CONFIG_MICROCODE_INTEL_EARLY=y
 CONFIG_MICROCODE_AMD=y
+CONFIG_MICROCODE_AMD_EARLY=y

Or does this have to happen in conjunction with the glibc blacklist patch that debian and I think fedora is using?

That's the early microcode loading I was talking about in comment #15. It requires people set up and use a suitably configured initrd so it's not really a solution for us. I agree that sys-apps/microcode-data-20150121 should be masked until a solution is in place.

My system also fell victim to this problem.
Created attachment 408770 [details, diff]
glibc 2.20 does not respect --enable-lock-elision=no without this patch

My system was affected too, but I had it limp along for lack of time to debug. At the moment, I have rebuilt sys-libs/glibc-2.20-r2 with the following patch and --enable-lock-elision=no. Both are necessary to prevent glibc from emitting the elision instructions. Without it, udevd is killed by SIGILL:

Core was generated by `/sbin/udevd --daemon --debug'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00007ff45b05da5a in _xbegin () at ../sysdeps/unix/sysv/linux/x86/hle.h:53
53        asm volatile (".byte 0xc7,0xf8 ; .long 0" : "+a" (ret) :: "memory");
(gdb) bt
#0  0x00007ff45b05da5a in _xbegin () at ../sysdeps/unix/sysv/linux/x86/hle.h:53
#1  __GI___pthread_rwlock_rdlock (rwlock=0x7ff45bc43b20 <__libc_setlocale_lock>) at pthread_rwlock_rdlock.c:106
#2  0x00007ff45b8d9965 in __dcigettext (domainname=0x7ff45ba0d3d1 <_libc_intl_domainname> "libc", msgid1=0x7ff45ba0d7d6 "No such file or directory", msgid2=msgid2@entry=0x0, plural=plural@entry=0, n=n@entry=0, category=category@entry=5) at dcigettext.c:453
#3  0x00007ff45b8d834f in __GI___dcgettext (domainname=<optimized out>, msgid=<optimized out>, category=category@entry=5) at dcgettext.c:52
#4  0x00007ff45b92c63e in __GI___strerror_r (errnum=errnum@entry=2, buf=buf@entry=0x0, buflen=buflen@entry=0) at _strerror.c:71
#5  0x00007ff45b92c56f in strerror (errnum=errnum@entry=2) at strerror.c:32
#6  0x00007ff45bc51730 in kmod_module_get_initstate (mod=<optimized out>) at /usr/src/debug/sys-apps/kmod-20/kmod-20/libkmod/libkmod-module.c:1743
#7  0x00007ff45bc51cf5 in module_is_inkernel (mod=0x1e82c10) at /usr/src/debug/sys-apps/kmod-20/kmod-20/libkmod/libkmod-module.c:128
#8  kmod_module_probe_insert_module (mod=mod@entry=0x1e82c10, flags=flags@entry=131072, extra_options=extra_options@entry=0x0, run_install=run_install@entry=0x0, data=data@entry=0x0, print_action=print_action@entry=0x0) at /usr/src/debug/sys-apps/kmod-20/kmod-20/libkmod/libkmod-module.c:1252
#9  0x0000000000416dc1 in load_module (udev=<optimized out>, alias=<optimized out>) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udev-builtin-kmod.c:51
#10 builtin_kmod (dev=<optimized out>, argc=<optimized out>, argv=0x7fffbe4eb1e0, test=<optimized out>) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udev-builtin-kmod.c:84
#11 0x0000000000411f59 in udev_builtin_run (dev=0x1e80c20, cmd=cmd@entry=UDEV_BUILTIN_KMOD, command=command@entry=0x7fffbe4eba40 "kmod load cpu:type:x86,ven0000fam0006mod003C:feature:,0000,0001,0002,0003,0004,0005,0006,0007,0008,0009,000B,000C,000D,000E,000F,0010,0011,0013,0015,0016,0017,0018,0019,001A,001B,001C,001D,001F,002B,0"..., test=test@entry=false) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udev-builtin.c:122
#12 0x0000000000409fb5 in udev_event_execute_run (event=event@entry=0x1e82560, timeout_usec=180000000, timeout_warn_usec=60000000, sigmask=sigmask@entry=0x6477a0 <sigmask_orig>) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udev-event.c:917
#13 0x0000000000404b7b in worker_spawn (event=event@entry=0x1ed7ef0) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udevd.c:344
#14 0x000000000040788b in event_run (event=0x1ed7ef0) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udevd.c:473
#15 event_queue_start (udev=0x1e7b010) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udevd.c:601
#16 main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udevd.c:1502

Anyone who wants to get their own backtrace will want to rebuild glibc, kmod and udev with debuginfo using -ggdb in CFLAGS and FEATURES=splitdebug, set something like kernel.core_pattern=/var/crash/core-%e-%s-%u-%g-%p-%t and kernel.core_uses_pid=1 in /etc/sysctl.conf, disable udev from starting at boot and start it manually after boot.
Magic Sysreq+R is needed to switch from X to a VT because otherwise X has control of the keyboard.

That said, I cannot find the code that SpanKY mentions in comment #4. I found code in sysdeps/x86_64/multiarch/test-multiarch.c for checking avx, fma4, sse4_2, sse4_1, ssse3 and popcnt. I also found code in sysdeps/unix/sysv/linux/getsysstats.c for finding the number of CPUs and sysdeps/unix/sysv/linux/getsysstats.c for finding the clock speed. If there is code for doing what SpanKY suggests, either it is in newer versions of glibc or I missed it. Consequently, I do not believe that fixing the kernel to stop reporting hle will fix things for glibc.

I think that it is possible to add logic to the exception handler for exception 0x06 (Invalid Opcode) to make userland enter the fallback path, print a warning to dmesg and resume execution. I could see myself writing a kernel patch to do that. Due to my time being tight (especially before LinuxCon), I cannot promise that I will be able to write it.

The relevant files for this code path are:

nptl/pthread_rwlock_wrlock.c
nptl/pthread_rwlock_rdlock.c
sysdeps/x86/elide.h
sysdeps/unix/sysv/linux/x86/hle.h

Presumably, there would be a flag variable set by a check elsewhere, but I see no such thing. If someone else sees one, please reply saying where it is.

i've added the iucode-tool package to the tree (bug 509742) and extended the microcode-data package with a USE=initramfs flag. this way you can quickly build the minimal initramfs needed in order to load the microcode at boot.
to resolve the issue on your system:
(1) get the latest kernel (linux-3.9+)
(2) enable CONFIG_BLK_DEV_INITRD & CONFIG_MICROCODE & CONFIG_MICROCODE_EARLY in your kernel (and the intel option obviously)
(3) install the microcode-data package with USE=initramfs
(4) set CONFIG_INITRAMFS_SOURCE to /lib/firmware/microcode.cpio
(5) boot that kernel and it should load the microcode before running userspace

i filed bug 557278 to integrate this logic into genkernel itself.

if you don't want to do any of that, then simply stop installing microcode updates. unmerge the microcode-data package from your system and/or disable CONFIG_MICROCODE.

otherwise there's really nothing that can be done in userspace. you're trying to take a running process where the CPU supports an insn, then disable that insn on the fly, then expect that all the processes will suddenly stop trying to use that feature. that's why long running daemons crash -- when they launched, the insn was supported, but when they try to use it later, it isn't.
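The kernel options from steps (2) and (4) above can be checked mechanically against a kernel `.config` file. The following Python sketch does that for a config passed in as text. It assumes that "the intel option" means CONFIG_MICROCODE_INTEL (the name that appears in the Fedora diff quoted earlier in this thread), and the function names are hypothetical.

```python
# Options that steps (2) and (4) ask for. CONFIG_MICROCODE_INTEL is an
# assumption about what "the intel option" refers to.
BUILT_IN = (
    "CONFIG_BLK_DEV_INITRD",
    "CONFIG_MICROCODE",
    "CONFIG_MICROCODE_EARLY",
    "CONFIG_MICROCODE_INTEL",
)
EXPECTED_INITRAMFS = "/lib/firmware/microcode.cpio"


def parse_kernel_config(text):
    """Turn the text of a kernel .config into a {option: value} dict.

    Comment lines such as "# CONFIG_FOO is not set" are skipped, so unset
    options are simply absent from the result.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key] = value.strip('"')
    return options


def early_microcode_problems(text):
    """Return a list of human-readable problems; empty means the config fits."""
    options = parse_kernel_config(text)
    problems = [
        f"{name} is not built in (=y)"
        for name in BUILT_IN
        if options.get(name) != "y"
    ]
    if options.get("CONFIG_INITRAMFS_SOURCE") != EXPECTED_INITRAMFS:
        problems.append(f"CONFIG_INITRAMFS_SOURCE is not {EXPECTED_INITRAMFS}")
    return problems


if __name__ == "__main__":
    import sys

    # Usage (path is an example): python check_config.py /usr/src/linux/.config
    with open(sys.argv[1]) as handle:
        for problem in early_microcode_problems(handle.read()):
            print(problem)
```

The check insists on `=y` rather than `=m` because, as noted earlier in the thread, early loading requires the microcode driver to be built in rather than a module.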
i've also updated the microcode-ctl package to not include an init script since updating on the fly is dangerous http://gitweb.gentoo.org/repo/gentoo.git/commit/?id=719cc5ef240b766953ddbe1e7a6593f8091eed12 Just to add to this, from studying the logs, it looks like udev is loading the microcode: Aug 10 21:28:51 desktop kernel: udevd[4140]: starting version 3.1.2 Aug 10 21:28:51 desktop kernel: udevd[4176]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1a.0/usb1': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4177]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4178]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb4': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4179]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1d.0/usb2': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4181]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb4/4-0:1.0': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4180]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-0:1.0': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4183]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1a.0/usb1/1-0:1.0': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4172]: Error changing net interface name 'eth0' to 'eth1': Device or resource busy Aug 10 21:28:51 desktop kernel: udevd[4184]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1c.6/0000:05:00.0/usb5': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4185]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-13': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4187]: failed to execute 
'/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1a.0/usb1/1-1': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4189]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-9': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4190]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1c.6/0000:05:00.0/usb6': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4188]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-1': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4192]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1:1.0': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4193]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4191]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-13/3-13:1.0': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4195]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1c.6/0000:05:00.0/usb5/5-0:1.0': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4197]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1c.6/0000:05:00.0/usb6/6-0:1.0': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4199]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-1/3-1:1.1': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4200]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1d.0/usb2/2-1': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4201]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.2': No such file or directory Aug 10 21:28:51 desktop kernel: udevd[4194]: failed to execute 
'/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-9/3-9:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4203]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4204]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4198]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1d.0/usb2/2-0:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4207]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4196]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-1/3-1:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4211]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.2': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4202]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-13/3-13.1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4209]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4210]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4206]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.2/3-14.2:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4212]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.2/3-14.1.2:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4213]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.2/3-14.1.2:1.1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4215]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.1/3-14.1.1:1.1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4216]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-13/3-13.1/3-13.1:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4218]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.1/3-14.1.1:1.3': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4217]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.1/3-14.1.1:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4214]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.1/3-14.1.1:1.2': No such file or directory
Aug 10 21:28:51 desktop kernel: input: PC Speaker as /devices/platform/pcspkr/input/input9
Aug 10 21:28:51 desktop kernel: microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU0 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU1 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU2 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU3 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU4 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU4 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-k
Aug 10 21:28:51 desktop kernel: e1000e: Copyright(c) 1999 - 2014 Intel Corporation.
Aug 10 21:28:51 desktop kernel: e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Aug 10 21:28:51 desktop kernel: microcode: CPU4 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU5 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU5 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU5 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU6 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU6 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU6 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU7 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU7 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU7 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
Aug 10 21:28:51 desktop kernel: traps: udevd[4156] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4154] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4153] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4151] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4157] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4149] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: Linux video capture interface: v2.00
Aug 10 21:28:51 desktop kernel: traps: udevd[4159] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4171] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4166] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4170] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]

My guess is that it loads the module(s) and then the new in-kernel microcode loading infrastructure loads it. Then the microcode changes under udev.

The OpenRC script is not the cause of this problem. It hasn't even run by the time udev has segfaulted.

A minor correction. That is sigilled, not segfaulted.

(In reply to SpanKY from comment #41)
> (3) install the microcode-data package with USE=initramfs

The dependency to sys-apps/iucode_tool is missing:

Calculating dependencies... done!
[ebuild R ~] sys-apps/microcode-data-20150121-r1::gentoo USE="initramfs* split-ucode -monolithic" 0 KiB
Total: 1 package (1 reinstall), Size of downloads: 0 KiB
Would you like to merge these packages? [Yes/No]
>>> Verifying ebuild manifests
>>> Emerging (1 of 1) sys-apps/microcode-data-20150121-r1::gentoo
 * microcode-20150121.tgz SHA256 SHA512 WHIRLPOOL size ;-) ... [ ok ]
>>> Unpacking source...
>>> Unpacking microcode-20150121.tgz to /var/tmp/portage/sys-apps/microcode-data-20150121-r1/work
>>> Source unpacked in /var/tmp/portage/sys-apps/microcode-data-20150121-r1/work
>>> Preparing source in /var/tmp/portage/sys-apps/microcode-data-20150121-r1/work ...
>>> Source prepared.
>>> Configuring source in /var/tmp/portage/sys-apps/microcode-data-20150121-r1/work ...
>>> Source configured.
>>> Compiling source in /var/tmp/portage/sys-apps/microcode-data-20150121-r1/work ...
/var/tmp/portage/sys-apps/microcode-data-20150121-r1/temp/environment: line 741: iucode_tool: command not found

An init script still seems necessary: upon resume from sleep (I believe hibernation is the same as sleep), the microcode update is lost and has to be reloaded.

Without an init script, how should this be done?

(In reply to LiuCougar from comment #46)
> An init script still seems necessary: upon resume from sleep (I believe
> hibernation is the same as sleep), the microcode update is lost and
> has to be reloaded.
>
> Without an init script, how should this be done?

I was wondering this too, and found this:

"The cached microcode patch is applied when CPUs resume from a sleep state."

from https://www.kernel.org/doc/Documentation/x86/early-microcode.txt

I understand this would apply when resuming from suspend to RAM. With hibernation (suspend to disk), the kernel will boot again, presumably loading the initramfs.
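For anyone who wants to verify this on their own machine: x86 kernels report the running revision in the `microcode` field of /proc/cpuinfo, so comparing that field before suspend and after resume should show whether the update survived. A minimal sketch in Python, not a tested tool; the function name `microcode_revisions` is made up for illustration, and it assumes the usual "key : value" layout of /proc/cpuinfo:

```python
def microcode_revisions(cpuinfo_text):
    """Map each logical CPU number to its reported microcode revision.

    Expects text in the format of /proc/cpuinfo on x86, where every
    CPU block has a "processor" line and a "microcode" line.
    """
    revisions = {}
    current_cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPU blocks
        key, value = key.strip(), value.strip()
        if key == "processor":
            current_cpu = int(value)
        elif key == "microcode" and current_cpu is not None:
            # The kernel prints the revision in hex, e.g. "0x1c".
            revisions[current_cpu] = int(value, 16)
    return revisions
```

Calling it on the contents of /proc/cpuinfo once before suspending and once after resuming, and comparing the two dictionaries, would show any CPU that fell back to the older revision (0x19 in the log above) instead of staying at 0x1c.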
(In reply to SpanKY from comment #41)
> (4) set CONFIG_INITRAMFS_SOURCE to /lib/firmware/microcode.cpio

This doesn't work for me, since that is already set to something else: "/usr/share/v86d/initramfs" for the uvesafb driver. If I set that option to:

/usr/share/v86d/initramfs /lib/firmware/microcode.cpio

then nothing works. I get no console (uvesafb doesn't work) and no microcode update (microcode.cpio is not included in the early initramfs).

The first file (/usr/share/v86d/initramfs) is an auto-generated text file and contains:

dir /dev 0755 0 0
nod /dev/console 0600 0 0 c 5 1
nod /dev/tty1 0600 0 0 c 4 1
nod /dev/zero 0600 0 0 c 1 5
nod /dev/mem 0600 0 0 c 1 1
dir /root 0700 0 0
dir /sbin 0755 0 0
file /sbin/v86d /sbin/v86d 0755 0 0

So... what do we do?

OK, found a solution. You can add that to grub instead. For grub 2, it's important (at least here) to list it first, before your normal initrd image. For example:

menuentry 'Gentoo' {
    ...
    initrd /lib/firmware/microcode.cpio /boot/initrd
}

If you add it after any other initrd image, it doesn't work. I think that for grub legacy, you can do this instead:

initrd /lib/firmware/microcode.cpio
initrd /boot/initrd

(In reply to Nikos Chantziaras from comment #49)
You can just concatenate these files. cpio archive with microcode must be the first.

(In reply to Alexander Tsoy from comment #50)
> You can just concatenate these files. cpio archive with microcode must be
> the first.

I could, but that would break future microcode updates. When the package is updated, the system would continue to use the old microcode. So the only maintainable solution is to let the bootmanager concatenate them.

(In reply to jannis from comment #45)
sorry, forgot to add the package to DEPEND
http://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7b0f83daffd2d69390f6de2bff262e439a582861

(In reply to LiuCougar from comment #46)
that's why the kernel saves & restores it itself.
trying to do it in userland is still unsafe for the same reasons i detailed.

(In reply to Nikos Chantziaras from comment #48)
sorry, but that's something you'll have to manage yourself.

(In reply to Nikos Chantziaras from comment #49)
> OK, found a solution. You can add that to grub instead. For grub 2, it's
> important (at least here) to list it first, before your normal initrd image.
> For example:
>
> menuentry 'Gentoo' {
> ...
> initrd /lib/firmware/microcode.cpio /boot/initrd
> }
>
> If you add it after any other initrd image, it doesn't work.

Is there a known way to make grub2-mkconfig do this automagically?

(In reply to SpanKY from comment #41)
>
> to resolve the issue on your system:
> (1) get the latest kernel (linux-3.9+)
> (2) enable CONFIG_BLK_DEV_INITRD & CONFIG_MICROCODE & CONFIG_MICROCODE_EARLY
> in your kernel (and the intel option obviously)
> (3) install the microcode-data package with USE=initramfs
> (4) set CONFIG_INITRAMFS_SOURCE to /lib/firmware/microcode.cpio
> (5) boot that kernel and it should load the microcode before running
> userspace

Am I missing something obvious? I just receive a kernel panic (rootfs not found) when including the INITRAMFS_SOURCE.

Kernel version is gentoo-sources-4.0.5.

(In reply to Daniel Pielmeier from comment #54)
> (In reply to SpanKY from comment #41)
> >
> > to resolve the issue on your system:
> > (1) get the latest kernel (linux-3.9+)
> > (2) enable CONFIG_BLK_DEV_INITRD & CONFIG_MICROCODE & CONFIG_MICROCODE_EARLY
> > in your kernel (and the intel option obviously)
> > (3) install the microcode-data package with USE=initramfs
> > (4) set CONFIG_INITRAMFS_SOURCE to /lib/firmware/microcode.cpio
> > (5) boot that kernel and it should load the microcode before running
> > userspace
>
> Am I missing something obvious? I just receive a kernel panic (rootfs not
> found) when including the INITRAMFS_SOURCE.
>
> Kernel version is gentoo-sources-4.0.5.
I had the same problem, which was due to not having an initramfs (my previous kernel configuration did not have CONFIG_BLK_DEV_INITRD set). When I followed the 5 steps, the new kernel panicked at boot.

Problem explanation:

1st problem: Grub2 changes its behaviour because CONFIG_INITRAMFS_SOURCE is now set and not empty: it no longer generates a "root=/dev/sda5" option (fallback behaviour due to the absence of an initramfs) but instead writes "root=UUID=<uuid of sda5>", which does not work without an initramfs. Uncommenting the line "#GRUB_DISABLE_LINUX_UUID=true" in /etc/default/grub restores "root=/dev/sda5", but this only changes the panic error from root device not found to root device cannot be mounted with error -2 (ENOENT).

2nd problem: When CONFIG_BLK_DEV_INITRD is not set, the kernel creates the dirs /dev/ and /root/ and the char device /dev/console in a ramfs early on, then mounts the root fs in /root/.
When CONFIG_BLK_DEV_INITRD is set and CONFIG_INITRAMFS_SOURCE is empty, the build process creates an internal initramfs which includes these dirs and char device (see scripts/gen_initramfs_list.sh -d).
However, when CONFIG_INITRAMFS_SOURCE is not empty, the build process uses only the content of this option. If you set CONFIG_INITRAMFS_SOURCE to /lib/firmware/microcode.cpio and you don't have another initramfs which includes these 3 elements, the kernel fails to write to the console (error "Warning: unable to open an initial console") and cannot mount the rootfs in /root because it does not exist.

My solution: If you don't use an initramfs, you have to follow all steps except the 4th one. Instead, leave CONFIG_INITRAMFS_SOURCE empty and ask your bootloader to load /lib/firmware/microcode.cpio as the initrd. Hence the kernel includes the internal initramfs with /dev/, /dev/console and /root/, which is unpacked first, then the /lib/firmware/microcode.cpio loaded by the bootloader is unpacked. As a result the microcode is updated early and the system boots.
You don't even have to change /etc/default/grub. I hope this can help.

(In reply to Tobias Klausmann from comment #53)
> (In reply to Nikos Chantziaras from comment #49)
> > OK, found a solution. You can add that to grub instead. For grub 2, it's
> > important (at least here) to list it first, before your normal initrd image.
> > For example:
> >
> > menuentry 'Gentoo' {
> > ...
> > initrd /lib/firmware/microcode.cpio /boot/initrd
> > }
> >
> > If you add it after any other initrd image, it doesn't work.
>
> Is there a known way to make grub2-mkconfig do this automagically?

It would be great if grub2-mkconfig also added a line "initrd /lib/firmware/microcode.cpio" when no other initrd is used (see my comment #55).

(In reply to Cédric Delmas from comment #55)
> 2nd problem: When CONFIG_BLK_DEV_INITRD is not set, the kernel creates the
> dirs /dev/ and /root/ and the char device /dev/console in a ramfs early on,
> then mounts the root fs in /root/.
> When CONFIG_BLK_DEV_INITRD is set and CONFIG_INITRAMFS_SOURCE is empty, the
> build process creates an internal initramfs which includes these dirs and
> char device (see scripts/gen_initramfs_list.sh -d).
> However, when CONFIG_INITRAMFS_SOURCE is not empty, the build process uses
> only the content of this option. If you set CONFIG_INITRAMFS_SOURCE to
> /lib/firmware/microcode.cpio and you don't have another initramfs which
> includes these 3 elements, the kernel fails to write to the console (error
> "Warning: unable to open an initial console") and cannot mount the rootfs in
> /root because it does not exist.

Thanks for the explanation, I can confirm this. I created bug 558192 to include those files in the microcode.cpio initramfs archive. |
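To see whether a text initramfs description (such as the /usr/share/v86d/initramfs list quoted earlier in this thread) already carries the three entries that comment #55 says the default initramfs provides, something like the following could be used. This is only a rough sketch in Python: `REQUIRED` and `missing_entries` are made-up names, and it assumes the one-entry-per-line "type path ..." format shown in that list.

```python
# Entries the kernel's default initramfs provides, per comment #55.
REQUIRED = {("dir", "/dev"), ("nod", "/dev/console"), ("dir", "/root")}


def missing_entries(list_text):
    """Return the required (type, path) pairs absent from an initramfs list."""
    present = set()
    for line in list_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2:
            # First field is the entry type, second is the path in the archive.
            present.add((fields[0], fields[1]))
    return sorted(REQUIRED - present)
```

An empty result for the v86d list would be consistent with that file providing /dev, /dev/console and /root on its own. It says nothing about a binary archive like microcode.cpio, which would have to be inspected with a cpio tool instead.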