Bug 528712

Summary:	linux-sources: cached /proc/cpuinfo is invalidated by updated microcode (which can break with glibc-2.20)
Product:	Gentoo Linux	Reporter:	Vlad Horko <scjthm>
Component:	[OLD] Core system	Assignee:	Gentoo Kernel Miscellaneous <kernel-misc>
Status:	RESOLVED FIXED
Severity:	normal	CC:	alexander, billie, dflogeras2, flyser42, james05+gentoo, kernel, krinpaus, kripton, lionel-dev, lists, marecki, microcai, necheffa.misc, realnc, rhill, ryao, saintdev, toolchain, uwe, wschlich
Priority:	Normal
Version:	unspecified
Hardware:	All
OS:	Linux
URL:	https://bugzilla.kernel.org/show_bug.cgi?id=88001
See Also:	https://bugzilla.kernel.org/show_bug.cgi?id=88001 https://bugzilla.redhat.com/show_bug.cgi?id=1083716
Whiteboard:	3.19
Package list:		Runtime testing required:	---
Bug Depends on:	557278
Bug Blocks:
Attachments:	glibc 2.20 does not respect --enable-lock-elision=no without this patch

Description Vlad Horko 2014-11-09 08:29:16 UTC

The microcode update in the firmware update has removed an instruction from the Haswell CPU. See https://bugs.launchpad.net/intel/+bug/1370352 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=762195

This happens with glibc 2.20. Sorry, I cannot get any more info because the build machine isn't booting.

Reproducible: Always




Try downgrading the firmware or disable the microcode update from the firmware.

Comment 1 Samuli Suominen (RETIRED) gentoo-dev

2014-11-09 09:19:39 UTC

Output of...

# emerge -pv sys-apps/microcode-ctl sys-apps/microcode-data

As well as `emerge --info` output...

Are required.

Changing subject, since the bug seems to be in the microcode, rather than the loader, and in any case, it's no longer the job of udev to load any kind of firmware, with USE=firmware-loader being obsolete and removed in the latest version

Comment 2 Vlad Horko 2014-11-09 09:27:18 UTC

Rebuilding glibc with --enable-lock-elision=no.

Thanks for the rules and regulations, but I am happy now. If anyone else wants their systemd system to boot and finds that it cannot because this crashes quite early on, or, god forbid, they use a ramfs that has glibc in it, then they should build glibc with --enable-lock-elision=no


(In reply to Samuli Suominen from comment #1)
> Output of...
> 
> # emerge -pv sys-apps/microcode-ctl sys-apps/microcode-data
> 
> As well as `emerge --info` output...
> 
> Are required.

What's your address? I can send you some fries with that.

> Changing subject, since the bug seems to be in the microcode, rather than
> the loader, and in any case, it's no longer the job of udev to load
> any kind of firmware, with USE=firmware-loader being obsolete and removed in
> the latest version

Comment 3 Samuli Suominen (RETIRED) gentoo-dev

2014-11-09 10:55:27 UTC

Kernel at least 3.7 and built with CONFIG_FW_LOADER_USER_HELPER=n and systemd built with USE="-firmware-loader" or systemd at least 217 where the userspace loader has been removed?
Just trying to establish if it's the kernel, or systemd(-udevd), that's failing to load the firmware
You should be migrating to the kernelspace loader, the userspace loader is obsolete
Not sure if it has any impact if glibc is 2.20 or not when using the kernelspace loader...

Comment 4 SpanKY gentoo-dev

2014-11-09 15:13:07 UTC

my guess is you're running into this bug:
https://sourceware.org/ml/libc-alpha/2014-11/msg00110.html

which is where the kernel checks functionality, caches the result, then loads the new firmware which changes the functionality, but doesn't update the cache.

glibc then checks cpuinfo (which reports the stale cache) to see which functionality to enable, sees that HLE is available, and tries to use it.  then everything falls down.

i.e. it's a bug in the kernel.
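
To see what the kernel is currently reporting, one can look for the hle and rtm flags in /proc/cpuinfo. The following is a minimal Python sketch; the helper name tsx_flags is made up for illustration, and it only shows the kernel's (possibly cached) view, not what the CPUID instruction itself returns.

```python
def tsx_flags(cpuinfo_text):
    """Return the TSX-related flags ('hle', 'rtm') listed for the first CPU.

    cpuinfo_text is the content of /proc/cpuinfo. The helper name is
    hypothetical; only the 'flags' line format comes from the kernel.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Keep only the two flags relevant to lock elision.
            return sorted({"hle", "rtm"} & set(value.split()))
    return []
```

Calling it on the file content before and after a microcode reload would show whether the kernel's view changed at all.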

Comment 5 Vlad Horko 2014-11-11 07:37:03 UTC

(In reply to SpanKY from comment #4)
> my guess is you're running into this bug:
> https://sourceware.org/ml/libc-alpha/2014-11/msg00110.html
> 
> which is where the kernel checks functionality, caches the result, then
> loads the new firmware which changes the functionality, but doesn't update
> the cache.
> 
> glibc then checks cpuinfo (which reports the stale cache) to see which
> functionality to enable, sees that HLE is available, and tries to use it.
> then everything falls down.
> 
> i.e. it's a bug in the kernel.

It might be. It's a vanilla kernel. But in order for this machine to boot I had to completely remove any code containing the _xbegin transaction instructions from libpthread. objdump -D on libpthread showed that the instruction was still there even after adding --enable-lock-elision=no, so I had to make certain that the #define was set to zero. (I think that the CPU I have, a desktop version of Haswell, doesn't even support these instructions. I think that I might be due a refund from Intel if these were supported on the processor.)

I wasn't able to downgrade glibc and didn't want to go down that path, but I believe that the most recent change was the upgrade from glibc 2.19 to glibc 2.20.

In order for the systemd-based machine to boot, I had to completely remove the code from glibc. Also, I think that the CPU microcode/firmware changes are permanent. I hope that this helps anyone else out there.
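
A rough way to repeat the objdump check described above without reading the whole disassembly is to search the library for the XBEGIN opcode bytes (0xC7 0xF8, followed by a 32-bit offset). This is only a sketch: the function name is hypothetical, and a plain byte search is not a disassembler, so every hit still needs to be confirmed with objdump.

```python
XBEGIN_OPCODE = b"\xc7\xf8"  # XBEGIN rel32: opcode bytes, then a 4-byte offset


def find_xbegin_candidates(blob):
    """Return offsets in blob where the XBEGIN opcode bytes appear.

    This is a naive byte search, not a disassembler: a hit may be data or
    the tail of another instruction, so confirm hits with objdump.
    """
    offsets = []
    pos = blob.find(XBEGIN_OPCODE)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(XBEGIN_OPCODE, pos + 1)
    return offsets
```

It would be fed the raw bytes of the installed libpthread (read in binary mode); the path to that file depends on the system and is not given here.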

Comment 6 microcai 2014-11-11 13:09:56 UTC

disabling microcode update resolved the systemd-udevd invalid op problem.

Comment 7 Vlad Horko 2014-11-12 03:36:14 UTC

(In reply to microcai from comment #6)
> disabling microcode update resolved the systemd-udevd invalid op problem.

How did you do that? Does systemd update the microcode or do you have a systemd service that runs microcode_ctl?

Comment 8 microcai 2014-11-12 03:38:28 UTC

systemd does not need microcode_ctl; it loads microcode.ko, and microcode.ko loads microcode.dat itself.

I just blacklisted microcode.ko

Comment 9 Vlad Horko 2014-11-12 11:06:39 UTC

(In reply to SpanKY from comment #4)
> my guess is you're running into this bug:
> https://sourceware.org/ml/libc-alpha/2014-11/msg00110.html
> 
> which is where the kernel checks functionality, caches the result, then
> loads the new firmware which changes the functionality, but doesn't update
> the cache.
> 
> glibc then checks cpuinfo (which reports the stale cache) to see which
> functionality to enable, sees that HLE is available, and tries to use it.
> then everything falls down.
> 
> i.e. it's a bug in the kernel.

If another user was able to reproduce this by stopping the microcode from being downloaded, and I had to remove all tsx instructions, then I would disagree. 

I think that glibc bypasses the kernel and doesn't check for the instructions that are supported. I don't believe that you need to be running in ring level 0 to do this.

Comment 10 Nathan Caldwell 2014-11-13 02:56:40 UTC

Hi, I'm experiencing a similar issue, although I'm using OpenRC, not systemd.

For me, lightdm would fail to start, not the entire init system failing. Looking in dmesg, I would have

traps: lightdm[3009] trap invalid opcode ip:7f4257b5895a sp:7fff24779ce8 error:0 in libpthread-2.20.so[7f4257b4d000+16000]
traps: console-kit-dae[2844] trap invalid opcode ip:7f5d382cdc52 sp:7fff3c837878 error:0 in libpthread-2.20.so[7f5d382c2000+16000]

If I attempted to restart lightdm, dbus would then crash again with an invalid opcode.

traps: dbus-daemon[2592] trap invalid opcode ip:7f20a5f0c95a sp:7fffda2e6478 error:0 in libpthread-2.20.so[7f20a5f01000+16000]

After that, if I restart dbus, consolekit and lightdm, everything works great.
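
The trap lines above can be reduced to an offset inside libpthread by subtracting the mapping base from the faulting ip. A small Python sketch (the regular expression and the function name are mine, written against the log format quoted in this comment):

```python
import re

# Matches the kernel's "trap invalid opcode" lines as quoted above.
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] trap invalid opcode "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>\d+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def trap_offset(line):
    """Return (process, object, offset of the faulting ip inside object)."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    offset = int(m.group("ip"), 16) - int(m.group("base"), 16)
    return m.group("comm"), m.group("obj"), hex(offset)
```

For the lines above, lightdm and dbus-daemon both fault at offset 0xb95a and console-kit-dae at 0xbc52, which suggests the same instruction in libpthread-2.20.so is being hit by different processes.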

After finding the launchpad bug in comment #1, I did a couple of tests. With the microcode_ctl service disabled, everything started up fine. With the microcode_ctl service moved to the boot runlevel instead of the default runlevel, everything again started fine.

What I suspect is going on is that each of these binaries starts up and initializes libpthread for its threads. libpthread's CPU dispatcher checks whether the CPU supports HLE (by calling the CPUID instruction), determines that it does, and uses the HLE codepath for locking. Then microcode_ctl starts and updates the microcode, disabling the HLE instruction. libpthread goes to lock something (using HLE) and crashes, because that instruction is no longer supported. After the service is restarted, libpthread checks CPUID again, determines that HLE is not supported, and doesn't attempt to use the HLE codepath.

In the case of systemd most likely a similar issue is happening. systemd starts, libpthread sees HLE support, microcode is loaded by systemd, libpthread attempts to lock, systemd crashes.
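
The suspected sequence can be written down as a toy model. This is not glibc code; every class and method name below is invented for illustration, and the exception merely stands in for SIGILL.

```python
class IllegalInstruction(Exception):
    """Stands in for the SIGILL a real process would receive."""


class ToyCpu:
    def __init__(self):
        self.hle = True  # feature present until the microcode update

    def load_microcode(self):
        self.hle = False  # the update removes the instruction


class ToyProcess:
    def __init__(self, cpu):
        self.cpu = cpu
        self.use_hle = cpu.hle  # checked once at startup, then cached

    def lock(self):
        # A process that cached "HLE available" keeps using that path.
        if self.use_hle and not self.cpu.hle:
            raise IllegalInstruction("HLE path taken on a CPU without HLE")
        return "elided" if self.use_hle else "plain"
```

A ToyProcess created before load_microcode() fails on its next lock(), while one created afterwards takes the plain path, which matches the observation that restarting the services makes them work.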

Comment 11 SpanKY gentoo-dev

2015-03-03 02:09:55 UTC

*** Bug 531026 has been marked as a duplicate of this bug. ***

Comment 12 Nathan Caldwell 2015-03-03 02:29:19 UTC

Since this seems to be the parent bug now, could the summary be adjusted to be something that makes sense? Bug 531528 or Bug 531026 seem to have the best summaries of what is actually happening.

Also, what needs to be done to get the patch in Bug 531528 included in portage so this stops breaking systems?

Comment 13 SpanKY gentoo-dev

2015-03-03 03:52:32 UTC

(In reply to Nathan Caldwell from comment #12)

since the bug isn't in glibc, i have no plans to include patches for it.  not sure the patch is even correct, although i haven't looked too closely at it and comparing working-vs-broken hardware.

Comment 14 Markos Chandras (RETIRED) gentoo-dev

2015-03-03 22:33:08 UTC

The upstream tracker bug contains a patch that seems to have made it to the latest kernels.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fb86b97300d930b57471068720c52bfa8622eab7

However, it seems the stable kernels haven't picked it up yet (at least 3.14 does not seem to contain it), so maybe it is worth applying this patch to our gentoo-sources?

I have cc'd the kernel@ team for that.

Comment 15 Ryan Hill (RETIRED) gentoo-dev

2015-03-26 05:38:50 UTC

IIUC that patch just covers a corner case for resuming from suspend.  It doesn't solve the root problem.

If we're not going to patch glibc or the kernel then we need to start loading microcode earlier in the boot process.  To do that we would need to enable MICROCODE_EARLY in the kernel, which requires MICROCODE to be built-in (not a module) and BLK_DEV_INITRD.  Every Haswell system would then need to use an initrd.

On my system it wasn't enough to take microcode_ctl out of the runlevels.  It seems udev happily loads it anyways.  I had to uninstall it completely.

This sucks.  I really hope there's a kernel/glibc solution.

Comment 16 SpanKY gentoo-dev

2015-03-26 06:17:08 UTC

the elision code, iirc, ran into issues in older versions independent of the TSX microcode issue.  i can add --enable-lock-elision=no to <=glibc-2.20 to keep stable limping along, but that doesn't help with newer versions where the code is doing the right thing and the microcode/kernel handling is still broken.

can you verify that flag helps w/glibc-2.20 ?  the configure code seems to already disable things by default:
AC_ARG_ENABLE([lock-elision],
          AC_HELP_STRING([--enable-lock-elision[=yes/no]],
                 [Enable lock elision for pthread mutexes by default]),
          [enable_lock_elision=$enableval],
          [enable_lock_elision=no])

so i can't see how --enable-lock-elision=no is any different from not setting it at all -- enable_lock_elision is getting set to no either way.

Comment 17 SpanKY gentoo-dev

2015-03-27 20:19:12 UTC

*** Bug 544498 has been marked as a duplicate of this bug. ***

Comment 18 Lionel Bouton 2015-03-28 11:25:08 UTC

We just upgraded one of our servers to glibc-2.20-r2 and started to get similar errors. We're pretty sure the glibc upgrade triggered this : zabbix polls our ceph cluster every 20s and according to emerge.log and our kernel logs we saw the first invalid opcode message less than 10s after the glibc was installed.

Mar 27 22:44:10 virtcluster-02-a kernel: traps: ceph[4275] trap invalid opcode ip:7f7e0503e492 sp:7f7dfe11c218 error:0 in libpthread-2.20.so[7f7e05032000+16000]

We run the latest stable gentoo-sources (3.18.9) and tried several configurations without success from a base configuration where microcode was loaded by initramfs :
* loading the latest (~20150121) microcode-data manually,
* loading microcode update early,
* disabling microcode update support in the kernel.

Nothing worked. Fortunately the build chain still works so I could upgrade/downgrade (unfortunately downgrading glibc isn't allowed).

Is there any safe way to fix this short of a full chroot reinstall with >=glibc-2.20 in package.mask?

I found a Gentoo glibc downgrade guide on the wiki but it doesn't really seem safe...

Comment 19 SpanKY gentoo-dev

2015-03-28 17:08:11 UTC

(In reply to Lionel Bouton from comment #18)

uninstall microcode packages

Comment 20 Lionel Bouton 2015-03-28 18:47:06 UTC

(In reply to SpanKY from comment #19)
> (In reply to Lionel Bouton from comment #18)
> 
> uninstall microcode packages

I did, they weren't installed in the first place.

Comment 21 Lionel Bouton 2015-03-28 18:52:44 UTC

(In reply to Lionel Bouton from comment #20)
> (In reply to SpanKY from comment #19)
> > (In reply to Lionel Bouton from comment #18)
> > 
> > uninstall microcode packages
> 
> I did, they weren't installed in the first place.

By the way I was under the impression that microcode_ctl was using the kernel interface, so I don't see how they could work without kernel support :
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/5.6_Technical_Notes/microcode_ctl.html

If there is a way for microcode_ctl to work without kernel support I'll have to test this later : the server is still usable and used in a Ceph cluster under heavy rebalancing right now (additional load would bring it to its knees).

Comment 22 Nathan Caldwell 2015-03-28 19:19:11 UTC

(In reply to Lionel Bouton from comment #21)
> (In reply to Lionel Bouton from comment #20)
> > (In reply to SpanKY from comment #19)
> > > (In reply to Lionel Bouton from comment #18)
> > > 
> > > uninstall microcode packages
> > 
> > I did, they weren't installed in the first place.
> 
> By the way I was under the impression that microcode_ctl was using the
> kernel interface, so I don't see how they could work without kernel support :
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/
> html/5.6_Technical_Notes/microcode_ctl.html
> 
> If there is a way for microcode_ctl to work without kernel support I'll have
> to test this later : the server is still usable and used in a Ceph cluster
> under heavy rebalancing right now (additional load would bring it to its
> knees).

I don't see how it would still cause issues without the microcode update in the kernel. You can try using epatch_user to apply glibc-2.20-blacklist_HLERTM_Haswell.patch from Bug 531520. That way glibc never sees the instruction available even if it is loaded before the microcode.

Comment 23 Nathan Caldwell 2015-03-28 19:20:01 UTC

> 
> I don't see how it would still cause issues without the microcode update in
> the kernel. You can try using epatch_user to apply
> glibc-2.20-blacklist_HLERTM_Haswell.patch from Bug 531520. 

Bug 531528, sorry.

Comment 24 Lionel Bouton 2015-03-28 20:26:17 UTC

(In reply to Nathan Caldwell from comment #23)
> > 
> > I don't see how it would still cause issues without the microcode update in
> > the kernel. You can try using epatch_user to apply
> > glibc-2.20-blacklist_HLERTM_Haswell.patch from Bug 531520. 
> 
> Bug 531528, sorry.

Unfortunately the traps are still here (I didn't reboot yet but glibc is installed with the patch and if I understand the problem correctly it should have been fixed).

Reviewing the patch, it only deals with processor models 60, 63, 69 and 70.
Our processor is a Xeon E5-2650, family 6, model 45, stepping 7.

Are models missing from the patch or is there a separate bug?
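
For anyone else wondering whether their CPU falls under that patch, the check can be sketched in Python from /proc/cpuinfo text. The model list is the one stated in this comment; the function names and the family == 6 condition are my assumptions, not taken from the patch.

```python
# Model numbers the patch handles, as listed in this comment.
PATCHED_MODELS = {60, 63, 69, 70}


def cpu_family_model(cpuinfo_text):
    """Return (family, model) for the first CPU in /proc/cpuinfo text."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Exact key match, so the "model name" line is ignored.
        if sep and key in ("cpu family", "model") and key not in fields:
            fields[key] = int(value)
    return fields.get("cpu family"), fields.get("model")


def covered_by_patch(cpuinfo_text):
    # Assumption: the patch only applies to family 6 CPUs.
    family, model = cpu_family_model(cpuinfo_text)
    return family == 6 and model in PATCHED_MODELS
```

A family 6, model 45 CPU is reported as not covered, consistent with the patch making no difference here.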

Comment 25 SpanKY gentoo-dev

2015-03-28 20:45:58 UTC

(In reply to Lionel Bouton from comment #24)

sounds like a different issue.  file a different bug please and we'll triage it there.

Comment 26 SpanKY gentoo-dev

2015-04-16 19:15:31 UTC

*** Bug 546666 has been marked as a duplicate of this bug. ***

Comment 27 SpanKY gentoo-dev

2015-04-20 16:04:34 UTC

*** Bug 547166 has been marked as a duplicate of this bug. ***

Comment 28 SpanKY gentoo-dev

2015-04-20 16:05:45 UTC

*** Bug 547164 has been marked as a duplicate of this bug. ***

Comment 29 Marek Szuba archtester

2015-04-21 12:25:11 UTC

Meanwhile, sys-apps/microcode-data-20150121 - which includes the TSX-disabling erratum - has been marked stable on both amd64 and x86 lately. In other words, as of now ANY Haswell box running Gentoo + "emerge -uD @world" + reboot or microcode reload = unusable system. Fun.

Comment 30 Mike Pagano gentoo-dev

2015-04-21 17:00:57 UTC

Sorry for the delay, I'll grab the patch for gentoo-sources and start rolling out releases.

Thanks, Markos.

Comment 31 Mike Pagano gentoo-dev

2015-04-21 17:12:33 UTC

For the record, the referenced patch from comment #14 is already in the following kernels:

>= v3.18
>= v3.19
v4.0

Comment 32 Marek Szuba archtester

2015-04-21 18:07:13 UTC

If 3.18 and newer already contain this patch then unfortunately it does NOT fix the libpthread problem - the Haswell system I managed to kill with TSX-disabling microcode last Saturday runs the latest stable version of hardened-sources i.e. 3.18. Looks like what I've read in the Debian bug report on the matter is indeed correct - glibc identifies CPU capabilities without involving the Linux kernel, so patching the problem with /proc/cpuinfo does not help.

Given SpanKY seems to dislike the idea of patching glibc to blacklist Haswell from hardware lock elision and that as a result of the erratum no existing CPUs actually support it, looks like we should indeed disable it at build time (assuming it is really possible in 2.20; apparently in 2.19 the relevant configure option failed to fully disable HLE) and hope that by the time CPUs with working TSX reach the market the blacklist patch will have made it upstream.

Comment 33 David Flogeras 2015-04-21 18:20:34 UTC

I too am running 3.18.11 (stable).

I checked my local 3.18.11 source tree and the file from the patch:
/usr/src/linux/arch/x86/kernel/cpu/microcode/core.c

does not seem to have the added lines from said patch.

Comment 34 Tamas Jantvik 2015-04-28 14:18:30 UTC

I hit this as well. Spent my time on restoring my system from a two-month-old backup and some additional work before I stumbled upon some invalid opcode errors in my dmesg that I could finally Google, and that led me here. It's been fun to say the least.

I too run a 3.18.11 kernel, with genpatches (and others), and I hit this nevertheless.

I now have a kernel with early microcode update support, and a dracut initramfs with the updated microcode embedded. So 'tis all good now. But I, as others above, would of course wish for smoother procedures.

Comment 35 arom 2015-06-02 16:33:24 UTC

(In reply to Tamas Jantvik from comment #34)
> I hit this as well. Spent my time on restoring my system

Exact same story here. I've done a world update last friday and got an unbootable laptop. Since i was upgrading both glibc and gcc (4.8.4 required by glibc), i thought that was a build problem. I've solved only today (tuesday, after much sleep deprivation) by just masking the latest microcode-data and going back to microcode-data-20140430.

Maybe =sys-apps/microcode-data-20150121 should NOT be marked as stable yet ??

Personally, i would prefer a system that MAY, in some remote cases, hit a subtle cpu bug, rather than a system that won't boot, period.

Comment 36 Mike Pagano gentoo-dev

2015-06-05 17:00:58 UTC

Does this patch from fedora fix this issue?

http://pkgs.fedoraproject.org/cgit/kernel.git/commit/?h=f21&id=a357c223627a0fe704e38e5c4eb4aa39869cd114

-CONFIG_MICROCODE=m
+CONFIG_MICROCODE=y
+CONFIG_MICROCODE_EARLY=y
 CONFIG_MICROCODE_INTEL=y
 CONFIG_MICROCODE_INTEL_EARLY=y
 CONFIG_MICROCODE_AMD=y
+CONFIG_MICROCODE_AMD_EARLY=y

Or does this have to happen in conjunction with the glibc blacklist patch that debian and I think fedora is using?

Comment 37 Ryan Hill (RETIRED) gentoo-dev

2015-06-06 08:59:44 UTC

That's the early microcode loading I was talking about in comment #15.  It requires people set up and use a suitably configured initrd so it's not really a solution for us.

Comment 38 ganthore 2015-06-18 15:17:38 UTC

I agree that sys-apps/microcode-data-20150121 should be masked until a solution is in place. My system also fell victim to this problem.

Comment 39 Richard Yao (RETIRED) gentoo-dev

2015-08-11 03:46:20 UTC

Created attachment 408770 [details, diff]
glibc 2.20 does not respect --enable-lock-elision=no without this patch

My system was affected too, but I had it limp along for lack of time to debug. At the moment, I have rebuilt sys-libs/glibc-2.20-r2 with the following patch and --enable-lock-elision=no. Both are necessary to prevent glibc from emitting the elision instructions.

Without it, udevd is killed by SIGILL:

Core was generated by `/sbin/udevd --daemon --debug'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00007ff45b05da5a in _xbegin () at ../sysdeps/unix/sysv/linux/x86/hle.h:53
53        asm volatile (".byte 0xc7,0xf8 ; .long 0" : "+a" (ret) :: "memory");
(gdb) bt
#0  0x00007ff45b05da5a in _xbegin () at ../sysdeps/unix/sysv/linux/x86/hle.h:53
#1  __GI___pthread_rwlock_rdlock (rwlock=0x7ff45bc43b20 <__libc_setlocale_lock>) at pthread_rwlock_rdlock.c:106
#2  0x00007ff45b8d9965 in __dcigettext (domainname=0x7ff45ba0d3d1 <_libc_intl_domainname> "libc", msgid1=0x7ff45ba0d7d6 "No such file or directory", msgid2=msgid2@entry=0x0, plural=plural@entry=0, n=n@entry=0, category=category@entry=5)
    at dcigettext.c:453
#3  0x00007ff45b8d834f in __GI___dcgettext (domainname=<optimized out>, msgid=<optimized out>, category=category@entry=5) at dcgettext.c:52
#4  0x00007ff45b92c63e in __GI___strerror_r (errnum=errnum@entry=2, buf=buf@entry=0x0, buflen=buflen@entry=0) at _strerror.c:71
#5  0x00007ff45b92c56f in strerror (errnum=errnum@entry=2) at strerror.c:32
#6  0x00007ff45bc51730 in kmod_module_get_initstate (mod=<optimized out>) at /usr/src/debug/sys-apps/kmod-20/kmod-20/libkmod/libkmod-module.c:1743
#7  0x00007ff45bc51cf5 in module_is_inkernel (mod=0x1e82c10) at /usr/src/debug/sys-apps/kmod-20/kmod-20/libkmod/libkmod-module.c:128
#8  kmod_module_probe_insert_module (mod=mod@entry=0x1e82c10, flags=flags@entry=131072, extra_options=extra_options@entry=0x0, run_install=run_install@entry=0x0, data=data@entry=0x0, print_action=print_action@entry=0x0)
    at /usr/src/debug/sys-apps/kmod-20/kmod-20/libkmod/libkmod-module.c:1252
#9  0x0000000000416dc1 in load_module (udev=<optimized out>, alias=<optimized out>) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udev-builtin-kmod.c:51
#10 builtin_kmod (dev=<optimized out>, argc=<optimized out>, argv=0x7fffbe4eb1e0, test=<optimized out>) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udev-builtin-kmod.c:84
#11 0x0000000000411f59 in udev_builtin_run (dev=0x1e80c20, cmd=cmd@entry=UDEV_BUILTIN_KMOD,
    command=command@entry=0x7fffbe4eba40 "kmod load cpu:type:x86,ven0000fam0006mod003C:feature:,0000,0001,0002,0003,0004,0005,0006,0007,0008,0009,000B,000C,000D,000E,000F,0010,0011,0013,0015,0016,0017,0018,0019,001A,001B,001C,001D,001F,002B,0"..., test=test@entry=false) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udev-builtin.c:122
#12 0x0000000000409fb5 in udev_event_execute_run (event=event@entry=0x1e82560, timeout_usec=180000000, timeout_warn_usec=60000000, sigmask=sigmask@entry=0x6477a0 <sigmask_orig>)
    at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udev-event.c:917
#13 0x0000000000404b7b in worker_spawn (event=event@entry=0x1ed7ef0) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udevd.c:344
#14 0x000000000040788b in event_run (event=0x1ed7ef0) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udevd.c:473
#15 event_queue_start (udev=0x1e7b010) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udevd.c:601
#16 main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/sys-fs/eudev-3.1.2/eudev-3.1.2/src/udev/udevd.c:1502

Anyone who wants to get their own backtrace will want to rebuild glibc, kmod and udev with debuginfo using -ggdb in CFLAGS and FEATURES=splitdebug, set something like kernel.core_pattern=/var/crash/core-%e-%s-%u-%g-%p-%t and kernel.core_uses_pid=1 in /etc/sysctl.conf, disable udev from starting at boot and start it manually after boot. Magic Sysreq+R is needed to switch from X to a VT because otherwise X has control of the keyboard.

That said, I cannot find the code that SpanKY mentions in comment #4. I found code in sysdeps/x86_64/multiarch/test-multiarch.c for checking avx, fma4, sse4_2, sse4_1, ssse3 and popcnt. I also found code in sysdeps/unix/sysv/linux/getsysstats.c for finding the number of CPUs and sysdeps/unix/sysv/linux/getsysstats.c for finding the clock speed. If there is code for doing what SpanKY suggests, either it is in newer versions of glibc or I missed it.

Consequently, I do not believe that fixing the kernel to stop reporting hle will fix things for glibc. I think that it is possible to add logic to the exception handler for exception 0x06 (Invalid Opcode) to make userland enter the fallback path, print a warning to dmesg and resume execution. I could see myself writing a kernel patch to do that. Due to my time being tight (especially before LinuxCon), I cannot promise that I will be able to write it.

Comment 40 Richard Yao (RETIRED) gentoo-dev

2015-08-11 03:54:00 UTC

The relevant files for this code path are:

nptl/pthread_rwlock_wrlock.c
nptl/pthread_rwlock_rdlock.c
sysdeps/x86/elide.h
sysdeps/unix/sysv/linux/x86/hle.h

Presumably, there would be a flag variable set by a check elsewhere, but I see no such thing. If someone else sees one, please reply saying where it is.

Comment 41 SpanKY gentoo-dev

2015-08-11 06:17:21 UTC

i've added the iucode-tool package to the tree (bug 509742) and extended the microcode-data package with a USE=initramfs flag.  this way you can quickly build the minimal initramfs needed in order to load the microcode at boot.

to resolve the issue on your system:
(1) get the latest kernel (linux-3.9+)
(2) enable CONFIG_BLK_DEV_INITRD & CONFIG_MICROCODE & CONFIG_MICROCODE_EARLY in your kernel (and the intel option obviously)
(3) install the microcode-data package with USE=initramfs
(4) set CONFIG_INITRAMFS_SOURCE to /lib/firmware/microcode.cpio
(5) boot that kernel and it should load the microcode before running userspace
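
The kernel options from the steps above can be checked against a .config with a short script. This is a sketch only: the function names are made up, CONFIG_MICROCODE_INTEL is assumed to be "the intel option", and the check is deliberately strict (it expects CONFIG_INITRAMFS_SOURCE to be exactly the path from step 4, although a setup with additional initramfs sources could also be valid).

```python
REQUIRED = {
    "CONFIG_BLK_DEV_INITRD": "y",
    "CONFIG_MICROCODE": "y",  # built in, not a module
    "CONFIG_MICROCODE_EARLY": "y",
    "CONFIG_MICROCODE_INTEL": "y",  # assumed to be "the intel option"
    "CONFIG_INITRAMFS_SOURCE": '"/lib/firmware/microcode.cpio"',
}


def parse_config(text):
    """Parse KEY=VALUE lines of a kernel .config, skipping comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


def missing_options(text):
    """Return the required options that are unset or set differently."""
    options = parse_config(text)
    return sorted(k for k, v in REQUIRED.items() if options.get(k) != v)
```

An empty result from missing_options() on the content of the kernel's .config means all of the listed options are set as described; CONFIG_MICROCODE=m shows up as missing because early loading needs it built in.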

i filed bug 557278 to integrate this logic into genkernel itself.

if you don't want to do any of that, then simply stop installing microcode updates.  unmerge the microcode-data package from your system and/or disable CONFIG_MICROCODE.

otherwise there's really nothing that can be done in userspace.  you're trying to take a running process where the CPU supports an insn, then disable that insn on the fly, then expect that all the processes will suddenly stop trying to use that feature.  that's why long running daemons crash -- when they launched, the insn was supported, but when they try to use it later, it isn't.

Comment 42 SpanKY gentoo-dev

2015-08-11 06:36:52 UTC

i've also updated the microcode-ctl package to not include an init script since updating on the fly is dangerous

http://gitweb.gentoo.org/repo/gentoo.git/commit/?id=719cc5ef240b766953ddbe1e7a6593f8091eed12

Comment 43 Richard Yao (RETIRED) gentoo-dev

2015-08-11 06:51:03 UTC

Just to add to this, from studying the logs, it looks like udev is loading the microcode:

Aug 10 21:28:51 desktop kernel: udevd[4140]: starting version 3.1.2
Aug 10 21:28:51 desktop kernel: udevd[4176]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1a.0/usb1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4177]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4178]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb4': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4179]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1d.0/usb2': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4181]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb4/4-0:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4180]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-0:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4183]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1a.0/usb1/1-0:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4172]: Error changing net interface name 'eth0' to 'eth1': Device or resource busy
Aug 10 21:28:51 desktop kernel: udevd[4184]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1c.6/0000:05:00.0/usb5': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4185]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-13': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4187]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1a.0/usb1/1-1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4189]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-9': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4190]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1c.6/0000:05:00.0/usb6': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4188]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4192]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4193]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4191]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-13/3-13:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4195]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1c.6/0000:05:00.0/usb5/5-0:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4197]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1c.6/0000:05:00.0/usb6/6-0:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4199]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-1/3-1:1.1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4200]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1d.0/usb2/2-1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4201]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.2': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4194]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-9/3-9:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4203]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4204]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4198]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1d.0/usb2/2-0:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4207]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4196]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-1/3-1:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4211]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.2': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4202]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-13/3-13.1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4209]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4210]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4206]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.2/3-14.2:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4212]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.2/3-14.1.2:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4213]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.2/3-14.1.2:1.1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4215]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.1/3-14.1.1:1.1': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4216]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-13/3-13.1/3-13.1:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4218]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.1/3-14.1.1:1.3': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4217]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.1/3-14.1.1:1.0': No such file or directory
Aug 10 21:28:51 desktop kernel: udevd[4214]: failed to execute '/lib/udev/usb-db' 'usb-db /devices/pci0000:00/0000:00:14.0/usb3/3-14/3-14.1/3-14.1.1/3-14.1.1:1.2': No such file or directory
Aug 10 21:28:51 desktop kernel: input: PC Speaker as /devices/platform/pcspkr/input/input9
Aug 10 21:28:51 desktop kernel: microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU0 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU1 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU2 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU3 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU4 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU4 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-k
Aug 10 21:28:51 desktop kernel: e1000e: Copyright(c) 1999 - 2014 Intel Corporation.
Aug 10 21:28:51 desktop kernel: e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Aug 10 21:28:51 desktop kernel: microcode: CPU4 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU5 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU5 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU5 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU6 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU6 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU6 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: CPU7 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU7 sig=0x306c3, pf=0x2, revision=0x19
Aug 10 21:28:51 desktop kernel: microcode: CPU7 updated to revision 0x1c, date = 2014-07-03
Aug 10 21:28:51 desktop kernel: microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
Aug 10 21:28:51 desktop kernel: traps: udevd[4156] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4154] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4153] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4151] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4157] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4149] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: Linux video capture interface: v2.00
Aug 10 21:28:51 desktop kernel: traps: udevd[4159] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4171] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4166] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]
Aug 10 21:28:51 desktop kernel: traps: udevd[4170] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]

My guess is that udev loads the module(s), and then the new in-kernel microcode loading infrastructure loads the microcode, so the microcode changes underneath udev. The OpenRC script is not the cause of this problem: it hasn't even run by the time udev has segfaulted.
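
To make the ordering in a log like the one above easier to see, here is a minimal sketch that pulls out the "updated to revision" and "trap invalid opcode" lines in the order they appear. The regular expressions are only fitted to the line format shown in this log, and the function names are hypothetical.

```python
import re

# Fitted to lines such as:
#   "... kernel: microcode: CPU0 updated to revision 0x1c, date = 2014-07-03"
UPDATE_RE = re.compile(
    r"microcode: CPU(\d+) updated to revision (0x[0-9a-fA-F]+)"
)
# Fitted to lines such as:
#   "... kernel: traps: udevd[4156] trap invalid opcode ip:..."
TRAP_RE = re.compile(r"traps: (\S+)\[(\d+)\] trap invalid opcode")


def events(log_lines):
    """Return microcode updates and invalid-opcode traps in log order."""
    out = []
    for line in log_lines:
        m = UPDATE_RE.search(line)
        if m:
            out.append(("update", int(m.group(1)), m.group(2)))
            continue
        m = TRAP_RE.search(line)
        if m:
            out.append(("trap", m.group(1), int(m.group(2))))
    return out


def updates_before_first_trap(log_lines):
    """Count microcode updates logged before the first invalid-opcode trap."""
    count = 0
    for kind, *_ in events(log_lines):
        if kind == "trap":
            break
        count += 1
    return count
```

Feeding it the lines of a saved kernel log (for example `open("kern.log").read().splitlines()`, file name hypothetical) shows whether every CPU had already been updated when the first trap was logged.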

Comment 44 Richard Yao (RETIRED) gentoo-dev

2015-08-11 06:51:34 UTC

A minor correction: udev was killed by SIGILL (invalid opcode), not by a segfault.
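
For anyone who wants to look at the faulting instruction, the "traps:" lines give enough to compute its offset inside the library. A rough sketch follows, assuming the bracketed part is the mapping's start address and size, and that the mapping starts at file offset 0 (typical for a shared object's first mapping, but an assumption here). The function name is hypothetical.

```python
import re

# Fitted to lines such as:
#   "traps: udevd[4156] trap invalid opcode ip:7f42f29f6a5a sp:7ffcbb1ea908
#    error:0 in libpthread-2.20.so[7f42f29eb000+16000]"
TRAP_RE = re.compile(
    r"trap invalid opcode ip:([0-9a-f]+) sp:([0-9a-f]+) error:(\d+) "
    r"in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def faulting_offset(line):
    """Return (object name, offset of ip within the mapping), or None."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m.group(1), 16)
    base = int(m.group(5), 16)
    size = int(m.group(6), 16)
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return m.group(4), offset


line = ("traps: udevd[4156] trap invalid opcode ip:7f42f29f6a5a "
        "sp:7ffcbb1ea908 error:0 in libpthread-2.20.so[7f42f29eb000+16000]")
name, offset = faulting_offset(line)
# name == "libpthread-2.20.so", hex(offset) == "0xba5a"
```

The resulting offset can then be looked up in a disassembly of the same libpthread build to see which instruction raised SIGILL.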

Comment 45 jannis 2015-08-11 11:25:23 UTC

(In reply to SpanKY from comment #41)
> (3) install the microcode-data package with USE=initramfs

The dependency on sys-apps/iucode_tool is missing:

Calculating dependencies... done!
[ebuild   R   ~] sys-apps/microcode-data-20150121-r1::gentoo  USE="initramfs* split-ucode -monolithic" 0 KiB

Total: 1 package (1 reinstall), Size of downloads: 0 KiB

Would you like to merge these packages? [Yes/No] 

>>> Verifying ebuild manifests

>>> Emerging (1 of 1) sys-apps/microcode-data-20150121-r1::gentoo
 * microcode-20150121.tgz SHA256 SHA512 WHIRLPOOL size ;-) ...   [ ok ]
>>> Unpacking source...
>>> Unpacking microcode-20150121.tgz to /var/tmp/portage/sys-apps/microcode-data-20150121-r1/work
>>> Source unpacked in /var/tmp/portage/sys-apps/microcode-data-20150121-r1/work
>>> Preparing source in /var/tmp/portage/sys-apps/microcode-data-20150121-r1/work ...
>>> Source prepared.
>>> Configuring source in /var/tmp/portage/sys-apps/microcode-data-20150121-r1/work ...
>>> Source configured.
>>> Compiling source in /var/tmp/portage/sys-apps/microcode-data-20150121-r1/work ...
/var/tmp/portage/sys-apps/microcode-data-20150121-r1/temp/environment: line 741: iucode_tool: command not found

Comment 46 LiuCougar 2015-08-11 11:36:45 UTC

An init script still seems necessary: upon resume from sleep (I believe hibernation behaves the same as sleep), the microcode update is lost and has to be reloaded.

Without an init script, how should this be done?

Comment 47 Risto A. Paju 2015-08-11 12:18:04 UTC

(In reply to LiuCougar from comment #46)
> init script still seems necessary: upon resume from sleep (I believe
> hibernation is the same as sleep), the microcode update is lost, and they
> have to be reloaded.
> 
> without init script, how this should be done please?

I was wondering this too, and found this:

"The cached microcode patch is applied when CPUs resume from a sleep state."

from https://www.kernel.org/doc/Documentation/x86/early-microcode.txt

I understand this would apply when resuming from suspend to RAM. With hibernation (suspend to disk), the kernel will boot again, presumably loading the initramfs.
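
One way to check this on a given machine is to compare the microcode revision reported in /proc/cpuinfo before and after a suspend/resume cycle. A minimal sketch, assuming an x86 kernel that exposes a "microcode" field per CPU in /proc/cpuinfo; the function name is hypothetical.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # Values look like "0x1c"; int() with base 16 accepts the prefix.
            revisions.add(int(value.strip(), 16))
    return revisions


# Typical use (run once before suspending and once after resuming):
#   with open("/proc/cpuinfo") as f:
#       print(sorted(hex(r) for r in microcode_revisions(f.read())))
```

If the set is the same after resume (and contains a single revision), the kernel has reapplied the cached patch on every CPU.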

Comment 48 Nikos Chantziaras 2015-08-11 17:12:53 UTC

(In reply to SpanKY from comment #41)
> (4) set CONFIG_INITRAMFS_SOURCE to /lib/firmware/microcode.cpio

This doesn't work for me, since that is already set to something else: "/usr/share/v86d/initramfs" for the uvesafb driver. If I set that option to:

  /usr/share/v86d/initramfs /lib/firmware/microcode.cpio

then nothing works. I get no console (uvesafb doesn't work) and no microcode update (microcode.cpio is not included in the early initramfs).

The first file (/usr/share/v86d/initramfs) is an auto-generated text file and contains:

  dir /dev 0755 0 0
  nod /dev/console 0600 0 0 c 5 1
  nod /dev/tty1 0600 0 0 c 4 1
  nod /dev/zero 0600 0 0 c 1 5
  nod /dev/mem 0600 0 0 c 1 1
  dir /root 0700 0 0
  dir /sbin 0755 0 0
  file /sbin/v86d /sbin/v86d 0755 0 0

So... what do we do?

Comment 49 Nikos Chantziaras 2015-08-11 21:48:15 UTC

OK, found a solution. You can add that to grub instead. For grub 2, it's important (at least here) to list it first, before your normal initrd image. For example:

  menuentry 'Gentoo' {
      ...
      initrd /lib/firmware/microcode.cpio /boot/initrd
  }

If you add it after any other initrd image, it doesn't work.

I think that for grub legacy, you can do this instead:

  initrd /lib/firmware/microcode.cpio
  initrd /boot/initrd

Comment 50 Alexander Tsoy 2015-08-11 21:52:40 UTC

(In reply to Nikos Chantziaras from comment #49)
You can just concatenate these files. The cpio archive with the microcode must come first.

Comment 51 Nikos Chantziaras 2015-08-11 22:02:18 UTC

(In reply to Alexander Tsoy from comment #50)
> You can just concatenate these files. cpio archive with microcode must be
> the first.

I could, but that would break future microcode updates. When the package is updated, the system would continue to use the old microcode.

So the only maintainable solution is to let the boot loader concatenate them.
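
For those who prefer the concatenation approach anyway, the staleness problem can at least be detected. A rough sketch, not a tested tool: it writes the microcode archive followed by the regular initrd into one file, and a second helper reports whether the combined file still begins with the currently installed microcode archive. All function names and the output path are hypothetical.

```python
import shutil


def build_combined_initrd(microcode_cpio, initrd, output):
    """Write microcode_cpio followed by initrd into output."""
    with open(output, "wb") as out:
        # Order matters: the microcode archive has to come first.
        for part in (microcode_cpio, initrd):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def is_stale(microcode_cpio, combined):
    """True if combined no longer starts with the current microcode archive."""
    with open(microcode_cpio, "rb") as f:
        current = f.read()
    with open(combined, "rb") as f:
        prefix = f.read(len(current))
    return prefix != current


# Example (paths other than microcode.cpio are hypothetical):
#   build_combined_initrd("/lib/firmware/microcode.cpio",
#                         "/boot/initrd", "/boot/initrd-ucode")
```

The combined image would still have to be rebuilt by hand or from a hook after every microcode-data update, which is exactly the maintenance burden described above.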

Comment 52 SpanKY gentoo-dev

2015-08-12 03:31:25 UTC

(In reply to jannis from comment #45)

sorry, forgot to add the package to DEPEND
http://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7b0f83daffd2d69390f6de2bff262e439a582861

(In reply to LiuCougar from comment #46)

that's why the kernel saves & restores it itself.  trying to do it in userland is still unsafe for the same reasons i detailed.

(In reply to Nikos Chantziaras from comment #48)

sorry, but that's something you'll have to manage yourself.

Comment 53 Tobias Klausmann (RETIRED) gentoo-dev

2015-08-12 09:40:24 UTC

(In reply to Nikos Chantziaras from comment #49)
> OK, found a solution. You can add that to grub instead. For grub 2, it's
> important (at least here) to list it first, before your normal initrd image.
> For example:
> 
>   menuentry 'Gentoo' {
>       ...
>       initrd /lib/firmware/microcode.cpio /boot/initrd
>   }
> 
> If you add it after any other initrd image, it doesn't work.

Is there a known way to make grub2-mkconfig do this automagically?

Comment 54 Daniel Pielmeier gentoo-dev

2015-08-12 20:37:24 UTC

(In reply to SpanKY from comment #41)
> 
> to resolve the issue on your system:
> (1) get the latest kernel (linux-3.9+)
> (2) enable CONFIG_BLK_DEV_INITRD & CONFIG_MICROCODE & CONFIG_MICROCODE_EARLY
> in your kernel (and the intel option obviously)
> (3) install the microcode-data package with USE=initramfs
> (4) set CONFIG_INITRAMFS_SOURCE to /lib/firmware/microcode.cpio
> (5) boot that kernel and it should load the microcode before running
> userspace

Am I missing something obvious? I just get a kernel panic (rootfs not found) when including the INITRAMFS_SOURCE.

Kernel version is gentoo-sources-4.0.5.
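
When debugging this kind of setup it can help to double-check steps (2) and (4) against the kernel's .config. A minimal sketch, assuming the three options from step (2) should be built in ("y"); the vendor-specific option (presumably CONFIG_MICROCODE_INTEL) is not checked here, and the function names are hypothetical.

```python
REQUIRED_Y = (
    "CONFIG_BLK_DEV_INITRD",
    "CONFIG_MICROCODE",
    "CONFIG_MICROCODE_EARLY",
)


def parse_kconfig(text):
    """Parse KEY=value lines of a kernel .config into a dict."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # "# CONFIG_FOO is not set" and other comments count as absent.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        options[key] = value.strip('"')
    return options


def check(text):
    """Return (options from REQUIRED_Y not set to y, CONFIG_INITRAMFS_SOURCE)."""
    opts = parse_kconfig(text)
    missing = [name for name in REQUIRED_Y if opts.get(name) != "y"]
    return missing, opts.get("CONFIG_INITRAMFS_SOURCE", "")
```

Calling `check(open("/usr/src/linux/.config").read())` (path assumed) returns the list of options that still need enabling plus the current CONFIG_INITRAMFS_SOURCE value.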

Comment 55 Cédric Delmas 2015-08-15 10:39:45 UTC

(In reply to Daniel Pielmeier from comment #54)
> (In reply to SpanKY from comment #41)
> > 
> > to resolve the issue on your system:
> > (1) get the latest kernel (linux-3.9+)
> > (2) enable CONFIG_BLK_DEV_INITRD & CONFIG_MICROCODE & CONFIG_MICROCODE_EARLY
> > in your kernel (and the intel option obviously)
> > (3) install the microcode-data package with USE=initramfs
> > (4) set CONFIG_INITRAMFS_SOURCE to /lib/firmware/microcode.cpio
> > (5) boot that kernel and it should load the microcode before running
> > userspace
> 
> Am I missing something obvious. I just receive a kernel panic (rootfs not
> found) when including the INITRAMFS_SOURCE.
> 
> Kernel version is gentoo-sources-4.0.5.

I had the same problem, which was due to not having an initramfs (my previous kernel configuration did not have CONFIG_BLK_DEV_INITRD set). When I followed the 5 steps, the new kernel panicked at boot.

Problem explanation:
1st problem: Grub2 changes its behaviour because CONFIG_INITRAMFS_SOURCE is now set and non-empty: it no longer generates a "root=/dev/sda5" option (the fallback behaviour when there is no initramfs) but instead writes "root=UUID=<uuid of sda5>", which does not work without an initramfs. Uncommenting the line "#GRUB_DISABLE_LINUX_UUID=true" in /etc/default/grub restores "root=/dev/sda5", but this only changes the panic from "root device not found" to "root device cannot be mounted" with error -2 (ENOENT).

2nd problem: When CONFIG_BLK_DEV_INITRD is not set, the kernel creates the directories /dev/ and /root/ and the character device /dev/console early in a ramfs, then mounts the root fs on /root/.
When CONFIG_BLK_DEV_INITRD is set and CONFIG_INITRAMFS_SOURCE is empty, the build process creates an internal initramfs which includes these directories and the character device (see scripts/gen_initramfs_list.sh -d).
However, when CONFIG_INITRAMFS_SOURCE is not empty, the build process uses only the content named by this option. If you set CONFIG_INITRAMFS_SOURCE to /lib/firmware/microcode.cpio and you don't have another initramfs which includes these 3 elements, the kernel fails to open the console (error "Warning: unable to open an initial console") and cannot mount the rootfs on /root because that directory does not exist.
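
If your CONFIG_INITRAMFS_SOURCE points at a text list (like the v86d one in comment #48) rather than a cpio archive, a quick check for those three elements could look like the sketch below. It only understands the "dir"/"nod"/"file" line format shown in that comment, and the function names are hypothetical.

```python
# The three elements the kernel needs before it can mount the root fs.
REQUIRED = {("dir", "/dev"), ("nod", "/dev/console"), ("dir", "/root")}


def parse_initramfs_list(text):
    """Return the set of (type, path) pairs in an initramfs list file."""
    entries = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.add((fields[0], fields[1]))
    return entries


def missing_entries(text):
    """Return the required (type, path) pairs absent from the list."""
    return sorted(REQUIRED - parse_initramfs_list(text))


V86D_LIST = """\
dir /dev 0755 0 0
nod /dev/console 0600 0 0 c 5 1
nod /dev/tty1 0600 0 0 c 4 1
nod /dev/zero 0600 0 0 c 1 5
nod /dev/mem 0600 0 0 c 1 1
dir /root 0700 0 0
dir /sbin 0755 0 0
file /sbin/v86d /sbin/v86d 0755 0 0
"""
# missing_entries(V86D_LIST) == []  (all three elements are present)
```

A microcode-only cpio archive cannot be checked this way; it would have to be listed with cpio itself.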

My solution:
If you don't use an initramfs, follow all the steps except the 4th. Instead, leave CONFIG_INITRAMFS_SOURCE empty and have your bootloader load /lib/firmware/microcode.cpio as the initrd. The kernel then includes the internal initramfs with /dev/, /dev/console and /root/, which is unpacked first; the /lib/firmware/microcode.cpio loaded by the bootloader is unpacked afterwards. As a result the microcode is updated early and the system boots. You don't even have to change /etc/default/grub.


I hope this can help.

Comment 56 Cédric Delmas 2015-08-15 11:07:16 UTC

(In reply to Tobias Klausmann from comment #53)
> (In reply to Nikos Chantziaras from comment #49)
> > OK, found a solution. You can add that to grub instead. For grub 2, it's
> > important (at least here) to list it first, before your normal initrd image.
> > For example:
> > 
> >   menuentry 'Gentoo' {
> >       ...
> >       initrd /lib/firmware/microcode.cpio /boot/initrd
> >   }
> > 
> > If you add it after any other initrd image, it doesn't work.
> 
> Is there a known way to make grub2-mkconfig do this automagically?

It would be great if grub2-mkconfig also added a line "initrd /lib/firmware/microcode.cpio" when no other initrd is used (see my comment #55).
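
Until something like that exists, the line has to be written by hand or by a local script. As an illustration only (this is not how grub2-mkconfig works, and the helper name is hypothetical), the rule "microcode first, then any other initrd" boils down to:

```python
def initrd_line(microcode="/lib/firmware/microcode.cpio", other_initrds=()):
    """Build a grub.cfg initrd line with the microcode archive listed first."""
    images = [microcode, *other_initrds]
    return "initrd " + " ".join(images)


# With a regular initrd:
#   initrd_line(other_initrds=["/boot/initrd"])
#   -> "initrd /lib/firmware/microcode.cpio /boot/initrd"
# Without any other initrd:
#   initrd_line()
#   -> "initrd /lib/firmware/microcode.cpio"
```

The paths are taken from the examples in this thread and may need adjusting to how GRUB sees your partitions.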

Comment 57 Balint SZENTE 2015-08-19 19:14:02 UTC

(In reply to Cédric Delmas from comment #55)
> 2nd problem: When CONFIG_BLK_DEV_INITRD is not set, the kernel early create
> dirs /dev/ and /root/ and char device /dev/console in a ramfs then it mounts
> the root fs in /root/.
> When CONFIG_BLK_DEV_INITRD is set and CONFIG_INITRAMFS_SOURCE is empty, the
> build process creates an internal initramfs which includes these dirs and
> char device (see scripts/gen_initramfs_list.sh -d).
> However, when CONFIG_INITRAMFS_SOURCE is not empty, the build process uses
> only the content of this option. If you set CONFIG_INITRAMFS_SOURCE to
> /lib/firmware/microcode.cpio and you don't have another initramfs which
> includes these 3 elements, the kernel fails to write to the console (error
> "Warning: unable to open an initial console") and cannot mount the rootfs in
> /root because it does not exist.

Thanks for the explanation, I can confirm this. I created bug 558192 to include those files in the microcode.cpio initramfs archive.