643830 – init script to trigger microcode reload in late boot

Status:	UNCONFIRMED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo's Team for Core System packages

Reported:	2018-01-07 21:48 UTC by Mike Gilbert
Modified:	2018-01-08 19:35 UTC
CC List:	3 users

See Also:	https://github.com/gentoo/gentoo/pull/6787

Description Mike Gilbert gentoo-dev

2018-01-07 21:48:15 UTC

There was a discussion in #gentoo-infra recently about the various methods to get CPU microcode loaded at boot time. It is generally best to do it as early as possible, but this relies on the user to build their kernel/initramfs appropriately.

As a fallback, I propose we install the following tmpfiles.d snippet to cause the kernel to (re)load microcode in late boot:

> w /sys/devices/system/cpu/microcode/reload - - - - 1

This could be installed by a new package referenced in RDEPEND in sys-firmware/intel-microcode and sys-kernel/linux-firmware.

Does this seem like a reasonable idea?
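
For reference, a minimal shell sketch of what that tmpfiles.d "w" line amounts to. The function name and its optional path argument are made up for illustration; a "w" line only writes when the target already exists, so the sketch mirrors that.

```shell
# Sketch of the effect of the tmpfiles.d "w" line: write "1" to the
# microcode reload file, but only if the kernel exposes that file.
# trigger_microcode_reload and its path argument are illustrative names.
trigger_microcode_reload() {
    reload_file=${1:-/sys/devices/system/cpu/microcode/reload}

    # "w" lines skip paths that do not exist, so do the same here.
    if [ ! -e "$reload_file" ]; then
        echo "microcode reload interface not available: $reload_file" >&2
        return 0
    fi

    echo 1 > "$reload_file"
}
```

Run as root with no argument (`trigger_microcode_reload`) to act on the real sysfs file.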

Comment 1 Thomas Deutschmann (RETIRED) gentoo-dev

2018-01-07 22:57:05 UTC

Do we know for sure what will happen if the microcode before initramfs/kernel is newer than the one in /lib/firmware and we issue a reload command? I.e. may it cause an undesired downgrade?

Not sure if we want firmware packages to depend on sys-firmware/microcode-reload because this will force a microcode update. I am not sure we should do that at all. At the moment I think people should do that on their own. I don't want to be responsible if a microcode update goes bad and causes a server crash.

BTW, RH is shipping

> [Unit]
> Description=Load CPU microcode update
> After=basic.target
> ConditionVirtualization=false
> ConditionPathExists=/sys/devices/system/cpu/microcode/reload
> 
> [Service]
> Type=oneshot
> RemainAfterExit=no
> ExecStart=/usr/bin/bash -c 'grep -l GenuineIntel /proc/cpuinfo | xargs grep -l -E "model[[:space:]]*: 79$" > /dev/null || echo 1 > /sys/devices/system/cpu/microcode/reload'
> [Install]
> WantedBy=basic.target

with their Intel microcode package.
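
The ExecStart one-liner is dense, so here is a rough sketch of the same logic split into two shell functions. The function names and the path arguments are illustrative and not part of the RH package; the model check is anchored to the start of the line, which the original pattern is not.

```shell
# Returns 0 when the cpuinfo file describes an Intel CPU with model 79,
# the combination the unit above skips. The path argument is illustrative
# and defaults to /proc/cpuinfo.
is_intel_model_79() {
    cpuinfo=${1:-/proc/cpuinfo}
    grep -q GenuineIntel "$cpuinfo" &&
        grep -q -E '^model[[:space:]]*: 79$' "$cpuinfo"
}

# Rough equivalent of the ExecStart line: trigger the reload unless the
# CPU is an Intel model 79.
reload_unless_model_79() {
    cpuinfo=${1:-/proc/cpuinfo}
    reload_file=${2:-/sys/devices/system/cpu/microcode/reload}
    if is_intel_model_79 "$cpuinfo"; then
        return 0    # affected CPU: leave the microcode alone
    fi
    echo 1 > "$reload_file"
}
```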

Comment 2 Robin Johnson archtester

2018-01-07 22:57:40 UTC

@floppym:
I'm wondering what ways users would have to disable this if the microcode causes the system to hang (a rare case, but still an important one).

For that case, I'd prefer an init script rather than the tmpfiles rule.

(For the early-microcode case, we should document passing 'dis_ucode_ldr' on the kernel command line.)
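
To confirm after a reboot that the parameter actually reached the kernel, something like the following sketch could be used. The function name and the optional file argument are illustrative.

```shell
# Returns 0 when the kernel command line contains dis_ucode_ldr as a
# standalone parameter. early_loader_disabled is an illustrative name;
# the file argument defaults to /proc/cmdline.
early_loader_disabled() {
    cmdline_file=${1:-/proc/cmdline}
    # Put each parameter on its own line, then match the whole line.
    tr ' ' '\n' < "$cmdline_file" | grep -q -x 'dis_ucode_ldr'
}
```

For example: `early_loader_disabled && echo "early microcode loading is disabled"`.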

Comment 3 Robin Johnson archtester

2018-01-07 23:11:04 UTC

(In reply to Thomas Deutschmann from comment #1)
> Do we know for sure what will happen if the microcode before
> initramfs/kernel is newer than the one in /lib/firmware and we issue a
> reload command? I.e. may it cause an undesired downgrade?
The only way to downgrade, AFAIK, is specifically using iucode_tool with the --downgrade option. I don't have a system with multiple firmware versions available to test it, however.

> Not sure if we want firmware packages to depend on
> sys-firmware/microcode-reload because this will force a microcode update. I
> am not sure we should do that at all. At the moment I think people should do
> that on their own. I don't want to be responsible if a microcode update goes
> bad and causes a server crash.
In that case, let's put it in an openrc service, and tell users to put it into the init or boot runlevels. If they have a problem, they can skip the service.

AMD is possibly going to ship firmware for the Spectre side as well, and ARM could too (they already announced a new instruction for it).

> BTW, RH is shipping
...
> > ExecStart=/usr/bin/bash -c 'grep -l GenuineIntel /proc/cpuinfo | xargs grep -l -E "model[[:space:]]*: 79$" > /dev/null || echo 1 > /sys/devices/system/cpu/microcode/reload'

model=79 is a Xeon E5-26xx v4 series (confirmed on 2650, 2697). Why are they being excluded? I'd REALLY like to know.

Comment 4 Thomas Deutschmann (RETIRED) gentoo-dev

2018-01-07 23:20:24 UTC

(In reply to Robin Johnson from comment #3)
> model=79 is a Xeon E5-26xx v4 series (confirmed on 2650, 2697). Why are they
> being excluded? I'd REALLY like to know.

The microcode_ctl package attempts a live update of the CPU's microcode during the package installation. On certain CPU models (Intel Xeon v4, family 6, model 79), updating microcode_ctl could previously cause the system to become unresponsive or reboot instantly. As a consequence, this could leave the system in an unbootable state and with duplicate packages in the RPM database. To fix this bug, the live update is skipped on the affected CPU models, and the described problem no longer occurs.

https://bugzilla.redhat.com/show_bug.cgi?id=1402512

Comment 5 Mike Gilbert gentoo-dev

2018-01-08 01:15:15 UTC

The tmpfiles.d snippet may be disabled by placing an empty file (or symlink to /dev/null) at /etc/tmpfiles.d/microcode-reload.conf.
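
A small sketch of that masking step, assuming the package would install its snippet under the same file name in /usr/lib/tmpfiles.d (a same-named file in /etc/tmpfiles.d takes precedence). The function names and the directory argument are illustrative.

```shell
# Mask the proposed snippet by shadowing it with a symlink to /dev/null.
# The directory argument is illustrative and defaults to /etc/tmpfiles.d.
mask_microcode_reload() {
    conf_dir=${1:-/etc/tmpfiles.d}
    ln -sf /dev/null "$conf_dir/microcode-reload.conf"
}

# Undo the mask so the packaged snippet applies again.
unmask_microcode_reload() {
    conf_dir=${1:-/etc/tmpfiles.d}
    rm -f "$conf_dir/microcode-reload.conf"
}
```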

As for downgrades, it seems extremely unlikely that the early initramfs would be more recent than /lib/firmware. Regardless, I think the kernel ignores older versions.
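
One way to check the downgrade question on a given machine would be to compare the revision reported in /proc/cpuinfo before and after the reload. A sketch, with illustrative function names; it assumes the x86 "microcode" field is present in cpuinfo.

```shell
# Print the microcode revision of the first CPU listed in a cpuinfo file.
# The path argument is illustrative and defaults to /proc/cpuinfo.
microcode_revision() {
    cpuinfo=${1:-/proc/cpuinfo}
    sed -n 's/^microcode[[:space:]]*:[[:space:]]*//p' "$cpuinfo" | head -n 1
}

# Returns 0 when the second revision is numerically lower than the first.
# Both arguments are hex strings such as 0xb000021.
revision_went_down() {
    [ "$(($2))" -lt "$(($1))" ]
}

# Typical use, as root:
#   before=$(microcode_revision)
#   echo 1 > /sys/devices/system/cpu/microcode/reload
#   after=$(microcode_revision)
#   revision_went_down "$before" "$after" && echo "downgraded: $before -> $after"
```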

Comment 6 Mike Gilbert gentoo-dev

2018-01-08 01:45:52 UTC

It seems to me that if the user has installed intel-microcode or linux-firmware, and has enabled the microcode driver in the kernel, they should expect it to be loaded by the kernel automatically at boot at the very least.

I guess I can kind of see the case for removing the pkg_postinst call; we don't generally do live updates like this. However, I wonder how many systems are really so broken that they crash on a microcode update.

If we aren't going to do this automatically at boot or in pkg_postinst, then I don't think we should do anything more at all. The user should configure an early boot initramfs if manual action is required. An init script makes it too convenient to do this the wrong way.

Comment 7 Robin Johnson archtester

2018-01-08 02:25:56 UTC

That Xeon errata will HURT for early-initramfs, because you can't skip just model=79 like that during early-initramfs.

There was a separate bug asking for the early-initramfs to contain just the system-specific firmware, and I'm wondering if we can utilize that a bit.

I'm wondering if we could tackle it the other way:
1. EXCLUDE the model=79 firmware on install (provide some way to override the skip)
2. Support generating system-specific early-initramfs cpio.
3. Ship the boot init script to trigger the reload.

I'll see if I can find an affected Xeon for testing (if you have one that can be rebooted a few times, please volunteer here!).
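
For steps 1 and 2, the system-specific file can be identified by name: Intel microcode files under intel-ucode/ are named family-model-stepping, each as two hex digits. A sketch of deriving that name from cpuinfo follows; the function name and the path argument are illustrative.

```shell
# Print the intel-ucode file name (family-model-stepping in hex) for the
# first CPU in a cpuinfo file, e.g. 06-4f-01 for family 6, model 79,
# stepping 1. The path argument defaults to /proc/cpuinfo.
intel_ucode_filename() {
    cpuinfo=${1:-/proc/cpuinfo}
    awk -F': *' '
        $1 ~ /^cpu family[ \t]*$/ { family = $2 }
        $1 ~ /^model[ \t]*$/      { model = $2 }
        $1 ~ /^stepping[ \t]*$/   { stepping = $2 }
        # The first processor entry ends at the first blank line.
        /^$/ { exit }
        END {
            if (family == "" || model == "" || stepping == "") exit 1
            printf "%02x-%02x-%02x\n", family, model, stepping
        }
    ' "$cpuinfo"
}
```

The result can then be used to pick (or skip) a single file, for example `/lib/firmware/intel-ucode/$(intel_ucode_filename)`.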

Comment 8 Mike Pagano gentoo-dev

2018-01-08 11:24:59 UTC

My brand new Kaby Lake system hangs on the latest intel-microcode I tried (1215).
So yes, I should have checked the version and supported cpu models, but this is just an example of a brand new system that would have trouble if we started to automatically do this.

Comment 9 Alexander Tsoy

2018-01-08 12:14:06 UTC

(In reply to Robin Johnson from comment #7)
> That Xeon errata will HURT for early-initramfs, because you can't skip just
> model=79 like that during early-initramfs.
IIUC the problem only occurs on live (late) update of microcode. Early microcode update is fine for this CPU model.

I also want to remind you guys about bug 528712

Comment 10 Alexander Tsoy

2018-01-08 12:49:21 UTC

(In reply to Alexander Tsoy from comment #9)
> (In reply to Robin Johnson from comment #7)
> > That Xeon errata will HURT for early-initramfs, because you can't skip just
> > model=79 like that during early-initramfs.
> IIUC the problem only occurs on live (late) update of microcode. Early
> microcode update is fine for this CPU model.

Otherwise they would completely disable microcode loading for this CPU :)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.15-rc7&id=723f2828a98c8ca19842042f418fb30dd8cfc0f7

Comment 11 Mike Gilbert gentoo-dev

2018-01-08 14:10:30 UTC

Well, I can't argue with all that evidence. Thanks for the feedback all.

Comment 12 Robin Johnson archtester

2018-01-08 18:51:18 UTC

Reopening this, so we can still have a fallback late-load init script for users who are on a platform where it's needed, where it won't screw up their system, and who can't reboot yet.

Rough version of the init script:
is_cpu_blacklisted() {
  # check blacklisted CPUs
  # model=79 / Broadwell X
  # model=??? / Kaby Lake
  return 1  # placeholder until the model checks are written
}

start() {
  if is_cpu_blacklisted ; then
    eerror "Your CPU is blacklisted for late-loading as it can cause crashes"
    eerror "Please use the early-firmware mechanism & reboot instead"
    return 1
  else
    ebegin "Triggering CPU microcode reload"
    echo 1 >/sys/devices/system/cpu/microcode/reload
    eend $?
  fi
}
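
A slightly fuller sketch of the same idea, with the blacklist check filled in for model 79 only. BLACKLISTED_MODELS, CPUINFO and RELOAD_FILE are illustrative variable names, not part of any shipped package; further model numbers would be appended to the list once confirmed. ebegin, eend and eerror are the usual OpenRC helpers.

```shell
# Body of an OpenRC service script; the installed file would start with
# the usual "#!/sbin/openrc-run" line.
# BLACKLISTED_MODELS, CPUINFO and RELOAD_FILE are illustrative names.

description="Trigger a late CPU microcode reload"

: "${CPUINFO:=/proc/cpuinfo}"
: "${RELOAD_FILE:=/sys/devices/system/cpu/microcode/reload}"
# Space-separated Intel model numbers to skip; 79 is Broadwell X.
: "${BLACKLISTED_MODELS:=79}"

is_cpu_blacklisted() {
    # Only Intel CPUs are considered here.
    grep -q GenuineIntel "$CPUINFO" || return 1
    # Model number of the first CPU listed ("model name" does not match).
    model=$(sed -n 's/^model[[:space:]]*:[[:space:]]*//p' "$CPUINFO" | head -n 1)
    for blacklisted in $BLACKLISTED_MODELS; do
        [ "$model" = "$blacklisted" ] && return 0
    done
    return 1
}

start() {
    if is_cpu_blacklisted; then
        eerror "Your CPU is blacklisted for late-loading as it can cause crashes"
        eerror "Please use the early-firmware mechanism & reboot instead"
        return 1
    fi
    ebegin "Triggering CPU microcode reload"
    echo 1 > "$RELOAD_FILE"
    eend $?
}
```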

And this script would be in PDEPEND of intel-microcode & linux-firmware, not RDEPEND.

Comment 13 Mike Pagano gentoo-dev

2018-01-08 19:17:55 UTC

Here's my Kaby Lake Model for reference:

CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Comment 14 Mike Gilbert gentoo-dev

2018-01-08 19:35:05 UTC

(In reply to Robin Johnson from comment #12)
> And this script would be in PDEPEND of intel-microcode & linux-firmware, not
> RDEPEND.

PDEPEND is for breaking circular deps only. It is otherwise functionally equivalent to RDEPEND.