There was a discussion in #gentoo-infra recently about the various methods to get CPU microcode loaded at boot time. It is generally best to do it as early as possible, but this relies on the user to build their kernel/initramfs appropriately.
As a fallback, I propose we install the following tmpfiles.d snippet to cause the kernel to (re)load microcode in late boot:
> w /sys/devices/system/cpu/microcode/reload - - - - 1
This could be installed by a new package referenced in RDEPEND in sys-firmware/intel-microcode and sys-kernel/linux-firmware.
Does this seem like a reasonable idea?
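For reference, the `w` line above amounts to roughly a one-time write of `1` to the sysfs file, skipped when the file does not exist. A minimal shell sketch of the same thing follows; the function name `trigger_microcode_reload` and its optional path argument are hypothetical and only exist for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write "1" to the microcode reload file if it exists.
# The optional path argument is only there so the function can be tried on
# a scratch file instead of the real sysfs node.
trigger_microcode_reload() {
	reload_file="${1:-/sys/devices/system/cpu/microcode/reload}"
	# The node is absent when the kernel has no late-loading support.
	[ -e "$reload_file" ] || return 1
	echo 1 > "$reload_file"
}

# Usage (as root, on a system where late loading is known to be safe):
#   trigger_microcode_reload
```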
Do we know for sure what will happen if the microcode before initramfs/kernel is newer than the one in /lib/firmware and we issue a reload command? I.e. may it cause an undesired downgrade?
Not sure if we want the firmware packages to depend on sys-firmware/microcode-reload because this will force microcode updates. I am not sure we should do that at all. At the moment I think people should do that on their own. I don't want to be responsible if a microcode update goes bad and causes a server crash.
BTW, RH is shipping
> [Unit]
> Description=Load CPU microcode update
> After=basic.target
> ConditionVirtualization=false
> ConditionPathExists=/sys/devices/system/cpu/microcode/reload
>
> [Service]
> Type=oneshot
> RemainAfterExit=no
> ExecStart=/usr/bin/bash -c 'grep -l GenuineIntel /proc/cpuinfo | xargs grep -l -E "model[[:space:]]*: 79$" > /dev/null || echo 1 > /sys/devices/system/cpu/microcode/reload'
> [Install]
> WantedBy=basic.target
with their Intel microcode package.
@floppym: I'm wondering about ways to disable the reload if the microcode causes the system to hang (a rare case, but still an important one). For that case, I'd prefer an init script rather than the tmpfiles rule. (For the early-microcode case, we should document passing 'dis_ucode_ldr' on the kernel command line).
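For the early-microcode case, a script could check whether 'dis_ucode_ldr' was passed before deciding what to do. This is only a sketch: the function name `early_loader_disabled` is hypothetical, and it assumes the parameters are separated by single spaces, as they normally are in /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the early microcode loader was disabled on
# the kernel command line. Reads /proc/cmdline unless a string is passed in.
early_loader_disabled() {
	cmdline="${1-$(cat /proc/cmdline)}"
	# Pad with spaces so the parameter only matches as a whole word.
	case " $cmdline " in
		*" dis_ucode_ldr "*) return 0 ;;
		*) return 1 ;;
	esac
}

# Usage:
#   early_loader_disabled && echo "early microcode loading is disabled"
```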
(In reply to Thomas Deutschmann from comment #1)
> Do we know for sure what will happen if the microcode before
> initramfs/kernel is newer than the one in /lib/firmware and we issue a
> reload command? I.e. may it cause an undesired downgrade?
The only way to downgrade, AFAIK, is specifically using iucode_tool with the --downgrade option. I don't have a system with multiple firmware versions available to test it, however.
> Not sure if we want the firmware packages to depend on
> sys-firmware/microcode-reload because this will force microcode updates. I
> am not sure we should do that at all. At the moment I think people should do
> that on their own. I don't want to be responsible if a microcode update goes
> bad and causes a server crash.
In that case, let's put it in an openrc service, and tell users to put it into the init or boot runlevels. If they have a problem, they can skip the service. AMD is possibly going to ship firmware for the Spectre side as well, and ARM could too (they already announced a new instruction for it).
> BTW, RH is shipping
...
> > ExecStart=/usr/bin/bash -c 'grep -l GenuineIntel /proc/cpuinfo | xargs grep -l -E "model[[:space:]]*: 79$" > /dev/null || echo 1 > /sys/devices/system/cpu/microcode/reload'
model=79 is a Xeon E5-26xx v4 series (confirmed on 2650, 2697). Why are they being excluded? I'd REALLY like to know.
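On the downgrade question, anyone with suitable hardware could at least observe what happens by comparing the `microcode` field of /proc/cpuinfo before and after a reload. A minimal sketch, with the hypothetical helper names `microcode_revision` and `is_downgrade`; it does not settle what the kernel does, it only reports it.

```shell
#!/bin/sh
# Hypothetical helper: print the microcode revision of the first CPU listed
# in a cpuinfo-style file (defaults to /proc/cpuinfo).
microcode_revision() {
	awk -F': *' '/^microcode[ \t]*:/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Hypothetical helper: succeed if the revision in $2 is lower than the one
# in $1. Both are hex strings in the form cpuinfo prints, such as 0x2a.
is_downgrade() {
	[ "$(printf '%d' "$2")" -lt "$(printf '%d' "$1")" ]
}

# Usage (as root):
#   before=$(microcode_revision)
#   echo 1 > /sys/devices/system/cpu/microcode/reload
#   after=$(microcode_revision)
#   is_downgrade "$before" "$after" && echo "revision went backwards"
```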
(In reply to Robin Johnson from comment #3)
> model=79 is a Xeon E5-26xx v4 series (confirmed on 2650, 2697). Why are they
> being excluded? I'd REALLY like to know.
The microcode_ctl package attempts a live update of the CPU's microcode during the package installation. On certain CPU models (Intel Xeon v4, family 6, model 79), updating microcode_ctl could previously cause the system to become unresponsive or reboot instantly. As a consequence, this could leave the system in an unbootable state and with duplicate packages in the RPM database. To fix this bug, the live update is skipped on the affected CPU models, and the described problem no longer occurs.
https://bugzilla.redhat.com/show_bug.cgi?id=1402512
The tmpfiles.d snippet may be disabled by placing an empty file (or symlink to /dev/null) at /etc/tmpfiles.d/microcode-reload.conf. As for downgrades, it seems extremely unlikely that the early initramfs would be more recent than /lib/firmware. Regardless, I think the kernel ignores older versions.
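A sketch of the masking step, assuming the package would install the snippet under /usr/lib/tmpfiles.d with the same file name (a same-named file in /etc/tmpfiles.d takes precedence). The function name `mask_microcode_reload` and its root argument are hypothetical; the root argument only exists so it can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: mask the packaged snippet by pointing a same-named
# file in /etc/tmpfiles.d at /dev/null.
mask_microcode_reload() {
	root="${1:-}"
	mkdir -p "$root/etc/tmpfiles.d" &&
		ln -sf /dev/null "$root/etc/tmpfiles.d/microcode-reload.conf"
}

# Usage (as root):
#   mask_microcode_reload
# To undo, remove /etc/tmpfiles.d/microcode-reload.conf again.
```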
It seems to me that if the user has installed intel-microcode or linux-firmware, and has enabled the microcode driver in the kernel, they should expect it to be loaded by the kernel automatically at boot at the very least. I guess I can kind of see the case for removing the pkg_postinst call; we don't generally do live updates like this. However, I wonder how many systems are really so broken that they crash on a microcode update. If we aren't going to do this automatically at boot or in pkg_postinst, then I don't think we should do anything more at all. The user should configure an early boot initramfs if manual action is required. An init script makes it too convenient to do this the wrong way.
That Xeon errata will HURT for early-initramfs, because you can't skip just model=79 like that during early-initramfs. There was a separate bug asking for the early-initramfs to contain just the system-specific firmware, and I'm wondering if we can utilize that a bit. I'm wondering if we tackle it the other way:
1. EXCLUDE the model=79 firmware on install (provide some way to override the skip)
2. Support generating system-specific early-initramfs cpio.
3. Ship the boot init script to trigger the reload.
I'll see if I can find an affected Xeon for testing (if you have one that can be rebooted a few times, please volunteer here!).
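Both the exclusion in point 1 and the system-specific cpio in point 2 come down to knowing which file belongs to a given CPU. The files under /lib/firmware/intel-ucode are named family-model-stepping in two-digit hex, so the name can be derived from the decimal values in /proc/cpuinfo. A minimal sketch; the function name `intel_ucode_name` is hypothetical and the stepping in the usage line is an arbitrary illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the intel-ucode file name for a CPU, given its
# decimal family, model and stepping as shown in /proc/cpuinfo.
intel_ucode_name() {
	printf '%02x-%02x-%02x\n' "$1" "$2" "$3"
}

# Usage:
#   intel_ucode_name 6 79 1    # prints 06-4f-01
```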
My brand new Kaby Lake system hangs on the latest intel-microcode I tried (1215). So yes, I should have checked the version and supported CPU models, but this is just an example of a brand new system that would have trouble if we started to automatically do this.
(In reply to Robin Johnson from comment #7)
> That Xeon errata will HURT for early-initramfs, because you can't skip just
> model=79 like that during early-initramfs.
IIUC the problem only occurs on live (late) update of microcode. Early microcode update is fine for this CPU model.
I also want to remind you guys about bug 528712
(In reply to Alexander Tsoy from comment #9)
> (In reply to Robin Johnson from comment #7)
> > That Xeon errata will HURT for early-initramfs, because you can't skip just
> > model=79 like that during early-initramfs.
> IIUC the problem only occurs on live (late) update of microcode. Early
> microcode update is fine for this CPU model.
Otherwise they would completely disable microcode loading for this CPU :)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.15-rc7&id=723f2828a98c8ca19842042f418fb30dd8cfc0f7
Well, I can't argue with all that evidence. Thanks for the feedback all.
Reopening this, so we can still have a fallback late-load init script that users can use if they are on a platform where it's needed, won't screw up their system, and they can't reboot yet.
Rough version of the init script:
is_cpu_blacklisted() {
	# check blacklisted CPUs
	# model=79 / Broadwell X
	# model=??? / Kaby Lake (not checked yet, model number unknown)
	grep -q GenuineIntel /proc/cpuinfo &&
		grep -q -E '^model[[:space:]]*: 79$' /proc/cpuinfo
}
start() {
	if is_cpu_blacklisted ; then
		eerror "Your CPU is blacklisted for late-loading as it can cause crashes"
		eerror "Please use the early-firmware mechanism & reboot instead"
		return 1
	else
		ebegin "Triggering CPU microcode reload"
		echo 1 >/sys/devices/system/cpu/microcode/reload
		eend $?
	fi
}
And this script would be in PDEPEND of intel-microcode & linux-firmware, not RDEPEND.
Here's my Kaby Lake Model for reference:
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
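With that model number, the blacklist check from the rough init script could be sketched as below. This is an assumption-heavy illustration, not a vetted list: model 79 comes from the Red Hat unit quoted earlier, and model 158 rests on this single report. The optional file argument is hypothetical and only exists so the function can be tried against a saved cpuinfo dump.

```shell
#!/bin/sh
# Hypothetical sketch of is_cpu_blacklisted: succeed if the first CPU in a
# cpuinfo-style file (default /proc/cpuinfo) is an Intel family 6 part whose
# model number is on the list below.
is_cpu_blacklisted() {
	awk -F': *' '
		/^vendor_id[ \t]*:/  { vendor = $2 }
		/^cpu family[ \t]*:/ { family = $2 }
		/^model[ \t]*:/      { model = $2 }   # does not match "model name"
		/^$/                 { exit }         # stop after the first CPU block
		END {
			if (vendor != "GenuineIntel" || family != 6) exit 1
			# 79 = Broadwell X, 158 = the Kaby Lake reported in this bug
			if (model == 79 || model == 158) exit 0
			exit 1
		}
	' "${1:-/proc/cpuinfo}"
}

# Usage:
#   is_cpu_blacklisted && echo "late loading is not safe on this CPU"
```

Matching on the model number alone is coarse, since one model number can cover many SKUs, so a real list would probably need the stepping or the microcode revision as well.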
(In reply to Robin Johnson from comment #12) > And this script would be in PDEPEND of intel-microcode & linux-firmware, not > RDEPEND. PDEPEND is for breaking circular deps only. It is otherwise functionally equivalent to RDEPEND.