Created attachment 594048
I'm not sure the bug resides in openrc or not, but for now I'm filing the bug against it.
This is the second time I have been able to reproduce this bug, each time on different hardware (they are mini PCs: Intel NUC and ASRock ubox).
After updating all packages on the stable 17.1 profile, the system is unable to boot without an entropy generator (such as haveged).
I'm attaching a PNG as proof of concept. The last line shown pertains to syslog-ng, but if I remove syslog-ng from the default runlevel, the boot hangs at another service instead.
For now, to make it boot, it is enough to attach a keyboard and press some keys, or to install haveged and enable it in the boot runlevel.
Just curious, why do you suspect the source of the problem is low entropy? Just because of
> Atm, to make it boot, is enough to attach a keyboard and press some digits or install haveged and enable it at boot runlevel.
> I'm not sure the bug resides in openrc or not, but for now I'm filing the bug against it.
May I ask what you expect an init system to do about this? It's your system. If you are right and there isn't enough entropy at startup for user space programs, isn't it the system administrator's job to add something like haveged? Or to use CONFIG_RANDOM_TRUST_CPU (kernel)?
See also https://wiki.debian.org/BoottimeEntropyStarvation
(In reply to Thomas Deutschmann from comment #1)
> Just curious, why do you suspect the source of the problem is low entropy?
> Just because of
> > Atm, to make it boot, is enough to attach a keyboard and press some digits or install haveged and enable it at boot runlevel.
> May I ask what do you expect from an init system to do about this? It's your
> system. In case you are right and don't have enough entropy at start for
> user space programs, isn't it system administrator's job to add something
> like haveged? Or use CONFIG_RANDOM_TRUST_CPU (kernel)?
> See also https://wiki.debian.org/BoottimeEntropyStarvation
It can be a system administrator task, but when something is required to run a software I'd expect one of the following:
1) a kernel config check in the ebuild about CONFIG_RANDOM_TRUST_CPU
2) a warning about the need for an entropy generator.
Since we are talking about something that boots the system, and since the standard installation procedure won't notify you about either 1 or 2 (because you just untar the stage3 and go ahead), you may end up with a system that won't boot without knowing why. Does that make sense?
My system doesn't have CONFIG_RANDOM_TRUST_CPU and boots fine, so I
don't think OpenRC requires that setting.
I'm not sure what you want me to do. :-)
Another setting I was pointed to is CONFIG_GCC_PLUGIN_LATENT_ENTROPY.
I was advised that if I add a config check it should be for this setting,
but I'm not convinced that this is an issue for OpenRC to worry about.
Here are a few things to keep in mind …
After logging in, one may find out the time at which the CRNG had been fully seeded by running dmesg -T | grep "crng init".
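For anyone wanting to script that check, here is a rough sketch that scans kernel log text for the "crng init done" message and returns whatever timestamp sits in the leading brackets. The function name and the sample log excerpt are invented for illustration, and the exact message wording is an assumption based on kernels of this era.

```python
import re

# Matches lines such as "[   42.500000] random: crng init done" (plain dmesg)
# or "[Tue Nov 19 10:00:07 2019] random: crng init done" (dmesg -T).
_CRNG_DONE = re.compile(r"^\[\s*([^\]]+?)\s*\]\s*random: crng init done")


def crng_init_stamp(log_text):
    """Return the bracketed timestamp of the first 'crng init done' line, or None."""
    for line in log_text.splitlines():
        match = _CRNG_DONE.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical log excerpt, invented purely for illustration.
SAMPLE_LOG = """\
[    0.000000] Linux version (example)
[   42.500000] random: crng init done
"""

if __name__ == "__main__":
    print(crng_init_stamp(SAMPLE_LOG))
```

A late timestamp relative to the point where boot stalls would be consistent with a service waiting on the CRNG, though it does not prove it.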
Until such time as the CRNG is seeded, reading from /dev/random will block. Also, getentropy(3) can block. Reading from /dev/urandom will not block. In other words, /dev/urandom only returns cryptographically secure entropy after the point at which "crng init" occurs.
Reading from /dev/random blocks whenever the entropy estimator believes that there is insufficient entropy, whereas reading from /dev/urandom never blocks. That does not mean that urandom yields entropy of worse quality - that's a common myth. However, that is contingent upon the seeding having concluded before first use.
It is probable that the reporter has an application being launched early that either reads directly from /dev/random or calls getentropy(3). This is the correct thing to do if an application launched early during the boot process requires cryptographically secure entropy because, if necessary, it will be blocked until the CRNG has been seeded. On the other hand, some applications don't genuinely require secure entropy and are thus abusing these resources.
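To see the blocking behaviour without actually hanging, one can ask getrandom(2) not to block. The following is a minimal sketch using Python's os.getrandom() with GRND_NONBLOCK; the helper name crng_ready is my own, and it only says something useful on Linux.

```python
import os


def crng_ready():
    """Return True if the kernel CRNG is seeded, False if a read would block.

    Returns None where os.getrandom() is unavailable (non-Linux platforms).
    """
    if not hasattr(os, "getrandom"):
        return None
    try:
        # GRND_NONBLOCK makes getrandom(2) fail with EAGAIN instead of
        # blocking while the CRNG is still unseeded.
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    return True


if __name__ == "__main__":
    print("CRNG seeded:", crng_ready())
```

Run from an early service, a False result would suggest that anything calling getentropy(3) at that point is going to wait.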
OpenRC is already doing what it can by contributing a previously written random-seed early during the boot process.
The reporter's issue is an edge case, albeit not a particularly uncommon one. So, do those suffering from limited natural sources of entropy during the boot process - and perhaps also lacking RDRAND/RDSEED support - have any recourse? Yes, and here are some of them …
• CONFIG_GCC_PLUGIN_LATENT_ENTROPY: See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=38addce. Available as of v4.9. There should be no harm in enabling this in general. It also supports an "extra_latent_entropy" kernel parameter, which should _not_ generally be used (it results in the injection of insecure entropy).
• CONFIG_RANDOM_TRUST_CPU: See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39a8883. Available as of v4.19. Allows for a CPU-based generator - such as the RDSEED instruction - to be fully trusted in the course of seeding the kernel's CRNG. Controversial, because you then have to trust that the hardware has not been deliberately weakened in order to support the actions of a state-level actor. While useful to have as an option, advising users to enable it in general would be highly questionable, and lacking in neutrality.
• Linux v5.4 will make use of a special jitter entropy generator (for CPUs with a cycle counter). See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=50ee752. This patch is simple enough that it should not be hard to backport, if so desired.
• A supplementary hardware random number generator (for where the CPU does not already offer one).
• A userspace entropy daemon, such as haveged. This is the least desirable option, in my opinion. The kernel is generally a better place to interact with - and handle - viable entropy sources.
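To check which of the kernel options above a given kernel was built with, something along these lines could be used. It is only a sketch: the function names are hypothetical, the option list simply mirrors those named in this thread (it is not a recommendation), and the sample config text is invented.

```python
import re

# Options named in this thread; listed for illustration, not as advice.
OPTIONS = (
    "CONFIG_GCC_PLUGIN_LATENT_ENTROPY",
    "CONFIG_RANDOM_TRUST_CPU",
    "CONFIG_HW_RANDOM_VIRTIO",
)

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option names to their values ('y', 'm', ...), or 'n' if explicitly unset."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            values[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            values[match.group(1)] = "n"
    return values


def report(text, options=OPTIONS):
    """Return the state of each option; 'absent' means the config never mentions it."""
    values = parse_kconfig(text)
    return {name: values.get(name, "absent") for name in options}


# Hypothetical config fragment, invented for illustration.
SAMPLE_CONFIG = """\
CONFIG_HW_RANDOM_VIRTIO=m
# CONFIG_RANDOM_TRUST_CPU is not set
"""

if __name__ == "__main__":
    for name, state in report(SAMPLE_CONFIG).items():
        print(name, state)
```

The text to feed in would typically come from the .config in the kernel source tree, or from /proc/config.gz (via gzip.open) if the running kernel was built with CONFIG_IKCONFIG_PROC.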
In summary, I don't think there is much for OpenRC to do here. Recommending the gcc plugin should be OK. I would oppose the general recommendation of RANDOM_TRUST_CPU in the strongest possible terms, certainly as part of a CONFIG_CHECK. Also, the kernel team might be interested in the aforementioned patch.
I'm taking the liberty of copying in kernel, in case there is any interest in the jitter entropy patch. Also, I forgot to mention a relevant article in my previous comment, which is https://lwn.net/Articles/802360/.
I am assigning this to the kernel team since there isn't anything for
OpenRC to do.
I'm experiencing this problem on a hosted KVM/QEMU box.
As a quick workaround, I installed one of the suggested RNG daemons (haveged/clrngd).
Here is a good with tracking the problem
(In reply to Anton Bolshakov from comment #8)
> I'm experiencing this problem on kvm/qemu collocated box (hosted).
> sys-kernel/gentoo-sources-4.19.86 (stable)
> As the quick workaround I installed one of suggested rnd daemons
> Here is a good with tracking the problem
Ideally, your hosting company would support VirtIO RNG, and back it with a non-blocking source of entropy. If enabling CONFIG_HW_RANDOM_VIRTIO in your kernel does not help, I would suggest bringing it up with their support team.
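One way to tell whether the guest actually sees a VirtIO RNG is to look at the hw_random entries in sysfs. The sketch below reads rng_available and rng_current under /sys/class/misc/hw_random; the helper names are hypothetical, and matching on the substring "virtio" is an assumption about how the device is named.

```python
from pathlib import Path

HW_RANDOM_DIR = Path("/sys/class/misc/hw_random")


def hwrng_status(base=HW_RANDOM_DIR):
    """Return the available and current hardware RNGs as reported by sysfs."""

    def read(name):
        try:
            return (Path(base) / name).read_text().strip()
        except OSError:
            # Missing directory or file: hw_random support is probably not built.
            return None

    available = read("rng_available")
    return {
        "available": available.split() if available else [],
        "current": read("rng_current"),
    }


def has_virtio_rng(status):
    """Assumption: a VirtIO RNG registers under a name containing 'virtio'."""
    return any("virtio" in name for name in status["available"])


if __name__ == "__main__":
    status = hwrng_status()
    print(status)
    print("VirtIO RNG visible:", has_virtio_rng(status))
```

If nothing shows up even with CONFIG_HW_RANDOM_VIRTIO enabled, the hypervisor is most likely not exposing the device, which is the case to raise with the provider.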
(In reply to Kerin Millar from comment #9)
> Ideally, your hosting company would support VirtIO RNG, and back it with a
> non-blocking source of entropy. If enabling CONFIG_HW_RANDOM_VIRTIO in your
> kernel does not help, I would suggest bringing it up with their support team.
I forgot to add that, even if they will not properly support VirtIO RNG, the measures described earlier in the thread may still help you, i.e. upgrading to the latest 5.4 longterm kernel and also enabling RANDOM_TRUST_CPU if the hypervisor exposes RDRAND/RDSEED and you happen to trust it.
(In reply to Kerin Millar from comment #10)
> I forgot to add that, even if they will not properly support VirtIO RNG, the
> measures described earlier in the thread may still help you i.e. upgrading
> to the latest 5.4 longterm kernel and also enabling RANDOM_TRUST_CPU if the
> hypervisor exposes RDRAND/RDSEED and you happen to trust it.
Thanks for your suggestions. I have enabled RANDOM_TRUST_CPU but it did not help (the boot delay may have been shortened but I can't be certain). That's why I went ahead with the entropy daemon approach. I have also asked my hosting provider (vpsserver) to look at the problem.
As for this bug report, it sounds like it should be marked as depending on 5.4 kernel stabilization.
I am having this problem too, and it is reproducible on a new Gentoo install on a Mac Mini 6,2 when I boot with the latest stable vanilla-sources kernel 4.4.213.
I encountered it after updating my world, rebooting, then being locked out of remote systems because they are blocked during bootup.
I am using kernel 4.4.213 because there is a regression in all the newer stable kernels that silently drops all packets on eth0: https://bugzilla.kernel.org/show_bug.cgi?id=205717
I tried bisecting the kernel to isolate the problem so that I can upgrade my kernel, but I'm having problems getting any bisected kernels to even boot. I'm not sure if the kernel bug is getting any attention right now.
Would it be appropriate to add a long timeout, enabled by default, for services that are blocking boot? Something very long, like 1 hour, because as older systems receive package updates and then eventually reboot, they will never come back online. If they are remote, this is very problematic.
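To make the idea concrete, here is a minimal sketch of a bounded wait: poll a readiness check until it succeeds or a deadline passes, then carry on either way. This is not something OpenRC provides as far as I know; the wait_until helper and its parameters are hypothetical.

```python
import time


def wait_until(probe, timeout, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns a true value or timeout seconds elapse.

    Returns True if the probe succeeded, False if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Hypothetical probe that is never ready, to show the timeout path.
    print(wait_until(lambda: False, timeout=2, interval=0.5))
```

With a one hour timeout and a probe that reports whether the blocking condition has cleared, a remote box would at least continue booting eventually instead of hanging forever.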
I also submitted this bug since it seems related: https://github.com/syslog-ng/syslog-ng/issues/3124#issuecomment-584099565