Bug 820773 - sys-fs/cryptsetup: dmcrypt runscript should need udev-settle
Summary: sys-fs/cryptsetup: dmcrypt runscript should need udev-settle
Status: RESOLVED CANTFIX
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal with 1 vote
Assignee: Gentoo's Team for Core System packages
Reported: 2021-10-29 22:29 UTC by kfm
Modified: 2022-10-31 17:16 UTC

Description kfm 2021-10-29 22:29:35 UTC
Recently, I installed cryptsetup in order to support encrypted swap devices. Having tested at least a dozen different machines, I found that all of them failed to activate swap upon rebooting. The /etc/conf.d/dmcrypt configuration is straightforward.

  swap=cryptswap
  source='/dev/disk/by-partlabel/swap'

Upon some consideration, the reason for this becomes obvious. The dmcrypt runscript is fundamentally racing against the time at which udev settles. It cannot be known in advance what the user may specify as a source device, nor can it be known how long it will take for udev to create all of the associated files beneath /dev. Given the limitations of OpenRC, the only correct thing to do is to wait for udev to settle. Therefore, I am requesting that "need udev-settle" be defined by the runscript.
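
A minimal sketch of the requested change, for illustration only. It is not the shipped dmcrypt service script: the real depend() function contains other declarations that are omitted here, and only the "need udev-settle" line comes from the request above.

```shell
# Hypothetical excerpt of an OpenRC service script (not the shipped file).
depend() {
	# Hard dependency: OpenRC starts udev-settle first and refuses to
	# start this service if udev-settle cannot be started.
	need udev-settle
}
```

With "need", the dependency is pulled in automatically, so the user does not have to add udev-settle to a runlevel by hand.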

I imagine that some would argue that the user is capable of defining that requirement in /etc/conf.d/dmcrypt. While that is true, I wouldn't consider it to be a substantive counter-argument. The reason for this is elementary: in the event that dmcrypt is not instructed to wait for udev to settle, the behaviour of a given system that uses dmcrypt is fundamentally unpredictable, both in the present and in the future. Thus, as already mentioned, depending upon udev-settle is the only provably correct thing to do.
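
For reference, the per-user workaround mentioned above might look like the following sketch. The swap and source values are those given earlier in this report; the rc_need line is an assumption about how a user would express the dependency, relying on OpenRC's generic rc_need override for conf.d files.

```shell
# /etc/conf.d/dmcrypt (sketch)
swap=cryptswap
source='/dev/disk/by-partlabel/swap'

# Assumption: rc_need adds a hard dependency on udev-settle without
# editing the service script itself.
rc_need="udev-settle"
```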

As an aside, dmcrypt is not the only software affected by race conditions on Gentoo/OpenRC. Recently, I rolled out thin provisioning on some of my systems and found that thin volumes invariably failed to be activated upon booting. In that particular instance, it was very hard to debug (it took me several weeks) but the underlying cause turned out to be much the same. I reworked the lvm runscript and, among other things, had it depend on udev-settle. Since then, it has worked correctly and consistently. The point is that this sort of unpredictability gives a sense of Gentoo/OpenRC not being well engineered or tested for the various use cases that it purports to support. Indeed, I would expect some users to have great difficulty in determining what the correct course of action is, other than to give up and switch to systemd or, perhaps, to some other distribution that uses systemd. I would say that it's worth trading off a few seconds' worth of boot time to improve the overall experience.
Comment 1 Matt Whitlock 2022-10-20 02:34:02 UTC
The 'mdraid' runscript has the same problem. If udev hasn't settled, then device nodes beneath /dev/disk/by-*/ may not have been created yet. If /etc/mdadm.conf specifies, for example, "DEVICE /dev/disk/by-id/ata-*-part[0-9]*", then whether the arrays get assembled correctly at boot is left to chance. Perplexingly, though, I have found that adding rc_want="dev-settle" to /etc/conf.d/mdraid is inadequate to resolve the race. I've had to fall back to defining a start_pre() function that sleeps in a loop until all needed device nodes exist.
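
The start_pre() workaround described above is not shown in the comment. A rough sketch of the idea follows, assuming a hypothetical helper named wait_for_paths, a 30-second limit, and a placeholder device path; none of these come from the original comment.

```shell
# Hypothetical helper: wait until every listed path exists, or give up
# once the limit (in seconds) is reached.
# Returns 0 if all paths exist, 1 on timeout.
wait_for_paths() {
	limit=$1
	shift
	elapsed=0
	for path in "$@"; do
		until [ -e "$path" ]; do
			if [ "$elapsed" -ge "$limit" ]; then
				return 1
			fi
			sleep 1
			elapsed=$((elapsed + 1))
		done
	done
	return 0
}

# Hypothetical hook for an OpenRC service script. The device path is a
# placeholder, not taken from any real configuration.
start_pre() {
	wait_for_paths 30 /dev/disk/by-id/example-part1
}
```

The limit is shared across all paths rather than applied per path, so a missing device delays boot by a bounded amount. A non-zero return from start_pre() should cause OpenRC to abort starting the service instead of proceeding without the device.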
Comment 2 Mike Gilbert gentoo-dev 2022-10-20 02:47:39 UTC
udev-settle is a racy hack. There is really no way to reliably tell when udev is "done" processing events since the kernel may generate new events at any time.
Comment 3 Mike Gilbert gentoo-dev 2022-10-20 02:51:55 UTC
OpenRC provides no way to reliably wait for devices to appear. Marking this as CANTFIX.
Comment 4 kfm 2022-10-31 16:43:40 UTC
(In reply to Mike Gilbert from comment #2)
> udev-settle is a racy hack. There is really no way to reliably tell when
> udev is "done" processing events since the kernel may generate new events at
> any time.

Technically, this is correct. It cannot be refuted. Still, there is an old aphorism about allowing perfect to be the enemy of good, which I think applies here - even if the word, good, be a stretch.

I do not exaggerate when I say that there is not a single OpenRC-using system that I have in production that is able to use the dmcrypt runscript, under the circumstances described, or LVM thin provisioning, at all (!), without involving udev-settle. Granted, the scripts are poor and I specifically had to overhaul the lvm runscript. My instance supports a somewhat narrower range of use cases; I don't care about udev-less systems, for example. Still, I attempted to test the use cases that remain and ensure that they stand half a chance of working.

The thing is that Gentoo effectively purports to support my use case by mere virtue of the fact that the relevant packages, flags and runscripts exist [1]. No matter how bad that support may be, it's not going to be removed in favour of systemd-exclusivity. In view of that, which of the following two options amounts to a better user experience?

a) the system stands a lower chance of functioning correctly; support is hard to come by when things fail and the most obvious alternative is to switch to systemd or use a 'mainstream' distribution

b) precisely the same as (a), except that the system stands a demonstrably higher chance of functioning correctly

If one is to accept the premise of the question, the answer seems clear. While WorksOnMyMachine(tm) is no high bar of engineering, it beats DoesntWork(tm) every time.

For me, the experience has been a tipping point that made me realise several things. Firstly, that the world of alternative init systems, for want of a better phrase, is encumbered by a deeply entrenched apathy when facing very real problems such as those raised by this bug. I acknowledge that my suggestion was imperfect but, at the same time, the response was a predictable one.

Secondly, that I was spending more time addressing deficiencies in OpenRC and its ecosystem of runscripts than I was on productively managing my systems. 

Thirdly, that in some cases - and I emphasise the word some - Gentoo presents a promise of flexibility that it simply does not deliver on. In particular, the experience of getting thinp to work was tortuous. These are relatively simple setups that do not entail the rootfs being layered over LVM. Were I a new user in 2021, I might well have disqualified the entire distribution on that basis alone.

Fourthly, that most of the runscripts are really bad. OK, I already knew that.

Anyway, my advice to anyone that's read this far would be to use systemd. I made that decision months ago and now treat it as a first-class citizen for my own packaging process. Though not without its own foibles, it has made life simpler.

[1] Although the fact that thinp cannot be provisioned using the minimal install media is, perhaps, telling.
Comment 5 Larry the Git Cow gentoo-dev 2022-10-31 17:16:38 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e60a8d35dfa988cb1147c32e2b147a1d75fd0564

commit e60a8d35dfa988cb1147c32e2b147a1d75fd0564
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2022-10-31 17:14:04 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2022-10-31 17:16:03 +0000

    sys-fs/cryptsetup: order dmcrypt after dev-settle
    
    This is still racy, but should increase the probability of success.
    
    Bug: https://bugs.gentoo.org/820773
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 sys-fs/cryptsetup/files/2.4.3-dmcrypt.rc | 1 +
 1 file changed, 1 insertion(+)
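
The diff itself is not reproduced in this report. Going only by the commit subject, the one-line insertion plausibly has the following shape; this is a guess for illustration, not the committed file, and the rest of depend() is omitted.

```shell
# Hypothetical excerpt of an OpenRC service script (not the committed file).
depend() {
	# Ordering only: if a service providing dev-settle is going to be
	# started, start it before this one. Unlike "need", this does not
	# pull the service in or fail when it is absent.
	after dev-settle
}
```

Because "after" only orders services, it has an effect only when a service providing dev-settle is already scheduled to start, for example because it was added to a runlevel.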