Bug 717774 - sys-kernel/genkernel: race: mdadm should wait for drive scan to settle
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Genkernel Maintainers
Reported: 2020-04-17 00:46 UTC by Ed Santiago
Modified: 2020-07-23 23:58 UTC

Description Ed Santiago 2020-04-17 00:46:29 UTC
When invoked with USE_MDADM, genkernel runs `mdadm --assemble --scan`, but it does so while the kernel is still probing drives. This leads to incomplete recognition of arrays, and DegradedArray events on every boot. Here's a sample dmesg showing the chronology (note that sdd is attached only after md2 and md3 have come up with 3 out of 4 devices):

  Apr 16 14:24:15 myhost kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 0
  Apr 16 14:24:15 myhost kernel: sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
  ...
  Apr 16 14:24:15 myhost kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
  Apr 16 14:24:15 myhost kernel: ata2.00: Enabling discard_zeroes_data
  Apr 16 14:24:15 myhost kernel: sd 1:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
  ...
  Apr 16 14:24:15 myhost kernel: sd 2:0:0:0: Attached scsi generic sg2 type 0
  Apr 16 14:24:15 myhost kernel: sd 2:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
  ...
  Apr 16 14:24:15 myhost kernel: md: md2 stopped.
  Apr 16 14:24:15 myhost kernel: md/raid10:md2: active with 3 out of 4 devices
  Apr 16 14:24:15 myhost kernel: md2: detected capacity change from 0 to 3141533696
  Apr 16 14:24:15 myhost kernel: md: md3 stopped.
  Apr 16 14:24:15 myhost kernel: md/raid10:md3: active with 3 out of 4 devices
  Apr 16 14:24:15 myhost kernel: md3: detected capacity change from 0 to 1996571541504
  ...
  Apr 16 14:24:15 myhost kernel: md: Autodetecting RAID arrays.
  Apr 16 14:24:15 myhost kernel: md: could not open unknown-block(8,2).
  Apr 16 14:24:15 myhost kernel: md: could not open unknown-block(8,3).
  Apr 16 14:24:15 myhost kernel: md: could not open unknown-block(8,18).
  Apr 16 14:24:15 myhost kernel: md: could not open unknown-block(8,19).
  Apr 16 14:24:15 myhost kernel: md: could not open unknown-block(8,34).
  Apr 16 14:24:15 myhost kernel: md: could not open unknown-block(8,35).
  Apr 16 14:24:15 myhost kernel: md: autorun ...
  Apr 16 14:24:15 myhost kernel: md: ... autorun DONE.
  ...
  Apr 16 14:24:15 myhost kernel: sd 3:0:0:0: Attached scsi generic sg3 type 0
  Apr 16 14:24:15 myhost kernel: sd 3:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)

Workaround: adding 'scandelay=5' to kernel boot options seems to give me a clean array on boot.

sys-kernel/genkernel-4.0.5  sys-fs/mdadm-4.1  sys-kernel/gentoo-sources-5.4.28
Comment 1 Thomas Deutschmann (RETIRED) gentoo-dev 2020-04-21 22:20:27 UTC
Not a bug. Options like scandelay, rootdelay... exist exactly for this kind of problem.
Comment 2 Ed Santiago 2020-04-21 23:29:06 UTC
I'm afraid I must respectfully disagree. Adding delays is, by definition, racy. The correct way to avoid a race condition is not to add delays but to add hooks that set a flag once processing is complete. This is in no way controversial.

I understand that genkernel has an aversion to udev, which is why I didn't suggest `udevadm settle` in my initial comment: I was hoping you would be able to use other processing hooks for determining that device processing is complete. I still hope you'll consider doing so.
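
To illustrate the distinction drawn above between a fixed delay and waiting on a completion condition, here is a minimal POSIX shell sketch. It is not genkernel code: `wait_for_paths` is a hypothetical helper, and the device names in the usage comment are assumptions based on the four drives in the log. It polls until every expected path exists, and the timeout serves only as an upper bound rather than as the synchronization mechanism.

```shell
#!/bin/sh
# Hypothetical helper (not part of genkernel): return 0 as soon as every
# listed path exists, or 1 if they are not all present after $1 seconds.
wait_for_paths() {
    timeout=$1
    shift
    elapsed=0
    while :; do
        missing=0
        for path in "$@"; do
            [ -e "$path" ] || missing=1
        done
        # All expected paths are present: stop waiting immediately.
        [ "$missing" -eq 0 ] && return 0
        # Give up once the upper bound has been reached.
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Illustrative use in an init script; the device names are assumptions.
# if wait_for_paths 30 /dev/sda /dev/sdb /dev/sdc /dev/sdd; then
#     mdadm --assemble --scan
# fi
```

Unlike `scandelay=5`, this returns as soon as the condition holds and fails visibly when it does not. On a udev-based initramfs, `udevadm settle` plays a comparable role by waiting for the udev event queue to drain.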
Comment 3 Thomas Deutschmann (RETIRED) gentoo-dev 2020-07-23 23:58:58 UTC
Genkernel switched to udev due to bug 706434. Now that bundled mdadm is also udev-aware (https://gitweb.gentoo.org/proj/genkernel.git/commit/?id=f6f9384b423e9bb9b8cee294f4ddbeee7c518463), the reported problem should be resolved.

Please re-open or file a new bug if you still experience the problem with >=genkernel-4.1.0_beta1.