682578 – =sys-fs/lvm2-2.02.183 USE=udev leads to long delays in the initramfs

Status:	CONFIRMED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo's Team for Core System packages

Reported:	2019-04-05 11:00 UTC by Herbert Wantesh
Modified:	2020-12-27 23:04 UTC
CC List:	7 users

See Also:	684882

Attachments
my complete /etc/init.d/assemble-raid script (assemble-raid, 870 bytes, text/plain) 2019-04-23 17:45 UTC, Daniel Santos
dmesg.log (dmesg.log, 91.44 KB, text/plain) 2019-04-23 17:54 UTC, Daniel Santos

Description Herbert Wantesh 2019-04-05 11:00:55 UTC

When =sys-fs/lvm2-2.02.183 is compiled with the udev USE flag enabled, lvm vgscan and other lvm commands print multiple warnings when run in the initramfs, which has no udev and uses only devtmpfs.

The error messages:

WARNING: Device /dev/sda not initialized in udev database even after waiting 1000000 microseconds

After about 30 seconds the command finishes.

Compiling lvm2 without the udev USE flag fixes the problem.

By the way, sys-fs/lvm2-2.02.145-r2 works normally with the udev USE flag enabled.
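
A minimal sketch of the workaround described above (building lvm2 without the udev USE flag), assuming /etc/portage/package.use is a directory. The function name write_lvm2_use_override and the file name lvm2 are hypothetical, chosen only for illustration; the function takes an optional config root so it can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch only: write a per-package USE override that disables udev for lvm2.
# write_lvm2_use_override is a hypothetical helper, not part of Portage.
# Assumption: package.use is a directory (it may also be a single file).
write_lvm2_use_override() {
    conf_root="${1:-/etc/portage}"
    mkdir -p "$conf_root/package.use" || return 1
    printf '%s\n' 'sys-fs/lvm2 -udev' > "$conf_root/package.use/lvm2"
}

# Usage (as root), followed by a rebuild of the package:
#   write_lvm2_use_override && emerge --ask --oneshot sys-fs/lvm2
```

If the lvm binary is copied into the initramfs, the initramfs would presumably need to be regenerated after the rebuild for the change to take effect there.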

Comment 1 Morton Pellung 2019-04-08 18:16:13 UTC

I also see this error when trying to run grub-install with root on LVM etc. :-(

Basically grub tries every device for 10s and fails, again and again...

The workaround is to forward the host's /run/lvm into the chroot, see https://bbs.archlinux.org/viewtopic.php?id=242594

Comment 2 Morton Pellung 2019-04-08 18:20:44 UTC

I guess my last message was not very clear...

I chroot into my new root partition, which is on LVM on LUKS, and try to grub-install to /dev/sda. This fails unless /run/lvm from the host is imported into the chroot.
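
One way to import the host's /run/lvm into a chroot is a bind mount. The sketch below assumes that approach; the helper names run_or_print and bind_host_run_lvm and the DRY_RUN switch are hypothetical, and the linked forum thread may do this differently.

```shell
#!/bin/sh
# Sketch only: make the host's /run/lvm visible inside a chroot via a bind mount.
# Both helper names and DRY_RUN are hypothetical, for illustration.

# Run a command, or just print it when DRY_RUN=1.
run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# bind_host_run_lvm CHROOT_DIR
bind_host_run_lvm() {
    chroot_dir="$1"
    [ -n "$chroot_dir" ] || { echo "usage: bind_host_run_lvm CHROOT_DIR" >&2; return 1; }
    run_or_print mkdir -p "$chroot_dir/run/lvm" || return 1
    run_or_print mount --bind /run/lvm "$chroot_dir/run/lvm" || return 1
}

# Usage (as root, before entering the chroot; /mnt/gentoo is an example path):
#   bind_host_run_lvm /mnt/gentoo
```

Setting DRY_RUN=1 prints the two commands instead of running them, which makes it easy to check the paths before touching the system.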

Comment 3 Ben Kohler gentoo-dev 2019-04-08 18:22:59 UTC

This seems to be fixed in the new upstream release 2.02.184

Comment 4 Larry the Git Cow gentoo-dev 2019-04-08 18:34:35 UTC

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=53d8205ff9f4533c17a265a88e44363fd9fd66ed

commit 53d8205ff9f4533c17a265a88e44363fd9fd66ed
Author:     Lars Wendler <polynomial-c@gentoo.org>
AuthorDate: 2019-04-08 18:34:13 +0000
Commit:     Lars Wendler <polynomial-c@gentoo.org>
CommitDate: 2019-04-08 18:34:27 +0000

    sys-fs/lvm2: Bump to version 2.02.184
    
    this should fix udev related issues as well as scan_lvs issues.
    
    Bug: https://bugs.gentoo.org/682578
    Bug: https://bugs.gentoo.org/682380
    Package-Manager: Portage-2.3.62, Repoman-2.3.12
    Signed-off-by: Lars Wendler <polynomial-c@gentoo.org>

 sys-fs/lvm2/Manifest             |   1 +
 sys-fs/lvm2/lvm2-2.02.184.ebuild | 258 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 259 insertions(+)

Comment 5 Lars Wendler (Polynomial-C) (RETIRED) gentoo-dev 2019-04-08 18:37:09 UTC

Please test =sys-fs/lvm2-2.02.184 and report back success/failure.

Comment 6 Herbert Wantesh 2019-04-08 20:40:43 UTC

I tested =sys-fs/lvm2-2.02.184-r1 and it fixes the problem for me, thanks.

Comment 7 Morton Pellung 2019-04-13 20:17:59 UTC

Tried again, unfortunately on another box with a not quite identical setup, but grub-install worked fine with 2.02.184-r1.

Comment 8 Daniel Santos 2019-04-23 16:43:45 UTC

Please change severity of this bug to critical or blocker.  It rendered my system entirely unbootable and I had to do an hour and a half of rescue work to get it back.

Perhaps I needed to wait several more minutes?

Please mask or remove this ebuild ASAP!!!

Comment 9 Daniel Santos 2019-04-23 17:38:00 UTC

I can confirm that this is not about whether or not you have udev started.  I start my raid via an init script that runs after udev, and it still happens.

depend() {
        after sysfs devfs udev
        before checkfs fsck
}

assemble() {
        /sbin/fsck /dev/nvme0n1p1 || return 1
        /bin/mount /mnt/nvme0 || return 1
        /sbin/mdadm --assemble /dev/md0 /dev/sd{a,b,c,d}1 --bitmap /mnt/nvme0/md0-write-intent.dat || return 1
        /sbin/vgchange -ay /dev/vg0 || return 1
}

disassemble() {
        /sbin/vgchange -an /dev/vg0 || return 1
        /sbin/mdadm --stop /dev/md0 || return 1
        /bin/umount /mnt/nvme0 || return 1
}

Comment 10 Ben Kohler gentoo-dev 2019-04-23 17:39:55 UTC

Please also share the dmesg output that shows the udev delays.  It may be about non-started OR unresponsive udev.

Comment 11 Daniel Santos 2019-04-23 17:43:43 UTC

Also, I was wrong about it making my system unbootable.  I tried again with a stopwatch and patience, and it proceeded after 4 minutes and 40 seconds.  It looped through all block devices twice and left these love letters in my kernel log:

[    5.841247] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
[    5.853304] md: md0 stopped.
[    5.859397] md/raid:md0: device sda1 operational as raid disk 0
[    5.859519] md/raid:md0: device sdd1 operational as raid disk 3
[    5.859639] md/raid:md0: device sdc1 operational as raid disk 2
[    5.859758] md/raid:md0: device sdb1 operational as raid disk 1
[    5.861881] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[    5.864511] udevd[2067]: GOTO 'libsane_rules_end' has no matching label in: '/etc/udev/rules.d/S99-2000S1.rules'
[    5.866365] md0: detected capacity change from 0 to 8796106653696
[   66.863157] udevd[2067]: worker [2105] /devices/virtual/block/md0 is taking a long time
[  186.965091] udevd[2105]: timeout 'udisks-lvm-pv-export d2uPGL-1XLV-VdAb-L05Y-KhdX-uTHk-QFspo3'
[  186.965345] udevd[2105]: slow: 'udisks-lvm-pv-export d2uPGL-1XLV-VdAb-L05Y-KhdX-uTHk-QFspo3' [2109]
[  186.999448] udevd[2067]: worker [2105] /devices/virtual/block/md0 timeout; kill it
[  186.999679] udevd[2067]: seq 1385 '/devices/virtual/block/md0' killed
[  187.000471] udevd[2067]: worker [2105] terminated by signal 9 (Killed)
[  187.000755] udevd[2067]: worker [2105] failed while handling '/devices/virtual/block/md0'
[  250.070660] udevd[2067]: worker [2106] /devices/virtual/block/md0 is taking a long time
[  288.846412] udevd[2426]: failed to execute '/lib/udev/pci-db' 'pci-db /devices/pci0000:00/0000:00:01.3/0000:03:00.1': No such file or directory

<3 <3 <3
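
To pull just the udev stall messages out of a longer log like the one above, a filter along these lines may help. It is a sketch: the function name udev_stalls is hypothetical, and the pattern only covers the message wordings that appear in this log.

```shell
#!/bin/sh
# Sketch only: print udevd lines that report slow, timed-out, or killed workers.
# udev_stalls is a hypothetical helper; it reads the named files, or stdin if none.
udev_stalls() {
    grep -E 'udevd\[[0-9]+\]: .*(taking a long time|timeout|killed|terminated by signal)' "$@"
}

# Usage:
#   dmesg | udev_stalls
#   udev_stalls dmesg.log
```

The bracketed timestamps on the matching lines show how long each worker was stuck before udevd gave up on it.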

Comment 12 Daniel Santos 2019-04-23 17:45:50 UTC

Created attachment 573910
my complete /etc/init.d/assemble-raid script

Comment 13 Daniel Santos 2019-04-23 17:54:37 UTC

Created attachment 573912
dmesg.log

My full dmesg.log.  I have some kernel hacking options enabled, and there's a driver that always gets flagged by the UB sanitizer (unrelated).  Incidentally, I also have a misbehaving wifi device that I fixed with another init script hack; it also runs after udev and my assemble-raid have started, which is why that wifi device goes away and comes back.

#!/sbin/openrc-run

name="iwlwifi-rescue"
description="wifi device, driver or firmware loading is screwed"

depend() {
        after udev sysfs
        before NetworkManager
}

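start() {
        ebegin "Attempting to resurrect Intel wifi device..."
        set -x
        # Unbind and rebind the PCI device so the driver probes it again.
        # Use "return 1": a negative return value is not portable across shells.
        echo 0000:27:00.0 > /sys/module/iwlwifi/drivers/pci:iwlwifi/unbind || return 1
        sleep 0.25
        echo 0000:27:00.0 > /sys/module/iwlwifi/drivers/pci:iwlwifi/bind || return 1
        set +x
        eend 0
}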

Comment 14 Herbert Wantesh 2019-05-10 08:54:30 UTC

How about stabilizing lvm2-2.02.184.*?