Bug 306491 - sys-fs/lvm2: 64-device-mapper.rule causes race condition w/ udev and device removal
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High major with 1 vote
Assignee: Robin Johnson
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-23 08:40 UTC by Matthias Dahl
Modified: 2011-05-29 16:06 UTC
CC List: 9 users

See Also:
Package list:
Runtime testing required: ---


Description Matthias Dahl 2010-02-23 08:40:03 UTC
I already reported this as part of bug #287650, but unfortunately I guess it was accidentally overlooked.

64-device-mapper.rules causes a race condition w/ udev and LVM device removal. Deleting the rule or using the rules packaged with LVM works w/o any problems. This major bug is a real PITA if you are using automated backup scripts which rely on LVM snapshot creation/removal.

By the way, this is _not_ limited to snapshots; I just chose this as an example.

Reproducible: Always

Steps to Reproduce:
1. lvcreate -L30G -s -n test-snapshot group/testvolume
2. mount /dev/mapper/group-test--snapshot /mnt/testsnapshot
3. ls /mnt/testsnapshot
4. umount /mnt/testsnapshot
5. lvremove -f group/test-snapshot
Actual Results:  
It might or might not work. It can fail several times in a row or work several times w/o any problem. If it fails, it fails with "Can't remove open logical volume "test-snapshot"" and you have to retry several times before it works.


Expected Results:  
Snapshot gets removed - always.

Linux ceto 2.6.32.8 #2 SMP Mon Feb 22 14:27:44 CET 2010 x86_64 Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz GenuineIntel GNU/Linux

sys-fs/udev-151-r1
sys-fs/lvm2-2.02.56-r3
Comment 1 Matthias Dahl 2010-02-23 19:33:23 UTC
After spending the better part of the day trying to find a suitable solution, I came across the following, which I am adding here so it can benefit others:

1) dmsetup splitname is _broken_ - in earlier and newer versions. It "fails" in dm_report_object() @ if(!data). 

2) The LVM udev rules expect an environment variable STARTUP to be set. Neither udev nor any Gentoo script sets this. Mandriva did/does this, if I recall correctly.

3) Due to (1)/(2), the LVM-supplied udev rules are broken as well. Simply commenting out the rule lines which check for $env{STARTUP} fixes that, and the rules mostly work. Unfortunately, (1) prevents udev from creating the vg/lv symlinks in /dev, which the lvm tools then complain about.

I tried this w/ lvm 2.02.61 (simple bump worked) and enabled udev_sync. I can confirm this also applies to earlier versions like current ~arch.
Comment 2 Michael Härtl 2010-03-06 16:57:30 UTC
I confirm this problem: Can't remove LVM snapshots. I've also commented on bug #287650, as it seemed more related. But it's already closed - and the problem still persists.
Comment 3 Michael Härtl 2010-03-06 16:58:19 UTC
Forgot to mention my versions:

 sys-fs/lvm2-2.02.56-r2
 sys-fs/udev-149
Comment 4 Wolfram Schlich (RETIRED) gentoo-dev 2010-03-18 14:41:50 UTC
Same here. Removing several snapshots in a row
(each with a dedicated lvremove call) randomly
fails on one or more LVs.

Doesn't happen with udevd stopped.
I can also observe several udevd child
processes that seem to be somehow
"waiting" during that time (usually it
has only 2 child processes).

The lvremove exit code is always 5 which seems
to mean "error getting VGDA from kernel".

If I do a sleep 1 between lvremove tries for every LV,
sometimes it takes up to 7 tries to remove the LV.

If I call "lvremove @snapshots" instead (I have
all snapshots tagged that way), it's much quicker
and it never fails (I cannot even see 1 more
udevd child process!).
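
The "sleep 1 between lvremove tries" workaround described above can be wrapped in a small helper. This is only a sketch of a stopgap, not a fix for the underlying race; the function name retry and its argument order are made up for illustration.

```shell
# retry MAX DELAY CMD [ARGS...]
# Run CMD until it succeeds or MAX attempts have been made, sleeping
# DELAY seconds between attempts. Returns the exit status of the last
# attempt. (Hypothetical helper, not part of lvm2 or udev.)
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max" ]; then
            return "$status"
        fi
        echo "attempt $attempt/$max failed (exit $status), retrying" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

Example use, with the volume name from the original report: retry 10 1 lvremove -f group/test-snapshot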
Comment 5 Wolfram Schlich (RETIRED) gentoo-dev 2010-03-18 14:58:23 UTC
This is sick!
Create some snapshot volumes,
run "udevadm control --log-priority=debug",
run lvremove to remove ONE of those volumes
and see what udev is logging... a load of stuff
not having anything to do with that one
snapshot volume. WTF?
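
To see at a glance which devices udev is processing during a single lvremove, the output of "udevadm monitor" can be condensed into per-device counts. A rough sketch, assuming the usual monitor line layout ("UDEV  [timestamp] action devpath (subsystem)"); the function name summarize_block_events is made up for illustration.

```shell
# summarize_block_events: read `udevadm monitor` output on stdin and
# print "<count> <action> <device>" for every block event that udev
# processed. KERNEL lines are skipped so each event is counted once.
summarize_block_events() {
    awk '
        /^UDEV/ && $NF == "(block)" {
            # Second-to-last field is the devpath; keep its last component.
            n = split($(NF - 1), parts, "/")
            seen[$(NF - 2) " " parts[n]]++
        }
        END { for (k in seen) print seen[k], k }
    ' | sort -k3,3 -k2,2
}
```

Example use: run "udevadm monitor --udev --subsystem-match=block > /tmp/udev.log" in one terminal, run the lvremove in another, stop the monitor with Ctrl-C, then run "summarize_block_events < /tmp/udev.log". The log path is arbitrary.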
Comment 6 Wolfram Schlich (RETIRED) gentoo-dev 2010-03-18 15:19:04 UTC
It seems that on lvremove udev gets triggered not only
with block "remove" actions, but also with block "change"
actions, which makes the rules just behave like on "add".

There is a lot of stuff going on in udev on lvremove
that's not supposed to happen (from my point of view)...

Ok, I compared the 64-device-mapper.rules from an older
lvm2 that does not have that problem with the one from
this lvm2 that has the problem. After commenting out
the line with OPTIONS+="watch", the problem disappears.
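
For anyone wanting to check which installed rules set the watch option, something along these lines should list them. A sketch only: the function name list_watch_rules is made up, and the default directories are the usual rule locations, which may differ on other setups.

```shell
# list_watch_rules [DIR...]: print "file:line:rule" for every uncommented
# udev rule that sets the "watch" option ("nowatch" is not matched).
# Without arguments, the usual rule directories are searched.
list_watch_rules() {
    [ "$#" -gt 0 ] || set -- /etc/udev/rules.d /lib/udev/rules.d
    grep -rnE '^[^#]*OPTIONS\+?="([^"]*,[[:space:]]*)?watch[",]' "$@" 2>/dev/null
}
```

The function returns non-zero when no matching rule is found.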
Comment 7 Wolfram Schlich (RETIRED) gentoo-dev 2010-03-18 16:06:00 UTC
Ubuntu has a similar bug:
https://bugs.launchpad.net/ubuntu/+source/udev/+bug/332270

Comment 9 Wolfram Schlich (RETIRED) gentoo-dev 2010-03-18 16:18:15 UTC
Ok, I can confirm this patch fixes the behaviour I've observed,
even with OPTIONS+="watch" enabled.
Several snapshots are deleted in a row without problems.
Can we please consider including that patch in our LVM2 package?
Comment 10 Matthias Dahl 2010-04-13 09:11:58 UTC
Is there anything holding this patch back from hitting the tree? Works fine here on my servers so far.
Comment 11 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-05-01 21:31:31 UTC
Please retest against .64, the udev rules have changed dramatically.
Comment 12 J. Roeleveld 2010-05-26 12:20:00 UTC
(In reply to comment #11)
> Please retest against .64, the udev rules have changed dramatically.
> 

I did a quick test with "sys-fs/lvm2-2.02.64" together with "sys-fs/udev-154".
(Had to update udev to a ~amd64 version as well for the lvm2 upgrade)

With this, my backup-script is no longer failing with the lvremove of snapshots.
It was failing consistently before.
Comment 13 Lubomir Krajcovic 2010-11-21 14:26:02 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > Please retest against .64, the udev rules have changed dramatically.
> > 
> 
> I did a quick test with "sys-fs/lvm2-2.02.64" together with "sys-fs/udev-154".
> (Had to update udev to a ~amd64 version as well for the lvm2 upgrade)
> 
> With this, my backup-script is no longer failing with the lvremove of
> snapshots.
> It was failing consistently before.
> 

My packages:
  sys-fs/udev-164:0
  sys-fs/lvm2-2.02.74:0
  sys-fs/udisks-1.0.1-r2:0
I'm using automated backups via LVM snapshots, and udev was still blocking lvremove attempts. Research showed that I had to comment out this udev rule (belonging to udisks) in the /lib/udev/rules.d/80-udisks.rules file:
KERNEL=="dm-*", OPTIONS+="watch"

For further info see:
https://bugzilla.redhat.com/show_bug.cgi?id=577798

So, this bug should not be marked RESOLVED, but maybe reassigned to the udisks maintainer?
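
Editing the file under /lib directly gets undone by the next udisks update. One possible alternative is to put a modified copy under /etc/udev/rules.d, on the assumption (from the udev man page) that a file there takes precedence over a same-named file in /lib/udev/rules.d. A sketch; the function name comment_out_dm_watch is made up for illustration.

```shell
# comment_out_dm_watch SRC DST: copy rules file SRC to DST, commenting
# out every rule that matches KERNEL=="dm-*" and sets OPTIONS+="watch".
# All other lines are copied unchanged.
comment_out_dm_watch() {
    sed -E '/^[[:space:]]*KERNEL=="dm-\*".*OPTIONS\+="watch"/ s/^/# /' "$1" > "$2"
}
```

Example use: "comment_out_dm_watch /lib/udev/rules.d/80-udisks.rules /etc/udev/rules.d/80-udisks.rules", followed by "udevadm control --reload-rules". The copy has to be refreshed by hand whenever udisks ships a changed rules file.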
Comment 14 peyser.alex 2011-05-29 16:05:59 UTC
After updating, I found the exact same thing and wasted time hunting down the 80-udisks.rules bug. This bug should be reopened, or another bug opened, to eliminate that ill-thought-out udev rule.