Bug 468206 - sys-fs/lvm2-2.02.98 - lvm segfaults during autoactivation via udev+lvmetad
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Normal normal
Assignee: Robin Johnson
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-01 21:15 UTC by Alexander Tsoy
Modified: 2014-02-01 21:20 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info lvm2 (lvm2.info, 5.82 KB, text/plain) - 2013-05-01 21:16 UTC, Alexander Tsoy
lsblk (lsblk.out, 3.01 KB, text/plain) - 2013-05-01 21:16 UTC, Alexander Tsoy
lvm backtrace (lvm2-2.02.98) (lvm2-2.02.98-lvm-pvscan-cache_1.bt, 11.38 KB, text/plain) - 2013-05-01 21:19 UTC, Alexander Tsoy
lvm backtrace (lvm2-2.02.99) (lvm2-2.02.99-lvm-pvscan-cache_1.bt, 11.91 KB, text/plain) - 2013-05-01 21:20 UTC, Alexander Tsoy
lvm2-lvmetad-pvscan-cache-fix-segv.patch (lvm2-2.02.98-lvmetad-fix-segv.patch, 416 bytes, patch) - 2013-05-03 12:11 UTC, Alexander Tsoy

Description Alexander Tsoy 2013-05-01 21:15:07 UTC
On one of my PCs, lvm segfaults on boot during autoactivation via udev+lvmetad. I've tried 2.02.98 with systemd support (bug 453594) and 2.02.99 (the latest revision from git); the results are the same.

lvm2-activation-generator works without problems.

Software versions:
sys-apps/systemd-200-r1
sys-fs/lvm2-2.02.{98,99}

Reproducible: Always

Steps to Reproduce:
1. modify the lvm2 ebuild with the patch from bug 453594 and merge it
2. "systemctl enable lvm2-lvmetad.service"
3. set "use_lvmetad = 1" in /etc/lvm/lvm.conf
4. reboot (steps 2-4 are sketched as commands below)
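
A rough command sketch of steps 2-4 (the sed edit assumes the stock "use_lvmetad = 0" line from the default lvm.conf, as shown in the diff in comment #9):

# systemctl enable lvm2-lvmetad.service
# sed -i 's/use_lvmetad = 0/use_lvmetad = 1/' /etc/lvm/lvm.conf
# systemctl reboot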
Actual Results:  
From journal:

Apr 30 19:14:15 home.puleglot kernel: lvm[2776]: segfault at 2c ip 00000076f34180dd sp 000003be8bc2aab0 error 4 in lvm[76f336e000+f0000]
Apr 30 19:14:15 home.puleglot kernel: grsec: Segmentation fault occurred at 000000000000002c in /sbin/lvm[lvm:2776] uid/euid:0/0 gid/egid:0/0, parent /sbin/udevd[systemd-udevd:2637] uid/euid:0/0 gid/egid:0/0
Apr 30 19:14:15 home.puleglot systemd-udevd[2637]: '/sbin/lvm pvscan --cache --activate ay --major 9 --minor 2' [2776] terminated by signal 11 (Segmentation fault)
Apr 30 19:14:15 home.puleglot systemd-udevd[2644]: '/sbin/lvm pvscan --cache --activate ay --major 9 --minor 3' [2825] terminated by signal 11 (Segmentation fault)
Apr 30 19:14:15 home.puleglot kernel: lvm[2825]: segfault at 2c ip 00000065dfbc80dd sp 000003eb632149b0 error 4 in lvm[65dfb1e000+f0000]
Apr 30 19:14:15 home.puleglot kernel: grsec: Segmentation fault occurred at 000000000000002c in /sbin/lvm[lvm:2825] uid/euid:0/0 gid/egid:0/0, parent /sbin/udevd[systemd-udevd:2644] uid/euid:0/0 gid/egid:0/0


vg_system/root and vg_system/usr are activated by dracut. Then each "/sbin/lvm pvscan --cache --activate ay ..." command activates only one LV in the corresponding VG before segfaulting. This leaves unactivated LVs in the system which cannot be activated later: vg_system/opt and vg_user/data.
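
For debugging, the failing command can be re-run by hand after boot; this is taken verbatim from the journal above (md devices use major 9, so major/minor 9:2 corresponds to /dev/md2):

# /sbin/lvm pvscan --cache --activate ay --major 9 --minor 2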

Backtraces, lsblk output and other info attached below.
Comment 1 Alexander Tsoy 2013-05-01 21:16:03 UTC
Created attachment 347074 [details]
emerge --info lvm2
Comment 2 Alexander Tsoy 2013-05-01 21:16:48 UTC
Created attachment 347076 [details]
lsblk
Comment 3 Alexander Tsoy 2013-05-01 21:19:40 UTC
Created attachment 347078 [details]
lvm backtrace (lvm2-2.02.98)
Comment 4 Alexander Tsoy 2013-05-01 21:20:03 UTC
Created attachment 347080 [details]
lvm backtrace (lvm2-2.02.99)
Comment 5 Alexander Tsoy 2013-05-03 12:11:11 UTC
Created attachment 347264 [details, diff]
lvm2-lvmetad-pvscan-cache-fix-segv.patch

Maybe this patch doesn't fix the root of the problem, but it fixes the segfaults for me. All LVs get activated during boot now.
Comment 6 Alasdair Kergon 2013-05-03 12:57:34 UTC
Thanks for the report!  We'll investigate this and sort out an upstream fix!
Comment 7 Peter Rajnoha 2013-05-07 11:23:52 UTC
(In reply to comment #5)
> Created attachment 347264 [details, diff]
> lvm2-lvmetad-pvscan-cache-fix-segv.patch
> 
> Maybe this patch doesn't fix the root of the problem, but it fixes the
> segfaults for me. All LVs get activated during boot now.

Well, yes, we need to find the real source of the problem here... The situation described here should not normally happen - if the PV does belong to some VG, then *all* metadata areas that belong to that PV should refer to the same existing VG.

What happened here is that the call to vg_read for one of the metadata areas returned no VG, even though the call returned a VG for the *other* metadata area bound to the same PV. This is clearly an inconsistent lvmetad/lvmcache state.
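
As a side note, the number of metadata areas on each PV, which is what must stay consistent here, can be inspected directly; a sketch using the pvs report field:

# pvs -o +pv_mda_count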

For starters, could you please try the following:

 - use the original code without that patch
 - set use_lvmetad=0 in the lvm.conf
 - reboot
 - after reboot, set use_lvmetad=1 and start the lvm2-lvmetad.service
 - run "pvscan --cache /dev/<PV_device>" for each device on the cmd line directly - can you reproduce the segfault this way?

Thanks.
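
Spelled out as commands, the above sequence is roughly this (a sketch; /dev/md2 and /dev/md3 stand in for each PV device, and the sed edits assume the default lvm.conf spacing):

# sed -i 's/use_lvmetad = 1/use_lvmetad = 0/' /etc/lvm/lvm.conf
# reboot
  (after reboot)
# sed -i 's/use_lvmetad = 0/use_lvmetad = 1/' /etc/lvm/lvm.conf
# systemctl start lvm2-lvmetad.service
# pvscan --cache /dev/md2
# pvscan --cache /dev/md3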
Comment 8 Alexander Tsoy 2013-05-07 19:10:54 UTC
(In reply to comment #7)
> For starters, could you please try the following:
> 
>  - use the original code without that patch
>  - set use_lvmetad=0 in the lvm.conf
>  - reboot
>  - after reboot, set use_lvmetad=1 and start the lvm2-lvmetad.service
>  - run "pvscan --cache /dev/<PV_device>" for each device on the cmd line
> directly - can you reproduce the segfault this way?
> 
> Thanks.

No segfaults, but on the first attempt it tries to open the cdrom and sdd (a flash card reader), both of which are empty.

$ sudo LANG=C pvscan --cache /dev/md2
  /dev/cdrom: open failed: No medium found
  /dev/sdd: open failed: No medium found
$ sudo LANG=C pvscan --cache /dev/md3
$ sudo LANG=C pvscan --cache /dev/md2
$ sudo LANG=C pvscan --cache /dev/md3
Comment 9 Alexander Tsoy 2013-05-07 20:04:25 UTC
This also "solves" the problem with segfaults :)

$ diff -u /etc/lvm/lvm.conf{.dist,}
--- /etc/lvm/lvm.conf.dist	2013-04-25 22:42:19.000000000 +0400
+++ /etc/lvm/lvm.conf	2013-05-07 23:58:26.755399905 +0400
@@ -87,7 +87,7 @@
     # global_filter. The syntax is the same as for normal "filter"
     # above. Devices that fail the global_filter are not even opened by LVM.
 
-    # global_filter = []
+    global_filter = [ "r|/dev/cdrom|", "r|/dev/sdd|" ]
 
     # The results of the filtering are cached on disk to avoid
     # rescanning dud devices (which can take a very long time).
@@ -495,7 +495,7 @@
     #
     # If lvmetad has been running while use_lvmetad was 0, it MUST be stopped
     # before changing use_lvmetad to 1 and started again afterwards.
-    use_lvmetad = 0
+    use_lvmetad = 1
 
     # Full path of the utility called to check that a thin metadata device
     # is in a state that allows it to be used.
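
The running filter can be verified without a reboot (dumpconfig was the pre-lvmconfig spelling in this lvm2 vintage), and the on-disk filter cache mentioned in the comment block above may need clearing too; the cache path below is the default location and an assumption here:

# lvm dumpconfig devices/global_filter
# rm -f /etc/lvm/cache/.cache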
Comment 10 Alexander Tsoy 2013-05-09 16:48:53 UTC
(In reply to comment #9)
> This also "solves" the problem with segfaults :)
> 
> $ diff -u /etc/lvm/lvm.conf{.dist,}
> --- /etc/lvm/lvm.conf.dist	2013-04-25 22:42:19.000000000 +0400
> +++ /etc/lvm/lvm.conf	2013-05-07 23:58:26.755399905 +0400
> @@ -87,7 +87,7 @@
>      # global_filter. The syntax is the same as for normal "filter"
>      # above. Devices that fail the global_filter are not even opened by LVM.
>  
> -    # global_filter = []
> +    global_filter = [ "r|/dev/cdrom|", "r|/dev/sdd|" ]
>  
>      # The results of the filtering are cached on disk to avoid
>      # rescanning dud devices (which can take a very long time).

Hmm.. This does not completely solve the problem. Segfaults still occur rarely. Then, in the emergency shell, I did the following:

home ~ # systemctl start lvm2-lvmetad.service
home ~ # ls -l
total 16
-rw-r--r-- 1 root root    0 May  9 19:16 typescript
home ~ # pvscan --cache /dev/md2
  �EF: open failed: Is a directory
home ~ # ls -l
total 20
-rw-r--r-- 1 root root    0 May  9 19:16 typescript
drwx------ 2 root root 4096 May  9 19:17 ??E?F

A directory with a strange name got created, as if lvm had used a corrupted string as a device path. O_o
Comment 11 Alexander Tsoy 2013-05-28 19:59:11 UTC
I just tried the latest snapshot from Git again. The problem still exists. Any ideas?
Comment 12 Alexander Tsoy 2013-08-04 21:10:17 UTC
I can't reproduce these segfaults with the 2.02.99 release. So it looks like this bug was fixed. ^^
Comment 13 Robin Johnson 2014-02-01 21:20:01 UTC
Closing as RESOLVED UPSTREAM per comment #12.