On one of my PCs lvm segfaults on boot during autoactivation via udev+lvmetad. I've tried 2.02.98 with systemd support (bug 453594) and 2.02.99 (the latest revision from git); the results are the same. lvm2-activation-generator works without problems.

Software versions:
sys-apps/systemd-200-r1
sys-fs/lvm2-2.02.{98,99}

Reproducible: Always

Steps to Reproduce:
1. Modify the lvm2 ebuild with the patch from bug 453594 and merge it.
2. "systemctl enable lvm2-lvmetad.service"
3. Set "use_lvmetad = 1" in /etc/lvm/lvm.conf.
4. Reboot.

Actual Results:
From the journal:

Apr 30 19:14:15 home.puleglot kernel: lvm[2776]: segfault at 2c ip 00000076f34180dd sp 000003be8bc2aab0 error 4 in lvm[76f336e000+f0000]
Apr 30 19:14:15 home.puleglot kernel: grsec: Segmentation fault occurred at 000000000000002c in /sbin/lvm[lvm:2776] uid/euid:0/0 gid/egid:0/0, parent /sbin/udevd[systemd-udevd:2637] uid/euid:0/0 gid/egid:0/0
Apr 30 19:14:15 home.puleglot systemd-udevd[2637]: '/sbin/lvm pvscan --cache --activate ay --major 9 --minor 2' [2776] terminated by signal 11 (Segmentation fault)
Apr 30 19:14:15 home.puleglot systemd-udevd[2644]: '/sbin/lvm pvscan --cache --activate ay --major 9 --minor 3' [2825] terminated by signal 11 (Segmentation fault)
Apr 30 19:14:15 home.puleglot kernel: lvm[2825]: segfault at 2c ip 00000065dfbc80dd sp 000003eb632149b0 error 4 in lvm[65dfb1e000+f0000]
Apr 30 19:14:15 home.puleglot kernel: grsec: Segmentation fault occurred at 000000000000002c in /sbin/lvm[lvm:2825] uid/euid:0/0 gid/egid:0/0, parent /sbin/udevd[systemd-udevd:2644] uid/euid:0/0 gid/egid:0/0

vg_system/root and vg_system/usr are activated by dracut. Each subsequent "/sbin/lvm pvscan --cache --activate ay ..." command then activates only one LV in the corresponding VG before segfaulting. This leaves unactivated LVs in the system which cannot be activated later: vg_system/opt and vg_user/data.

Backtraces, lsblk output and other info are attached below.
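[Editor's note: a minimal sketch of how to exercise the same autoactivation path by hand, based on the commands systemd-udevd runs in the journal above. The mapping of major/minor 9:2 and 9:3 to /dev/md2 and /dev/md3 is an assumption (MD devices normally use major 9); adjust to the devices on your system.]

    # Re-run the pvscan calls that udev issues during autoactivation
    systemctl start lvm2-lvmetad.service
    /sbin/lvm pvscan --cache --activate ay --major 9 --minor 2   # presumably /dev/md2
    /sbin/lvm pvscan --cache --activate ay --major 9 --minor 3   # presumably /dev/md3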
Created attachment 347074 [details] emerge --info lvm2
Created attachment 347076 [details] lsblk
Created attachment 347078 [details] lvm backtrace (lvm2-2.02.98)
Created attachment 347080 [details] lvm backtrace (lvm2-2.02.99)
Created attachment 347264 [details, diff]
lvm2-lvmetad-pvscan-cache-fix-segv.patch

Maybe this patch doesn't fix the root of the problem, but it fixes the segfaults for me. All LVs get activated during boot now.
Thanks for the report! We'll investigate this and sort out an upstream fix!
(In reply to comment #5)
> Created attachment 347264 [details, diff]
> lvm2-lvmetad-pvscan-cache-fix-segv.patch
>
> Maybe this patch doesn't fix the root of the problem, but it fixes the
> segfaults for me. All LVs get activated during boot now.

Well, yes, we need to find the real source of the problem here... The situation described here should not normally happen - if the PV belongs to some VG, then *all* metadata areas that belong to that PV should refer to the same existing VG. What happened here is that one of the calls to vg_read for one of the metadata areas returned no VG, while the call returned some VG for the *other* metadata area bound to the same PV. This is clearly an inconsistent lvmetad/lvmcache state.

For starters, could you please try the following:

- use the original code without that patch
- set use_lvmetad=0 in lvm.conf
- reboot
- after the reboot, set use_lvmetad=1 and start lvm2-lvmetad.service
- run "pvscan --cache /dev/<PV_device>" for each device directly on the command line - can you reproduce the segfault this way?

Thanks.
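[Editor's note: a consolidated sketch of the suggested check, assuming the reporter's PVs are /dev/md2 and /dev/md3 as seen later in the thread; the lvm.conf edits are left as manual steps.]

    # 1. With the unpatched lvm2, set use_lvmetad = 0 in /etc/lvm/lvm.conf and reboot.
    # 2. After the reboot, set use_lvmetad = 1 in /etc/lvm/lvm.conf again, then:
    systemctl start lvm2-lvmetad.service
    pvscan --cache /dev/md2   # repeat for every PV and watch for a segfault
    pvscan --cache /dev/md3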
(In reply to comment #7)
> For starters, could you please try the following:
>
> - use the original code without that patch
> - set use_lvmetad=0 in lvm.conf
> - reboot
> - after the reboot, set use_lvmetad=1 and start lvm2-lvmetad.service
> - run "pvscan --cache /dev/<PV_device>" for each device directly on the
>   command line - can you reproduce the segfault this way?
>
> Thanks.

No segfaults, but on the first attempt it tries to open the cdrom and sdd (a flash card reader), both of which are empty.

$ sudo LANG=C pvscan --cache /dev/md2
  /dev/cdrom: open failed: No medium found
  /dev/sdd: open failed: No medium found
$ sudo LANG=C pvscan --cache /dev/md3
$ sudo LANG=C pvscan --cache /dev/md2
$ sudo LANG=C pvscan --cache /dev/md3
This also "solves" the problem with segfaults :) $ diff -u /etc/lvm/lvm.conf{.dist,} --- /etc/lvm/lvm.conf.dist 2013-04-25 22:42:19.000000000 +0400 +++ /etc/lvm/lvm.conf 2013-05-07 23:58:26.755399905 +0400 @@ -87,7 +87,7 @@ # global_filter. The syntax is the same as for normal "filter" # above. Devices that fail the global_filter are not even opened by LVM. - # global_filter = [] + global_filter = [ "r|/dev/cdrom|", "r|/dev/sdd|" ] # The results of the filtering are cached on disk to avoid # rescanning dud devices (which can take a very long time). @@ -495,7 +495,7 @@ # # If lvmetad has been running while use_lvmetad was 0, it MUST be stopped # before changing use_lvmetad to 1 and started again afterwards. - use_lvmetad = 0 + use_lvmetad = 1 # Full path of the utility called to check that a thin metadata device # is in a state that allows it to be used.
(In reply to comment #9)
> This also "solves" the problem with segfaults :)
>
> $ diff -u /etc/lvm/lvm.conf{.dist,}
> --- /etc/lvm/lvm.conf.dist	2013-04-25 22:42:19.000000000 +0400
> +++ /etc/lvm/lvm.conf	2013-05-07 23:58:26.755399905 +0400
> @@ -87,7 +87,7 @@
>      # global_filter. The syntax is the same as for normal "filter"
>      # above. Devices that fail the global_filter are not even opened by LVM.
>
> -    # global_filter = []
> +    global_filter = [ "r|/dev/cdrom|", "r|/dev/sdd|" ]
>
>      # The results of the filtering are cached on disk to avoid
>      # rescanning dud devices (which can take a very long time).

Hmm... this does not completely solve the problem - segfaults still occur occasionally. When that happens, I do the following in the emergency shell:

home ~ # systemctl start lvm2-lvmetad.service
home ~ # ls -l
total 16
-rw-r--r-- 1 root root 0 May  9 19:16 typescript
home ~ # pvscan --cache /dev/md2
  �EF: open failed: Is a directory
home ~ # ls -l
total 20
-rw-r--r-- 1 root root 0 May  9 19:16 typescript
drwx------ 2 root root 4096 May  9 19:17 ??E?F

A directory with a strange (garbage) name got created. O_o
I just tried the latest snapshot from Git again. The problem still exists. Any ideas?
I can't reproduce these segfaults with the 2.02.99 release, so it looks like this bug has been fixed. ^^
Closing as upstream per comment #12