On one of my PCs lvm segfaults on boot during autoactivation via udev+lvmetad. I've tried 2.02.98 with systemd support (bug 453594) and 2.02.99 (the latest revision from git); the results are the same. lvm2-activation-generator works without problems.

Software versions:
sys-apps/systemd-200-r1
sys-fs/lvm2-2.02.{98,99}

Reproducible: Always

Steps to Reproduce:
1. Modify the lvm2 ebuild with the patch from bug 453594 and merge it.
2. "systemctl enable lvm2-lvmetad.service"
3. Set "use_lvmetad = 1" in /etc/lvm/lvm.conf.
4. Reboot.

Actual Results:
From the journal:

Apr 30 19:14:15 home.puleglot kernel: lvm[2776]: segfault at 2c ip 00000076f34180dd sp 000003be8bc2aab0 error 4 in lvm[76f336e000+f0000]
Apr 30 19:14:15 home.puleglot kernel: grsec: Segmentation fault occurred at 000000000000002c in /sbin/lvm[lvm:2776] uid/euid:0/0 gid/egid:0/0, parent /sbin/udevd[systemd-udevd:2637] uid/euid:0/0 gid/egid:0/0
Apr 30 19:14:15 home.puleglot systemd-udevd[2637]: '/sbin/lvm pvscan --cache --activate ay --major 9 --minor 2' [2776] terminated by signal 11 (Segmentation fault)
Apr 30 19:14:15 home.puleglot systemd-udevd[2644]: '/sbin/lvm pvscan --cache --activate ay --major 9 --minor 3' [2825] terminated by signal 11 (Segmentation fault)
Apr 30 19:14:15 home.puleglot kernel: lvm[2825]: segfault at 2c ip 00000065dfbc80dd sp 000003eb632149b0 error 4 in lvm[65dfb1e000+f0000]
Apr 30 19:14:15 home.puleglot kernel: grsec: Segmentation fault occurred at 000000000000002c in /sbin/lvm[lvm:2825] uid/euid:0/0 gid/egid:0/0, parent /sbin/udevd[systemd-udevd:2644] uid/euid:0/0 gid/egid:0/0

vg_system/root and vg_system/usr are activated by dracut. Each subsequent "/sbin/lvm pvscan --cache --activate ay ..." command then activates only one LV in the corresponding VG before segfaulting. This leaves unactivated LVs in the system which cannot be activated later: vg_system/opt and vg_user/data.

Backtraces, lsblk output and other info are attached below.
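[Editor's note: a minimal sketch of how to exercise the same autoactivation path by hand, based on the commands systemd-udevd runs in the journal above. The mapping of major/minor 9:2 and 9:3 to /dev/md2 and /dev/md3 is an assumption (MD devices normally use major 9); adjust to the devices on your system.]

    # Re-run the pvscan calls that udev issues during autoactivation
    systemctl start lvm2-lvmetad.service
    /sbin/lvm pvscan --cache --activate ay --major 9 --minor 2   # presumably /dev/md2
    /sbin/lvm pvscan --cache --activate ay --major 9 --minor 3   # presumably /dev/md3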
Created attachment 347074 [details] emerge --info lvm2
Created attachment 347076 [details] lsblk
Created attachment 347078 [details] lvm backtrace (lvm2-2.02.98)
Created attachment 347080 [details] lvm backtrace (lvm2-2.02.99)
Created attachment 347264 [details, diff]
lvm2-lvmetad-pvscan-cache-fix-segv.patch

Maybe this patch doesn't fix the root of the problem, but it fixes the segfaults for me. All LVs get activated during boot now.
Thanks for the report! We'll investigate this and sort out an upstream fix!
(In reply to comment #5)
> Created attachment 347264 [details, diff]
> lvm2-lvmetad-pvscan-cache-fix-segv.patch
>
> Maybe this patch doesn't fix the root of the problem, but it fixes the
> segfaults for me. All LVs get activated during boot now.

Well, yes, we need to find the real source of the problem here... The situation described here should not normally happen - if the PV belongs to some VG, then *all* metadata areas that belong to that PV should refer to the same existing VG. What happened here is that one of the calls to vg_read for one of the metadata areas returned no VG, while the call returned some VG for the *other* metadata area bound to the same PV. This is clearly an inconsistent lvmetad/lvmcache state.

For starters, could you please try the following:

- use the original code without that patch
- set use_lvmetad=0 in lvm.conf
- reboot
- after the reboot, set use_lvmetad=1 and start lvm2-lvmetad.service
- run "pvscan --cache /dev/<PV_device>" for each device directly on the command line - can you reproduce the segfault this way?

Thanks.
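[Editor's note: a consolidated sketch of the suggested check, assuming the reporter's PVs are /dev/md2 and /dev/md3 as seen later in the thread; the lvm.conf edits are left as manual steps.]

    # 1. With the unpatched lvm2, set use_lvmetad = 0 in /etc/lvm/lvm.conf and reboot.
    # 2. After the reboot, set use_lvmetad = 1 in /etc/lvm/lvm.conf again, then:
    systemctl start lvm2-lvmetad.service
    pvscan --cache /dev/md2   # repeat for every PV and watch for a segfault
    pvscan --cache /dev/md3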
(In reply to comment #7)
> For starters, could you please try the following:
>
> - use the original code without that patch
> - set use_lvmetad=0 in lvm.conf
> - reboot
> - after the reboot, set use_lvmetad=1 and start lvm2-lvmetad.service
> - run "pvscan --cache /dev/<PV_device>" for each device directly on the
>   command line - can you reproduce the segfault this way?
>
> Thanks.

No segfaults, but on the first attempt it tries to open the cdrom and sdd (a flash card reader), both of which are empty.

$ sudo LANG=C pvscan --cache /dev/md2
  /dev/cdrom: open failed: No medium found
  /dev/sdd: open failed: No medium found
$ sudo LANG=C pvscan --cache /dev/md3
$ sudo LANG=C pvscan --cache /dev/md2
$ sudo LANG=C pvscan --cache /dev/md3
This also "solves" the problem with segfaults :) $ diff -u /etc/lvm/lvm.conf{.dist,} --- /etc/lvm/lvm.conf.dist 2013-04-25 22:42:19.000000000 +0400 +++ /etc/lvm/lvm.conf 2013-05-07 23:58:26.755399905 +0400 @@ -87,7 +87,7 @@ # global_filter. The syntax is the same as for normal "filter" # above. Devices that fail the global_filter are not even opened by LVM. - # global_filter = [] + global_filter = [ "r|/dev/cdrom|", "r|/dev/sdd|" ] # The results of the filtering are cached on disk to avoid # rescanning dud devices (which can take a very long time). @@ -495,7 +495,7 @@ # # If lvmetad has been running while use_lvmetad was 0, it MUST be stopped # before changing use_lvmetad to 1 and started again afterwards. - use_lvmetad = 0 + use_lvmetad = 1 # Full path of the utility called to check that a thin metadata device # is in a state that allows it to be used.
(In reply to comment #9)
> This also "solves" the problem with segfaults :)
>
> $ diff -u /etc/lvm/lvm.conf{.dist,}
> --- /etc/lvm/lvm.conf.dist	2013-04-25 22:42:19.000000000 +0400
> +++ /etc/lvm/lvm.conf	2013-05-07 23:58:26.755399905 +0400
> @@ -87,7 +87,7 @@
>      # global_filter. The syntax is the same as for normal "filter"
>      # above. Devices that fail the global_filter are not even opened by LVM.
>
> -    # global_filter = []
> +    global_filter = [ "r|/dev/cdrom|", "r|/dev/sdd|" ]
>
>      # The results of the filtering are cached on disk to avoid
>      # rescanning dud devices (which can take a very long time).

Hmm... this does not completely solve the problem - segfaults still occur occasionally. When that happens, I do the following in the emergency shell:

home ~ # systemctl start lvm2-lvmetad.service
home ~ # ls -l
total 16
-rw-r--r-- 1 root root 0 May  9 19:16 typescript
home ~ # pvscan --cache /dev/md2
  �EF: open failed: Is a directory
home ~ # ls -l
total 20
-rw-r--r-- 1 root root 0 May  9 19:16 typescript
drwx------ 2 root root 4096 May  9 19:17 ??E?F

A directory with a strange (garbage) name got created. O_o
I just tried the latest snapshot from Git again. The problem still exists. Any ideas?
I can't reproduce these segfaults with the 2.02.99 release, so it looks like this bug has been fixed. ^^
Closing as upstream per comment #12