Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 376947 - lvm2-2.02.73-r1 pvmove malfunction leads to data loss
Summary: lvm2-2.02.73-r1 pvmove malfunction leads to data loss
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: AMD64 Linux
: Normal normal (vote)
Assignee: Robin Johnson
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-29 15:01 UTC by Robert Forsman
Modified: 2013-01-17 23:44 UTC (History)
2 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Robert Forsman 2011-07-29 15:01:40 UTC
I was performing a pvmove of an LV from a pair of partitions on one disk onto a single partition on a second disk.

The first pvmove completed, but there were some error messages I had never seen before.  The second pvmove issued a lot of errors and pointed the LV at the new disk, but the data had not been transferred.  This resulted in a corrupt filesystem.

I had to reconstruct the original LV by building it over the original segments.

This is the original LV map:

siwa thoth # lvdisplay -m vg-siwa/lv_homes2
  /dev/hda: open failed: No medium found
  --- Logical volume ---
  LV Name                /dev/vg-siwa/lv_homes2
  VG Name                vg-siwa
  LV UUID                RVJyay-nRtY-AwAc-6fqR-kHhm-hukP-Xx7GiT
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                100.00 GiB
  Current LE             25600
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
   
  --- Segments ---
  Logical extent 0 to 14235:
    Type                linear
    Physical volume     /dev/sda6
    Physical extents    1024 to 15259
   
  Logical extent 14236 to 25599:
    Type                linear
    Physical volume     /dev/sda8
    Physical extents    1124 to 12487

This is the tail end of the output of the pvmove:

  /dev/sda6: Moved: 98.9%
  /dev/sda6: Moved: 99.5%
  /dev/sda6: Moved: 100.0%
  Internal error: _memlock_count has dropped below 0.
  Internal error: _memlock_count has dropped below 0.
  ABORTING: Unable to deactivate temporary logical volume "pvmove0"
siwa thoth # pvmove
  /dev/hda: open failed: No medium found
siwa thoth #

  I issued the second pvmove to make sure the operation was complete.  (In the past, if a pvmove gets interrupted, the relocation is still occurring in the background, and even if there is a machine crash, running pvmove with no arguments usually resumes the operation and carries it to completion).

siwa thoth # lvdisplay -m vg-siwa/lv_homes2
  /dev/hda: open failed: No medium found
  --- Logical volume ---
  LV Name                /dev/vg-siwa/lv_homes2
  VG Name                vg-siwa
  LV UUID                RVJyay-nRtY-AwAc-6fqR-kHhm-hukP-Xx7GiT
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                100.00 GiB
  Current LE             25600
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
   
  --- Segments ---
  Logical extent 0 to 14235:
    Type                linear
    Physical volume     /dev/sdb5
    Physical extents    0 to 14235
   
  Logical extent 14236 to 25599:
    Type                linear
    Physical volume     /dev/sda8
    Physical extents    1124 to 12487



When I attempted to complete the pvmove I got this wall of errors:


siwa thoth # pvmove -n lv_homes2 /dev/sda8  /dev/sdb5
  /dev/hda: open failed: No medium found
  device-mapper: create ioctl failed: Device or resource busy
  device-mapper: reload ioctl failed: Invalid argument
  WARNING: dev_open(/dev/sda8) called while suspended
  WARNING: dev_open(/dev/sda5) called while suspended
  WARNING: dev_open(/dev/sda6) called while suspended
  WARNING: dev_open(/dev/sda7) called while suspended
  WARNING: dev_open(/dev/sda9) called while suspended
  WARNING: dev_open(/dev/sdb5) called while suspended
  WARNING: dev_open(/dev/sdb6) called while suspended
  WARNING: dev_open(/dev/sda8) called while suspended
  WARNING: dev_open(/dev/sda5) called while suspended
  WARNING: dev_open(/dev/sda6) called while suspended
  WARNING: dev_open(/dev/sda7) called while suspended
  WARNING: dev_open(/dev/sda9) called while suspended
  WARNING: dev_open(/dev/sdb5) called while suspended
  WARNING: dev_open(/dev/sdb6) called while suspended
  WARNING: dev_open(/dev/sda8) called while suspended
  WARNING: dev_open(/dev/sda5) called while suspended
  WARNING: dev_open(/dev/sda6) called while suspended
  WARNING: dev_open(/dev/sda7) called while suspended
  WARNING: dev_open(/dev/sda9) called while suspended
  WARNING: dev_open(/dev/sdb5) called while suspended
  WARNING: dev_open(/dev/sdb6) called while suspended
  /dev/sda8: Moved: 100.0%
  device-mapper: create ioctl failed: Device or resource busy
  Unable to reactivate logical volume "pvmove0"



At this point, the LV is mapped to the new drive, but the data wasn't copied, so the filesystem is corrupt.


Reproducible: Didn't try



Expected Results:  
I think pvmove should probably pay a little more attention when it gets errors from the OS and back out to prevent data loss.

Linux farafra 2.6.37-gentoo-r4 #11 SMP PREEMPT Thu Jul 14 14:49:25 EDT 2011 x86_64 Intel(R) Core(TM) i3 CPU 530 @ 2.93GHz GenuineIntel GNU/Linux

I doubt it is relevant, but sda is an internal drive and sdb is a usb-attached external drive.
Comment 1 Robert Forsman 2011-07-29 15:07:29 UTC
Ooops, ran uname in the wrong terminal window.

thoth@siwa ~ $ uname -a
Linux siwa 2.6.38-gentoo-r6 #1 SMP Mon Jul 18 12:45:09 EDT 2011 i686 Intel(R) Pentium(R) D CPU 2.80GHz GenuineIntel GNU/Linux
Comment 2 Robert Forsman 2011-07-29 16:03:45 UTC
While performing some experiments I uncovered even more madness.

I performed a pvmove on an LV and got the same memlock_count error as before.  When I attempted to unmount the LV I got 

siwa / # umount /home
Killed
siwa / # 
Message from syslogd@siwa at Fri Jul 29 11:50:29 2011 ...
siwa kernel: Oops: 0000 [#1] SMP 

Message from syslogd@siwa at Fri Jul 29 11:50:29 2011 ...
siwa kernel: last sysfs file: /sys/devices/virtual/block/dm-3/dev

Message from syslogd@siwa at Fri Jul 29 11:50:29 2011 ...
siwa kernel: Call Trace:

Message from syslogd@siwa at Fri Jul 29 11:50:29 2011 ...
siwa kernel: Process umount (pid: 3719, ti=f5c72000 task=f5d7e780 task.ti=f5c72000)

Message from syslogd@siwa at Fri Jul 29 11:50:29 2011 ...
siwa kernel: Stack:

Message from syslogd@siwa at Fri Jul 29 11:50:29 2011 ...
siwa kernel: Code: 89 d0 5a 5b 5e 5f 5d c3 55 89 e5 57 56 be c3 ff ff ff 53 83 ec 1c 89 45 d8 89 55 dc 8b 58 10 8b 83 78 01 00 00 8b b8 9c 00 00 00 <8b> 47 20 85 c0 0f 84 1e 01 00 00 83 c0 1c e8 7e 04 20 00 8b 83 

Message from syslogd@siwa at Fri Jul 29 11:50:29 2011 ...
siwa kernel: EIP: [open_xa_dir+35/342]  SS:ESP 0068:f5c73dc0

Message from syslogd@siwa at Fri Jul 29 11:50:29 2011 ...
siwa kernel: EIP: [<c01c9c04>]  SS:ESP 0068:f5c73dc0

Message from syslogd@siwa at Fri Jul 29 11:50:29 2011 ...
siwa kernel: CR2: 0000000000000020



/var/log/kern.log contains



Jul 29 11:50:29 siwa kernel: BUG: unable to handle kernel NULL pointer dereference at 00000020
Jul 29 11:50:29 siwa kernel: IP: [open_xa_dir+35/342] 0xc01c9c04
Jul 29 11:50:29 siwa kernel: IP: [<c01c9c04>] 0xc01c9c04
Jul 29 11:50:29 siwa kernel: *pde = 00000000 
Jul 29 11:50:29 siwa kernel: Oops: 0000 [#1] SMP 
Jul 29 11:50:29 siwa kernel: last sysfs file: /sys/devices/virtual/block/dm-3/dev
Jul 29 11:50:29 siwa kernel: Modules linked in: dm_mirror dm_region_hash dm_log autofs4 nfsd lockd sunrpc ipv6 dm_mod usbhid snd_hda_codec_analog nvidia(P) usb_storage snd_hda_intel snd_hda_codec snd_pcm uhci_hcd tg3 e100 snd_timer ehci_hcd intel_agp floppy snd rtc_cmos i2c_core sg intel_gtt firmware_class rtc_core agpgart libphy mii usbcore container rtc_lib soundcore processor snd_page_alloc button thermal_sys
Jul 29 11:50:29 siwa kernel: 
Jul 29 11:50:29 siwa kernel: Pid: 3719, comm: umount Tainted: P            2.6.38-gentoo-r6 #1 LENOVO 9637D1U/LENOVO
Jul 29 11:50:29 siwa kernel: EIP: 0060:[open_xa_dir+35/342] EFLAGS: 00010286 CPU: 0
Jul 29 11:50:29 siwa kernel: EIP: 0060:[<c01c9c04>] EFLAGS: 00010286 CPU: 0
Jul 29 11:50:29 siwa kernel: EAX: f4c75780 EBX: f50fe400 ECX: 00000000 EDX: 00000002
Jul 29 11:50:29 siwa kernel: ESI: ffffffc3 EDI: 00000000 EBP: f5c73de8 ESP: f5c73dc0
Jul 29 11:50:29 siwa kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jul 29 11:50:29 siwa kernel: Process umount (pid: 3719, ti=f5c72000 task=f5d7e780 task.ti=f5c72000)
Jul 29 11:50:29 siwa kernel: Stack:
Jul 29 11:50:29 siwa kernel:  f40afe60 00000002 6746cbd9 f60056f8 000000a2 f6005b90 000000ce f40afe60
Jul 29 11:50:29 siwa kernel:  f50fe400 f5c73e2c f5c73e64 c01ca36a 00000000 00000000 c01cab2a 00000086
Jul 29 11:50:29 siwa kernel:  f40afe60 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jul 29 11:50:29 siwa kernel: Call Trace:
Jul 29 11:50:29 siwa kernel:  [reiserfs_for_each_xattr+88/512] 0xc01ca36a
Jul 29 11:50:29 siwa kernel:  [<c01ca36a>] 0xc01ca36a
Jul 29 11:50:29 siwa kernel:  [delete_one_xattr+0/158] ? 0xc01cab2a
Jul 29 11:50:29 siwa kernel:  [<c01cab2a>] ? 0xc01cab2a
Jul 29 11:50:29 siwa kernel:  [reiserfs_delete_xattrs+21/63] 0xc01ca566
Jul 29 11:50:29 siwa kernel:  [<c01ca566>] 0xc01ca566
Jul 29 11:50:29 siwa kernel:  [reiserfs_evict_inode+77/213] 0xc01b3474
Jul 29 11:50:29 siwa kernel:  [<c01b3474>] 0xc01b3474
Jul 29 11:50:29 siwa kernel:  [smp_apic_timer_interrupt+110/124] ? 0xc01113e9
Jul 29 11:50:29 siwa kernel:  [<c01113e9>] ? 0xc01113e9
Jul 29 11:50:29 siwa kernel:  [apic_timer_interrupt+42/48] ? 0xc03cb5da
Jul 29 11:50:29 siwa kernel:  [<c03cb5da>] ? 0xc03cb5da
Jul 29 11:50:29 siwa kernel:  [evict+23/121] 0xc017f0d7
Jul 29 11:50:29 siwa kernel:  [<c017f0d7>] 0xc017f0d7
Jul 29 11:50:29 siwa kernel:  [iput+386/432] 0xc017f8a7
Jul 29 11:50:29 siwa kernel:  [<c017f8a7>] 0xc017f8a7
Jul 29 11:50:29 siwa kernel:  [shrink_dcache_for_umount_subtree+326/369] 0xc017e44c
Jul 29 11:50:29 siwa kernel:  [<c017e44c>] 0xc017e44c
Jul 29 11:50:29 siwa kernel:  [shrink_dcache_for_umount+60/72] 0xc017e4b3
Jul 29 11:50:29 siwa kernel:  [<c017e4b3>] 0xc017e4b3
Jul 29 11:50:29 siwa kernel:  [generic_shutdown_super+22/182] 0xc0170c0a
Jul 29 11:50:29 siwa kernel:  [<c0170c0a>] 0xc0170c0a
Jul 29 11:50:29 siwa kernel:  [kill_block_super+29/89] 0xc0170cc7
Jul 29 11:50:29 siwa kernel:  [<c0170cc7>] 0xc0170cc7
Jul 29 11:50:29 siwa kernel:  [reiserfs_kill_sb+125/128] 0xc01b9f4f
Jul 29 11:50:29 siwa kernel:  [<c01b9f4f>] 0xc01b9f4f
Jul 29 11:50:29 siwa kernel:  [deactivate_locked_super+26/59] 0xc0170d79
Jul 29 11:50:29 siwa kernel:  [<c0170d79>] 0xc0170d79
Jul 29 11:50:29 siwa kernel:  [deactivate_super+50/55] 0xc0171533
Jul 29 11:50:29 siwa kernel:  [<c0171533>] 0xc0171533
Jul 29 11:50:29 siwa kernel:  [mntput_no_expire+266/272] 0xc0182518
Jul 29 11:50:29 siwa kernel:  [<c0182518>] 0xc0182518
Jul 29 11:50:29 siwa kernel:  [sys_umount+622/659] 0xc0182987
Jul 29 11:50:29 siwa kernel:  [<c0182987>] 0xc0182987
Jul 29 11:50:29 siwa kernel:  [sys_oldumount+13/15] 0xc01829b9
Jul 29 11:50:29 siwa kernel:  [<c01829b9>] 0xc01829b9
Jul 29 11:50:29 siwa kernel:  [sysenter_do_call+18/38] 0xc0102694
Jul 29 11:50:29 siwa kernel:  [<c0102694>] 0xc0102694
Jul 29 11:50:29 siwa kernel: Code: 89 d0 5a 5b 5e 5f 5d c3 55 89 e5 57 56 be c3 ff ff ff 53 83 ec 1c 89 45 d8 89 55 dc 8b 58 10 8b 83 78 01 00 00 8b b8 9c 00 00 00 <8b> 47 20 85 c0 0f 84 1e 01 00 00 83 c0 1c e8 7e 04 20 00 8b 83 
Jul 29 11:50:29 siwa kernel: EIP: [open_xa_dir+35/342]  SS:ESP 0068:f5c73dc0
Jul 29 11:50:29 siwa kernel: EIP: [<c01c9c04>]  SS:ESP 0068:f5c73dc0
Jul 29 11:50:29 siwa kernel: CR2: 0000000000000020
Jul 29 11:50:29 siwa kernel: ---[ end trace 1f53dd572da395bf ]---


/etc/mtab thinks /home is still mounted, but /proc/mounts shows that it is not.


In the wake of this a "shutdown -r now" appears to have no effect.  I had to Alt-SysRq-S-U-B to reboot.

As the system was booting I noticed some errors which scrolled past, but were copied in /var/log/daemon.log


Jul 29 11:45:21 siwa udevd-work[2426]: kernel-provided name 'dm-2' and NAME= 'mapper/vg--siwa-lv_home' disagree, please use SYMLINK+= or change the kernel to provide the proper name 
Jul 29 11:45:22 siwa udevd-work[2427]: kernel-provided name 'dm-9' and NAME= 'mapper/vg--siwa-pvmove0' disagree, please use SYMLINK+= or change the kernel to provide the proper name 
Jul 29 11:45:22 siwa udevd-work[2426]: kernel-provided name 'dm-9' and NAME= 'mapper/vg--siwa-pvmove0' disagree, please use SYMLINK+= or change the kernel to provide the proper name 
Jul 29 11:48:39 siwa udevd-work[3504]: kernel-provided name 'dm-9' and NAME= 'mapper/vg--siwa-pvmove0' disagree, please use SYMLINK+= or change the kernel to provide the proper name 
Jul 29 11:48:40 siwa udevd-work[3504]: kernel-provided name 'dm-2' and NAME= 'mapper/vg--siwa-lv_home' disagree, please use SYMLINK+= or change the kernel to provide the proper name 





Something about LVM is quite buggered relative to a year ago.
Comment 3 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2013-01-17 23:44:17 UTC
please retest on .97 or newer