Bug 194969 - sys-fs/lvm2 should include warning about pvmove and dm with kernel-2.6
Summary: sys-fs/lvm2 should include warning about pvmove and dm with kernel-2.6
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High normal
Assignee: Doug Goldstein (RETIRED)
URL: http://tldp.org/HOWTO/LVM-HOWTO/lvm2f...
 
Reported: 2007-10-07 09:05 UTC by Volker Wegert
Modified: 2007-11-03 22:33 UTC

Description Volker Wegert 2007-10-07 09:05:14 UTC
pvmove appears to be unreliable when used on a system with kernel 2.6.x and in combination with software RAID / device mapper. This appears to be a known problem; the FAQ (see URL) notes this combination as "not being supported", and I have found numerous posts on mailing lists describing issues related to this problem. 

However, I only found out about this problem *after* I lost three volumes. This incompatibility is not mentioned in the pvmove man page, it is not displayed when emerging the sys-fs/lvm2 package, and it is not mentioned in any of the files in /usr/share/doc/lvm2-$version. The file LVM2/debian/control in the LVM2 CVS repository contains a warning ("LVM2 is currently stable, but has some unimplemented features (most notably, pvmove and e2fsadm).  It is not yet recommended for production use."). I'd strongly suggest adding a warning, at least to the .ebuild files, to keep others from running into the same trap.
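
For illustration, the kind of post-install notice requested here could be sketched as below. This is a hypothetical addition, not something taken from the actual sys-fs/lvm2 ebuild: the message wording is made up, and the `ewarn` stub exists only so the snippet can be run outside Portage (inside an ebuild, Portage provides `ewarn` itself).

```shell
# Stub so the sketch also runs outside Portage, where ewarn is not defined.
# (Assumption for illustration only; a real ebuild would not contain this line.)
command -v ewarn >/dev/null 2>&1 || ewarn() { echo " * $*" >&2; }

# Hypothetical pkg_postinst for a sys-fs/lvm2 ebuild; the wording is illustrative.
pkg_postinst() {
    ewarn "pvmove has been reported to hang or lose data with some"
    ewarn "kernel / device-mapper version combinations."
    ewarn "Before running pvmove, keep a copy of /etc/lvm/backup and"
    ewarn "/etc/lvm/archive on a volume that is not being moved."
}
```

Portage would call `pkg_postinst` after the package has been merged, so the text would appear at the end of `emerge sys-fs/lvm2`.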

Reproducible: Sometimes

Steps to Reproduce:
Hard to reproduce. Seems to happen when running multiple pvmoves sequentially. Safe way for me was to move one volume, then reboot system. Repeat as necessary.

Actual Results:
Volume mostly shredded, superblock defective. Rebuilding the superblock led to a filesystem containing fragments of other file systems, but not the original data.
Comment 1 Alasdair Kergon 2007-10-07 11:03:24 UTC
Wow!  That information *is* years out-of-date!

Plenty of people are using 2.6 with raid/device-mapper perfectly satisfactorily.

There are some version combinations that don't play well together, and different versions exhibit different problems in systems under memory pressure.

If you hit a specific problem, you need to search for information about it or report specific details of it.
Comment 2 Volker Wegert 2007-10-07 14:22:23 UTC
I've used the combination for years without problems, too, until I used pvmove for the first time yesterday. Interestingly enough, the trouble started with the oom-killer shooting down just about every process, including syslog, ntpd, screen and pvmove, and this while almost nothing else was running on the system, so there should have been plenty of free memory. Most of the logs of what happened afterwards are lost, and so is my svn volume. (The third volume was a temporary volume, nothing of importance lost there.)

The "critical operation" as I recall it was running pvmove -v -n lv-svn frompv topv (worked as intended), then pvmove -v -n lv-somethingelse frompv topv. This one complained about "device-mapper: ioctl: error adding target to table", then stopped dead in its tracks, with ps showing the process status as D+. The same happened to a pvdisplay process that was running under watch in another session.

I let the system sit for an hour, waiting for it to recover. I then tried to bring down most of the processes gracefully (which was possible) and to umount or remount-ro the filesystems (not possible, umount/mount also got stuck in status D+). After several more hours of waiting without any change, I had to sync and restart the system the hard way using SysRq.

I'm not sure whether this is Gentoo specific or whether this discussion should be continued elsewhere - please advise.
Comment 3 Alasdair Kergon 2007-10-07 15:23:52 UTC
Well, the starting point is always to report the relevant versions of the things you are running, e.g. which kernel and which userspace device-mapper/lvm2 versions?
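
A minimal sketch of how the versions asked for here could be collected in one go. The function name `report_versions` is made up for this example; `lvm version` and `dmsetup version` are the standard subcommands, but they may need root and are guarded here so the script still runs where the tools are not installed.

```shell
#!/bin/sh
# Print the version details usually asked for in LVM/device-mapper bug reports.
report_versions() {
    echo "kernel: $(uname -r)"
    # 'lvm version' and 'dmsetup version' report the userspace tool/library
    # versions (and the kernel driver version when it can be queried).
    for tool in lvm dmsetup; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" version 2>&1 | sed "s/^/$tool: /"
        else
            echo "$tool: not installed"
        fi
    done
}

report_versions
```

On Gentoo, the installed package versions themselves can additionally be taken from the package manager, as the reporter does in the next comment.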
Comment 4 Volker Wegert 2007-10-07 15:28:28 UTC
zathras ~ # emerge --info
Portage 2.1.3.9 (default-linux/amd64/2006.0, gcc-3.4.6, glibc-2.5-r4, 2.6.22-gentoo-r8 x86_64)
=================================================================
System uname: 2.6.22-gentoo-r8 x86_64 AMD Athlon(tm) 64 Processor 3200+
Timestamp of tree: Fri, 05 Oct 2007 02:00:10 +0000
app-shells/bash:     3.2_p17
dev-java/java-config: 1.3.7, 2.0.33-r1
dev-lang/python:     2.4.4-r5
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.9-r2
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.17-r1
sys-devel/gcc-config: 1.3.16
sys-devel/libtool:   1.5.24
virtual/os-headers:  2.6.21
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LANG="de_DE.UTF-8"
LC_ALL="de_DE.UTF-8"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="alsa amd64 apache2 bash-completion berkdb bitmap-fonts cli cracklib crypt cups doc dri eds emboss encode foomaticdb fortran gif gpm gstreamer gtk gtk2 iconv imlib ipv6 isdnlog jpeg lzw lzw-tiff midi mp3 mpeg mudflap ncurses nls nptl nptlonly opengl openmp pam pcre perl png pppd python qt3 qt4 quicktime readline reflection samba sdl session slp spell spl ssl tcpd tiff truetype-fonts type1-fonts unicode usb xorg xpm xv zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i810 mach64 mga neomagic nv r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, LINGUAS, MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY

[I--] [  ] sys-fs/lvm2-2.02.10 (0)
[I--] [  ] sys-fs/mdadm-2.6.2 (0)

Comment 5 Alasdair Kergon 2007-10-07 16:09:10 UTC
check that kernel carefully - pvmove is broken in 2.6.22 (it'll hang till you do pvmove --abort) - you need an update (2.6.22.2 I think - look for the dm patch)

also the userspace lvm2 package there is quite old and you'd be better upgrading
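
As a rough sketch, the kernel range named in this comment can be turned into a quick check. The classification is an assumption derived only from the wording above ("2.6.22.2 I think"), so it is unverified; the function name `pvmove_kernel_status` is hypothetical, and distribution kernels such as 2.6.22-gentoo-r8 cannot be mapped to a stable patch level from the release string alone.

```shell
#!/bin/sh
# Classify a kernel release string against the range named in the comment.
# ASSUMPTION: plain 2.6.22 and 2.6.22.1 are affected and 2.6.22.2 or later
# carries the dm fix. The comment itself is not sure of the exact version.
pvmove_kernel_status() {
    case "$1" in
        2.6.22|2.6.22.1|2.6.22.1-*) echo "affected" ;;
        2.6.22.[0-9]*)              echo "probably fixed" ;;
        2.6.22-*)                   echo "unknown (distribution patch set, look for the dm patch)" ;;
        *)                          echo "not 2.6.22" ;;
    esac
}

pvmove_kernel_status "$(uname -r)"
```

For a distribution kernel the only reliable answer comes from checking whether the dm patch is part of its patch set, which is why that case is reported as unknown rather than guessed.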
Comment 6 Volker Wegert 2007-10-07 17:18:39 UTC
Forgot to mention that - pvmove --abort decided to get stuck in state D+, too. The lvm2 version I've got installed is the last one that's not flagged as ~amd64, so I thought it would be safer not to upgrade these.

Hopefully I won't be shifting any data around, and I can't risk losing more data and uptime on this machine just to hunt down this bug. If there's anything "safe" I can do to help you, please tell me, otherwise feel free to close this bug with or without adding a message to whatever file. My primary intention was to prevent others from running into the same trouble - my SVN repos are probably terminally destroyed, so I have to revert to the last backup. If the information in CVS is outdated, I was probably riding the wrong train anyway... :-)
Comment 7 Alasdair Kergon 2007-10-07 18:00:35 UTC
Did you try vgcfgrestore to recover stuff?
pvmove doesn't actually delete data. It copies it, and only when the copy has completed successfully does it drop the reference to the old location. Until you overwrite the old location, you can undo the move with vgcfgrestore (though you lose any data that changed after the move).
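
The recovery path described in this comment might look roughly like the following. It is a sketch under assumptions: the volume group name `vg0` and the archive file name are hypothetical placeholders, /etc/lvm/archive is assumed to be the (default) archive directory, and the `run` helper only prints each command unless `RUN=1` is set, so nothing is changed by accident.

```shell
#!/bin/sh
# Sketch: undo a pvmove by restoring older volume group metadata.
# VG name and archive file name are hypothetical placeholders.
VG="${VG:-vg0}"
ARCHIVE="${ARCHIVE:-/etc/lvm/archive/vg0_00042.vg}"

# Print the command instead of executing it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# 1. List the metadata backups and archives LVM has for this VG.
run vgcfgrestore --list "$VG"
# 2. Try the chosen file in test mode first (no metadata is written).
run vgcfgrestore --test -f "$ARCHIVE" "$VG"
# 3. Restore the metadata for real.
run vgcfgrestore -f "$ARCHIVE" "$VG"
```

The listing in step 1 shows a description and timestamp for each file, which helps in picking the last one written before the pvmove. As noted above, anything written to the volumes after that point is lost by the restore.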
Comment 8 Volker Wegert 2007-10-07 18:14:45 UTC
Through user error, the backup file was on the temporary partition that was shredded. I still have the original PV, it's just no longer part of the VG. Is there any way I could try to recover the data (gpart is masked -amd64 :-( - any other tool?) or might I as well give up trying?
Comment 9 Volker Wegert 2007-10-07 20:26:39 UTC
OK, got it now - I was able to use some old config file from /etc/lvm/archive. This settles the issue for me, unless someone wants to find out why the processes locked up in D+, causing the entire system to become unusable in the first place.
Comment 10 Robin Johnson 2007-10-08 22:25:57 UTC
volker: I have used pvmove on my ~amd64 system without problems. Could you upgrade to the ~amd64 device-mapper, udev, and lvm2 and test again?
Comment 11 Volker Wegert 2007-11-03 22:33:36 UTC
I don't have any volumes left to move, and the secondary volume group is already disassembled and gone. I'll keep an eye on this whenever I have to pvmove data around again.