Summary: | sys-fs/lvm2 should include warning about pvmove and dm with kernel-2.6 | | |
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Volker Wegert <gentoo> |
Component: | [OLD] Core system | Assignee: | Doug Goldstein (RETIRED) <cardoe> |
Status: | RESOLVED WORKSFORME | | |
Severity: | normal | CC: | agk, robbat2 |
Priority: | High | | |
Version: | 2006.1 | | |
Hardware: | All | | |
OS: | Linux | | |
URL: | http://tldp.org/HOWTO/LVM-HOWTO/lvm2faq.html | | |
Whiteboard: | | | |
Package list: | | Runtime testing required: | --- |
Description
Volker Wegert
2007-10-07 09:05:14 UTC
Wow! That information *is* years out-of-date! Plenty of people are using 2.6 with raid/device-mapper perfectly satisfactorily. There are some version combinations that don't play well together, and different versions exhibit different problems in systems under memory pressure. If you hit a specific problem, you need to search for information about it or report specific details of it.

I've used the combination for years without problems, too, until I used pvmove for the first time yesterday. Interestingly enough, the trouble started with the oom-killer shooting down nearly every process, including syslog, ntpd, screen and pvmove, and this while almost nothing else was running on the system, so there should have been plenty of free memory. Most of the logs of what happened afterwards are lost, and so is my svn volume. (The third volume was a temporary volume, nothing of importance lost there.)

The "critical operation" as I recall it was running pvmove -v -n lv-svn frompv topv (worked as intended), then pvmove -v -n -lv-somethingelse frompv topv. This one complained about "device-mapper: ioctl: error adding target to table", then stopped dead in its tracks, with its process status shown as D+ by ps. The same happened to a pvdisplay process that was running under watch in another session. I let the system sit for an hour, waiting for it to recover. I then tried to bring down most of the processes gracefully (this was possible) and to umount or remount-ro the filesystems (not possible, umount/mount also got stuck in status D+). After several more hours of waiting and no change, I had to sync and restart the system the hard way using SysRq.

I'm not sure whether this is Gentoo specific or whether this discussion should be continued elsewhere - please advise.

Well, the starting point is always to report the relevant versions of the things you are running, e.g. which kernel and which userspace device-mapper/lvm2 versions?

zathras ~ # emerge --info
Portage 2.1.3.9 (default-linux/amd64/2006.0, gcc-3.4.6, glibc-2.5-r4, 2.6.22-gentoo-r8 x86_64)
=================================================================
System uname: 2.6.22-gentoo-r8 x86_64 AMD Athlon(tm) 64 Processor 3200+
Timestamp of tree: Fri, 05 Oct 2007 02:00:10 +0000
app-shells/bash: 3.2_p17
dev-java/java-config: 1.3.7, 2.0.33-r1
dev-lang/python: 2.4.4-r5
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.9-r2
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.61-r1
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils: 2.17-r1
sys-devel/gcc-config: 1.3.16
sys-devel/libtool: 1.5.24
virtual/os-headers: 2.6.21
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LANG="de_DE.UTF-8"
LC_ALL="de_DE.UTF-8"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="alsa amd64 apache2 bash-completion berkdb bitmap-fonts cli cracklib crypt cups doc dri eds emboss encode foomaticdb fortran gif gpm gstreamer gtk gtk2 iconv imlib ipv6 isdnlog jpeg lzw lzw-tiff midi mp3 mpeg mudflap ncurses nls nptl nptlonly opengl openmp pam pcre perl png pppd python qt3 qt4 quicktime readline reflection samba sdl session slp spell spl ssl tcpd tiff truetype-fonts type1-fonts unicode usb xorg xpm xv zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i810 mach64 mga neomagic nv r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, LINGUAS, MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY

[I--] [ ] sys-fs/lvm2-2.02.10 (0)
[I--] [ ] sys-fs/mdadm-2.6.2 (0)

Check that kernel carefully - pvmove is broken in 2.6.22 (it'll hang till you do pvmove --abort) - you need an update (2.6.22.2 I think - look for the dm patch). Also, the userspace lvm2 package there is quite old and you'd be better upgrading.

Forgot to mention that - pvmove --abort decided to get stuck in state D+, too. The lvm2 version I've got installed is the last one that's not flagged as ~amd64, so I thought it would be safer not to upgrade these. Hopefully I won't be shifting any data around, and I can't risk losing more data and uptime on this machine just to hunt down this bug.
If there's anything "safe" I can do to help you, please tell me; otherwise feel free to close this bug, with or without adding a message to whatever file. My primary intention was to prevent others from running into the same trouble - my SVN repos are probably terminally destroyed, so I have to revert to the last backup. If the information in CVS is outdated, I was probably riding the wrong train anyway... :-)

Did you try vgcfgrestore to recover stuff? pvmove doesn't actually delete data - it copies it, and only when the copy has completed successfully does it drop the reference to the old location. Until you overwrite the old location, you can undo the move with vgcfgrestore (though you lose any data that changed after the move).

Through user error, the backup file was on the temporary partition that was shredded. I still have the original PV, it's just no longer part of the VG. Is there any way I could try to recover the data (gpart is masked -amd64 :-( - any other tool?) or may I as well give up trying?

OK, got it now - I was able to use an old config file from /etc/lvm/archive. This settles the issue for me, unless someone wants to find out why the processes locked up in D+, causing the entire system to become unusable in the first place.

volker: I have used pvmove on my ~amd64 fine. Could you upgrade to the ~amd64 device-mapper, udev, and lvm2 and test again?

I don't have any volumes left to move, and the secondary volume group is already disassembled and gone. I'll keep an eye on this whenever I have to pvmove data around again.
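
For anyone landing here in the same situation, the recovery described in this thread (picking an older metadata file from /etc/lvm/archive and feeding it to vgcfgrestore) might look roughly like the sketch below. It only lists candidate files; it does not restore anything by itself. The volume group name `vg0`, the example file name, the `<vg>_*.vg` naming pattern and the presence of a `description` line in each archive file are assumptions, not details taken from this report, so check them against your own system first.

```shell
#!/bin/sh
# list_archives DIR VG
# Print every archived metadata file for volume group VG found in DIR,
# together with its first "description" line, so that the state from
# before the pvmove can be picked by hand.
# Assumption: archive files are named <VG>_*.vg and contain a line that
# starts with "description".
list_archives() {
    for f in "$1"/"$2"_*.vg; do
        [ -e "$f" ] || continue    # the glob matched nothing
        desc=$(grep '^description' "$f" | head -n 1)
        printf '%s: %s\n' "$f" "$desc"
    done
}

# Typical use (vg0 and the file name below are placeholders):
#   list_archives /etc/lvm/archive vg0
#   vgcfgrestore -f /etc/lvm/archive/vg0_00012.vg vg0
```

The restore step is left as a comment on purpose: as noted above, anything written after the chosen metadata version was archived is lost, so the file should be chosen by a human.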