When I try to use pvmove to move some logical volumes to a luks device on an md-raid (a raid0 of two disks, with a currently degraded raid1 on top of it), the system becomes unresponsive after some time: disk activity stops, user sessions freeze, and login attempts hang after the user name is typed. SSH sessions are kept alive (only the session, not the bash), the kernel still responds to magic-sysrq keys, and the cursor keeps blinking in the terminal. After a reboot everything works fine at first; pvmove automatically restarts and hangs again after some time. When I abort pvmove (pvmove --abort), the system stays stable.

Reproducible: Always

Steps to Reproduce:
0. [unknown, maybe there is a correlation with cryptsetup or md]
1. pvcreate /dev/sda1 /dev/sdb1
2. vgcreate vg_test /dev/sda1 /dev/sdb1
3. lvcreate --size 40G --name test vg_test /dev/sda1
4. pvmove /dev/sda1
5. wait

Versions:
sys-fs/cryptsetup-1.0.5-r1
sys-fs/mdadm-2.6.4-r1
sys-kernel/gentoo-sources-2.6.25-r7

The system otherwise runs stably and reaches uptimes of more than 30 days under constant load.
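For completeness, this is roughly how I keep an eye on the move and stop it when the stall starts (a minimal sketch; vg_test is the volume group from the steps above, the lvs columns are the standard lvm2 ones):

  # watch the pvmove progress; the temporary pvmove0 mirror shows up with -a
  watch -n 10 'lvs -a -o lv_name,copy_percent,devices vg_test'

  # abort if the system starts to stall; data stays on the source PV
  # and the move can be resumed later by running "pvmove" without arguments
  pvmove --abort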
Created attachment 161562 [details] emerge -v --info
Created attachment 161563 [details] kernel config
Created attachment 161564 [details] cryptsetup luksDump /dev/md1
Created attachment 161566 [details] mdadm --misc --detail /dev/md0 /dev/md1
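For anyone trying to reproduce the full stack: this is only my reading of the layout described above (raid0 of two disks, degraded raid1 on top, luks on the raid1, LVM on the opened mapping); the disk names and the mapping name are made up, the attachments have the real details:

  # raid0 of two disks (hypothetical member names)
  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdc1 /dev/sdd1
  # degraded raid1 on top of the raid0 (second member missing)
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/md0 missing
  # luks on the raid1, then an LVM PV on the opened mapping
  cryptsetup luksFormat /dev/md1
  cryptsetup luksOpen /dev/md1 cryptpv
  pvcreate /dev/mapper/cryptpv
  vgextend vg_test /dev/mapper/cryptpv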
I can do it without luks, both on md-raid and on real hardware RAID (3ware). I can't test a full luks setup at the moment, so I suggest you try and test without luks.
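Something like this should do for a test without luks; everything below is a rough sketch with made-up names, the point is only to have real data for pvmove to copy:

  # plain PVs directly on md, no cryptsetup layer
  pvcreate /dev/md2 /dev/md3
  vgcreate vg_plain /dev/md2 /dev/md3
  lvcreate --size 40G --name test vg_plain /dev/md2
  # put some data on the LV so the move actually has work to do
  mkfs.ext3 /dev/vg_plain/test
  mount /dev/vg_plain/test /mnt/test
  cp -a /usr/portage /mnt/test/
  # move everything off the first PV and wait for the hang
  pvmove -v /dev/md2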
Today I tried it again and watched it closely. It proceeded until it hit 99.6%, then it stopped and the load average slowly increased until the system became unresponsive. Killing the pvmove process didn't help.
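In case it matters which extents are left at that point, they can be listed like this (vg_test and /dev/sda1 again stand in for the real names):

  # physical extent allocation on the source PV
  pvdisplay --maps /dev/sda1
  # per-segment view, including the temporary pvmove mirror
  lvs -a -o lv_name,seg_pe_ranges,devices vg_test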
I just finished the job. After the last attempt there were only a few extents belonging to /var/cache/edb and /usr/portage left, which I moved one after the other. I'm not really willing to try to reproduce it; I've had my share of magic-sysrq reboots. Maybe I'll try it on a virtual machine sometime, but not now. That's why I won't mind if you set this to WORKSFORME, TEST-REQUEST or whatever.

Oh, by the way, some additions to the results of my test: the process doesn't hog memory, doesn't need a lot of CPU time and doesn't produce load until it freezes. The system was actually quite responsive; I could open new shells and run top, uptime, everything. Only reboot, shutdown and everything that is a symlink to lvm froze. Maybe there is some kind of deadlock?
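If someone wants to chase the suspected deadlock, a dump of the blocked tasks via sysrq would probably be the most useful thing to attach next time it happens; this is just the generic procedure:

  # enable sysrq, then dump all tasks stuck in uninterruptible sleep
  echo 1 > /proc/sys/kernel/sysrq
  echo w > /proc/sysrq-trigger
  # the stack traces land in the kernel log
  dmesg | tail -n 100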
I think I'm having the same problem on 2.6.24-gentoo-r5. I'm trying to migrate data off a 1.1TB RAID array. The server has about 80 logical volumes on 32TB total space and is under constant Samba use, so I didn't expect pvmove to be fast. But whenever I start or resume pvmove, the system load starts climbing steadily, and the Samba shares become slower and slower to respond until they stop responding altogether. I can usually kill most Samba instances, but a few always remain and won't die; the load average drops a bit and then starts climbing again. Even if I manage to kill pvmove, which also temporarily lowers the load average, I eventually have to hard-reboot the system.
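For what it's worth, the smbd processes that won't die are most likely stuck in uninterruptible sleep on the frozen device; a quick way to confirm (standard procps, nothing specific to this setup):

  # list processes in D state with the kernel symbol they are waiting in
  ps axo pid,stat,wchan:32,comm | awk '$2 ~ /D/'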
The only workaround I see at the moment (and I'm not even sure it actually works, maybe I just got lucky) is to move one logical volume after the other:

  pvmove -n lv1 pv1 pv2

You can also try to abort and resume pvmove every five minutes or so:

  pvmove --background pv1 pv2
  while pvmove --background; do
      sleep 300
      pvmove --abort
      sleep 5
  done

(or something similar, I haven't tested it)
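If you want to apply the first variant to a whole volume group, something along these lines should walk the LVs one at a time (also untested; vg and device names are placeholders):

  # move each logical volume individually instead of the whole PV at once
  for lv in $(lvs --noheadings -o lv_name vg1); do
      pvmove -n "$lv" /dev/sdX1 /dev/sdY1
  done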
Can you guys try different kernels? agk: any ideas on this one?
I am having much better luck starting pvmove in the background. It might also be because there has been much less Samba activity over the past day; I'll probably know tonight, once the users get back to pounding the server, if that's making things better or if it's the background start. I only ever do one pvmove at a time, by the way. Seems prudent.
There are other reports like this one, e.g. http://markmail.org/message/cpvzorkpdalsdack

Maybe this is just another symptom of http://bugzilla.kernel.org/show_bug.cgi?id=12309
possibly related: http://souja.net/2009/03/reigning-in-lvm-pvmove-memory-leakage.html
Please test 2.02.56-r1.
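Assuming this refers to the sys-fs/lvm2 package, that would be:

  emerge --ask --oneshot "=sys-fs/lvm2-2.02.56-r1"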