Bug 233186 - sys-fs/lvm2-2.02.36: pvmove crashes system
Summary: sys-fs/lvm2-2.02.36: pvmove crashes system
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High critical with 1 vote
Assignee: Robin Johnson
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-28 19:08 UTC by Florian Philipp
Modified: 2009-11-30 01:00 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge -v --info (emerge_info,9.47 KB, text/plain)
2008-07-28 19:11 UTC, Florian Philipp
kernel config (kernel_config,44.79 KB, text/plain)
2008-07-28 19:12 UTC, Florian Philipp
cryptsetup luksDump /dev/md1 (cryptsetup,841 bytes, text/plain)
2008-07-28 19:13 UTC, Florian Philipp
mdadm --misc --detail /dev/md0 /dev/md1 (mdadm,1.40 KB, text/plain)
2008-07-28 19:18 UTC, Florian Philipp

Description Florian Philipp 2008-07-28 19:08:46 UTC
When I use pvmove to move some logical volumes to a LUKS device on an md RAID (a RAID0 of two disks with a currently degraded RAID1 on top of it), the system becomes unresponsive after some time:
Disk activity stops, user sessions freeze, and login attempts freeze after typing the user name. SSH sessions stay connected (only the connection, the shell itself hangs), the kernel still responds to magic SysRq keys, and the cursor keeps blinking in the terminal.

After a reboot everything works fine at first; pvmove restarts automatically and the system freezes again after some time.

When I abort pvmove (pvmove --abort), the system stays stable.

Reproducible: Always

Steps to Reproduce:
0. [unknown, maybe there is a correlation with cryptsetup or md]
1. pvcreate /dev/sda1 /dev/sdb1
2. vgcreate vg_test /dev/sda1 /dev/sdb1
3. lvcreate --size 40G --name test vg_test /dev/sda1
4. pvmove /dev/sda1
5. wait




Versions:
sys-fs/cryptsetup-1.0.5-r1
sys-fs/mdadm-2.6.4-r1
sys-kernel/gentoo-sources-2.6.25-r7

The system is otherwise stable and achieves uptimes of more than 30 days under constant load.
Comment 1 Florian Philipp 2008-07-28 19:11:47 UTC
Created attachment 161562 [details]
emerge -v --info
Comment 2 Florian Philipp 2008-07-28 19:12:22 UTC
Created attachment 161563 [details]
kernel config
Comment 3 Florian Philipp 2008-07-28 19:13:58 UTC
Created attachment 161564 [details]
cryptsetup luksDump /dev/md1
Comment 4 Florian Philipp 2008-07-28 19:18:01 UTC
Created attachment 161566 [details]
mdadm --misc --detail /dev/md0 /dev/md1
Comment 5 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-07-28 21:10:41 UTC
I can do it without luks, both on md-raid and real hardware RAID (3ware). I can't test a full luks setup at the moment, so I advise you to try and test without luks.
Comment 6 Florian Philipp 2008-08-11 17:32:10 UTC
Today I tried it again and watched it closely. It proceeded until it hit 99.6%, then it stopped, and the load average increased slowly until the system became unresponsive. Killing the pvmove process didn't help.
Comment 7 Florian Philipp 2008-08-11 17:59:03 UTC
I just finished the job. After the last attempt there were only a few extents left, belonging to /var/cache/edb and /usr/portage, which I moved one after the other.

I'm not really willing to try and reproduce it. I had my share of magic-sysrq-reboots. Maybe I'll try it on a virtual machine sometime, but not now. So I won't mind if you set it to WORKSFORME, TEST-REQUEST or whatever.

Oh, by the way; some additions to the results of my test:
The process doesn't hog memory, it doesn't need a lot of CPU-time nor does it produce load until it freezes.

The system was actually quite responsive. I could open new shells, start top, uptime, everything. Only reboot, shutdown and everything which is a symlink to lvm froze. Maybe there is some kind of deadlock?
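
The pattern described in this comment (load rising with little CPU use, lvm-related commands hanging) would fit tasks stuck in uninterruptible sleep, which Linux counts toward the load average. A minimal sketch for listing such tasks, assuming a procps-style ps; the helper name list_blocked_tasks is made up for illustration and is not part of lvm2:

```shell
#!/bin/sh
# Hypothetical helper (not part of lvm2): from the output of
#   ps -eo pid,stat,wchan:32,comm
# keep the header and every task whose STAT starts with "D"
# (uninterruptible sleep). Reads stdin, so saved output works too.
list_blocked_tasks() {
    awk 'NR == 1 { print; next }   # keep the header line
         $2 ~ /^D/ { print }       # keep rows whose STAT begins with D
        '
}
```

It could be run as `ps -eo pid,stat,wchan:32,comm | list_blocked_tasks` while pvmove appears stuck; the WCHAN column may hint at where the blocked tasks are waiting, though it proves nothing about a deadlock by itself.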
Comment 8 John Andrews 2008-08-15 20:50:19 UTC
I think I'm having this same problem on 2.6.24-gentoo-r5. I'm trying to migrate data off a 1.1TB RAID array. This server has about 80 logical volumes on 32TB total space and it's under constant Samba use, so I didn't expect pvmove to be fast. But whenever I start or resume pvmove, my system load starts climbing steadily, and Samba shares become slower and slower to respond until they stop responding at all. I can usually kill most Samba instances, but a few always remain and won't die, and while the load average goes down a bit, it starts climbing again. Even if I manage to kill pvmove, which also temporarily lowers the load average, I eventually have to hard reboot the system. 
Comment 9 Florian Philipp 2008-08-15 21:39:00 UTC
The only workaround I see at the moment (and I'm not even sure it actually works, maybe I was just lucky) is to move one logical volume after the other:
pvmove -n lv1 pv1 pv2

You can also try to abort and resume pvmove every five minutes or so:
pvmove --background pv1 pv2
while pvmove --background; do
    sleep 300
    pvmove --abort
    sleep 5
done
(or something similar, haven't tested it)
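
The first workaround above (one logical volume at a time) could be scripted roughly as follows. This is an untested sketch, not something from the reporter: the function name move_lvs_one_by_one and the PVMOVE variable are assumptions introduced here so the loop can be dry-run without touching any volumes.

```shell
#!/bin/sh
# Sketch only: move logical volumes off a PV one at a time.
# PVMOVE is an illustrative knob, not an lvm2 setting: set it to
# "echo pvmove" for a dry run; it defaults to the real pvmove.
PVMOVE=${PVMOVE:-pvmove}

# usage: move_lvs_one_by_one SRC_PV DST_PV   (LV names on stdin, one per line)
move_lvs_one_by_one() {
    src=$1
    dst=$2
    while read -r lv; do
        [ -n "$lv" ] || continue    # skip blank lines
        # Stop at the first failure so a failed move is noticed.
        # stdin is redirected so the command cannot eat the LV list.
        $PVMOVE -n "$lv" "$src" "$dst" < /dev/null || return 1
    done
}
```

A dry run might look like `PVMOVE="echo pvmove"; lvs --noheadings -o lv_name vg_test | move_lvs_one_by_one /dev/sda1 /dev/sdb1`, using the names from the steps to reproduce. Logical volumes with no extents on the source PV may need to be filtered out of the list first.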
Comment 10 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-08-15 22:25:22 UTC
Can you guys try different kernels?

agk: any ideas on this one?
Comment 11 John Andrews 2008-08-18 13:15:09 UTC
I am having much better luck starting pvmove in the background. It might also be because there has been much less Samba activity over the past day; I'll probably know tonight, once the users get back to pounding the server, if that's making things better or if it's the background start.

I only ever do one pvmove at a time, by the way. Seems prudent.
Comment 12 Carsten Lohrke (RETIRED) gentoo-dev 2009-01-20 22:57:27 UTC
There are other reports like this one, e.g. 

http://markmail.org/message/cpvzorkpdalsdack

Maybe this is just another symptom of

http://bugzilla.kernel.org/show_bug.cgi?id=12309
Comment 13 Caleb Tennis (RETIRED) gentoo-dev 2009-03-21 12:44:15 UTC
possibly related:

http://souja.net/2009/03/reigning-in-lvm-pvmove-memory-leakage.html
Comment 14 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2009-11-30 01:00:32 UTC
Please test 2.02.56-r1.