| Summary: | sys-fs/lvm2-2.02.36: pvmove crashes system | | |
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Florian Philipp <f_philipp> |
| Component: | [OLD] Core system | Assignee: | Robin Johnson <robbat2> |
| Status: | RESOLVED TEST-REQUEST | | |
| Severity: | critical | CC: | agk, andrews, cardoe |
| Priority: | High | | |
| Version: | unspecified | | |
| Hardware: | AMD64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Package list: | | Runtime testing required: | --- |
| Attachments: | emerge -v --info; kernel config; cryptsetup luksDump /dev/md1; mdadm --misc --detail /dev/md0 /dev/md1 | | |
Description
Florian Philipp
2008-07-28 19:08:46 UTC
Created attachment 161562 [details]
emerge -v --info
Created attachment 161563 [details]
kernel config
Created attachment 161564 [details]
cryptsetup luksDump /dev/md1
Created attachment 161566 [details]
mdadm --misc --detail /dev/md0 /dev/md1
I can reproduce it without luks, both on md-raid and on real hardware RAID (3ware). I can't test a full luks setup at the moment, so I advise you to try and test without luks. Today I tried it again and watched it closely. It proceeded until it hit 99.6%, then it stopped and the load average increased slowly until the system became unresponsive. Killing the pvmove process didn't help, so I just finished the job by hand: after the last trial there were only a few extents belonging to /var/cache/edb and /usr/portage left, which I moved one after the other.

I'm not really willing to try to reproduce it; I've had my share of magic-sysrq reboots. Maybe I'll try it on a virtual machine sometime, but not now. That's why I won't mind if you set it to WORKSFORME, TEST-REQUEST or whatever.

Oh, by the way, some additions to the results of my test: the process doesn't hog memory, it doesn't need a lot of CPU time, nor does it produce load until it freezes. The system was actually quite responsive: I could open new shells, start top, uptime, everything. Only reboot, shutdown and everything that is a symlink to lvm froze. Maybe there is some kind of deadlock?

I think I'm having this same problem on 2.6.24-gentoo-r5. I'm trying to migrate data off a 1.1TB RAID array. This server has about 80 logical volumes on 32TB total space and it's under constant Samba use, so I didn't expect pvmove to be fast. But whenever I start or resume pvmove, my system load starts climbing steadily, and Samba shares become slower and slower to respond until they stop responding at all. I can usually kill most Samba instances, but a few always remain and won't die, and while the load average goes down a bit, it starts climbing again. Even if I manage to kill pvmove, which also temporarily lowers the load average, I eventually have to hard reboot the system.

The only workaround I see at the moment (and I'm not even sure it actually works, maybe I just got lucky) is to move one logical volume after the other:

```
pvmove -n lv1 pv1 pv2
```

You can also try to abort and resume pvmove every five minutes or so:

```
pvmove --background pv1 pv2
while pvmove --background; do
    sleep 300
    pvmove --abort
    sleep 5
done
```

(or something similar, I haven't tested it)

Can you guys try different kernels? agk: any ideas on this one?

I am having much better luck starting pvmove in the background. It might also be because there has been much less Samba activity over the past day; I'll probably know tonight, once the users get back to pounding the server, whether it's the background start or just the lighter load that is making things better. I only ever do one pvmove at a time, by the way. Seems prudent.

There are other reports like this one, e.g. http://markmail.org/message/cpvzorkpdalsdack

Maybe this is just another symptom of http://bugzilla.kernel.org/show_bug.cgi?id=12309

Possibly related: http://souja.net/2009/03/reigning-in-lvm-pvmove-memory-leakage.html

Please test 2.02.56-r1.
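For reference, a minimal sketch of the "one logical volume at a time" workaround described above. The PV paths /dev/old_pv and /dev/new_pv are placeholders, not devices from this report, and the script is untested; treat it as an illustration, not a recommended procedure.

```
#!/bin/sh
# Untested sketch: move LVs off a source PV one at a time with pvmove -n
# instead of one whole-PV pvmove. SRC_PV and DST_PV are placeholder names.
SRC_PV=/dev/old_pv
DST_PV=/dev/new_pv

# lvs prints each LV together with the devices backing its segments; pick
# the LVs that still have segments on the source PV and move them one by one.
for lv in $(lvs --noheadings -o lv_name,devices | \
        awk -v src="$SRC_PV" 'index($0, src) { print $1 }' | sort -u); do
    echo "moving $lv from $SRC_PV to $DST_PV"
    pvmove -n "$lv" "$SRC_PV" "$DST_PV" || break
done
```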