Summary: | sys-kernel/vanilla-sources-2.6.36 and above: sys-fs/lvm2's pvmove causes deadlock | |
---|---|---|---
Product: | Gentoo Linux | Reporter: | Xake <kanelxake>
Component: | [OLD] Core system | Assignee: | The Gentoo Linux Hardened Kernel Team (OBSOLETE) <hardened-kernel+disabled>
Status: | RESOLVED UPSTREAM | |
Severity: | normal | CC: | ago, hardened, kernel, klondike
Priority: | High | Keywords: | UPSTREAM
Version: | unspecified | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Package list: | | Runtime testing required: | ---
Attachments: | output from sysrq-W (list of blocked processes) | |
Description
Xake 2011-01-03 10:45:57 UTC

Created attachment 258727: output from sysrq-W (list of blocked processes)
A while back, talking to klondike on #gentoo-hardened, he suggested I turn on loglevel 8 and post this. I never had the time to finish the bug report back then, but here it is.
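For reference, this is roughly how the console loglevel and the blocked-task dump in the attachment can be obtained; a minimal sketch, run as root on the affected machine (where the output ends up depends on the console setup):

```sh
# Make sure the SysRq interface is fully enabled.
echo 1 > /proc/sys/kernel/sysrq

# Raise the console loglevel to 8 so all kernel messages reach the console.
dmesg -n 8

# Dump all blocked (uninterruptible) tasks, the same data as in the attachment.
# Equivalent to pressing Alt+SysRq+W on the console.
echo w > /proc/sysrq-trigger
```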
It seems like it is possible to get a "hang" in single-user mode too. First boot, run "init 1" in a console, then "pvmove" to resume the move, then Ctrl-C (which gives back the prompt), and then run "pvmove --abort" and watch for the apparently non-existing output. However, this time around emergency sync responds after about one to two minutes, so it may be that I/O just becomes so slow that the system becomes unusable. After 20 minutes there was no further response from pvmove.

pvmove hangs the vanilla kernel too; I tested 2.6.36.2 some days ago and 2.6.37-rc8+ this morning. Sometimes it hangs right away, sometimes it hangs after a couple of minutes, sometimes it hangs at 100%, and sometimes it does not hang at all. What seems to hang is only part of the disk I/O: if I have ssh running it still works, I may be able to run processes like top, and processes that are already running seldom hang until I close them or try to save something in them. So it seems like everything that does more than just read from disk is a no-go. After speaking with klondike we have a suspicion that this may be a race that happens because I am moving a mounted and in-use filesystem. I am going to do some more testing related to this when I get the time.

(In reply to comment #1)
> Created an attachment (id=258727)
> output from sysrq-W (list of blocked processes)
>
> A while back, talking to klondike on #gentoo-hardened, he suggested I turn on
> loglevel 8 and post this. I never had the time to finish the bug report back
> then, but here it is.

Looking at your list of blocked (deadlocked?) processes, the major players are jbd2/dm-2-8 and flush-253:2. The rest, I suspect, are blocked on I/O to the locked device. This looks like an upstream bug with vanilla 2.6.36: https://bugzilla.kernel.org/show_bug.cgi?id=25632 I'm not 100% sure, because it is the same kernel threads but different call traces. I'm going to try to reproduce.

Looking for a way to reproduce this, I tried the following today. First I found myself two empty hard disks, the smaller one about 160 GB, the bigger one 250 GB. On the smaller disk I created one partition of 1 GB and a second partition holding a VG with 14 LVs of 10 GB each. The VG also includes the other disk, but nothing is allocated on it. In an empty directory I created a mount point for each LV, mounted them all, and ran:

    for d in * ; do tar xaf linux-2.6.36.tar.bz2 -C $d ; done

When that finished I ran in one console:

    for d in * ; do make -C $d/linux-2.6.36 allyesconfig -j4 && make -C $d/linux-2.6.36 -j4 & echo ; done

in the next console "sync", and in the third console "pvmove <partition of small harddisk>".

Currently that system has the following problems: the compilation has stopped (wc reports 96 running make processes, but there is no output from them or from the multiple cc1 processes), sync does not return, and pvmove stopped reporting progress at about 7%. I can log in on consoles (so that part of the earlier problem could be because I was running the system from the partitions I tried to move), and "touch ~/test" returns at once, but "touch test1/test" does not return. Neither does "ls -l test1" or pvs. iotop and "bwm-ng -i disk" report no activity on any disk or process that is part of the VG, and nearly no activity on the other disks. top shows nothing consuming resources, yet it reports a load of about 81.65 and rising (checking back now it has reached 95.0).

So in other words my system works fine, except that everything trying to do any I/O against that volume group stops responding (other volume groups still work fine). I can write to the 1 GB partition on the same disk as the VG, so it seems like it is only the VG that is locked, not the entire disk. This is with a 2.6.36-r7 hardened kernel; I will try the same with vanilla. My next step is to vary this setup to find the minimal set of actions needed to make it hang, for example to see whether it hangs even if I only start the compilations, and so on.
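Condensed, the reproduction described above amounts to roughly the following; this is only a sketch, and the device names (/dev/sdb for the smaller disk, /dev/sdc for the larger empty one) and the volume group name are placeholders for the actual hardware:

```sh
# Assumed layout: /dev/sdb2 is the big partition on the ~160 GB disk,
# /dev/sdc is the ~250 GB disk with nothing allocated on it.
pvcreate /dev/sdb2 /dev/sdc
vgcreate testvg /dev/sdb2 /dev/sdc

# 14 LVs of 10 GB each, all allocated on the small disk, one mount point per LV.
for i in $(seq 1 14); do
    lvcreate -L 10G -n "test$i" testvg /dev/sdb2
    mkfs.ext4 "/dev/testvg/test$i"
    mkdir -p "test$i" && mount "/dev/testvg/test$i" "test$i"
done

# Generate heavy I/O on every LV, then move the PV out from under it.
for d in test*; do tar xaf linux-2.6.36.tar.bz2 -C "$d"; done
for d in test*; do ( make -C "$d/linux-2.6.36" allyesconfig && make -C "$d/linux-2.6.36" -j4 ) & done
sync &
pvmove /dev/sdb2
```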
xake, please focus testing on vanilla if you can, because it narrows the problem down and makes it more likely upstream will take it seriously. I tried to reproduce this on a virtual machine and could not. I think the issue requires heavy I/O; even the upstream bug suggests that. This probably increases the likelihood of hitting some race condition. The fact that you can still interact with the drive but not with the VG is important and further confirms that the issue is between the jbd and the flushing on the dm.

(In reply to comment #6)
> xake, please focus testing on vanilla if you can, because it narrows the problem down and
> makes it more likely upstream will take it seriously.

Yeah, I am currently building 2.6.37 from git to use. I have also enabled a bunch of the lock-related debug options in the kernel to try to find out where this is coming from, and when I hit this problem this morning while trying to emergency-sync the system (shutdown probably did not work, as it tries to kill stuff that did not want to die) I got a message about a possible spinlock deadlock. I will try to fetch that message over netconsole from a vanilla kernel if possible.
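For reference, the lock-related debug options in question live under "Kernel hacking" in the kernel config, and netconsole can forward the resulting report to another machine; a sketch with made-up addresses, interface name, and MAC:

```sh
# Kernel config fragment; CONFIG_PROVE_LOCKING is what produces the
# "possible circular locking dependency detected" report quoted below.
#   CONFIG_PROVE_LOCKING=y
#   CONFIG_DEBUG_SPINLOCK=y
#   CONFIG_DEBUG_MUTEXES=y
#   CONFIG_DETECT_HUNG_TASK=y

# Kernel command line for netconsole on the affected machine
# (source port@ip/interface, target port@ip/mac; the values are placeholders):
#   netconsole=6665@192.168.0.10/eth0,6666@192.168.0.2/00:11:22:33:44:55 loglevel=8

# On the receiving machine, listen for the UDP log stream:
nc -l -u -p 6666    # or "nc -u -l 6666", depending on the netcat flavor
```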
So I retried with tag v2.6.37 compiled from git. I started the same scenario as before (one kernel compilation per LV, 14 in total, plus sync and pvmove). After having it run for some time I had to abort, which I did with "killall make" followed by "pvmove --abort". Now I am back in a situation where everything works except things like "touch test1/test" or anything else touching an LV involved in the compilation, and there is no apparent activity on the disks. Just hitting SysRq+S spawns the following interesting message:

[16771.737229] SysRq : Emergency Sync
[16771.737640] =======================================================
[16771.737666] [ INFO: possible circular locking dependency detected ]
[16771.737680] 2.6.37 #1
[16771.737687] -------------------------------------------------------
[16771.737700] kworker/0:0/25366 is trying to acquire lock:
[16771.737711]  (&type->s_umount_key#23){++++..}, at: [<ffffffff8118e1de>] iterate_supers+0x5e/0xd0
[16771.737750] but task is already holding lock:
[16771.737764]  ((work)#2){+.+...}, at: [<ffffffff810b2926>] process_one_work+0x126/0x670
[16771.737799] which lock already depends on the new lock.
[16771.737821] the existing dependency chain (in reverse order) is:
[16771.737839] -> #3 ((work)#2){+.+...}:
[16771.737870]        [<ffffffff810d3018>] lock_acquire+0x98/0x1e0
[16771.737888]        [<ffffffff810b297c>] process_one_work+0x17c/0x670
[16771.737906]        [<ffffffff810b4b21>] worker_thread+0x161/0x340
[16771.737923]        [<ffffffff810b9b36>] kthread+0x96/0xa0
[16771.737940]        [<ffffffff8103fed4>] kernel_thread_helper+0x4/0x10
[16771.737959] -> #2 (events){+.+.+.}:
[16771.737987]        [<ffffffff810d3018>] lock_acquire+0x98/0x1e0
[16771.738004]        [<ffffffff810b30d9>] start_flush_work+0x119/0x1b0
[16771.738022]        [<ffffffff810b3a00>] flush_work+0x20/0x40
[16771.738039]        [<ffffffff810b5873>] schedule_on_each_cpu+0xe3/0x120
[16771.738057]        [<ffffffff81146240>] lru_add_drain_all+0x10/0x20
[16771.738076]        [<ffffffff811b8b48>] invalidate_bdev+0x28/0x40
[16771.738094]        [<ffffffff8125bf22>] ext4_put_super+0x1a2/0x360
[16771.738113]        [<ffffffff8118d0cd>] generic_shutdown_super+0x6d/0x100
[16771.738134]        [<ffffffff8118d18c>] kill_block_super+0x2c/0x50
[16771.738152]        [<ffffffff8118d4d5>] deactivate_locked_super+0x45/0x60
[16771.738169]        [<ffffffff8118e075>] deactivate_super+0x45/0x60
[16771.738186]        [<ffffffff811a8dc6>] mntput_no_expire+0x86/0xe0
[16771.738205]        [<ffffffff811a98aa>] sys_umount+0x6a/0x360
[16771.738221]        [<ffffffff8103f07b>] system_call_fastpath+0x16/0x1b
[16771.738241] -> #1 (&type->s_lock_key){+.+...}:
[16771.738268]        [<ffffffff810d3018>] lock_acquire+0x98/0x1e0
[16771.738285]        [<ffffffff8184976b>] mutex_lock_nested+0x4b/0x380
[16771.738303]        [<ffffffff8118cee2>] lock_super+0x22/0x30
[16771.738319]        [<ffffffff8125c7b8>] ext4_remount+0x48/0x4a0
[16771.738338]        [<ffffffff8118e431>] do_remount_sb+0x91/0x130
[16771.738496]        [<ffffffff811aa5b2>] do_mount+0x4c2/0x7c0
[16771.738513]        [<ffffffff811aabbb>] sys_mount+0x8b/0xe0
[16771.738529]        [<ffffffff8103f07b>] system_call_fastpath+0x16/0x1b
[16771.738548] -> #0 (&type->s_umount_key#23){++++..}:
[16771.738580]        [<ffffffff810d2588>] __lock_acquire+0x1a78/0x1e10
[16771.738598]        [<ffffffff810d3018>] lock_acquire+0x98/0x1e0
[16771.738615]        [<ffffffff81849f2f>] down_read+0x2f/0x50
[16771.738631]        [<ffffffff8118e1de>] iterate_supers+0x5e/0xd0
[16771.738648]        [<ffffffff811b61bb>] sync_filesystems+0x1b/0x20
[16771.738666]        [<ffffffff811b61d3>] do_sync_work+0x13/0x40
[16771.738682]        [<ffffffff810b2987>] process_one_work+0x187/0x670
[16771.738699]        [<ffffffff810b4b21>] worker_thread+0x161/0x340
[16771.738716]        [<ffffffff810b9b36>] kthread+0x96/0xa0
[16771.738732]        [<ffffffff8103fed4>] kernel_thread_helper+0x4/0x10
[16771.738751] other info that might help us debug this:
[16771.739069] 2 locks held by kworker/0:0/25366:
[16771.739370]  #0:  (events){+.+.+.}, at: [<ffffffff810b2926>] process_one_work+0x126/0x670
[16771.739736]  #1:  ((work)#2){+.+...}, at: [<ffffffff810b2926>] process_one_work+0x126/0x670
[16771.740166] stack backtrace:
[16771.741001] Pid: 25366, comm: kworker/0:0 Tainted: G W 2.6.37 #1
[16771.741426] Call Trace:
[16771.741845]  [<ffffffff810ce167>] print_circular_bug+0xe7/0xf0
[16771.742258]  [<ffffffff810d2588>] __lock_acquire+0x1a78/0x1e10
[16771.742670]  [<ffffffff810d3018>] lock_acquire+0x98/0x1e0
[16771.743073]  [<ffffffff8118e1de>] ? iterate_supers+0x5e/0xd0
[16771.743476]  [<ffffffff811b6290>] ? sync_one_sb+0x0/0x20
[16771.743869]  [<ffffffff81849f2f>] down_read+0x2f/0x50
[16771.744262]  [<ffffffff8118e1de>] ? iterate_supers+0x5e/0xd0
[16771.744656]  [<ffffffff8184b6b0>] ? _raw_spin_unlock+0x30/0x60
[16771.745055]  [<ffffffff8118e1de>] iterate_supers+0x5e/0xd0
[16771.745452]  [<ffffffff811b61bb>] sync_filesystems+0x1b/0x20
[16771.745844]  [<ffffffff811b61d3>] do_sync_work+0x13/0x40
[16771.746239]  [<ffffffff810b2987>] process_one_work+0x187/0x670
[16771.746632]  [<ffffffff810b2926>] ? process_one_work+0x126/0x670
[16771.747026]  [<ffffffff811b61c0>] ? do_sync_work+0x0/0x40
[16771.747421]  [<ffffffff810b4b21>] worker_thread+0x161/0x340
[16771.747813]  [<ffffffff810b49c0>] ? worker_thread+0x0/0x340
[16771.748205]  [<ffffffff810b9b36>] kthread+0x96/0xa0
[16771.748594]  [<ffffffff8103fed4>] kernel_thread_helper+0x4/0x10
[16771.748985]  [<ffffffff810804ff>] ? finish_task_switch+0x6f/0x100
[16771.749375]  [<ffffffff8184be80>] ? restore_args+0x0/0x30
[16771.749762]  [<ffffffff810b9aa0>] ? kthread+0x0/0xa0
[16771.750153]  [<ffffffff8103fed0>] ? kernel_thread_helper+0x0/0x10

What do you want me to do? I could try a bisect, but as that can take some time and involves poking around between breakages, I would really like to look for other, possibly faster, ways of hunting this bug if what is given here is not enough.

I've reported this upstream.

Xake, are you still encountering this? I haven't done any more testing.

(In reply to comment #10)
> Xake, are you still encountering this? I haven't done any more testing.

I have encountered it with one of the 2.6.38-rc's; I cannot remember if it was rc5 or rc6, as I had to do some reorganization among my PVs. I have not had time to look into this for a while.

Xake, on the gentoo-user list this issue was commented on; this bit may be interesting for you:
> Looks like an issue with heavy I/O, affecting the LVM layer trying to lock the filesystem.
> But I wonder if he's not running into a known issue (which can easily be worked around) where pvmove has a memory leak in its reporting (e.g. the bit that checks the progress every 5 seconds; reducing that to every 5 minutes significantly reduces it). However, I do believe this (mem-leak) was fixed.
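The 5-second progress check mentioned in the comment above corresponds to pvmove's -i/--interval option (seconds between progress polls); a sketch of the suggested workaround, with /dev/sdb2 as a placeholder for the PV being moved:

```sh
# Poll and report pvmove progress every 5 minutes instead of every few seconds.
pvmove -i 300 /dev/sdb2

# An interrupted move can be resumed or aborted the same way:
pvmove -i 300        # resume any unfinished pvmove
pvmove --abort       # abort the in-progress move
```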
(In reply to comment #12)
> Xake, on the gentoo-user list this issue was commented on; this bit may be interesting for you:
> > Looks like an issue with heavy I/O, affecting the LVM layer trying to lock the filesystem.
> > But I wonder if he's not running into a known issue (which can easily be worked around) where pvmove has a memory leak in its reporting. However, I do believe this (mem-leak) was fixed.

Links please.

(In reply to comment #12)
> > But I wonder if he's not running into a known issue (which can easily be worked around) where pvmove has a memory leak in its reporting (e.g. the bit that checks the progress every 5 seconds; reducing that to every 5 minutes significantly reduces it). However, I do believe this (mem-leak) was fixed.

I am pretty sure this is not it, since for me it does not matter how long pvmove has been running. Issuing pvmove can hang before the first report is done, and I have some memory of sync hanging when I did not even have pvmove running (just the move itself running in the background).

This version is no longer in the tree, so if the bug is reproducible keep this open, otherwise please close it. Please reopen with the correct package version if any of you reproduce this bug in the future.