md devices (raid0 and raid5) hang quite regularly, and all processes accessing the corresponding mount points are stuck in the D state until reboot. This has happened on i386/raid0 and amd64/raid5 on different machines. On both machines, 2.6.20-r* worked without any problems. Is this a known issue? Is there a patch for it? A similar problem was reported against 2.6.23.1: http://article.gmane.org/gmane.linux.raid/17131
Created attachment 135286 [details] sysreq-T on raid5 hang
The patch referred to on the mailing list was committed tonight (11/5) and is not yet in a git snapshot. I fixed the patch to apply to gentoo-sources-2.6.23-r1. Could you apply this patch, recompile and install your kernel, and let me know if it fixes your issue. Apply this patch by:
1. cd to /usr/src/linux or wherever your Linux sources reside
2. run: patch -p1 < fix-misapplied-biofill-op.patch
3. rebuild and install your kernel as normal
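For anyone unfamiliar with `patch -p1`, the steps above can be sketched on a tiny throwaway tree. All paths under /tmp/patch-demo below are hypothetical demo files; in the real case you would cd into /usr/src/linux and feed it fix-misapplied-biofill-op.patch instead.

```shell
# Demo of the patch workflow on a throwaway tree (hypothetical paths).
set -e
rm -rf /tmp/patch-demo
mkdir -p /tmp/patch-demo/a /tmp/patch-demo/b
printf 'old line\n' > /tmp/patch-demo/a/file.c
printf 'new line\n' > /tmp/patch-demo/b/file.c
cd /tmp/patch-demo
# diff exits with status 1 when the files differ, so tolerate that
diff -u a/file.c b/file.c > fix.patch || true
mkdir -p tree
cp a/file.c tree/file.c
cd tree
patch -p1 < ../fix.patch     # -p1 strips the leading "a/" / "b/" component
grep -q 'new line' file.c && echo 'patch applied'
```

The `-p1` level matters: the hunk headers name `a/file.c` and `b/file.c`, and stripping one path component makes them resolve relative to the tree you are standing in, which is why step 1 above says to cd into the source directory first.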
Created attachment 135287 [details, diff] fix-misapplied-biofill-op.patch
This patch is for 2.6.23.x. I would rather solve the issue with 2.6.22.x, and I do not see how this patch could be easily backported. I also mentioned raid0, but that was a mistake: there is no problem with raid0. Both machines use raid5. I will test with 2.6.23.x and report.
I have tested 2.6.23-gentoo-r1 with the biofill patch. It does not solve the problem.
Ok, let's start by determining if the latest patching has addressed the problem. I have just committed git-sources-2.6.24_rc1-r15 to the tree. Once it hits the mirrors, can you please test with that kernel? This snapshot has the latest patches for raid5.
I have tested git-sources-2.6.24_rc1-r15. This one seems to work, at least after writing 200k files, while 2.6.22 and 2.6.23 stalled after a few thousand files. The md device is also being reconstructed to add a bit more stress. I will continue to run the checks to be sure...
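The reproduction workload here (writing very many small files to the md-backed filesystem) can be sketched roughly as follows. The target directory and count are placeholders: the real test pointed at the raid5 mount and wrote hundreds of thousands of files.

```shell
# Rough sketch of the many-small-files stress workload (assumed paths).
# For a real run, point TARGET at the raid5 mount, e.g. /mnt/md0, and
# raise COUNT; /tmp/md-stress and 100 are just for demonstration.
TARGET=${TARGET:-/tmp/md-stress}
COUNT=${COUNT:-100}
mkdir -p "$TARGET"
i=1
while [ "$i" -le "$COUNT" ]; do
    # 16 KiB per file keeps many raid5 stripes in flight per megabyte
    dd if=/dev/zero of="$TARGET/f$i" bs=4k count=4 2>/dev/null
    i=$((i + 1))
done
sync    # force writeback so the raid5 code actually handles the stripes
echo "wrote $COUNT files into $TARGET"
```

The `sync` at the end matters for reproducing hangs like this one: without it, much of the data can sit in the page cache and never exercise the stripe-handling paths where the processes get stuck.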
It does not work so well. Now the behavior is different. After a couple of hours, the md0_raid5 thread is at 100% cpu with plenty of messages (traces are different):

BUG: soft lockup - CPU#0 stuck for 11s! [md0_raid5:4270]
Pid: 4270, comm: md0_raid5 Not tainted (2.6.24-rc1-git15 #1)
EIP: 0060:[<f88b212e>] EFLAGS: 00000202 CPU: 0
EIP is at xor_sse_5+0x12e/0x3a8 [xor]
EAX: 0000000e EBX: c4f05200 ECX: c4f02200 EDX: c4f07200
ESI: c4f04200 EDI: c4f03200 EBP: c498fcb0 ESP: c498fcac
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 80050033 CR2: b7ef7000 CR3: 0056c000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<f88b2a59>] xor_blocks+0x7d/0x85 [xor]
 [<f897e125>] async_xor+0x125/0x1a2 [async_xor]
 [<f897e1f4>] async_xor_zero_sum+0x52/0xba [async_xor]
 [<f9983d75>] ops_run_check+0x92/0xc7 [raid456]
 [<c013dc02>] lock_release_holdtime+0x25/0x43
 [<f9984c29>] handle_stripe5+0xe7f/0x10df [raid456]
 [<f9983e2c>] handle_stripe5+0x82/0x10df [raid456]
 [<c013dc02>] lock_release_holdtime+0x25/0x43
 [<c013e58a>] __lock_acquire+0x3a6/0x609
 [<f998648d>] handle_stripe+0xc08/0xc36 [raid456]
 [<c013e58a>] __lock_acquire+0x3a6/0x609
 [<c013dc02>] lock_release_holdtime+0x25/0x43
 [<c013dc02>] lock_release_holdtime+0x25/0x43
 [<f99867df>] raid5d+0x324/0x346 [raid456]
 [<f99867ec>] raid5d+0x331/0x346 [raid456]
 [<c031aa53>] md_thread+0xb4/0xd6
 [<c03a0c4b>] _spin_lock_irqsave+0x54/0x5d
 [<c031aa5e>] md_thread+0xbf/0xd6
 [<c0136aba>] autoremove_wake_function+0x0/0x33
 [<c031a99f>] md_thread+0x0/0xd6
 [<c0136a05>] kthread+0x38/0x5f
 [<c01369cd>] kthread+0x0/0x5f
 [<c0104ab3>] kernel_thread_helper+0x7/0x10
=======================
Please attach your .config and the output of emerge --info.
Created attachment 135373 [details] emerge --info
Created attachment 135375 [details] .config
Can you perform the following from your kernel source directory, using the same gcc version and kernel as in the trace, and post the results here:

make CONFIG_DEBUG_INFO=y crypto/xor.o
gdb crypto/xor.o
(gdb) list *xor_sse_5+0x12e
(gdb) list *xor_sse_5+0x12e
0x12e is in xor_sse_5 (include/asm/xor_32.h:783).
778        because we modify p4 and p5 there, but we can't mark them
779        as read/write, otherwise we'd overflow the 10-asm-operands
780        limit of GCC < 3.1. */
781     __asm__ ("" : "+r" (p4), "+r" (p5));
782
783     __asm__ __volatile__ (
784  #undef BLOCK
785  #define BLOCK(i) \
786        PF1(i) \
787        PF1(i+2) \
I have tried once more to write with the git kernel. This time the processes hung like with 2.6.22, for example:

Nov 7 00:00:46 f9pc18 pdflush       D c2c22a98     0   183      2
Nov 7 00:00:46 f9pc18 00155589 00000086 00000145 c2c22a98 00000002 c3fdac78 c0520ed8 c056bb00
Nov 7 00:00:46 f9pc18 c056bb00 c30d0f40 c30d1084 c2c6db00 00000000 f89612bd 00000246 f89612b3
Nov 7 00:00:46 f9pc18 000000ff 00000000 00000000 00000145 c3b9050c c3b90400 c3b904ac c3b904b4
Nov 7 00:00:46 f9pc18 Call Trace:
Nov 7 00:00:46 f9pc18 [<f89612bd>] unplug_slaves+0xe0/0xfb [raid456]
Nov 7 00:00:46 f9pc18 [<f89612b3>] unplug_slaves+0xd6/0xfb [raid456]
Nov 7 00:00:46 f9pc18 [<f896205a>] get_active_stripe+0x1e6/0x432 [raid456]
Nov 7 00:00:46 f9pc18 [<c013dc02>] lock_release_holdtime+0x25/0x43
Nov 7 00:00:46 f9pc18 [<c03a0c4b>] _spin_lock_irqsave+0x54/0x5d

All the other D-state processes hang in the same place (unplug_slaves). If it helps:

(gdb) list *unplug_slaves+0xe0
0x2bd is in unplug_slaves (drivers/md/raid5.c:3197).
3192                    rdev_dec_pending(rdev, mddev);
3193                    rcu_read_lock();
3194            }
3195        }
3196        rcu_read_unlock();
3197  }
3198
3199  static void raid5_unplug_device(struct request_queue *q)
3200  {
3201        mddev_t *mddev = q->queuedata;
Created attachment 135535 [details, diff] clearing of biofill operations patch Someone on the mailing list reported this as fixing the issue. Can you apply it to a clean gentoo-sources-2.6.23-r1 and post the results.
So far, so good. Within 5h of writing there are no problems. I will fill ~600GB (1M files) in a day or so and report if something goes wrong. If this patch is OK, would it be possible to backport it to 2.6.22? That kernel is still widely used.
Well, I am always too quick. After 6h, 120k files, 170GB written, md0_raid5 is at 100% cpu and all the other md-accessing processes are in the D state:

Nov 9 14:00:30 f9pc18 =======================
Nov 9 14:00:30 f9pc18 md0_raid5     R running     0  4126      2
Nov 9 14:00:30 f9pc18 xfsbufd       S c2e48270    0  4760      2
Nov 9 14:00:30 f9pc18 e333bf8c 00000086 00000046 c2e48270 00000001 c059cd10 c050eddc c0559e80
Nov 9 14:00:30 f9pc18 c0559e80 c2e48270 c2e483b0 c2c6de80 00000000 00000046 c059cd00 00000296
Nov 9 14:00:30 f9pc18 c059cd00 f740b360 00000296 e333bf9c c059cd00 c012c76e 00000000 00000296
Nov 9 14:00:30 f9pc18 Call Trace:
Nov 9 14:00:30 f9pc18 [<c012c76e>] __mod_timer+0x92/0x9c
Nov 9 14:00:30 f9pc18 [<c039734f>] schedule_timeout+0x70/0x8d
Nov 9 14:00:30 f9pc18 [<c0211e04>] xfs_buf_delwri_split+0xc5/0xcf
Nov 9 14:00:30 f9pc18 [<c012c587>] process_timeout+0x0/0x5
Nov 9 14:00:30 f9pc18 [<c0211fac>] xfsbufd+0x58/0xec
Nov 9 14:00:30 f9pc18 [<c0211f54>] xfsbufd+0x0/0xec
Nov 9 14:00:30 f9pc18 [<c0134f31>] kthread+0x38/0x5f
Nov 9 14:00:30 f9pc18 [<c0134ef9>] kthread+0x0/0x5f
Nov 9 14:00:30 f9pc18 [<c0104a5f>] kernel_thread_helper+0x7/0x10
Nov 9 14:00:30 f9pc18 =======================
One more annoying thing. After the reset, xfs_repair on the md device oopsed right at the beginning, in something like get_next_stripe (I did not catch the log). The md device can still be mounted and used, but I would say it is not really safe. There might be some corruption bug in the raid5 or xfs code...
The thread at http://marc.info/?l=linux-raid&m=119502458615538&w=2 indicates two upcoming patches to fix a problem which appears to be similar to yours. They indicate that the problem does not occur in 2.6.22. Not sure if you tested that kernel.
maybe related: http://bugzilla.kernel.org/show_bug.cgi?id=9419
I found some time today to test gentoo-sources-2.6.23-r3. raid5 still hangs...
Created attachment 137635 [details] dmesg with sysreq-t
The latest vanilla kernel rc release contains two patches which might be related to your problem. One is a biofill patch and the other fixes an unending write sequence. Could you please test with vanilla-sources-2.6.24_rc5 and post the results.
Please reopen when you've had a chance to test with the latest development kernel as requested in comment #23.
I did have a chance today to check vanilla-sources-2.6.24_rc8. The problem is still there, but it occurs much later (after 350k files instead of 100k).
Created attachment 141735 [details] dmesg with sysreq-t
I have found the fix from Neil Brown at http://thread.gmane.org/gmane.linux.raid/17738 so I will try that and report the results.
Neil's patches work. No troubles for 1TB, 1M files. I guess we have to wait for 2.6.24.1.
It looks like these patches have made the mainline tree. They should be in git-sources-2.6.24-r16, which does not exist yet, but as soon as I see git snapshots I will commit the ebuild. So when you have a moment, can you test git-sources-2.6.24-r16 when it's available and post the results.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1ec4a9398dc05061b6258061676fede733458893
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c5d79adba7ced41d7ac097c2ab74759d10522dd5
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=29ac4aa3fc68a86279aca50f20df4d614de2e204
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6ed3003c19a96fe18edf8179c4be6fe14abbebbc
I tried (remotely) to boot git-sources-2.6.24-r16, but it panicked. I will not be able to do the tests for 2 weeks due to absence.
I did some tests with various kernels. gentoo-sources-2.6.24-r3 still does not work properly. git-sources-2.6.25_rc3-r4 works OK, and as far as I have seen, the md code is the same as in vanilla-sources-2.6.25_rc3. So it seems 2.6.25 will be OK, although it would be nice if the md patches could be backported to 2.6.24 or maybe even 2.6.23.
Can you test with the latest gentoo-sources, which is 2.6.25-r1 as of this writing.
Please reopen if there is still a problem with the latest 2.6.25 kernel.
Sorry, I forgot to report. 2.6.25 works fine. The heavily loaded server running 2.6.25-gentoo-r1 has been up for 9 days without a single problem.