Summary: | sys-kernel/gentoo-sources-3.2.1-r2 rcu_sched detected stall on CPU 2 | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Kevin Lyles <kevinlyles> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | anton.kochkov, mlspamcb |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=412551 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | emerge --info gentoo-sources; My kernel config | ||
Description
Kevin Lyles
2012-02-08 02:03:52 UTC
Created attachment 301201 [details]
emerge --info gentoo-sources
I'm not entirely sure this is the right place to submit this, but it seems that kernel.org only handles bug reports for the vanilla kernel. Please let me know if there's somewhere more appropriate and I'll head there instead. I'll attach the kernel config shortly. I'm happy to provide additional information, but I'm not sure what would be useful.

Created attachment 301203 [details]
My kernel config
Please test with vanilla sources 3.2.9 and then bring to http://bugzilla.kernel.org and post the url here.

(In reply to comment #4)
> Please test with vanilla sources 3.2.9 and then bring to
> http://bugzilla.kernel.org and post the url here.

I've not been able to reproduce this on the vanilla kernel. I'm not sure if it's just rare or if it's specific to the gentoo patchset. I'll try gentoo-sources again and see what happens.

I got another stall today (on gentoo-sources-3.2.12), after about 13.5 hours uptime. I had the same lack of symptoms as last time, and I was not doing anything to stress the system. The log is as follows:

Apr 17 20:58:43 localhost kernel: Call Trace:
Apr 17 20:58:43 localhost kernel: <IRQ> [<ffffffff810d54a2>] check_cpu_stall.clone.41+0x92/0xf0
Apr 17 20:58:43 localhost kernel: [<ffffffff810a45e0>] ? tick_nohz_handler+0xe0/0xe0
Apr 17 20:58:43 localhost kernel: [<ffffffff810d5923>] __rcu_pending+0x33/0x1c0
Apr 17 20:58:43 localhost kernel: [<ffffffff8113edfa>] ? __d_free+0x4a/0x70
Apr 17 20:58:43 localhost kernel: [<ffffffff810d5e33>] rcu_check_callbacks+0x103/0x1a0
Apr 17 20:58:43 localhost kernel: [<ffffffff81085053>] update_process_times+0x43/0x80
Apr 17 20:58:43 localhost kernel: [<ffffffff810a463f>] tick_sched_timer+0x5f/0xb0
Apr 17 20:58:43 localhost kernel: [<ffffffff810985f5>] __run_hrtimer.clone.33+0x55/0x110
Apr 17 20:58:43 localhost kernel: [<ffffffff81098e3f>] hrtimer_interrupt+0xdf/0x210
Apr 17 20:58:43 localhost kernel: [<ffffffff8104f474>] smp_apic_timer_interrupt+0x64/0xa0
Apr 17 20:58:43 localhost kernel: [<ffffffff815d710b>] apic_timer_interrupt+0x6b/0x70
Apr 17 20:58:43 localhost kernel: <EOI> [<ffffffff81181936>] ? pid_revalidate+0x76/0xf0
Apr 17 20:58:43 localhost kernel: [<ffffffff8118192c>] ? pid_revalidate+0x6c/0xf0
Apr 17 20:58:43 localhost kernel: [<ffffffff81136be6>] do_lookup+0x236/0x3b0
Apr 17 20:58:43 localhost kernel: [<ffffffff81136e9f>] link_path_walk+0x13f/0x870
Apr 17 20:58:43 localhost kernel: [<ffffffff811394d6>] path_openat+0xb6/0x3f0
Apr 17 20:58:43 localhost kernel: [<ffffffff81139924>] do_filp_open+0x44/0xa0
Apr 17 20:58:43 localhost kernel: [<ffffffff8130f49d>] ? strncpy_from_user+0x2d/0x40
Apr 17 20:58:43 localhost kernel: [<ffffffff81145ca4>] ? alloc_fd+0xf4/0x150
Apr 17 20:58:43 localhost kernel: [<ffffffff8112987c>] do_sys_open+0xfc/0x1d0
Apr 17 20:58:43 localhost kernel: [<ffffffff811281a6>] ? filp_close+0x56/0x80
Apr 17 20:58:43 localhost kernel: [<ffffffff8112996b>] sys_open+0x1b/0x20
Apr 17 20:58:43 localhost kernel: [<ffffffff815d667b>] system_call_fastpath+0x16/0x1b

(In reply to comment #4)
> Please test with vanilla sources 3.2.9 and then bring to
> http://bugzilla.kernel.org and post the url here.

It finally happened on vanilla-3.2.12 today. I added my output to the existing bug at https://bugzilla.kernel.org/show_bug.cgi?id=43028

I get the problem too.
It is related to IPv6 on, e.g., wireless.
Within 30 minutes my system completely hangs when using IPv6 over a wifi connection.

Here are some relevant kernel patches:
http://patchwork.ozlabs.org/patch/149020/

It is present in kernels 3.1.0 and up, and seems to be solved in kernel 3.3.1:
https://bugzilla.kernel.org/show_bug.cgi?id=42780

Some testing has been done in Arch Linux:
https://bugs.archlinux.org/task/26847

Please add this to applicable kernels.

Kernel 3.2.12 is my current kernel, and since 3.2.x it has got progressively worse. (I saw the messages in 3.1.x kernels, but they had no ill side effects.)

(In reply to comment #8)
> I get the problem too.
> It is related to IPv6 on, e.g., wireless.
> Within 30 minutes my system completely hangs when using IPv6 over a wifi
> connection.
>
> Here are some relevant kernel patches:
> http://patchwork.ozlabs.org/patch/149020/
>
> It is present in kernels 3.1.0 and up, and seems to be solved in kernel 3.3.1:
> https://bugzilla.kernel.org/show_bug.cgi?id=42780
>
> Some testing has been done in Arch Linux:
> https://bugs.archlinux.org/task/26847
>
> Please add this to applicable kernels.

I think we have different bugs, or at least very different symptoms. My stalls have no ill effects (that I can see) even on 3.2.12, and I'm not using wireless. I have IPv6 enabled via tunneling, but it only gets used for a few sites. The biggest reason I think they're different is that my stalls are all for 0 jiffies and seem to be one-time events, while yours are repeated and progressively longer. The kernel.org bugs we each linked to follow the same pattern. Should I create a new bug?

(In reply to comment #11)
> Should I create a new bug?

I would, yes. If it turns out later it is the same issue, they'll just mark one as a duplicate of the other.

Done: Bug 413727

Is this still an issue with 3.5.2 or greater (as available)?

(In reply to comment #14)
> Is this still an issue with 3.5.2 or greater (as available)?

It's hard to tell. I've been up on 3.5.3 for about 4 days without it happening, but it's been that long with some of the versions it did happen with, too. I'll post again as soon as I see one, or by Friday next week, to say one way or the other.

I've yet to see it after 12.5 days of uptime, so it's at least a lot harder to trigger now. I'm comfortable with closing this if you are.

Thanks, will do.

*** Bug 412551 has been marked as a duplicate of this bug. ***
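
For anyone comparing call traces like the one posted in this report across kernel versions, the frames can be pulled out of the syslog text mechanically. The following is only a rough Python sketch, assuming syslog-formatted lines in the same shape as the trace quoted here; `FRAME_RE` and `parse_frames` are names made up for this illustration and are not part of any kernel or Gentoo tool.

```python
import re

# One stack frame looks like "[<ffffffff810d5923>] __rcu_pending+0x33/0x1c0".
# A "? " before the symbol marks an address that was found on the stack but
# that the kernel could not confirm as part of the actual call chain.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return a list of (symbol, offset, reliable) tuples found in log_text."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frames.append(
            (
                match.group("symbol"),
                int(match.group("offset"), 16),
                match.group("unreliable") is None,
            )
        )
    return frames


if __name__ == "__main__":
    # Two lines copied from the trace in this report.
    sample = (
        "Apr 17 20:58:43 localhost kernel: [<ffffffff810d5923>] __rcu_pending+0x33/0x1c0\n"
        "Apr 17 20:58:43 localhost kernel: [<ffffffff8113edfa>] ? __d_free+0x4a/0x70\n"
    )
    for symbol, offset, reliable in parse_frames(sample):
        marker = "" if reliable else " (unreliable)"
        print(f"{symbol}+{offset:#x}{marker}")
```

Dropping the "unreliable" frames before comparing two traces tends to make differences between the gentoo-sources and vanilla reports easier to spot, though that filtering is a judgment call rather than anything the kernel documentation prescribes.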