Bug 402625 - sys-kernel/gentoo-sources-3.2.1-r2 rcu_sched detected stall on CPU 2
Summary: sys-kernel/gentoo-sources-3.2.1-r2 rcu_sched detected stall on CPU 2
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Duplicates: 412551
 
Reported: 2012-02-08 02:03 UTC by Kevin Lyles
Modified: 2013-04-13 22:39 UTC
CC List: 2 users



Attachments
emerge --info gentoo-sources (emerge info,5.49 KB, text/plain)
2012-02-08 02:05 UTC, Kevin Lyles
My kernel config (kernel config,52.97 KB, text/plain)
2012-02-08 02:11 UTC, Kevin Lyles

Description Kevin Lyles 2012-02-08 02:03:52 UTC
I got a kernel message about rcu_sched detecting a stall.  There were no apparent ill effects from said stall, but I figured it should be reported since the kernel complained about it.  The system's been up on this kernel for about 3 days 9 hours, and I believe this is the first time it's happened.

Reproducible: Didn't try

Steps to Reproduce:
Run the 3.2.1-gentoo-r2 kernel long enough

Actual Results:  
Kernel error message

Expected Results:  
No error

The full kernel message:
INFO: rcu_sched detected stall on CPU 2 (t=0 jiffies)
Pid: 0, comm: swapper/2 Tainted: P           O 3.2.1-gentoo-r2 #1
Call Trace:
 <IRQ>  [<ffffffff810d50d2>] check_cpu_stall.clone.41+0x92/0xf0
 [<ffffffff810d5553>] __rcu_pending+0x33/0x1c0
 [<ffffffff810d59fd>] rcu_check_callbacks+0x9d/0x1a0
 [<ffffffff81084dd3>] update_process_times+0x43/0x80
 [<ffffffff810a43bf>] tick_sched_timer+0x5f/0xb0
 [<ffffffff81098375>] __run_hrtimer.clone.33+0x55/0x110
 [<ffffffff81098bbf>] hrtimer_interrupt+0xdf/0x210
 [<ffffffff8104f2a4>] smp_apic_timer_interrupt+0x64/0xa0
 [<ffffffff815d4a4b>] apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffff8137f036>] ? acpi_safe_halt+0x22/0x35
 [<ffffffff8137f030>] ? acpi_safe_halt+0x1c/0x35
 [<ffffffff8137f064>] acpi_idle_do_entry+0x1b/0x2b
 [<ffffffff8137f420>] acpi_idle_enter_c1+0x59/0xb3
 [<ffffffff8145bb32>] cpuidle_idle_call+0x92/0xe0
 [<ffffffff8109a2c1>] ? atomic_notifier_call_chain+0x11/0x20
 [<ffffffff810322b5>] cpu_idle+0xc5/0x110
 [<ffffffff815ccfb8>] start_secondary+0xe7/0xee
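
For anyone collecting these messages from their own logs, the header line of the message above carries three fields worth tracking: the RCU flavor (`rcu_sched`), the CPU number, and the stall length in jiffies. What follows is only a minimal Python sketch, assuming the message is worded exactly as pasted in this report; other kernel versions may format it differently. The names `STALL_RE` and `parse_stall_line` are made up for illustration.

```python
import re

# Pattern for the stall header line as it appears in this report.
# Assumption: the wording is taken from this one 3.2.x message only.
STALL_RE = re.compile(
    r"INFO: (?P<flavor>rcu_\w+) detected stall on CPU (?P<cpu>\d+) "
    r"\(t=(?P<jiffies>\d+) jiffies\)"
)


def parse_stall_line(line):
    """Return (flavor, cpu, jiffies) for a stall header line, or None."""
    match = STALL_RE.search(line)
    if match is None:
        return None
    return (
        match.group("flavor"),
        int(match.group("cpu")),
        int(match.group("jiffies")),
    )


print(parse_stall_line("INFO: rcu_sched detected stall on CPU 2 (t=0 jiffies)"))
# prints ('rcu_sched', 2, 0)
```

Using `search` rather than `match` means the same function also works on lines that carry a syslog timestamp prefix.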
Comment 1 Kevin Lyles 2012-02-08 02:05:40 UTC
Created attachment 301201
emerge --info gentoo-sources
Comment 2 Kevin Lyles 2012-02-08 02:08:49 UTC
I'm not entirely sure this is the right place to submit this, but it seems that kernel.org only handles bug reports for the vanilla kernel.  Please let me know if there's somewhere more appropriate and I'll head there instead.

I'll attach the kernel config shortly.  I'm happy to provide additional information, but I'm not sure what would be useful.
Comment 3 Kevin Lyles 2012-02-08 02:11:00 UTC
Created attachment 301203
My kernel config
Comment 4 Mike Pagano gentoo-dev 2012-03-04 21:35:39 UTC
Please test with vanilla sources 3.2.9 and then bring to http://bugzilla.kernel.org and post the url here.
Comment 5 Kevin Lyles 2012-04-09 01:57:23 UTC
(In reply to comment #4)
> Please test with vanilla sources 3.2.9 and then bring to
> http://bugzilla.kernel.org and post the url here.

I've not been able to reproduce this on the vanilla kernel.  I'm not sure if it's just rare or if it's specific to the gentoo patchset.

I'll try gentoo-sources again and see what happens.
Comment 6 Kevin Lyles 2012-04-18 02:09:37 UTC
I got another stall today (on gentoo-sources-3.2.12), after about 13.5 hours uptime.  I had the same lack of symptoms as last time, and I was not doing anything to stress the system.  The log is as follows:

Apr 17 20:58:43 localhost kernel: Call Trace:
Apr 17 20:58:43 localhost kernel: <IRQ>  [<ffffffff810d54a2>] check_cpu_stall.clone.41+0x92/0xf0
Apr 17 20:58:43 localhost kernel: [<ffffffff810a45e0>] ? tick_nohz_handler+0xe0/0xe0
Apr 17 20:58:43 localhost kernel: [<ffffffff810d5923>] __rcu_pending+0x33/0x1c0
Apr 17 20:58:43 localhost kernel: [<ffffffff8113edfa>] ? __d_free+0x4a/0x70
Apr 17 20:58:43 localhost kernel: [<ffffffff810d5e33>] rcu_check_callbacks+0x103/0x1a0
Apr 17 20:58:43 localhost kernel: [<ffffffff81085053>] update_process_times+0x43/0x80
Apr 17 20:58:43 localhost kernel: [<ffffffff810a463f>] tick_sched_timer+0x5f/0xb0
Apr 17 20:58:43 localhost kernel: [<ffffffff810985f5>] __run_hrtimer.clone.33+0x55/0x110
Apr 17 20:58:43 localhost kernel: [<ffffffff81098e3f>] hrtimer_interrupt+0xdf/0x210
Apr 17 20:58:43 localhost kernel: [<ffffffff8104f474>] smp_apic_timer_interrupt+0x64/0xa0
Apr 17 20:58:43 localhost kernel: [<ffffffff815d710b>] apic_timer_interrupt+0x6b/0x70
Apr 17 20:58:43 localhost kernel: <EOI>  [<ffffffff81181936>] ? pid_revalidate+0x76/0xf0
Apr 17 20:58:43 localhost kernel: [<ffffffff8118192c>] ? pid_revalidate+0x6c/0xf0
Apr 17 20:58:43 localhost kernel: [<ffffffff81136be6>] do_lookup+0x236/0x3b0
Apr 17 20:58:43 localhost kernel: [<ffffffff81136e9f>] link_path_walk+0x13f/0x870
Apr 17 20:58:43 localhost kernel: [<ffffffff811394d6>] path_openat+0xb6/0x3f0
Apr 17 20:58:43 localhost kernel: [<ffffffff81139924>] do_filp_open+0x44/0xa0
Apr 17 20:58:43 localhost kernel: [<ffffffff8130f49d>] ? strncpy_from_user+0x2d/0x40
Apr 17 20:58:43 localhost kernel: [<ffffffff81145ca4>] ? alloc_fd+0xf4/0x150
Apr 17 20:58:43 localhost kernel: [<ffffffff8112987c>] do_sys_open+0xfc/0x1d0
Apr 17 20:58:43 localhost kernel: [<ffffffff811281a6>] ? filp_close+0x56/0x80
Apr 17 20:58:43 localhost kernel: [<ffffffff8112996b>] sys_open+0x1b/0x20
Apr 17 20:58:43 localhost kernel: [<ffffffff815d667b>] system_call_fastpath+0x16/0x1b
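
To compare a trace like the one above with the trace in the original description, it helps to strip the syslog prefix and split the frames at the `<EOI>` marker: the frames before it belong to the timer interrupt that ran the stall check, and the frames after it show the code that was interrupted. The following is a rough Python sketch, assuming the frame format seen in these two logs. `FRAME_RE` and `split_at_eoi` are hypothetical names, and frames marked `?` are kept rather than filtered out.

```python
import re

# One stack frame, e.g. "[<ffffffff810d5923>] __rcu_pending+0x33/0x1c0",
# optionally with a "? " marker before the symbol name.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def split_at_eoi(log_lines):
    """Return (interrupt frames, interrupted-code frames) as symbol lists."""
    irq_frames, interrupted_frames = [], []
    current = irq_frames
    for line in log_lines:
        # The line carrying <EOI> already belongs to the interrupted code.
        if "<EOI>" in line:
            current = interrupted_frames
        match = FRAME_RE.search(line)
        if match:
            current.append(match.group("symbol"))
    return irq_frames, interrupted_frames


sample = [
    "Apr 17 20:58:43 localhost kernel: [<ffffffff815d710b>] apic_timer_interrupt+0x6b/0x70",
    "Apr 17 20:58:43 localhost kernel: <EOI>  [<ffffffff81181936>] ? pid_revalidate+0x76/0xf0",
    "Apr 17 20:58:43 localhost kernel: [<ffffffff81136be6>] do_lookup+0x236/0x3b0",
]
irq, interrupted = split_at_eoi(sample)
print(irq)          # prints ['apic_timer_interrupt']
print(interrupted)  # prints ['pid_revalidate', 'do_lookup']
```

Run over both traces in this bug, the interrupt half is the same timer path each time, while the interrupted half differs: the first trace was in the ACPI idle loop (`acpi_safe_halt`), this one was in an `open()` path lookup (`pid_revalidate`).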
Comment 7 Kevin Lyles 2012-04-22 22:26:37 UTC
(In reply to comment #4)
> Please test with vanilla sources 3.2.9 and then bring to
> http://bugzilla.kernel.org and post the url here.

It finally happened on vanilla-3.2.12 today.  I added my output to the existing bug at https://bugzilla.kernel.org/show_bug.cgi?id=43028
Comment 8 Nico Baggus 2012-04-24 06:54:11 UTC
I get the problem too.
It is related to IPv6 on e.g. wireless.
Within 30 minutes my system completely hangs when using IPv6 over a wifi connection.

Here are some relevant Kernel patches:

http://patchwork.ozlabs.org/patch/149020/


It is present in kernels 3.1.0 and up; it seems to be solved in kernel 3.3.1
https://bugzilla.kernel.org/show_bug.cgi?id=42780

Some testing has been done in Arch linux:
https://bugs.archlinux.org/task/26847


Please add this to applicable kernels.
Comment 9 Nico Baggus 2012-04-24 06:56:49 UTC
Kernel 3.2.12 is my current kernel and since 3.2.x it has got progressively worse.
(I saw the messages in 3.1.x kernels, but it gave no ill side effects).
Comment 10 Kevin Lyles 2012-04-24 11:46:13 UTC
(In reply to comment #8)
> I get the problem too.
> It is related to IPv6 on e.g. wireless.
> Within 30 minutes my system completely hangs when using IPv6 over a wifi
> connection.
> 
> Here are some relevant Kernel patches:
> 
> http://patchwork.ozlabs.org/patch/149020/
> 
> 
> It is present in kernels 3.1.0 and up; it seems to be solved in kernel 3.3.1
> https://bugzilla.kernel.org/show_bug.cgi?id=42780
> 
> Some testing has been done in Arch linux:
> https://bugs.archlinux.org/task/26847
> 
> 
> Please add this to applicable kernels.

I think we have different bugs, or at least very different symptoms.  My stalls have no ill effects (that I can see) even on 3.2.12, and I'm not using wireless.  I have IPv6 enabled via tunneling, but it only gets used for a few sites.

The biggest reason I think they're different is that my stalls are all for 0 jiffies and seem to be one-time events, while yours are repeated and progressively longer.  The kernel.org bugs we each linked to follow the same pattern.
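
The distinction drawn here (one-time stalls of 0 jiffies versus repeated stalls that get progressively longer) can be written down as a small check over the `t=` values pulled from a log. This is just a sketch of that reasoning in Python, not anything the kernel does: `classify_stalls` and its labels are hypothetical, and the non-zero numbers in the usage lines are invented for illustration, not taken from either report.

```python
def classify_stalls(jiffies_seen):
    """Label a chronological list of reported stall lengths (in jiffies).

    Hypothetical heuristic: it only encodes the two patterns described
    in this comment and lumps everything else together as "other".
    """
    if not jiffies_seen:
        return "none"
    if all(t == 0 for t in jiffies_seen):
        return "zero-length"
    # Strictly increasing durations across at least two reports.
    growing = all(a < b for a, b in zip(jiffies_seen, jiffies_seen[1:]))
    if len(jiffies_seen) > 1 and growing:
        return "repeated-and-growing"
    return "other"


print(classify_stalls([0]))                   # prints zero-length
print(classify_stalls([6000, 24000, 42000]))  # prints repeated-and-growing
```

A single non-zero stall falls under "other" on purpose, since one data point says nothing about whether the stalls are getting longer.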
Comment 11 Nico Baggus 2012-04-24 15:27:04 UTC
Should I create a new bug?
Comment 12 Kevin Lyles 2012-05-02 23:03:20 UTC
(In reply to comment #11)
> Should I create a new bug?

I would, yes.  If it turns out later it is the same issue, they'll just mark one as a duplicate of the other.
Comment 13 Nico Baggus 2012-05-03 00:38:43 UTC
Done:
Bug 413727
Comment 14 Mike Pagano gentoo-dev 2012-08-23 14:06:28 UTC
Is this still an issue with 3.5.2 or greater (as available)?
Comment 15 Kevin Lyles 2012-09-13 23:02:31 UTC
(In reply to comment #14)
> Is this still an issue with 3.5.2 or greater (as available)?

It's hard to tell -- I've been up on 3.5.3 for about 4 days without it happening, but it's been that long with some of the versions it did happen with, too.  I'll post again as soon as I see one or by Friday next week to say one way or the other.
Comment 16 Kevin Lyles 2012-09-22 03:54:57 UTC
I've yet to see it after 12.5 days of uptime, so it's at least a lot harder to trigger now.  I'm comfortable with closing this if you are.
Comment 17 Mike Pagano gentoo-dev 2012-10-10 12:48:11 UTC
Thanks, will do.
Comment 18 Anthony Basile gentoo-dev 2013-04-13 22:39:59 UTC
*** Bug 412551 has been marked as a duplicate of this bug. ***