Bug 453628

Summary: sys-kernel/gentoo-sources-3.6.11 panics caused by smp_irq_move_cleanup_interrupt
Product: Gentoo Linux
Reporter: Erik Quaeghebeur <gentoo>
Component: [OLD] Core system
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status: RESOLVED TEST-REQUEST
Severity: normal
CC: kripton, tomwij
Priority: Normal
Version: unspecified
Hardware: AMD64
OS: Linux
URL: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=94777fc51b3ad85ff9f705ddf7cdd0eb3bbad5a6
Whiteboard: linux-3.7
Package list:
Runtime testing required: ---

Description Erik Quaeghebeur 2013-01-23 07:56:16 UTC
I'm getting a kernel panic from time to time (about once a week on an office laptop that is used daily). Two photos of the panic messages can be found at https://www.dropbox.com/sh/lqfsr0b0n6p67vd/evR8ZSJlcA . The first line is:

IP: [<ffffffff8182554d>] smp_irq_move_cleanup_interrupt+0xad/0x130


Possibly related, but I'm unsure:

https://patchwork.kernel.org/patch/1600651/
https://bugzilla.redhat.com/show_bug.cgi?id=869341
http://permalink.gmane.org/gmane.linux.redhat.fedora.extras.cvs/890603
Comment 1 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-01-23 08:38:17 UTC
Looks like that patch should fix it from what I read.

Checking the latest kernel sources, the fix seems to be present:

>54168ed7f arch/x86/kernel/io_apic_32.c   (Ingo Molnar      2008-08-20 09:07:45 +0200 2247) cfg = irq_cfg(irq);
>94777fc51 arch/x86/kernel/apic/io_apic.c (Dimitri Sivanich 2012-10-16 07:50:21 -0500 2248) if (!cfg)
>94777fc51 arch/x86/kernel/apic/io_apic.c (Dimitri Sivanich 2012-10-16 07:50:21 -0500 2249)     continue;
>94777fc51 arch/x86/kernel/apic/io_apic.c (Dimitri Sivanich 2012-10-16 07:50:21 -0500 2250)
>239007b84 arch/x86/kernel/apic/io_apic.c (Thomas Gleixner  2009-11-17 16:46:45 +0100 2251) raw_spin_lock(&desc->lock);

This was introduced with the following commit:

commit 94777fc51b3ad85ff9f705ddf7cdd0eb3bbad5a6
Author: Dimitri Sivanich <sivanich@sgi.com>
Date:   Tue Oct 16 07:50:21 2012 -0500

>    x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
>    
>    Posting this patch to fix an issue concerning sparse irq's that
>    I raised a while back.  There was discussion about adding
>    refcounting to sparse irqs (to fix other potential race
>    conditions), but that does not appear to have been addressed
>    yet.  This covers the only issue of this type that I've
>    encountered in this area.
>    
>    A NULL pointer dereference can occur in
>    smp_irq_move_cleanup_interrupt() if we haven't yet setup the
>    irq_cfg pointer in the irq_desc.irq_data.chip_data.
>    
>    In create_irq_nr() there is a window where we have set
>    vector_irq in __assign_irq_vector(), but not yet called
>    irq_set_chip_data() to set the irq_cfg pointer.
>    
>    Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during
>    this time, smp_irq_move_cleanup_interrupt() will attempt to
>    process the aforementioned irq, but panic when accessing
>    irq_cfg.
>    
>    Only continue processing the irq if irq_cfg is non-NULL.
>    
>    Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
>    Cc: Suresh Siddha <suresh.b.siddha@intel.com>
>    Cc: Joerg Roedel <joerg.roedel@amd.com>
>    Cc: Yinghai Lu <yinghai@kernel.org>
>    Cc: Alexander Gordeev <agordeev@redhat.com>
>    Link: http://lkml.kernel.org/r/20121016125021.GA22935@sgi.com
>    Signed-off-by: Ingo Molnar <mingo@kernel.org>

Which was committed between 3.7_rc2 and 3.7_rc3 so any 3.7 kernel should fix this for you. You can try unstable kernel versions using the following command:

`echo "sys-kernel/gentoo-sources" >> /etc/portage/package.accept_keywords`

Or you can try to apply the patch by downloading it and running the following command:

`cd /usr/src/linux && patch -p1 < /path/to/patch`
Comment 2 Erik Quaeghebeur 2013-01-24 09:32:59 UTC
(In reply to comment #1)
> Looks like that patch should fix it from what I read.
> 
> [...]
> 
> Which was committed between 3.7_rc2 and 3.7_rc3 so any 3.7 kernel should fix
> this for you.

Ok, I'm running =sys-kernel/gentoo-sources-3.7.3.

Perhaps marking this 'Resolved Test-Request' is appropriate, as my test is rather open-ended (I don't know how to trigger the panic). If I get the kernel panic again, I'll come back to this bug and we can reopen it.