Bug 296319 - sys-kernel/vanilla-sources-2.6.31.* bug in MSI/IO-APIC
Summary: sys-kernel/vanilla-sources-2.6.31.* bug in MSI/IO-APIC
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: http://marc.info/?l=linux-scsi&m=1263...
Whiteboard: linux-2.6.31,linux-2.6.32
Keywords:
Depends on:
Blocks:
 
Reported: 2009-12-09 14:16 UTC by Oleg Gawriloff
Modified: 2010-02-05 10:20 UTC
CC List: 0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
dmesg from 2.6.32.2 (2.6.32.2, 35.39 KB, text/plain), 2009-12-30 16:04 UTC, Oleg Gawriloff
dmesg 2.6.30.10 (2.6.30.10, 35.40 KB, text/plain), 2009-12-30 16:08 UTC, Oleg Gawriloff
printk debug patch (patch, 5.22 KB, patch), 2010-01-12 22:07 UTC, George Kadianakis (RETIRED)
dmesg from 2.6.32.3 with patched lpfc_init (dmesg-printk.txt, 36.14 KB, text/plain), 2010-01-13 10:17 UTC, Oleg Gawriloff
enables MSI-X/MSI interrupts in lpfc (patch, 726 bytes, patch), 2010-01-14 00:22 UTC, George Kadianakis (RETIRED)

Description Oleg Gawriloff 2009-12-09 14:16:10 UTC
With any vanilla kernel before 2.6.31.* (checked on 2.6.30.* and 2.6.29.*), our lpfc/qla2xxx FC card uses an MSI interrupt:
            CPU0       CPU1       CPU2       CPU3
   0:        218          0          0          0   IO-APIC-edge      timer
   1:          0          0          0          0   IO-APIC-edge      i8042
   2:          0          0          0          0    XT-PIC-XT        cascade
  14:          0          0          0          0   IO-APIC-edge      ide0
  16:         21    5857196  402342061   85258815   IO-APIC-fasteoi   eth0
  17:         10          6          7    3907371   IO-APIC-fasteoi
  19:        561    4505353        557    4167122   IO-APIC-fasteoi   ahci
  23:        641        398        386        507   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
  54:          0          0          0          0   PCI-MSI-edge      pciehp
  55:          0          0          0          0   PCI-MSI-edge      pciehp
  57:    9741654    9266266    8019744    9105905   PCI-MSI-edge      lpfc
 NMI:          0          0          0          0   Non-maskable interrupts
 LOC: 1183406355 1183406217 1183405963 1183406031   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 RES:     492712     399006     397685     331608   Rescheduling interrupts
 CAL:        249        361        357        350   Function call interrupts
 TLB:     462643     494800     517256     624827   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 ERR:          0
 MIS:          0

With 2.6.31.6 it is on an IO-APIC line shared with eth0, which is unacceptable:
            CPU0       CPU1       CPU2       CPU3
   0:         36          0          0          0   IO-APIC-edge      timer
   1:          0          0          0          0   IO-APIC-edge      i8042
   2:          0          0          0          0    XT-PIC-XT        cascade
  14:          0          0          0          0   IO-APIC-edge      ide0
  16:       1101        890       1093        881   IO-APIC-fasteoi   lpfc, eth0
  19:        483       2769        490       2778   IO-APIC-fasteoi   ahci
  23:          9         13         10         11   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
 NMI:          0          0          0          0   Non-maskable interrupts
 LOC:       6089       5077       8483       5303   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 CNT:          0          0          0          0   Performance counter interrupts
 PND:          0          0          0          0   Performance pending work
 RES:        402        468        498        598   Rescheduling interrupts
 CAL:        226        340        368        344   Function call interrupts
 TLB:        834       1235       1523       1753   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          1          1          1          1   Machine check polls
 ERR:          0
 MIS:          0


Reproducible: Always
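
For anyone who wants to check the same thing on their own machine, here is a minimal sketch that parses /proc/interrupts-style text and reports whether a driver sits on an MSI vector or on a shared line. The function names (`parse_irq_line`, `find_driver`, `uses_msi`, `is_shared`) are made up for illustration, and the column layout is assumed to match the listings above.

```python
from typing import List, NamedTuple, Optional


class IrqRow(NamedTuple):
    irq: str            # IRQ number as printed, e.g. "57"
    counts: List[int]   # one interrupt count per CPU column
    chip: str           # controller/type column, e.g. "PCI-MSI-edge"
    devices: List[str]  # driver names registered on this line


def parse_irq_line(line: str) -> Optional[IrqRow]:
    """Parse one numbered row of /proc/interrupts.

    Returns None for the CPU header and for summary rows (NMI, LOC, ...).
    """
    head, sep, rest = line.partition(":")
    head = head.strip()
    if not sep or not head.isdigit():
        return None
    fields = rest.split()
    counts = []
    while fields and fields[0].isdigit():
        counts.append(int(fields.pop(0)))
    if not counts or not fields:
        return None
    chip = fields.pop(0)
    devices = [d.strip() for d in " ".join(fields).split(",") if d.strip()]
    return IrqRow(head, counts, chip, devices)


def find_driver(text: str, driver: str) -> Optional[IrqRow]:
    """Return the first IRQ row on which `driver` is registered, if any."""
    for line in text.splitlines():
        row = parse_irq_line(line)
        if row is not None and driver in row.devices:
            return row
    return None


def uses_msi(row: IrqRow) -> bool:
    """True if the chip column names an MSI vector (e.g. PCI-MSI-edge)."""
    return row.chip.startswith("PCI-MSI")


def is_shared(row: IrqRow) -> bool:
    """True if more than one driver is registered on the line."""
    return len(row.devices) > 1
```

Feeding it the contents of /proc/interrupts and the name `lpfc` should classify the two listings above as MSI/unshared and IO-APIC/shared respectively.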
Comment 1 George Kadianakis (RETIRED) gentoo-dev 2009-12-30 15:57:51 UTC
Please attach the dmesg output of a 2.6.31* and a 2.6.30* kernel.
Comment 2 Oleg Gawriloff 2009-12-30 16:04:16 UTC
Created attachment 214630 [details]
dmesg from 2.6.32.2
Comment 3 Oleg Gawriloff 2009-12-30 16:08:55 UTC
Created attachment 214632 [details]
dmesg 2.6.30.10
Comment 4 Oleg Gawriloff 2009-12-30 16:09:31 UTC
(In reply to comment #1)

The problem persists in 2.6.32.* too.
Comment 5 George Kadianakis (RETIRED) gentoo-dev 2009-12-30 16:36:24 UTC
Use "lspci -vt" to find your device's bridge and paste us the contents of
/sys/bus/pci/devices/<device>/msi_bus.
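
The msi_bus check can also be scripted. This is a rough sketch, assuming the bridge address has already been taken from "lspci -vt"; the function name `msi_allowed_below` and the `sysfs_root` parameter are hypothetical and exist only so the lookup can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional


def msi_allowed_below(bridge: str, sysfs_root: str = "/sys") -> Optional[bool]:
    """Read <sysfs_root>/bus/pci/devices/<bridge>/msi_bus.

    Returns True if the file contains "1", False for any other content,
    and None if the file cannot be read (no such device, no permission).
    """
    path = Path(sysfs_root) / "bus" / "pci" / "devices" / bridge / "msi_bus"
    try:
        value = path.read_text().strip()
    except OSError:
        return None
    return value == "1"
```

For example, `msi_allowed_below("0000:00:04.0")` reads the msi_bus file of the bridge at 00:04.0; the address is a placeholder for whatever "lspci -vt" shows above your device.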


Comment 6 Oleg Gawriloff 2009-12-30 18:36:13 UTC
(In reply to comment #5)
> Use "lspci -vt" to find your device's bridge and paste us the contents of
> /sys/bus/pci/devices/<device>/msi_bus.
On 2.6.30.10:
-[0000:00]-+-00.0  Intel Corporation 5000X Chipset Memory Controller Hub
           +-04.0-[10]----00.0  Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter

barzog@albatros2 ~ $ cat /sys/bus/pci/devices/0000\:00\:04.0/msi_bus
1
10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 57
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c2000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable+ Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc


On 2.6.32.2:
barzog@albatros2 ~ $ cat /sys/bus/pci/devices/0000\:00\:04.0/msi_bus
1

10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc

I've checked on another server where we have some tg3 cards on MSI; with 2.6.31.* and above they use IO-APIC too.
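
The only difference between the two lspci dumps that matters here is "Enable+" versus "Enable-" on the MSI capability line. A small sketch for pulling that flag out of "lspci -v" text follows; the name `msi_state` is made up, and the pattern assumes the capability line looks like the ones pasted above.

```python
import re
from typing import Dict

# Matches e.g. "Capabilities: [60] MSI: Enable+ Count=1/16 Maskable- 64bit+".
# "Capabilities: [44] Express Endpoint, MSI 00" is deliberately not matched.
_MSI_CAP = re.compile(r"Capabilities: \[[0-9a-fA-F]+\] (MSI(?:-X)?): Enable([+-])")


def msi_state(lspci_text: str) -> Dict[str, bool]:
    """Map each MSI/MSI-X capability found to True (Enable+) or False (Enable-)."""
    return {m.group(1): m.group(2) == "+" for m in _MSI_CAP.finditer(lspci_text)}
```

A device with no MSI capability at all yields an empty dict, which keeps "not capable" distinct from "capable but disabled".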
Comment 7 George Kadianakis (RETIRED) gentoo-dev 2010-01-12 14:30:53 UTC
Hey,

wanna try using the noapic and nolapic kernel parameters to see if it's a race condition problem?
Comment 8 Oleg Gawriloff 2010-01-12 17:45:22 UTC
(In reply to comment #7)

> wanna try using the noapic and nolapic kernel parameters to see if it's a race
> condition problem?

With noapic and 2.6.32.2:
   7:       9136          0          0          0    XT-PIC-XT        lpfc, eth0
10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 7
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc

With nolapic and 2.6.32.2:
   7:       8148    XT-PIC-XT        lpfc, eth0
10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 7
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc

With nolapic and noapic and 2.6.32.2:
   7:      62707    XT-PIC-XT        lpfc, eth0
10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 7
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc

With the new kernel 2.6.32.3 (compared with 2.6.32.2) there are no changes:
  16:        608        502        602        511   IO-APIC-fasteoi   lpfc, eth0
10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc
Comment 9 George Kadianakis (RETIRED) gentoo-dev 2010-01-12 22:07:26 UTC
Created attachment 216293 [details, diff]
printk debug patch

Try applying the attached patch to your lpfc_init.c in a 2.6.32 kernel.
It was a bit hastily made but it 'should' compile fine.

After applying the patch, compile the kernel and boot it. Then attach here your dmesg :)
Comment 10 Oleg Gawriloff 2010-01-13 10:17:40 UTC
Created attachment 216342 [details]
dmesg from 2.6.32.3 with patched lpfc_init

lpfc was built as a module.
Comment 11 George Kadianakis (RETIRED) gentoo-dev 2010-01-14 00:22:47 UTC
Created attachment 216438 [details, diff]
enables MSI-X/MSI interrupts in lpfc

You might want to try the attached patch which I also submitted upstream.
Comment 12 Oleg Gawriloff 2010-01-22 23:22:06 UTC
I've also reported this in the kernel bugzilla. Please see http://bugzilla.kernel.org/show_bug.cgi?id=14877
They wrote something that I cannot answer:
> This is committed to the scsi tree without a cc:stable, so it won't get
> backported into 2.6.32.x and might not make it into 2.6.33 either.
>
> Was that all intentional?

George, please step in.
Comment 13 Mike Pagano gentoo-dev 2010-02-04 18:29:02 UTC
(In reply to comment #11)
> Created an attachment (id=216438) [details]
> enables MSI-X/MSI interrupts in lpfc
> 
> You might want to try the attached patch which I also submitted upstream.
> 

Can someone please test this patch and report the results?
Comment 14 Oleg Gawriloff 2010-02-04 20:42:47 UTC
(In reply to comment #13)

> Can someone please test this patch and report the results.
I will test it tomorrow and post the result here. However, I've talked with the kernel devs at http://bugzilla.kernel.org/show_bug.cgi?id=14877, and they said that disabling MSI by default in lpfc was intentional because it causes instability on some systems, and that the recommended way to activate MSI is through the lpfc module parameters. James Smart and I tested that yesterday, and it seems that lpfc cannot activate its MSI capabilities through module parameters. James said he will look at the code.

Comment 15 Oleg Gawriloff 2010-02-05 10:15:19 UTC
(In reply to comment #13)
> Can someone please test this patch and report the results.
Tested. All works perfectly as intended.

10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 57
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable+ Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc

barzog@albatros2 ~ $ cat /proc/interrupts  | grep lpfc
  57:       1125       1100       1124        953   PCI-MSI-edge      lpfc

barzog@albatros2 ~ $ dmesg | grep lpfc
lpfc 0000:10:00.0: PCI->APIC IRQ transform: INT A -> IRQ 16
lpfc 0000:10:00.0: setting latency timer to 64
lpfc 0000:10:00.0: irq 57 for MSI/MSI-X
lpfc 0000:10:00.0: 0:1303 Link Up Event x1 received Data: x1 xf7 x10 x0 x0 x0 0
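
To confirm from the kernel log that a driver really got an MSI vector, the "irq N for MSI/MSI-X" message can be extracted mechanically. This is only a sketch: the function name `msi_irqs` is hypothetical, and the message format is assumed to be exactly the one in the dmesg output above.

```python
import re
from typing import List, Tuple

# Matches e.g. "lpfc 0000:10:00.0: irq 57 for MSI/MSI-X", with or without
# a leading dmesg timestamp.
_MSI_MSG = re.compile(
    r"(\S+) ([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"irq (\d+) for MSI/MSI-X"
)


def msi_irqs(dmesg_text: str, driver: str) -> List[Tuple[str, int]]:
    """Return (pci_address, irq) pairs for MSI/MSI-X vectors logged by `driver`."""
    found = []
    for line in dmesg_text.splitlines():
        m = _MSI_MSG.search(line)
        if m and m.group(1) == driver:
            found.append((m.group(2), int(m.group(3))))
    return found
```

An empty list for a driver that is loaded suggests it fell back to a legacy interrupt, which matches the IRQ 16 case earlier in this report.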

Comment 16 George Kadianakis (RETIRED) gentoo-dev 2010-02-05 10:20:28 UTC
(In reply to comment #15)

Okay, that's good to know.
The fact that the patch won't be included until 2.6.34 is indeed intentional (as the kernel bugzilla entry mentions).

Thanks for reporting and testing this bug, Oleg!