With any vanilla kernel prior to 2.6.31.* (checked on 2.6.30.* and 2.6.29.*), our lpfc/qla2xxx FC card gets its own MSI interrupt:

           CPU0       CPU1       CPU2       CPU3
  0:        218          0          0          0  IO-APIC-edge     timer
  1:          0          0          0          0  IO-APIC-edge     i8042
  2:          0          0          0          0  XT-PIC-XT        cascade
 14:          0          0          0          0  IO-APIC-edge     ide0
 16:         21    5857196  402342061   85258815  IO-APIC-fasteoi  eth0
 17:         10          6          7    3907371  IO-APIC-fasteoi
 19:        561    4505353        557    4167122  IO-APIC-fasteoi  ahci
 23:        641        398        386        507  IO-APIC-fasteoi  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
 54:          0          0          0          0  PCI-MSI-edge     pciehp
 55:          0          0          0          0  PCI-MSI-edge     pciehp
 57:    9741654    9266266    8019744    9105905  PCI-MSI-edge     lpfc
NMI:          0          0          0          0  Non-maskable interrupts
LOC: 1183406355 1183406217 1183405963 1183406031  Local timer interrupts
SPU:          0          0          0          0  Spurious interrupts
RES:     492712     399006     397685     331608  Rescheduling interrupts
CAL:        249        361        357        350  Function call interrupts
TLB:     462643     494800     517256     624827  TLB shootdowns
TRM:          0          0          0          0  Thermal event interrupts
THR:          0          0          0          0  Threshold APIC interrupts
ERR:          0
MIS:          0

With 2.6.31.6 it is on the IO-APIC instead, sharing IRQ 16 with eth0, which is unacceptable:

           CPU0       CPU1       CPU2       CPU3
  0:         36          0          0          0  IO-APIC-edge     timer
  1:          0          0          0          0  IO-APIC-edge     i8042
  2:          0          0          0          0  XT-PIC-XT        cascade
 14:          0          0          0          0  IO-APIC-edge     ide0
 16:       1101        890       1093        881  IO-APIC-fasteoi  lpfc, eth0
 19:        483       2769        490       2778  IO-APIC-fasteoi  ahci
 23:          9         13         10         11  IO-APIC-fasteoi  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
NMI:          0          0          0          0  Non-maskable interrupts
LOC:       6089       5077       8483       5303  Local timer interrupts
SPU:          0          0          0          0  Spurious interrupts
CNT:          0          0          0          0  Performance counter interrupts
PND:          0          0          0          0  Performance pending work
RES:        402        468        498        598  Rescheduling interrupts
CAL:        226        340        368        344  Function call interrupts
TLB:        834       1235       1523       1753  TLB shootdowns
TRM:          0          0          0          0  Thermal event interrupts
THR:          0          0          0          0  Threshold APIC interrupts
MCE:          0          0          0          0  Machine check exceptions
MCP:          1          1          1          1  Machine check polls
ERR:          0
MIS:          0

Reproducible: Always
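For anyone who wants to check a box for the same symptom without reading the whole table, here is a minimal Python sketch that parses /proc/interrupts text and reports which IRQ a driver sits on, whether that IRQ is MSI, and whether it is shared. The names `IrqLine`, `parse_interrupts`, `find_handler`, `uses_msi` and `is_shared` are made up for illustration, and the parser assumes the layout shown in this report (one count column per CPU, then a single-token chip name, then a comma-separated handler list); other kernel versions may format the chip column differently.

```python
from typing import Dict, List, NamedTuple, Optional


class IrqLine(NamedTuple):
    irq: str             # numeric IRQ as a string, e.g. "16"
    counts: List[int]    # per-CPU interrupt counts
    chip: str            # e.g. "IO-APIC-fasteoi" or "PCI-MSI-edge"
    handlers: List[str]  # drivers registered on this IRQ


def parse_interrupts(text: str) -> Dict[str, IrqLine]:
    """Parse numbered IRQ rows; summary rows (NMI, LOC, ERR, ...) are skipped."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table: Dict[str, IrqLine] = {}
    for ln in lines[1:]:
        label, sep, rest = ln.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue
        fields = rest.split()
        # Need ncpu counts plus at least the chip name.
        if len(fields) <= ncpu or not all(f.isdigit() for f in fields[:ncpu]):
            continue
        counts = [int(f) for f in fields[:ncpu]]
        chip = fields[ncpu]
        names = " ".join(fields[ncpu + 1:]).split(",")
        handlers = [n.strip() for n in names if n.strip()]
        table[label] = IrqLine(label, counts, chip, handlers)
    return table


def find_handler(table: Dict[str, IrqLine], name: str) -> Optional[IrqLine]:
    """Return the first IRQ row that lists `name` as a handler, if any."""
    for entry in table.values():
        if name in entry.handlers:
            return entry
    return None


def uses_msi(entry: IrqLine) -> bool:
    """Assumption: MSI/MSI-X rows carry 'MSI' in the chip column."""
    return "MSI" in entry.chip


def is_shared(entry: IrqLine) -> bool:
    return len(entry.handlers) > 1
```

On a live system the input would be the contents of /proc/interrupts, e.g. `find_handler(parse_interrupts(open("/proc/interrupts").read()), "lpfc")`.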
Please attach the dmesg output of a 2.6.31* and a 2.6.30* kernel.
Created attachment 214630 [details] dmesg from 2.6.32.2
Created attachment 214632 [details] dmesg 2.6.30.10
(In reply to comment #1)
The problem persists in 2.6.32.* too.
Use "lspci -vt" to find your device's bridge and paste us the contents of /sys/bus/pci/devices/<device>/msi_bus.
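The msi_bus check can also be scripted. Below is a small Python sketch, assuming the sysfs layout named above (/sys/bus/pci/devices/<device>/msi_bus) and that the file holds "1" when MSI is allowed below the bridge and "0" when it is not. The function name `read_msi_bus` and its `sysfs_root` parameter are hypothetical; the parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional


def read_msi_bus(device: str,
                 sysfs_root: str = "/sys/bus/pci/devices") -> Optional[bool]:
    """Return True/False for msi_bus = 1/0, or None if it cannot be read.

    `device` is the full PCI address of the bridge, e.g. "0000:00:04.0".
    """
    path = Path(sysfs_root) / device / "msi_bus"
    try:
        value = path.read_text().strip()
    except OSError:
        # Missing device, missing attribute, or no permission.
        return None
    if value == "1":
        return True
    if value == "0":
        return False
    return None  # unexpected content; do not guess
```

For the bridge in this report the call would be `read_msi_bus("0000:00:04.0")`. A result of True only says that MSI is not blocked at the bridge; the driver still has to request it.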
(In reply to comment #5)
> Use "lspci -vt" to find your device's bridge and paste us the contents of
> /sys/bus/pci/devices/<device>/msi_bus.

On 2.6.30.10:

-[0000:00]-+-00.0  Intel Corporation 5000X Chipset Memory Controller Hub
           +-04.0-[10]----00.0  Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter

barzog@albatros2 ~ $ cat /sys/bus/pci/devices/0000\:00\:04.0/msi_bus
1

10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 57
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c2000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable+ Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc

On 2.6.32.2:

barzog@albatros2 ~ $ cat /sys/bus/pci/devices/0000\:00\:04.0/msi_bus
1

10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc

So msi_bus is 1 on both kernels, but the MSI capability is enabled (Enable+) only on 2.6.30.10.

I've also checked another server where we have some tg3 cards on MSI: with 2.6.31.* and above they use the IO-APIC too.
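The two lspci dumps differ in just two places: the IRQ in the Flags line and the Enable+/Enable- flag on the MSI capability. A rough Python sketch for pulling those two values out of `lspci -v` text follows; `msi_enabled` and `irq_number` are hypothetical helper names, and the regular expressions assume the output format pasted in this comment.

```python
import re
from typing import Optional

# Matches "MSI: Enable+" / "MSI: Enable-" but not "MSI-X: ..." and not
# the "Express Endpoint, MSI 00" line, which has no colon after MSI.
_MSI_RE = re.compile(r"\bMSI:\s+Enable([+-])")
_IRQ_RE = re.compile(r"\bIRQ (\d+)")


def msi_enabled(lspci_text: str) -> Optional[bool]:
    """True for Enable+, False for Enable-, None if no MSI capability is listed."""
    match = _MSI_RE.search(lspci_text)
    if match is None:
        return None
    return match.group(1) == "+"


def irq_number(lspci_text: str) -> Optional[int]:
    """IRQ from the Flags line, or None if the device reports none."""
    match = _IRQ_RE.search(lspci_text)
    return int(match.group(1)) if match else None
```

Fed the text of a single device (for example from `lspci -v -s 10:00.0`), the pair of results is enough to tell the working 2.6.30.10 state from the broken 2.6.32.2 one.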
Hey, wanna try using the noapic and nolapic kernel parameters to see if it's a race condition problem?
(In reply to comment #7)
> wanna try using the noapic and nolapic kernel parameters to see if it's a race
> condition problem?

With noapic and 2.6.32.2:

  7:       9136          0          0          0  XT-PIC-XT        lpfc, eth0

10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 7
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc

With nolapic and 2.6.32.2:

  7:       8148  XT-PIC-XT        lpfc, eth0

10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 7
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc

With nolapic and noapic and 2.6.32.2:

  7:      62707  XT-PIC-XT        lpfc, eth0

10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 7
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc

With the new kernel 2.6.32.3 (compared with 2.6.32.2) there are no changes:

 16:        608        502        602        511  IO-APIC-fasteoi  lpfc, eth0

10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc
Created attachment 216293 [details, diff]
printk debug patch

Try applying the attached patch to your lpfc_init.c in a 2.6.32 kernel. It was made a bit hastily, but it 'should' compile fine. After applying the patch, compile the kernel and boot it. Then attach your dmesg here :)
Created attachment 216342 [details]
dmesg from 2.6.32.3 with patched lpfc_init

lpfc built as a module.
Created attachment 216438 [details, diff] enables MSI-X/MSI interrupts in lpfc You might want to try the attached patch which I also submitted upstream.
I've also reported this in the kernel bugzilla. Please see http://bugzilla.kernel.org/show_bug.cgi?id=14877

They asked something there that I cannot answer:

"This is committed to the scsi tree without a cc:stable, so it won't get backported into 2.6.32.x and might not make it into 2.6.33 either. Was that all intentional?"

George, please step in.
(In reply to comment #11)
> Created an attachment (id=216438) [details]
> enables MSI-X/MSI interrupts in lpfc
>
> You might want to try the attached patch which I also submitted upstream.
>

Can someone please test this patch and report the results?
(In reply to comment #13)
> Can someone please test this patch and report the results.

I will test it tomorrow and post the result here. But I've talked with the kernel devs at http://bugzilla.kernel.org/show_bug.cgi?id=14877 and they said that disabling the default use of MSI in lpfc was intentional, because it causes instability on some systems, and that the recommended way of activating MSI is through the lpfc module params. James Smart and I tested that yesterday, and it seems that lpfc cannot activate its MSI capabilities through module params. James said that he will look at the code.
(In reply to comment #13)
> Can someone please test this patch and report the results.

Tested. All works perfectly as intended.

10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter (rev 02)
        Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 57
        Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
        Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 2000 [size=256]
        [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
        Capabilities: [58] Power Management version 2
        Capabilities: [60] MSI: Enable+ Count=1/16 Maskable- 64bit+
        Capabilities: [44] Express Endpoint, MSI 00
        Kernel driver in use: lpfc
        Kernel modules: lpfc

barzog@albatros2 ~ $ cat /proc/interrupts | grep lpfc
 57:       1125       1100       1124        953  PCI-MSI-edge     lpfc

barzog@albatros2 ~ $ dmesg | grep lpfc
lpfc 0000:10:00.0: PCI->APIC IRQ transform: INT A -> IRQ 16
lpfc 0000:10:00.0: setting latency timer to 64
lpfc 0000:10:00.0: irq 57 for MSI/MSI-X
lpfc 0000:10:00.0: 0:1303 Link Up Event x1 received Data: x1 xf7 x10 x0 x0 x0 0
(In reply to comment #15)
> (In reply to comment #13)
> > Can someone please test this patch and report the results.
> Tested. All works perfectly as intended.
>
> 10:00.0 Fibre Channel: Emulex Corporation Zephyr LightPulse Fibre Channel Host
> Adapter (rev 02)
>         Subsystem: Emulex Corporation Zephyr LightPulse Fibre Channel Host
> Adapter
>         Flags: bus master, fast devsel, latency 0, IRQ 57
>         Memory at dfb01000 (64-bit, non-prefetchable) [size=4K]
>         Memory at dfb00000 (64-bit, non-prefetchable) [size=256]
>         I/O ports at 2000 [size=256]
>         [virtual] Expansion ROM at c0000000 [disabled] [size=256K]
>         Capabilities: [58] Power Management version 2
>         Capabilities: [60] MSI: Enable+ Count=1/16 Maskable- 64bit+
>         Capabilities: [44] Express Endpoint, MSI 00
>         Kernel driver in use: lpfc
>         Kernel modules: lpfc
>
> barzog@albatros2 ~ $ cat /proc/interrupts | grep lpfc
>  57:       1125       1100       1124        953  PCI-MSI-edge     lpfc
>
> barzog@albatros2 ~ $ dmesg | grep lpfc
> lpfc 0000:10:00.0: PCI->APIC IRQ transform: INT A -> IRQ 16
> lpfc 0000:10:00.0: setting latency timer to 64
> lpfc 0000:10:00.0: irq 57 for MSI/MSI-X
> lpfc 0000:10:00.0: 0:1303 Link Up Event x1 received Data: x1 xf7 x10 x0 x0 x0 0
>

Okay, that's good to know. The fact that the patch won't be included until 2.6.34 is indeed intentional (as the kernel bugzilla entry mentions). Thanks for reporting and testing this bug, Oleg!