Summary: | forcedeth / MCP55 Issues - freezes on boot after detecting NIC | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Alex Howells (RETIRED) <astinus> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | aabdulla |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | http://bugzilla.kernel.org/show_bug.cgi?id=9308 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
working boot sequence from a 2021M chassis
gentoo 2.6.23 kernel configuration
debian kernel configuration
debug patch |
Description
Alex Howells (RETIRED)
2007-10-30 21:26:22 UTC
I have just booted up one of our bigger chassis. It's a 2U variant with 8 drives; the motherboards are virtually identical except these have 2 x CPU sockets, and the chipsets are *meant* to be the same, except this one boots and my 1U ones don't:

http://rafb.net/p/3SHEVf44.html

The log above is a snippet from the boot process of a working system :) Obviously it'd be nice if the 1U boxes started looking like that! Let me know how I can help you guys debug it.

Just a random idea before we dig further, have you tried disabling CONFIG_PCI_MMCONFIG? How about booting with acpi=off?

Also, for what it's worth, this is the hardware:

http://www.supermicro.com/Aplus/system/1U/1011/AS-1011M-T2.cfm <-- Fails.
http://www.supermicro.com/Aplus/system/1U/1021/AS-1021M-T2+V.cfm <-- Fails.
http://www.supermicro.com/Aplus/system/2U/2021/AS-2021M-T2R+V.cfm <-- Works.

All our cold-swap chassis which work fine are Gigabyte GA-M61PM-S2 (rev 2.0) boards for the most part :)

(In reply to comment #2)
> Just a random idea before we dig further, have you tried disabling
> CONFIG_PCI_MMCONFIG? How about booting with acpi=off?

Just checked out both of these options: acpi=off definitely fails, and compiling without CONFIG_PCI_MMCONFIG also fails.

As an aside, I have tried 'noapic' and 'irqpoll' too, just on the off chance... None of the usual suspects seem to make my box boot properly.

Tomorrow I'm going to give it a shot with a standard x86 kernel to try and isolate whether this issue is x86_64 specific or not :)

Created attachment 134747 [details]
working boot sequence from a 2021M chassis
Not sure how long rafb.net preserves pastes, so I'll attach to the bug.
Created attachment 134749 [details]
gentoo 2.6.23 kernel configuration
Not sure how long rafb.net preserves pastes, so I'll attach to the bug.
Created attachment 134750 [details]
debian kernel configuration
Not sure how long rafb.net preserves pastes, so I'll attach to the bug.
Other random suggestions: try compiling forcedeth as a module and seeing if the dma_64bit=0, msi=0 and/or msix=0 parameters help.

(In reply to comment #8)
> Other random suggestions: try compiling forcedeth as a module and seeing if the
> dma_64bit=0, msi=0 and/or msix=0 parameters help.

Thanks for the suggestions.

Removing the [*] entirely makes the system boot fine, except it won't be able to get a DHCP lease and thus cannot mount the NFS mount and continue booting :)

This seems to suggest it's definitely a problem with forcedeth, as opposed to whatever might be loading straight after it...

Compiling as a module, and even letting genkernel do its thing and using an initrd, yields much the same result: the system boots but I can't use nfsroot.

I'll try those other parameters in a second, when I arrive at work. :-)

> I'll try those other parameters in a second, when I arrive at work. :-)

When I arrived at work the box was booted and, judging very roughly by uptime, I'd have to say it hung as stated above for 2-3 hours.

dmesg output as requested: Linux version 2.6.23-gentoo (root@archangel.0wn3d.us) (gcc version 4.1.2 (Gentoo 4.1.2 p1.0.1)) #2 SMP Tue Oct 30 21:55:45 GMT 2007 Command line: console=ttyS0,115200 acpi=off ip=dhcp root=/dev/nfs nfsroot=80.68.87.201:/mirror/bootstrap/netboot/gentoo-amd64,tcp,nolock,nfsvers=3 BOOT_IMAGE=pxelinux.misc/vmlinuz-2.6.23-gentoo BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 00000000dffd0000 (usable) BIOS-e820: 00000000dffd0000 - 00000000dffde000 (ACPI data) BIOS-e820: 00000000dffde000 - 00000000e0000000 (ACPI NVS) BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved) BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved) BIOS-e820: 0000000100000000 - 0000000200000000 (usable) Entering add_active_range(0, 0, 159) 0 entries of 3200 used Entering add_active_range(0, 256, 917456) 1 entries of 3200 used Entering add_active_range(0, 1048576, 2097152) 2 entries of 3200 used end_pfn_map = 2097152 DMI present. 
Scanning NUMA topology in Northbridge 24 CPU has 2 num_cores No NUMA configuration found Faking a node at 0000000000000000-0000000200000000 Entering add_active_range(0, 0, 159) 0 entries of 3200 used Entering add_active_range(0, 256, 917456) 1 entries of 3200 used Entering add_active_range(0, 1048576, 2097152) 2 entries of 3200 used Bootmem setup node 0 0000000000000000-0000000200000000 Zone PFN ranges: DMA 0 -> 4096 DMA32 4096 -> 1048576 Normal 1048576 -> 2097152 Movable zone start PFN for each node early_node_map[3] active PFN ranges 0: 0 -> 159 0: 256 -> 917456 0: 1048576 -> 2097152 On node 0 totalpages: 1965935 DMA zone: 56 pages used for memmap DMA zone: 1674 pages reserved DMA zone: 2269 pages, LIFO batch:0 DMA32 zone: 14280 pages used for memmap DMA32 zone: 899080 pages, LIFO batch:31 Normal zone: 14336 pages used for memmap Normal zone: 1034240 pages, LIFO batch:31 Movable zone: 0 pages used for memmap Nvidia board detected. Ignoring ACPI timer override. If you got timer trouble try acpi_use_timer_override Intel MultiProcessor Specification v1.4 MPTABLE: OEM ID: nVidia MPTABLE: Product ID: MCP55 MPTABLE: APIC at: 0xFEE00000 Processor #0 (Bootup-CPU) Processor #1 I/O APIC #2 at 0xFEC00000. 
Setting APIC routing to flat Processors: 2 swsusp: Registered nosave memory region: 000000000009f000 - 00000000000a0000 swsusp: Registered nosave memory region: 00000000000a0000 - 00000000000e4000 swsusp: Registered nosave memory region: 00000000000e4000 - 0000000000100000 swsusp: Registered nosave memory region: 00000000dffd0000 - 00000000dffde000 swsusp: Registered nosave memory region: 00000000dffde000 - 00000000e0000000 swsusp: Registered nosave memory region: 00000000e0000000 - 00000000fec00000 swsusp: Registered nosave memory region: 00000000fec00000 - 00000000fec01000 swsusp: Registered nosave memory region: 00000000fec01000 - 00000000fee00000 swsusp: Registered nosave memory region: 00000000fee00000 - 00000000fef00000 swsusp: Registered nosave memory region: 00000000fef00000 - 00000000ff780000 swsusp: Registered nosave memory region: 00000000ff780000 - 0000000100000000 Allocating PCI resources starting at e2000000 (gap: e0000000:1ec00000) SMP: Allowing 2 CPUs, 0 hotplug CPUs PERCPU: Allocating 35048 bytes of per cpu data Built 1 zonelists in Zone order. Total pages: 1935589 Policy zone: Normal Kernel command line: console=ttyS0,115200 acpi=off ip=dhcp root=/dev/nfs nfsroot=80.68.87.201:/mirror/bootstrap/netboot/gentoo-amd64,tcp,nolock,nfsvers=3 BOOT_IMAGE=pxelinux.misc/vmlinuz-2.6.23-gentoo Initializing CPU#0 PID hash table entries: 4096 (order: 12, 32768 bytes) Marking TSC unstable due to TSCs unsynchronized time.c: Detected 2613.450 MHz processor. Console: colour VGA+ 80x25 console [ttyS0] enabled Checking aperture... CPU 0: aperture @ 7fc4000000 size 32 MB Aperture too small (32 MB) No AGP bridge found Your BIOS doesn't leave a aperture memory hole Please enable the IOMMU option in the BIOS setup This costs you 64 MB of RAM Mapping aperture over 65536 KB of RAM @ 4000000 Memory: 7676228k/8388608k available (3400k kernel code, 187512k reserved, 1911k data, 328k init) Calibrating delay using timer specific routine.. 
5231.04 BogoMIPS (lpj=10462093) Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) Mount-cache hash table entries: 256 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 1024K (64 bytes/line) CPU 0/0 -> Node 0 CPU: Physical Processor ID: 0 CPU: Processor Core ID: 0 SMP alternatives: switching to UP code ExtINT not setup in hardware but reported by MP table Using local APIC timer interrupts. result 12564682 Detected 12.564 MHz APIC timer. SMP alternatives: switching to SMP code Booting processor 1/2 APIC 0x1 Initializing CPU#1 Calibrating delay using timer specific routine.. 5227.15 BogoMIPS (lpj=10454309) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 1024K (64 bytes/line) CPU 1/1 -> Node 0 CPU: Physical Processor ID: 0 CPU: Processor Core ID: 1 AMD Processor model unknown stepping 03 Brought up 2 CPUs NET: Registered protocol family 16 PCI: Using configuration type 1 ACPI: Interpreter disabled. Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI: disabled SCSI subsystem initialized libata version 2.21 loaded. usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Probing PCI hardware PCI: Probing PCI hardware (bus 00) PCI: Transparent bridge - 0000:00:06.0 PCI: Using IRQ router default [10de/0364] at 0000:00:01.0 PCI->APIC IRQ transform: 0000:00:01.1[A] -> IRQ 10 PCI->APIC IRQ transform: 0000:00:02.0[A] -> IRQ 10 PCI->APIC IRQ transform: 0000:00:02.1[B] -> IRQ 11 PCI->APIC IRQ transform: 0000:00:05.0[A] -> IRQ 5 PCI->APIC IRQ transform: 0000:00:05.1[B] -> IRQ 10 PCI->APIC IRQ transform: 0000:00:05.2[C] -> IRQ 10 PCI->APIC IRQ transform: 0000:00:08.0[A] -> IRQ 11 PCI->APIC IRQ transform: 0000:00:09.0[A] -> IRQ 5 PCI->APIC IRQ transform: 0000:01:06.0[A] -> IRQ 10 PCI-DMA: Disabling AGP. 
PCI-DMA: aperture base @ 4000000 size 65536 KB PCI-DMA: using GART IOMMU. PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture PCI: Bridge: 0000:00:06.0 IO window: d000-dfff MEM window: fea00000-feafffff PREFETCH window: f0000000-f7ffffff PCI: Bridge: 0000:00:0a.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:0b.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:0c.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:0d.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:0e.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:0f.0 IO window: e000-efff MEM window: feb00000-febfffff PREFETCH window: fc000000-fdffffff PCI: Setting latency timer of device 0000:00:06.0 to 64 PCI: Setting latency timer of device 0000:00:0a.0 to 64 PCI: Setting latency timer of device 0000:00:0b.0 to 64 PCI: Setting latency timer of device 0000:00:0c.0 to 64 PCI: Setting latency timer of device 0000:00:0d.0 to 64 PCI: Setting latency timer of device 0000:00:0e.0 to 64 PCI: Setting latency timer of device 0000:00:0f.0 to 64 NET: Registered protocol family 2 IP route cache hash table entries: 262144 (order: 9, 2097152 bytes) TCP established hash table entries: 1048576 (order: 12, 25165824 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 1048576 bind 65536) TCP reno registered Total HugeTLB memory allocated, 0 Installing knfsd (copyright (C) 1996 okir@monad.swb.de). 
io scheduler noop registered io scheduler deadline registered io scheduler cfq registered (default) Boot video device is 0000:01:06.0 PCI: Setting latency timer of device 0000:00:0a.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0a.0:pcie00] PCI: Setting latency timer of device 0000:00:0b.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0b.0:pcie00] PCI: Setting latency timer of device 0000:00:0c.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0c.0:pcie00] PCI: Setting latency timer of device 0000:00:0d.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0d.0:pcie00] PCI: Setting latency timer of device 0000:00:0e.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0e.0:pcie00] PCI: Setting latency timer of device 0000:00:0f.0 to 64 assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0f.0:pcie00] Real Time Clock Driver v1.12ac Linux agpgart interface v0.102 Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A Floppy drive(s): fd0 is 1.44M FDC 0 is a post-1991 82077 RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize loop: module loaded Intel(R) PRO/1000 Network Driver - version 7.3.20-k2 Copyright (c) 1999-2006 Intel Corporation. e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI e100: Copyright(c) 1999-2006 Intel Corporation forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60. 
PCI: Setting latency timer of device 0000:00:08.0 to 64 forcedeth: using HIGHDMA eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0 PCI: Setting latency timer of device 0000:00:09.0 to 64 forcedeth: using HIGHDMA eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0 tun: Universal TUN/TAP device driver, 1.6 tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com> netconsole: not configured, aborting Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx NFORCE-MCP55: IDE controller at PCI slot 0000:00:04.0 NFORCE-MCP55: chipset revision 161 NFORCE-MCP55: not 100% native mode: will probe irqs later NFORCE-MCP55: 0000:00:04.0 (rev a1) UDMA133 controller ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:DMA Probing IDE interface ide0... hdb: CD-224E-N, ATAPI CD/DVD-ROM drive hdb: selected mode 0x42 ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 Probing IDE interface ide1... hdb: ATAPI 24X CD-ROM drive, 256kB Cache, UDMA(33) Uniform CD-ROM driver Revision: 3.20 sata_nv 0000:00:05.0: version 3.5 PCI: Setting latency timer of device 0000:00:05.0 to 64 scsi0 : sata_nv scsi1 : sata_nv ata1: SATA max UDMA/133 cmd 0x000000000001c480 ctl 0x000000000001c402 bmdma 0x000000000001bc00 irq 5 ata2: SATA max UDMA/133 cmd 0x000000000001c080 ctl 0x000000000001c002 bmdma 0x000000000001bc08 irq 5 ata1: SATA link down (SStatus 0 SControl 300) ata2: SATA link down (SStatus 0 SControl 300) PCI: Setting latency timer of device 0000:00:05.1 to 64 scsi2 : sata_nv scsi3 : sata_nv ata3: SATA max UDMA/133 cmd 0x000000000001b880 ctl 0x000000000001b802 bmdma 0x000000000001b080 irq 10 ata4: SATA max UDMA/133 cmd 0x000000000001b480 ctl 0x000000000001b402 bmdma 0x000000000001b088 irq 10 ata3: SATA link down (SStatus 0 SControl 300) ata4: SATA link down (SStatus 0 SControl 300) PCI: Setting latency timer of device 0000:00:05.2 to 64 scsi4 : sata_nv scsi5 : sata_nv ata5: SATA max UDMA/133 cmd 
0x000000000001b000 ctl 0x000000000001ac02 bmdma 0x000000000001a480 irq 10 ata6: SATA max UDMA/133 cmd 0x000000000001a880 ctl 0x000000000001a802 bmdma 0x000000000001a488 irq 10 ata5: SATA link down (SStatus 0 SControl 300) ata6: SATA link down (SStatus 0 SControl 300) Fusion MPT base driver 3.04.05 Copyright (c) 1999-2007 LSI Logic Corporation Fusion MPT SPI Host driver 3.04.05 ieee1394: raw1394: /dev/raw1394 device initialized PCI: Setting latency timer of device 0000:00:02.1 to 64 ehci_hcd 0000:00:02.1: EHCI Host Controller ehci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 1 ehci_hcd 0000:00:02.1: debug port 1 PCI: cache line size of 64 is not supported by device 0000:00:02.1 ehci_hcd 0000:00:02.1: irq 11, io mem 0xfe9fac00 ehci_hcd 0000:00:02.1: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 10 ports detected ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver PCI: Setting latency timer of device 0000:00:02.0 to 64 ohci_hcd 0000:00:02.0: OHCI Host Controller ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2 ohci_hcd 0000:00:02.0: irq 10, io mem 0xfe9fb000 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 10 ports detected USB Universal Host Controller Interface driver v3.0 usbcore: registered new interface driver usblp Initializing USB Mass Storage driver... usbcore: registered new interface driver usb-storage USB Mass Storage support registered. PNP: No PS/2 controller found. Probing ports directly. serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com usbcore: registered new interface driver usbhid drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver oprofile: using NMI interrupt. 
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
powernow-k8: Found 1 AMD Processor model unknown processors (2 cpu cores) (version 2.00.00)
powernow-k8: MP systems not supported by PSB BIOS structure
powernow-k8: MP systems not supported by PSB BIOS structure
eth1: no link during initialization.
ADDRCONF(NETDEV_UP): eth1: link is not ready
Sending DHCP requests ., OK
IP-Config: Got DHCP answer from 89.16.168.148, my address is 89.16.168.221
IP-Config: Complete: device=eth0, addr=89.16.168.221, mask=255.255.255.128, gw=89.16.168.129, host=89.16.168.221, domain=office.bytemark.co.uk, nis-domain=(none), bootserver=89.16.168.148, rootserver=80.68.87.201, rootpath=
Looking up port of RPC 100003/3 on 80.68.87.201
Looking up port of RPC 100005/3 on 80.68.87.201
VFS: Mounted root (nfs filesystem) readonly.
Freeing unused kernel memory: 328k freed
eth0: no IPv6 routers present

(In reply to comment #9)
> (In reply to comment #8)
> > Other random suggestions: try compiling forcedeth as a module and seeing if the
> > dma_64bit=0, msi=0 and/or msix=0 parameters help.

dma_64bit=0 <<-- fails
msi=0 <<-- fails
msix=0 <<-- fails
msi=0 + msix=0 <<-- fails

Looks like all of those fail.

2 more things to try:

In drivers/net/forcedeth.c, near the top of the file, you see:

#if 0
#define dprintk printk

change that to "#if 1"

Secondly, compile your kernel with CONFIG_MAGIC_SYSRQ, and when the freeze happens, press some of the sysrq keys like:
alt+sysrq+t
alt+sysrq+b

Does the system respond in any way?

(In reply to comment #12)
> 2 more things to try:
[.snip...]
> Secondly, compile your kernel with CONFIG_MAGIC_SYSRQ, and when the freeze
> happens, press some of the sysrq keys like:
> alt+sysrq+t
> alt+sysrq+b
>
> Does the system respond in any way?

Already had this one enabled as it's mighty useful with the serial line ;)

Unfortunately neither of these has any effect.

Going to make that change to forcedeth.c now and will update again shortly.

(In reply to comment #12)
> In drivers/net/forcedeth.c, near the top of the file, you see:
>
> #if 0
> #define dprintk printk
>
> change that to "#if 1"
>

Seems to cause a warning during compile:

CC drivers/net/forcedeth.o
drivers/net/forcedeth.c: In function ‘nv_probe’:
drivers/net/forcedeth.c:5038: warning: format ‘%ld’ expects type ‘long int’, but argument 5 has type ‘resource_size_t’

Boot sequence is the same except there's a bit more output:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
forcedeth: using HIGHDMA
0000:00:08.0: link timer on.
0000:00:08.0: mgmt unit is running. mac in use 40000000.

Does that help at all?

With timing information as requested:

[ 35.309834] forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
[ 35.317568] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
[ 35.323305] ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
[ 35.332026] forcedeth: using HIGHDMA
[ 35.335620] 0000:00:08.0: link timer on.
[ 35.339537] 0000:00:08.0: mgmt unit is running. mac in use 40000000.

Now I'll wait until it gets past this point and hopefully we'll know how long it took to eventually try and boot ;)

(In reply to comment #15)
> With timing information as requested:
>
> [ 35.309834] forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
> [ 35.317568] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
> [ 35.323305] ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
> [ 35.332026] forcedeth: using HIGHDMA
> [ 35.335620] 0000:00:08.0: link timer on.
> [ 35.339537] 0000:00:08.0: mgmt unit is running. mac in use 40000000.
>
> Now I'll wait until it gets past this point and hopefully we'll know how long
> it took to eventually try and boot ;)

Never seemed to get past this point this time, or at least, it's still there!

Created attachment 135255 [details, diff]
debug patch
OK. Kill that session then, and apply this patch.
It adds in more debug messages, please see where it stops now.
(In reply to comment #17)
> Created an attachment (id=135255) [edit]
> debug patch
>
> OK. Kill that session then, and apply this patch.
> It adds in more debug messages, please see where it stops now.

[ 35.440104] forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
[ 35.447865] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
[ 35.453601] ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
[ 35.462315] forcedeth: using HIGHDMA
[ 35.465909] 0000:00:08.0: link timer on.
[ 35.469825] 0000:00:08.0: mgmt unit is running. mac in use 40000000.
[ 35.476160] in loop, i=0
[ 40.518953] in loop, i=1
[ 45.560230] in loop, i=2
[ 50.601507] in loop, i=3
[ 55.642784] in loop, i=4
[ 60.684062] in loop, i=5
[ 65.725339] in loop, i=6
[ 70.766616] in loop, i=7
[ 75.807894] in loop, i=8
[ 80.849171] in loop, i=9
[ 85.890448] in loop, i=10
...

I presume this continues for quite some time... That probably explains why it eventually boots if left to its own devices forever :)

I altered the number of loop iterations from 5000 to zero, which caused the system to boot perfectly happily; along the way it generated about 30,000 lines of output judging by my serial console log, most of which was like:

[ 45.962209] 000:<7>eth0: nv_nic_irq_optimized
[ 45.968062] 00 00 5e 00 01 96 00 30 48 60 8d 54 08 00 45 00
[ 45.973786] 010:<7>eth0: nv_nic_irq_optimized
[ 45.978341] 00 94 26 fa 40 00 40 06 69 64 59 10 a8 e8 50 44
[ 45.984070] 020:<7>eth0: nv_nic_irq_optimized
[ 45.988623] 57 c9 92 21 00 6f 78 13 c1 df ff c0 43 c7 80 18
[ 45.994352] 030:<7>eth0: nv_nic_irq_optimized
[ 45.998905] 00 2e aa 8c 00 00 01 01 08 0a ff fe e7 5a 02 ff
[ 46.027898] 000: 00 30 48 60 8d 54 00 1c 58 09 86 e4 08 00 45 00
[ 46.033988] 010: 00 34 e0 81 40 00 3c 06 b4 3c 50 44 57 c9 59 10
[ 46.040266] 020: a8 e8 00 6f 92 21 ff c0 43 c7 78 13 c2 3f 80 10
[ 46.046545] 030: 00 2e 2e 73 00 00 01 01 08 0a 02 ff a3 51 ff fe
[ 46.053018] 000: 00 30 48 60 8d 54 00 1c 58 09 86 e4 08 00 45 00
[ 46.059119] 010: 00 54 e0 82 40 00 3c 06 b4 1b 50 44 57 c9 59 10
[ 46.065406] 020: a8 e8 00 6f 92 21 ff c0 43 c7 78 13 c2 3f 80 18
[ 46.071694] 030: 00 2e 2d 91 00 00 01 01 08 0a 02 ff a3 51 ff fe

Eventually it ended up on my network boot environment, yipppeee!

Disabling the printk made it boot up just fine as well, albeit without the 30,000 lines of spam ;)

My kernel is configured to grab a DHCP lease on boot, and then mount an nfsroot - so I can only assume the NIC is working okay!

OK. Please now open an upstream bug against 2.6.24-rc1 at http://bugzilla.kernel.org against forcedeth, titled "forcedeth causes 7 hour boot delay". Give a brief description of the problem, and the dmesg from comment #14. Leave it at that - I will add the details in a comment immediately after you file it. Please post the new bug URL here when done.

The upstream patch is merged into the netdev tree.

Fixed in gentoo-sources-2.6.23-r3, thanks for your help digging into this one.
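
For anyone wondering where the "7 hour" figure in the upstream bug title comes from, it can be roughly reproduced from the "in loop, i=N" debug output earlier in this bug. The Python sketch below is only a back-of-the-envelope estimate and is not driver code. It assumes that every pass through the loop keeps taking as long as the eleven logged passes did, and that the loop would run for all 5000 iterations mentioned in the comments. The names LOOP_TIMESTAMPS, MAX_ITERATIONS, mean_interval and estimated_delay_hours are made up for this illustration.

```python
# Timestamps (seconds since boot) copied from the "in loop, i=0" .. "i=10"
# debug lines posted in this bug.
LOOP_TIMESTAMPS = [
    35.476160, 40.518953, 45.560230, 50.601507, 55.642784, 60.684062,
    65.725339, 70.766616, 75.807894, 80.849171, 85.890448,
]

# Loop bound quoted in the comments ("from 5000 to zero").
MAX_ITERATIONS = 5000


def mean_interval(timestamps):
    """Average time in seconds between consecutive loop messages."""
    return (timestamps[-1] - timestamps[0]) / (len(timestamps) - 1)


def estimated_delay_hours(timestamps, iterations):
    """Extrapolate the total stall, assuming a constant per-iteration delay."""
    return mean_interval(timestamps) * iterations / 3600.0


if __name__ == "__main__":
    step = mean_interval(LOOP_TIMESTAMPS)
    hours = estimated_delay_hours(LOOP_TIMESTAMPS, MAX_ITERATIONS)
    print(f"mean delay per iteration: {step:.3f} s")
    print(f"estimated stall for {MAX_ITERATIONS} iterations: {hours:.1f} hours")
```

Run as a script, this prints a mean of about 5.041 s per iteration and an estimated stall of about 7.0 hours, which lines up with the title suggested for the upstream bug.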