Bug 197561 - forcedeth / MCP55 Issues - freezes on boot after detecting NIC
Summary: forcedeth / MCP55 Issues - freezes on boot after detecting NIC
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High major with 1 vote
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: http://bugzilla.kernel.org/show_bug.c...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-30 21:26 UTC by Alex Howells (RETIRED)
Modified: 2007-11-27 18:34 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
working boot sequence from a 2021M chassis (3SHEVf44.txt,1.56 KB, text/plain)
2007-10-30 22:04 UTC, Alex Howells (RETIRED)
gentoo 2.6.23 kernel configuration (tkGpQ130.txt,31.15 KB, text/plain)
2007-10-30 22:05 UTC, Alex Howells (RETIRED)
debian kernel configuration (wPOILN44.txt,55.53 KB, text/plain)
2007-10-30 22:05 UTC, Alex Howells (RETIRED)
debug patch (forcedeth.patch,1.60 KB, patch)
2007-11-05 16:30 UTC, Daniel Drake (RETIRED)

Description Alex Howells (RETIRED) gentoo-dev 2007-10-30 21:26:22 UTC
I am experiencing a complete freeze on boot with a certain network interface card. It's based on the nForce chipset, of which we have 5-6 different variants in production use, and we operate a network boot setup with tftpd-hpa + nfsroot.

All of the other nForce variants, plus other hardware (VIA, e1000), work great; it's only this Supermicro AS1011M-T2+ (and family) that doesn't work at all. Weirdly enough, our old Debian-based network boot environment, which is soon to be deprecated, boots up fine, so I've included its kernel .config for reference.

I have tried kernels from old to new, including gentoo-sources and vanilla-sources, and have also grabbed kernels directly from kernel.org. Everything from 2.6.16 through 2.6.24 has been tested in various forms, and all fail with the same error :(

[.snip.... kernel boot sequence]
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: module loaded
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2
Copyright (c) 1999-2006 Intel Corporation.
e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
forcedeth: using HIGHDMA

                < FREEZES FOREVER AT THIS POINT >

My kernel is monolithic; the kernels sourced from Debian are not. I'm including configurations from both in the hope that it helps :)

Gentoo Kernel (2.6.23)  :  http://rafb.net/p/tkGpQ130.html
Debian Kernel           :  http://rafb.net/p/wPOILN44.html

Portage 2.1.3.16 (default-linux/amd64/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.18-5-amd64 x86_64)
=================================================================
System uname: 2.6.18-5-amd64 x86_64 Intel(R) Xeon(R) CPU 5110 @ 1.60GHz
Timestamp of tree: Mon, 29 Oct 2007 04:50:01 +0000
app-shells/bash:     3.2_p17
dev-lang/python:     2.4.4-r6
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.9-r2
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.5, 1.9.6-r2, 1.10
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.3.16
sys-devel/libtool:   1.5.24
virtual/os-headers:  2.6.22-r2
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-mtune=k8 -O2 -pipe -fforce-addr"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-mtune=k8 -O2 -pipe -fforce-addr"
DISTDIR="/usr/portage/distfiles"
FEATURES="collision-protect distlocks metadata-transfer parallel-fetch sandbox sfperms strict unmerge-orphans userfetch userpriv usersandbox"
GENTOO_MIRRORS="http://80.68.87.201/gentoo"
LANG="en_GB.UTF-8"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://80.68.87.201/gentoo-portage"
USE="amd64 bash-completion berkdb bzip2 crypt gdbm ipv6 ncurses nls nptl nptlonly pam pcre perl python readline snmp ssl tcpd unicode zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x   ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3        trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i810 mach64      mga neomagic nv r128 radeon rendition s3 s3virge savage siliconmotion sis        sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

Reproducible: Always

Steps to Reproduce:
Comment 1 Alex Howells (RETIRED) gentoo-dev 2007-10-30 21:43:35 UTC
I have just booted up one of our bigger chassis, a 2U variant with 8 drives. The motherboards are virtually identical except that these have 2 CPU sockets, and the chipsets are *meant* to be the same, yet this one boots and my 1U ones don't:

    http://rafb.net/p/3SHEVf44.html

The above log is a snippet from the boot process of a working system :) Obviously it'd be nice if the 1U boxes started looking like that!

Let me know how I can help you guys debug it.
Comment 2 Daniel Drake (RETIRED) gentoo-dev 2007-10-30 21:44:07 UTC
Just a random idea before we dig further, have you tried disabling CONFIG_PCI_MMCONFIG? How about booting with acpi=off?
Comment 3 Alex Howells (RETIRED) gentoo-dev 2007-10-30 21:49:59 UTC
Also, for what it's worth, this is the hardware:

http://www.supermicro.com/Aplus/system/1U/1011/AS-1011M-T2.cfm     <-- Fails.
http://www.supermicro.com/Aplus/system/1U/1021/AS-1021M-T2+V.cfm   <-- Fails.
http://www.supermicro.com/Aplus/system/2U/2021/AS-2021M-T2R+V.cfm  <-- Works.

For the most part, all our cold-swap chassis that work fine use Gigabyte GA-M61PM-S2 (rev 2.0) boards :)
Comment 4 Alex Howells (RETIRED) gentoo-dev 2007-10-30 21:58:16 UTC
(In reply to comment #2)
> Just a random idea before we dig further, have you tried disabling
> CONFIG_PCI_MMCONFIG? How about booting with acpi=off?

I just checked both of these options: booting with acpi=off definitely fails, and compiling without CONFIG_PCI_MMCONFIG also fails.

As an aside, I have tried 'noapic' and 'irqpoll' too, just on the off chance... None of the usual suspects seem to make my box boot properly.

Tomorrow I'm going to give it a shot with a standard x86 kernel to try and isolate whether this issue is x86_64 specific or not :)
Comment 5 Alex Howells (RETIRED) gentoo-dev 2007-10-30 22:04:36 UTC
Created attachment 134747
working boot sequence from a 2021M chassis

Not sure how long rafb.net preserves pastes, so I'll attach it to the bug.
Comment 6 Alex Howells (RETIRED) gentoo-dev 2007-10-30 22:05:06 UTC
Created attachment 134749
gentoo 2.6.23 kernel configuration

Not sure how long rafb.net preserves pastes, so I'll attach it to the bug.
Comment 7 Alex Howells (RETIRED) gentoo-dev 2007-10-30 22:05:28 UTC
Created attachment 134750
debian kernel configuration

Not sure how long rafb.net preserves pastes, so I'll attach it to the bug.
Comment 8 Duane Griffin 2007-10-31 01:33:00 UTC
Other random suggestions: try compiling forcedeth as a module and seeing if the dma_64bit=0, msi=0 and/or msix=0 parameters help.
Comment 9 Alex Howells (RETIRED) gentoo-dev 2007-10-31 09:37:42 UTC
(In reply to comment #8)
> Other random suggestions: try compiling forcedeth as a module and seeing if the
> dma_64bit=0, msi=0 and/or msix=0 parameters help.

Thanks for the suggestions.

Removing the [*] entirely (i.e. not building forcedeth in at all) makes the system boot fine, except that it can't get a DHCP lease and thus can't mount the NFS root and continue booting :) This suggests it's definitely a problem with forcedeth, as opposed to whatever might be loading straight after it...

Compiling it as a module, and even letting genkernel do its thing with an initrd, yields much the same result: the system boots, but I can't use nfsroot.

I'll try those other parameters in a second, when I arrive at work. :-)
Comment 10 Alex Howells (RETIRED) gentoo-dev 2007-10-31 10:46:35 UTC
> 
> I'll try those other parameters in a second, when I arrive at work. :-)
> 

When I arrived at work the box had booted; judging very roughly by its uptime, I'd say it hung as described above for 2-3 hours. dmesg output as requested:

Linux version 2.6.23-gentoo (root@archangel.0wn3d.us) (gcc version 4.1.2 (Gentoo 4.1.2 p1.0.1)) #2 SMP Tue Oct 30 21:55:45 GMT 2007
Command line: console=ttyS0,115200 acpi=off ip=dhcp root=/dev/nfs nfsroot=80.68.87.201:/mirror/bootstrap/netboot/gentoo-amd64,tcp,nolock,nfsvers=3 BOOT_IMAGE=pxelinux.misc/vmlinuz-2.6.23-gentoo 
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000dffd0000 (usable)
 BIOS-e820: 00000000dffd0000 - 00000000dffde000 (ACPI data)
 BIOS-e820: 00000000dffde000 - 00000000e0000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000200000000 (usable)
Entering add_active_range(0, 0, 159) 0 entries of 3200 used
Entering add_active_range(0, 256, 917456) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 2097152) 2 entries of 3200 used
end_pfn_map = 2097152
DMI present.
Scanning NUMA topology in Northbridge 24
CPU has 2 num_cores
No NUMA configuration found
Faking a node at 0000000000000000-0000000200000000
Entering add_active_range(0, 0, 159) 0 entries of 3200 used
Entering add_active_range(0, 256, 917456) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 2097152) 2 entries of 3200 used
Bootmem setup node 0 0000000000000000-0000000200000000
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  2097152
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
    0:        0 ->      159
    0:      256 ->   917456
    0:  1048576 ->  2097152
On node 0 totalpages: 1965935
  DMA zone: 56 pages used for memmap
  DMA zone: 1674 pages reserved
  DMA zone: 2269 pages, LIFO batch:0
  DMA32 zone: 14280 pages used for memmap
  DMA32 zone: 899080 pages, LIFO batch:31
  Normal zone: 14336 pages used for memmap
  Normal zone: 1034240 pages, LIFO batch:31
  Movable zone: 0 pages used for memmap
Nvidia board detected. Ignoring ACPI timer override.
If you got timer trouble try acpi_use_timer_override
Intel MultiProcessor Specification v1.4
MPTABLE: OEM ID: nVidia   MPTABLE: Product ID: MCP55        MPTABLE: APIC at: 0xFEE00000
Processor #0 (Bootup-CPU)
Processor #1
I/O APIC #2 at 0xFEC00000.
Setting APIC routing to flat
Processors: 2
swsusp: Registered nosave memory region: 000000000009f000 - 00000000000a0000
swsusp: Registered nosave memory region: 00000000000a0000 - 00000000000e4000
swsusp: Registered nosave memory region: 00000000000e4000 - 0000000000100000
swsusp: Registered nosave memory region: 00000000dffd0000 - 00000000dffde000
swsusp: Registered nosave memory region: 00000000dffde000 - 00000000e0000000
swsusp: Registered nosave memory region: 00000000e0000000 - 00000000fec00000
swsusp: Registered nosave memory region: 00000000fec00000 - 00000000fec01000
swsusp: Registered nosave memory region: 00000000fec01000 - 00000000fee00000
swsusp: Registered nosave memory region: 00000000fee00000 - 00000000fef00000
swsusp: Registered nosave memory region: 00000000fef00000 - 00000000ff780000
swsusp: Registered nosave memory region: 00000000ff780000 - 0000000100000000
Allocating PCI resources starting at e2000000 (gap: e0000000:1ec00000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 35048 bytes of per cpu data
Built 1 zonelists in Zone order.  Total pages: 1935589
Policy zone: Normal
Kernel command line: console=ttyS0,115200 acpi=off ip=dhcp root=/dev/nfs nfsroot=80.68.87.201:/mirror/bootstrap/netboot/gentoo-amd64,tcp,nolock,nfsvers=3 BOOT_IMAGE=pxelinux.misc/vmlinuz-2.6.23-gentoo 
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Marking TSC unstable due to TSCs unsynchronized
time.c: Detected 2613.450 MHz processor.
Console: colour VGA+ 80x25
console [ttyS0] enabled
Checking aperture...
CPU 0: aperture @ 7fc4000000 size 32 MB
Aperture too small (32 MB)
No AGP bridge found
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Memory: 7676228k/8388608k available (3400k kernel code, 187512k reserved, 1911k data, 328k init)
Calibrating delay using timer specific routine.. 5231.04 BogoMIPS (lpj=10462093)
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0/0 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
SMP alternatives: switching to UP code
ExtINT not setup in hardware but reported by MP table
Using local APIC timer interrupts.
result 12564682
Detected 12.564 MHz APIC timer.
SMP alternatives: switching to SMP code
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 5227.15 BogoMIPS (lpj=10454309)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
AMD Processor model unknown stepping 03
Brought up 2 CPUs
NET: Registered protocol family 16
PCI: Using configuration type 1
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
SCSI subsystem initialized
libata version 2.21 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Transparent bridge - 0000:00:06.0
PCI: Using IRQ router default [10de/0364] at 0000:00:01.0
PCI->APIC IRQ transform: 0000:00:01.1[A] -> IRQ 10
PCI->APIC IRQ transform: 0000:00:02.0[A] -> IRQ 10
PCI->APIC IRQ transform: 0000:00:02.1[B] -> IRQ 11
PCI->APIC IRQ transform: 0000:00:05.0[A] -> IRQ 5
PCI->APIC IRQ transform: 0000:00:05.1[B] -> IRQ 10
PCI->APIC IRQ transform: 0000:00:05.2[C] -> IRQ 10
PCI->APIC IRQ transform: 0000:00:08.0[A] -> IRQ 11
PCI->APIC IRQ transform: 0000:00:09.0[A] -> IRQ 5
PCI->APIC IRQ transform: 0000:01:06.0[A] -> IRQ 10
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ 4000000 size 65536 KB
PCI-DMA: using GART IOMMU.
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
PCI: Bridge: 0000:00:06.0
  IO window: d000-dfff
  MEM window: fea00000-feafffff
  PREFETCH window: f0000000-f7ffffff
PCI: Bridge: 0000:00:0a.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:0b.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:0c.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:0d.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:0e.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:0f.0
  IO window: e000-efff
  MEM window: feb00000-febfffff
  PREFETCH window: fc000000-fdffffff
PCI: Setting latency timer of device 0000:00:06.0 to 64
PCI: Setting latency timer of device 0000:00:0a.0 to 64
PCI: Setting latency timer of device 0000:00:0b.0 to 64
PCI: Setting latency timer of device 0000:00:0c.0 to 64
PCI: Setting latency timer of device 0000:00:0d.0 to 64
PCI: Setting latency timer of device 0000:00:0e.0 to 64
PCI: Setting latency timer of device 0000:00:0f.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
TCP established hash table entries: 1048576 (order: 12, 25165824 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 1048576 bind 65536)
TCP reno registered
Total HugeTLB memory allocated, 0
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
Boot video device is 0000:01:06.0
PCI: Setting latency timer of device 0000:00:0a.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0a.0:pcie00]
PCI: Setting latency timer of device 0000:00:0b.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0b.0:pcie00]
PCI: Setting latency timer of device 0000:00:0c.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0c.0:pcie00]
PCI: Setting latency timer of device 0000:00:0d.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0d.0:pcie00]
PCI: Setting latency timer of device 0000:00:0e.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0e.0:pcie00]
PCI: Setting latency timer of device 0000:00:0f.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0f.0:pcie00]
Real Time Clock Driver v1.12ac
Linux agpgart interface v0.102
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: module loaded
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2
Copyright (c) 1999-2006 Intel Corporation.
e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
netconsole: not configured, aborting
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE-MCP55: IDE controller at PCI slot 0000:00:04.0
NFORCE-MCP55: chipset revision 161
NFORCE-MCP55: not 100% native mode: will probe irqs later
NFORCE-MCP55: 0000:00:04.0 (rev a1) UDMA133 controller
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:DMA
Probing IDE interface ide0...
hdb: CD-224E-N, ATAPI CD/DVD-ROM drive
hdb: selected mode 0x42
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdb: ATAPI 24X CD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
sata_nv 0000:00:05.0: version 3.5
PCI: Setting latency timer of device 0000:00:05.0 to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0x000000000001c480 ctl 0x000000000001c402 bmdma 0x000000000001bc00 irq 5
ata2: SATA max UDMA/133 cmd 0x000000000001c080 ctl 0x000000000001c002 bmdma 0x000000000001bc08 irq 5
ata1: SATA link down (SStatus 0 SControl 300)
ata2: SATA link down (SStatus 0 SControl 300)
PCI: Setting latency timer of device 0000:00:05.1 to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0x000000000001b880 ctl 0x000000000001b802 bmdma 0x000000000001b080 irq 10
ata4: SATA max UDMA/133 cmd 0x000000000001b480 ctl 0x000000000001b402 bmdma 0x000000000001b088 irq 10
ata3: SATA link down (SStatus 0 SControl 300)
ata4: SATA link down (SStatus 0 SControl 300)
PCI: Setting latency timer of device 0000:00:05.2 to 64
scsi4 : sata_nv
scsi5 : sata_nv
ata5: SATA max UDMA/133 cmd 0x000000000001b000 ctl 0x000000000001ac02 bmdma 0x000000000001a480 irq 10
ata6: SATA max UDMA/133 cmd 0x000000000001a880 ctl 0x000000000001a802 bmdma 0x000000000001a488 irq 10
ata5: SATA link down (SStatus 0 SControl 300)
ata6: SATA link down (SStatus 0 SControl 300)
Fusion MPT base driver 3.04.05
Copyright (c) 1999-2007 LSI Logic Corporation
Fusion MPT SPI Host driver 3.04.05
ieee1394: raw1394: /dev/raw1394 device initialized
PCI: Setting latency timer of device 0000:00:02.1 to 64
ehci_hcd 0000:00:02.1: EHCI Host Controller
ehci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:02.1: debug port 1
PCI: cache line size of 64 is not supported by device 0000:00:02.1
ehci_hcd 0000:00:02.1: irq 11, io mem 0xfe9fac00
ehci_hcd 0000:00:02.1: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 10 ports detected
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
PCI: Setting latency timer of device 0000:00:02.0 to 64
ohci_hcd 0000:00:02.0: OHCI Host Controller
ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:02.0: irq 10, io mem 0xfe9fb000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 10 ports detected
USB Universal Host Controller Interface driver v3.0
usbcore: registered new interface driver usblp
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
oprofile: using NMI interrupt.
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
powernow-k8: Found 1 AMD Processor model unknown processors (2 cpu cores) (version 2.00.00)
powernow-k8: MP systems not supported by PSB BIOS structure
powernow-k8: MP systems not supported by PSB BIOS structure
eth1: no link during initialization.
ADDRCONF(NETDEV_UP): eth1: link is not ready
Sending DHCP requests ., OK
IP-Config: Got DHCP answer from 89.16.168.148, my address is 89.16.168.221
IP-Config: Complete:
      device=eth0, addr=89.16.168.221, mask=255.255.255.128, gw=89.16.168.129,
     host=89.16.168.221, domain=office.bytemark.co.uk, nis-domain=(none),
     bootserver=89.16.168.148, rootserver=80.68.87.201, rootpath=
Looking up port of RPC 100003/3 on 80.68.87.201
Looking up port of RPC 100005/3 on 80.68.87.201
VFS: Mounted root (nfs filesystem) readonly.
Freeing unused kernel memory: 328k freed
eth0: no IPv6 routers present
Comment 11 Alex Howells (RETIRED) gentoo-dev 2007-10-31 11:57:04 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Other random suggestions: try compiling forcedeth as a module and seeing if the
> > dma_64bit=0, msi=0 and/or msix=0 parameters help.

dma_64bit=0        <<-- fails
msi=0              <<-- fails
msix=0             <<-- fails
msi=0 + msix=0     <<-- fails

Looks like all of those fail.
Comment 12 Daniel Drake (RETIRED) gentoo-dev 2007-11-02 12:36:03 UTC
2 more things to try:

In drivers/net/forcedeth.c, near the top of the file, you see:

#if 0
#define dprintk			printk

change that to "#if 1"

Secondly, compile your kernel with CONFIG_MAGIC_SYSRQ, and when the freeze happens, press some of the sysrq keys like:
alt+sysrq+t
alt+sysrq+b

Does the system respond in any way?
Comment 13 Alex Howells (RETIRED) gentoo-dev 2007-11-05 11:39:43 UTC
(In reply to comment #12)
> 2 more things to try:
[.snip...]
> Secondly, compile your kernel with CONFIG_MAGIC_SYSRQ, and when the freeze
> happens, press some of the sysrq keys like:
> alt+sysrq+t
> alt+sysrq+b
> 
> Does the system respond in any way?

Already had this one enabled as it's mighty useful with the serial line ;)  Unfortunately neither of these has any effect.

Going to make that change to forcedeth.c now and will update again shortly.

Comment 14 Alex Howells (RETIRED) gentoo-dev 2007-11-05 12:20:24 UTC
(In reply to comment #12)
> In drivers/net/forcedeth.c, near the top of the file, you see:
> 
> #if 0
> #define dprintk                 printk
> 
> change that to "#if 1"
> 

Seems to cause a warning during compile:

  CC      drivers/net/forcedeth.o
drivers/net/forcedeth.c: In function ‘nv_probe’:
drivers/net/forcedeth.c:5038: warning: format ‘%ld’ expects type ‘long int’, but argument 5 has type ‘resource_size_t’

Boot sequence is the same except there's a bit more output:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
forcedeth: using HIGHDMA
0000:00:08.0: link timer on.
0000:00:08.0: mgmt unit is running. mac in use 40000000.

Does that help at all?
Comment 15 Alex Howells (RETIRED) gentoo-dev 2007-11-05 12:45:24 UTC
With timing information as requested:

[   35.309834] forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
[   35.317568] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
[   35.323305] ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
[   35.332026] forcedeth: using HIGHDMA
[   35.335620] 0000:00:08.0: link timer on.
[   35.339537] 0000:00:08.0: mgmt unit is running. mac in use 40000000.

Now I'll wait until it gets past this point and hopefully we'll know how long it took to eventually try and boot ;)
Comment 16 Alex Howells (RETIRED) gentoo-dev 2007-11-05 16:17:56 UTC
(In reply to comment #15)
> With timing information as requested:
> 
> [   35.309834] forcedeth.c: Reverse Engineered nForce ethernet driver. Version
> 0.60.
> [   35.317568] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
> [   35.323305] ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23
> (level, low) -> IRQ 23
> [   35.332026] forcedeth: using HIGHDMA
> [   35.335620] 0000:00:08.0: link timer on.
> [   35.339537] 0000:00:08.0: mgmt unit is running. mac in use 40000000.
> 
> Now I'll wait until it gets past this point and hopefully we'll know how long
> it took to eventually try and boot ;)

Never seemed to get past this point this time, or at least, it's still there!
Comment 17 Daniel Drake (RETIRED) gentoo-dev 2007-11-05 16:30:48 UTC
Created attachment 135255
debug patch

OK. Kill that session then, and apply this patch.
It adds in more debug messages, please see where it stops now.
Comment 18 Alex Howells (RETIRED) gentoo-dev 2007-11-05 16:47:11 UTC
(In reply to comment #17)
> Created an attachment (id=135255)
> debug patch
> 
> OK. Kill that session then, and apply this patch.
> It adds in more debug messages, please see where it stops now.

[   35.440104] forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
[   35.447865] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
[   35.453601] ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
[   35.462315] forcedeth: using HIGHDMA
[   35.465909] 0000:00:08.0: link timer on.
[   35.469825] 0000:00:08.0: mgmt unit is running. mac in use 40000000.
[   35.476160] in loop, i=0
[   40.518953] in loop, i=1
[   45.560230] in loop, i=2
[   50.601507] in loop, i=3
[   55.642784] in loop, i=4
[   60.684062] in loop, i=5
[   65.725339] in loop, i=6
[   70.766616] in loop, i=7
[   75.807894] in loop, i=8
[   80.849171] in loop, i=9
[   85.890448] in loop, i=10
... I presume this continues for quite some time...

That probably explains why it eventually boots if left to its own devices for long enough :)
Comment 19 Alex Howells (RETIRED) gentoo-dev 2007-11-05 17:06:02 UTC
I altered the number of loop iterations from 5000 to zero, which let the system boot perfectly happily; along the way it generated about 30,000 lines of output, judging by my serial console log, most of which looked like:

[   45.962209] 000:<7>eth0: nv_nic_irq_optimized
[   45.968062]  00 00 5e 00 01 96 00 30 48 60 8d 54 08 00 45 00
[   45.973786] 010:<7>eth0: nv_nic_irq_optimized
[   45.978341]  00 94 26 fa 40 00 40 06 69 64 59 10 a8 e8 50 44
[   45.984070] 020:<7>eth0: nv_nic_irq_optimized
[   45.988623]  57 c9 92 21 00 6f 78 13 c1 df ff c0 43 c7 80 18
[   45.994352] 030:<7>eth0: nv_nic_irq_optimized
[   45.998905]  00 2e aa 8c 00 00 01 01 08 0a ff fe e7 5a 02 ff
[   46.027898] 000: 00 30 48 60 8d 54 00 1c 58 09 86 e4 08 00 45 00
[   46.033988] 010: 00 34 e0 81 40 00 3c 06 b4 3c 50 44 57 c9 59 10
[   46.040266] 020: a8 e8 00 6f 92 21 ff c0 43 c7 78 13 c2 3f 80 10
[   46.046545] 030: 00 2e 2e 73 00 00 01 01 08 0a 02 ff a3 51 ff fe
[   46.053018] 000: 00 30 48 60 8d 54 00 1c 58 09 86 e4 08 00 45 00
[   46.059119] 010: 00 54 e0 82 40 00 3c 06 b4 1b 50 44 57 c9 59 10
[   46.065406] 020: a8 e8 00 6f 92 21 ff c0 43 c7 78 13 c2 3f 80 18
[   46.071694] 030: 00 2e 2d 91 00 00 01 01 08 0a 02 ff a3 51 ff fe

Eventually it ended up on my network boot environment, yipppeee!

Disabling the printk made it boot up just fine as well, albeit without the 30,000 lines of spam ;)  My kernel is configured to grab a DHCP lease on boot, and then mount an nfsroot - so I can only assume the NIC is working okay!
Comment 20 Daniel Drake (RETIRED) gentoo-dev 2007-11-05 17:28:30 UTC
OK, please now open an upstream bug at http://bugzilla.kernel.org against forcedeth in 2.6.24-rc1, titled "forcedeth causes 7 hour boot delay".

Give a brief description of the problem, and the dmesg from comment #14. Leave it at that - I will add the details in a comment immediately after you file it.

Please post the new bug URL here when done.
Comment 21 Alex Howells (RETIRED) gentoo-dev 2007-11-05 17:43:39 UTC
http://bugzilla.kernel.org/show_bug.cgi?id=9308
Comment 22 Daniel Drake (RETIRED) gentoo-dev 2007-11-27 15:02:46 UTC
The upstream patch has been merged into the netdev tree.
Comment 23 Daniel Drake (RETIRED) gentoo-dev 2007-11-27 18:34:13 UTC
Fixed in gentoo-sources-2.6.23-r3. Thanks for your help digging into this one!