I am experiencing a complete freeze on boot with a certain network interface card. It's based on the nForce chipset, of which we have 5-6 different variants in production use, and we operate a network boot setup with tftpd-hpa + nfsroot. All of the other nForce variants, plus other hardware (VIA, e1000), work great; it's only this Supermicro AS1011M-T2+ (and family) that doesn't work at all.

Weirdly enough, our old Debian-based network boot environment, which is soon to be deprecated, boots up fine, so I've included its kernel .config for reference.

I have tried kernels from old to new, including gentoo-sources and vanilla-sources. I also grabbed kernels directly from kernel.org: everything from 2.6.16 through .24 has been tested in various forms, and all fail with the same error :(

[.snip.... kernel boot sequence]
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: module loaded
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2
Copyright (c) 1999-2006 Intel Corporation.
e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
forcedeth: using HIGHDMA
< FREEZES FOREVER AT THIS POINT >

My kernel is monolithic; the kernels sourced from Debian are not. I'm including the configurations from both in the hope that it helps :)

Gentoo Kernel (2.6.23) : http://rafb.net/p/tkGpQ130.html
Debian Kernel : http://rafb.net/p/wPOILN44.html

Portage 2.1.3.16 (default-linux/amd64/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.18-5-amd64 x86_64)
=================================================================
System uname: 2.6.18-5-amd64 x86_64 Intel(R) Xeon(R) CPU 5110 @ 1.60GHz
Timestamp of tree: Mon, 29 Oct 2007 04:50:01 +0000
app-shells/bash: 3.2_p17
dev-lang/python: 2.4.4-r6
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.9-r2
sys-apps/sandbox: 1.2.18.1-r2
sys-devel/autoconf: 2.13, 2.61-r1
sys-devel/automake: 1.5, 1.9.6-r2, 1.10
sys-devel/binutils: 2.18-r1
sys-devel/gcc-config: 1.3.16
sys-devel/libtool: 1.5.24
virtual/os-headers: 2.6.22-r2
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-mtune=k8 -O2 -pipe -fforce-addr"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-mtune=k8 -O2 -pipe -fforce-addr"
DISTDIR="/usr/portage/distfiles"
FEATURES="collision-protect distlocks metadata-transfer parallel-fetch sandbox sfperms strict unmerge-orphans userfetch userpriv usersandbox"
GENTOO_MIRRORS="http://80.68.87.201/gentoo"
LANG="en_GB.UTF-8"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://80.68.87.201/gentoo-portage"
USE="amd64 bash-completion berkdb bzip2 crypt gdbm ipv6 ncurses nls nptl nptlonly pam pcre perl python readline snmp ssl tcpd unicode zlib"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol"
ELIBC="glibc"
INPUT_DEVICES="keyboard mouse evdev"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
USERLAND="GNU"
VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i810 mach64 mga neomagic nv r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

Reproducible: Always
I have just booted up one of our bigger chassis. It's a 2U variant with 8 drives; the motherboards are virtually identical except that these have 2 x CPU sockets. The chipsets are *meant* to be the same, yet this one boots and my 1U ones don't:

http://rafb.net/p/3SHEVf44.html

The above log is a snippet from the boot process of a working system :) Obviously it'd be nice if the 1U boxes started looking like that! Let me know how I can help you guys debug it.
Just a random idea before we dig further, have you tried disabling CONFIG_PCI_MMCONFIG? How about booting with acpi=off?
Also, for what it's worth, this is the hardware:

http://www.supermicro.com/Aplus/system/1U/1011/AS-1011M-T2.cfm <-- Fails.
http://www.supermicro.com/Aplus/system/1U/1021/AS-1021M-T2+V.cfm <-- Fails.
http://www.supermicro.com/Aplus/system/2U/2021/AS-2021M-T2R+V.cfm <-- Works.

All our cold-swap chassis, which work fine, are Gigabyte GA-M61PM-S2 (rev 2.0) boards for the most part :)
(In reply to comment #2)
> Just a random idea before we dig further, have you tried disabling
> CONFIG_PCI_MMCONFIG? How about booting with acpi=off?

Just checked out both of these options: acpi=off definitely fails, and compiling without CONFIG_PCI_MMCONFIG also fails.

As an aside, I have tried 'noapic' and 'irqpoll' too, just on the off chance... None of the usual suspects seem to make my box boot properly.

Tomorrow I'm going to give it a shot with a standard x86 kernel to try and isolate whether this issue is x86_64 specific or not :)
Created attachment 134747
working boot sequence from a 2021M chassis

Not sure how long rafb.net preserves pastes, so I'll attach to the bug.
Created attachment 134749
gentoo 2.6.23 kernel configuration

Not sure how long rafb.net preserves pastes, so I'll attach to the bug.
Created attachment 134750
debian kernel configuration

Not sure how long rafb.net preserves pastes, so I'll attach to the bug.
Other random suggestions: try compiling forcedeth as a module and seeing if the dma_64bit=0, msi=0 and/or msix=0 parameters help.
(In reply to comment #8)
> Other random suggestions: try compiling forcedeth as a module and seeing if the
> dma_64bit=0, msi=0 and/or msix=0 parameters help.

Thanks for the suggestions. Removing the [*] for forcedeth entirely makes the system boot fine, except that it can't get a DHCP lease and thus cannot mount the NFS root and continue booting :) This suggests it's definitely a problem with forcedeth, as opposed to whatever might be loading straight after it...

Compiling as a module, and even letting genkernel do its thing and using an initrd, yields much the same result: the system boots but I can't use nfsroot.

I'll try those other parameters in a second, when I arrive at work. :-)
> > I'll try those other parameters in a second, when I arrive at work. :-)
>

When I arrived at work the box was booted, and very roughly judging by uptime, I'd have to say it hung as stated above for 2-3 hours.

dmesg output as requested:

Linux version 2.6.23-gentoo (root@archangel.0wn3d.us) (gcc version 4.1.2 (Gentoo 4.1.2 p1.0.1)) #2 SMP Tue Oct 30 21:55:45 GMT 2007
Command line: console=ttyS0,115200 acpi=off ip=dhcp root=/dev/nfs nfsroot=80.68.87.201:/mirror/bootstrap/netboot/gentoo-amd64,tcp,nolock,nfsvers=3 BOOT_IMAGE=pxelinux.misc/vmlinuz-2.6.23-gentoo
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dffd0000 (usable)
BIOS-e820: 00000000dffd0000 - 00000000dffde000 (ACPI data)
BIOS-e820: 00000000dffde000 - 00000000e0000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000200000000 (usable)
Entering add_active_range(0, 0, 159) 0 entries of 3200 used
Entering add_active_range(0, 256, 917456) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 2097152) 2 entries of 3200 used
end_pfn_map = 2097152
DMI present.
Scanning NUMA topology in Northbridge 24
CPU has 2 num_cores
No NUMA configuration found
Faking a node at 0000000000000000-0000000200000000
Entering add_active_range(0, 0, 159) 0 entries of 3200 used
Entering add_active_range(0, 256, 917456) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 2097152) 2 entries of 3200 used
Bootmem setup node 0 0000000000000000-0000000200000000
Zone PFN ranges:
DMA 0 -> 4096
DMA32 4096 -> 1048576
Normal 1048576 -> 2097152
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
0: 0 -> 159
0: 256 -> 917456
0: 1048576 -> 2097152
On node 0 totalpages: 1965935
DMA zone: 56 pages used for memmap
DMA zone: 1674 pages reserved
DMA zone: 2269 pages, LIFO batch:0
DMA32 zone: 14280 pages used for memmap
DMA32 zone: 899080 pages, LIFO batch:31
Normal zone: 14336 pages used for memmap
Normal zone: 1034240 pages, LIFO batch:31
Movable zone: 0 pages used for memmap
Nvidia board detected. Ignoring ACPI timer override.
If you got timer trouble try acpi_use_timer_override
Intel MultiProcessor Specification v1.4
MPTABLE: OEM ID: nVidia
MPTABLE: Product ID: MCP55
MPTABLE: APIC at: 0xFEE00000
Processor #0 (Bootup-CPU)
Processor #1
I/O APIC #2 at 0xFEC00000.
Setting APIC routing to flat
Processors: 2
swsusp: Registered nosave memory region: 000000000009f000 - 00000000000a0000
swsusp: Registered nosave memory region: 00000000000a0000 - 00000000000e4000
swsusp: Registered nosave memory region: 00000000000e4000 - 0000000000100000
swsusp: Registered nosave memory region: 00000000dffd0000 - 00000000dffde000
swsusp: Registered nosave memory region: 00000000dffde000 - 00000000e0000000
swsusp: Registered nosave memory region: 00000000e0000000 - 00000000fec00000
swsusp: Registered nosave memory region: 00000000fec00000 - 00000000fec01000
swsusp: Registered nosave memory region: 00000000fec01000 - 00000000fee00000
swsusp: Registered nosave memory region: 00000000fee00000 - 00000000fef00000
swsusp: Registered nosave memory region: 00000000fef00000 - 00000000ff780000
swsusp: Registered nosave memory region: 00000000ff780000 - 0000000100000000
Allocating PCI resources starting at e2000000 (gap: e0000000:1ec00000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 35048 bytes of per cpu data
Built 1 zonelists in Zone order. Total pages: 1935589
Policy zone: Normal
Kernel command line: console=ttyS0,115200 acpi=off ip=dhcp root=/dev/nfs nfsroot=80.68.87.201:/mirror/bootstrap/netboot/gentoo-amd64,tcp,nolock,nfsvers=3 BOOT_IMAGE=pxelinux.misc/vmlinuz-2.6.23-gentoo
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Marking TSC unstable due to TSCs unsynchronized
time.c: Detected 2613.450 MHz processor.
Console: colour VGA+ 80x25
console [ttyS0] enabled
Checking aperture...
CPU 0: aperture @ 7fc4000000 size 32 MB
Aperture too small (32 MB)
No AGP bridge found
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Memory: 7676228k/8388608k available (3400k kernel code, 187512k reserved, 1911k data, 328k init)
Calibrating delay using timer specific routine.. 5231.04 BogoMIPS (lpj=10462093)
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0/0 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
SMP alternatives: switching to UP code
ExtINT not setup in hardware but reported by MP table
Using local APIC timer interrupts.
result 12564682
Detected 12.564 MHz APIC timer.
SMP alternatives: switching to SMP code
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 5227.15 BogoMIPS (lpj=10454309)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
AMD Processor model unknown stepping 03
Brought up 2 CPUs
NET: Registered protocol family 16
PCI: Using configuration type 1
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
SCSI subsystem initialized
libata version 2.21 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Transparent bridge - 0000:00:06.0
PCI: Using IRQ router default [10de/0364] at 0000:00:01.0
PCI->APIC IRQ transform: 0000:00:01.1[A] -> IRQ 10
PCI->APIC IRQ transform: 0000:00:02.0[A] -> IRQ 10
PCI->APIC IRQ transform: 0000:00:02.1[B] -> IRQ 11
PCI->APIC IRQ transform: 0000:00:05.0[A] -> IRQ 5
PCI->APIC IRQ transform: 0000:00:05.1[B] -> IRQ 10
PCI->APIC IRQ transform: 0000:00:05.2[C] -> IRQ 10
PCI->APIC IRQ transform: 0000:00:08.0[A] -> IRQ 11
PCI->APIC IRQ transform: 0000:00:09.0[A] -> IRQ 5
PCI->APIC IRQ transform: 0000:01:06.0[A] -> IRQ 10
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ 4000000 size 65536 KB
PCI-DMA: using GART IOMMU.
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
PCI: Bridge: 0000:00:06.0
IO window: d000-dfff
MEM window: fea00000-feafffff
PREFETCH window: f0000000-f7ffffff
PCI: Bridge: 0000:00:0a.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0b.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0c.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0d.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0e.0
IO window: disabled.
MEM window: disabled.
PREFETCH window: disabled.
PCI: Bridge: 0000:00:0f.0
IO window: e000-efff
MEM window: feb00000-febfffff
PREFETCH window: fc000000-fdffffff
PCI: Setting latency timer of device 0000:00:06.0 to 64
PCI: Setting latency timer of device 0000:00:0a.0 to 64
PCI: Setting latency timer of device 0000:00:0b.0 to 64
PCI: Setting latency timer of device 0000:00:0c.0 to 64
PCI: Setting latency timer of device 0000:00:0d.0 to 64
PCI: Setting latency timer of device 0000:00:0e.0 to 64
PCI: Setting latency timer of device 0000:00:0f.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
TCP established hash table entries: 1048576 (order: 12, 25165824 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 1048576 bind 65536)
TCP reno registered
Total HugeTLB memory allocated, 0
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
Boot video device is 0000:01:06.0
PCI: Setting latency timer of device 0000:00:0a.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0a.0:pcie00]
PCI: Setting latency timer of device 0000:00:0b.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0b.0:pcie00]
PCI: Setting latency timer of device 0000:00:0c.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0c.0:pcie00]
PCI: Setting latency timer of device 0000:00:0d.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0d.0:pcie00]
PCI: Setting latency timer of device 0000:00:0e.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0e.0:pcie00]
PCI: Setting latency timer of device 0000:00:0f.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:0f.0:pcie00]
Real Time Clock Driver v1.12ac
Linux agpgart interface v0.102
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: module loaded
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2
Copyright (c) 1999-2006 Intel Corporation.
e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
PCI: Setting latency timer of device 0000:00:08.0 to 64
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:08.0
PCI: Setting latency timer of device 0000:00:09.0 to 64
forcedeth: using HIGHDMA
eth1: forcedeth.c: subsystem: 010de:cb84 bound to 0000:00:09.0
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
netconsole: not configured, aborting
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE-MCP55: IDE controller at PCI slot 0000:00:04.0
NFORCE-MCP55: chipset revision 161
NFORCE-MCP55: not 100% native mode: will probe irqs later
NFORCE-MCP55: 0000:00:04.0 (rev a1) UDMA133 controller
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:DMA
Probing IDE interface ide0...
hdb: CD-224E-N, ATAPI CD/DVD-ROM drive
hdb: selected mode 0x42
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdb: ATAPI 24X CD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
sata_nv 0000:00:05.0: version 3.5
PCI: Setting latency timer of device 0000:00:05.0 to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0x000000000001c480 ctl 0x000000000001c402 bmdma 0x000000000001bc00 irq 5
ata2: SATA max UDMA/133 cmd 0x000000000001c080 ctl 0x000000000001c002 bmdma 0x000000000001bc08 irq 5
ata1: SATA link down (SStatus 0 SControl 300)
ata2: SATA link down (SStatus 0 SControl 300)
PCI: Setting latency timer of device 0000:00:05.1 to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0x000000000001b880 ctl 0x000000000001b802 bmdma 0x000000000001b080 irq 10
ata4: SATA max UDMA/133 cmd 0x000000000001b480 ctl 0x000000000001b402 bmdma 0x000000000001b088 irq 10
ata3: SATA link down (SStatus 0 SControl 300)
ata4: SATA link down (SStatus 0 SControl 300)
PCI: Setting latency timer of device 0000:00:05.2 to 64
scsi4 : sata_nv
scsi5 : sata_nv
ata5: SATA max UDMA/133 cmd 0x000000000001b000 ctl 0x000000000001ac02 bmdma 0x000000000001a480 irq 10
ata6: SATA max UDMA/133 cmd 0x000000000001a880 ctl 0x000000000001a802 bmdma 0x000000000001a488 irq 10
ata5: SATA link down (SStatus 0 SControl 300)
ata6: SATA link down (SStatus 0 SControl 300)
Fusion MPT base driver 3.04.05
Copyright (c) 1999-2007 LSI Logic Corporation
Fusion MPT SPI Host driver 3.04.05
ieee1394: raw1394: /dev/raw1394 device initialized
PCI: Setting latency timer of device 0000:00:02.1 to 64
ehci_hcd 0000:00:02.1: EHCI Host Controller
ehci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:02.1: debug port 1
PCI: cache line size of 64 is not supported by device 0000:00:02.1
ehci_hcd 0000:00:02.1: irq 11, io mem 0xfe9fac00
ehci_hcd 0000:00:02.1: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 10 ports detected
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
PCI: Setting latency timer of device 0000:00:02.0 to 64
ohci_hcd 0000:00:02.0: OHCI Host Controller
ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:02.0: irq 10, io mem 0xfe9fb000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 10 ports detected
USB Universal Host Controller Interface driver v3.0
usbcore: registered new interface driver usblp
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
usbcore: registered new interface driver usbhid
drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
oprofile: using NMI interrupt.
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
powernow-k8: Found 1 AMD Processor model unknown processors (2 cpu cores) (version 2.00.00)
powernow-k8: MP systems not supported by PSB BIOS structure
powernow-k8: MP systems not supported by PSB BIOS structure
eth1: no link during initialization.
ADDRCONF(NETDEV_UP): eth1: link is not ready
Sending DHCP requests ., OK
IP-Config: Got DHCP answer from 89.16.168.148, my address is 89.16.168.221
IP-Config: Complete: device=eth0, addr=89.16.168.221, mask=255.255.255.128, gw=89.16.168.129, host=89.16.168.221, domain=office.bytemark.co.uk, nis-domain=(none), bootserver=89.16.168.148, rootserver=80.68.87.201, rootpath=
Looking up port of RPC 100003/3 on 80.68.87.201
Looking up port of RPC 100005/3 on 80.68.87.201
VFS: Mounted root (nfs filesystem) readonly.
Freeing unused kernel memory: 328k freed
eth0: no IPv6 routers present
(In reply to comment #9)
> (In reply to comment #8)
> > Other random suggestions: try compiling forcedeth as a module and seeing if the
> > dma_64bit=0, msi=0 and/or msix=0 parameters help.

dma_64bit=0 <<-- fails
msi=0 <<-- fails
msix=0 <<-- fails
msi=0 + msix=0 <<-- fails

Looks like all of those fail.
2 more things to try:

In drivers/net/forcedeth.c, near the top of the file, you see:

#if 0
#define dprintk printk

change that to "#if 1"

Secondly, compile your kernel with CONFIG_MAGIC_SYSRQ, and when the freeze happens, press some of the sysrq keys like:
alt+sysrq+t
alt+sysrq+b

Does the system respond in any way?
(In reply to comment #12)
> 2 more things to try:
[.snip...]
> Secondly, compile your kernel with CONFIG_MAGIC_SYSRQ, and when the freeze
> happens, press some of the sysrq keys like:
> alt+sysrq+t
> alt+sysrq+b
>
> Does the system respond in any way?

Already had this one enabled as it's mighty useful with the serial line ;) Unfortunately neither of these has any effect.

Going to make that change to forcedeth.c now and will update again shortly.
(In reply to comment #12)
> In drivers/net/forcedeth.c, near the top of the file, you see:
>
> #if 0
> #define dprintk printk
>
> change that to "#if 1"
>

Seems to cause a warning during compile:

CC drivers/net/forcedeth.o
drivers/net/forcedeth.c: In function ‘nv_probe’:
drivers/net/forcedeth.c:5038: warning: format ‘%ld’ expects type ‘long int’, but argument 5 has type ‘resource_size_t’

Boot sequence is the same except there's a bit more output:

forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
forcedeth: using HIGHDMA
0000:00:08.0: link timer on.
0000:00:08.0: mgmt unit is running. mac in use 40000000.

Does that help at all?
With timing information as requested:

[ 35.309834] forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
[ 35.317568] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
[ 35.323305] ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
[ 35.332026] forcedeth: using HIGHDMA
[ 35.335620] 0000:00:08.0: link timer on.
[ 35.339537] 0000:00:08.0: mgmt unit is running. mac in use 40000000.

Now I'll wait until it gets past this point and hopefully we'll know how long it took to eventually try and boot ;)
(In reply to comment #15)
> With timing information as requested:
>
> [ 35.309834] forcedeth.c: Reverse Engineered nForce ethernet driver. Version
> 0.60.
> [ 35.317568] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
> [ 35.323305] ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23
> (level, low) -> IRQ 23
> [ 35.332026] forcedeth: using HIGHDMA
> [ 35.335620] 0000:00:08.0: link timer on.
> [ 35.339537] 0000:00:08.0: mgmt unit is running. mac in use 40000000.
>
> Now I'll wait until it gets past this point and hopefully we'll know how long
> it took to eventually try and boot ;)

Never seemed to get past this point this time, or at least, it's still there!
Created attachment 135255
debug patch

OK. Kill that session then, and apply this patch.
It adds in more debug messages; please see where it stops now.
(In reply to comment #17)
> Created an attachment (id=135255)
> debug patch
>
> OK. Kill that session then, and apply this patch.
> It adds in more debug messages, please see where it stops now.

[ 35.440104] forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
[ 35.447865] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 23
[ 35.453601] ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 23 (level, low) -> IRQ 23
[ 35.462315] forcedeth: using HIGHDMA
[ 35.465909] 0000:00:08.0: link timer on.
[ 35.469825] 0000:00:08.0: mgmt unit is running. mac in use 40000000.
[ 35.476160] in loop, i=0
[ 40.518953] in loop, i=1
[ 45.560230] in loop, i=2
[ 50.601507] in loop, i=3
[ 55.642784] in loop, i=4
[ 60.684062] in loop, i=5
[ 65.725339] in loop, i=6
[ 70.766616] in loop, i=7
[ 75.807894] in loop, i=8
[ 80.849171] in loop, i=9
[ 85.890448] in loop, i=10
...

I presume this continues for quite some time... Each iteration takes about 5 seconds, which probably explains why it eventually boots if left to its own devices forever :)
I altered the number of loop iterations from 5000 to zero, which made the system boot perfectly happily; along the way it generated about 30,000 lines of output judging by my serial console log, most of which was like:

[ 45.962209] 000:<7>eth0: nv_nic_irq_optimized
[ 45.968062] 00 00 5e 00 01 96 00 30 48 60 8d 54 08 00 45 00
[ 45.973786] 010:<7>eth0: nv_nic_irq_optimized
[ 45.978341] 00 94 26 fa 40 00 40 06 69 64 59 10 a8 e8 50 44
[ 45.984070] 020:<7>eth0: nv_nic_irq_optimized
[ 45.988623] 57 c9 92 21 00 6f 78 13 c1 df ff c0 43 c7 80 18
[ 45.994352] 030:<7>eth0: nv_nic_irq_optimized
[ 45.998905] 00 2e aa 8c 00 00 01 01 08 0a ff fe e7 5a 02 ff
[ 46.027898] 000: 00 30 48 60 8d 54 00 1c 58 09 86 e4 08 00 45 00
[ 46.033988] 010: 00 34 e0 81 40 00 3c 06 b4 3c 50 44 57 c9 59 10
[ 46.040266] 020: a8 e8 00 6f 92 21 ff c0 43 c7 78 13 c2 3f 80 10
[ 46.046545] 030: 00 2e 2e 73 00 00 01 01 08 0a 02 ff a3 51 ff fe
[ 46.053018] 000: 00 30 48 60 8d 54 00 1c 58 09 86 e4 08 00 45 00
[ 46.059119] 010: 00 54 e0 82 40 00 3c 06 b4 1b 50 44 57 c9 59 10
[ 46.065406] 020: a8 e8 00 6f 92 21 ff c0 43 c7 78 13 c2 3f 80 18
[ 46.071694] 030: 00 2e 2d 91 00 00 01 01 08 0a 02 ff a3 51 ff fe

Eventually it ended up on my network boot environment, yipppeee!

Disabling the printk made it boot up just fine as well, albeit without the 30,000 lines of spam ;)

My kernel is configured to grab a DHCP lease on boot, and then mount an nfsroot - so I can only assume the NIC is working okay!
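For anyone wanting to read the hex dump rather than take it on trust, here is a minimal Python sketch that decodes one of the dumped frames. It assumes each dump starts at the Ethernet header and carries an IPv4/TCP packet; the helper name `mac` and the variable names are illustrative only and are not part of the driver.

```python
import ipaddress
import struct

# First 64 bytes of one frame, copied from the "000:".."030:" dump lines above.
frame = bytes.fromhex(
    "00 30 48 60 8d 54 00 1c 58 09 86 e4 08 00 45 00 "
    "00 34 e0 81 40 00 3c 06 b4 3c 50 44 57 c9 59 10 "
    "a8 e8 00 6f 92 21 ff c0 43 c7 78 13 c2 3f 80 10 "
    "00 2e 2e 73 00 00 01 01 08 0a 02 ff a3 51 ff fe "
)


def mac(raw: bytes) -> str:
    """Format six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


# Ethernet header: destination MAC, source MAC, EtherType.
dst_mac = mac(frame[0:6])
src_mac = mac(frame[6:12])
(ethertype,) = struct.unpack("!H", frame[12:14])

# IPv4 header (assumed, because EtherType is 0x0800).
ip_header = frame[14:34]
ip_header_len = (ip_header[0] & 0x0F) * 4  # IHL field, in bytes
protocol = ip_header[9]                    # 6 means TCP
src_ip = ipaddress.IPv4Address(ip_header[12:16])
dst_ip = ipaddress.IPv4Address(ip_header[16:20])

# TCP header: the first four bytes are the source and destination ports.
tcp_start = 14 + ip_header_len
src_port, dst_port = struct.unpack("!HH", frame[tcp_start:tcp_start + 4])

print(f"{src_mac} -> {dst_mac}, ethertype 0x{ethertype:04x}")
print(f"{src_ip}:{src_port} -> {dst_ip}:{dst_port}, protocol {protocol}")
```

The source address decodes to 80.68.87.201, the same host given as the nfsroot server on the kernel command line, which fits the conclusion that the NIC is passing real traffic.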
OK. Please now open an upstream bug against 2.6.24-rc1 at http://bugzilla.kernel.org against forcedeth, titled "forcedeth causes 7 hour boot delay".

Give a brief description of the problem, and the dmesg from comment #14. Leave it at that - I will add the details in a comment immediately after you file it.

Please post the new bug URL here when done.
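As a rough sanity check on the "7 hour" figure: the debug output earlier in this bug shows about 5 seconds between "in loop" messages, and the loop count was reported as 5000. The Python sketch below is plain arithmetic on those two numbers. The variable names are illustrative, and it assumes every iteration takes the same time and that the loop runs to completion.

```python
# Timestamps (seconds since boot) of the first "in loop, i=N" lines
# from the debug-patch output earlier in this bug.
loop_timestamps = [35.476160, 40.518953, 45.560230, 50.601507, 55.642784]

# Loop count reported earlier in this bug ("from 5000 to zero").
LOOP_ITERATIONS = 5000

# Average gap between consecutive iterations.
gaps = [later - earlier
        for earlier, later in zip(loop_timestamps, loop_timestamps[1:])]
seconds_per_iteration = sum(gaps) / len(gaps)

# Assumption: every iteration waits for the same delay before moving on.
total_seconds = seconds_per_iteration * LOOP_ITERATIONS
total_hours = total_seconds / 3600

print(f"{seconds_per_iteration:.2f} s per iteration")
print(f"{total_hours:.1f} hours for {LOOP_ITERATIONS} iterations")
```

With these inputs the estimate comes out at roughly 5.04 s per iteration and about 7 hours in total, which is where the bug title comes from.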
http://bugzilla.kernel.org/show_bug.cgi?id=9308
upstream patch is merged into netdev tree
fixed in gentoo-sources-2.6.23-r3, thanks for your help digging into this one