Dell PowerEdge 1400SC, ServerWorks chipset (ServerWorks CNB20LE)
Fujitsu MAN3184MP (18.2GB 10KRPM Ultra 160)
Adaptec AIC-7899P U160 onboard dual channel SCSI

Formatting large partitions when using the SMP kernel on the pentium3 Live CD results in a hard lockup. This was reproduced several times using mkraid, mkreiserfs, and mke2fs. It occurs with and without framebuffer. When running the non-SMP kernel, formatting works fine. I searched the kernel mailing lists and googled the symptoms but didn't see anything that caught my eye.

Reproducible: Always

Steps to Reproduce:
1. Boot the pentium3 Live CD in SMP mode with ide=nodma (ServerWorks bug, apparently)
2. Create 3 partitions on a SCSI drive: sda1=50M, sda2=512M, sda3=rest
3. Format sda1 as ext2, run mkswap on sda2, then format sda3 as any filesystem and it will lock up. Rebooting and attempting to format just sda3 again results in another lockup.

Actual Results: Console and network freeze; the system must be manually rebooted.

Expected Results: Partition is formatted without issue.

output of dmesg:
Linux version 2.4.20-xfs-r3 (root@ToyRoom) (gcc version 3.2.1 20021207 (Gentoo Linux 3.2.1-20021207)) #1 SMP Wed Jul 23 02:32:07 Local time zone must be set-- see zic BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 00000000000a0000 (usable) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000001fffe000 (usable) BIOS-e820: 000000001fffe000 - 0000000020000000 (reserved) BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved) BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved) 511MB LOWMEM available. ACPI: have wakeup address 0xc0002000 found SMP MP-table at 000fe710 hm, page 000fe000 reserved twice. hm, page 000ff000 reserved twice. hm, page 000f0000 reserved twice. On node 0 totalpages: 131070 zone(0): 4096 pages. zone(1): 126974 pages. zone(2): 0 pages. Intel MultiProcessor Specification v1.4 Virtual Wire compatibility mode. 
OEM ID: DELL Product ID: POWEREDGE CE APIC at: 0xFEE00000 Processor #1 Pentium(tm) Pro APIC version 17 Processor #0 Pentium(tm) Pro APIC version 17 I/O APIC #2 Version 17 at 0xFEC00000. I/O APIC #3 Version 17 at 0xFEC01000. Processors: 2 Kernel command line: splash=silent vga=791 initrd=initrd2.1024 acpi=off root=/dev/ram0 init=/linuxrc nomce BOOT_IMAGE=smp nofb ide=nodma bootsplash: silent mode. ide_setup: ide=nodmaIDE: Prevented DMA Initializing CPU#0 Detected 931.032 MHz processor. Console: colour dummy device 80x25 Calibrating delay loop... 1854.66 BogoMIPS Memory: 509836k/524280k available (2064k kernel code, 14060k reserved, -2596k data, 128k init, 0k highmem) Dentry cache hash table entries: 65536 (order: 7, 524288 bytes) Inode cache hash table entries: 32768 (order: 6, 262144 bytes) Mount-cache hash table entries: 8192 (order: 4, 65536 bytes) Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes) Page-cache hash table entries: 131072 (order: 7, 524288 bytes) Proc Config support by ptb@it.uc3m.es proc config counted 6977 bytes in names proc config counted 765 bytes in value handles CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 256K CPU: After generic, caps: 0383fbff 00000000 00000000 00000000 CPU: Common caps: 0383fbff 00000000 00000000 00000000 Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Checking 'hlt' instruction... OK. Checking for popad bug... OK. POSIX conformance testing by UNIFIX mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au) mtrr: detected mtrr type: Intel CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 256K CPU: After generic, caps: 0383fbff 00000000 00000000 00000000 CPU: Common caps: 0383fbff 00000000 00000000 00000000 CPU0: Intel Pentium III (Coppermine) stepping 0a per-CPU timeslice cutoff: 731.28 usecs. 
enabled ExtINT on CPU#0 ESR value before enabling vector: 00000040 ESR value after enabling vector: 00000000 Booting processor 1/0 eip 3000 Initializing CPU#1 masked ExtINT on CPU#1 ESR value before enabling vector: 00000000 ESR value after enabling vector: 00000000 Calibrating delay loop... 1861.22 BogoMIPS CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 256K CPU: After generic, caps: 0383fbff 00000000 00000000 00000000 CPU: Common caps: 0383fbff 00000000 00000000 00000000 CPU1: Intel Pentium III (Coppermine) stepping 0a Total of 2 processors activated (3715.89 BogoMIPS). ENABLING IO-APIC IRQs Setting 2 in the phys_id_present_map ...changing IO-APIC physical APIC ID to 2 ... ok. Setting 3 in the phys_id_present_map ...changing IO-APIC physical APIC ID to 3 ... ok. init IO_APIC IRQs IO-APIC (apicid-pin) 2-0, 2-2, 2-5, 2-11, 2-13, 3-13 not connected. ..TIMER: vector=0x31 pin1=-1 pin2=0 ...trying to set up timer (IRQ0) through the 8259A ... ..... (found pin 0) ...works. number of MP IRQ sources: 39. number of IO-APIC #2 registers: 16. number of IO-APIC #3 registers: 16. testing the IO APIC....................... IO APIC #2...... .... register #00: 02000000 ....... : physical APIC id: 02 .... register #01: 000F0011 ....... : max redirection entries: 000F ....... : PRQ implemented: 0 ....... : IO APIC version: 0011 .... register #02: 00000000 ....... : arbitration: 00 .... IRQ redirection table: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 00 003 03 0 0 0 0 0 1 1 31 01 003 03 0 0 0 0 0 1 1 39 02 000 00 1 0 0 0 0 0 0 00 03 003 03 0 0 0 0 0 1 1 41 04 003 03 0 0 0 0 0 1 1 49 05 000 00 1 0 0 0 0 0 0 00 06 003 03 0 0 0 0 0 1 1 51 07 003 03 0 0 0 0 0 1 1 59 08 003 03 0 0 0 0 0 1 1 61 09 003 03 0 0 0 0 0 1 1 69 0a 003 03 1 1 0 1 0 1 1 71 0b 000 00 1 0 0 0 0 0 0 00 0c 003 03 0 0 0 0 0 1 1 79 0d 000 00 1 0 0 0 0 0 0 00 0e 003 03 0 0 0 0 0 1 1 81 0f 003 03 0 0 0 0 0 1 1 89 IO APIC #3...... .... register #00: 03000000 ....... : physical APIC id: 03 .... 
register #01: 000F0011 ....... : max redirection entries: 000F ....... : PRQ implemented: 0 ....... : IO APIC version: 0011 .... register #02: 0D000000 ....... : arbitration: 0D .... IRQ redirection table: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 00 003 03 1 1 0 1 0 1 1 91 01 003 03 1 1 0 1 0 1 1 99 02 003 03 1 1 0 1 0 1 1 A1 03 003 03 1 1 0 1 0 1 1 A9 04 003 03 1 1 0 1 0 1 1 B1 05 003 03 1 1 0 1 0 1 1 B9 06 003 03 1 1 0 1 0 1 1 C1 07 003 03 1 1 0 1 0 1 1 C9 08 003 03 1 1 0 1 0 1 1 D1 09 003 03 1 1 0 1 0 1 1 D9 0a 003 03 1 1 0 1 0 1 1 E1 0b 003 03 1 1 0 1 0 1 1 E9 0c 003 03 1 1 0 1 0 1 1 32 0d 000 00 1 0 0 0 0 0 0 00 0e 003 03 1 1 0 1 0 1 1 3A 0f 003 03 1 1 0 1 0 1 1 42 IRQ to pin mappings: IRQ0 -> 0:0 IRQ1 -> 0:1 IRQ3 -> 0:3 IRQ4 -> 0:4 IRQ6 -> 0:6 IRQ7 -> 0:7 IRQ8 -> 0:8 IRQ9 -> 0:9 IRQ10 -> 0:10 IRQ12 -> 0:12 IRQ14 -> 0:14 IRQ15 -> 0:15 IRQ16 -> 1:0 IRQ17 -> 1:1 IRQ18 -> 1:2 IRQ19 -> 1:3 IRQ20 -> 1:4 IRQ21 -> 1:5 IRQ22 -> 1:6 IRQ23 -> 1:7 IRQ24 -> 1:8 IRQ25 -> 1:9 IRQ26 -> 1:10 IRQ27 -> 1:11 IRQ28 -> 1:12 IRQ30 -> 1:14 IRQ31 -> 1:15 .................................... done. Using local APIC timer interrupts. calibrating APIC timer ... ..... CPU clock speed is 930.9703 MHz. ..... host bus clock speed is 132.9956 MHz. cpu: 0, clocks: 1329956, slice: 443318 CPU0<T0:1329952,T1:886624,D:10,S:443318,C:1329956> cpu: 1, clocks: 1329956, slice: 443318 CPU1<T0:1329952,T1:443312,D:4,S:443318,C:1329956> checking TSC synchronization across CPUs: passed. 
Waiting on wait_init_idle (map = 0x2) All processors have done init_idle ACPI: Subsystem revision 20021212 ACPI: Disabled via command line (acpi=off) PCI: PCI BIOS revision 2.10 entry at 0xfc7ce, last bus=1 PCI: Using configuration type 1 PCI: Probing PCI hardware PCI: ACPI tables contain no PCI IRQ routing entries PCI: Probing PCI hardware (bus 00) PCI: Discovered primary peer bus 01 [IRQ] PCI: Using IRQ router ServerWorks [1166/0200] at 00:0f.0 PCI->APIC IRQ transform: (B0,I2,P0) -> 16 PCI->APIC IRQ transform: (B1,I2,P0) -> 30 PCI->APIC IRQ transform: (B1,I2,P1) -> 31 Linux NET4.0 for Linux 2.4 Based upon Swansea University Computer Society NET3.039 Initializing RT netlink socket Starting kswapd Journalled Block Device driver loaded devfs: v1.12c (20020818) Richard Gooch (rgooch@atnf.csiro.au) devfs: boot_options: 0x0 SGI XFS snapshot-xfs-2.4.20-2003-04-07_05:19_UTC with ACLs, realtime, no debug enabled SGI XFS Quota Management subsystem vesafb: framebuffer at 0xfc000000, mapped to 0xe080d000, size 4096k vesafb: mode is 1024x768x16, linelength=2048, pages=1 vesafb: protected mode interface info at c000:4a6c vesafb: scrolling: redraw vesafb: directcolor: size=0:5:6:5, shift=0:11:5:0 Looking for splash picture.... silenjpeg size 70780 bytes, found (1024x768, 20089 bytes, v3). Got silent jpeg. Got silent jpeg. 
Console: switching to colour frame buffer device 122x39 fb0: VESA VGA frame buffer device pty: 256 Unix98 ptys configured Real Time Clock Driver v1.10e Uniform Multi-Platform E-IDE driver Revision: 6.31 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx SvrWks OSB4: IDE controller on PCI bus 00 dev 79 SvrWks OSB4: chipset revision 0 SvrWks OSB4: not 100% native mode: will probe irqs later SvrWks OSB4: default first interface base=0x01f0, second interface base=0x170 ide0: BM-DMA at 0x08b0-0x08b7, BIOS settings: hda:DMA, hdb:pio ide1: BM-DMA at 0x08b8-0x08bf, BIOS settings: hdc:pio, hdd:pio hda: Hewlett-Packard CD-Writer Plus 9500, ATAPI CD/DVD-ROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 hda: ATAPI 32X CD-ROM CD-R/RW drive, 4096kB Cache Uniform CD-ROM driver Revision: 3.12 RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize Equalizer1996: $Revision: 1.2.1 $ $Date: 1996/09/22 13:52:00 $ Simon Janes (simon@ncm.com) SCSI subsystem driver Revision: 1.00 kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2 kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2 kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2 mice: PS/2 mouse device common for all mice NET4: Linux TCP/IP 1.0 for NET4.0 IP Protocols: ICMP, UDP, TCP IP: routing cache hash table of 4096 buckets, 32Kbytes TCP: Hash tables configured (established 32768 bind 32768) NET4: Unix domain sockets 1.0/SMP for Linux NET4.0. IPv6 v0.8 for NET4.0 IPv6 over IPv4 tunneling driver RAMDISK: ext2 filesystem found at block 0 RAMDISK: Loading 5000 blocks [1 disk] into ram disk... |/-\|/-\|/-\|/-\|/-\|/- \|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/- \|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/- \|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/- \|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|done. 
Freeing initrd memory: 5088k freed VFS: Mounted root (ext2 filesystem) readonly. Freeing unused kernel memory: 128k freed usb.c: registered new driver usbdevfs usb.c: registered new driver hub uhci.c: USB Universal Host Controller Interface driver v1.1 usb-ohci.c: USB OHCI at membase 0xe0c3f000, IRQ 10 usb-ohci.c: usb-00:0f.2, ServerWorks OSB4/CSB5 OHCI USB Controller usb.c: new USB bus registered, assigned bus number 1 hub.c: USB hub found hub.c: 2 ports detected usb.c: registered new driver hid hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <vojtech@suse.cz> hid-core.c: USB HID support drivers Initializing USB Mass Storage driver... usb.c: registered new driver usb-storage USB Mass Storage support registered. ISO 9660 Extensions: Microsoft Joliet Level 3 ISO 9660 Extensions: RRIP_1991A cloop: Welcome to cloop v0.68 cloop: /newroot/mnt/cdrom/livecd.cloop: 3283 blocks, 65536 bytes/block, largest block is 65562 bytes. cloop: loaded (max 1 devices) isapnp: Scanning for PnP cards... isapnp: No Plug & Play device found Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI ISAPNP enabled ttyS00 at 0x03f8 (irq = 4) is a 16550A ttyS01 at 0x02f8 (irq = 3) is a 16550A inserting floppy driver for 2.4.20-xfs-r3 Floppy drive(s): fd0 is 1.44M FDC 0 is a National Semiconductor PC87306 scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.32 <Adaptec aic7899 Ultra160 SCSI adapter> aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.32 <Adaptec aic7899 Ultra160 SCSI adapter> aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs (scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit) (scsi1:A:1): 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit) Vendor: FUJITSU Model: MAN3184MP Rev: 5507 Type: Direct-Access ANSI SCSI revision: 03 scsi0:A:0:0: Tagged Queuing enabled. 
Depth 253 Vendor: FUJITSU Model: MAN3184MP Rev: 5507 Type: Direct-Access ANSI SCSI revision: 03 scsi1:A:1:0: Tagged Queuing enabled. Depth 253 Attached scsi disk sda at scsi0, channel 0, id 0, lun 0 Attached scsi disk sdb at scsi1, channel 0, id 1, lun 0 SCSI device sda: 35566478 512-byte hdwr sectors (18210 MB) Partition check: /dev/scsi/host0/bus0/target0/lun0: p1 p2 p3 SCSI device sdb: 35566478 512-byte hdwr sectors (18210 MB) /dev/scsi/host1/bus0/target1/lun0: p1 p2 p3 eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others eth0: Intel Corp. 82557/8/9 [Ethernet Pro 100], 00:B0:D0:FC:22:A9, IRQ 16. Receiver lock-up bug exists -- enabling work-around. Board assembly 07195d-000, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. General self-test: passed. Serial sub-system self-test: passed. Internal registers self-test: passed. ROM checksum self-test: passed (0x04f4518b). Receiver lock-up workaround activated. usb-uhci.c: $Revision: 1.275 $ time 02:45:27 Jul 23 2003 usb-uhci.c: High bandwidth mode enabled usb-uhci.c: v1.275:USB Universal Host Controller Interface driver uhci.c: USB Universal Host Controller Interface driver v1.1 Looking for splash picture....<6>Looking for splash picture....<6>error 14 while decompressing picture. found (1024x768, 20089 bytes, v3). Splash status on console 0 changed to on Looking for splash picture.... found (1024x768, 20089 bytes, v3). Splash status on console 0 changed to on Looking for splash picture.... found (1024x768, 20089 bytes, v3). Splash status on console 0 changed to on eth0: no IPv6 routers present Looking for splash picture.... found (1024x768, 20089 bytes, v3). Splash status on console 0 changed to on
I'll second this one. I've seen a similar but different problem which is probably related. The hardware in question is a dual P3-600 (440GX, old VA Linux ClusterCity nodes). The non-SMP kernel doesn't boot at all, but that's acceptable. The "smp" kernel on the x86 Live CDs works fine for all purposes. The "pentium3" Live CD's SMP kernel, however, does a hard lockup with no panic message when I raidstart any software RAID5 device. Since x86 works fine and p3 has the bug, all other factors being identical, I'm guessing your P3 optimizations on the SMP kernel are causing some stability problems.
Well, non-SMP works fine for me. I did the install to stage 3 with the non-SMP kernel, then built my own kernel from the linux-gs sources with the P3 processor opts and SMP. I need to run a diff of my .config against the stock Gentoo one to see what the difference is, because my kernel runs fine now. Only the boot kernel on the Live CD locks up. I'll run the diffs tonight and report back. -Bill
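For anyone who wants to run the same kind of .config comparison, here is a minimal sketch in Python. The function names are made up for illustration, and it assumes an option that is missing from one file can be treated the same as "is not set". The sample strings in the usage note are not the actual configs from this report.

```python
import re

# Matches the "# CONFIG_FOO is not set" lines found in kernel .config files.
NOT_SET = re.compile(r"# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value} for the text of a kernel .config file.

    Options written as "# CONFIG_FOO is not set" are recorded as "n".
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = NOT_SET.match(line)
        if match:
            options[match.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(stock_text, custom_text):
    """Return {option: (stock_value, custom_value)} for options that differ.

    Assumption: an option absent from one file counts as "n" (unset).
    """
    stock = parse_config(stock_text)
    custom = parse_config(custom_text)
    changes = {}
    for name in sorted(set(stock) | set(custom)):
        before = stock.get(name, "n")
        after = custom.get(name, "n")
        if before != after:
            changes[name] = (before, after)
    return changes
```

Read both files with `open(path).read()` and pass the two strings to `diff_configs`. Each entry in the result is one option whose value differs between the Live CD kernel and the self-built one, which narrows down what to look at (SMP, processor type, the SCSI driver options).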
I've experienced strangely similar lockup problems with my PowerEdge 4300 (440GX) and a Unisys ES5045 (440NX). No problems were encountered during any phase of compiling when either system was booted off the Live CD, but after rebooting into the compiled kernel the system is subject to random freezes with no error or debugging messages. For me, the bug seems most likely to strike during compile jobs. On the PE4300, rebooting with a single processor and a terminator resulted in clean running. This could potentially be a problem only encountered with 440-series chipsets.
Correction: The ES5045 uses a 450NX chipset, not a 440NX.
Similar problems on a Dell Precision 420 (AIC-7899 dual channel, SMP). After booting the SMP kernel from the CD:
# rmmod aic7xxx
# modprobe aic7xxx_old
Seems that there are lingering problems with SMP and aic7xxx.
I had the same problem with a box I'm building from old parts. The card is an AHA-2940, and aic7xxx would reliably lock up the box under an smp kernel. 2.6.0 from kernel.org seems to be immune. Also, Mandrake 9.2 does not have this problem. I think version 6.2.33 of the aic7xxx driver has this problem (which is in the gentoo sources on the 20030911 cds), but 6.2.35 (in 2.6.0) and 6.2.36 (in gentoo 2.4.22-r1, and in Mandrake 9.2) are immune.
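To check which aic7xxx revision a given boot is running, the driver banner in dmesg can be parsed. A rough sketch in Python follows. The cutoff of 6.2.35 is only an assumption taken from the reports in this thread (6.2.33 reported bad, 6.2.35 and 6.2.36 reported good), and nothing here says anything about 6.2.34. The function names are made up for illustration.

```python
import re

# Banner format as it appears in the dmesg output earlier in this report.
BANNER = re.compile(
    r"Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev (\d+)\.(\d+)\.(\d+)"
)

# Assumption: first revision reported in this thread as not locking up.
FIRST_REPORTED_GOOD = (6, 2, 35)


def aic7xxx_revisions(dmesg_text):
    """Return the sorted list of distinct driver revisions found in dmesg."""
    found = {
        tuple(int(part) for part in match.groups())
        for match in BANNER.finditer(dmesg_text)
    }
    return sorted(found)


def looks_affected(revision):
    """True if the revision is older than the first one reported as good."""
    return revision < FIRST_REPORTED_GOOD
```

Feeding it the output of `dmesg` from the Live CD boot should pick up the "Rev 6.2.32" banner shown in the original report, which falls below the assumed cutoff.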
Do newer kernels without the aic patch help this?
Yes, I have the same problem. It locks up every time I try to format /dev/sda3. When formatting with the normal uniprocessor kernel, the job completes (as expected). When rebooted with the SMP kernel, the mount command for the third partition goes into nirvana (see #42652, my actual report).