Bug 26141 - System locks solid while formatting SCSI Drives in SMP Kernel
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Release Media
Classification: Unclassified
Component: Everything
Hardware: x86 Linux
Importance: High blocker
Assignee: Bob Johnson (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-08-07 10:55 UTC by Bill Church
Modified: 2004-02-23 15:42 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Description Bill Church 2003-08-07 10:55:33 UTC
Dell PowerEdge 1400SC, ServerWorks chipset
ServerWorks CNB20LE
Fujitsu MAN3184MP (18.2GB 10KRPM Ultra 160)
Adaptec AIC-7899P U160 onboard dual channel SCSI

Formatting large partitions when using the SMP kernel on the pentium3 Live CD 
results in a hard lockup. This was reproduced several times using mkraid, 
mkreiserfs, and mke2fs.

Occurs with and without framebuffer.

With the non-SMP kernel, formatting works fine. I searched the kernel mailing 
lists and googled the symptoms but didn't see anything that caught my eye.

Reproducible: Always
Steps to Reproduce:
1. Boot the pentium3 Live CD in SMP mode with ide=nodma (apparently needed 
because of a ServerWorks bug).
2. Create 3 partitions on a SCSI drive: sda1=50M, sda2=512M, sda3=rest.
3. Format sda1 as ext2, run mkswap on sda2, then format sda3 with any 
filesystem; the system will lock up. Rebooting and attempting to format only 
sda3 again results in another lockup.
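
For anyone trying to reproduce this, the formatting part of steps 2 and 3 can be written down as a dry run. This is only a sketch: the function name `repro_commands` and its parameters are hypothetical, the partitioning itself (done interactively in the report) is not covered, and nothing is executed, because running these commands destroys the data on the target disk.

```python
def repro_commands(disk="/dev/sda", big_fs_tool="mkreiserfs"):
    """Return the formatting commands from step 3 as argument lists.

    Nothing is executed here. `disk` and `big_fs_tool` are illustrative
    parameters; the report saw the lockup with any filesystem on sda3.
    """
    if big_fs_tool not in ("mke2fs", "mkreiserfs"):
        raise ValueError("expected one of the tools named in the report")
    return [
        ["mke2fs", disk + "1"],     # sda1 (50M) as ext2
        ["mkswap", disk + "2"],     # sda2 (512M) as swap
        [big_fs_tool, disk + "3"],  # sda3 (rest): the step that locks up
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running by hand.
    for cmd in repro_commands():
        print(" ".join(cmd))
```

Printing the commands rather than running them is deliberate: the last one is the step reported to hang the machine.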

Actual Results:  
Console and network freeze; the system must be manually rebooted.

Expected Results:  
The partition is formatted without issue.

output of dmesg:
Linux version 2.4.20-xfs-r3 (root@ToyRoom) (gcc version 3.2.1 20021207 (Gentoo 
Linux 3.2.1-20021207)) #1 SMP Wed Jul 23 02:32:07 Local time zone must be set--
see zic 
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001fffe000 (usable)
 BIOS-e820: 000000001fffe000 - 0000000020000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
511MB LOWMEM available.
ACPI: have wakeup address 0xc0002000
found SMP MP-table at 000fe710
hm, page 000fe000 reserved twice.
hm, page 000ff000 reserved twice.
hm, page 000f0000 reserved twice.
On node 0 totalpages: 131070
zone(0): 4096 pages.
zone(1): 126974 pages.
zone(2): 0 pages.
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: DELL     Product ID: POWEREDGE CE APIC at: 0xFEE00000
Processor #1 Pentium(tm) Pro APIC version 17
Processor #0 Pentium(tm) Pro APIC version 17
I/O APIC #2 Version 17 at 0xFEC00000.
I/O APIC #3 Version 17 at 0xFEC01000.
Processors: 2
Kernel command line: splash=silent vga=791 initrd=initrd2.1024 acpi=off 
root=/dev/ram0 init=/linuxrc nomce BOOT_IMAGE=smp nofb ide=nodma
bootsplash: silent mode.
ide_setup: ide=nodmaIDE: Prevented DMA
Initializing CPU#0
Detected 931.032 MHz processor.
Console: colour dummy device 80x25
Calibrating delay loop... 1854.66 BogoMIPS
Memory: 509836k/524280k available (2064k kernel code, 14060k reserved, -2596k 
data, 128k init, 0k highmem)
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
Proc Config support by ptb@it.uc3m.es
proc config counted 6977 bytes in names
proc config counted 765 bytes in value handles
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU:     After generic, caps: 0383fbff 00000000 00000000 00000000
CPU:             Common caps: 0383fbff 00000000 00000000 00000000
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
Checking for popad bug... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU:     After generic, caps: 0383fbff 00000000 00000000 00000000
CPU:             Common caps: 0383fbff 00000000 00000000 00000000
CPU0: Intel Pentium III (Coppermine) stepping 0a
per-CPU timeslice cutoff: 731.28 usecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000040
ESR value after enabling vector: 00000000
Booting processor 1/0 eip 3000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 1861.22 BogoMIPS
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU:     After generic, caps: 0383fbff 00000000 00000000 00000000
CPU:             Common caps: 0383fbff 00000000 00000000 00000000
CPU1: Intel Pentium III (Coppermine) stepping 0a
Total of 2 processors activated (3715.89 BogoMIPS).
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
Setting 3 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 3 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-2, 2-5, 2-11, 2-13, 3-13 not connected.
..TIMER: vector=0x31 pin1=-1 pin2=0
...trying to set up timer (IRQ0) through the 8259A ... 
..... (found pin 0) ...works.
number of MP IRQ sources: 39.
number of IO-APIC #2 registers: 16.
number of IO-APIC #3 registers: 16.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.... register #01: 000F0011
.......     : max redirection entries: 000F
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 00000000
.......     : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 003 03  0    0    0   0   0    1    1    31
 01 003 03  0    0    0   0   0    1    1    39
 02 000 00  1    0    0   0   0    0    0    00
 03 003 03  0    0    0   0   0    1    1    41
 04 003 03  0    0    0   0   0    1    1    49
 05 000 00  1    0    0   0   0    0    0    00
 06 003 03  0    0    0   0   0    1    1    51
 07 003 03  0    0    0   0   0    1    1    59
 08 003 03  0    0    0   0   0    1    1    61
 09 003 03  0    0    0   0   0    1    1    69
 0a 003 03  1    1    0   1   0    1    1    71
 0b 000 00  1    0    0   0   0    0    0    00
 0c 003 03  0    0    0   0   0    1    1    79
 0d 000 00  1    0    0   0   0    0    0    00
 0e 003 03  0    0    0   0   0    1    1    81
 0f 003 03  0    0    0   0   0    1    1    89

IO APIC #3......
.... register #00: 03000000
.......    : physical APIC id: 03
.... register #01: 000F0011
.......     : max redirection entries: 000F
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 0D000000
.......     : arbitration: 0D
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 003 03  1    1    0   1   0    1    1    91
 01 003 03  1    1    0   1   0    1    1    99
 02 003 03  1    1    0   1   0    1    1    A1
 03 003 03  1    1    0   1   0    1    1    A9
 04 003 03  1    1    0   1   0    1    1    B1
 05 003 03  1    1    0   1   0    1    1    B9
 06 003 03  1    1    0   1   0    1    1    C1
 07 003 03  1    1    0   1   0    1    1    C9
 08 003 03  1    1    0   1   0    1    1    D1
 09 003 03  1    1    0   1   0    1    1    D9
 0a 003 03  1    1    0   1   0    1    1    E1
 0b 003 03  1    1    0   1   0    1    1    E9
 0c 003 03  1    1    0   1   0    1    1    32
 0d 000 00  1    0    0   0   0    0    0    00
 0e 003 03  1    1    0   1   0    1    1    3A
 0f 003 03  1    1    0   1   0    1    1    42
IRQ to pin mappings:
IRQ0 -> 0:0
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ12 -> 0:12
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 1:0
IRQ17 -> 1:1
IRQ18 -> 1:2
IRQ19 -> 1:3
IRQ20 -> 1:4
IRQ21 -> 1:5
IRQ22 -> 1:6
IRQ23 -> 1:7
IRQ24 -> 1:8
IRQ25 -> 1:9
IRQ26 -> 1:10
IRQ27 -> 1:11
IRQ28 -> 1:12
IRQ30 -> 1:14
IRQ31 -> 1:15
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 930.9703 MHz.
..... host bus clock speed is 132.9956 MHz.
cpu: 0, clocks: 1329956, slice: 443318
CPU0<T0:1329952,T1:886624,D:10,S:443318,C:1329956>
cpu: 1, clocks: 1329956, slice: 443318
CPU1<T0:1329952,T1:443312,D:4,S:443318,C:1329956>
checking TSC synchronization across CPUs: passed.
Waiting on wait_init_idle (map = 0x2)
All processors have done init_idle
ACPI: Subsystem revision 20021212
ACPI: Disabled via command line (acpi=off)
PCI: PCI BIOS revision 2.10 entry at 0xfc7ce, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: ACPI tables contain no PCI IRQ routing entries
PCI: Probing PCI hardware (bus 00)
PCI: Discovered primary peer bus 01 [IRQ]
PCI: Using IRQ router ServerWorks [1166/0200] at 00:0f.0
PCI->APIC IRQ transform: (B0,I2,P0) -> 16
PCI->APIC IRQ transform: (B1,I2,P0) -> 30
PCI->APIC IRQ transform: (B1,I2,P1) -> 31
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
Journalled Block Device driver loaded
devfs: v1.12c (20020818) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x0
SGI XFS snapshot-xfs-2.4.20-2003-04-07_05:19_UTC with ACLs, realtime, no debug 
enabled
SGI XFS Quota Management subsystem
vesafb: framebuffer at 0xfc000000, mapped to 0xe080d000, size 4096k
vesafb: mode is 1024x768x16, linelength=2048, pages=1
vesafb: protected mode interface info at c000:4a6c
vesafb: scrolling: redraw
vesafb: directcolor: size=0:5:6:5, shift=0:11:5:0
Looking for splash picture.... silenjpeg size 70780 bytes, found (1024x768, 
20089 bytes, v3).
Got silent jpeg.
Got silent jpeg.
Console: switching to colour frame buffer device 122x39
fb0: VESA VGA frame buffer device
pty: 256 Unix98 ptys configured
Real Time Clock Driver v1.10e
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks OSB4: IDE controller on PCI bus 00 dev 79
SvrWks OSB4: chipset revision 0
SvrWks OSB4: not 100% native mode: will probe irqs later
SvrWks OSB4: default first interface base=0x01f0, second interface base=0x170
    ide0: BM-DMA at 0x08b0-0x08b7, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0x08b8-0x08bf, BIOS settings: hdc:pio, hdd:pio
hda: Hewlett-Packard CD-Writer Plus 9500, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 32X CD-ROM CD-R/RW drive, 4096kB Cache
Uniform CD-ROM driver Revision: 3.12
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
Equalizer1996: $Revision: 1.2.1 $ $Date: 1996/09/22 13:52:00 $ Simon Janes 
(simon@ncm.com)
SCSI subsystem driver Revision: 1.00
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
mice: PS/2 mouse device common for all mice
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 32768)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
IPv6 v0.8 for NET4.0
IPv6 over IPv4 tunneling driver
RAMDISK: ext2 filesystem found at block 0
RAMDISK: Loading 5000 blocks [1 disk] into ram disk... |/-\|/-\|/-\|/-\|/-\|/-
\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-
\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-
\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-
\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|done.
Freeing initrd memory: 5088k freed
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 128k freed
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
uhci.c: USB Universal Host Controller Interface driver v1.1
usb-ohci.c: USB OHCI at membase 0xe0c3f000, IRQ 10
usb-ohci.c: usb-00:0f.2, ServerWorks OSB4/CSB5 OHCI USB Controller
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
usb.c: registered new driver hid
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <vojtech@suse.cz>
hid-core.c: USB HID support drivers
Initializing USB Mass Storage driver...
usb.c: registered new driver usb-storage
USB Mass Storage support registered.
ISO 9660 Extensions: Microsoft Joliet Level 3
ISO 9660 Extensions: RRIP_1991A
cloop: Welcome to cloop v0.68
cloop: /newroot/mnt/cdrom/livecd.cloop: 3283 blocks, 65536 bytes/block, largest 
block is 65562 bytes.
cloop: loaded (max 1 devices)
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI 
ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
inserting floppy driver for 2.4.20-xfs-r3
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.32
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.32
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs

(scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit)
(scsi1:A:1): 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit)
  Vendor: FUJITSU   Model: MAN3184MP         Rev: 5507
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi0:A:0:0: Tagged Queuing enabled.  Depth 253
  Vendor: FUJITSU   Model: MAN3184MP         Rev: 5507
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi1:A:1:0: Tagged Queuing enabled.  Depth 253
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi1, channel 0, id 1, lun 0
SCSI device sda: 35566478 512-byte hdwr sectors (18210 MB)
Partition check:
 /dev/scsi/host0/bus0/target0/lun0: p1 p2 p3
SCSI device sdb: 35566478 512-byte hdwr sectors (18210 MB)
 /dev/scsi/host1/bus0/target1/lun0: p1 p2 p3
eepro100.c:v1.09j-t 9/29/99 Donald Becker 
http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin 
<saw@saw.sw.com.sg> and others
eth0: Intel Corp. 82557/8/9 [Ethernet Pro 100], 00:B0:D0:FC:22:A9, IRQ 16.
  Receiver lock-up bug exists -- enabling work-around.
  Board assembly 07195d-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).
  Receiver lock-up workaround activated.
usb-uhci.c: $Revision: 1.275 $ time 02:45:27 Jul 23 2003
usb-uhci.c: High bandwidth mode enabled
usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
uhci.c: USB Universal Host Controller Interface driver v1.1
Looking for splash picture....<6>Looking for splash picture....<6>error 14 
while decompressing picture.
 found (1024x768, 20089 bytes, v3).
Splash status on console 0 changed to on
Looking for splash picture.... found (1024x768, 20089 bytes, v3).
Splash status on console 0 changed to on
Looking for splash picture.... found (1024x768, 20089 bytes, v3).
Splash status on console 0 changed to on
eth0: no IPv6 routers present
Looking for splash picture.... found (1024x768, 20089 bytes, v3).
Splash status on console 0 changed to on
Comment 1 Brandon Black 2003-08-15 10:58:38 UTC
Here's a related one from the same outage incident that I'll second: I've seen a similar but different problem, which is probably related.  The hardware in question is a dual P3-600 (440GX, old VA Linux ClusterCity nodes).  The non-SMP kernel doesn't boot at all, but that's acceptable.  The "smp" kernel on the x86 LiveCDs works fine for all purposes.  The "pentium3" LiveCD's smp kernel, however, hard-locks with no panic message when I raidstart any software RAID5 device.  Since x86 works fine and p3 has the bug, all other factors being identical, I'm guessing your p3 optimizations on the smp kernel are causing some stability problems.
Comment 2 Bill Church 2003-08-15 13:09:57 UTC
Well, non-SMP works fine for me. I did the install to stage-3 with the non-SMP kernel, then built my kernel from the linux-gs sources with the P3 processor opts and SMP. I need to run a diff of my .config against the stock Gentoo one to see what the difference is, because it runs fine now. Only the boot kernel has the problem.

I'll run the diffs tonight and report back.

-Bill
Comment 3 Davin Boling 2003-08-28 08:42:24 UTC
I've experienced strangely similar lockup problems with my PowerEdge 4300 (440GX) and a Unisys ES5045 (440NX). No problems were encountered during any phase of compiling when either system was booted off the LiveCD, but after rebooting off the compiled kernel, the system is subject to random freezes with no error or debugging messages. For me, the bug seems most likely to strike during compile jobs. On the PE4300, rebooting with a single processor and a terminator resulted in clean running.

This could potentially be a problem only encountered with 440-series chipsets.
Comment 4 Davin Boling 2003-08-28 08:44:09 UTC
Correction:
The ES5045 uses a 450NX chipset, not a 440NX.
Comment 5 Pat Grimm 2003-11-07 07:54:16 UTC
Similar problems on a Dell Precision 420 with a dual-channel AIC-7899 and SMP.

After booting the smp kernel from the CD:

#rmmod aic7xxx
#modprobe aic7xxx_old

It seems there are lingering problems with SMP and aic7xxx.
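
After swapping modules as above, it can help to confirm which of the two drivers is actually loaded. A minimal sketch, assuming the text comes from /proc/modules (where the first whitespace-separated field of each line is the module name); the helper name `loaded_aic_modules` is made up for illustration.

```python
def loaded_aic_modules(proc_modules_text):
    """Return which of aic7xxx / aic7xxx_old appear in /proc/modules text.

    Only the first field of each non-empty line (the module name) is used.
    """
    names = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return sorted(names & {"aic7xxx", "aic7xxx_old"})
```

On a running system the input would be the contents of /proc/modules, for example `loaded_aic_modules(open("/proc/modules").read())`. A driver compiled into the kernel rather than built as a module will not show up there.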
Comment 6 Jeremy Drake 2003-12-19 15:25:37 UTC
I had the same problem with a box I'm building from old parts.  The card is an AHA-2940, and aic7xxx would reliably lock up the box under an SMP kernel.  2.6.0 from kernel.org seems to be immune.  Also, Mandrake 9.2 does not have this problem.  I think version 6.2.33 of the aic7xxx driver (which is in the Gentoo sources on the 20030911 CDs) has this problem, but 6.2.35 (in 2.6.0) and 6.2.36 (in Gentoo 2.4.22-r1 and in Mandrake 9.2) are immune.
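
To compare a given machine against the revisions mentioned in this bug, the driver revision can be pulled out of dmesg output. This is a rough sketch: the function names are hypothetical, and the two revision sets come only from the anecdotal reports here (Rev 6.2.32 from the dmesg in the description, the rest from this comment), not from any authoritative list of affected versions.

```python
import re

# Revisions taken only from reports in this bug; not an authoritative list.
SEEN_WITH_LOCKUPS = {"6.2.32", "6.2.33"}  # description's dmesg, Comment 6
REPORTED_IMMUNE = {"6.2.35", "6.2.36"}    # Comment 6

# Matches banner lines such as:
#   scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.32
_REV = re.compile(r"Adaptec AIC7XXX .*? DRIVER, Rev (\d+(?:\.\d+)+)")


def aic7xxx_revision(dmesg_text):
    """Return the first aic7xxx driver revision found in dmesg text, or None."""
    match = _REV.search(dmesg_text)
    return match.group(1) if match else None


def classify(revision):
    """Map a revision string to what this bug's reports say about it."""
    if revision in SEEN_WITH_LOCKUPS:
        return "seen with SMP lockups"
    if revision in REPORTED_IMMUNE:
        return "reported immune"
    return "no report in this bug"
```

A revision that falls outside both sets says nothing either way; the thread only covers the four versions listed.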
Comment 7 Bob Johnson (RETIRED) gentoo-dev 2004-01-20 17:24:12 UTC
Do newer kernels without the aic patch help this?
Comment 8 Gerrit Slomma 2004-02-23 15:42:42 UTC
Yes, I have the same problem.
It locks up every time I try to format /dev/sda3.

When formatting with the normal uniprocessor kernel, the job completes (as expected).
When rebooted with the SMP kernel, the
mount command for the third partition never returns (see bug #42652, my current report).