Some SATA controllers (Promise and my VIA chipset for sure), when used with DMA transfers and perhaps a few other kernel settings, produce hard disk errors that are directly caused by DMA.

Reproducible: Always

Steps to Reproduce:
1. Set up your SATA HD to use DMA with the right controller.
2. Use the computer, e.g. emerge something.

Actual Results:
The computer partially freezes for about 10 seconds. You can move the mouse around and click on things, but that is about it: menus won't open, and basically anything that requires disk access does not take place. dmesg then outputs something like this:

  hdg: dma_timer_expiry: dma status == 0x24
  hdg: DMA interrupt recovery
  hdg: lost interrupt

Expected Results:
No freeze and no such output. See the forum for a fix.
Here is my full dmesg output: Bootdata ok (command line is root=/dev/hdg6 devfs=nomount video=mtrr,vesa:1280x1024 vga=0x31a splash=silent) Linux version 2.6.7-gentoo-r11 (root@kow) (gcc version 3.4.1 (Gentoo Linux 3.4.1-r1, ssp-3.4-2, pie-8.7.6.3)) #2 SMP Wed Jul 21 11:18:13 CDT 2004 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009d800 (usable) BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000003fef0000 (usable) BIOS-e820: 000000003fef0000 - 000000003fef3000 (ACPI NVS) BIOS-e820: 000000003fef3000 - 000000003ff00000 (ACPI data) BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved) Scanning NUMA topology in Northbridge 24 Number of nodes 2 (10010) Node 0 MemBase 0000000000000000 Limit 000000003fef0000 Skipping disabled node 1 Using node hash shift of 24 Bootmem setup node 0 0000000000000000-000000003fef0000 No mptable found. On node 0 totalpages: 261872 DMA zone: 4096 pages, LIFO batch:1 Normal zone: 257776 pages, LIFO batch:16 HighMem zone: 0 pages, LIFO batch:1 ACPI: RSDP (v000 VIAK8 ) @ 0x00000000000f6ca0 ACPI: RSDT (v001 VIAK8 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003fef3000 ACPI: FADT (v001 VIAK8 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003fef3040 ACPI: MADT (v001 VIAK8 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000003fef7b00 ACPI: DSDT (v001 VIAK8 AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x0000000000000000 ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 15:5 APIC version 16 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) Processor #1 15:5 APIC version 16 ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0]) IOAPIC[0]: Assigned apic_id 2 IOAPIC[0]: apic_id 2, version 3, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl 
dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Using ACPI (MADT) for SMP configuration information Checking aperture... CPU 0: aperture @ e0000000 size 128 MB CPU 1: aperture @ e0000000 size 128 MB Built 2 zonelists Kernel command line: root=/dev/hdg6 devfs=nomount video=mtrr,vesa:1280x1024 vga=0x31a splash=silent console=tty0 bootsplash: silent mode. Initializing CPU#0 PID hash table entries: 16 (order 4: 256 bytes) time.c: Using 1.193182 MHz PIT timer. time.c: Detected 1603.718 MHz processor. Console: colour dummy device 80x25 Memory: 1027244k/1047488k available (2892k kernel code, 0k reserved, 1298k data, 180k init) Calibrating delay loop... 3145.72 BogoMIPS Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes) Inode-cache hash table entries: 65536 (order: 7, 524288 bytes) Mount-cache hash table entries: 256 (order: 0, 4096 bytes) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 1024K (64 bytes/line) Using local APIC NMI watchdog using perfctr0 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 1024K (64 bytes/line) CPU0: AMD Opteron(tm) Processor 242 stepping 01 per-CPU timeslice cutoff: 1024.37 usecs. task migration cache decay timeout: 2 msecs. Booting processor 1/1 rip 6000 rsp 10001e39f58 Initializing CPU#1 Calibrating delay loop... 3203.07 BogoMIPS CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 1024K (64 bytes/line) AMD Opteron(tm) Processor 242 stepping 01 Total of 2 processors activated (6348.80 BogoMIPS). ENABLING IO-APIC IRQs init IO_APIC IRQs IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected. ..TIMER: vector=0x31 pin1=2 pin2=-1 Using local APIC timer interrupts. Detected 12.529 MHz APIC timer. checking TSC synchronization across 2 CPUs: passed. time.c: Using PIT/TSC based timekeeping. 
Brought up 2 CPUs CPU0: online domain 0: span 1 groups: 1 domain 1: span 3 groups: 1 2 CPU1: online domain 0: span 2 groups: 2 domain 1: span 3 groups: 2 1 checking if image is initramfs...it isn't (ungzip failed); looks like an initrd NET: Registered protocol family 16 PCI: Using configuration type 1 mtrr: v2.0 (20020519) ACPI: Subsystem revision 20040326 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (00:00) PCI: Probing PCI hardware (bus 00) ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 6 7 10 11 12) *5 ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 6 7 10 *11 12) ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 6 7 10 *11 12) ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 6 7 *10 11 12) ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 6 7 10 11 12) *0, disabled. ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 6 7 10 11 12) *0, disabled. ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 6 7 10 11 12) *0, disabled. ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 6 7 10 11 12) *0, disabled. 
ACPI: PCI Interrupt Link [ALKA] (IRQs 20) *0 ACPI: PCI Interrupt Link [ALKB] (IRQs 21) *0 ACPI: PCI Interrupt Link [ALKC] (IRQs 22) *0 ACPI: PCI Interrupt Link [ALKD] (IRQs 23) *0 IOAPIC[0]: Set PCI routing entry (2-18 -> 0xa9 -> IRQ 18 Mode:1 Active:1) 00:00:07[A] -> 2-18 -> IRQ 18 IOAPIC[0]: Set PCI routing entry (2-19 -> 0xb1 -> IRQ 19 Mode:1 Active:1) 00:00:07[B] -> 2-19 -> IRQ 19 IOAPIC[0]: Set PCI routing entry (2-16 -> 0xb9 -> IRQ 16 Mode:1 Active:1) 00:00:07[C] -> 2-16 -> IRQ 16 IOAPIC[0]: Set PCI routing entry (2-17 -> 0xc1 -> IRQ 17 Mode:1 Active:1) 00:00:07[D] -> 2-17 -> IRQ 17 ACPI: PCI Interrupt Link [ALKB] BIOS reported IRQ 0, using IRQ 21 ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 21 IOAPIC[0]: Set PCI routing entry (2-21 -> 0xc9 -> IRQ 21 Mode:1 Active:1) 00:00:10[A] -> 2-21 -> IRQ 21 ACPI: PCI Interrupt Link [ALKA] BIOS reported IRQ 0, using IRQ 20 ACPI: PCI Interrupt Link [ALKA] enabled at IRQ 20 IOAPIC[0]: Set PCI routing entry (2-20 -> 0xd1 -> IRQ 20 Mode:1 Active:1) 00:00:11[A] -> 2-20 -> IRQ 20 ACPI: PCI Interrupt Link [ALKC] BIOS reported IRQ 0, using IRQ 22 ACPI: PCI Interrupt Link [ALKC] enabled at IRQ 22 IOAPIC[0]: Set PCI routing entry (2-22 -> 0xd9 -> IRQ 22 Mode:1 Active:1) 00:00:11[C] -> 2-22 -> IRQ 22 ACPI: PCI Interrupt Link [ALKD] BIOS reported IRQ 0, using IRQ 23 ACPI: PCI Interrupt Link [ALKD] enabled at IRQ 23 IOAPIC[0]: Set PCI routing entry (2-23 -> 0xe1 -> IRQ 23 Mode:1 Active:1) 00:00:11[D] -> 2-23 -> IRQ 23 number of MP IRQ sources: 15. number of IO-APIC #2 registers: 24. testing the IO APIC....................... IO APIC #2...... .... register #00: 02000000 ....... : physical APIC id: 02 .... register #01: 00178003 ....... : max redirection entries: 0017 ....... : PRQ implemented: 1 ....... : IO APIC version: 0003 .... 
IRQ redirection table: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 00 000 00 1 0 0 0 0 0 0 00 01 001 01 0 0 0 0 0 1 1 39 02 001 01 0 0 0 0 0 1 1 31 03 001 01 0 0 0 0 0 1 1 41 04 001 01 0 0 0 0 0 1 1 49 05 001 01 0 0 0 0 0 1 1 51 06 001 01 0 0 0 0 0 1 1 59 07 001 01 0 0 0 0 0 1 1 61 08 001 01 0 0 0 0 0 1 1 69 09 001 01 0 1 0 1 0 1 1 71 0a 001 01 0 0 0 0 0 1 1 79 0b 001 01 0 0 0 0 0 1 1 81 0c 001 01 0 0 0 0 0 1 1 89 0d 001 01 0 0 0 0 0 1 1 91 0e 001 01 0 0 0 0 0 1 1 99 0f 001 01 0 0 0 0 0 1 1 A1 10 001 01 1 1 0 1 0 1 1 B9 11 001 01 1 1 0 1 0 1 1 C1 12 001 01 1 1 0 1 0 1 1 A9 13 001 01 1 1 0 1 0 1 1 B1 14 001 01 1 1 0 1 0 1 1 D1 15 001 01 1 1 0 1 0 1 1 C9 16 001 01 1 1 0 1 0 1 1 D9 17 001 01 1 1 0 1 0 1 1 E1 IRQ to pin mappings: IRQ0 -> 0:2 IRQ1 -> 0:1 IRQ3 -> 0:3 IRQ4 -> 0:4 IRQ5 -> 0:5 IRQ6 -> 0:6 IRQ7 -> 0:7 IRQ8 -> 0:8 IRQ9 -> 0:9 IRQ10 -> 0:10 IRQ11 -> 0:11 IRQ12 -> 0:12 IRQ13 -> 0:13 IRQ14 -> 0:14 IRQ15 -> 0:15 IRQ16 -> 0:16 IRQ17 -> 0:17 IRQ18 -> 0:18 IRQ19 -> 0:19 IRQ20 -> 0:20 IRQ21 -> 0:21 IRQ22 -> 0:22 IRQ23 -> 0:23 .................................... done. PCI: Using ACPI for IRQ routing agpgart: Detected AGP bridge 0 agpgart: Maximum main memory to use for agp memory: 940M agpgart: AGP aperture is 128M @ 0xe0000000 PCI-DMA: Disabling IOMMU. 
vesafb: framebuffer at 0xe8000000, mapped to 0xffffff000004e000, size 5120k vesafb: mode is 1280x1024x16, linelength=2560, pages=50 vesafb: scrolling: redraw vesafb: directcolor: size=0:5:6:5, shift=0:11:5:0 fb0: VESA VGA frame buffer device IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $ devfs: 2004-01-31 Richard Gooch (rgooch@atnf.csiro.au) devfs: boot_options: 0x0 Initializing Cryptographic API PCI: Via IRQ fixup for 0000:00:10.2, from 11 to 5 pci_hotplug: PCI Hot Plug PCI Core version: 0.5 ACPI: Power Button (FF) [PWRF] ACPI: Fan [FAN] (on) ACPI: Processor [CPU0] (supports C1) ACPI: Processor [CPU1] (supports C1) ACPI: Thermal Zone [THRM] (60 C) mice: PS/2 mouse device common for all mice serio: i8042 AUX port at 0x60,0x64 irq 12 input: ImPS/2 Generic Wheel Mouse on isa0060/serio1 serio: i8042 KBD port at 0x60,0x64 irq 1 input: AT Translated Set 2 keyboard on isa0060/serio0 bootsplash 3.1.4-2004/02/19-spock-0.1: looking for picture.... silentjpeg size 75071 bytes, found (1280x1024, 85755 bytes, v3). Console: switching to colour frame buffer device 153x54 lp: driver loaded but no devices found Real Time Clock Driver v1.12 Non-volatile memory driver v1.2 Linux agpgart interface v0.100 (c) Dave Jones parport0: PC-style at 0x378 [PCSPP,TRISTATE] parport0: Printer, HEWLETT-PACKARD DESKJET 820C lp0: using parport0 (polling). lp0: console ready Using anticipatory io scheduler Floppy drive(s): fd0 is 1.44M FDC 0 is a post-1991 82077 RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize loop: loaded (max 8 devices) 3c59x: Donald Becker and others. www.scyld.com/network/vortex.html 0000:00:08.0: 3Com PCI 3cSOHO100-TX Hurricane at 0xb800. 
Vers LK1.1.19 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx VIA8237SATA: IDE controller at PCI slot 0000:00:0f.0 VIA8237SATA: chipset revision 128 VIA8237SATA: 100% native mode on irq 20 ide2: BM-DMA at 0xcc00-0xcc07, BIOS settings: hde:pio, hdf:pio ide3: BM-DMA at 0xcc08-0xcc0f, BIOS settings: hdg:pio, hdh:pio hdg: WDC WD2500JD-22GBB0, ATA DISK drive ide3 at 0xc400-0xc407,0xc802 on irq 20 VP_IDE: IDE controller at PCI slot 0000:00:0f.1 VP_IDE: chipset revision 6 VP_IDE: not 100% native mode: will probe irqs later VP_IDE: VIA vt8237 (rev 00) IDE UDMA133 controller on pci0000:00:0f.1 ide0: BM-DMA at 0xd400-0xd407, BIOS settings: hda:DMA, hdb:pio ide1: BM-DMA at 0xd408-0xd40f, BIOS settings: hdc:DMA, hdd:pio hda: _NEC DVD_RW ND-2510A, ATAPI CD/DVD-ROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 hdc: PLEXTOR DVDR PX-504A, ATAPI CD/DVD-ROM drive ide1 at 0x170-0x177,0x376 on irq 15 hdg: max request size: 1024KiB hdg: 488397168 sectors (250059 MB) w/8192KiB Cache, CHS=30401/255/63 /dev/ide/host2/bus1/target0/lun0: p1 p2 p3 < p5 p6 > p4 hda: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33) Uniform CD-ROM driver Revision: 3.20 hdc: ATAPI 40X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33) i2c /dev entries driver Advanced Linux Sound Architecture Driver Version 1.0.4 (Mon May 17 14:31:44 2004 UTC). ALSA device list: #0: Sound Blaster Audigy2 (rev.4) at 0xb000, irq 16 NET: Registered protocol family 2 IP: routing cache hash table of 4096 buckets, 64Kbytes TCP: Hash tables configured (established 131072 bind 65536) NET: Registered protocol family 1 NET: Registered protocol family 10 IPv6 over IPv4 tunneling driver NET: Registered protocol family 17 RAMDISK: Couldn't find valid RAM disk image starting at 0. 
ReiserFS: hdg6: found reiserfs format "3.6" with standard journal ReiserFS: hdg6: using ordered data mode ReiserFS: hdg6: journal params: device hdg6, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 ReiserFS: hdg6: checking transaction log (hdg6) ReiserFS: hdg6: Using r5 hash to sort names VFS: Mounted root (reiserfs filesystem) readonly. Freeing unused kernel memory: 180k freed Adding 497972k swap on /dev/hdg5. Priority:-1 extents:1 NTFS driver 2.1.14 [Flags: R/O MODULE]. NTFS volume version 3.1. usbcore: registered new driver usbfs usbcore: registered new driver hub Disabled Privacy Extensions on device ffffffff804bfd00(lo) bootsplash 3.1.4-2004/02/19-spock-0.1: looking for picture.... found (1280x1024, 26385 bytes, v3). bootsplash: status on console 0 changed to on bootsplash 3.1.4-2004/02/19-spock-0.1: looking for picture.... found (1280x1024, 26385 bytes, v3). bootsplash: status on console 1 changed to on bootsplash 3.1.4-2004/02/19-spock-0.1: looking for picture.... found (1280x1024, 26385 bytes, v3). bootsplash: status on console 2 changed to on bootsplash 3.1.4-2004/02/19-spock-0.1: looking for picture.... found (1280x1024, 26385 bytes, v3). bootsplash: status on console 3 changed to on bootsplash 3.1.4-2004/02/19-spock-0.1: looking for picture.... found (1280x1024, 26385 bytes, v3). bootsplash: status on console 4 changed to on bootsplash 3.1.4-2004/02/19-spock-0.1: looking for picture.... found (1280x1024, 26385 bytes, v3). bootsplash: status on console 5 changed to on mtrr: 0xe8000000,0x8000000 overlaps existing 0xe8000000,0x400000 eth0: no IPv6 routers present hdg: dma_timer_expiry: dma status == 0x24 hdg: DMA interrupt recovery hdg: lost interrupt hdg: dma_timer_expiry: dma status == 0x24 hdg: DMA interrupt recovery hdg: lost interrupt hdg: dma_timer_expiry: dma status == 0x24 hdg: DMA interrupt recovery hdg: lost interrupt
Looking into it.
This problem seems to occur only with the old, now-deprecated VIA controller drivers. The new ones that use libata and SCSI emulation seem fine (at least for the VIA 8237 controller).
OK, marking as fixed. libata is the way to go.
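For anyone who wants to check whether a saved kernel log shows the same symptom, here is a minimal sketch in Python. It only counts the "dma_timer_expiry" and "lost interrupt" lines quoted in the report, per hdX device. The function name and the returned dictionary layout are made up for illustration and are not part of any existing tool, and the regular expressions assume the exact message wording seen in this report.

```python
import re
from collections import Counter

# Matches the first line of the symptom quoted in the report, e.g.
#   hdg: dma_timer_expiry: dma status == 0x24
# (assumption: the message wording is exactly as in this report)
DMA_EXPIRY = re.compile(
    r"\b(hd[a-z]): dma_timer_expiry: dma status == 0x[0-9a-fA-F]+"
)

# Matches the last line of the symptom, e.g.
#   hdg: lost interrupt
LOST_IRQ = re.compile(r"\b(hd[a-z]): lost interrupt")


def summarize_dma_timeouts(log_text):
    """Count DMA timeout symptoms per IDE device in kernel log text.

    Hypothetical helper: returns a dict such as
    {"hdg": {"dma_timer_expiry": 3, "lost_interrupt": 3}}.
    Works whether the log has one message per line or has been
    flattened onto a single line, because it scans for matches
    anywhere in the text.
    """
    expiries = Counter(m.group(1) for m in DMA_EXPIRY.finditer(log_text))
    lost = Counter(m.group(1) for m in LOST_IRQ.finditer(log_text))
    devices = sorted(set(expiries) | set(lost))
    return {
        dev: {
            "dma_timer_expiry": expiries[dev],
            "lost_interrupt": lost[dev],
        }
        for dev in devices
    }
```

Pass it the text of a dmesg dump as a string. An empty result means none of these messages were found, which does not by itself prove the controller is healthy.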