I recently upgraded to kernel-genkernel-x86_64-2.6.20-gentoo-r8 from a 2.6.18 kernel. I have had a fairly stable relationship with 2.6.18 for many months. My system has 4 hard drives, all SATA. The motherboard is an Asus A8N-E (CK804 chipset), which should support NCQ. My 4 drives are:

  sda/ata1: Seagate ST3300622AS
  sdb/ata2: WDC WD2500JS-00MVB1
  sdc/ata3: Maxtor 7H500F0
  sdd/ata4: Maxtor 7H500F0

According to `dmesg | grep -i ata`, the Seagate and both Maxtors have NCQ depth 32. The WD does not. I have no problems with the Seagate or the WD. When I access anything on either sdc or sdd (fsck, read or write), the machine hard-hangs after about 10 seconds.

Steps to reproduce:
1. Boot the computer with a 2.6.20 or 2.6.22 gentoo kernel (sata_nv and libata).
2. Run `fsck.jfs /dev/sdc2`.
3. The computer hangs within 10 seconds, requiring a hard power down.

One time I received this error:

MCE CPU 0: 4 bank 4:b200000000070f0f
ata3:SRST failed (errno=-19)
ata3:reset failed (errno=-19) retry in 10 sec
This is *NOT* a software problem!
Please contact your hardware vendor
TSC 736629147e

When I go back to 2.6.18, the problems go away. After some research into NCQ, sata_nv and blacklists, I found that /usr/src/linux/drivers/ata/libata-core.c contains a blacklist of hard drives and other devices known to cause trouble.
I added my Maxtor drive model to the list like so (the "Maxtor 7H500F0" entry, line 3339 in my copy of the file, is the line I added):

static const struct ata_blacklist_entry ata_device_blacklist [] = {
    /* Devices with DMA related problems under Linux */
    { "WDC AC11000H", NULL, ATA_HORKAGE_NODMA },
    { "WDC AC22100H", NULL, ATA_HORKAGE_NODMA },
    { "WDC AC32500H", NULL, ATA_HORKAGE_NODMA },
    { "WDC AC33100H", NULL, ATA_HORKAGE_NODMA },
    { "WDC AC31600H", NULL, ATA_HORKAGE_NODMA },
    { "WDC AC32100H", "24.09P07", ATA_HORKAGE_NODMA },
    { "WDC AC23200L", "21.10N21", ATA_HORKAGE_NODMA },
    { "Compaq CRD-8241B", NULL, ATA_HORKAGE_NODMA },
    { "CRD-8400B", NULL, ATA_HORKAGE_NODMA },
    { "CRD-8480B", NULL, ATA_HORKAGE_NODMA },
    { "CRD-8482B", NULL, ATA_HORKAGE_NODMA },
    { "CRD-84", NULL, ATA_HORKAGE_NODMA },
    { "SanDisk SDP3B", NULL, ATA_HORKAGE_NODMA },
    { "SanDisk SDP3B-64", NULL, ATA_HORKAGE_NODMA },
    { "SANYO CD-ROM CRD", NULL, ATA_HORKAGE_NODMA },
    { "HITACHI CDR-8", NULL, ATA_HORKAGE_NODMA },
    { "HITACHI CDR-8335", NULL, ATA_HORKAGE_NODMA },
    { "HITACHI CDR-8435", NULL, ATA_HORKAGE_NODMA },
    { "Toshiba CD-ROM XM-6202B", NULL, ATA_HORKAGE_NODMA },
    { "TOSHIBA CD-ROM XM-1702BC", NULL, ATA_HORKAGE_NODMA },
    { "CD-532E-A", NULL, ATA_HORKAGE_NODMA },
    { "E-IDE CD-ROM CR-840", NULL, ATA_HORKAGE_NODMA },
    { "CD-ROM Drive/F5A", NULL, ATA_HORKAGE_NODMA },
    { "WPI CDD-820", NULL, ATA_HORKAGE_NODMA },
    { "SAMSUNG CD-ROM SC-148C", NULL, ATA_HORKAGE_NODMA },
    { "SAMSUNG CD-ROM SC", NULL, ATA_HORKAGE_NODMA },
    { "ATAPI CD-ROM DRIVE 40X MAXIMUM", NULL, ATA_HORKAGE_NODMA },
    { "_NEC DV5800A", NULL, ATA_HORKAGE_NODMA },
    { "SAMSUNG CD-ROM SN-124", "N001", ATA_HORKAGE_NODMA },

    /* Devices we expect to fail diagnostics */

    /* Devices where NCQ should be avoided */
    /* NCQ is slow */
    { "WDC WD740ADFD-00", NULL, ATA_HORKAGE_NONCQ },
    /* http://thread.gmane.org/gmane.linux.ide/14907 */
    { "FUJITSU MHT2060BH", NULL, ATA_HORKAGE_NONCQ },
    /* NCQ is broken */
    { "Maxtor 6L250S0", "BANC1G10", ATA_HORKAGE_NONCQ },
    /* NCQ hard hangs device under heavier load, needs hard power cycle */
    { "Maxtor 6B250S0", "BANC1B70", ATA_HORKAGE_NONCQ },
    /* Blacklist entries taken from Silicon Image 3124/3132
       Windows driver .inf file - also several Linux problem reports */
    { "HTS541060G9SA00", "MB3OC60D", ATA_HORKAGE_NONCQ, },
    { "HTS541080G9SA00", "MB4OC60D", ATA_HORKAGE_NONCQ, },
    { "HTS541010G9SA00", "MBZOC60D", ATA_HORKAGE_NONCQ, },
    { "Maxtor 7H500F0", NULL, ATA_HORKAGE_NONCQ, },

    /* Devices with NCQ limits */

    /* End Marker */
    { }
};

After recompiling the kernel and repeating the steps to reproduce above, the problem went away. It looks like this drive might also belong in the libata blacklist.
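Each entry in this table pairs a drive model string with an optional firmware revision (NULL meaning any revision) and a set of ATA_HORKAGE_* quirk flags. As a rough illustration of how such a table is consulted, here is a simplified, stand-alone C sketch. It is not the kernel's code: the struct, lookup_horkage() and the HORKAGE_* values are hypothetical, and it assumes an exact string match on the model name, which may not reflect every kernel version's matching rules.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values for illustration only; the kernel defines
 * its own ATA_HORKAGE_* constants, whose numeric values may differ. */
#define HORKAGE_NODMA  (1UL << 0)
#define HORKAGE_NONCQ  (1UL << 1)

/* Same shape as an ata_device_blacklist[] entry:
 * model string, firmware revision (NULL = any revision), quirk flags. */
struct blacklist_entry {
    const char *model_num;
    const char *model_rev;
    unsigned long horkage;
};

static const struct blacklist_entry blacklist[] = {
    { "WDC WD740ADFD-00", NULL,       HORKAGE_NONCQ },
    { "Maxtor 6L250S0",   "BANC1G10", HORKAGE_NONCQ },
    { "Maxtor 7H500F0",   NULL,       HORKAGE_NONCQ },
    { NULL, NULL, 0 }   /* end marker, like the empty { } in the kernel table */
};

/* Return the quirk flags for a drive, or 0 if it is not listed.
 * Assumption: the model must match exactly; a NULL revision in the
 * table matches any firmware, otherwise the revision must match too. */
static unsigned long lookup_horkage(const char *model, const char *rev)
{
    const struct blacklist_entry *ad;

    for (ad = blacklist; ad->model_num != NULL; ad++) {
        if (strcmp(ad->model_num, model) != 0)
            continue;
        if (ad->model_rev == NULL || strcmp(ad->model_rev, rev) == 0)
            return ad->horkage;
    }
    return 0;
}
```

With a "Maxtor 7H500F0" entry whose revision is NULL, a lookup for that model returns the NONCQ flag whatever firmware the drive reports. An entry with a revision, such as "Maxtor 6L250S0" / "BANC1G10", only matches drives running exactly that firmware.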
There is an additional hitch. I tested 2.6.23-rc1 from vanilla-sources to try to reproduce this bug, and 2.6.23 did not seem to have this problem. Either the problem was fixed in the kernel by 2.6.23, or it is limited to gentoo-sources.
(In reply to comment #1)
> (...) 2.6.23 did not seem to have this problem. The problem
> either was fixed by 2.6.23 (in the kernel) or is a problem limited to
> Gentoo-sources.

Can you test the vanilla 2.6.22?

> Looks like this drive might also belong in the libata blacklist.

It would be better to check how it was fixed in 2.6.23. Could you see whether the drive is blacklisted there?
> Can you test the vanilla 2.6.22?

Will do.

> > Looks like this drive might also belong in the libata blacklist.
> Better check how it was fixed in the 2.6.23, could you see if it's
> blacklisted there?

Just to clarify: it's not blacklisted in 2.6.23 (and I didn't have to add it), and NCQ was enabled on all my drives. A quick diff of libata-core.c and sata_nv.c between 2.6.22 gentoo-sources and 2.6.23 vanilla shows a lot of code changes.
Michael,

Did you get a chance to test the vanilla kernel, which is at 2.6.23-rc5 as of this writing?
Please reopen when you've had a chance to test with the latest development kernel and are able to post the test results.