| Summary: | gentoo sources-2.6.20 - hard hangs caused libata/sata_nv, -- NCQ issues | ||
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Michael Kers <michael.kers> |
| Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
| Status: | RESOLVED NEEDINFO | ||
| Severity: | critical | CC: | nelson.batalha |
| Priority: | High | ||
| Version: | 2007.0 | ||
| Hardware: | AMD64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Package list: | Runtime testing required: | --- | |
There is an additional hitch. I tried the 2.6.23-rc1 vanilla-sources to try to replicate this bug. 2.6.23 did not seem to have this problem. The problem either was fixed by 2.6.23 (in the kernel) or is a problem limited to Gentoo-sources.

(In reply to comment #1)
> (...) 2.6.23 did not seem to have this problem. The problem
> either was fixed by 2.6.23 (in the kernel) or is a problem limited to
> Gentoo-sources.

Can you test the vanilla 2.6.22?

> Looks like this drive might also belong in the libata blacklist.

Better check how it was fixed in the 2.6.23, could you see if it's blacklisted there?

> Can you test the vanilla 2.6.22?

Will do.

> > Looks like this drive might also belong in the libata blacklist.
> Better check how it was fixed in the 2.6.23, could you see if it's
> blacklisted there?

Just to clarify, it's not blacklisted in 2.6.23 (and I didn't have to add it), and I also had NCQ support for all HDs. Just by doing a quick diff on libata-core.c and sata_nv.c between 2.6.22 gentoo and 2.6.23 vanilla, there have been lots of code changes.

Michael,

Did you get a chance to test the vanilla kernel, currently at 2.6.23-rc5 as of this writing?

Please reopen when you've had a chance to test with the latest development kernel and are able to post the test results.
I upgraded to kernel-genkernel-x86_64-2.6.20-gentoo-r8 recently from a 2.6.18 kernel. I have had a fairly stable relationship with 2.6.18 for many, many months.

My system has 4 HDs, all SATA. Motherboard is Asus A8N-E (CK804 chipset). Should support NCQ. My 4 drives are:

- sda/ata1: Seagate ST3300622AS
- sdb/ata2: WDC WD2500JS-00MVB1
- sdc/ata3: Maxtor 7H500F0
- sdd/ata4: Maxtor 7H500F0

According to `dmesg | grep -i ata`, the Seagate and the two Maxtors all have NCQ depth 32. The WD does not. The Seagate and WD I have no problems with. When I go to access (fsck or read or write) something from either sdc or sdd, I get a hard hang after about 10 seconds.

Steps to reproduce:

1. Boot computer with 2.6.20 or 2.6.22 gentoo kernel (sata_nv and libata)
2. Run `fsck.jfs /dev/sdc2`
3. Computer hangs within 10 seconds, requiring a hard power down

One time I received this error:

    MCE CPU 0: 4 bank 4:b200000000070f0f
    ata3:SRST failed (errno=-19)
    ata3:reset failed (errno=-19) retry in 10 sec
    This is *NOT* a software problem!
    Please contact your hardware vendor
    TSC 736629147e

Going back to 2.6.18, my problems go away.

After doing some research into NCQ, sata_nv and blacklists, I found that the file /usr/src/linux/drivers/ata/libata-core.c had a blacklist for HDs and devices that give trouble.
I added my Maxtor HD models to the list like so (see line 3339; that's the line I added):

    3291 static const struct ata_blacklist_entry ata_device_blacklist [] = {
    3292     /* Devices with DMA related problems under Linux */
    3293     { "WDC AC11000H", NULL, ATA_HORKAGE_NODMA },
    3294     { "WDC AC22100H", NULL, ATA_HORKAGE_NODMA },
    3295     { "WDC AC32500H", NULL, ATA_HORKAGE_NODMA },
    3296     { "WDC AC33100H", NULL, ATA_HORKAGE_NODMA },
    3297     { "WDC AC31600H", NULL, ATA_HORKAGE_NODMA },
    3298     { "WDC AC32100H", "24.09P07", ATA_HORKAGE_NODMA },
    3299     { "WDC AC23200L", "21.10N21", ATA_HORKAGE_NODMA },
    3300     { "Compaq CRD-8241B", NULL, ATA_HORKAGE_NODMA },
    3301     { "CRD-8400B", NULL, ATA_HORKAGE_NODMA },
    3302     { "CRD-8480B", NULL, ATA_HORKAGE_NODMA },
    3303     { "CRD-8482B", NULL, ATA_HORKAGE_NODMA },
    3304     { "CRD-84", NULL, ATA_HORKAGE_NODMA },
    3305     { "SanDisk SDP3B", NULL, ATA_HORKAGE_NODMA },
    3306     { "SanDisk SDP3B-64", NULL, ATA_HORKAGE_NODMA },
    3307     { "SANYO CD-ROM CRD", NULL, ATA_HORKAGE_NODMA },
    3308     { "HITACHI CDR-8", NULL, ATA_HORKAGE_NODMA },
    3309     { "HITACHI CDR-8335", NULL, ATA_HORKAGE_NODMA },
    3310     { "HITACHI CDR-8435", NULL, ATA_HORKAGE_NODMA },
    3311     { "Toshiba CD-ROM XM-6202B", NULL, ATA_HORKAGE_NODMA },
    3312     { "TOSHIBA CD-ROM XM-1702BC", NULL, ATA_HORKAGE_NODMA },
    3313     { "CD-532E-A", NULL, ATA_HORKAGE_NODMA },
    3314     { "E-IDE CD-ROM CR-840",NULL, ATA_HORKAGE_NODMA },
    3315     { "CD-ROM Drive/F5A", NULL, ATA_HORKAGE_NODMA },
    3316     { "WPI CDD-820", NULL, ATA_HORKAGE_NODMA },
    3317     { "SAMSUNG CD-ROM SC-148C", NULL, ATA_HORKAGE_NODMA },
    3318     { "SAMSUNG CD-ROM SC", NULL, ATA_HORKAGE_NODMA },
    3319     { "ATAPI CD-ROM DRIVE 40X MAXIMUM",NULL,ATA_HORKAGE_NODMA },
    3320     { "_NEC DV5800A", NULL, ATA_HORKAGE_NODMA },
    3321     { "SAMSUNG CD-ROM SN-124","N001", ATA_HORKAGE_NODMA },
    3322
    3323     /* Devices we expect to fail diagnostics */
    3324
    3325     /* Devices where NCQ should be avoided */
    3326     /* NCQ is slow */
    3327     { "WDC WD740ADFD-00", NULL, ATA_HORKAGE_NONCQ },
    3328     /* http://thread.gmane.org/gmane.linux.ide/14907 */
    3329     { "FUJITSU MHT2060BH", NULL, ATA_HORKAGE_NONCQ },
    3330     /* NCQ is broken */
    3331     { "Maxtor 6L250S0", "BANC1G10", ATA_HORKAGE_NONCQ },
    3332     /* NCQ hard hangs device under heavier load, needs hard power cycle */
    3333     { "Maxtor 6B250S0", "BANC1B70", ATA_HORKAGE_NONCQ },
    3334     /* Blacklist entries taken from Silicon Image 3124/3132
    3335        Windows driver .inf file - also several Linux problem reports */
    3336     { "HTS541060G9SA00", "MB3OC60D", ATA_HORKAGE_NONCQ, },
    3337     { "HTS541080G9SA00", "MB4OC60D", ATA_HORKAGE_NONCQ, },
    3338     { "HTS541010G9SA00", "MBZOC60D", ATA_HORKAGE_NONCQ, },
    3339     { "Maxtor 7H500F0", NULL, ATA_HORKAGE_NONCQ, },
    3340
    3341
    3342     /* Devices with NCQ limits */
    3343
    3344
    3345     /* End Marker */
    3346     { }
    3347 };

After recompiling the kernel and repeating the steps to reproduce above, my problem went away. Looks like this drive might also belong in the libata blacklist.
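For anyone wondering why a `NULL` in the second column covers every firmware revision of the Maxtor 7H500F0, here is a minimal user-space sketch of how a table shaped like this could be searched. It is not the kernel's code: every `sketch_`/`SKETCH_` name is hypothetical, the flag values are made up for illustration, and exact string comparison is an assumption (the real lookup in libata-core.c may match differently depending on kernel version).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's horkage flags.
 * The numeric values are illustrative only, not the kernel's. */
enum {
    SKETCH_HORKAGE_NODMA = 1 << 0,
    SKETCH_HORKAGE_NONCQ = 1 << 1
};

/* Same three-column shape as the entries quoted above. */
struct sketch_blacklist_entry {
    const char *model_num;   /* model string to match */
    const char *model_rev;   /* firmware revision, or NULL for "any" */
    unsigned long horkage;   /* quirk flags applied on a match */
};

/* A small excerpt of the table, including the entry added in this report. */
static const struct sketch_blacklist_entry sketch_blacklist[] = {
    { "WDC AC11000H",   NULL,       SKETCH_HORKAGE_NODMA },
    { "Maxtor 6L250S0", "BANC1G10", SKETCH_HORKAGE_NONCQ },
    { "Maxtor 7H500F0", NULL,       SKETCH_HORKAGE_NONCQ },
    { NULL, NULL, 0 }        /* end marker */
};

/* Return the quirk flags for a drive, or 0 if it is not listed.
 * Assumption: model and revision are compared exactly.
 * Both arguments must be non-NULL strings. */
unsigned long sketch_lookup_horkage(const char *model, const char *rev)
{
    const struct sketch_blacklist_entry *e;

    for (e = sketch_blacklist; e->model_num != NULL; e++) {
        if (strcmp(e->model_num, model) != 0)
            continue;
        /* A NULL revision in the table matches every firmware. */
        if (e->model_rev == NULL || strcmp(e->model_rev, rev) == 0)
            return e->horkage;
    }
    return 0;
}
```

Under these assumptions, `sketch_lookup_horkage("Maxtor 7H500F0", rev)` returns the no-NCQ flag for any revision string, while the Maxtor 6L250S0 is flagged only when its firmware is exactly `BANC1G10`. If the NCQ problem turns out to be specific to one firmware revision of the 7H500F0, filling in the revision column would narrow the entry accordingly.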