Bug 187686 - gentoo sources-2.6.20 - hard hangs caused libata/sata_nv, -- NCQ issues
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Reported: 2007-08-04 00:23 UTC by Michael Kers
Modified: 2007-09-24 15:34 UTC

Description Michael Kers 2007-08-04 00:23:08 UTC
I recently upgraded from a 2.6.18 kernel to kernel-genkernel-x86_64-2.6.20-gentoo-r8.  The 2.6.18 kernel had been stable for many months.

My system has 4 HDs, all SATA.  The motherboard is an Asus A8N-E (CK804 chipset), which should support NCQ.  My 4 drives are: sda/ata1: Seagate ST3300622AS; sdb/ata2: WDC WD2500JS-00MVB1; sdc/ata3: Maxtor 7H500F0; sdd/ata4: Maxtor 7H500F0

According to `dmesg | grep -i ata`, the Seagate and both Maxtors have NCQ depth 32.  The WD does not.

I have no problems with the Seagate or the WD.

When I access anything on sdc or sdd (fsck, read, or write), I get a hard hang after about 10 seconds.

Steps to reproduce:
1. Boot computer with 2.6.20 or 2.6.22 gentoo kernel (sata_nv and libata)
2. Run `fsck.jfs /dev/sdc2`
3. Computer hangs within 10 seconds requiring hard power down

One time I received this error:
MCE 
 CPU 0: 4 bank 4:b200000000070f0f 
 ata3:SRST failed (errno=-19) 
 ata3:reset failed (errno=-19) retry in 10 sec 
 
 This is *NOT* a software problem! 
 Please contact your hardware vendor 
 TSC 736629147e

-------
Going back to 2.6.18 makes my problems go away.

After doing some research into NCQ, sata_nv and blacklists, I found that /usr/src/linux/drivers/ata/libata-core.c contains a blacklist for HDs and devices that give trouble.  I added my Maxtor HD model to the list like so (see line 3339, which is the line I added):

   3291 static const struct ata_blacklist_entry ata_device_blacklist [] = { 
    3292         /* Devices with DMA related problems under Linux */ 
    3293         { "WDC AC11000H",       NULL,           ATA_HORKAGE_NODMA }, 
    3294         { "WDC AC22100H",       NULL,           ATA_HORKAGE_NODMA }, 
    3295         { "WDC AC32500H",       NULL,           ATA_HORKAGE_NODMA }, 
    3296         { "WDC AC33100H",       NULL,           ATA_HORKAGE_NODMA }, 
    3297         { "WDC AC31600H",       NULL,           ATA_HORKAGE_NODMA }, 
    3298         { "WDC AC32100H",       "24.09P07",     ATA_HORKAGE_NODMA }, 
    3299         { "WDC AC23200L",       "21.10N21",     ATA_HORKAGE_NODMA }, 
    3300         { "Compaq CRD-8241B",   NULL,           ATA_HORKAGE_NODMA }, 
    3301         { "CRD-8400B",          NULL,           ATA_HORKAGE_NODMA }, 
    3302         { "CRD-8480B",          NULL,           ATA_HORKAGE_NODMA }, 
    3303         { "CRD-8482B",          NULL,           ATA_HORKAGE_NODMA }, 
    3304         { "CRD-84",             NULL,           ATA_HORKAGE_NODMA }, 
    3305         { "SanDisk SDP3B",      NULL,           ATA_HORKAGE_NODMA }, 
    3306         { "SanDisk SDP3B-64",   NULL,           ATA_HORKAGE_NODMA }, 
    3307         { "SANYO CD-ROM CRD",   NULL,           ATA_HORKAGE_NODMA }, 
    3308         { "HITACHI CDR-8",      NULL,           ATA_HORKAGE_NODMA }, 
    3309         { "HITACHI CDR-8335",   NULL,           ATA_HORKAGE_NODMA }, 
    3310         { "HITACHI CDR-8435",   NULL,           ATA_HORKAGE_NODMA }, 
    3311         { "Toshiba CD-ROM XM-6202B", NULL,      ATA_HORKAGE_NODMA }, 
    3312         { "TOSHIBA CD-ROM XM-1702BC", NULL,     ATA_HORKAGE_NODMA }, 
    3313         { "CD-532E-A",          NULL,           ATA_HORKAGE_NODMA }, 
    3314         { "E-IDE CD-ROM CR-840",NULL,           ATA_HORKAGE_NODMA }, 
    3315         { "CD-ROM Drive/F5A",   NULL,           ATA_HORKAGE_NODMA }, 
    3316         { "WPI CDD-820",        NULL,           ATA_HORKAGE_NODMA }, 
    3317         { "SAMSUNG CD-ROM SC-148C", NULL,       ATA_HORKAGE_NODMA }, 
    3318         { "SAMSUNG CD-ROM SC",  NULL,           ATA_HORKAGE_NODMA }, 
    3319         { "ATAPI CD-ROM DRIVE 40X MAXIMUM",NULL,ATA_HORKAGE_NODMA }, 
    3320         { "_NEC DV5800A",       NULL,           ATA_HORKAGE_NODMA }, 
    3321         { "SAMSUNG CD-ROM SN-124","N001",       ATA_HORKAGE_NODMA }, 
    3322 
    3323         /* Devices we expect to fail diagnostics */ 
    3324 
    3325         /* Devices where NCQ should be avoided */ 
    3326         /* NCQ is slow */ 
    3327         { "WDC WD740ADFD-00",   NULL,           ATA_HORKAGE_NONCQ }, 
    3328         /* http://thread.gmane.org/gmane.linux.ide/14907 */ 
    3329         { "FUJITSU MHT2060BH",  NULL,           ATA_HORKAGE_NONCQ }, 
    3330         /* NCQ is broken */ 
    3331         { "Maxtor 6L250S0",     "BANC1G10",     ATA_HORKAGE_NONCQ }, 
    3332         /* NCQ hard hangs device under heavier load, needs hard power cycle */ 
    3333         { "Maxtor 6B250S0",     "BANC1B70",     ATA_HORKAGE_NONCQ }, 
    3334         /* Blacklist entries taken from Silicon Image 3124/3132 
    3335            Windows driver .inf file - also several Linux problem reports */ 
    3336         { "HTS541060G9SA00",    "MB3OC60D",     ATA_HORKAGE_NONCQ, }, 
    3337         { "HTS541080G9SA00",    "MB4OC60D",     ATA_HORKAGE_NONCQ, }, 
    3338         { "HTS541010G9SA00",    "MBZOC60D",     ATA_HORKAGE_NONCQ, }, 
    3339         { "Maxtor 7H500F0",     NULL,           ATA_HORKAGE_NONCQ, }, 
    3340 
    3341 
    3342         /* Devices with NCQ limits */ 
    3343 
    3344 
    3345         /* End Marker */ 
    3346         { } 
    3347 }; 

After recompiling the kernel and repeating the steps to reproduce above, my problem went away.

It looks like this drive might also belong in the libata blacklist.
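
For anyone wondering why a NULL in the second column is enough, here is a minimal userspace sketch of how a table like this can be searched. It is not the kernel's code: the `demo_` names and the flag value are made up for illustration, and it only assumes that the lookup compares the model string exactly and treats a NULL revision as "any firmware revision".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's NONCQ flag; the value is an assumption. */
#define DEMO_HORKAGE_NONCQ (1UL << 2)

struct demo_blacklist_entry {
    const char *model_num;   /* exact model string reported by the drive */
    const char *model_rev;   /* firmware revision, or NULL for any revision */
    unsigned long horkage;   /* quirk flags to apply on a match */
};

/* Two entries copied from the table above, plus an end marker. */
static const struct demo_blacklist_entry demo_blacklist[] = {
    { "Maxtor 6L250S0", "BANC1G10", DEMO_HORKAGE_NONCQ },
    { "Maxtor 7H500F0", NULL,       DEMO_HORKAGE_NONCQ },
    { NULL,             NULL,       0 }
};

/* Return the quirk flags for a drive, or 0 if it is not listed. */
static unsigned long demo_lookup_horkage(const char *model, const char *rev)
{
    const struct demo_blacklist_entry *e;

    for (e = demo_blacklist; e->model_num != NULL; e++) {
        if (strcmp(e->model_num, model) != 0)
            continue;               /* different model, keep looking */
        if (e->model_rev == NULL || strcmp(e->model_rev, rev) == 0)
            return e->horkage;      /* NULL revision matches any firmware */
    }
    return 0;
}
```

With this sketch, `demo_lookup_horkage("Maxtor 7H500F0", rev)` returns the NONCQ flag for any `rev`, while the 6L250S0 entry only matches firmware BANC1G10. That is why the line I added disables NCQ on both of my Maxtors regardless of their firmware.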
Comment 1 Michael Kers 2007-08-05 14:59:00 UTC
There is an additional hitch.  I tried to replicate this bug with the 2.6.23-rc1 vanilla-sources.  2.6.23 did not seem to have this problem.  The problem either was fixed by 2.6.23 (in the kernel) or is a problem limited to Gentoo-sources.
Comment 2 Nelson 2007-08-05 15:19:25 UTC
(In reply to comment #1)
> (...) 2.6.23 did not seem to have this problem.  The problem
> either was fixed by 2.6.23 (in the kernel) or is a problem limited to
> Gentoo-sources.

Can you test the vanilla 2.6.22?

> Looks like this drive might also belong in the libata blacklist.

Better check how it was fixed in the 2.6.23, could you see if it's blacklisted there?
Comment 3 Michael Kers 2007-08-05 17:32:31 UTC
> Can you test the vanilla 2.6.22?

Will do.

> > Looks like this drive might also belong in the libata blacklist.
> Better check how it was fixed in the 2.6.23, could you see if it's 
> blacklisted there?

Just to clarify, it's not blacklisted in 2.6.23 (and I didn't have to add it), and I also had NCQ support for all HDs.

A quick diff of libata-core.c and sata_nv.c between 2.6.22 gentoo and 2.6.23 vanilla shows that there have been lots of code changes.

Comment 4 Mike Pagano gentoo-dev 2007-09-10 23:03:52 UTC
Michael,

Did you get a chance to test the vanilla kernel (2.6.23-rc5 as of this writing)?
Comment 5 Mike Pagano gentoo-dev 2007-09-24 15:34:04 UTC
Please reopen when you've had a chance to test with the latest development kernel and are able to post the test results.