I'm filing this following an exchange I had with a user in the #gentoo channel. He had been migrating to a software-based 4-way RAID-5 setup and had reached the final stage of the process: adding the 4th device (hitherto missing) to the array. As expected, this triggered a resync, which can make for a fairly intensive workload. Unfortunately, the process went awry:

Aug 29 20:48:36 frummel ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x1380000 action 0x2 frozen
Aug 29 20:48:36 frummel ata2: SError: { 10B8B Dispar BadCRC TrStaTrns }
Aug 29 20:48:36 frummel ata2.00: cmd 25/00:f8:3f:28:32/00:03:13:00:00/e0 tag 0 dma 520192 in
Aug 29 20:48:36 frummel res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 29 20:48:36 frummel ata2.00: status: { DRDY }
Aug 29 20:48:41 frummel ata2: port is slow to respond, please be patient (Status 0xff)
Aug 29 20:48:46 frummel ata2: device not ready (errno=-16), forcing hardreset
Aug 29 20:48:46 frummel ata2: hard resetting link
Aug 29 20:48:52 frummel ata2: port is slow to respond, please be patient (Status 0xff)
Aug 29 20:48:56 frummel ata2: COMRESET failed (errno=-16)
Aug 29 20:48:56 frummel ata2: hard resetting link
Aug 29 20:50:46 frummel ata2: reset failed, giving up
Aug 29 20:50:46 frummel sd 1:0:0:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08
Aug 29 20:50:46 frummel sd 1:0:0:0: [sdb] Sense Key : 0xb [current] [descriptor]
Aug 29 20:50:46 frummel Descriptor sense data with sense descriptors (in hex):
Aug 29 20:50:46 frummel 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Aug 29 20:50:46 frummel 00 00 00 00
Aug 29 20:50:46 frummel sd 1:0:0:0: [sdb] ASC=0x0 ASCQ=0x0
Aug 29 20:50:46 frummel end_request: I/O error, dev sdb, sector 322054207
Aug 29 20:50:46 frummel raid5:md4: read error not correctable (sector 322054144 on sdb1).

Fortunately, his data survived this mishap.
As it turned out, the user had already done some research and found this thread:

http://www.mail-archive.com/linux-ide%40vger.kernel.org/msg10106.html

In summary, it seems that one of two things may be the case:

1) The controller simply isn't stable in SATA 300 mode.
2) The driver is buggy, or is unable to drive the controller reliably in SATA 300 mode for some unknown reason.

Unfortunately, I didn't establish precisely which kernel he was running, but I know that it was a version of ~gentoo-sources-2.6.25. Given that the post mentioned above dates from September 2007, the issue appears to have been observed since around 2.6.21.

Further, there is a patch that works around the problem simply by unconditionally dropping the link to SATA 150 (1.5 Gbit/s) mode. At the time of posting, a version for a recent 2.6.27 release candidate is still available here:

http://user.it.uu.se/~mikpe/linux/patches/2.6/patch-sata_promise-limit-sataii-to-1.5Gbps-2.6.27-rc4

The questions that spring to my mind are:

* How widespread is this problem?
* Can anything be done about it?
* If not, should we adopt this patch until the issue is properly addressed?
The patch link is dead, so I can't check whether it has been included yet. I would suggest taking this to the linux-ide list.