Summary: | sys-apps/smartmontools - smartd can lead to failures with software raid-5 | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | georg.lippold |
Component: | [OLD] Core system | Assignee: | Gentoo's Team for Core System packages <base-system> |
Status: | RESOLVED UPSTREAM | ||
Severity: | critical | ||
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
hdparm -I /dev/sd{a,b,c,d}
/etc/smartd.conf |
Description
georg.lippold
2006-08-21 09:35:23 UTC
Reopen with /etc/smartd.conf and 'hdparm -I /dev/sd?' output attached.

Created attachment 94796
hdparm -I /dev/sd{a,b,c,d}

Created attachment 94797
/etc/smartd.conf
Attachments as requested.

Since you've made your own custom config file, why don't you start with the stock one and see which options are causing you trouble?

I won't use this software any more. It took me too long to figure the bug out. Plus, it is only a nice add-on for RAID storage but not necessary, since disk failures are not crucial. Here's some additional information: It is _always_ /dev/sda that fails, regardless of the attached disk (I swapped them). Additionally, the disks attached to the controller are not detected in the correct order. The disk on port 1 is detected as /dev/sdd, the disk on port 3 as /dev/sda, and so on. The incorrect detection may be a driver-specific issue and not related to the data corruption. I strongly doubt that it has anything to do with the custom config file, because I only set custom device checking intervals.
Regards, Georg

Well, to be honest, I think your SATA cable or your controller is faulty, and this has nothing to do with smartmontools. There is not much we can do here if you are not going to test anything for us.

The problem is that this is the backup server at work (and unfortunately it is a small business that cannot afford both a production and a testing system). I can definitely say that these errors never happened for about 1 year. They first occurred about two weeks ago, when I started using smartmontools to additionally watch the disks for errors. Now that I have disabled smartmontools, everything works as expected. I am currently restoring as many backups as possible from our external hard disks and do not want to break things again. Sorry that testing is not possible. It may be an error in the SATA controller, but I rather think it is in the driver or in the way smartmontools accesses SATA disks. The error occurs almost only under heavy load, such as rebuilding the array or copying large chunks of data to it.
Regards, Georg

Those PIO errors are normal ... it means you're trying to use an option the device does not understand. If you read more of your logs or just ran smartd with -d, you'd see something like:

Device: /dev/sdb, opened
Device: /dev/sdb, not found in smartd database.
Error SMART Enable Auto-save failed: Input/output error
Device: /dev/sdb, could not enable SMART Attribute Autosave.
Error SMART Enable Automatic Offline failed: Input/output error
Device: /dev/sdb, enable SMART Automatic Offline Testing failed.

and then the kernel would spit:

ata2: PIO error
ata2: status=0x50 { DriveReady SeekComplete }

for each feature that failed.

As for the I/O errors, I'm pretty sure that is not smartd's fault.
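To make the kernel message quoted above easier to read, here is a minimal sketch that decodes an ATA status byte such as 0x50 into flag names. The bit masks follow the standard ATA status register layout. The labels are modeled on the two names that appear in the log line (DriveReady, SeekComplete), and the remaining labels, along with the helper name `decode_ata_status`, are assumptions chosen for illustration rather than anything taken from this report.

```python
# Standard ATA status register bit masks, highest bit first.
# Only "DriveReady" and "SeekComplete" appear in the quoted log line;
# the other labels are assumed names in the same style.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


if __name__ == "__main__":
    flags = decode_ata_status(0x50)
    # Rebuild a line in the same shape as the kernel message.
    print("status=0x50 { " + " ".join(flags) + " }")
```

For 0x50 only bits 0x40 and 0x10 are set, so the Error bit (0x01) is clear in the status byte that the kernel printed.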