I have a software RAID-5 using a Promise SATA 300 TX-4 Serial ATA controller, with the sata_promise driver statically linked into the kernel, and four attached Seagate 300 GB SATA disks. When I run /etc/init.d/smartd start, I always get the following errors while the RAID array is rebuilding:

Aug 21 18:10:46 backup kernel: ata1: PIO error
Aug 21 18:10:46 backup kernel: ata1: status=0x50 { DriveReady SeekComplete }
Aug 21 18:10:46 backup kernel: ata1: PIO error
Aug 21 18:10:46 backup kernel: ata1: status=0x50 { DriveReady SeekComplete }
Aug 21 18:10:46 backup kernel: ata2: PIO error
Aug 21 18:10:46 backup kernel: ata2: status=0x50 { DriveReady SeekComplete }
Aug 21 18:10:47 backup kernel: ata2: PIO error
Aug 21 18:10:47 backup kernel: ata2: status=0x50 { DriveReady SeekComplete }
Aug 21 18:10:47 backup kernel: ata3: PIO error
Aug 21 18:10:47 backup kernel: ata3: status=0x50 { DriveReady SeekComplete }
Aug 21 18:10:47 backup kernel: ata3: PIO error
Aug 21 18:10:47 backup kernel: ata3: status=0x50 { DriveReady SeekComplete }
Aug 21 18:10:47 backup kernel: ata4: PIO error
Aug 21 18:10:47 backup kernel: ata4: status=0x50 { DriveReady SeekComplete }
Aug 21 18:10:48 backup kernel: ata4: PIO error
Aug 21 18:10:48 backup kernel: ata4: status=0x50 { DriveReady SeekComplete }

Sometimes, even worse things happen:

Aug 21 17:46:15 backup kernel: ata1: PIO error
Aug 21 17:46:15 backup kernel: ata1: status=0x50 { DriveReady SeekComplete }
Aug 21 17:46:20 backup kernel: ata1: PIO error
Aug 21 17:46:20 backup kernel: ata1: status=0x50 { DriveReady SeekComplete }
Aug 21 17:46:21 backup kernel: ata2: PIO error
Aug 21 17:46:21 backup kernel: ata2: status=0x50 { DriveReady SeekComplete }
Aug 21 17:46:22 backup kernel: ata2: PIO error
Aug 21 17:46:22 backup kernel: ata2: status=0x50 { DriveReady SeekComplete }
Aug 21 17:46:22 backup kernel: ata3: PIO error
Aug 21 17:46:22 backup kernel: ata3: status=0x50 { DriveReady SeekComplete }
Aug 21 17:46:22 backup kernel: ata3: PIO error
Aug 21 17:46:22 backup kernel: ata3: status=0x50 { DriveReady SeekComplete }
Aug 21 17:46:22 backup kernel: ata4: PIO error
Aug 21 17:46:22 backup kernel: ata4: status=0x50 { DriveReady SeekComplete }
Aug 21 17:46:22 backup kernel: ata4: PIO error
Aug 21 17:46:22 backup kernel: ata4: status=0x50 { DriveReady SeekComplete }
Aug 21 17:46:33 backup kernel: ata1: status=0xff { Busy }
Aug 21 17:47:08 backup kernel: ata1: status=0xd0 { Busy }
Aug 21 17:47:38 backup kernel: ata1: status=0xff { Busy }
Aug 21 17:47:38 backup kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
Aug 21 17:47:38 backup kernel: sda: Current: sense key: Aborted Command
Aug 21 17:47:38 backup kernel: Additional sense: Scsi parity error
Aug 21 17:47:38 backup kernel: end_request: I/O error, dev sda, sector 2709447
Aug 21 17:48:08 backup kernel: ata1: status=0xff { Busy }
Aug 21 17:48:08 backup kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
Aug 21 17:48:08 backup kernel: sda: Current: sense key: Aborted Command
Aug 21 17:48:08 backup kernel: Additional sense: Scsi parity error
Aug 21 17:48:08 backup kernel: end_request: I/O error, dev sda, sector 2709455
Aug 21 17:48:38 backup kernel: ata1: status=0xff { Busy }
Aug 21 17:48:38 backup kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
Aug 21 17:48:38 backup kernel: sda: Current: sense key: Aborted Command
Aug 21 17:48:38 backup kernel: Additional sense: Scsi parity error
Aug 21 17:48:38 backup kernel: end_request: I/O error, dev sda, sector 2709463
Aug 21 17:49:08 backup kernel: ata1: status=0xff { Busy }
Aug 21 17:49:08 backup kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
Aug 21 17:49:08 backup kernel: sda: Current: sense key: Aborted Command
Aug 21 17:49:08 backup kernel: Additional sense: Scsi parity error
Aug 21 17:49:08 backup kernel: end_request: I/O error, dev sda, sector 2709471

After a reboot, however, the disks show no bad sectors, although the I/O errors above would suggest otherwise.
Additionally, I got a corrupted (and unrecoverable) ext2 filesystem on the RAID-5 device after a running time of approx. 1 week. It seems that smartmontools issues some commands that interfere with disk I/O and can lead to filesystem (and, worse, RAID) corruption.

Best Regards,
Georg

Portage 2.1-r1 (default-linux/x86/2006.0, gcc-3.4.6, glibc-2.3.6-r4, 2.6.17-gentoo-r4 i686)
=================================================================
System uname: 2.6.17-gentoo-r4 i686 AMD Sempron(tm) 2400+
Gentoo Base System version 1.6.15
app-admin/eselect-compiler: [Not Present]
dev-lang/python: 2.3.5, 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: [Not Present]
dev-util/confcache: [Not Present]
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686 -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -march=i686 -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror/ ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo http://mirrors.sec.informatik.tu-darmstadt.de/gentoo/ http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.de.gentoo.org/gentoo-portage"
USE="x86 X acpi acpi4linux alsa apache2 apm cli crypt dlloader dri eds emboss fortran gstreamer ipv6 isdnlog mp3 nptl ogg pam pcre png pppd qt3 qt4 readline reflection session spl ssl tcpd truetype-fonts type1-fonts udev vorbis xml xorg zlib elibc_glibc input_devices_keyboard input_devices_mouse input_devices_evdev kernel_linux userland_GNU"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY
Reopen with /etc/smartd.conf and 'hdparm -I /dev/sd?' output attached.
Created attachment 94796 [details] hdparm -I /dev/sd{a,b,c,d}
Created attachment 94797 [details] /etc/smartd.conf
Attachments as requested.
Since you've made your own custom config file, why don't you start with the stock one and see which options are causing you trouble?
I won't use this software any more. It took me too long to figure the bug out. Plus, it is only a nice add-on for RAID storage but not necessary, since disk failures are not crucial.

Here's some additional information: It is _always_ /dev/sda that fails, regardless of which disk is attached (I swapped them). Additionally, the disks attached to the controller are not detected in the correct order. The disk on port 1 is detected as /dev/sdd, the one on port 3 as /dev/sda, and so on. The incorrect detection may be a driver-specific issue and not related to the data corruption. I strongly doubt that it has anything to do with the custom config file, because I only set custom device checking intervals.

Regards,
Georg
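The observation that only /dev/sda reports I/O errors can be checked against the kernel log. Below is a minimal Python sketch, assuming the syslog line format shown in the original report; the function name count_io_errors and the sample list are hypothetical and only for illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   end_request: I/O error, dev sda, sector 2709447
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Count block-layer I/O errors per device name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the report; PIO error lines are ignored on purpose.
sample = [
    "Aug 21 17:46:22 backup kernel: ata4: PIO error",
    "Aug 21 17:47:38 backup kernel: end_request: I/O error, dev sda, sector 2709447",
    "Aug 21 17:48:08 backup kernel: end_request: I/O error, dev sda, sector 2709455",
]

for device, count in count_io_errors(sample).items():
    print(device, count)
```

Feeding the full log (for example, the lines of /var/log/messages) to the same function would show whether any device other than sda ever appears.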
Well, to be honest I think your SATA cable or your controller is faulty, and this has nothing in common w/ smartmontools. Not much we could do here if you are not going to test anything for us.
The problem is that this is the backup server at work (and unfortunately it is a small business that cannot afford both a production and a testing system). I can definitely say that these errors never happened for about 1 year. They first occurred about two weeks ago, when I started using smartmontools to additionally watch the disks for errors. Now that I have disabled smartmontools, everything works as expected. I am currently restoring as many backups as possible from our external hard disks and do not want to break things again. Sorry that testing is not possible.

It may be an error in the SATA controller, but I rather think it is in the driver or in the way smartmontools accesses SATA disks. The error occurs almost only under heavy load, such as rebuilding the array or copying large chunks of data to it.

Regards,
Georg
Those PIO errors are normal; they mean you're trying to use an option the device does not understand. If you read more of your logs or just ran smartd with -d, you'd see something like:

Device: /dev/sdb, opened
Device: /dev/sdb, not found in smartd database.
Error SMART Enable Auto-save failed: Input/output error
Device: /dev/sdb, could not enable SMART Attribute Autosave.
Error SMART Enable Automatic Offline failed: Input/output error
Device: /dev/sdb, enable SMART Automatic Offline Testing failed.

and then the kernel would spit out:

ata2: PIO error
ata2: status=0x50 { DriveReady SeekComplete }

for each feature that failed. As for the I/O errors, I'm pretty sure that is not smartd's fault.
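For reading the status= values in these kernel messages, here is a minimal Python sketch that decodes an ATA status byte into the flag names the kernel prints. The bit values are the standard ATA status register bits; the function name decode_ata_status is hypothetical, and the rule that only "Busy" is reported when the busy bit is set is an assumption based on the log output above (0xd0 and 0xff are both shown as { Busy }).

```python
# Standard ATA status register bits, highest bit first.
ATA_BUSY = 0x80
ATA_STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the flag names for an ATA status byte.

    Assumption: while the busy bit is set the other bits are not
    meaningful, so only "Busy" is reported, matching the log lines
    in this report.
    """
    if status & ATA_BUSY:
        return ["Busy"]
    return [name for bit, name in ATA_STATUS_BITS if status & bit]


for value in (0x50, 0xD0, 0xFF):
    print(hex(value), decode_ata_status(value))
```

Under this reading, 0x50 has the error bit (0x01) clear and only reports a ready drive, while 0xff means every bit is set, which is a different condition from the PIO messages.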