Bug 344717 - sys-kernel/gentoo-sources-2.6.34-r12: software raid fails/drops every sata drive in array, but drives okay after reboot (sas)
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-08 18:39 UTC by dak
Modified: 2010-12-17 14:31 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
dmesg output from first resync failure (ta790gx-2.6.34-r12-dmesg-resync1,81.09 KB, text/plain)
2010-11-08 18:54 UTC, dak
lspci -v, affected array is attached to Marvell SAS controller (ta790gx-2.6.34-r12-lspciv,10.11 KB, text/plain)
2010-11-08 19:01 UTC, dak
cat /proc/mdstat (ta790gx-2.6.34-r12-proc.mdstat,435 bytes, text/plain)
2010-11-08 19:01 UTC, dak
kernel .config (ta790gx-2.6.34-r12-kernelconfig,55.14 KB, text/plain)
2010-11-08 19:06 UTC, dak
dmesg from 2.6.36 kernel resync successful (ta790gx-2.6.36-dmesg,58.85 KB, text/plain)
2010-11-08 19:35 UTC, dak
dmesg from second machine fails all drives in the array (a7n8x-2.6.34-r12-dmesg,59.79 KB, text/plain)
2010-11-08 20:13 UTC, dak
Description dak 2010-11-08 18:39:31 UTC
This happens with gentoo-sources-2.6.34-r12: a software RAID5 array fails to resync because every drive is dropped from the array.

Reproducible: Always

Steps to Reproduce:
1. echo check > /sys/block/md2/md/sync_action

Actual Results:  
All drives are dropped from the affected array md2. I will attach the dmesg output.

Expected Results:  
Drives are only marked faulty if they are, in fact, faulty.
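To see the symptom from step 1 without reading through dmesg, one can look at /proc/mdstat, where md marks each failed member with an `(F)` suffix (for example `sdc1[2](F)`). The sketch below counts those markers on one array's line. It is a minimal illustration, assuming the usual /proc/mdstat layout where the array's line starts with its name; `count_failed_members` is a hypothetical helper, not part of mdadm or the kernel.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat-formatted text on stdin and print
# how many members of the named array are marked faulty with "(F)".
count_failed_members() {
    array="$1"
    # The array's device line looks like:
    #   md2 : active raid5 sdd1[3](F) sdc1[2](F) sdb1[1] sda1[0]
    # gsub() returns the number of "(F)" markers it replaced on that line.
    # If the array is not listed at all, print 0.
    awk -v md="$array" '
        $1 == md { n = gsub(/\(F\)/, ""); print n; found = 1 }
        END { if (!found) print 0 }
    '
}
```

For example, as root, start the check with `echo check > /sys/block/md2/md/sync_action` and then run `count_failed_members md2 < /proc/mdstat` while it progresses; a nonzero result means md currently lists that many md2 members as faulty.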
Comment 1 dak 2010-11-08 18:54:53 UTC
Created attachment 253673
dmesg output from first resync failure
Comment 2 dak 2010-11-08 19:01:18 UTC
Created attachment 253675
lspci -v, affected array is attached to Marvell SAS controller
Comment 3 dak 2010-11-08 19:01:52 UTC
Created attachment 253677
cat /proc/mdstat
Comment 4 dak 2010-11-08 19:06:13 UTC
Created attachment 253681
kernel .config
Comment 5 dak 2010-11-08 19:35:39 UTC
Created attachment 253691
dmesg from 2.6.36 kernel resync successful

I switched to gentoo-sources-2.6.36 and a resync succeeded, but dmesg still shows some similar errors.
Comment 6 dak 2010-11-08 20:13:58 UTC
Created attachment 253697
dmesg from second machine fails all drives in the array

Different machine, same result: dropped drives. This was not during a resync, just a sustained read from the array. I'm less sure about this one; it looks like the network may have died first. Either way, both the network and all drives in the RAID array were gone, but they were back after a reboot. The r8169 NIC and the kernel were both new: I had upgraded the kernel assuming the problems on the other machine were limited to mvsas, so either this is related or I'm just especially unlucky. I plan to switch this machine back to the very, very old kernel I was using previously.
Comment 7 dak 2010-11-09 15:00:15 UTC
(In reply to comment #6)
> Created an attachment (id=253697)
> dmesg from second machine fails all drives in the array
> 
I found a flaky connection on this second machine, though I'm still not sure how it affected all the drives.
Comment 8 Panagiotis Christopoulos (RETIRED) gentoo-dev 2010-11-20 20:27:48 UTC
Are you sure this isn't caused by hardware problems? Are your drives OK? Check them with smartctl (emerge smartmontools). I'm assigning this bug to the kernel team, as they will probably understand more from the dmesg logs.
Comment 9 Mike Pagano gentoo-dev 2010-11-20 22:56:29 UTC
Is this still an issue?