Bug 344717

Summary: sys-kernel/gentoo-sources-2.6.34-r12: software raid fails/drops every sata drive in array, but drives okay after reboot (sas)
Product: Gentoo Linux Reporter: dak <admin>
Component: [OLD] Core system Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status: RESOLVED NEEDINFO    
Severity: critical CC: alexanderyt, pchrist
Priority: High    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: dmesg output from first resync failure
lspci -v, affected array is attached to marvell sas controller
cat /proc/mdstat
kernel .config
dmesg from 2.6.36 kernel resync successful
dmesg from second machine fails all drives in the array

Description dak 2010-11-08 18:39:31 UTC
This happens with gentoo-sources-2.6.34-r12. The software RAID5 array fails to resync because every drive is dropped from the array.

Reproducible: Always

Steps to Reproduce:
1. echo check > /sys/block/md2/md/sync_action

Actual Results:  
All drives are dropped from the affected array md2. Will attach dmesg output.

Expected Results:  
Drives are only marked faulty if they are, in fact, faulty.
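
The reproduction step above can also be scripted. This is a minimal sketch, assuming the standard md sysfs attributes `sync_action` and `mismatch_cnt` under a directory such as /sys/block/md2/md; the function names `start_check` and `read_state` are hypothetical helpers, not part of any tool mentioned in this report. Writing to sysfs requires root.

```python
from pathlib import Path


def start_check(md_dir):
    """Request a consistency check, like `echo check > <md_dir>/sync_action`."""
    Path(md_dir, "sync_action").write_text("check\n")


def read_state(md_dir):
    """Return the current sync action and mismatch count for the array."""
    md = Path(md_dir)
    return {
        "sync_action": (md / "sync_action").read_text().strip(),
        "mismatch_cnt": int((md / "mismatch_cnt").read_text().strip()),
    }


if __name__ == "__main__":
    md_dir = "/sys/block/md2/md"  # array name taken from this report
    start_check(md_dir)
    print(read_state(md_dir))
```

Polling `read_state` until `sync_action` returns to `idle` would show whether the check ran to completion or was aborted when the drives dropped out.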
Comment 1 dak 2010-11-08 18:54:53 UTC
Created attachment 253673 [details]
dmesg output from first resync failure
Comment 2 dak 2010-11-08 19:01:18 UTC
Created attachment 253675 [details]
lspci -v, affected array is attached to marvell sas controller
Comment 3 dak 2010-11-08 19:01:52 UTC
Created attachment 253677 [details]
cat /proc/mdstat
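
For anyone comparing /proc/mdstat before and after the failure, a small parser can list which members the kernel has marked faulty. This is a sketch only, assuming the usual mdstat layout in which failed members carry an `(F)` flag after their slot number; `failed_members` is a hypothetical helper and it does not handle device names containing hyphens.

```python
import re

# Matches a member entry such as "sdb1[1]" or "sdc1[2](F)".
MEMBER = re.compile(r"(\w+)\[(\d+)\]((?:\([A-Z]\))*)")


def failed_members(mdstat_text):
    """Map each md array name to the list of its members flagged (F)."""
    failed = {}
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\w+)\s*:\s*(.*)$", line)
        if not m:
            continue  # skip Personalities, block counts, unused devices
        name, rest = m.groups()
        failed[name] = [
            dev for dev, _slot, flags in MEMBER.findall(rest) if "(F)" in flags
        ]
    return failed


if __name__ == "__main__":
    with open("/proc/mdstat") as fh:
        print(failed_members(fh.read()))
```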
Comment 4 dak 2010-11-08 19:06:13 UTC
Created attachment 253681 [details]
kernel .config
Comment 5 dak 2010-11-08 19:35:39 UTC
Created attachment 253691 [details]
dmesg from 2.6.36 kernel resync successful

I have switched kernels to gentoo-sources 2.6.36 and a resync completed successfully, but dmesg still shows some similar errors.
Comment 6 dak 2010-11-08 20:13:58 UTC
Created attachment 253697 [details]
dmesg from second machine fails all drives in the array

Different machine, same result: dropped drives. This was not during a resync, just a sustained read from the array. I'm not sure about this one; it looks like the network died first? Either way, both the network and all drives in the RAID array were gone, but they were back after a reboot. The r8169 NIC and the kernel were both new: I had upgraded the kernel thinking the problems on the other machine were limited to mvsas, so either this is related or I'm just especially unlucky. I plan to switch this machine back to the very, very old kernel I was using previously.
Comment 7 dak 2010-11-09 15:00:15 UTC
(In reply to comment #6)
> Created an attachment (id=253697) [details]
> dmesg from second machine fails all drives in the array
> 
I found a flaky connection on this second machine. I'm still not sure how it affected all the drives, though.
Comment 8 Panagiotis Christopoulos (RETIRED) gentoo-dev 2010-11-20 20:27:48 UTC
Are you sure this isn't happening due to hardware problems? Are your drives OK? Check them with smartctl (emerge smartmontools). I am assigning this bug to the kernel team, as they will probably understand more from the dmesg logs, etc.
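
The drive check suggested above could be run across every array member with something like the following. It is a sketch, assuming `smartctl -H` from smartmontools is installed and run as root, and that the health line reads either "SMART overall-health self-assessment test result: ..." (ATA) or "SMART Health Status: ..." (SCSI/SAS); the helper names and the device paths are hypothetical.

```python
import subprocess


def smart_health(device):
    """Run `smartctl -H` on a device and return its text output."""
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return result.stdout


def health_ok(output):
    """Return True/False for a recognised health line, None if none is found."""
    for line in output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.rstrip().endswith("PASSED")
        if line.startswith("SMART Health Status"):
            return line.split(":", 1)[1].strip() == "OK"
    return None


if __name__ == "__main__":
    for dev in ["/dev/sda", "/dev/sdb"]:  # hypothetical member devices
        print(dev, health_ok(smart_health(dev)))
```

A passing health summary does not rule out cabling or controller problems, so the dmesg logs remain the main evidence here.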
Comment 9 Mike Pagano gentoo-dev 2010-11-20 22:56:29 UTC
Is this still an issue?