Bug 79789 - using kernel-2.6, aacraid will not access the /dev/sdx if the controller is not running in optimal mode.
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High critical
Assignee: Daniel Drake (RETIRED)
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks:
 
Reported: 2005-01-27 16:46 UTC by Jeffrey Crawford
Modified: 2005-02-05 02:19 UTC


Attachments
aacraid-1.1.2lk-to-aacraid-1.1.5w26patches (aacraid.patch,248.88 KB, patch)
2005-01-29 15:46 UTC, Nick Hadaway

Description Jeffrey Crawford 2005-01-27 16:46:09 UTC
The aacraid module fails if the underlying controller is running in any mode other than Optimal. If the card is in Optimal mode, the drives work fine. However, if one of the drives is missing, the system can no longer access the good drive. The controller is detected, but the array can't be accessed with mount or fdisk. Once the second drive is brought back online and is no longer reinitializing, it works again. It won't even work if both drives are present and syncing.

Reproducible: Always
Steps to Reproduce:
1. Use the 2.6 series kernel and compile in the aacraid module
2. Detach one of the fail-safe (RAID-1) drives
3. The system loads the boot menu and the kernel, but partitions can't be mounted

Actual Results:  
The machine will only boot with kernel-2.4. Kernel-2.6 just hangs
and gives up when trying to mount the root filesystem. Note that this happens only if the
controller is operating in some mode other than Optimal. Again, kernel-2.4 works.

Snippet from dmesg:

Red Hat/Adaptec aacraid driver (1.1.2-lk2 Jan 27 2005)
ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 24 (level, low) -> IRQ 24
AAC0: kernel 4.1.4 build 7403
AAC0: monitor 4.1.4 build 7403
AAC0: bios 4.1.0 build 7403
AAC0: serial bfccd5fafaf001
scsi0 : aacraid
  Vendor: DELL      Model: DATA 1            Rev: V1.0
  Type:   Direct-Access                      ANSI SCSI revision: 02


Expected Results:  
The disks should mount and be accessible normally while one good drive is still available.
Comment 1 Daniel Drake (RETIRED) gentoo-dev 2005-01-28 03:59:30 UTC
Which kernels have you tried? Please try 2.6.11_rc2 if you haven't already.
Comment 2 Jeffrey Crawford 2005-01-28 11:37:48 UTC
2.6.11_rc2 seems to work, but 2.6.11_rc1 does not! So it looks like whatever fixed it is very new. Here is the information from the rc2 kernel:

Red Hat/Adaptec aacraid driver (1.1.2-lk2 Jan 28 2005)
ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 24 (level, low) -> IRQ 24
AAC0: kernel 4.1.4 build 7403
AAC0: monitor 4.1.4 build 7403
AAC0: bios 4.1.0 build 7403
AAC0: serial bfccd5fafaf001
scsi0 : aacraid
  Vendor: DELL      Model: DATA 1            Rev: V1.0
  Type:   Direct-Access                      ANSI SCSI revision: 02

=======================================================================
The rc1 kernel reported this:
Red Hat/Adaptec aacraid driver (1.1.2-lk2 Jan 28 2005)
ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 24 (level, low) -> IRQ 24
AAC0: kernel 4.1.4 build 7403
AAC0: monitor 4.1.4 build 7403
AAC0: bios 4.1.0 build 7403
AAC0: serial bfccd5fafaf001
scsi0 : aacraid
  Vendor: DELL      Model: DATA 1            Rev: V1.0
  Type:   Direct-Access                      ANSI SCSI revision: 02

===========================================================================
I don't see any difference above. The changelog between rc1 and rc2 had this to say regarding the aacraid driver:
<coughlan@redhat.com>
	[PATCH] aacraid: remove aac_handle_aif
	
	When aac_command_thread detects an adapter event (AifCmdDriverNotify or
	AifCmdEventNotify) it calls aac_handle_aif. This routine sets a flag,
	calls fib_adapter_complete, and returns. The bad news is that after the
	return, aac_command_thread continues to process the command and calls
	fib_adapter_complete again.
	
	Under some circumstances this causes the driver to take the device
	offline. In my case, it happens with a Dell CERC SATA with a RAID 5 in
	the "building" state:
	
	aacraid: Host adapter reset request. SCSI hang ?
	aacraid: Host adapter appears dead
	scsi: Device offlined - not ready after error recovery: host 0 channel 0
	id 0 lun 0
	SCSI error : <0 0 0 0> return code = 0x6000000
	end_request: I/O error, dev sda, sector 976537592
	
	Mark Salyzyn says the intent is for aac_handle_aif to perform some
	plug-n-play actions based on the adapter event, and return, leaving the
	command completion to the caller.
	
	The attached patch solves the problem by removing aac_handle_aif
	entirely, since it is wrong, and there is currently no code in the
	driver to actually do anything with these events.
	
	Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>

==========================================================================
I hope someone can assess this and figure out if the module is now stable.
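
For readers following the changelog above, the double-completion problem it describes can be sketched as a toy model in C. This is not the aacraid driver code: `toy_fib`, `toy_complete`, and the `toy_command_thread_*` functions are hypothetical names invented for illustration, and the return values are assumptions chosen to make the second completion visible. The real driver's behavior on a double `fib_adapter_complete` call is only what the changelog reports (the device being taken offline under some circumstances).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a FIB (a driver command block); not the real struct. */
struct toy_fib {
    int completions;   /* how many times this FIB has been completed */
};

/* Toy stand-in for fib_adapter_complete(): returns 0 on the first
 * completion and -1 if the FIB had already been completed. */
static int toy_complete(struct toy_fib *fib)
{
    assert(fib != NULL);
    fib->completions++;
    return fib->completions == 1 ? 0 : -1;
}

/* Pre-fix shape: the event handler completes the FIB itself... */
static void toy_handle_aif_buggy(struct toy_fib *fib)
{
    toy_complete(fib);
}

/* ...and the command thread completes it again after the handler returns. */
static int toy_command_thread_buggy(struct toy_fib *fib)
{
    toy_handle_aif_buggy(fib);
    return toy_complete(fib);   /* second completion of the same FIB */
}

/* Post-fix shape: the handler is gone, so the caller completes exactly once. */
static int toy_command_thread_fixed(struct toy_fib *fib)
{
    return toy_complete(fib);
}
```

Calling `toy_command_thread_buggy` on a fresh `toy_fib` leaves `completions` at 2 and returns the error value, while `toy_command_thread_fixed` completes the FIB once. That mirrors the shape of the upstream change, which removed the handler rather than teaching it to skip the completion.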
Comment 3 Daniel Drake (RETIRED) gentoo-dev 2005-01-29 09:34:37 UTC
Please apply this patch against 2.6.10 and let us know if it fixes it:
http://linux.bkbits.net:8080/linux-2.6/gnupatch@41d9b79fvwpNcY-B1BabPhtde7JhuA
Comment 4 Nick Hadaway 2005-01-29 15:41:37 UTC
The adaptec-1.1-5 aacraid driver plus a couple of fixes has been tested to work with gentoo-dev-sources-2.6.10-r5. I had 2 servers whose aacraid drivers kept crashing with the stock driver. The servers have now been running stable for a week with no problems after switching to 1.1-5+fixes.

Since this is Bugzilla, I made a patch from the 2.6.10-gentoo-r5 kernel to what I have currently. If you want me to send a tar.bz2 of the aacraid directory, I can do that too.
Comment 5 Nick Hadaway 2005-01-29 15:46:22 UTC
Created attachment 49889
aacraid-1.1.2lk-to-aacraid-1.1.5w26patches
Comment 6 Daniel Drake (RETIRED) gentoo-dev 2005-01-29 15:56:48 UTC
I'm not keen on applying such a big patch. I'd rather just apply the fix alone, which is probably the patch I mentioned in comment #3. If someone could confirm whether that alone solves the problem it would be appreciated.
Comment 7 Jeffrey Crawford 2005-01-31 15:38:09 UTC
I have applied the patch from comment 3 and can now use the module normally. I applied the patch to the 2.6.10 versions of vanilla-sources and gentoo-sources, and both work. I have also done some load testing in single-drive and dual-drive operation and have not seen any adverse effects. So far the server is running stable. Will this be added to gentoo-sources any time soon?
Comment 8 Daniel Drake (RETIRED) gentoo-dev 2005-02-01 09:02:59 UTC
Thanks for testing. Yes, it will go in the next release.
Comment 9 Daniel Drake (RETIRED) gentoo-dev 2005-02-05 02:19:44 UTC
Fixed in gentoo-dev-sources-2.6.10-r7