Bug 145719 - gentoo-sources-2.6.17* break nvidia SATA
Summary: gentoo-sources-2.6.17* break nvidia SATA
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Reported: 2006-08-31 08:30 UTC by Paolo Pedroni
Modified: 2006-10-19 05:08 UTC
Description Paolo Pedroni 2006-08-31 08:30:34 UTC
User-Agent:       Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.2 (like Gecko)
Build Identifier: 

Recently I upgraded my home machine (AMD64 X2 4600, Asus A8N-E nForce4 
motherboard, 2 GB RAM, ATI X1300 Pro graphics, 2 RAID1 arrays on the SATA 
controller) to kernel gentoo-sources-2.6.17-r7. Ever since, the system has 
hung whenever there is sustained disk access (tar'ing or untar'ing big files, 
writing several consecutive files to disk, and so on).
The problem was still there with gentoo-sources-2.6.17-r4, went away with 
gentoo-sources-2.6.16-r13, and did not appear with 
vanilla-sources-2.6.17.11 either.
I then checked what exactly was added to vanilla kernel 2.6.17 to make a 
Gentoo kernel and found the following patches:

Revision 509: Support new nvidia MCP65 SATA controllers (dsd)
Added: 4100_ahci-nvidia-mcp65.patch
Revision 510: Support even more new nvidia SATA hardware (dsd)
Added: 4115_nvidia-sata-new.patch
...
Revision 512: Support new nvidia IDE hardware (dsd)
Added: 4125_nvidia-ide-new.patch
...
Revision 525: Fix patches (dsd)
Added: 4110_nvidia-mcp61.patch
Added: 4135_promise-pdc2037x.patch
Modified: 4015_forcedeth-new-ids.patch
Modified: 4125_nvidia-ide-new.patch
Modified: 4200_fbsplash-0.9.2-r5.patch
Modified: 4205_vesafb-tng-1.0-rc2.patch
Deleted: 4110_promise-pdc2037x.patch

Is it possible that one of these patches inadvertently breaks older Nvidia SATA 
controllers?

Is there any way to check? I can do some more troubleshooting if you wish.

I marked the bug critical because it might eventually lead to data loss. 

Reproducible: Always

Steps to Reproduce:
1. Start system with gentoo-sources-2.6.17-r[47] kernel
2. Start any disk intensive process (in my case the culprit was 'USE="nowin" 
emerge -1av nwn nwn-data' which reads, unpacks, processes and emerges a 1.2 GB 
file)
Actual Results:  
The system crashed and burned with all kjournald, kswapd, kmirrord, pdflush 
kernel threads dead and no access at all to hard drives and raid arrays.

Expected Results:  
Everything working flawlessly like it did before (gentoo-sources-2.6.16 
kernels) and does now (vanilla kernel 2.6.17.11)
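
The "sustained disk access" in the steps above can be approximated without emerging nwn. What follows is only a minimal sketch under stated assumptions, not the reporter's actual procedure: the function name `stress_disk`, the file names, and the default sizes are all hypothetical, and the defaults are deliberately small. To put real pressure on a controller, the file count and size would need to be raised to something on the order of the 1.2 GB mentioned above.

```python
import hashlib
import os
import tarfile
import tempfile


def stress_disk(workdir, file_count=4, file_size=1 << 20):
    """Write files, tar them, untar them, and compare checksums.

    Hypothetical helper: returns True if every extracted file matches
    what was originally written, False otherwise.
    """
    src = os.path.join(workdir, "src")
    dst = os.path.join(workdir, "dst")
    os.makedirs(src)
    os.makedirs(dst)

    # Step 1: write several consecutive files and record their digests.
    digests = {}
    for i in range(file_count):
        name = f"chunk{i}.bin"
        data = os.urandom(file_size)
        digests[name] = hashlib.sha256(data).hexdigest()
        with open(os.path.join(src, name), "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # ask the kernel to push data to the disk

    # Step 2: tar the files, then untar them into a second directory.
    archive = os.path.join(workdir, "stress.tar")
    with tarfile.open(archive, "w") as tar:
        for name in digests:
            tar.add(os.path.join(src, name), arcname=name)
    with tarfile.open(archive, "r") as tar:
        tar.extractall(dst)

    # Step 3: verify that what came back is what was written.
    for name, expected in digests.items():
        with open(os.path.join(dst, name), "rb") as fh:
            if hashlib.sha256(fh.read()).hexdigest() != expected:
                return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("data intact:", stress_disk(tmp))
```

A hang like the one described would show up as the script never returning, and silent corruption as a `False` result. A clean run does not prove the kernel is fine, only that this particular workload did not trigger the fault.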
Comment 1 Jakub Moc (RETIRED) gentoo-dev 2006-08-31 08:40:43 UTC
Can you bisect it?

http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/

Anyway, no issues here.
Comment 2 Daniel Drake (RETIRED) gentoo-dev 2006-08-31 08:59:52 UTC
You can't bisect if vanilla is unaffected
Comment 3 Paolo Pedroni 2006-08-31 09:11:44 UTC
(In reply to comment #2)
> You can't bisect if vanilla is unaffected
> 

That's what I was about to say.

I remember that I had the same issue (or a very similar one) with the kernel in the 2006.0 livecd (I can't remember which version it was, though).

Anyway, I really think that the problem is in one of the revisions that I highlighted (they're the only ones that deal with SATA or IDE stuff). If someone can prepare an ebuild for the Gentoo 2.6.17 kernel without those patches, I can test it (I don't think I'm good enough to do it myself).
Comment 4 Daniel Drake (RETIRED) gentoo-dev 2006-08-31 09:50:33 UTC
At this stage the best thing to do is to try gentoo-sources-2.6.17 (first release)
Comment 5 Paolo Pedroni 2006-08-31 11:02:45 UTC
(In reply to comment #4)
> At this stage the best thing to do is to try gentoo-sources-2.6.17 (first
> release)
> 

I will try tomorrow, as soon as I have the time, and report back.
Comment 6 Paolo Pedroni 2006-09-03 12:40:45 UTC
Sorry if I didn't report sooner, but I had the fault happen with vanilla kernel 2.6.17.11, too. At the moment I'm also pondering the possibility of a hardware fault. I'll make some hardware tests in the near future and then I'll try to bisect the official tree as in Comment #1.

I'll follow up as soon as I can.
Comment 7 Paolo Pedroni 2006-09-10 03:14:42 UTC
(In reply to comment #6)
> Sorry if I didn't report sooner, but I had the fault happen with vanilla kernel
> 2.6.17.11, too. At the moment I'm also pondering the possibility of a hardware
> fault. I'll make some hardware tests in the near future and then I'll try to
> bisect the official tree as in Comment #1.

No hardware fault, AFAICS. I'm still working on determining where exactly the problem lies, and I'm starting to bisect the tree.

I'll let you know when I find something.
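
Bisecting, as suggested in Comment 1, is a binary search over an ordered list of revisions. The sketch below illustrates only that idea and is not how `git bisect` is implemented. The names `first_bad` and `is_bad` are hypothetical, and the sketch assumes that the first revision is good, the last one is bad, and every revision after the first bad one is also bad.

```python
def first_bad(revisions, is_bad):
    """Return the index of the first bad revision.

    Assumes revisions[0] is good, revisions[-1] is bad, and that once a
    revision is bad every later revision is bad too.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return bad


if __name__ == "__main__":
    # Toy stand-in for "build this kernel, boot it, run the disk workload".
    revisions = list(range(100))
    print(first_bad(revisions, lambda rev: rev >= 37))
```

Each call to `is_bad` stands for building, booting, and stress-testing one kernel, so about seven tests are enough for 100 revisions. The search is only trustworthy when the fault reproduces reliably: a single false "good" from an intermittent hang sends it down the wrong half.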
Comment 8 Neil Skrypuch 2006-09-22 13:54:21 UTC
Hmm, I'm running a fairly similar system (Athlon X2 4200+, 2G RAM, Asus A8N32-SLI Deluxe, Geforce 6600, though with only one 320G SATA drive and a 160G PATA drive, no RAID) with gentoo-sources-r7 right now.

I haven't noticed any hangups yet. For comparison's sake, I just tar/bzipped my 5.6G distfiles directory, copied the tar to /dev/null, then removed it without any issues.

Are you running a 32 bit or a 64 bit install? I'm using a 64 bit install.
Comment 9 Paolo Pedroni 2006-09-23 10:33:17 UTC
(In reply to comment #8)
> Hmm, I'm running a fairly similar system (Athlon X2 4200+, 2G RAM, Asus
> A8N32-SLI Deluxe, Geforce 6600, though with only one 320G SATA drive and a 160G
> PATA drive, no RAID) with gentoo-sources-r7 right now.
> 
> I haven't noticed any hangups yet. For comparison's sake, I just tar/bzipped my
> 5.6G distfiles directory, copied the tar to /dev/null, then removed it without
> any issues.
> 
> Are you running a 32 bit or a 64 bit install? I'm using a 64 bit install.
> 

I can trigger the problem quite reliably with 'USE="nowin" emerge -1 nwn-data nwn' (warning: it will download 1.2 GB of files and unpack them), while performing some other operations involving the hard drives, such as querying the drives' temperature with 'hddtemp', having 'top' continuously running, and issuing 'ps -aux' from time to time.

I have a hunch that the problem might lie in some weird interaction between the SATA RAID system and memory swapping, but I can't pin it down.

Some preliminary testing shows that 2.6.18-gentoo works fine, anyway. If this holds until 2.6.18 goes stable, I will mark this bug INVALID, or something else.
Comment 10 Daniel Drake (RETIRED) gentoo-dev 2006-10-15 12:30:37 UTC
Is gentoo-sources-2.6.18 still working OK?
Comment 11 Paolo Pedroni 2006-10-15 12:40:13 UTC
(In reply to comment #10)
> Is gentoo-sources-2.6.18 still working OK?
> 

It seems like it is. I don't know what else to say...
Comment 12 Daniel Drake (RETIRED) gentoo-dev 2006-10-19 05:08:33 UTC
OK. Marking fixed as 2.6.18 is in the tree and on its way to going stable.