Recently I upgraded my home machine (AMD64 X2 4600, Asus A8N-E nForce4 motherboard, 2 GB RAM, ATI X1300Pro graphics, 2 RAID1 arrays on the SATA controller) to kernel gentoo-sources-2.6.17-r7. Ever since, I've been suffering hangups whenever there is sustained disk access (tar'ing or untar'ing big files, writing several consecutive files to disk, whatever).

I tried gentoo-sources-2.6.17-r4 and the problem was still there. I tried gentoo-sources-2.6.16-r13 and the problem went away. I tried again with vanilla-sources-2.6.17.11 and still saw no problem.

I then went to check what exactly is added to the vanilla 2.6.17 kernel to make a gentoo kernel and found the following patches:

Revision 509: Support new nvidia MCP65 SATA controllers (dsd)
  Added: 4100_ahci-nvidia-mcp65.patch
Revision 510: Support even more new nvidia SATA hardware (dsd)
  Added: 4115_nvidia-sata-new.patch
...
Revision 512: Support new nvidia IDE hardware (dsd)
  Added: 4125_nvidia-ide-new.patch
...
Revision 525: Fix patches (dsd)
  Added: 4110_nvidia-mcp61.patch
  Added: 4135_promise-pdc2037x.patch
  Modified: 4015_forcedeth-new-ids.patch
  Modified: 4125_nvidia-ide-new.patch
  Modified: 4200_fbsplash-0.9.2-r5.patch
  Modified: 4205_vesafb-tng-1.0-rc2.patch
  Deleted: 4110_promise-pdc2037x.patch

Is it possible that one of these patches inadvertently breaks older Nvidia SATA controllers? Is there any way to check? I can do some more troubleshooting if you wish.

I marked the bug critical because it might eventually lead to data loss.

Reproducible: Always

Steps to Reproduce:
1. Start the system with a gentoo-sources-2.6.17-r[47] kernel
2. Start any disk-intensive process (in my case the culprit was 'USE="nowin" emerge -1av nwn nwn-data', which reads, unpacks, processes and emerges a 1.2 GB file)

Actual Results:
The system crashed and burned with all kjournald, kswapd, kmirrord and pdflush kernel threads dead and no access at all to the hard drives and RAID arrays.

Expected Results:
Everything working flawlessly, as it did before (gentoo-sources-2.6.16 kernels) and does now (vanilla kernel 2.6.17.11).
Can you bisect it? http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/ Anyway, no issues here.
You can't bisect if vanilla is unaffected
(In reply to comment #2)
> You can't bisect if vanilla is unaffected

That's what I was about to say.

I remember that I had the same issue (or a very similar one) with the kernel in the 2006.0 livecd (I can't remember which version it was, though).

Anyway, I really think that the problem is in one of the revisions that I highlighted (they're the only ones that deal with SATA or IDE stuff). Maybe if someone can prepare some ebuilds for the gentoo 2.6.17 kernel without those patches, I can test them (I don't think I'm good enough to do it myself).
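For reference, testing kernels with and without the listed patches amounts to a binary search over the patch list. Below is a minimal sketch of that idea in Python, not anything from the Gentoo tooling. The function name `first_bad_patch` and the `hangs_with` callback are hypothetical. The sketch assumes the patches are applied in a fixed order, that a kernel with none of them is good, that a kernel with all of them is bad, and that a single patch is responsible. In practice `hangs_with` would be a manual build, boot and stress-test cycle; here it is a stand-in that arbitrarily pretends the third patch is at fault, purely for illustration.

```python
from typing import Callable, Sequence

# Patch names taken from the revisions listed in the report.
PATCHES = [
    "4100_ahci-nvidia-mcp65.patch",
    "4110_nvidia-mcp61.patch",
    "4115_nvidia-sata-new.patch",
    "4125_nvidia-ide-new.patch",
    "4135_promise-pdc2037x.patch",
]


def first_bad_patch(
    patches: Sequence[str],
    hangs_with: Callable[[Sequence[str]], bool],
) -> str:
    """Return the first patch whose inclusion makes the kernel hang.

    Assumes patches[:0] (no patches) is good and the full list is bad.
    """
    if not patches:
        raise ValueError("need at least one patch to search")
    lo, hi = 0, len(patches)  # invariant: patches[:lo] good, patches[:hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hangs_with(patches[:mid]):
            hi = mid  # the culprit is among the first `mid` patches
        else:
            lo = mid  # the first `mid` patches are fine
    return patches[hi - 1]


if __name__ == "__main__":
    # Stand-in test: pretend (arbitrarily) that the third patch causes hangs.
    pretend_culprit = PATCHES[2]
    found = first_bad_patch(PATCHES, lambda applied: pretend_culprit in applied)
    print(found)
```

With five patches this needs at most three build-and-test rounds instead of five.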
At this stage the best thing to do is to try gentoo-sources-2.6.17 (first release)
(In reply to comment #4)
> At this stage the best thing to do is to try gentoo-sources-2.6.17 (first
> release)

I will try tomorrow, as soon as I have the time, and report back.
Sorry if I didn't report sooner, but I had the fault happen with vanilla kernel 2.6.17.11, too. At the moment I'm also pondering the possibility of a hardware fault. I'll make some hardware tests in the near future and then I'll try to bisect the official tree as in Comment #1. I'll follow up as soon as I can.
(In reply to comment #6)
> Sorry if I didn't report sooner, but I had the fault happen with vanilla kernel
> 2.6.17.11, too. At the moment I'm also pondering the possibility of a hardware
> fault. I'll make some hardware tests in the near future and then I'll try to
> bisect the official tree as in Comment #1.

No hardware fault, AFAICS. I'm still working on determining where exactly the problem lies. I'm starting to bisect the tree. I'll let you know when I find something.
Hmm, I'm running a fairly similar system (Athlon X2 4200+, 2G RAM, Asus A8N32-SLI Deluxe, Geforce 6600, though with only one 320G SATA drive and a 160G PATA drive, no RAID) with gentoo-sources-r7 right now. I haven't noticed any hangups yet. For comparison's sake, I just tar/bzipped my 5.6G distfiles directory, copied the tar to /dev/null, then removed it without any issues. Are you running a 32 bit or a 64 bit install? I'm using a 64 bit install.
(In reply to comment #8)
> Hmm, I'm running a fairly similar system (Athlon X2 4200+, 2G RAM, Asus
> A8N32-SLI Deluxe, Geforce 6600, though with only one 320G SATA drive and a 160G
> PATA drive, no RAID) with gentoo-sources-r7 right now.
>
> I haven't noticed any hangups yet. For comparison's sake, I just tar/bzipped my
> 5.6G distfiles directory, copied the tar to /dev/null, then removed it without
> any issues.
>
> Are you running a 32 bit or a 64 bit install? I'm using a 64 bit install.

I can trigger the problem quite reliably with 'USE="nowin" emerge -1 nwn-data nwn' (warning: it will download 1.2 GB of files and unpack them) while performing some other operations involving the hard drives, such as querying the drives' temperature with 'hddtemp', having 'top' continuously running, and issuing 'ps -aux' from time to time.

I have a hunch that the problem might lie in some weird interaction between the SATA RAID system and memory swapping, but I can't pin it down.

Some preliminary testing shows that 2.6.18-gentoo works fine, anyway. If this holds until 2.6.18 goes stable, I will mark this bug INVALID, or something else.
Is gentoo-sources-2.6.18 still working OK?
(In reply to comment #10)
> Is gentoo-sources-2.6.18 still working OK?

It seems like it is. I don't know what else to say...
OK. Marking fixed as 2.6.18 is in the tree and on its way to going stable.