Summary: | kernel - random I/O errors resulting in readonly FS on heavy load | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Michal Špondr <michal.spondr> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | VERIFIED TEST-REQUEST | ||
Severity: | critical | ||
Priority: | High | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
output of: tune2fs -l /dev/sda
output of: smartctl --all /dev/sda
output of: /var/log/messages
output of: lspci -nxxxvvv |
Description
Michal Špondr
2008-03-14 19:26:45 UTC
Created attachment 146149 [details]
output of: tune2fs -l /dev/sda
Created attachment 146151 [details]
output of: smartctl --all /dev/sda
Created attachment 146153 [details]
output of: /var/log/messages
When /home/m1c4a1 hangs, I can still read /var/log/messages. It contains lines like those in the attachment.
I thought it had something to do with the "NCQ spurious completion problem" (http://www.spinics.net/lists/linux-ide/msg18296.html), because the same disk model as mine is mentioned (Hitachi HTS541616J9SA00), so I tried installing gentoo-sources 2.6.24-gentoo-r3. Unfortunately it still hangs.

Maybe this bug is related too, but it doesn't help me solve the problem: https://bugs.gentoo.org/show_bug.cgi?id=187686

(In reply to comment #0)
> # emerge --info
> Portage 2.1.4.4 (default-linux/amd64/2007.0, gcc-4.1.2, glibc-2.6.1-r0,
> 2.6.24-gentoo-r3 x86_64)
> =================================================================
> System uname: 2.6.24-gentoo-r3 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @
> 2.00GHz

You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How about fixing this first and try again?

(In reply to comment #5)
> You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How
> about fixing this first and try again?

There's nothing to fix there, it's perfectly valid.

Otherwise, we need some info on the HW involved, like lspci -nxxxvvv output for the SATA controller etc.

Created attachment 146203 [details]
output of: lspci -nxxxvvv
These are the disk-related devices, I hope:
00:1f.0 ISA bridge: Intel Corporation Mobile LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation Mobile IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation Mobile SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
(In reply to comment #6)
> (In reply to comment #5)
> > You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How
> > about fixing this first and try again?
>
> There's nothing to fix there, it's perfectly valid.

I cannot see this. The AMD64 profile enables compiler optimisations which are not valid for Intel Core2-Duo CPUs.

(In reply to comment #8)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How
> > > about fixing this first and try again?
> >
> > There's nothing to fix there, it's perfectly valid.
>
> I cannot see this. The AMD64 profile enables compiler optimisations which are
> not valid for Intel Core2-Duo CPUs.
>

amd64 is just an alias for x86_64, isn't it? It doesn't have anything to do with AMD CPUs.

I managed to catch the bug again on the 2.6.24-gentoo-r4 kernel (while playing Call of Duty 1). It is quite hard to capture, because the disk cannot be synced once the bug occurs, so nothing about it ends up in /var/log/messages. I therefore had to copy the relevant part of /var/log/messages by hand. Here it is:

Apr 5 12:31:37 usambara ata1: COMRESET failed (errno=-16)
Apr 5 12:31:37 usambara ata1: hard reseting link
Apr 5 12:31:42 usambara ata1: port is slow to respond, please be patient (Status 0x80)

At this point /dev/sda7 is inaccessible (for both reads and writes), but I can still access /dev/sda1 with /, the system commands and the /var/log/messages file. A few seconds later:

Apr 5 12:32:12 usambara ata1: COMRESET failed (errno=-16)
Apr 5 12:32:12 usambara ata1: limiting SATA link speed to 1.5 Gbps
Apr 5 12:32:12 usambara ata1: hard reseting link

Now I can't access even /dev/sda1 or run any commands, and the only way to reboot is the power button.

I have been trying the vanilla-sources package instead of gentoo-sources, and since then (about 1-2 weeks) there has been no filesystem failure. So I think it has something to do with the Gentoo patches to the kernel.
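
The hand-copied libata messages above follow the usual syslog layout (timestamp, host, port, message). Here is a minimal sketch, assuming that layout, of how such lines could be pulled out of a saved copy of the log to see when each port was reset. The function name `find_ata_resets`, the keyword list and the regular expression are illustrative assumptions and are not part of any kernel or Gentoo tool.

```python
import re

# Assumed syslog layout: "Apr 5 12:31:37 host ata1: message".
# Hypothetical pattern; it also tolerates an optional "kernel:" tag and
# an optional bracketed uptime stamp, which some syslog setups add.
ATA_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?:kernel:\s+)?(?:\[[^\]]*\]\s+)?"
    r"(?P<port>ata\d+(?:\.\d+)?):\s+(?P<msg>.*)$"
)

# Substrings taken from the messages quoted in this report.
# "hard reset" matches both "resetting" and the hand-copied "reseting".
KEYWORDS = ("COMRESET failed", "hard reset", "limiting SATA link speed")


def find_ata_resets(lines):
    """Return (timestamp, port, message) for lines that look like link resets."""
    hits = []
    for line in lines:
        match = ATA_LINE.match(line.strip())
        if match and any(key in match.group("msg") for key in KEYWORDS):
            hits.append(
                (match.group("stamp"), match.group("port"), match.group("msg"))
            )
    return hits


if __name__ == "__main__":
    sample = [
        "Apr 5 12:31:37 usambara ata1: COMRESET failed (errno=-16)",
        "Apr 5 12:31:37 usambara ata1: hard reseting link",
        "Apr 5 12:31:42 usambara ata1: port is slow to respond, "
        "please be patient (Status 0x80)",
    ]
    for stamp, port, msg in find_ata_resets(sample):
        print(stamp, port, msg)
```

Repeated resets on the same port within a short window, as in the two bursts above, are the pattern to look for. Such a filter only works on messages that actually reached the disk, which is exactly what failed here.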
Anything new here? Does gentoo-sources-2.6.26-r1 or vanilla-2.6.26.2 work?

Which vanilla-sources version are you using now? It's not a fair comparison unless you're looking at 2.6.25, and in both cases you ideally want to be running something newer

(In reply to comment #13)
> Which vanilla-sources version are you using now? It's not a fair comparison
> unless you're looking at 2.6.25, and in both cases you ideally want to be
> running something newer
>

I don't remember which vanilla-sources version was current and stable at the same time as gentoo-sources-2.6.24. But since I switched kernels (from gentoo-sources to vanilla-sources, both stable at that time), I haven't had any problem. I'm now using vanilla-sources-2.6.25.9 and am going to test vanilla-sources-2.6.27_rc4 because of another bug; after that I might test and compare recent gentoo-sources and vanilla-sources.

Anything to report here?

I took my laptop in for service and was told that the disk was damaged. They replaced it, so the bug may have been caused by the disk. I haven't tried gentoo-sources since then (and because of bug 218565 I won't, as I need my wifi to work), so I hope it was "just" a disk failure.

I have a new disk now. I am running 2.6.27-gentoo-r7 and haven't experienced any of the problems mentioned above. So I think it really was a disk failure. |