Sometimes the system stops responding when I do something disk-intensive. It hangs for a while with the disk working, and then the disk activity stops. When I try to create a file (e.g. touch myfile), it is not created: the filesystem has suddenly become read-only. I have several partitions on my disk. When this happens to my system partition / (I also have a "data" partition on /home/m1c4a1), I can't even reboot the system or read the logs.
I can't reproduce it reliably; however, when I play Battle for Wesnoth (or Call of Duty 1 under wine), I am not able to finish a level without the system hanging and the root filesystem becoming inaccessible.
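To confirm which filesystems have gone read-only while the machine is still partly usable, something like the following might help. It is only a minimal sketch: it assumes the standard /proc/mounts layout (device, mount point, type, options), and the sample text and the `read_only_mounts` name are hypothetical, not taken from the affected machine.

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point) pairs whose mount options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Options are comma-separated; match 'ro' as a whole option only.
        if "ro" in options.split(","):
            result.append((device, mount_point))
    return result


# Hypothetical sample in /proc/mounts format (illustration only).
SAMPLE = """\
/dev/sda1 / ext3 ro,noatime 0 0
/dev/sda7 /home/m1c4a1 ext3 rw,noatime 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    print(read_only_mounts(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mounts instead of the sample string.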
Steps to Reproduce:
1. Run Battle for Wesnoth
2. Play a game
3. Within an hour it will hang
Actual Results:
The system hangs and I have to restart it manually using the power button.
Expected Results:
It should run normally without hanging.
# emerge --info
Portage 126.96.36.199 (default-linux/amd64/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24-gentoo-r3 x86_64)
System uname: 2.6.24-gentoo-r3 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz
Timestamp of tree: Fri, 14 Mar 2008 12:46:01 +0000
dev-java/java-config: 1.3.7, 2.1.4
sys-devel/autoconf: 2.13, 2.61-r1
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
CFLAGS="-O2 -pipe -march=nocona"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-O2 -pipe -march=nocona"
FEATURES="distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
USE="X alsa amd64 ao audiofile bindist bluetooth bzip2 cairo cddb cli cracklib crypt dbus dri dvd dvdr flac gdbm geoip gif glut gnutls gpm gtk2 iconv ieee1394 imagemagick isdnlog jpeg jpeg2k lm_sensors midi mikmod mmap mmx mp3 mplayer mudflap ncurses nls nptl nptlonly ogg openal opengl openmp pcre pdf plotutils png pppd qt4 quicktime readline reflection sdl session sharedmem smartcard spl sse sse2 ssl tcpd threads truetype type1 unicode vim-syntax vorbis wifi xorg zlib" ALSA_CARDS="hda-intel" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="evdev keyboard mouse synaptics" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="cs" USERLAND="GNU" VIDEO_CARDS="nvidia"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY
# uname -a
Linux usambara 2.6.24-gentoo-r3 #1 SMP Thu Mar 13 19:10:38 CET 2008 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz GenuineIntel GNU/Linux
Created attachment 146149 [details]
output of: tune2fs -l /dev/sda
Created attachment 146151 [details]
output of: smartctl --all /dev/sda
Created attachment 146153 [details]
output of: /var/log/messages
When only /home/m1c4a1 hangs, I am still able to read /var/log/messages. The attachment contains the lines logged in that case.
I thought it had something to do with the "NCQ spurious completion problem" (http://www.spinics.net/lists/linux-ide/msg18296.html), because the same disk model as mine is mentioned there (Hitachi HTS541616J9SA00), so I tried installing gentoo-sources 2.6.24-gentoo-r3. Unfortunately it still hangs.
Maybe this bug is related, too, but it doesn't help me to solve the problem: https://bugs.gentoo.org/show_bug.cgi?id=187686
(In reply to comment #0)
> # emerge --info
> Portage 188.8.131.52 (default-linux/amd64/2007.0, gcc-4.1.2, glibc-2.6.1-r0,
> 2.6.24-gentoo-r3 x86_64)
> System uname: 2.6.24-gentoo-r3 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @
You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How about fixing this first and try again?
(In reply to comment #5)
> You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How
> about fixing this first and try again?
There's nothing to fix there, it's perfectly valid.
Otherwise, we need some info on the HW involved, like
lspci -nxxxvvv output for the SATA controller etc.
Created attachment 146203 [details]
output of: lspci -nxxxvvv
Disk related devices are these, I hope:
00:1f.0 ISA bridge: Intel Corporation Mobile LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation Mobile IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation Mobile SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
(In reply to comment #6)
> (In reply to comment #5)
> > You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How
> > about fixing this first and try again?
> There's nothing to fix there, it's perfectly valid.
I cannot see this. The AMD64 profile enables compiler optimisations which are not valid for Intel Core2-Duo CPUs.
(In reply to comment #8)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How
> > > about fixing this first and try again?
> > There's nothing to fix there, it's perfectly valid.
> I cannot see this. The AMD64 profile enables compiler optimisations which are
> not valid for Intel Core2-Duo CPUs.
amd64 is just an alias for x86_64, isn't it? It doesn't have anything to do with AMD CPUs.
I managed to catch the bug again on the 2.6.24-gentoo-r4 kernel (while playing Call of Duty 1). It is quite hard to capture, because the disk cannot be synced once the bug occurs, so no messages about it end up in /var/log/messages. I therefore had to copy the relevant messages by hand. Here they are:
Apr 5 12:31:37 usambara ata1: COMRESET failed (errno=-16)
Apr 5 12:31:37 usambara ata1: hard resetting link
Apr 5 12:31:42 usambara ata1: port is slow to respond, please be patient (Status 0x80)
At this point /dev/sda7 can no longer be read or written, but I can still access /dev/sda1, which holds /, the system commands and the /var/log/messages file. A few seconds later:
Apr 5 12:32:12 usambara ata1: COMRESET failed (errno=-16)
Apr 5 12:32:12 usambara ata1: limiting SATA link speed to 1.5 Gbps
Apr 5 12:32:12 usambara ata1: hard resetting link
Now I can't access even /dev/sda1, can't run any commands, and the only way to reboot is the power button.
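In case it helps anyone comparing similar logs, here is a rough sketch that counts the libata events quoted above per ATA port. The patterns are written only against the messages in this report and are an assumption about what is worth matching, not a complete list of libata errors. The `summarize` helper is hypothetical. The regex tolerates a single-t spelling of "resetting", since hand-copied logs may contain it. For reference, errno 16 is EBUSY on Linux.

```python
import re

# Patterns for the libata messages quoted in this report (not exhaustive).
PATTERNS = {
    "comreset_failed": re.compile(r"(ata\d+): COMRESET failed \(errno=(-?\d+)\)"),
    # 't+' accepts both "resetting" and a hand-copied "reseting".
    "hard_reset": re.compile(r"(ata\d+): hard reset+ing link"),
    "speed_limited": re.compile(r"(ata\d+): limiting SATA link speed to ([\d.]+) Gbps"),
    "slow_port": re.compile(r"(ata\d+): port is slow to respond"),
}


def summarize(log_text):
    """Count each event type per ATA port, keyed by (port, event_name)."""
    counts = {}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                key = (match.group(1), name)
                counts[key] = counts.get(key, 0) + 1
    return counts


# The hand-copied messages from this comment.
LOG = """\
Apr 5 12:31:37 usambara ata1: COMRESET failed (errno=-16)
Apr 5 12:31:37 usambara ata1: hard resetting link
Apr 5 12:31:42 usambara ata1: port is slow to respond, please be patient (Status 0x80)
Apr 5 12:32:12 usambara ata1: COMRESET failed (errno=-16)
Apr 5 12:32:12 usambara ata1: limiting SATA link speed to 1.5 Gbps
Apr 5 12:32:12 usambara ata1: hard resetting link
"""

if __name__ == "__main__":
    for (port, event), count in sorted(summarize(LOG).items()):
        print(port, event, count)
```

Repeated COMRESET failures followed by a link speed downgrade on the same port, as counted here, point at the link or the drive rather than at a particular filesystem.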
I have switched from the gentoo-sources package to vanilla-sources, and since then (about 1-2 weeks) there has been no filesystem failure. So I think it has something to do with the Gentoo patches to the kernel.
Anything new here? Does the gentoo-sources-2.6.26-r1 or vanilla-184.108.40.206 work?
Which vanilla-sources version are you using now? It's not a fair comparison unless you're looking at 2.6.25, and in both cases you ideally want to be running something newer
(In reply to comment #13)
> Which vanilla-sources version are you using now? It's not a fair comparison
> unless you're looking at 2.6.25, and in both cases you ideally want to be
> running something newer
I don't remember which vanilla-sources version was current and stable at the same time as gentoo-sources-2.6.24. But since I switched kernels (from gentoo-sources to vanilla-sources, both of them stable at that time), I haven't had any problems of this kind.
Now I'm using vanilla-sources-220.127.116.11 and am going to test vanilla-sources-2.6.27_rc4 because of another bug. After that I might test and compare recent gentoo-sources and vanilla-sources.
Anything to report here?
I took my laptop in for service and was told that the disk was damaged. They replaced it, so the bug may well have been caused by the disk. I haven't tried gentoo-sources since then (and because of bug 218565 I won't, as I need my wifi to work), so I hope it was "just" a disk failure.
I have a new disk now. I am running 2.6.27-gentoo-r7 and I haven't experienced any of the problems mentioned above. So I think it really was a disk failure.