Bug 213433 - kernel - random I/O errors resulting in readonly FS on heavy load
Summary: kernel - random I/O errors resulting in readonly FS on heavy load
Status: VERIFIED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-03-14 19:26 UTC by Michal Špondr
Modified: 2009-01-05 23:43 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
output of: tune2fs -l /dev/sda (tune2fs.txt,1.56 KB, text/plain)
2008-03-14 19:29 UTC, Michal Špondr
output of: smartctl --all /dev/sda (smartctl.txt,9.06 KB, text/plain)
2008-03-14 19:30 UTC, Michal Špondr
output of: /var/log/messages (messages.txt,7.79 KB, text/plain)
2008-03-14 19:32 UTC, Michal Špondr
output of: lspci -nxxxvvv (lspci.txt,48.51 KB, text/plain)
2008-03-15 11:08 UTC, Michal Špondr

Description Michal Špondr 2008-03-14 19:26:45 UTC
Sometimes the system stops responding when I do something disk-intensive. It hangs for a while with the disk still working, and then the disk activity stops. When I try to create a file (e.g. touch myfile), it is not created: the filesystem has suddenly become read-only. I have a few partitions on my disk (besides the system partition / there is one "data" partition on /home/m1c4a1). When it happens to the system partition /, I can't even reboot cleanly or read the logs.
I can't reproduce it reliably; however, when I play Battle for Wesnoth (or Call of Duty 1 under wine), I'm not able to finish a level without the system hanging and losing the root filesystem.
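
One way to confirm that a filesystem really has been remounted read-only is to look for the "ro" option in /proc/mounts. The following is only a minimal Python sketch, assuming the usual /proc/mounts layout (device, mount point, type, comma-separated options); the function name read_only_mounts is made up for illustration and is not part of any tool mentioned in this report.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    mounts_text is expected to have the /proc/mounts layout:
    device, mount point, filesystem type, comma-separated options, ...
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Compare whole options so that 'errors=remount-ro' is not a match.
        if "ro" in options:
            result.append(mount_point)
    return result
```

On a live system it could be called as read_only_mounts(open("/proc/mounts").read()). Some pseudo-filesystems may legitimately be mounted read-only, so only entries for real partitions are of interest here.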

Reproducible: Sometimes

Steps to Reproduce:
1. Run Battle for Wesnoth
2. Play a game
3. Within an hour it will hang

Actual Results:  
The system hangs; I have to restart it manually using the power button.

Expected Results:  
It should run normally without hanging.

# emerge --info
Portage 2.1.4.4 (default-linux/amd64/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24-gentoo-r3 x86_64)
=================================================================
System uname: 2.6.24-gentoo-r3 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz
Timestamp of tree: Fri, 14 Mar 2008 12:46:01 +0000
app-shells/bash:     3.2_p17-r1
dev-java/java-config: 1.3.7, 2.1.4
dev-lang/python:     2.4.4-r9
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=nocona"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-O2 -pipe -march=nocona"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LANG="cs_CZ.UTF8"
LC_ALL="cs_CZ.UTF8"
LINGUAS="cs"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X alsa amd64 ao audiofile bindist bluetooth bzip2 cairo cddb cli cracklib crypt dbus dri dvd dvdr flac gdbm geoip gif glut gnutls gpm gtk2 iconv ieee1394 imagemagick isdnlog jpeg jpeg2k lm_sensors midi mikmod mmap mmx mp3 mplayer mudflap ncurses nls nptl nptlonly ogg openal opengl openmp pcre pdf plotutils png pppd qt4 quicktime readline reflection sdl session sharedmem smartcard spl sse sse2 ssl tcpd threads truetype type1 unicode vim-syntax vorbis wifi xorg zlib" ALSA_CARDS="hda-intel" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="evdev keyboard mouse synaptics" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="cs" USERLAND="GNU" VIDEO_CARDS="nvidia"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY

# uname -a
Linux usambara 2.6.24-gentoo-r3 #1 SMP Thu Mar 13 19:10:38 CET 2008 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz GenuineIntel GNU/Linux
Comment 1 Michal Špondr 2008-03-14 19:29:22 UTC
Created attachment 146149 [details]
output of: tune2fs -l /dev/sda
Comment 2 Michal Špondr 2008-03-14 19:30:18 UTC
Created attachment 146151 [details]
output of: smartctl --all /dev/sda
Comment 3 Michal Špondr 2008-03-14 19:32:06 UTC
Created attachment 146153 [details]
output of: /var/log/messages

When /home/m1c4a1 hangs, I'm still able to read /var/log/messages. It contains lines like those in the attachment.
Comment 4 Michal Špondr 2008-03-14 19:33:25 UTC
I thought it had something to do with the "NCQ spurious completion problem" (http://www.spinics.net/lists/linux-ide/msg18296.html), because the same disk as mine is mentioned there (Hitachi HTS541616J9SA00), so I tried installing gentoo-sources 2.6.24-gentoo-r3. Unfortunately, it still hangs.
Maybe this bug is related, too, but it doesn't help me solve the problem: https://bugs.gentoo.org/show_bug.cgi?id=187686
Comment 5 Thomas Juerges 2008-03-15 00:22:56 UTC
(In reply to comment #0)
> # emerge --info
> Portage 2.1.4.4 (default-linux/amd64/2007.0, gcc-4.1.2, glibc-2.6.1-r0,
> 2.6.24-gentoo-r3 x86_64)
> =================================================================
> System uname: 2.6.24-gentoo-r3 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @
> 2.00GHz

You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How about fixing this first and try again?
Comment 6 Jakub Moc (RETIRED) gentoo-dev 2008-03-15 08:07:13 UTC
(In reply to comment #5)
> You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How
> about fixing this first and try again?

There's nothing to fix there, it's perfectly valid.
 
Otherwise, we need some info on the HW involved, like 
lspci -nxxxvvv output for the SATA controller etc.
Comment 7 Michal Špondr 2008-03-15 11:08:16 UTC
Created attachment 146203 [details]
output of: lspci -nxxxvvv

Disk related devices are these, I hope:
00:1f.0 ISA bridge: Intel Corporation Mobile LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation Mobile IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation Mobile SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
Comment 8 Thomas Juerges 2008-04-01 14:41:28 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How
> > about fixing this first and try again?
> 
> There's nothing to fix there, it's perfectly valid.

I cannot see this. The AMD64 profile enables compiler optimisations which are not valid for Intel Core2-Duo CPUs.
Comment 9 Michal Špondr 2008-04-02 20:22:11 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > You are using the AMD64 profile but have an Intel Core2-Duo CPU installed. How
> > > about fixing this first and try again?
> > 
> > There's nothing to fix there, it's perfectly valid.
> 
> I cannot see this. The AMD64 profile enables compiler optimisations which are
> not valid for Intel Core2-Duo CPUs.
> 
amd64 is just an alias for x86_64, isn't it? It doesn't have anything to do with AMD CPUs.
Comment 10 Michal Špondr 2008-04-05 10:55:34 UTC
I managed to catch the bug again on the 2.6.24-gentoo-r4 kernel (while playing Call of Duty 1). It's quite hard, because the disk cannot be synced when the bug occurs, so nothing about it gets written to /var/log/messages. So I had to copy the relevant messages by hand. Here they are:
Apr 5 12:31:37 usambara ata1: COMRESET failed (errno=-16)
Apr 5 12:31:37 usambara ata1: hard reseting link
Apr 5 12:31:42 usambara ata1: port is slow to respond, please be patient (Status 0x80)
At this point /dev/sda7 is inaccessible (both reads and writes), but I can still access /dev/sda1 with /, the system commands and the /var/log/messages file. A few seconds later:
Apr 5 12:32:12 usambara ata1: COMRESET failed (errno=-16)
Apr 5 12:32:12 usambara ata1: limiting SATA link speed to 1.5 Gbps
Apr 5 12:32:12 usambara ata1: hard reseting link
Now I can't access even /dev/sda1 or run any commands, and the only way to reboot is the power button.
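
To pick out such link-reset messages from a longer log, something like the following Python sketch could be used. It is only an illustration: the pattern covers just the message wordings quoted in this comment, and the names ATA_EVENT and ata_link_events are hypothetical, not part of any existing tool.

```python
import re

# Matches only the message wordings quoted in this comment; other kernel
# versions may word these messages differently.
ATA_EVENT = re.compile(
    r"\b(ata\d+): (COMRESET failed|hard reset\w* link|limiting SATA link speed)"
)


def ata_link_events(log_lines):
    """Return (port, event) tuples for lines that report an ATA link problem."""
    events = []
    for line in log_lines:
        match = ATA_EVENT.search(line)
        if match:
            events.append((match.group(1), match.group(2)))
    return events
```

Feeding it the lines of /var/log/messages (for example via open("/var/log/messages")) would list which port reported resets and in what order, provided the log could still be written when the failure happened.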
Comment 11 Michal Špondr 2008-08-01 23:30:37 UTC
I have been trying the vanilla-sources package instead of gentoo-sources, and since then (about 1-2 weeks) there has been no filesystem failure. So I think it has something to do with the Gentoo patches to the kernel.
Comment 12 Mike Pagano gentoo-dev 2008-08-16 19:57:47 UTC
Anything new here?  Does gentoo-sources-2.6.26-r1 or vanilla-2.6.26.2 work?
Comment 13 Daniel Drake (RETIRED) gentoo-dev 2008-08-23 03:43:34 UTC
Which vanilla-sources version are you using now? It's not a fair comparison unless you're looking at 2.6.25, and in both cases you ideally want to be running something newer
Comment 14 Michal Špondr 2008-08-25 14:45:22 UTC
(In reply to comment #13)
> Which vanilla-sources version are you using now? It's not a fair comparison
> unless you're looking at 2.6.25, and in both cases you ideally want to be
> running something newer
> 

I don't remember which vanilla-sources version was current and stable at the same time as gentoo-sources-2.6.24. But since I switched kernels (gentoo-sources to vanilla-sources, both stable at that time), I haven't had any problem.
Now I'm using vanilla-sources-2.6.25.9 and am going to test vanilla-sources-2.6.27_rc4 because of another bug; after that I might test and compare recent gentoo-sources and vanilla-sources.
Comment 15 Mike Pagano gentoo-dev 2008-09-15 23:48:42 UTC
Anything to report here?
Comment 16 Michal Špondr 2008-09-24 10:12:16 UTC
I had my laptop in for service and they told me the disk was damaged. They replaced it, so the bug was possibly caused by the disk. I haven't tried gentoo-sources since then (and because of bug 218565 I won't, as I need my wifi to work), so I hope it was "just" a disk failure.
Comment 17 Michal Špondr 2009-01-05 23:43:02 UTC
I have a new disk now. I am running 2.6.27-gentoo-r7 and haven't experienced any of the problems mentioned above. So I think it really was a disk failure.