Bug 17571 - gentoo-sources-2.4.20-r1 is missing important ext3 fs patches
Summary: gentoo-sources-2.4.20-r1 is missing important ext3 fs patches
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High critical
Assignee: x86-kernel@gentoo.org (DEPRECATED)
URL: http://www.zip.com.au/~akpm/linux/ext...
Reported: 2003-03-15 14:34 UTC by Paul Kronenwetter
Modified: 2003-04-15 17:49 UTC

Attachments
Configuration file of the kernel that causes the problem. (.config,30.22 KB, text/plain)
2003-03-15 14:35 UTC, Paul Kronenwetter

Description Paul Kronenwetter 2003-03-15 14:34:20 UTC
I've been running into many instances where my ext3 fs has been inexplicably
corrupted, mostly in /usr/sbin but in other places as well.  Most typically I
see this in /var/log/messages after trying to access a file, getting
'Input/output error', then removing it:

Mar 14 18:42:15 ex1503 EXT3-fs warning (device ide0(3,2)): ext3_unlink: Deleting
nonexistent file (568640), 0
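
A warning like the one above can be picked out of a log automatically. The following is only a minimal sketch in Python: the regular expression is inferred from the single line quoted in this report, so other kernel versions may format the message differently, and the function name `find_ext3_warnings` and the second sample line are made up for illustration.

```python
import re

# Pattern inferred from the one log line quoted in this report (an assumption,
# not a documented kernel log format).
EXT3_WARNING = re.compile(
    r"EXT3-fs warning \(device (?P<device>.+?)\): "
    r"(?P<function>\w+): (?P<message>.*)"
)


def find_ext3_warnings(lines):
    """Return (device, function, message) for each EXT3-fs warning line."""
    hits = []
    for line in lines:
        match = EXT3_WARNING.search(line)
        if match:
            hits.append(
                (
                    match.group("device"),
                    match.group("function"),
                    match.group("message").strip(),
                )
            )
    return hits


sample = [
    # The line from this report, joined back into a single log line.
    "Mar 14 18:42:15 ex1503 EXT3-fs warning (device ide0(3,2)): ext3_unlink: "
    "Deleting nonexistent file (568640), 0",
    # Hypothetical unrelated line, included only to show it is skipped.
    "Mar 14 18:42:16 ex1503 some unrelated message",
]

for device, function, message in find_ext3_warnings(sample):
    print(device, function, message)
```

To scan a real log, the open file can be passed in place of `sample`, for example `find_ext3_warnings(open("/var/log/messages"))`.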

Most often, though, the filesystem has errors on boot, even though it has
replayed the journal.  After that, the contents of my /usr/sbin are usually
completely missing.  The *really* bizarre part is that the contents sometimes
come back after another fsck.ext3 run.

I can't say whether the patches on the referenced URL will help but the author
does say "These fix fairly serious problems, and they should be applied."

I'm compiling a kernel now and hope to see some better results.

I've mounted root with data=journal.  I've also turned off the low-latency and
preemptible kernel options.  All in hopes of fixing the problem, but nothing
helped.  Except to slow the machine down sometimes :)

Reproducible: Sometimes
Steps to Reproduce:
1. Build gentoo-sources-2.4.20-r1
2. Run it for a while as a regular desktop machine, maybe upgrade it from 1.4_r2
to _r3.
3. Touch /forcefsck and reboot.

Actual Results:  
Most often the filesystem has errors and the contents of /usr/sbin are
removed.  When the steps are repeated, /usr/sbin sometimes returns.

Expected Results:  
No filesystem corruption should be seen, ever.  In a perfect world anyway :)

Machine is a Dell Latitude C840 laptop, 2.2 GHz Pentium 4, 1GB RAM, 40GB hard
drive.  New as of about 2/2003.

Portage 2.0.47-r10 (default-x86-1.4, gcc-3.2.2, glibc-2.3.1-r2)
=================================================================
System uname: 2.4.20-gentoo-r1 i686 Mobile Intel(R) Pentium(R) 4 - M CPU 2.20GHz
GENTOO_MIRRORS=" ftp://ftp.gtlib.cc.gatech.edu/pub/gentoo
http://csociety-ftp.ecn.purdue.edu/pub/gentoo/
ftp://mirror.iawnet.sandia.gov/pub/gentoo/"
CONFIG_PROTECT="/etc /var/qmail/control /usr/kde/2/share/config
/usr/kde/3/share/config /usr/X11R6/lib/X11/xkb /opt/jakarta/tomcat/conf
/usr/share/config"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR_OVERLAY="/usr/local/portage"
USE="x86 oss 3dnow apm arts avi crypt encode gif jpeg libg++ libwww mikmod mmx
mpeg ncurses nls pdflib png quicktime spell truetype xml2 xmms xv zlib gdbm
berkdb slang readline svga java X sdl gpm tcpd pam ssl python imlib oggvorbis
gnome gtk motif opengl alsa apache2 cdr -cups dvd esd ethereal gd gphoto2 gps
junit -kde mbox mozilla mysql pcmcia pda perl pic plotutils postgres -qt samba
scanner sse tcltk tetex tiff trusted usb"
COMPILER="gcc3"
CHOST="i686-pc-linux-gnu"
CFLAGS="-O2 -mcpu=pentium3 -pipe"
CXXFLAGS="-O2 -mcpu=i686 -pipe"
ACCEPT_KEYWORDS="x86"
MAKEOPTS="-j2"
AUTOCLEAN="yes"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
FEATURES="sandbox buildpkg ccache distcc userpriv usersandbox"
Comment 1 Paul Kronenwetter 2003-03-15 14:35:05 UTC
Created attachment 9421
Configuration file of the kernel that causes the problem.
Comment 2 Paul Kronenwetter 2003-03-15 17:49:29 UTC
The patches didn't help...  Although I don't think they're doing any harm :)

After running for a while, the next boot shows / as clean but /usr/sbin is empty.  On the boot after that, fsck runs and says **REBOOT LINUX** (but Gentoo doesn't see that and continues on); /usr/sbin is then fine, but I'm wary so I reboot anyway.  The last (third) boot works fine and everything is happy: / is clean and /usr/sbin exists.

WTF I say to myself...  I can reduce this by 1 reboot if I remember to touch /forcefsck before I reboot.  

Anyone have an idea on how I can fix this?!?!  PLEASE?!
Comment 3 Paul Kronenwetter 2003-03-15 18:29:12 UTC
I've since stopped using data=journal.  Just so much fun.
Comment 4 Matthew Humphrey 2003-03-21 21:53:57 UTC
I've seen this exact same problem. After the fsck, the system spit out an error trying to find certain files. I rebooted and things seem fine.
Comment 5 Paul Kronenwetter 2003-03-21 22:12:46 UTC
I adjusted the maximum mount count to 1 on my root fs to keep the manual work maintaining this thing to a minimum.  And, I'm ignoring the ** REBOOT LINUX ** warnings.  I think this is not good. 

I've also tried gentoo-sources-2.4.19-r10 and that didn't corrupt things as quickly but it still exhibited the same behavior...
Comment 6 Jay Pfeifer (RETIRED) gentoo-dev 2003-03-21 23:04:36 UTC
have you tried gentoo-sources-2.4.20-r2? and fresh ext3 partitions? 
 
thanks, 
 
Jay 
Comment 7 Paul Kronenwetter 2003-03-22 08:08:13 UTC
I'm starting the compile of 2.4.20-r2 right now.  However, a fresh ext3 partition would mean re-installing and I'd rather not.  My 27+GB data partition has *never* exhibited "bad" behavior, which is a bit odd but I won't complain about it :).
Comment 8 Paul Kronenwetter 2003-03-23 08:55:40 UTC
2.4.20-gentoo-r2 exhibits the same behavior.  I'd like to try the redhat-sources but I haven't been able to get them to compile (and I'm usually pretty good at that :) 
Comment 9 Paul Kronenwetter 2003-03-23 09:17:31 UTC
I should also mention that I've added -W0 to hdparm in /etc/init.d/hdparm: 'hdparm -d1 -W0 /dev/ide/hd/*u?'  in hopes of getting around the "new laptop IDE write-ahead buffer problem" discussed in the forums.  It's been running like this since before the bug was filed.  Obviously, this hasn't helped either. :(
Comment 10 Paul Kronenwetter 2003-04-06 11:11:31 UTC
The redhat-sources package did do better with my ext3 partition.  Now the problem is finding the code that really fixed it.  Is it in vfs, ext2 or ext3 or maybe somewhere like the IDE code?  

The 'tune2fs -c 1 /dev/XXX' trick is still working.  I'll see what I can do about doing a diff on the vfs, ext2 and ext3 code.  I'm not hopeful that this'll be easy...
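
One way to narrow down which files differ between two kernel source trees before reading full diffs is sketched below in Python. This is only an illustration under stated assumptions: the function names `changed_files` and `_collect` are hypothetical, both trees are assumed to be unpacked locally, and each listed subdirectory is assumed to exist in both trees.

```python
import filecmp
import os
import sys


def changed_files(tree_a, tree_b, subdirs):
    """List (relative path, status) for files that differ between two trees."""
    results = []
    for sub in subdirs:
        pair = filecmp.dircmp(os.path.join(tree_a, sub), os.path.join(tree_b, sub))
        _collect(pair, sub, results)
    return sorted(results)


def _collect(pair, rel, results):
    # shallow=False compares file contents instead of trusting size and mtime.
    _, mismatch, errors = filecmp.cmpfiles(
        pair.left, pair.right, pair.common_files, shallow=False
    )
    for name in mismatch:
        results.append((os.path.join(rel, name), "differs"))
    for name in errors:
        results.append((os.path.join(rel, name), "could not compare"))
    for name in pair.left_only:
        results.append((os.path.join(rel, name), "only in first tree"))
    for name in pair.right_only:
        results.append((os.path.join(rel, name), "only in second tree"))
    # Recurse into directories present in both trees.
    for name, sub_pair in pair.subdirs.items():
        _collect(sub_pair, os.path.join(rel, name), results)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: script.py <first tree> <second tree>; the subdirectory list is an example.
    for path, status in changed_files(sys.argv[1], sys.argv[2], ["fs/ext2", "fs/ext3"]):
        print(status, path)
```

The output is just a list of candidate files; each one would still need an ordinary diff to see what actually changed.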

Thanks for listening!
Comment 11 Jay Pfeifer (RETIRED) gentoo-dev 2003-04-15 17:49:36 UTC
the patches are in gentoo-sources-2.4.20-r3. 
 
Jay