I've been running into many instances where my ext3 filesystem has been inexplicably corrupted, mostly in /usr/sbin but in other places as well. Typically I try to access a file, get 'Input/output error', remove the file, and then see this in /var/log/messages:

Mar 14 18:42:15 ex1503 EXT3-fs warning (device ide0(3,2)): ext3_unlink: Deleting nonexistent file (568640), 0

Most often, though, the filesystem has errors on boot, even though it has run the journal. After that, the contents of my /usr/sbin are usually completely missing. The *really* bizarre part is that the contents sometimes come back after another fsck.ext3 run.

I can't say whether the patches at the referenced URL will help, but the author does say "These fix fairly serious problems, and they should be applied." I'm compiling a kernel now and hope to see some better results.

I've mounted root with data=journal. I've also turned off the low-latency and preemptible kernel options. All in hopes of fixing the problem, but nothing helped. Except to slow the machine down sometimes :)

Reproducible: Sometimes

Steps to Reproduce:
1. Build gentoo-sources-2.4.20-r1
2. Run it for a while as a regular desktop machine, maybe upgrade it from 1.4_r2 to _r3.
3. Touch /forcefsck and reboot.

Actual Results: Most often the filesystem has errors and fsck removes the contents of /usr/sbin. When the same steps are repeated, /usr/sbin sometimes returns.

Expected Results: No filesystem corruption should be seen, ever. In a perfect world anyway :)

Machine is a Dell Latitude C840 laptop, 2.2 GHz Pentium 4, 1GB RAM, 40GB hard drive. New as of about 2/2003.
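For anyone trying to see how often the kernel logs these messages, a minimal sketch of a counter follows. The function name count_ext3_messages is made up for illustration, and the pattern assumes the messages always contain "EXT3-fs warning" or "EXT3-fs error", as in the sample line above; other kernels may word them differently.

```shell
# Count EXT3-fs warning/error lines in a syslog-style file.
# Usage: count_ext3_messages [logfile]   (defaults to /var/log/messages)
# NOTE: count_ext3_messages is a hypothetical helper, not part of any package.
count_ext3_messages() {
    logfile="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps a clean log from looking like a failure.
    grep -c -E 'EXT3-fs (warning|error)' "$logfile" || true
}
```

Running it before and after a reboot gives a rough idea of whether new warnings appeared during that boot.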
Portage 2.0.47-r10 (default-x86-1.4, gcc-3.2.2, glibc-2.3.1-r2)
=================================================================
System uname: 2.4.20-gentoo-r1 i686 Mobile Intel(R) Pentium(R) 4 - M CPU 2.20GHz
GENTOO_MIRRORS=" ftp://ftp.gtlib.cc.gatech.edu/pub/gentoo http://csociety-ftp.ecn.purdue.edu/pub/gentoo/ ftp://mirror.iawnet.sandia.gov/pub/gentoo/"
CONFIG_PROTECT="/etc /var/qmail/control /usr/kde/2/share/config /usr/kde/3/share/config /usr/X11R6/lib/X11/xkb /opt/jakarta/tomcat/conf /usr/share/config"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR_OVERLAY="/usr/local/portage"
USE="x86 oss 3dnow apm arts avi crypt encode gif jpeg libg++ libwww mikmod mmx mpeg ncurses nls pdflib png quicktime spell truetype xml2 xmms xv zlib gdbm berkdb slang readline svga java X sdl gpm tcpd pam ssl python imlib oggvorbis gnome gtk motif opengl alsa apache2 cdr -cups dvd esd ethereal gd gphoto2 gps junit -kde mbox mozilla mysql pcmcia pda perl pic plotutils postgres -qt samba scanner sse tcltk tetex tiff trusted usb"
COMPILER="gcc3"
CHOST="i686-pc-linux-gnu"
CFLAGS="-O2 -mcpu=pentium3 -pipe"
CXXFLAGS="-O2 -mcpu=i686 -pipe"
ACCEPT_KEYWORDS="x86"
MAKEOPTS="-j2"
AUTOCLEAN="yes"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
FEATURES="sandbox buildpkg ccache distcc userpriv usersandbox"
Created attachment 9421: Configuration file of the kernel that causes the problem.
The patches didn't help... Although I don't think they're doing any harm :) After running for a while, the next boot shows / as clean but /usr/sbin is empty. On the boot after that, fsck runs and says **REBOOT LINUX** (but Gentoo doesn't see that and continues on); then /usr/sbin is fine, but I'm wary so I reboot anyway. The last (third) boot works fine and everything is happy: / is clean and /usr/sbin exists. WTF, I say to myself... I can cut this down by one reboot if I remember to touch /forcefsck before I reboot. Anyone have an idea on how I can fix this?!?! PLEASE?!
I've since stopped using data=journal. Just so much fun.
I've seen this exact same problem. After the fsck, the system spit out an error trying to find certain files. I rebooted and things seem fine.
I adjusted the maximum mount count to 1 on my root fs to keep the manual work maintaining this thing to a minimum. And, I'm ignoring the ** REBOOT LINUX ** warnings. I think this is not good. I've also tried gentoo-sources-2.4.19-r10 and that didn't corrupt things as quickly but it still exhibited the same behavior...
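A minimal sketch of that mount-count workaround, for anyone following along. A later comment in this thread quotes the command as 'tune2fs -c 1 /dev/XXX'. The helper below only prints the command so it can be reviewed first; the function name is hypothetical and the device path is whatever you pass in, not a value taken from this report.

```shell
# Print (do not run) the tune2fs command that sets the maximum mount count
# to 1, so that the filesystem is checked on every boot.
# NOTE: force_fsck_every_mount_cmd is a hypothetical helper for illustration.
force_fsck_every_mount_cmd() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: force_fsck_every_mount_cmd DEVICE" >&2
        return 1
    fi
    # tune2fs -c sets the maximum number of mounts between forced checks.
    printf 'tune2fs -c 1 %s\n' "$dev"
}

# Example (prints the command; run the printed line as root if it looks right):
#   force_fsck_every_mount_cmd /dev/XXX
```

This only hides the symptom by running fsck on every boot; it does nothing about whatever is corrupting the filesystem.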
have you tried gentoo-sources-2.4.20-r2? and fresh ext3 partitions? thanks, Jay
I'm starting the compile of 2.4.20-r2 right now. However a fresh ext3 partition would mean re-installing and I'd rather not. My 27+GB data partition has *never* exhibited "bad" behavior, which is a bit odd but I won't complain about it :).
2.4.20-gentoo-r2 exhibits the same behavior. I'd like to try the redhat-sources but I haven't been able to get them to compile (and I'm usually pretty good at that :)
I should also mention that I've added -W0 to hdparm in /etc/init.d/hdparm: 'hdparm -d1 -W0 /dev/ide/hd/*u?' in hopes of getting around the "new laptop IDE write-ahead buffer problem" discussed in the forums. It's been running like this since before the bug was filed. Obviously, this hasn't helped either. :(
The redhat-sources package did do better with my ext3 partition. Now the problem is finding the code that really fixed it. Is it in vfs, ext2 or ext3 or maybe somewhere like the IDE code? The 'tune2fs -c 1 /dev/XXX' trick is still working. I'll see what I can do about doing a diff on the vfs, ext2 and ext3 code. I'm not hopeful that this'll be easy... Thanks for listening!
the patches are in gentoo-sources-2.4.20-r3. Jay