Bug 73145 - Reiserfs - kernel 2.6.9, fs corruptions
Summary: Reiserfs - kernel 2.6.9, fs corruptions
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-12-02 09:27 UTC by John Gluck
Modified: 2004-12-22 15:49 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Description John Gluck 2004-12-02 09:27:28 UTC
I hesitated to file this as a filesystem bug; it may be a kernel issue, or perhaps something that can't be fixed.

Here is the problem:

On bootup, if the kernel is unable to open an initial console (udev is used), then either the root partition or another reiserfs partition mounted later will be corrupted. Sometimes the corruption is minor, detectable, and fixable. Sometimes entire directories disappear but no corruption is detected.

Reproducible: Always
Steps to Reproduce:
1. Requires a system with at least two reiserfs filesystems (i.e. root and home) and a 2.6.9 kernel with udev.
2. Delete all device nodes from the on-disk /dev.
3. Reboot. The kernel will complain that it cannot open an initial console.
4. Check whether all filesystems are properly mounted. The second partition will likely not be mounted.
5. Run reiserfsck on the filesystems. It may require "-S --rebuild-tree" to find the corruption.
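
The check in step 4 could be sketched roughly as follows. This is only an illustration, not part of the original report: the function name and its two path parameters are hypothetical, and it assumes the usual whitespace-separated layout of /etc/fstab and /proc/mounts. It lists reiserfs mount points that are in fstab but not currently mounted (entries marked noauto would also show up).

```shell
#!/bin/sh
# Hypothetical helper: print reiserfs mount points that are listed in an
# fstab-style file but missing from a /proc/mounts-style file.
# Both paths are parameters so the check can be tried on copies first.
missing_reiserfs_mounts() {
    fstab=${1:-/etc/fstab}
    mounts=${2:-/proc/mounts}
    awk '
        # First file (mounts table): remember every mounted mount point.
        NR == FNR { mounted[$2] = 1; next }
        # Second file (fstab): skip blank lines and comments.
        NF == 0 || $1 ~ /^#/ { next }
        # Report reiserfs entries whose mount point was not seen above.
        $3 == "reiserfs" && !($2 in mounted) { print $2 }
    ' "$mounts" "$fstab"
}
```

Called without arguments it reads the live /etc/fstab and /proc/mounts; any mount point it prints is a candidate for the reiserfsck run in step 5.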

Actual Results:  
As above

Expected Results:  
Filesystems would not be corrupted. Kernel panic would be acceptable.

Don't think the emerge --info is really relevant here...

udev is 046
Comment 1 John Gluck 2004-12-02 17:58:24 UTC
When I created this bug, I forgot to include hardware information that may be pertinent.

My system is:

Asus P4PE motherboard - Intel chipset
Pentium 4 not Hyperthreaded.
1 Gig RAM
2 X 120 Gig hard disk (hda and hdb)
DVD-RAM drive (hdd)
CD-RW drive (hdc)
Soundblaster Audigy 2 Platinum
Nvidia GeForce 4 (440GX) video card.
Comment 2 tklauser 2004-12-04 03:46:49 UTC
What is the value of RC_DEVICE_TARBALL in your /etc/conf.d/rc file?
Comment 3 John Gluck 2004-12-04 20:13:07 UTC
In response to: What is the value of RC_DEVICE_TARBALL in your /etc/conf.d/rc file?

RC_DEVICE_TARBALL="yes"
Comment 4 tklauser 2004-12-05 02:39:44 UTC
Try to set RC_DEVICE_TARBALL="no" and reboot. Also check if you removed everything related to devfs (from the kernel and the daemon)
Comment 5 John Gluck 2004-12-05 09:59:35 UTC
There is nothing related to devfs in my kernel and the init scripts are not in any of my runlevels.

The only thing I use is udev. The system has **never** run a 2.4 series kernel.

The problem is simple. If the kernel can't open an initial console because the /dev directory on the root partition is empty, then a reiserfs filesystem will be corrupted.

If I create a console and null device in the /dev directory on the root partition, then no corruption occurs.
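
A minimal sketch of that workaround, under stated assumptions: the on-disk /dev is reachable at some path (for example through a bind mount of the root filesystem), and the function name, the DRY_RUN variable and the permission modes are illustrative choices, not taken from the report. The device numbers are the standard Linux ones: console is character device 5,1 and null is character device 1,3.

```shell
#!/bin/sh
# Hypothetical helper: create the console and null nodes in DEV_DIR.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed;
# set DRY_RUN=0 and run as root to actually create the nodes.
make_minimal_dev() {
    dev_dir=$1
    run=
    if [ "${DRY_RUN:-1}" = 1 ]; then
        run=echo
    fi
    # /dev/console: character device, major 5, minor 1
    $run mknod -m 600 "$dev_dir/console" c 5 1
    # /dev/null: character device, major 1, minor 3
    $run mknod -m 666 "$dev_dir/null" c 1 3
}
```

For example, `make_minimal_dev /mnt/hd/dev` prints the two mknod commands that would be run against a root partition mounted at /mnt/hd.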

I know that the console device must exist or the kernel will not be able to open an initial console. That's OK. What is not ok and should be protected against is corrupting a filesystem. It is strange since at that particular point in time the root partition should be mounted read-only.

Also, the  RC_DEVICE_TARBALL is really **not** the problem. It works just fine the way it is.
Comment 6 John Gluck 2004-12-10 18:00:26 UTC
OK I figured it out...

Arts has indigestion when being compiled using gcc-3.4.3-r1 but works with 3.3.4-r1 if I do the compile from the package I downloaded.

Doing an emerge, I get the following:
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking target system type... i686-pc-linux-gnu
checking for a BSD-compatible install... /bin/install -c
checking for -p flag to install... yes
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for i686-pc-linux-gnu-strip... no
checking for strip... strip
checking for a BSD-compatible install... /bin/install -c -p
checking for style of include used by make... GNU
checking for i686-pc-linux-gnu-gcc... i686-pc-linux-gnu-gcc
checking for C compiler default output file name... configure: error: C compiler cannot create executables
See `config.log' for more details.

!!! ERROR: kde-base/arts-1.3.2 failed.
!!! Function kde_src_compile, Line 130, Exitcode 77
!!! died running ./configure, kde_src_compile:configure
!!! If you need support, post the topmost build error, NOT this status message.

Comment 7 John Gluck 2004-12-10 19:05:48 UTC
Oooops sorry, I got the wrong bug. Please ignore all the arts stuff


Comment 8 Daniel Drake (RETIRED) gentoo-dev 2004-12-11 12:17:22 UTC
I don't quite follow.. I was under the impression that boot completely halts if /dev/console is not there. But your description suggests that boot continues and tries to mount things?

Also, in step 2 you said delete all the nodes on disk. How did you do this?
Comment 9 John Gluck 2004-12-12 16:32:02 UTC
To delete all the nodes on the disk you need to bind-mount the root filesystem on another directory, so that the on-disk /dev is no longer hidden by the tmpfs mounted over it. If I remember correctly, as root you do something like:

mkdir /test
mount --bind / /test

That should show you the device nodes that are actually on the disk when you look in /test/dev.

Then just delete everything in there like you normally would delete any file.

After that reboot. The kernel will complain about an initial console. Just wait a while. It will keep going. There is no kernel panic. Then take a look at your partitions. If they are reiserfs they'll be screwed up. It may be that this is common to all filesystems but I only tried reiser.

See also: http://www.gentoo.org/doc/en/udev-guide.xml

The process for deleting device nodes on disk is described there as well.

John
Comment 10 Daniel Drake (RETIRED) gentoo-dev 2004-12-13 00:54:06 UTC
Right, so it is just as likely that the bind mounting causes the corruption? Or are you sure this actually happens on next bootup?
Comment 11 John Gluck 2004-12-13 16:12:31 UTC
No, the bind mount **does not** cause the corruption. It is **only** a way to remove the entries in the dev directory that are hidden by the tmpfs mounted over the top of it.

The corruption is *definitely* due to the bootup when an initial console is not found.

Another, though immensely more complex, way of doing the same thing is to copy everything from an existing root partition to a new partition. Then prove that it works by booting with that partition set as root. Then reboot and mount the new partition somewhere (for example /mnt/hd). You will see that the partition has a dev directory with device nodes. Delete all the device nodes. Check the partition if you like (paranoia). Then reboot again with the new partition mounted as the root partition. It will now get corrupted.

Comment 12 Daniel Drake (RETIRED) gentoo-dev 2004-12-14 00:55:24 UTC
Ok, I just said that because I heard of a user having a similar problem, but it started as soon as he set up the bind mount: commands like "shutdown" and "umount" were not found.

Are you able to check and see if this issue still exists on 2.6.10-rc3?
Comment 13 John Gluck 2004-12-14 10:33:53 UTC
It'll take a while to set up on the free partitions that I have. I don't want to blow up my running system. But I can certainly run some tests.

The gentoo sources are at 2.6.9-r9. I guess you are talking about the sources from kernel.org when you refer to 2.6.10. Is that correct???

Comment 14 Daniel Drake (RETIRED) gentoo-dev 2004-12-14 11:22:02 UTC
Yes, thanks.
(or emerge =development-sources-2.6.10_rc3)
Comment 15 John Gluck 2004-12-14 23:47:19 UTC
This is strange....

I emerged the 2.6.10 kernel as you requested and built it.

I can use 2.6.10 to boot my system normally and it works fine.

I created a partition with reiserfs and copied my entire root partition to it. So far so good. I removed all entries from /dev on the new partition.

I reboot, this time using the new partition as the root. I get a kernel panic and a backtrace. This is OK; I kind of expected it since there's no initial console.

I reboot using my old root and the 2.6.10 kernel. I do a reiserfsck of the new partition, there are no errors.

Now here is where it gets strange. I try to mount the partition and mount hangs forever or seg faults. If mount hangs I can't kill it, and I can't reboot. It's just stuck and after about 10 minutes I have no choice but to push the reset button.

If I boot using the 2.6.9-gentoo-r9 and try to mount the new partition, it will mount properly and all the files seem intact. I can also umount the new partition without problems.

In short, there appears to be a new bug in 2.6.10. It doesn't trash the filesystem anymore, but it seems to have a problem mounting 3 reiserfs filesystems.
Comment 16 Daniel Drake (RETIRED) gentoo-dev 2004-12-22 15:49:03 UTC
Ok. Could you please file a bug for the remaining 2.6.10 issue at http://bugzilla.kernel.org, as this is not under our control. I can't see which patch would have fixed the 2.6.9 issue, so I guess I'm going to have to get 2.6.10 marked stable as soon as I feel it's safe.