I hesitated to file this as a filesystem bug; it may be a kernel issue, or perhaps something that can't be fixed.

Here is the problem: on bootup, if the kernel is unable to open an initial console (udev is used), then either the root partition or another reiserfs partition mounted later will be corrupted. Sometimes the corruption is minor, detectable, and fixable. Sometimes entire directories disappear but no corruption is detected.

Reproducible: Always

Steps to Reproduce:
1. Requires a system with at least 2 reiserfs filesystems (e.g. root and home) and a 2.6.9 kernel with udev.
2. Delete all device nodes from the on-disk /dev.
3. Reboot. The kernel will complain of not having an initial console.
4. Check that all filesystems are properly mounted. The second partition will likely not be mounted.
5. Run reiserfsck on the filesystems. It may require "-S --rebuild-tree" to find the corruption.

Actual Results: As above

Expected Results: Filesystems would not be corrupted. A kernel panic would be acceptable.

I don't think the emerge --info is really relevant here... udev is 046.
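For step 4, a quick way to see which reiserfs filesystems failed to mount is to compare the fstab entries against the kernel's mount table. The following is only a sketch: the function name missing_reiserfs_mounts is made up for illustration, and it assumes the usual six-field fstab layout and the /proc/mounts format.

```shell
# Print every reiserfs mount point listed in an fstab-style file that does
# not appear as a mounted reiserfs filesystem in a mounts-style file.
# Both arguments are optional; the defaults are the usual system paths.
missing_reiserfs_mounts() {
    fstab=${1:-/etc/fstab}
    mounts=${2:-/proc/mounts}
    # fstab fields: device, mount point, type, options, dump, pass.
    # Skip comment lines and entries marked noauto.
    awk '$1 !~ /^#/ && $3 == "reiserfs" && $4 !~ /noauto/ { print $2 }' "$fstab" |
    while read -r mp; do
        # mounts fields: device, mount point, type, options, ...
        if ! awk -v mp="$mp" '$2 == mp && $3 == "reiserfs" { found = 1 } END { exit !found }' "$mounts"; then
            echo "$mp"
        fi
    done
}
```

Calling missing_reiserfs_mounts with no arguments after the failed boot should list the partitions that did not come up; no output means every reiserfs entry in fstab is mounted.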
When I created this bug, I forgot to include hardware information that may be pertinent. My system is:

Asus P4PE motherboard - Intel chipset
Pentium 4, not Hyperthreaded
1 Gig RAM
2 x 120 Gig hard disks (hda and hdb)
DVD-RAM drive (hdd)
CD-RW drive (hdc)
Soundblaster Audigy 2 Platinum
Nvidia GeForce 4 (440GX) video card
What is the value of RC_DEVICE_TARBALL in your /etc/conf.d/rc file?
In response to: What is the value of RC_DEVICE_TARBALL in your /etc/conf.d/rc file? RC_DEVICE_TARBALL="yes"
Try setting RC_DEVICE_TARBALL="no" and reboot. Also check that you have removed everything related to devfs (both the kernel support and the daemon).
There is nothing related to devfs in my kernel, and the init scripts are not in any of my runlevels. The only thing I use is udev. The system has **never** run a 2.4 series kernel.

The problem is simple. If the kernel can't open an initial console because the /dev directory on the root partition is empty, then a reiserfs filesystem will be corrupted. If I create a console and a null device in the /dev directory on the root partition, then no corruption occurs.

I know that the console device must exist or the kernel will not be able to open an initial console. That's OK. What is not OK, and should be protected against, is corrupting a filesystem. It is strange, since at that particular point in time the root partition should be mounted read-only.

Also, RC_DEVICE_TARBALL is really **not** the problem. It works just fine the way it is.
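For anyone wanting to check a root partition for the two nodes mentioned above, here is a minimal sketch. The function name missing_boot_nodes is made up for illustration, and the directory passed in is assumed to be the on-disk /dev (not the tmpfs udev mounts over it).

```shell
# Print the names of the boot-time device nodes (console, null) that are
# not present as character devices in the given directory.
missing_boot_nodes() {
    devdir=${1:-/dev}
    for node in console null; do
        [ -c "$devdir/$node" ] || echo "$node"
    done
}

# Recreating them by hand needs root. 5,1 and 1,3 are the standard Linux
# major/minor numbers for console and null; the path is a placeholder.
#   mknod -m 600 /path/to/ondisk/dev/console c 5 1
#   mknod -m 666 /path/to/ondisk/dev/null    c 1 3
```

An empty result means both nodes are there, which is the state in which I see no corruption.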
OK, I figured it out... Arts has indigestion when being compiled using gcc-3.4.3-r1 but works with 3.3.4-r1 if I do the compile from the package I downloaded. Doing an emerge, I get the following:

checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking target system type... i686-pc-linux-gnu
checking for a BSD-compatible install... /bin/install -c
checking for -p flag to install... yes
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for i686-pc-linux-gnu-strip... no
checking for strip... strip
checking for a BSD-compatible install... /bin/install -c -p
checking for style of include used by make... GNU
checking for i686-pc-linux-gnu-gcc... i686-pc-linux-gnu-gcc
checking for C compiler default output file name... configure: error: C compiler cannot create executables
See `config.log' for more details.

!!! ERROR: kde-base/arts-1.3.2 failed.
!!! Function kde_src_compile, Line 130, Exitcode 77
!!! died running ./configure, kde_src_compile:configure
!!! If you need support, post the topmost build error, NOT this status message.
Oops, sorry, I posted to the wrong bug. Please ignore all the arts stuff.
I don't quite follow.. I was under the impression that boot completely halts if /dev/console is not there. But your description suggests that boot continues and tries to mount things? Also, in step 2 you said delete all the nodes on disk. How did you do this?
To delete all the nodes on the disk you need to mount /dev on another directory. If I remember correctly, as root you do something like:

mkdir /test
mount --bind /dev/ /test

That should give you the device nodes that are actually on the disk when you look at /test. Then just delete everything like you would normally delete any file.

After that, reboot. The kernel will complain about an initial console. Just wait a while. It will keep going. There is no kernel panic. Then take a look at your partitions. If they are reiserfs they'll be screwed up. It may be that this is common to all filesystems, but I only tried reiser.

See also: http://www.gentoo.org/doc/en/udev-guide.xml
The process for deleting device nodes on disk is described there as well.

John
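The inspection part of the procedure above can be sketched as a small helper. Two things here are assumptions rather than what was typed from memory above: the function name inspect_ondisk_dev is made up, and the sketch binds / (not /dev) and then looks at the dev directory under the mount point, on the understanding that a plain bind mount does not carry submounts along, so the tmpfs over /dev is left out and the on-disk directory shows through. By default it only prints the commands.

```shell
# Print (or run) the commands that expose the on-disk /dev hidden under the
# tmpfs that udev populates. RUN defaults to "echo", so nothing is executed
# unless RUN is explicitly set to an empty string.
inspect_ondisk_dev() {
    target=${1:-/test}    # mount point for the bind mount, as in the thread
    ${RUN-echo} mkdir -p "$target"
    ${RUN-echo} mount --bind / "$target"
    ${RUN-echo} ls -l "$target/dev"
}
```

Running inspect_ondisk_dev /test as-is prints the three commands; running it as root with RUN set to empty (RUN= inspect_ondisk_dev /test) executes them. Deleting the nodes is deliberately left out of the helper, and the bind mount should be removed with umount /test afterwards.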
Right, so it is just as likely that the bind mounting causes the corruption? Or are you sure this actually happens on next bootup?
No, the mount with bind **does not** cause the corruption. It is **only** a way to remove stuff in the dev directory that is hidden by mounting a tmpfs over the top of it. The corruption is *definitely* due to the bootup when an initial console is not found.

Another, though immensely more complex, way of doing the same thing is to copy everything from an existing root partition to a new partition. Then prove that it works by booting with that partition set as root. Then reboot and mount the new partition somewhere (for example /mnt/hd). You will see that the partition has a dev directory with device nodes. Delete all the device nodes. Check the partition if you like (paranoia). Then reboot again with the new partition as the root partition. It will now get corrupted.
OK, I only said that because I heard of a user having a similar problem, but his started as soon as he set up the bind mount: commands like "shutdown" and "umount" were not found. Are you able to check whether this issue still exists on 2.6.10-rc3?
It'll take a while to set up on the free partitions that I have. I don't want to blow up my running system. But I can certainly run some tests. The gentoo sources are at 2.6.9-r9. I guess you are talking about the sources from kernel.org when you refer to 2.6.10. Is that correct?
Yes, thanks. (or emerge =development-sources-2.6.10_rc3)
This is strange.... I emerged the 2.6.10 kernel as you requested and built it. I can use 2.6.10 to boot my system normally and it works fine.

I created a partition with reiserfs and copied my entire root partition to it. So far so good. I removed all entries from /dev on the new partition and rebooted, this time using the new partition as the root. I got a kernel panic and a backtrace. This is OK; I kind of expected it since there's no initial console.

I rebooted using my old root and the 2.6.10 kernel and ran reiserfsck on the new partition. There were no errors.

Now here is where it gets strange. When I try to mount the partition, mount hangs forever or segfaults. If mount hangs I can't kill it, and I can't reboot. It's just stuck, and after about 10 minutes I have no choice but to push the reset button.

If I boot using 2.6.9-gentoo-r9 and try to mount the new partition, it mounts properly and all the files seem intact. I can also umount the new partition without problems.

In short, there appears to be a new bug in 2.6.10. It doesn't trash the filesystem anymore, but it seems to have a problem mounting 3 reiserfs filesystems.
OK. Could you please file a bug for the remaining 2.6.10 issue at http://bugzilla.kernel.org, as this is not under our control. I can't see which patch would have fixed the 2.6.9 issue, so I guess I'm going to have to get 2.6.10 marked stable as soon as I feel it's safe.