I have been testing a lightweight (console mode only) Gentoo 2007.0 install which boots from a 4GB USB drive. The drive has two EXT2 partitions, mounted as "/boot" and "/". I am experiencing intermittent file system corruption on the root partition after a system shutdown/poweroff (using either "shutdown -h now" or by tapping the power button, which initiates a shutdown).

What appears to be happening is that the power shuts off before the USB drive has finished writing its buffers to flash. The result is a dirty shutdown and file system corruption. Usually the file system errors are minor, but at least once I had to rebuild my install from backup.

Suggestion for Fix:
When a shutdown with power-off is initiated, there should be a user-configurable delay (probably 5 seconds would do it) after sync/unmount and before power actually shuts off. This would give the USB drive time to finish writing before power goes off.

Reproducible: Sometimes

Steps to Reproduce:
1. From the command line, issue "shutdown -h now".
2. The PC will perform an orderly shutdown and shut its power off.
3. Turn on power to boot the PC.

Actual Results:
During the next boot there is sometimes a warning about a dirty root filesystem, and an fsck is performed. The fsck finds errors, which sometimes have to be fixed with a manual run. One time there were so many errors that I had to rebuild the filesystem from backup.

Expected Results:
A clean reboot with no filesystem warnings or errors.

Gentoo 2007.0, fully updated with the latest stable packages from portage.
Test system USB drive partitions (fstab):

/dev/sdb1   /boot   ext2   defaults,noauto,noatime,nodiratime   1 2
/dev/sdb2   /       ext2   defaults,noatime,nodiratime          0 1

Kernels I have tested this with:
kernel-2.6.22-gentoo-r5
kernel-2.6.21-gentoo-r4
kernel-2.6.20-gentoo-r8

I have observed the problem on two different systems/platforms:
1) MSI K9AGM-FID mobo, ATI SB600 & ATI Xpress 1150 (RS485) chipsets, 1GB DDR2 ram, Athlon 64 X2 3800+ cpu
2) ECS 945G-M3 V1.0B Viiv mobo, Intel 945G & ICH7DH chipsets, 512MB DDR2 ram, Celeron D 331 cpu

I have observed the problem with two different brands of USB drives:
1) Corsair 4GB Flash Voyager GT (CMFUSB2.0-4GBGT)
2) PNY Attache 4GB Flash Drive (P-FD04GU20-RF)
(The Corsair is more prone to corruption than the PNY.)
This isn't a fix, and it is only half a workaround, but try adding a line saying "sync" to /etc/conf.d/local.stop. This will force flushing to occur earlier in the shutdown procedure, so the final flush will have much less to do, lessening the chance of problems.
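
As a rough sketch of that workaround, the fragment below shows what /etc/conf.d/local.stop might contain. It assumes the file is run as plain shell when the "local" service stops; the function name flush_early is only illustrative.

```shell
# /etc/conf.d/local.stop (sketch)
# Assumption: this file is executed as plain shell when the "local"
# service is stopped during shutdown.

# Flush dirty buffers to disk now, so that the final flush at the very
# end of shutdown has less left to write.
flush_early() {
    sync
}

flush_early
```

This only moves the bulk of the writes earlier. Anything written after the "local" service stops still has to be flushed at the end of shutdown.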
Another thing you can do is add the "sync" option to the /etc/fstab entry for your USB drive. That way, writes to your USB drive will not be buffered, but done immediately. This will slow things down, though. Buffering writes (asynchronous I/O) is the default behavior, so I'm not sure this is an actual bug.
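
For reference, here is a sketch of what that change could look like, using the root entry from the fstab posted in the report with "sync" appended (shown as a comment, since it is a config line). The helper has_mount_opt is a hypothetical name, not an existing tool. It reads a table in /proc/mounts format to confirm that the option is actually in effect after a remount or reboot.

```shell
# /etc/fstab root entry with "sync" added (config line, shown as a comment):
#   /dev/sdb2   /   ext2   defaults,noatime,nodiratime,sync   0 1

# has_mount_opt MOUNTPOINT OPTION [TABLE]
# Succeeds if MOUNTPOINT appears in TABLE (default: /proc/mounts) with
# OPTION among its comma-separated mount options.
# Hypothetical helper, for illustration only.
has_mount_opt() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) found = 1
        }
        END { exit !found }
    ' "${3:-/proc/mounts}"
}
```

Usage would be something like: has_mount_opt / sync && echo "root is mounted sync"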
Thanks for the suggestions. Here is what I have found:

1) Even after adding a "sync" to /etc/conf.d/local.stop, there is still quite a bit of write activity after it runs.
2) The root file system is used for things like portage and kernel compilation. Mounting it with the "sync" option degrades system performance too much. It isn't an option for me.

Both your suggestions gave me an idea: add the line "mount / -o remount,sync,dirsync" to "/etc/conf.d/local.stop". That puts the root in sync mode only during shutdown. This does slow down the shutdown quite a bit, especially on slower USB drives, but I think it is much safer.

Even with this added, my Corsair Voyager GT drive still has problems. It will corrupt files unless power-off is delayed (I simulate this with a reboot and then manually power off at a safe place during boot-up).

I would still like to have a configurable delay right before power-off. Is that a kernel issue, or can it be done another way?
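
The remount idea and a delay can be approximated together in userspace. The sketch below is meant to be appended to /etc/conf.d/local.stop. The names pending_kb, wait_for_writeback and flush_root, and the FLUSH_TIMEOUT and POWEROFF_GRACE values, are hypothetical and not part of baselayout. It waits, with an upper bound, for the Dirty and Writeback counters in /proc/meminfo to reach zero and then sleeps for a fixed grace period. Those counters only describe the kernel's page cache and say nothing about data still held in the flash drive's own controller, which is why the fixed sleep is kept. It also cannot cover writes issued after local.stop has run, so it is not a substitute for a real delay right before power-off.

```shell
# Sum of the Dirty and Writeback counters (in kB) from a meminfo-style file.
pending_kb() {
    awk '/^(Dirty|Writeback):/ { sum += $2 } END { print sum + 0 }' \
        "${1:-/proc/meminfo}"
}

# wait_for_writeback TIMEOUT_SECS [MEMINFO_FILE]
# Poll once per second until nothing is pending or the timeout expires.
# Returns 0 if the counters reached zero, 1 on timeout.
wait_for_writeback() {
    remaining=$1
    while [ "$(pending_kb "$2")" -gt 0 ]; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# flush_root: intended to be called as the last line of local.stop.
flush_root() {
    mount / -o remount,sync,dirsync   # the remount line from the comment above
    sync
    wait_for_writeback "${FLUSH_TIMEOUT:-30}"
    sleep "${POWEROFF_GRACE:-5}"      # assumed grace period for the drive itself
}
```

With this in place, local.stop would end with a call to flush_root. The 30 second timeout and 5 second grace period are guesses and would need tuning per drive.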
(In reply to comment #3)
> ...Even with this added, my Corsair Voyager GT drive still has problems.

I just ran another series of tests tonight and have conflicting results. I can no longer get this drive to corrupt upon shutdown (even when NOT USING the remount/sync option). I am not sure what has changed. Either I am just not catching it at the right time, or something has changed that has fixed it or is masking it. Should the bug be closed until I can duplicate the problem more consistently?
So, to get this clear: you are no longer able to reproduce this corruption on either of your USB drives? Have you done or changed anything that could have fixed this issue? If you want, do some more testing. If you don't want to, or remain unable to reproduce it, we'll close the bug. You can always reopen it later if you want.
Closing. Should you experience this corruption again, you can reopen.