Reproducible: Every reboot

Failing system:
gentoo-sources-2.4.20-r7
EVMS2 (probably any LVM, but I use EVMS2)
RAIDn with LVM objects on top of MD0

Probably insignificant details:
Compaq 1850r-450, RAID1 on 3 SCSI drives with ncr53c875
genkernel 1.8 with initrd patched for root on EVMS2
/boot is not on EVMS2

Some people report this does not happen on MD1 or above (not tested here).
This may be related to Bugs 12612, 17747 and 12460.

Symptom: init 0 results in:

...
* Sending all processes the TERM signal... [ok]
md: recovery thread got woken up ...
md: recovery thread finished ...
[ok]
* Deactivating swap... [ok]
* Unmounting filesystems... [ok]
* Remounting remaining filesystems readonly... [ok]
md: stopping all md devices.
md: md0 still in use.
flushing ide devices: hda
Power down.

On power up, md0 gets rebuilt:

md: updating md0 RAID superblock on device
md: [dev fe:03] [events: 00000024]<6>(write) [dev fe:03]'s sb offset: 8387392
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 100000 KB/sec) for reconstruction.
md: using 124k window, over a total of 8387392 blocks.
md: [dev fe:01] [events: 00000024]<6>(write) [dev fe:01]'s sb offset: 8387392
md: ... autorun DONE.

15 minutes later:

md: md0: sync done.

The cause appears to be that the md reference counts are incorrect, so md0 is still reported as in use at shutdown and is never stopped cleanly.
This problem is documented by Neil Brown for 2.4.22-pre9 and a patch for that version is at:
http://cgi.cse.unsw.edu.au/~neilb/patches/current/linux-stable-leadingedge/included/032MdRefCount

And described more here:
http://www.spinics.net/lists/raid/msg03231.html

And:
http://www.hk.kernel.org/pub/linux/kernel/ports/ia64/v2.4/testing/cset/cset-neilb@cse.unsw.edu.au%7CChangeSet%7C20030728132013%7C24132.txt

There is a larger patch that I did not use here:
http://cgi.cse.unsw.edu.au/~neilb/patches/linux-stable/2.4.released/2003-07-25:05/008MdRefCount

I've back-ported the first patch to work with kernel-2.4.20-gentoo-r7 and tested it on my two EVMS2 systems:

--- md.orig	2003-09-20 07:28:53.000000000 -0500
+++ md.c	2003-10-01 14:37:45.000000000 -0500
@@ -1794,10 +1794,12 @@
 	int err = 0, resync_interrupted = 0;
 	kdev_t dev = mddev_to_kdev(mddev);
 
+#if 0 /* ->active is not currently reliable */
 	if (atomic_read(&mddev->active)>1) {
 		printk(STILL_IN_USE, mdidx(mddev));
 		OUT(-EBUSY);
 	}
+#endif
 
 	if (mddev->pers) {
 		/*
@@ -2746,12 +2748,17 @@
 			goto done_unlock;
 
 		case STOP_ARRAY:
-			if (!(err = do_md_stop (mddev, 0)))
+			if (inode->i_bdev->bd_openers > 1)
+				err = -EBUSY;
+			else if (!(err = do_md_stop (mddev, 0)))
 				mddev = NULL;
 			goto done_unlock;
 
 		case STOP_ARRAY_RO:
-			err = do_md_stop (mddev, 1);
+			if (inode->i_bdev->bd_openers > 1)
+				err = -EBUSY;
+			else
+				err = do_md_stop (mddev, 1);
 			goto done_unlock;
 
 		/*
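
For anyone reading the patch without the kernel source at hand, here is a rough user-space sketch of what the STOP_ARRAY / STOP_ARRAY_RO change does: it stops trusting the md-internal counter and instead refuses the stop when the block device has more than one opener. The struct toy_array and the function toy_stop_array are made up for illustration and are not kernel names. The comment about why the threshold is 1 is my reading (the process issuing the ioctl must itself hold the device open), not something stated in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an md array; NOT the kernel's mddev_t. */
struct toy_array {
    int openers;    /* plays the role of inode->i_bdev->bd_openers */
    int running;    /* 1 while the array is assembled, 0 once stopped */
    int read_only;  /* 1 once the array has been switched to read-only */
};

/*
 * Same shape as the patched ioctl branches: check the opener count
 * first, and only then carry out the stop.
 * Assumption: the caller accounts for one opener itself, so a count
 * above 1 means something else still has the device open.
 */
int toy_stop_array(struct toy_array *a, int ro)
{
    assert(a != NULL);

    if (a->openers > 1)
        return -EBUSY;      /* still in use elsewhere: refuse */

    if (ro)
        a->read_only = 1;   /* STOP_ARRAY_RO: keep running, read-only */
    else
        a->running = 0;     /* STOP_ARRAY: stop the array */
    return 0;
}
```

With openers set to 2 the call returns -EBUSY and leaves the array untouched; with openers set to 1 it goes through.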
In CVS: the patch is in, just sync your portage tree and remerge gentoo-sources. Thanks for the bug report!
*** Bug 30173 has been marked as a duplicate of this bug. ***