30124 – md: md0 still in use, gentoo-sources patch included.

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	x86 Linux

Importance:	Highest major
Assignee:	x86-kernel@gentoo.org (DEPRECATED)

Duplicates (1):	30173

Reported:	2003-10-01 17:08 UTC by linuxboy
Modified:	2003-10-04 07:20 UTC
CC List:	0 users

Description linuxboy 2003-10-01 17:08:57 UTC

Reproducible:
 Every reboot

Failing System:
 gentoo-sources-2.4.20-r7
 EVMS2 (probably any LVM, but I use EVMS2)
 RAIDn with LVM objects on top of MD0
 Probably insignificant details:
   Compaq 1850r-450, RAID1 on 3 SCSI drives with ncr53c875
   genkernel 1.8 with initrd patched for root on EVMS2
   /boot is not on EVMS2

Some people report that this does not happen on MD1 or above (not tested here).
This may be related to Bugs 12612, 17747 and 12460.

Symptom:
Running init 0 results in:
...
  * Sending all processes the TERM signal...          [ok]
 md: recovery thread got woken up ...
 md: recovery thread finished ...                     [ok]
  * Deactivating swap...                              [ok]
  * Unmounting filesystems...                         [ok]
  * Remounting remaining filesystems readonly...      [ok]
 md: stopping all md devices.
 md: md0 still in use.
 flushing ide devices: hda
 Power down.

On power up, md0 gets rebuilt:
 md: updating md0 RAID superblock on device
 md: [dev fe:03] [events: 00000024]<6>(write) [dev fe:03]'s sb offset: 8387392
 md: syncing RAID array md0
 md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
 md: using maximum available idle IO bandwith (but not more than 100000 KB/sec) for reconstruction.
 md: using 124k window, over a total of 8387392 blocks.
 md: [dev fe:01] [events: 00000024]<6>(write) [dev fe:01]'s sb offset: 8387392
 md: ... autorun DONE.
15 minutes later:
 md: md0: sync done.

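The cause appears to be an unreliable md reference count: at shutdown mddev->active still reads greater than 1, so the stop is refused with "md0 still in use" and the array is left unclean.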

Neil Brown documented this problem for 2.4.22-pre9; a patch for that version is at:
http://cgi.cse.unsw.edu.au/~neilb/patches/current/linux-stable-leadingedge/included/032MdRefCount

It is described further here:
http://www.spinics.net/lists/raid/msg03231.html
And:
http://www.hk.kernel.org/pub/linux/kernel/ports/ia64/v2.4/testing/cset/cset-neilb@cse.unsw.edu.au%7CChangeSet%7C20030728132013%7C24132.txt

There is a larger patch that I did not use here:
http://cgi.cse.unsw.edu.au/~neilb/patches/linux-stable/2.4.released/2003-07-25:05/008MdRefCount

I've back-ported the first patch to work with kernel-2.4.20-gentoo-r7 and tested it on my two EVMS2 systems:

--- md.orig     2003-09-20 07:28:53.000000000 -0500
+++ md.c        2003-10-01 14:37:45.000000000 -0500
@@ -1794,10 +1794,12 @@
        int err = 0, resync_interrupted = 0;
        kdev_t dev = mddev_to_kdev(mddev);

+#if 0 /* ->active is not currently reliable */
        if (atomic_read(&mddev->active)>1) {
                printk(STILL_IN_USE, mdidx(mddev));
                OUT(-EBUSY);
        }
+#endif

        if (mddev->pers) {
                /*
@@ -2746,12 +2748,17 @@
                        goto done_unlock;

                case STOP_ARRAY:
-                       if (!(err = do_md_stop (mddev, 0)))
+                       if (inode->i_bdev->bd_openers > 1)
+                               err = -EBUSY;
+                       else if (!(err = do_md_stop (mddev, 0)))
                                mddev = NULL;
                        goto done_unlock;

                case STOP_ARRAY_RO:
-                       err = do_md_stop (mddev, 1);
+                       if (inode->i_bdev->bd_openers > 1)
+                               err = -EBUSY;
+                       else
+                               err = do_md_stop (mddev, 1);
                        goto done_unlock;

        /*

Comment 1 Tim Yamin (RETIRED) gentoo-dev

2003-10-02 15:12:20 UTC

The patch is now in CVS; just sync your portage tree and re-merge gentoo-sources.
Thanks for the bug report!

Comment 2 Martin Holzer (RETIRED) gentoo-dev

2003-10-04 07:20:12 UTC

*** Bug 30173 has been marked as a duplicate of this bug. ***