Bug 240394 - [OpenRC] sys-apps/openrc-0.3 wants to stop mdraid when entering single-user-mode
Summary: [OpenRC] sys-apps/openrc-0.3 wants to stop mdraid when entering single-user-mode
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] baselayout
Hardware: All Linux
Importance: High normal
Assignee: Gentoo's Team for Core System packages
Reported: 2008-10-07 16:03 UTC by Duncan
Modified: 2009-02-05 17:19 UTC
Description Duncan 2008-10-07 16:03:16 UTC
First, an overview of my system organization:

I have four drive spindles with multiple partitions laid out almost identically.  I run mdraid, creating three RAID devices out of those partitions, one on each spindle for each RAID device.  md0 is RAID-1, for /boot.  md_d1 is mdp/partitioned RAID-6, for the main system.  md_d2 is partitioned RAID-0, fast but non-redundant, for the Gentoo and kernel trees, etc, which the net backs up. =:^)  The RAID-6 is split into three partitions, two as main system and system backup snapshot, with the third being a nice big LVM managed volume.  I set it up with the main system partition and its backup outside LVM because the kernel can handle mdraid directly, but can't handle lvm without user-mode assistance, and I didn't want to hassle with an initrd/initramfs.

So at boot, the kernel loads the SATA drivers and reads the partitions, then loads the md/mdp drivers so it can read the RAID-6, then loads / from one of the RAID-6 partitions, before going user-mode and loading the RAID-0 and RAID-1 from the mdraid initscript, loading lvm from that initscript, and finally fscking and mounting all the other partitions.

Now the problem:

With openrc-0.3, switching to single user mode (init s) tries to shut down all the services, including unmounting everything but /, stopping lvm (which it can do since root isn't on lvm), and stopping mdraid, which it CANNOT properly do since root IS on mdraid.

The problem is that when it tries to shut down mdraid, it shuts down the RAID-0 and RAID-1, but can't shut down the RAID-6, so the service doesn't properly stop.  All is fine and good while actually in single-user-mode.  However, once one finishes there and tries to switch back to a normal runlevel (init levels 2 or 3, nonet or default, tho nonet is /my/ default), the problem becomes apparent.  Since mdraid didn't properly stop, it is still considered running, so it doesn't start.  Since it doesn't start, the RAID-0 and RAID-1 devices remain offline and the filesystems on them cannot be mounted (localmount fails with some mounts failed).

From here, the solution looks to be this:

For "multiplexed" services such as mdraid (my problem), lvm and devicemapper (I'm assuming a similar issue applies to them, tho I don't run the devicemapper service and my lvm doesn't cover root so it shuts down and restarts fine): if, at start time, it appears they weren't shut down properly (as when returning from single-user-mode, but also if they were stopped manually), try to start the service anyway, thus (re)starting the bits that WERE properly shut down.

IOW, if various devices (mdraid, lvm, devicemapper) were shut down going to single-user-mode, they should be started back up returning from single user mode, regardless of whether the service itself was successfully stopped or not, due to / existing on top of said service.
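
A minimal sketch of that "start whatever is down, whatever the recorded state says" idea, in plain sh. It is not OpenRC code: is_active, start_array and the ACTIVE list are hypothetical stand-ins for the real device checks, and the device names are simply the ones from the description above.

```shell
#!/bin/sh
# Hypothetical sketch: bring up every array that is down, ignoring whether
# the service as a whole was recorded as stopped or still started.

# Space-separated list of arrays that are still running (assumption: the
# RAID-6 holding / could not be stopped, so it is still active).
ACTIVE="${ACTIVE:-md_d1}"

# Stand-in for a real "is this array assembled?" check.
is_active() {
    case " $ACTIVE " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Stand-in for the real per-device start; here it only records the state.
start_array() {
    ACTIVE="$ACTIVE $1"
    echo "started $1"
}

# Start only the arrays that are actually down.
start_service() {
    for dev in "$@"; do
        is_active "$dev" || start_array "$dev"
    done
}
```

Called as `start_service md0 md_d1 md_d2` with only md_d1 active, this would start md0 and md_d2 and leave md_d1 alone.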

(IDR whether this problem existed with openrc-2.5 or not, as I don't do single user mode all that often.  However, I do recall problems in the past, pre-openrc and with early openrc versions, and concluding that the only way to get it to work correctly without messing with individual services manually was never to return from single user mode at all, but to do a full reboot instead.  That shouldn't be necessary.  If the scripts can shut devices down, they should be able to start them back up.)

Duncan
Comment 1 Roy Marples 2008-10-07 20:12:56 UTC
Does adding rc_mdadm_keywords="nostop" to /etc/rc.conf help?
Comment 2 Duncan 2008-10-08 11:36:13 UTC
(In reply to comment #1)
> Does adding rc_mdadm_keywords="nostop" to /etc/rc.conf help?

I tried that... copy/pasting it directly so no typos, and no, it didn't.

Then I realized you probably meant mdraid (not mdadm), and tried that.  It didn't either.  

If (as it appears) it's supposed to stop the "stop" function from running, there's a bug, as it doesn't seem to.  I even thought, well, maybe it was initialized with the old data and I needed to reboot, but that didn't work either.  It still runs the stop function, shutting down the two mdraid devices it can and failing on the third.  Because the service then still counts as running, the later start is skipped, so the two devices it did stop are never restarted.

Now, how I've been handling it manually, is by /etc/init.d/mdraid zap, then starting it.  That works fine.
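
As a sketch, that manual workaround can be wrapped in a small sh function. The zap and start commands are the ones named above; the function name and the INITD override (handy for a dry run against a stub script) are assumptions added for illustration.

```shell
#!/bin/sh
# Workaround sketch: clear the stale "started" state, then start again.
# INITD is a hypothetical override; it defaults to the real init.d path.
INITD="${INITD:-/etc/init.d}"

restart_after_failed_stop() {
    svc="$1"
    # zap resets the recorded service state without running stop;
    # start then brings the stopped devices back up.
    "$INITD/$svc" zap && "$INITD/$svc" start
}
```

Run as root: `restart_after_failed_stop mdraid`.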

Which suggests a solution.  When it's supposed to shut down, based on an option (say ROOT_ON_SERVICE=yes), if it's going to fail, simply zap what's left and report success.  There's no daemon left running and no harm in restarting it without successfully stopping it, after all, so why not?  Just make it an option so those that don't have root on top of it can still depend on the service status.  And still keep the individual device stops green for success, red if they fail, so those like me with multiple mdraid devices can quickly see if more than the expected one fails.

Again, something similar is likely to apply to lvm and devicemapper, but my root isn't on them, so I can't say for sure.  But that's why I suggested a variable like ROOT_ON_SERVICE, so the same var could be used in all three (and perhaps others) conf.d files, as necessary.  Once someone figured out what it did in the one case, they'd know how it worked in the others as well.
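
A rough sketch of how such an option might behave, in plain sh. ROOT_ON_SERVICE is only the variable name proposed above and does not exist in OpenRC; stop_array and BUSY_ARRAY are hypothetical stand-ins that mimic one array refusing to stop because / lives on it.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed ROOT_ON_SERVICE option; not OpenRC code.
ROOT_ON_SERVICE="${ROOT_ON_SERVICE:-no}"

# Stand-in for the real per-device stop. It fails for the array named in
# BUSY_ARRAY, imitating the array that holds the root filesystem.
stop_array() {
    [ "$1" != "${BUSY_ARRAY:-}" ]
}

# Stop every array, report each result, then decide the service status.
stop_service() {
    failed=0
    for dev in "$@"; do
        if stop_array "$dev"; then
            echo "ok   $dev"
        else
            echo "FAIL $dev"
            failed=$((failed + 1))
        fi
    done
    # With root on the service, a leftover failure is expected: report
    # success anyway so the service is marked stopped and can start later.
    if [ "$failed" -gt 0 ] && [ "$ROOT_ON_SERVICE" != "yes" ]; then
        return 1
    fi
    return 0
}
```

The per-device lines are still printed in both modes, so an unexpected extra failure stays visible even when the overall status is forced to success.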
Comment 3 Roy Marples 2008-10-26 20:18:43 UTC
OpenRC in git will fix this as it has the new sysinit runlevel which never stops. So the solution would be to add mdraid to sysinit. Note, you will need the udev script on bug #240984.
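
For reference, adding a service to a runlevel is normally done with rc-update. A small sketch, assuming an OpenRC version that already has the sysinit runlevel; the function name and the RC_UPDATE override (for dry runs) are hypothetical.

```shell
#!/bin/sh
# Sketch: add a service to the sysinit runlevel.
# RC_UPDATE is a hypothetical override for dry runs; default is rc-update.
add_to_sysinit() {
    ${RC_UPDATE:-rc-update} add "$1" sysinit
}
```

Run as root: `add_to_sysinit mdraid`. Whether mdraid should also be removed from another runlevel depends on the local setup and is not covered here.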
Comment 4 Doug Goldstein (RETIRED) gentoo-dev 2009-02-05 17:19:55 UTC
Fixed in OpenRC 0.4.0 and higher. Please upgrade.