Bug 498720 - xendomains init script does not shutdown manually started domains
Summary: xendomains init script does not shutdown manually started domains
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Xen Devs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-20 18:44 UTC by Johann Schmitz (ercpe) (RETIRED)
Modified: 2014-01-23 05:43 UTC
CC List: 0 users

See Also:
Package list:
Runtime testing required: ---


Description Johann Schmitz (ercpe) (RETIRED) gentoo-dev 2014-01-20 18:44:50 UTC
The stop() function in /etc/init.d/xendomains initiates the shutdown only for domUs in /etc/xen/auto.
I often start domUs temporarily (they are not running all the time), which causes problems during shutdown. My domUs live on LVM volumes. Since the init script does not shut down all VMs, the LVM init script fails to stop properly and the shutdown hangs.

I think we should use the output of "xl list" or something similar to get the list of domUs to shut down.
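
A rough sketch of that idea, not the code that ended up in the tree: it assumes the usual "xl list" column layout (Name, ID, Mem, VCPUs, State, Time(s)) with a single header line and dom0 reported as ID 0. The helper name list_domus is made up for illustration.

```shell
# Print the names of all running domUs, reading `xl list` output on stdin.
# Assumption: one header line, and dom0 is the only domain with ID 0.
list_domus() {
    awk 'NR > 1 && $2 != 0 { print $1 }'
}

# Hypothetical usage inside stop():
#   for dom in $(xl list | list_domus); do
#       xl shutdown -w "${dom}"
#   done
```

Filtering on the ID column rather than on the name "Domain-0" is a design choice here; either would do if dom0 has not been renamed.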

Reproducible: Always
Comment 1 Yixun Lan archtester gentoo-dev 2014-01-21 13:13:06 UTC
Do we really want this?
DomUs that were started manually should also be shut down manually by the user.

I think this is rather by design: xendomains should only handle domUs that were created in /etc/xen/auto.
Also consider the case where a user wants to restart all domUs in /etc/xen/auto but leave the manually started ones alone.

I can think of one solution: add shutdown_all_domU() and call it in xenconsoled (or xenstored)? If we want to shut down Xen, then we definitely need to shut down all domUs.
Comment 2 Ian Delaney (RETIRED) gentoo-dev 2014-01-21 13:51:45 UTC
The dev above suggests adding the line
"xl shutdown -a"
to the init script so that it shuts down any and all domains.
Please try it, test, and report on its effect.
Comment 3 Johann Schmitz (ercpe) (RETIRED) gentoo-dev 2014-01-21 17:52:47 UTC
The following line is repeated endlessly during shutdown:

device-mapper: remove ioctl on  failed: Device or resource busy

and the machine hangs after "Remounting / read-only".
Note: my root fs is not on LVM; if it were, the root filesystem possibly could not be remounted at all.


(In reply to Yixun Lan from comment #1)
> do we really want this?
> the domUs which started manually should also been shutdown by user manually.
> 
> I think this is rather by design, xendomains should only handle domUs which
> created in /etc/xen/auto.
> even think about the case, user may want to restart all domUs in
> /etc/xen/auto but leave out those manually started ones.
> 
> I can think of one solution, add shutdown_all_domU() and call it in
> xenconsoled(xenstored)? if we want to shutdown xen, then we definitely need
> to shutdown all domUs.

We recently had a power outage at work and found that our shutdown plan was not working as intended, which caused a lot of problems with our RAIDs. I'm deploying a new infrastructure in the next few weeks with exactly the mentioned setup (Xen on top of LVM), so I want to make sure that the hosts will shut down correctly even if someone (read: me) started a domU without linking it into AUTODIR.

So yes, maybe the xendomains script isn't the correct place, but a running domU should never prevent the host system from shutting down or rebooting.

From what I know, the other virtualization packages do shut down manually started VMs (they have dedicated daemons for "hosting" the VMs, but still), so Xen should do this too.

(In reply to Ian Delaney from comment #2)
> the dev above suggests to try adding the line
> "xl shutdown -a"
> to the init script so as to shutdown any and all.
> Please try, test and report on its effect.

I have added the command as the last command in the stop() function. On shutdown, the init script fails with

 xendomains: caught SIGTERM, aborting

with the same behaviour regarding LVM. This could be a problem due to parallelism, although I haven't set rc_parallel in rc.conf.
Comment 4 Johann Schmitz (ercpe) (RETIRED) gentoo-dev 2014-01-22 19:07:39 UTC
I've done some further testing as discussed on IRC.


I think we can split this issue into two topics:
#1: Shutdown vs. killing (where "killing" means not properly shutting down the VM) of domains which aren't symlinked into /etc/xen/auto (or whatever AUTODIR is pointing to).
#2: The question: do the init scripts work properly with domains on LVM?

From my view, #2 is caused by #1.


For #1: At the moment, the xendomains script stops all domains from $AUTODIR. Domains started manually via "xl create foo.cfg" get killed somewhere later in the shutdown process. I haven't confirmed this yet, but since I didn't see another process shutting down the domains, I suspect that the xl processes are simply killed.
IMHO one of the Xen init scripts should take care of the remaining domUs and issue an xl shutdown -a -w ${domain} on them. Ideally, the parallel shutdown feature from xendomains would work here, too.
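
For illustration only, a parallel per-domain shutdown could look roughly like the sketch below. This is not how xendomains implements its parallel shutdown; shutdown_parallel is a hypothetical helper that takes the domain names as arguments and relies only on "xl shutdown -w" plus shell job control.

```shell
# Shut down the given domUs concurrently and wait for all of them.
# shutdown_parallel is a hypothetical helper, not part of xendomains.
shutdown_parallel() {
    for dom in "$@"; do
        # -w makes xl block until this domain has actually shut down
        xl shutdown -w "${dom}" &
    done
    # Block until every background xl call has returned
    wait
}

# Hypothetical usage: shutdown_parallel foo bar
```

Because every xl call runs in the background, the total wait should be roughly that of the slowest domain rather than the sum of all of them.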


For #2: Related to #1, the shutdown process hangs as mentioned in comment #3. Adding an "xl shutdown -a -w" to both init scripts, xendomains (between the shutdown calls from AUTODIR and the using_screen) and xenstored (as the first call), does shut down the domains as expected and the system reboots properly. So either way is fine for me.

I also tried a dumb disk instead of an LVM volume, with the same effect. Note: the filesystem itself is an LVM volume, so this test might be invalid.
dlan: This is what you have tested, right?



I think we can solve this issue by adding the xl shutdown call to one of the two init scripts. As a positive side effect, we don't risk corrupted filesystems from killed domUs.
Comment 5 Yixun Lan archtester gentoo-dev 2014-01-23 04:41:17 UTC
Fixed in tree:
http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/app-emulation/xen-tools/files/xenconsoled.initd?r1=1.4&r2=1.5


1) Shut down all remaining domUs in xenconsoled.
   Handling it here has one downside: you always have to add xenconsoled to a runlevel. On the good side, xendomains can still be optional.

2) Add "after lvm" to depend() in xenconsoled,
   so that it starts after lvm and shuts down before lvm.

3) I simply updated the init script without a revision bump.
   Please re-emerge the ebuild, or update the init script yourself (I think you are probably the only one reporting this, and bumping all ebuilds seems like overkill).
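
For readers who do not want to open the diff, points 1) and 2) amount to something like the following sketch. It is an approximation written from the description above, not a copy of the committed xenconsoled.initd; the message text is an assumption, and the existing daemon start/stop code is left out.

```shell
depend() {
    # Start after lvm, and therefore stop before lvm on shutdown
    after lvm
}

stop() {
    # Message text is an assumption, not taken from the real script
    ebegin "Shutting down remaining domUs"
    # -a: all domains, -w: wait until they have shut down
    xl shutdown -a -w
    eend $?
    # ... the script's existing code that stops xenconsoled would follow ...
}
```

ebegin, eend and after are provided by OpenRC when the file runs as an init script, so the fragment is not meant to be executed on its own.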

(In reply to Johann Schmitz (ercpe) from comment #4)
> I also tried a dumb disc instead of a LVM volume, with the same effect.
> Note: The filesystem itself is a LVM volume, so this test might be false.
> dlan: This is what you have tested, right?
> 
I don't understand this.
There are two kinds of disk type:
disk   = ['file:/gentoo/xen/gen1_stable_ext4.img,xvda1,w']
or
disk   = ['phy:/dev/vg/lvm1,xvda1,w']
Both use ext4 on top.
Comment 6 Johann Schmitz (ercpe) (RETIRED) gentoo-dev 2014-01-23 05:43:54 UTC
(In reply to Yixun Lan from comment #5)
> 1) shutdown all remaining domUs in xenconsoled
>    handling here has one downside that you have to always add xenconsoled
> into runleve. and the good side, xendomains can still be optional.
> 
> 2) add "after lvm" into depend() in xenconsoled,
>    so make it start after lvm, and shut down before lvm

Great! I've re-emerged and it works as expected.


> disk   = ['file:/gentoo/xen/gen1_stable_ext4.img,xvda1,w']

It didn't work for me because my /gentoo counterpart was an LVM volume itself, which caused the same problem, just with another layer (ext4) in between.