Bug 333241 - app-emulation/xen-tools-4.0.0: System shutdown freezes when attempting to shutdown 'all' domU's
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Server
Hardware: All Linux
Importance: High normal
Assignee: Gentoo Xen Devs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-17 21:37 UTC by Joe Barker
Modified: 2011-03-26 11:39 UTC
CC List: 0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Adds ability to kill frozen domains after timeout (xend.initd-kill.frozen.domUs.patch,1.28 KB, patch)
2010-08-17 21:58 UTC, Joe Barker

Description Joe Barker 2010-08-17 21:37:23 UTC
When preparing to upgrade (or in this case, re-emerge) Xen, I temporarily rename the '/etc/xen/auto' directory so my domUs _will not_ start automatically on next boot (in case I need to fix something).

However, this also means they are not shut down by the 'auto' init script (xendomains). Instead, they are not shut down until the xend init script runs. The problem is that this script runs after nfs has been shut down, but all my domUs run over nfsroot. Thus they are already frozen when the xend init script tries to shut them down. Since, in this case, they will never shut down cleanly, the system shutdown freezes, requiring a forced shutdown of dom0 to recover.

Reproducible: Always

Steps to Reproduce:
1. Manually start (xm create) a domU running on nfsroot
2. Shut down the system (dom0)


Actual Results:  
System freezes at "Stopping all Xen domains"

Expected Results:  
Frozen domains should be killed or ignored so that dom0 shutdown can proceed.
domU shutdown should be attempted before nfs is shut down.

Portage 2.1.7.17 (default/linux/amd64/10.0/server, gcc-4.4.3, glibc-2.11.2-r0, 2.6.32.11-xen-3.4.0 x86_64)
=================================================================
System uname: Linux-2.6.32.11-xen-3.4.0-x86_64-Quad-Core_AMD_Opteron-tm-_Processor_2350-with-gentoo-1.12.13
Timestamp of tree: Mon, 16 Aug 2010 07:15:03 +0000
app-shells/bash:     4.0_p28
dev-lang/python:     2.4.4-r13, 2.5.4-r2, 2.6.2-r1
sys-apps/baselayout: 1.12.13
sys-apps/sandbox:    1.6-r2
sys-devel/autoconf:  2.65
sys-devel/automake:  1.7.9-r1, 1.10.2, 1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       4.3.4, 4.4.3-r2
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6b
virtual/os-headers:  2.6.30-r1
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O3 -march=amdfam10 -mno-tls-direct-seg-refs -mmmx -msse -msse2 -msse4a -mfpmath=sse -mcx16 -mpopcnt -msahf -fprefetch-loop-arrays -funroll-loops -fomit-frame-pointer -ftree-vectorize -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O3 -march=amdfam10 -mno-tls-direct-seg-refs -mmmx -msse -msse2 -msse4a -mfpmath=sse -mcx16 -mpopcnt -msahf -fprefetch-loop-arrays -funroll-loops -fomit-frame-pointer -ftree-vectorize -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests distlocks fixpackages news parallel-fetch protect-owned sandbox sfperms strict unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="http://gentoo.netnitco.net http://gentoo.osuosl.org/ http://gentoo.mirrors.tds.net/gentoo"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j9"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local /usr/portage/local/layman/sunrise /usr/portage/local/layman/voyageur"
SYNC="rsync://rsync.us.gentoo.org/gentoo-portage"
USE="acl acpi amd64 apache2 berkdb bzip2 cli cracklib crypt cxx dri gdbm gpm iconv ipv6 ldap mmx mmx2 mmxext modules mudflap multilib mysql ncurses nls nptl nptlonly pam pcre perl pppd python readline reflection session snmp spl sse sse2 sse3 sse4 ssl sysfs tcpd truetype unicode xattr xml xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga neomagic nv r128 radeon savage sis tdfx trident vesa via vmware voodoo" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Joe Barker 2010-08-17 21:57:52 UTC
Having looked through the scripts, the problem appears to be caused in two places.

First, the freeze is caused by the use of the "-w"/"--wait" option with xm shutdown. This option has no provision for domains that have frozen (and will never shut down cleanly), and thus hangs. I have altered the xend init script (patch attached) to not use "--wait" and instead poll domain status until a timeout. If the timeout is reached, the remaining domains are forcibly killed (xm destroy).
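
A minimal sketch of the poll-then-destroy approach described above. This is not the attached patch: the helper names (list_domus, shutdown_all_domus) and the TIMEOUT and INTERVAL variables are hypothetical, and the values are placeholders. It assumes "xm list" prints one header line followed by one row per domain, with the domain name in the first column.

```shell
#!/bin/sh
# Sketch only; names and defaults below are illustrative assumptions.
XM="${XM:-xm}"              # command used to talk to xend
TIMEOUT="${TIMEOUT:-60}"    # seconds to wait for clean shutdowns (assumed value)
INTERVAL="${INTERVAL:-5}"   # seconds between status polls (assumed value)

# Print the names of all domains except the header row and Domain-0.
list_domus() {
    "$XM" list | awk 'NR > 1 && $1 != "Domain-0" { print $1 }'
}

shutdown_all_domus() {
    # Ask every domain to shut down, without -w, so the call returns at once.
    "$XM" shutdown -a

    # Poll until no domU is left or the timeout expires.
    waited=0
    while [ "$waited" -lt "$TIMEOUT" ]; do
        if [ -z "$(list_domus)" ]; then
            return 0
        fi
        sleep "$INTERVAL"
        waited=$((waited + INTERVAL))
    done

    # Anything still listed is treated as frozen and forcibly killed.
    for dom in $(list_domus); do
        "$XM" destroy "$dom"
    done
}
```

In an init script, the stop function would call shutdown_all_domus in place of "xm shutdown --all --wait". A domain that shuts down between the last poll and the destroy loop would make "xm destroy" fail for that name, which the sketch does not handle.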

Second, the xend init script has "before nfs", which is what causes my domUs to freeze before they have a chance to be shut down. I assume there was a reason for having this, so removing it is probably not an option. Instead, I would suggest (and I have done) moving all the shutdown logic (both auto and non-auto) to the xendomains init script, which can and should be set "after nfs".
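
As an illustration of that ordering, a depend() block for the xendomains init script might look like the sketch below. The "need xend" line is an assumption rather than something taken from the shipped script. The idea relies on services being stopped in the reverse of their start order, so a script that starts after nfs should be stopped before nfs goes down.

```shell
# Hypothetical depend() for /etc/init.d/xendomains (sketch, not the shipped script).
depend() {
    need xend    # assumed: domains cannot be managed without xend running
    after nfs    # start after nfs, so that stop happens before nfs is stopped
}
```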
Comment 2 Joe Barker 2010-08-17 21:58:59 UTC
Created attachment 243399
Adds ability to kill frozen domains after timeout
Comment 3 Alexey Shvetsov archtester gentoo-dev 2011-03-26 11:39:42 UTC
Xen 4.1 is in the tree. Please test with it and reopen if it doesn't work.