Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 476572 - =app-emulation/xen-tools-4.2.2-r3 - Infinite loop in xenstore init script when booting without xen.
Summary: =app-emulation/xen-tools-4.2.2-r3 - Infinite loop in xenstore init script whe...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Ian Delaney (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-11 20:07 UTC by Spooky Ghost
Modified: 2014-05-27 21:55 UTC (History)
2 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
bail out if not in dom0 (xenstored-dom0.patch,364 bytes, patch)
2013-07-24 12:18 UTC, Another Mortal
Details | Diff
Restore 'sleep 15' behaviour on non-dom0 systems (xenstored.initd.patch,493 bytes, patch)
2013-07-25 10:20 UTC, Another Mortal
Details | Diff
Typos fixed, redirection added (xenstored.initd.patch,546 bytes, patch)
2013-07-31 13:08 UTC, Another Mortal
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Spooky Ghost 2013-07-11 20:07:38 UTC
The updated ebuild for this package installed a new version of the xenstore init script which contains this fragment of code:

until xenstore-exists /
do
  echo -n .
  sleep .5
done

While this is fine when booting with xen it results in an infinite loop in the init script when booting without xen.


Reproducible: Always

Steps to Reproduce:
1. Install xen-tools
2. Boot system without Xen
3. Wait forever to get a login prompt
Comment 1 Ian Delaney (RETIRED) gentoo-dev 2013-07-20 17:18:47 UTC
thx Spooky. I think I missed by way of a single . on applying a patch in a prior bug.
Try with sleep 5 not sleep .5

  20 Jul 2013; Ian Delaney <idella4@gentoo.org>
  +files/xen-4.3-fix_dotconfig-gcc.patch, +files/xen-4.3.0-anti-download.patch,
  +xen-tools-4.3.0.ebuild, files/xen-4-fix_dotconfig-gcc.patch,
  files/xen-4.2.0-anti-download.patch, files/xenstored.initd:
  bump; required culling sec patches, edit to configure and upgrading 2 patches,
  correction to xenstored.initd wrt Bug #476572 by Spooky Ghost
Comment 2 Spooky Ghost 2013-07-22 07:41:03 UTC
The length of the sleep is not important.  When you do not boot with the xen hypervisor, i.e. bare-metal Linux, then xenstore does not start.  In this case you are stuck within this until statement when start() is called as xenstore-exists will never be true which prevents a boot of the system.
Comment 3 Spooky Ghost 2013-07-22 07:44:59 UTC
Perhaps in start() you need something like:

is_privileged_domain() {
        grep -qsE '^control_d$' /proc/xen/capabilities
        return $?
}


start() {
...
if is_privileged_domain ; then
  until xenstore-exists /
  do
    ...
  done
else
  echo "Xenstore cannot be started, not a Xen dom0"
  return 1
fi
...
}
Comment 4 Another Mortal 2013-07-24 12:17:36 UTC
(In reply to Spooky Ghost from comment #0)
> 
> Steps to Reproduce:
> 1. Install xen-tools
> 2. Boot system without Xen
> 3. Wait forever to get a login prompt

While I can confirm that there *is* a potential for problems here,
I cannot resist pointing out that the above steps do not reproduce
the infinite loop behavior.  For that to occur, one needs to also
perform the 'shoot-myself-in-the-foot' step of

rc-update add xenstored default

Let's just say this is not exactly the smartest thing to do if one
intends to boot both with and without xen, but I'd certainly agree
that fixing an initscript is much easier than fixing PEBCAK issues.


Just for the record, I'd recommend the following approach (that also
works with the current initscript):

mkdir /etc/runlevels/xen
rc-update -s default xen
rc-update add xenstored xen

Then, append 'softlevel=xen' to the linux kernel command line for
the entry corresponding to booting *with* Xen in the bootloader
configuration..


Now, having gotten that out of my system,
let's return to fixing the initscript..  

I'm no Xen guru; so, I could easily be wrong, but I thought
that to support "disaggregation" xenstored could also be run
in a non-privileged domain, in which case checking for dom0
may not be desirable.  Nevertheless, the error I'm getting
from xenstored when starting non-Xen *is* about 'privileged
command interface'; so, let's just keep the dom0 assumption.

Then, I'd prefer to check for the existence of /proc/xen/privcmd
(which *is* the "privileged command interface" libxc is trying to
open [see bove]) instead of peeking inside /proc/xen/capabilities.

I'd also place the check as the very first thing in 'start()'
to avoid unnecessary work and to get rid of that ugly error message
mentioned above.

I'll attach a proposed patch.
Comment 5 Another Mortal 2013-07-24 12:18:20 UTC
Created attachment 354086 [details, diff]
bail out if not in dom0
Comment 6 Another Mortal 2013-07-25 10:08:38 UTC
Please ignore this patch.  I've just gotten around to testing it
in a domU, and -of course- /proc/xen/privcmd exists there as well.
I'm thinking that making the waiting conditional on that xenstored
proper had started successfully is probably most fool-proof (also
accounting for the brain-dead corner cases of disabling xenfs or
the /sys/hypervisor entries in the kernel configuration).

The only problem is that xenstored has 0 exit status even on error.
(--somewhat surprisingly, considering that 'barf' ends with exit(1);
 but that's hit _after_ daemonize with a few exit(0) calls..) 

So, kludgily parsing xenstored's standard error would seem necessary.
Not a nice option.

Perhaps the easiest fix is to not wait more than 15 seconds total,
which falls back to a behaviour largely equivalent to the original
sleep 15 command.

Patch coming soon..
Comment 7 Another Mortal 2013-07-25 10:20:51 UTC
Created attachment 354158 [details, diff]
Restore 'sleep 15' behaviour on non-dom0 systems

Alternatively, the dom0 check based on /proc/xem/capabilites
proposed by Sppoky Ghost could also be considered..

--- /etc/init.d/xenstored.orig	2013-07-24 14:10:12.678000000 +0200
+++ /etc/init.d/xenstored	2013-07-24 14:12:29.456000000 +0200
@@ -8,6 +8,7 @@
 }
 
 start() {
+	[ -r /proc/xen/capabilities ] && [ 'control_d' != "$(</proc/xen/capabilities)" && ewarn "Not a Xen dom0." && return 1
 	! [ -x /run/xen ] && mkdir -p /run/xen
 	ebegin "Starting xenstored daemon"
 	start-stop-daemon --start --exec /usr/sbin/xenstored \
Comment 8 Ian Delaney (RETIRED) gentoo-dev 2013-07-25 13:26:02 UTC
(In reply to Al Mighty from comment #4)
> While I can confirm that there *is* a potential for problems here,
> I cannot resist pointing out that the above steps do not reproduce
> the infinite loop behavior.  For that to occur, one needs to also
> perform the 'shoot-myself-in-the-foot' step of
> 

Big Al, I like it, the metaphor and turn of phrase in general.

> rc-update add xenstored default
> 

So Spooky, thou shouldst unset rc-update xenstored default, it seems

> Let's just say this is not exactly the smartest thing to do if one
> intends to boot both with and without xen
> 

Let's

> Now, having gotten that out of my system,
> let's return to fixing the initscript..  
> 

(In reply to Al Mighty from comment #7)
> Created attachment 354158 [details, diff] [details, diff]
> Restore 'sleep 15' behaviour on non-dom0 systems
> 
> Alternatively, the dom0 check based on /proc/xem/capabilites
> proposed by Sppoky Ghost could also be considered..
> 
> --- /etc/init.d/xenstored.orig	2013-07-24 14:10:12.678000000 +0200
> +++ /etc/init.d/xenstored	2013-07-24 14:12:29.456000000 +0200
> @@ -8,6 +8,7 @@
>  }
>  
>  start() {
> +	[ -r /proc/xen/capabilities ] && [ 'control_d' !=
> "$(</proc/xen/capabilities)" ]; && ewarn "Not a Xen dom0." && return 1
>  	! [ -x /run/xen ] && mkdir -p /run/xen
>  	ebegin "Starting xenstored daemon"
>  	start-stop-daemon --start --exec /usr/sbin/xenstored \

The luxury of choice. 
Firstly, Spooky ghost can you report as to whether you actually did the 
rc-update add xenstored xen or not.
Secondly, once verified, it seems the eternal loop is an unfortunate accident that occurred because it could.
Thirdly, while I find the  || ((15 < ++i)) somewhat equiv. to what we started with (a hard coded sleep 15), I'll go for humouring bug Al and try it, EXCEPT I believe "/ || ((15 < ++i))" need be "|| ((15 < ++i)) \".

So;
  25 Jul 2013; Ian Delaney <idella4@gentoo.org> files/xenstored.initd:
  Edit to xenstored.initd wrt Bug #476572 & #475204

Let's get some test runs and reports.
Comment 9 Another Mortal 2013-07-25 13:48:33 UTC
(In reply to Ian Delaney from comment #8)
> Thirdly, while I find the  || ((15 < ++i)) somewhat equiv. to what we
> started with (a hard coded sleep 15), I'll go for humouring bug Al and try
> it, EXCEPT I believe "/ || ((15 < ++i))" need be "|| ((15 < ++i)) \".

Actually, no.  Try this tokenization 'xenstore-exists /' || '((15 < ++i))'.
The xenstore-exists command wants a path argument and '/' was shortest… (-;

BTW, the result is pretty ugly: 17+ error messages and a 15-second wait.


----- 8< ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----
# /etc/init.d/xenstored start
 * Caching service dependencies ...                                                                                                                                                               [ ok ] * Starting xenstored daemon ...
xc: error: Could not obtain handle on privileged command interface (2 = No such file or directory): Internal error
FATAL: Failed to open connection to hypervisor: No such file or directory
xenstore-exists: couldn't start transaction
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
.xenstore-exists: xs_open: No such file or directory
 * Setting domain0 name record
xenstore-write: xs_open: No such file or directory                                                                                                                                                [ !! ] * ERROR: xenstored failed to start
----- 8< ----- 8< ----- 8< ----- 8< ----- 8< ----- 8< -----

On the other hand, this "only" inconveniences those that run
/etc/init.d/xenstored on non-dom0 systems.  (-;

Doing a dom0-check first would be esthetically much more pleasing...
Comment 10 Ian Delaney (RETIRED) gentoo-dev 2013-07-25 14:29:27 UTC
(In reply to Al Mighty from comment #9)
> (In reply to Ian Delaney from comment #8)
> > Thirdly, while I find the  || ((15 < ++i)) somewhat equiv. to what we
> > started with (a hard coded sleep 15), I'll go for humouring bug Al and try
> > it, EXCEPT I believe "/ || ((15 < ++i))" need be "|| ((15 < ++i)) \".
> 
> Actually, no.  Try this tokenization 'xenstore-exists /' || '((15 < ++i))'.
> The xenstore-exists command wants a path argument and '/' was shortest… (-;
> 
Hmm Big Al, you're right.  However, can you answer these 2?
1.  the result is pretty ugly: 17+ error messages and a 15-second wait.  Are these non-fatal and merely ugly?
2. Your alternative does need a cleanup.  I'd go for;
          if [ -r /proc/xen/capabilities ] && [[ 'control_d' != "$(</proc/xen/capabilities)" ]]; then
                echo "Not a Xen dom0." 
                return 1
        fi
a) 'control_d' != warrants the double brackets since the values are being compared, and you missed closing the single bracket @ ')" && ewarn'
b) ewarn is an ebuild portage wrapper not a bashism, so it ought be plain echo.

I won't change it for now seeing you appear on-line now and it needs changing again at any rate
Comment 11 Ian Delaney (RETIRED) gentoo-dev 2013-07-25 15:04:48 UTC
Had to commit a change for https://bugs.gentoo.org/show_bug.cgi?id=478064 anyway so change of mind, corrected the loop
Comment 12 Another Mortal 2013-07-26 06:51:33 UTC
(In reply to Ian Delaney from comment #10)
> 1.  the result is pretty ugly: 17+ error messages and a 15-second wait.  Are
> these non-fatal and merely ugly?

In the sanse that nothing actually breaks, they are non-fatal.
Although xenstored insists on printing "FATAL", as you can see
in Comment#9.


> 2. Your alternative does need a cleanup.  I'd go for;
>           if [ -r /proc/xen/capabilities ] && [[ 'control_d' !=
> "$(</proc/xen/capabilities)" ]]; then
>                 echo "Not a Xen dom0." 
>                 return 1
>         fi

Sure, that looks more proper.


> a) 'control_d' != warrants the double brackets since the values are being
> compared, and you missed closing the single bracket @ ')" && ewarn'

I don't think you need double brackets.
(No 'pattern mathing' going on, just plain string compare).

My mistake was adding a ';' after ']' before '&& ewarn'.


> b) ewarn is an ebuild portage wrapper not a bashism, so it ought be plain
> echo.

I put 'ewarn' on purpose (to get the fancy colour warning ;-)),
but runscript will report that xenstored failed to start, anyway;
so, a plain echo will also work.
Comment 13 Ian Delaney (RETIRED) gentoo-dev 2013-07-26 07:33:17 UTC
portage wants the POSIX compliant, however let's await Spook Ghost to reply and test the other if needed.
Comment 14 Another Mortal 2013-07-31 11:40:05 UTC
I'm not that well-versed in POSIX;
so, this all could be wrong, but
for compliance, the following _may_
also need to be changed:

((15 < i))

to

[ 15 -lt $i ]

and

$(</proc/xen/capabilities)

to


$(cat /proc/xen/capabilities)



As for the ugly error messages, adding 'exec 2> /dev/null'
as the first statement in 'start()' will make them go away,  (-;
except that one might miss other important error messages
in other situations..

I'd certainly opt for redirecting stderr for the 'xenstore-exist /'
command, though (i.e. change from the current version to
'until xenstore-exists / 2> /dev/null || [ 15 -lt $i ]').
Comment 15 Another Mortal 2013-07-31 13:08:12 UTC
Created attachment 354722 [details, diff]
Typos fixed, redirection added
Comment 16 Ian Delaney (RETIRED) gentoo-dev 2013-08-01 07:22:15 UTC
(In reply to Al Mighty from comment #15)
> Created attachment 354722 [details, diff] [details, diff]
> Typos fixed, redirection added

hasufell can you run the adjusted patch here thru your POSIX checker and inform us if any edit required?
Comment 17 Ian Delaney (RETIRED) gentoo-dev 2013-08-01 07:24:06 UTC
 
> hasufell can you run the adjusted patch here thru your POSIX checker and
> inform us if any edit required?

seems I needed to Save first with updated CC list before bothering to add this
Comment 18 Yixun Lan archtester gentoo-dev 2014-05-27 21:55:08 UTC
I believe this problem is already fixed in later versions.. try 4.2.4, or 4.3.x

closed, and feel free to re-open if still have problem, thanks