Bug 723494 - app-emulation/libvirt-6.2.0-r2: update to openrc init scripts breaks libvirt-guests - timing issue
Summary: app-emulation/libvirt-6.2.0-r2: update to openrc init scripts breaks libvirt-...
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Matthias Maier
URL:
Whiteboard:
Keywords:
Duplicates: 736609
Depends on:
Blocks:
 
Reported: 2020-05-17 15:07 UTC by Ian Pickworth
Modified: 2020-10-07 15:47 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
libvirt-guests patch to wait for connections (libvirt-guests-diff.patch,1.93 KB, patch)
2020-05-20 14:45 UTC, Ian Pickworth
libvirt-guests patch to wait for connections - correction (libvirtd-guests-patch,2.13 KB, patch)
2020-05-21 08:13 UTC, Ian Pickworth

Description Ian Pickworth 2020-05-17 15:07:05 UTC
Following an upgrade from 6.0.0-r3 to 6.2.0-r2, running /etc/init.d/libvirt-guests on system startup (boot) failed to start any virtual machines, despite stopping four virtual machines during shutdown.

I found this commit:
https://gitweb.gentoo.org/repo/gentoo.git/commit/app-emulation/libvirt?id=ca0a61eed33d17d0bd434ea5ad5c7bf2f891621c

It changed the way the libvirtd daemon is started. As a result (on my system), /etc/init.d/libvirtd completes before the libvirtd daemon is ready to receive connections. Consequently, libvirt-guests (the next service to be started) reports that it is starting four VMs, but their names are displayed as blanks.

I made this change to /etc/init.d/libvirtd:
From:
start_stop_daemon_args="-b --env KRB5_KTNAME=/etc/libvirt/krb5.tab"
To:
start_stop_daemon_args="-b -w 3000 --env KRB5_KTNAME=/etc/libvirt/krb5.tab"

The wait of 3 seconds this introduces is enough (on my system) to allow libvirt-guests to find a responding libvirtd when it runs.

However, this does not feel like a robust solution. It would be better for /etc/init.d/libvirtd to wait until it could connect to the service it started before exiting - but I do not know how to achieve that.

Alternatively, libvirt-guests could be changed to loop a set number of times to test the libvirtd connection, but again I don't know how to best achieve that.
Comment 1 Ian Pickworth 2020-05-20 11:58:16 UTC
I've investigated this a bit further, and I am sorry that I missed the code at the start of the libvirt-guests start() function:

   for uri in ${LIBVIRT_URIS}; do
      do_virsh "${uri}" connect
      if [ $? -ne 0 ]; then
         eerror "Failed to connect to '${uri}'. Domains may not start."
      fi
   done

The bug I am reporting is in this code. As I see it, there are two things wrong:

1) The function do_virsh returns a $? value of zero even when a connection fails. (I tested this in a little script of my own.) Thus the test always passes.

2) As per the main bug report, libvirtd is not ready for connections when its start script exits. Thus, at least on my system, if the above test worked, I would always get the eerror message, and the domains would not start, on bootup. (Note that this works fine if libvirtd is already running, as is the case when you just stop and start the domains while the system remains up.)

I would thus propose that the test code above be changed to use the command:
    local ruri=
    ruri=$(do_virsh "${hvuri}" uri)

and then to check that ${ruri} and ${hvuri} are equal. If not equal, the connection failed.
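
The comparison proposed above could be sketched roughly as follows. This is only an illustration: uri_check is a hypothetical helper name, and the probe command is passed in as arguments so the sketch can be run without a libvirt installation. It assumes the probe prints the URI it is connected to, as 'virsh uri' does.

```shell
# Hypothetical helper (not part of the existing init script):
# run the given command and succeed only if its output equals the expected URI.
uri_check() {
    local expected=$1 reported=
    shift
    # A failing probe counts as a failed connection.
    reported=$("$@" 2>/dev/null) || return 1
    # Command substitution strips trailing newlines, so a plain comparison works.
    [ "${reported}" = "${expected}" ]
}

# Stand-in probes using echo; in the init script the probe would be
# something like: uri_check "${uri}" virsh -c "${uri}" uri
uri_check "qemu:///system" echo "qemu:///system" && echo "match"
uri_check "qemu:///system" echo "" || echo "mismatch"
```

Whether the reported URI always equals the requested one for every connection type is not established here, so the comparison should be treated as an assumption.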

However, this also needs to take account of the boot startup problem (i.e. it will always show failure on my system). I would propose that an additional variable is added to the conf.d file: LIBVIRT_CONWAIT - the number of seconds to wait for a connection to be active.
Then the startup check can loop down this count (with a sleep 1), stopping on the first successful connection for each uri.
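
A minimal sketch of such a countdown loop is shown here. It is not the attached patch: wait_for is a hypothetical helper name, the exact retry semantics are an assumption, and the probe command is passed in as arguments so that the loop itself can be tried without libvirtd.

```shell
# Hypothetical helper: run the probe command; on failure print a dot,
# sleep one second and retry, giving up after $1 retries.
# With a count of 0 the probe is tried exactly once, with no delay.
wait_for() {
    local attempts=${1:-0} i=0
    shift
    while :; do
        "$@" && return 0
        [ "${i}" -ge "${attempts}" ] && return 1
        i=$((i + 1))
        printf '.'
        sleep 1
    done
}

# In the init script the call might look like this (assumed usage):
#   wait_for "${LIBVIRT_CONWAIT:-0}" virsh -c "${uri}" connect
# Stand-in probe that succeeds immediately:
wait_for 0 true && echo "connected"
```

Passing the probe as arguments keeps the loop independent of how the connection is tested, so the same helper would work with either 'virsh connect' or a URI comparison.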

My shell script skills are rubbish, and I don't know how to create patches. However, if this approach is acceptable to the devs, I am willing to put some time in to try.
Comment 2 Ian Pickworth 2020-05-20 14:45:52 UTC
Created attachment 640570 [details, diff]
libvirt-guests patch to wait for connections

Patches init.d and conf.d for libvirt-guests
Adds a configurable wait loop for successful connection on startup.
Comment 3 Ian Pickworth 2020-05-20 14:50:12 UTC
Test results on my system:

1) When libvirtd running:
ian2 ~ # /etc/init.d/libvirt-guests start
 *  Checking connection to qemu:///system ...
 *  Connection to qemu:///system OK
 * Starting libvirt networks ...                                          [ ok ]
 * Starting libvirt domains ...
 *   gentoo-dns1
 *   gentoo-bubble-pnp
 *   ian
 *   gentoo-dns2                                                          [ ok ]
ian2 ~ #

2) Simulated boot by flushing caches:
ian2 ~ # /etc/init.d/virtlogd stop
 * Stopping libvirtd ...                                                  [ ok ]
 * Stopping virtlogd ...                                                  [ ok ]
ian2 ~ # sync; echo 1 > /proc/sys/vm/drop_caches
ian2 ~ # /etc/init.d/libvirt-guests start
 * Starting virtlogd ...                                                  [ ok ]
 * Starting libvirtd ...                                                  [ ok ]
 *  Checking libvirtd connection to qemu:///system ...
. *  Conection to qemu:///system OK
 * Starting libvirt networks ...                                          [ ok ]
 * Starting libvirt domains ...
 *   gentoo-dns1
 *   gentoo-bubble-pnp
 *   ian
 *   gentoo-dns2                                                          [ ok ]
ian2 ~ #

Note the single dot above, indicating that a one-second wait was required.

I commend the patch for consideration.
Comment 4 Ian Pickworth 2020-05-21 08:13:54 UTC
Created attachment 640702 [details, diff]
libvirt-guests patch to wait for connections - correction

I have updated the previous patch to use 'virsh connect' directly, and test $?.
I thought this was better since I have no way of proving that my method with 'virsh uri' will work in every type of connection.

This patch does solve the issue of do_virsh always returning zero (because of the final 'head -n -1' in the command pipe).
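
The effect of the trailing 'head -n -1' can be reproduced without libvirt. In the sketch below, fake_virsh is a hypothetical stand-in for a failing virsh call; it assumes the shell's default behaviour (no 'pipefail' option) and a 'head' that accepts a negative line count, as GNU head does.

```shell
# Hypothetical stand-in for a failing virsh call:
# prints an error line plus a trailing blank line, then fails.
fake_virsh() {
    printf 'error: failed to connect\n\n'
    return 1
}

# The status of a pipeline is that of its last command (head), so the
# failure of fake_virsh is lost.
fake_virsh | head -n -1
piped_status=$?

# Without the pipe the failure is visible.
fake_virsh >/dev/null
direct_status=$?

# Capturing the output first also preserves the status.
out=$(fake_virsh)
captured_status=$?

echo "piped=${piped_status} direct=${direct_status} captured=${captured_status}"
```

The final line prints piped=0 direct=1 captured=1, which is why testing $? after the piped form always appears to succeed.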

I will post my test results in a follow up comment.
Comment 5 Ian Pickworth 2020-05-21 08:28:26 UTC
Test results on my system:

I set LIBVIRT_CONWAIT=10 in /etc/conf.d/libvirt-guests

1) With /etc/init.d/libvirtd stopped and removed from the run level. Expected result is a failure to connect after 10 attempts:

ian2 ~ # rc-update del libvirtd
 * service libvirtd removed from runlevel default
ian2 ~ # /etc/init.d/virtlogd stop
 * Stopping libvirtd ...                                                  [ ok ]
 * Stopping virtlogd ...                                                  [ ok ]
ian2 ~ # /etc/init.d/libvirt-guests start
 *  Checking connection to qemu:///system ...
.......... * Failed to connect to 'qemu:///system'. Domains may not start.
 * Starting libvirt networks ...                                          [ ok ]
 * Starting libvirt domains ...
 *   
 *   
 *   
 *                                                                        [ ok ]
ian2 ~ #

2) With /etc/init.d/libvirtd stopped, but added to the run level. Drop caches to simulate the delay experienced at startup. Expected result is at least one failed connection before a successful connection:

ian2 ~ # rc-update add libvirtd default
 * service libvirtd added to runlevel default
ian2 ~ # sync; echo 1 > /proc/sys/vm/drop_caches
ian2 ~ # /etc/init.d/libvirt-guests zap  
 * Manually resetting libvirt-guests to stopped state
ian2 ~ # /etc/init.d/libvirt-guests start
 * Starting virtlogd ...                                                  [ ok ]
 * Starting libvirtd ...                                                  [ ok ]
 *  Checking connection to qemu:///system ...
. *  Connection to qemu:///system OK
 * Starting libvirt networks ...                                          [ ok ]
 * Starting libvirt domains ...
 *   gentoo-dns1
 *   gentoo-bubble-pnp
 *   ian
 *   gentoo-dns2                                                          [ ok ]
ian2 ~ # 

3) Restart after a stop with libvirtd running. Expected result is no wait on connection:

ian2 ~ # /etc/init.d/libvirt-guests start
 *  Checking connection to qemu:///system ...
 *  Connection to qemu:///system OK
 * Starting libvirt networks ...                                          [ ok ]
 * Starting libvirt domains ...
 *   gentoo-dns1
 *   gentoo-bubble-pnp
 *   ian
 *   gentoo-dns2                                                          [ ok ]
ian2 ~ # 


My conclusion is that this patch keeps the functionality (using 'virsh connect') of the original script, but fixes some things:
 - It now tests a valid $? value which is not always zero
 - It allows a user to configure a wait time should they experience a timing issue on startup (as I did)
 - It allows a user to configure no delay if they know there will not be a need for one.

I commend this patch for consideration.
Comment 6 Matthias Maier gentoo-dev 2020-10-07 15:45:13 UTC
*** Bug 736609 has been marked as a duplicate of this bug. ***
Comment 7 Matthias Maier gentoo-dev 2020-10-07 15:46:11 UTC
(From #736609: Georgy Yakovlev from comment #1)
> ewaitfile in the initscript start_post() usually helps to wait for
> socket/pidfile availability, it makes initscript return after file is
> available and has a timeout option.
> 
> you can define it in confd file as a workaround, but ideally it should be a
> part of the initscript itself of course.
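
As a rough illustration of the workaround described in the quote, a start_post() hook defined in the conf.d file might look like the sketch below. The socket path, the 10-second timeout and the LIBVIRTD_SOCKET variable name are assumptions made for this example and should be checked against the local installation; ewaitfile is the OpenRC helper named in the quote, called here as 'ewaitfile <timeout> <file>'.

```shell
# /etc/conf.d/libvirtd (workaround sketch, not the packaged file)

# Assumption: libvirtd creates its control socket at this path.
# LIBVIRTD_SOCKET is a hypothetical variable name used only in this sketch.
LIBVIRTD_SOCKET="/run/libvirt/libvirt-sock"

start_post() {
    # Wait up to 10 seconds (assumed value) for the socket to appear
    # before the libvirtd service is reported as started.
    ewaitfile 10 "${LIBVIRTD_SOCKET}"
}
```

The presence of the socket file does not by itself prove that libvirtd is accepting connections, so a connection check in libvirt-guests may still be worth keeping.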