Bug 384265 - app-emulation/lxc: initscript: don't wait 15 seconds to stop but check the status with timeout
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: All Linux
Importance: Normal enhancement
Assignee: Diego Elio Pettenò (RETIRED)
Reported: 2011-09-23 23:50 UTC by Stef Simoens
Modified: 2011-09-28 21:48 UTC
Description Stef Simoens 2011-09-23 23:50:19 UTC
It seems more efficient to check if the container has stopped instead of waiting a fixed 15 seconds.

Reproducible: Always




Proposed fix:


--- lxc.orig	2011-09-18 00:35:25.000000000 +0200
+++ lxc	2011-09-24 01:41:17.648195347 +0200
@@ -117,9 +117,14 @@
 	kill -INT ${init_pid}
 	eend $?
 
-	sleep 15
-
 	missingprocs=$(pgrep -P ${init_pid})
+	TIMEOUT=${TIMEOUT:-30}
+	i=0
+	while [ -n "${missingprocs}" ] && [ $i -lt ${TIMEOUT} ]; do
+		sleep 1
+		missingprocs=$(pgrep -P ${init_pid})
+		i=$(expr $i + 1)
+	done
 
 	if [ -n "${missingprocs}" ]; then
 		ewarn "Something failed to properly shut down in ${CONTAINER}"
Comment 1 Stef Simoens 2011-09-28 21:15:42 UTC
Hello. I'm not sure whether I can just add this here, or whether I should open a new bug or add it to the (still open) bug #330835.

I like the initscript of lxc-0.7.5; however, I believe there is a bug in how it determines the "init" process of the container.
This probably doesn't show when the containers are started at boot. However, if you start a container some time later (i.e. when newly assigned PIDs are already high), the init process gets a high PID.
Processes that are then started in the container can end up with PIDs (as seen on the main host) that are lower than the PID of the init process.
As ${cgroupmount}/${CONTAINER}/tasks is sorted in ascending order, the first PID in this file will *NOT* be the init process but some arbitrary process.

Result: the -INT signal is sent not to init but to some arbitrary program, and the container does not shut down cleanly.
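
To illustrate the problem described above, here is a small sketch with a simulated tasks file. The PIDs and the temporary file are made up for the example; it only assumes, as stated above, that the file lists PIDs in ascending order.

```shell
#!/bin/sh
# Simulated cgroup tasks file (hypothetical PIDs): the container's init
# has PID 31990, two processes started later got lower PIDs.
tasks=$(mktemp)
printf '%s\n' 412 587 31990 > "${tasks}"

real_init_pid=31990

# What the initscript does: take the first line of the tasks file.
first_pid=$(head -n1 "${tasks}")

if [ "${first_pid}" != "${real_init_pid}" ]; then
	echo "head -n1 picked ${first_pid}, but init is ${real_init_pid}"
fi

rm -f "${tasks}"
```

With these values, `head -n1` yields the lowest PID in the file, which is not the init process.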

The solution intended by lxc is probably to use lxc-kill, but that would not give the flexibility to wait until all processes have stopped (cf. the patch above).

I found that lxc-info --name ${CONTAINER} --pid reports the PID of the container's init process. Using it keeps the flexibility to wait until all processes in the container have shut down cleanly.

Patch (to be applied on top of the patch above):

--- /etc/init.d/lxc~	2011-09-24 01:41:17.000000000 +0200
+++ /etc/init.d/lxc	2011-09-27 01:25:57.429081815 +0200
@@ -111,7 +111,7 @@
 	    return 0
 	fi
 
-	init_pid=$(head -n1 ${cgroupmount}/${CONTAINER}/tasks)
+	init_pid=$(/usr/sbin/lxc-info --name ${CONTAINER} --pid | cut -d: -f2)
 
 	ebegin "Shutting down system in ${CONTAINER}"
 	kill -INT ${init_pid}
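
The PID extraction in the patch above could be made slightly more defensive, as in the sketch below. It assumes that lxc-info prints a line of the form "pid:" followed by padding and a number; that output format and the helper name `parse_init_pid` are assumptions for illustration, not something confirmed in this bug.

```shell
#!/bin/sh
# Hypothetical helper: extract a numeric PID from a line such as
# "pid:      1234" (assumed lxc-info output format).
# Prints the PID and returns 0, or returns 1 if no valid PID is found.
parse_init_pid() {
	line=$1
	# Take the field after the first colon and strip all whitespace.
	pid=$(printf '%s\n' "${line}" | cut -d: -f2 | tr -d '[:space:]')
	case "${pid}" in
		''|*[!0-9]*) return 1 ;;	# empty or not purely digits
	esac
	printf '%s\n' "${pid}"
}
```

In the initscript this might be used as `init_pid=$(parse_init_pid "$(lxc-info --name ${CONTAINER} --pid)") || return 1`, so that kill is never called with an empty or malformed PID.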
Comment 2 Diego Elio Pettenò (RETIRED) gentoo-dev 2011-09-28 21:48:30 UTC
Thanks Stef, both are in 0.7.5-r2 now (and I was able to reproduce it right away, so the problem doesn't seem to be limited to containers started long after boot).