When OpenVZ containers are active, some processes such as cron may interfere with OpenRC's start-stop-daemon: starting a service may fail if active containers are running processes that have the same name as the host's service.
start-stop-daemon refuses, for instance, to run vixie-cron (or any other service whose process name is 'cron') as long as there are containers with child processes that are also named 'cron'. It happened on a Gentoo host I have installed with Debian containers.
    /etc/init.d/vixie-cron restart
    ...
     * Starting vixie-cron ...
     * start-stop-daemon: /usr/sbin/cron is already running [ !! ]
     * ERROR: vixie-cron failed to start
Temporary solution (hard): temporarily shut down the VZ containers.
Temporary solution (soft): temporarily stop all services or processes with the same name in all containers.
Hint: all "blocking" processes are children of init processes. Maybe if start-stop-daemon checked only session leaders...
I'm seeing the same issue here under lxc. The vixie-cron init script uses the following:
    start-stop-daemon --start --quiet --exec /usr/sbin/cron
but it detects the instances of /usr/sbin/cron running in the virtualized containers and refuses to start.
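To illustrate why the host sees the containers' cron processes: matching by executable path amounts to scanning /proc and comparing each process's exe link. The sketch below is not OpenRC's code. find_pids_by_exec is a hypothetical helper, and it assumes a Linux /proc in which container processes are visible from the host.

```shell
#!/bin/sh
# Hypothetical helper, not part of OpenRC: print the PID of every process
# visible in /proc whose executable path equals $1.
find_pids_by_exec() {
	target=$1
	for dir in /proc/[0-9]*; do
		# readlink fails for processes that have exited or that we may not inspect.
		exe=$(readlink "$dir/exe" 2>/dev/null) || continue
		if [ "$exe" = "$target" ]; then
			# Strip the /proc/ prefix, leaving only the PID.
			echo "${dir#/proc/}"
		fi
	done
	return 0
}
```

Running find_pids_by_exec /usr/sbin/cron on such a host would list the container cron processes as well, because nothing in the comparison distinguishes them from the host's own cron.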
Created attachment 285095: vixie-cron
This is a version of the vixie-cron init script that has been rewritten to be more compatible with openrc. Can you please test with this and report back whether the issue still exists? If it doesn't, I will re-assign this bug to the cron team and ask them to update the init script. Then we will need to find other init scripts and update them as well.
Same issue here, and with your init script vixie-cron seems to start and stop correctly.
I have the same problem with practically every script that uses start-stop-daemon, for example: bacula-fd, clamd, snmpd, sysklogd, vixie-cron, ...
I don't have the vz application, but I can't restart vixie-cron. Whenever I type
    /etc/init.d/vixie-cron restart
I get:
     * Stopping vixie-cron ... [ ok ]
     * Starting vixie-cron ...
     * start-stop-daemon: /usr/sbin/cron is already running [ !! ]
     * ERROR: vixie-cron failed to start
If I then wait 30 seconds and type
    /etc/init.d/vixie-cron start
it works, as if it were waiting for something to finish, maybe a sleeping cron job.
This is not an OpenRC bug; rather, the init scripts of the affected daemons should be rewritten to use pid files. Please open separate bug reports against each package and add them as blockers of this bug. As for vixie-cron, I'll CC the maintainer here to review the updated init script.
(In reply to comment #7)
> This is not an openrc bug but merely init scripts of said daemons should be
> rewritten to pid files. Please open separate bug reports against each package
> and add a blocker for this bug. As for vixie-cron I'll CC maintainer here to
> review updated init script.
Actually, we already have a tracker for scripts that need to be rewritten for openrc, bug #367793, so make these block that bug rather than this one. The init.d scripts can be rewritten in a much simpler form than they are now, like the one for vixie-cron which I attached to this bug.
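For readers who have not seen the simpler form, a declarative OpenRC service script that relies on a pid file might look like the sketch below. It is not the attachment from this bug. The pid file path /var/run/cron.pid and the dependencies are assumptions for illustration.

```shell
#!/sbin/openrc-run
# Sketch of a declarative OpenRC service script; older OpenRC releases
# used /sbin/runscript as the interpreter.
# Assumption: cron writes its own PID to /var/run/cron.pid.

command=/usr/sbin/cron
pidfile=/var/run/cron.pid

# Assumed dependencies, shown only to complete the example.
depend() {
	use clock logger
	need localmount
}
```

With pidfile set, OpenRC's default start and stop functions should hand that path to start-stop-daemon via --pidfile, so the "already running" check is made against the recorded PID instead of every process that happens to run /usr/sbin/cron.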
William, this is not an OpenRC issue but a containers issue. It would probably be a good idea to better document the recommendation to always use pid files in init scripts.
(In reply to comment #9)
> William, this is not openrc issue, but containers issue. Probably it's good
> idea to document better suggestion always use pids in init scripts.
Hello Peter. I think I am starting to understand. It is the daemon itself that checks whether there are processes with the same name running, not start-stop-daemon, is that correct? So isn't there a possible workaround, like using namespaces or cgroups to isolate processes that run within containers from processes that are children of the host's init (1)?
Vince, I failed to parse your question, but the problem is that some init scripts search for the daemon by name. And since by default in OpenVZ all processes are visible on the host, the init script finds and kills all daemons in all containers. That said, in OpenVZ there is a simple workaround for this issue: just run sysctl -w 'kernel.pid_ns_hide_child=1' and restart all containers. It looks like lxc does not have such a feature at the moment.
(In reply to comment #11)
> Vince I failed to parse you question, but... [...]
> That said, in openvz there is simple workaround for this issue:
> just run sysctl -w 'kernel.pid_ns_hide_child=1' and restart all containers.
> Looks like lxc does not have such feature at the moment.
Thanks a lot Peter. That was exactly what I was looking for. Will try that. Merry Christmas and a Happy New Year everyone.
I have reproduced this issue. I believe the best fix is to have rc_find_pids() in OpenRC modified so that on OpenVZ hosts, it automatically filters out container processes. Use the following steps in rc_find_pids() on Linux systems:
1) See if /proc/<pid>/status exists.
2) If it does, see if it has an "envID:" field.
3) If it does, see if "envID:" is set to "0".
4) If so, then it's one of the host's processes and should be a candidate for the list. Otherwise, it is one of the container's processes and should be ignored.
This should fix the bug and allow start-stop-daemon to work properly on OpenVZ hosts.
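The steps above can be sketched in shell for experimenting on a host. This is not the OpenRC patch, which would be C inside rc_find_pids(). is_host_process is a hypothetical helper that takes the path of a status file. Treating an unreadable file, or one without an "envID:" field, as a host process is an assumption, made so that non-OpenVZ kernels keep their current behaviour.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the status file at $1
# describes a host process, fail if it describes a container process.
is_host_process() {
	status_file=$1
	# Step 1: without a readable status file there is nothing to filter on
	# (assumption: keep the process as a candidate).
	[ -r "$status_file" ] || return 0
	# Step 2: extract the value of the envID: field; stop at the first match.
	envid=$(awk '$1 == "envID:" { print $2; exit }' "$status_file")
	# No envID: field, so presumably not an OpenVZ kernel (assumption: keep it).
	[ -n "$envid" ] || return 0
	# Steps 3 and 4: envID 0 is the host; anything else is a container.
	[ "$envid" = "0" ]
}
```

For a live process the call would be is_host_process /proc/<pid>/status, with the exit status deciding whether that PID stays in the candidate list.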
All, I have begun work on a patch for this; it should be ready this week. William
Created attachment 373506: fix-openvz.patch
This is my first attempt at a fix for this issue. Can someone who uses OpenVZ test with this patch and report the results? Thanks, William
Yes, I can test this on an OpenVZ host. I will also update the associated Funtoo bug: http://bugs.funtoo.org/browse/FL-1127
@drobbins: I think there may be an issue with the patch. Can you please attach a /proc/<pid>/status file from an OpenVZ host?
muscleman ~ # cat /proc/1/status | wgetpaste
Your paste can be seen here: http://bpaste.net/show/193557/
@drobbins: I saw what I needed to see in the example you attached. The patch in comment #15 should work. Please test and let me know your results.
Created attachment 373606: fix-openvz.patch
This should be the correct fix; disregard the previous patch and test with this one. The previous patch used a space where it should have used a tab. Thanks, William
With the 2nd patch revision, "rc" gives a segmentation fault. Steps to reproduce:
1. emerge openrc, with the patch applied
2. run "rc"
3. Segmentation fault
Here is debug output, if it helps:
    # rc
    Auto launching gdb!
    Attaching to process 12560
    Reading symbols from /sbin/rc...(no debugging symbols found)...done.
    warning: Could not load shared library symbols for linux-vdso.so.1.
    Do you need "set solib-search-path" or "set sysroot"?
    Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libutil.so.1
    Reading symbols from /lib64/librc.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib64/librc.so.1
    Reading symbols from /lib64/libeinfo.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libeinfo.so.1
    Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libdl.so.2
    Reading symbols from /lib64/libpam.so.0...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libpam.so.0
    Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libc.so.6
    Reading symbols from /lib64/libncurses.so.5...(no debugging symbols found)...done.
    Loaded symbols for /lib64/libncurses.so.5
    Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
    Loaded symbols for /lib64/ld-linux-x86-64.so.2
    0x00007f6c9c9c0442 in wait () from /lib64/libc.so.6
    #0 0x00007f6c9c9c0442 in wait () from /lib64/libc.so.6
    No symbol table info available.
    #1 0x0000000000412a80 in ?? ()
    No symbol table info available.
    #2 <signal handler called>
    No symbol table info available.
    #3 0x00007f6c9c97be52 in feof () from /lib64/libc.so.6
    No symbol table info available.
    #4 0x00007f6c9d2d82f8 in rc_getline () from /lib64/librc.so.1
    No symbol table info available.
    #5 0x00007f6c9d2d532d in rc_find_pids () from /lib64/librc.so.1
    No symbol table info available.
    #6 0x00007f6c9d2d5c16 in rc_service_daemons_crashed () from /lib64/librc.so.1
    No symbol table info available.
    #7 0x00000000004074e5 in ?? ()
    No symbol table info available.
    #8 0x00007f6c9c92cda5 in __libc_start_main () from /lib64/libc.so.6
    No symbol table info available.
    #9 0x0000000000407c05 in ?? ()
    No symbol table info available.
    (gdb) bt
    #0 0x00007f6c9c9c0442 in wait () from /lib64/libc.so.6
    #1 0x0000000000412a80 in ?? ()
    #2 <signal handler called>
    #3 0x00007f6c9c97be52 in feof () from /lib64/libc.so.6
    #4 0x00007f6c9d2d82f8 in rc_getline () from /lib64/librc.so.1
    #5 0x00007f6c9d2d532d in rc_find_pids () from /lib64/librc.so.1
    #6 0x00007f6c9d2d5c16 in rc_service_daemons_crashed () from /lib64/librc.so.1
    #7 0x00000000004074e5 in ?? ()
    #8 0x00007f6c9c92cda5 in __libc_start_main () from /lib64/libc.so.6
    #9 0x0000000000407c05 in ?? ()
    (gdb)
Created attachment 374934: fix-openvz.patch
All, this is the latest version of this patch. I know it works on non-openvz systems, but I need someone to verify that it works on OpenVZ systems. Thanks, William
The patch segfaults because rc_getline() calls xrealloc(), which may allocate memory, but rc_getline() may not get called at all. In that case, free() is called anyway, potentially on a NULL pointer. So to fix this patch, you should just check to see if line != NULL before calling free().
Okay, it turns out there were more problems with this patch. A working version is here:
http://git.funtoo.org/funtoo-overlay/tree/sys-apps/openrc/files/fix-openvz.patch?id=e7b8b50970907a56b85e3757b09986096069d17a
The free() call was inside a larger while loop; that was the bigger problem. Moving it to the end of the rc_find_pids() function resolves the double-free. I also optimized the scanning for envID: in the status file so it completes once a match is found.
Nothing for cron-bugs to do here. This bug now tracks an OpenRC feature.
Created attachment 379228: Working, in-production patch for OpenVZ pid filtering from Funtoo Linux.
This is fixed in commit 9eb9b28 and will be included in OpenRC-0.13.