Bug 376817

Summary: make OpenVZ host automatically filter out containers' processes
Product: Gentoo Hosted Projects Reporter: Vince C. <vincent.cadet>
Component: OpenRC Assignee: OpenRC Team <openrc>
Status: RESOLVED FIXED    
Severity: enhancement CC: andreis.vinogradovs, gus, pva, staff
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 367793, 481182    
Attachments: vixie-cron
fix-openvz.patch
fix-openvz.patch
fix-openvz.patch
Working, in production patch for OpenVZ pid filtering from Funtoo Linux.

Description Vince C. 2011-07-28 20:42:29 UTC
When OpenVZ containers are active, some processes such as cron may interfere with OpenRC's start-stop-daemon: starting a service may fail if active containers are running processes that have the same name as the host's service.
Comment 1 Vince C. 2011-07-28 20:51:02 UTC
start-stop-daemon refuses, for instance, to run vixie-cron (or any other service whose process name is 'cron') as long as there are containers with children that are also named 'cron'. It happened on a Gentoo host I have installed with Debian containers.

/etc/init.d/vixie-cron restart
...
 * Starting vixie-cron ...
 * start-stop-daemon: /usr/sbin/cron is already running      [ !! ]
 * ERROR: vixie-cron failed to start

Temporary solution (hard): temporarily shut down the VZ containers
Temporary solution (soft): temporarily stop all services or processes with the same name in all containers

Hint: all "blocking" processes are children of init processes. Maybe if start-stop-daemon checked only session leaders...
Comment 2 Gus Power 2011-08-14 09:01:49 UTC
I'm seeing the same issue here under lxc. The vixie-cron init script uses the following:

start-stop-daemon --start --quiet --exec /usr/sbin/cron

but it detects the instances of /usr/sbin/cron running in the virtualized containers and refuses to start.
Comment 3 William Hubbs gentoo-dev 2011-08-30 16:39:53 UTC
Created attachment 285095 [details]
vixie-cron

This is a version of the vixie-cron init script that has been rewritten
to be more compatible with openrc.

Can you please test with this and report back whether the issue still
exists?

If it doesn't, I will re-assign this bug to the cron team and ask them
to update the init script. Then we will need to find other init scripts
and update them as well.
Comment 4 Jordi Marqués 2011-10-13 07:32:36 UTC
Same issue here, and with your init script vixie-cron seems to start and stop correctly.
Comment 5 Sebastian Bobrecki 2011-10-20 12:21:03 UTC
I have the same problem with practically every script that uses start-stop-daemon, for example:
bacula-fd
clamd
snmpd
sysklogd
vixie-cron
...
Comment 6 Majed 2011-11-02 08:19:47 UTC
I don't have the vz application, but I can't restart vixie-cron.
Whenever I type:
/etc/init.d/vixie-cron restart
I get:

* Stopping vixie-cron ...                                                [ ok ]
 * Starting vixie-cron ...
 * start-stop-daemon: /usr/sbin/cron is already running                   [ !! ]
 * ERROR: vixie-cron failed to start

Then I wait 30 seconds and type:
/etc/init.d/vixie-cron start
and then it works,
as if it were waiting for something to finish, maybe a sleeping cron job.
Comment 7 Peter Volkov (RETIRED) gentoo-dev 2011-12-12 05:19:25 UTC
This is not an OpenRC bug; rather, the init scripts of said daemons should be rewritten to use pid files. Please open separate bug reports against each package and add a blocker for this bug. As for vixie-cron, I'll CC the maintainer here to review the updated init script.
Comment 8 William Hubbs gentoo-dev 2011-12-12 05:30:39 UTC
(In reply to comment #7)
> This is not an OpenRC bug; rather, the init scripts of said daemons should be
> rewritten to use pid files. Please open separate bug reports against each
> package and add a blocker for this bug. As for vixie-cron, I'll CC the
> maintainer here to review the updated init script.

Actually, we already have a tracker for scripts that need to be rewritten for openrc, bug #367793, so make these block that bug rather than this one.

The init.d scripts can be rewritten in a much simpler form than they are now, like the one for vixie-cron which I attached to this bug.
Comment 9 Peter Volkov (RETIRED) gentoo-dev 2011-12-14 05:16:29 UTC
William, this is not an OpenRC issue but a container issue. It is probably a good idea to better document the suggestion to always use pid files in init scripts.
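
To illustrate why pid files sidestep the problem, here is a minimal Python sketch (not OpenRC code; the helper names pid_from_file and is_running are hypothetical). It checks one recorded PID instead of searching the process table by name, so a same-named daemon inside a container is never matched.

```python
import os


def pid_from_file(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or malformed."""
    try:
        with open(pidfile) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def is_running(pid):
    """Return True if a process with this PID exists.

    Signal 0 performs only the existence and permission checks;
    nothing is actually delivered to the process.
    """
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A service is then considered started only if is_running(pid_from_file(path)) is true for its own pid file, regardless of how many other processes share its name.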
Comment 10 Vince C. 2011-12-14 08:07:29 UTC
(In reply to comment #9)
> William, this is not an OpenRC issue but a container issue. It is probably a
> good idea to better document the suggestion to always use pid files in init
> scripts.

Hello Peter. I think I am starting to understand. It is the daemon itself that checks whether processes with the same name are running, not start-stop-daemon; is that correct? So isn't there a possible workaround, like using namespaces or cgroups to isolate processes that run within containers from processes that are children of the host's init (1)?
Comment 11 Peter Volkov (RETIRED) gentoo-dev 2011-12-17 04:16:56 UTC
Vince, I failed to parse your question, but...
the problem is that some init scripts search for the daemon by name. And since by default all processes are visible on an OpenVZ host, the init script finds and kills all daemons in all containers. That said, in OpenVZ there is a simple workaround for this issue:
just run sysctl -w 'kernel.pid_ns_hide_child=1' and restart all containers. It looks like lxc does not have such a feature at the moment.
Comment 12 Vince C. 2011-12-18 09:04:51 UTC
(In reply to comment #11)
> Vince, I failed to parse your question, but...
[...]
> That said, in OpenVZ there is a simple workaround for this issue:
> just run sysctl -w 'kernel.pid_ns_hide_child=1' and restart all containers.
> It looks like lxc does not have such a feature at the moment.

Thanks a lot Peter. That was exactly what I was looking for. Will try that.

Merry Christmas and a Happy New Year everyone.
Comment 13 Daniel Robbins 2014-03-03 18:00:23 UTC
I have reproduced this issue.

I believe the best fix is to have rc_find_pids() in OpenRC modified so that on OpenVZ hosts, it automatically filters out the containers' processes.

Use the following steps in rc_find_pids() on Linux systems:

1) See if /proc/<pid>/status exists
2) If it does, see if it has a "envID:" field
3) If it does, see if "envID:" is set to "0"
4) If so, then it's one of the host's processes and should be a candidate for the list. Otherwise, it is one of the container's processes and should be ignored.

This should fix the bug and allow start-stop-daemon to work properly on OpenVZ hosts.
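
The steps above can be sketched roughly as follows. This is a Python illustration of the logic only, not the OpenRC patch (which is C); the function name is_host_process and the sample status text are hypothetical. It also assumes that a process whose status has no "envID:" field (for example on a non-OpenVZ kernel) remains a candidate, which the steps above do not spell out.

```python
def is_host_process(status_text):
    """Decide from the text of /proc/<pid>/status whether a process is the host's.

    envID 0 means the process belongs to the OpenVZ host; any other
    envID means it belongs to a container and should be ignored.
    """
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key == "envID":
            # Stop scanning at the first envID line.
            return value.strip() == "0"
    # Assumption: no envID field means this is not an OpenVZ kernel,
    # so the process stays a candidate as before.
    return True


# Hypothetical status excerpts, for illustration only.
host_status = "Name:\tcron\nState:\tS (sleeping)\nenvID:\t0\n"
container_status = "Name:\tcron\nState:\tS (sleeping)\nenvID:\t101\n"

print(is_host_process(host_status))       # True
print(is_host_process(container_status))  # False
```

With such a filter applied to each candidate PID, a 'cron' running in a container would no longer make start-stop-daemon report that the host's /usr/sbin/cron is already running.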
Comment 14 William Hubbs gentoo-dev 2014-03-24 23:30:18 UTC
All,

I have begun work on a patch for this; it should be ready this week.

William
Comment 15 William Hubbs gentoo-dev 2014-03-25 14:30:55 UTC
Created attachment 373506 [details, diff]
fix-openvz.patch

This is my first attempt at a fix for this issue. Can someone who uses
OpenVZ test with this patch and report the results?

Thanks,

William
Comment 16 Daniel Robbins 2014-03-25 16:15:46 UTC
Yes, I can test this on an OpenVZ host.

I will also update the associated Funtoo bug: http://bugs.funtoo.org/browse/FL-1127
Comment 17 William Hubbs gentoo-dev 2014-03-25 18:16:31 UTC
@drobbins:
I think there may be an issue with the patch. Can you please attach a
/proc/<pid>/status file from an OpenVZ host?
Comment 18 Daniel Robbins 2014-03-25 18:24:09 UTC
muscleman ~ # cat /proc/1/status | wgetpaste
Your paste can be seen here: http://bpaste.net/show/193557/
Comment 19 William Hubbs gentoo-dev 2014-03-25 20:46:55 UTC
@drobbins:
I saw what I needed to see in the example you attached. The patch in
comment #15 should work. Please test and let me know your results.
Comment 20 William Hubbs gentoo-dev 2014-03-26 22:46:56 UTC
Created attachment 373606 [details, diff]
fix-openvz.patch

This should be the correct fix; disregard the previous patch and test
with this one. The previous patch used a space instead of a tab where it
should have used a tab.

Thanks,

William
Comment 21 Oleh 2014-04-05 06:01:20 UTC
With the second patch revision, "rc" gives a segmentation fault.
steps to reproduce:
1. emerge openrc, with patch applied
2. run "rc"
3. Segmentation Fault
Comment 22 Oleh 2014-04-05 06:11:09 UTC
Here is the debug output, if it helps at all:

# rc

Auto launching gdb!

Attaching to process 12560
Reading symbols from /sbin/rc...(no debugging symbols found)...done.

warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/librc.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librc.so.1
Reading symbols from /lib64/libeinfo.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libeinfo.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpam.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libpam.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib64/libncurses.so.5
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007f6c9c9c0442 in wait () from /lib64/libc.so.6
#0  0x00007f6c9c9c0442 in wait () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000000000412a80 in ?? ()
No symbol table info available.
#2  <signal handler called>
No symbol table info available.
#3  0x00007f6c9c97be52 in feof () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f6c9d2d82f8 in rc_getline () from /lib64/librc.so.1
No symbol table info available.
#5  0x00007f6c9d2d532d in rc_find_pids () from /lib64/librc.so.1
No symbol table info available.
#6  0x00007f6c9d2d5c16 in rc_service_daemons_crashed () from /lib64/librc.so.1
No symbol table info available.
#7  0x00000000004074e5 in ?? ()
No symbol table info available.
#8  0x00007f6c9c92cda5 in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#9  0x0000000000407c05 in ?? ()
No symbol table info available.
(gdb) bt
#0  0x00007f6c9c9c0442 in wait () from /lib64/libc.so.6
#1  0x0000000000412a80 in ?? ()
#2  <signal handler called>
#3  0x00007f6c9c97be52 in feof () from /lib64/libc.so.6
#4  0x00007f6c9d2d82f8 in rc_getline () from /lib64/librc.so.1
#5  0x00007f6c9d2d532d in rc_find_pids () from /lib64/librc.so.1
#6  0x00007f6c9d2d5c16 in rc_service_daemons_crashed () from /lib64/librc.so.1
#7  0x00000000004074e5 in ?? ()
#8  0x00007f6c9c92cda5 in __libc_start_main () from /lib64/libc.so.6
#9  0x0000000000407c05 in ?? ()
(gdb)
Comment 23 William Hubbs gentoo-dev 2014-04-15 00:39:11 UTC
Created attachment 374934 [details, diff]
fix-openvz.patch

All,

this is the latest version of this patch. I know it works on non-openvz
systems, but I need someone to verify that it works on OpenVZ systems.

Thanks,

William
Comment 24 Daniel Robbins 2014-05-20 03:35:10 UTC
The patch segfaults because rc_getline() calls xrealloc(), which may allocate memory, but rc_getline() may not get called at all. In that case, free() is called anyway, potentially on a NULL pointer. So to fix this patch, you should just check that line != NULL before calling free().
Comment 25 Daniel Robbins 2014-05-20 19:46:28 UTC
Okay, turns out there were more problems with this patch. A working version is here:

http://git.funtoo.org/funtoo-overlay/tree/sys-apps/openrc/files/fix-openvz.patch?id=e7b8b50970907a56b85e3757b09986096069d17a

The free() call was inside a larger while loop. That was the bigger problem. Moving it to the end of the rc_find_pids() function resolves the double-free. I also optimized the scanning for envID: in the status file so it completes once a match is found.
Comment 26 Peter Volkov (RETIRED) gentoo-dev 2014-06-02 08:14:45 UTC
There is nothing for cron-bugs to do here. This bug now covers an OpenRC feature.
Comment 27 Daniel Robbins 2014-06-18 21:48:59 UTC
Created attachment 379228 [details, diff]
Working, in production patch for OpenVZ pid filtering from Funtoo Linux.
Comment 28 William Hubbs gentoo-dev 2014-06-20 21:32:47 UTC
This is fixed in commit 9eb9b28 and will be included in OpenRC-0.13.