376817 – make OpenVZ host to automatically filters out container's processes

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Hosted Projects
Classification:	Unclassified
Component:	OpenRC
Hardware:	All Linux

Importance:	Normal enhancement
Assignee:	OpenRC Team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	367793 481182

Reported:	2011-07-28 20:42 UTC by Vince C.
Modified:	2014-06-20 21:32 UTC
CC List:	4 users

See Also:
Package list:
Runtime testing required:	---

Attachments
vixie-cron (vixie-cron, 358 bytes, text/plain) 2011-08-30 16:39 UTC, William Hubbs
fix-openvz.patch (fix-openvz.patch, 1.05 KB, patch) 2014-03-25 14:30 UTC, William Hubbs
fix-openvz.patch (fix-openvz.patch, 1.05 KB, patch) 2014-03-26 22:46 UTC, William Hubbs
fix-openvz.patch (fix-openvz.patch, 1.05 KB, patch) 2014-04-15 00:39 UTC, William Hubbs
Working, in production patch for OpenVZ pid filtering from Funtoo Linux. (fix-openvz-r1.patch, 1.87 KB, patch) 2014-06-18 21:48 UTC, Daniel Robbins

Description Vince C. 2011-07-28 20:42:29 UTC

When OpenVZ containers are active some processes like cron may interfere with OpenRC start-stop-daemon: starting services may fail if there are active containers running processes that have the same name as the host's service.

Comment 1 Vince C. 2011-07-28 20:51:02 UTC

start-stop-daemon refuses, for instance, to run vixie-cron (or any other service whose process name is 'cron') as long as there are containers with child processes that are also named 'cron'. It happened on a Gentoo host on which I have installed Debian containers.

/etc/init.d/vixie-cron restart
...
 * Starting vixie-cron ...
 * start-stop-daemon: /usr/sbin/cron is already running      [ !! ]
 * ERROR: vixie-cron failed to start

Temporary solution (hard): temporarily shut down the VZ containers
Temporary solution (soft): temporarily stop all services or processes with the same name in all containers

Hint: all "blocking" processes are children of init processes. Maybe if start-stop-daemon checked only session leaders...

Comment 2 Gus Power 2011-08-14 09:01:49 UTC

I'm seeing the same issue here under lxc. The vixie-cron init script uses the following:

start-stop-daemon --start --quiet --exec /usr/sbin/cron

but it detects the instances of /usr/sbin/cron running in the virtualized containers and refuses to start.

Comment 3 William Hubbs gentoo-dev 2011-08-30 16:39:53 UTC

Created attachment 285095 [details]
vixie-cron

This is a version of the vixie-cron init script that has been rewritten
to be more compatible with openrc.

Can you please test with this and report back whether the issue still
exists?

If it doesn't, I will re-assign this bug to the cron team and ask them
to update the init script. Then we will need to find other init scripts
and update them as well.

Comment 4 Jordi Marqués 2011-10-13 07:32:36 UTC

Same issue here, and with your init script vixie-cron seems to start and stop correctly.

Comment 5 Sebastian Bobrecki 2011-10-20 12:21:03 UTC

I have the same problem with practically every script that uses start-stop-daemon, for example:
bacula-fd
clamd
snmpd
sysklogd
vixie-cron
...

Comment 6 Majed 2011-11-02 08:19:47 UTC

I don't have the vz application, but I can't restart vixie-cron.
Whenever I type:
/etc/init.d/vixie-cron restart
I get:

* Stopping vixie-cron ...                                                [ ok ]
 * Starting vixie-cron ...
 * start-stop-daemon: /usr/sbin/cron is already running                   [ !! ]
 * ERROR: vixie-cron failed to start

Then I wait 30 seconds and type:
/etc/init.d/vixie-cron start
and then it works,
as if it were waiting for something to finish, maybe a sleeping cron job.

Comment 7 Peter Volkov (RETIRED) gentoo-dev 2011-12-12 05:19:25 UTC

This is not an OpenRC bug; rather, the init scripts of the affected daemons should be rewritten to use pid files. Please open separate bug reports against each package and add a blocker for this bug. As for vixie-cron, I'll CC the maintainer here to review the updated init script.

Comment 8 William Hubbs gentoo-dev 2011-12-12 05:30:39 UTC

(In reply to comment #7)
> This is not an OpenRC bug; rather, the init scripts of the affected daemons
> should be rewritten to use pid files. Please open separate bug reports against
> each package and add a blocker for this bug. As for vixie-cron, I'll CC the
> maintainer here to review the updated init script.

Actually, we already have a tracker for scripts that need to be rewritten for OpenRC, bug #367793, so please make the new bugs block that one rather than this bug.

The init.d scripts can be rewritten in a much simpler form than they are now, like the one for vixie-cron which I attached to this bug.

Comment 9 Peter Volkov (RETIRED) gentoo-dev 2011-12-14 05:16:29 UTC

William, this is not an OpenRC issue but a containers issue. It is probably a good idea to better document the suggestion to always use pid files in init scripts.

Comment 10 Vince C. 2011-12-14 08:07:29 UTC

(In reply to comment #9)
> William, this is not an OpenRC issue but a containers issue. It is probably a
> good idea to better document the suggestion to always use pid files in init scripts.

Hello Peter. I think I am starting to understand. It is the daemon itself that checks whether there are processes with the same name running, not start-stop-daemon; is that correct? If so, isn't there a possible workaround, like using namespaces or cgroups to isolate processes that run within containers from processes that are children of the host's init (1)?

Comment 11 Peter Volkov (RETIRED) gentoo-dev 2011-12-17 04:16:56 UTC

Vince, I failed to parse your question, but...
The problem is that some init scripts search for the daemon by name. Since by default all container processes are visible on an OpenVZ host, the init script finds and kills the daemons in all containers. That said, OpenVZ has a simple workaround for this issue:
just run sysctl -w 'kernel.pid_ns_hide_child=1' and restart all containers. It looks like lxc does not have such a feature at the moment.

Comment 12 Vince C. 2011-12-18 09:04:51 UTC

(In reply to comment #11)
> Vince, I failed to parse your question, but...
[...]
> That said, OpenVZ has a simple workaround for this issue:
> just run sysctl -w 'kernel.pid_ns_hide_child=1' and restart all containers.
> It looks like lxc does not have such a feature at the moment.

Thanks a lot Peter. That was exactly what I was looking for. Will try that.

Merry Christmas and a Happy New Year everyone.

Comment 13 Daniel Robbins 2014-03-03 18:00:23 UTC

I have reproduced this issue.

I believe the best fix is to have rc_find_pids() in OpenRC modified so that on OpenVZ hosts, it automatically filters out the containers' processes.

Use the following steps in rc_find_pids() on Linux systems:

1) See if /proc/<pid>/status exists
2) If it does, see if it has a "envID:" field
3) If it does, see if "envID:" is set to "0"
4) If so, then it's one of the host's processes and should be a candidate for the list. Otherwise, it is one of the container's processes and should be ignored.

This should fix the bug and allow start-stop-daemon to work properly on OpenVZ hosts.
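The four steps above can be sketched in C as follows. This is only an illustration under stated assumptions, not the patch that was later attached: the function names parse_envid() and pid_is_host_candidate() are hypothetical, and the sketch treats a missing status file or a missing "envID:" field as "keep the process as a candidate".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: scan an open status stream for an "envID:" line.
 * Returns 1 and stores the numeric value in *envid if the field is found,
 * 0 if the stream has no such field. */
int parse_envid(FILE *fp, long *envid)
{
	char line[256];

	assert(fp != NULL && envid != NULL);
	while (fgets(line, sizeof(line), fp) != NULL) {
		if (strncmp(line, "envID:", 6) == 0) {
			/* strtol() skips the whitespace after the colon */
			*envid = strtol(line + 6, NULL, 10);
			return 1; /* stop at the first match */
		}
	}
	return 0;
}

/* Hypothetical helper: returns 1 if pid should remain a candidate
 * (a host process, or no OpenVZ information is available) and 0 if
 * the process belongs to a container. */
int pid_is_host_candidate(pid_t pid)
{
	char path[64];
	long envid = 0;
	int found;
	FILE *fp;

	snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
	fp = fopen(path, "r");
	if (fp == NULL)
		return 1; /* step 1: no status file to inspect */
	found = parse_envid(fp, &envid);
	fclose(fp);
	if (!found)
		return 1; /* step 2: no "envID:" field, so no filtering */
	return envid == 0; /* steps 3 and 4: only envID 0 is the host */
}
```

A caller that walks /proc could skip any pid for which pid_is_host_candidate() returns 0 before comparing process names.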

Comment 14 William Hubbs gentoo-dev 2014-03-24 23:30:18 UTC

All,

I have begun work on a patch for this; it should be ready this week.

William

Comment 15 William Hubbs gentoo-dev 2014-03-25 14:30:55 UTC

Created attachment 373506 [details, diff]
fix-openvz.patch

This is my first attempt at a fix for this issue. Can someone who uses
OpenVZ test with this patch and report the results?

Thanks,

William

Comment 16 Daniel Robbins 2014-03-25 16:15:46 UTC

Yes, I can test this on an OpenVZ host.

I will also update the associated Funtoo bug: http://bugs.funtoo.org/browse/FL-1127

Comment 17 William Hubbs gentoo-dev 2014-03-25 18:16:31 UTC

@drobbins:
I think there may be an issue with the patch. Can you please attach a
/proc/<pid>/status file from an OpenVZ host?

Comment 18 Daniel Robbins 2014-03-25 18:24:09 UTC

muscleman ~ # cat /proc/1/status | wgetpaste
Your paste can be seen here: http://bpaste.net/show/193557/

Comment 19 William Hubbs gentoo-dev 2014-03-25 20:46:55 UTC

@drobbins:
I saw what I needed to see in the example you attached. The patch in
comment #15 should work. Please test and let me know your results.

Comment 20 William Hubbs gentoo-dev 2014-03-26 22:46:56 UTC

Created attachment 373606 [details, diff]
fix-openvz.patch

This should be the correct fix; disregard the previous patch and test
with this one. The previous patch used a space instead of a tab where it
should have used a tab.

Thanks,

William

Comment 21 Oleh 2014-04-05 06:01:20 UTC

With the 2nd patch revision, "rc" gives a segmentation fault.
steps to reproduce:
1. emerge openrc, with patch applied
2. run "rc"
3. Segmentation Fault

Comment 22 Oleh 2014-04-05 06:11:09 UTC

Here is the debug output, in case it helps:

# rc

Auto launching gdb!

Attaching to process 12560
Reading symbols from /sbin/rc...(no debugging symbols found)...done.

warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/librc.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librc.so.1
Reading symbols from /lib64/libeinfo.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libeinfo.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpam.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libpam.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib64/libncurses.so.5
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007f6c9c9c0442 in wait () from /lib64/libc.so.6
#0  0x00007f6c9c9c0442 in wait () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000000000412a80 in ?? ()
No symbol table info available.
#2  <signal handler called>
No symbol table info available.
#3  0x00007f6c9c97be52 in feof () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f6c9d2d82f8 in rc_getline () from /lib64/librc.so.1
No symbol table info available.
#5  0x00007f6c9d2d532d in rc_find_pids () from /lib64/librc.so.1
No symbol table info available.
#6  0x00007f6c9d2d5c16 in rc_service_daemons_crashed () from /lib64/librc.so.1
No symbol table info available.
#7  0x00000000004074e5 in ?? ()
No symbol table info available.
#8  0x00007f6c9c92cda5 in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#9  0x0000000000407c05 in ?? ()
No symbol table info available.
(gdb) bt
#0  0x00007f6c9c9c0442 in wait () from /lib64/libc.so.6
#1  0x0000000000412a80 in ?? ()
#2  <signal handler called>
#3  0x00007f6c9c97be52 in feof () from /lib64/libc.so.6
#4  0x00007f6c9d2d82f8 in rc_getline () from /lib64/librc.so.1
#5  0x00007f6c9d2d532d in rc_find_pids () from /lib64/librc.so.1
#6  0x00007f6c9d2d5c16 in rc_service_daemons_crashed () from /lib64/librc.so.1
#7  0x00000000004074e5 in ?? ()
#8  0x00007f6c9c92cda5 in __libc_start_main () from /lib64/libc.so.6
#9  0x0000000000407c05 in ?? ()
(gdb)

Comment 23 William Hubbs gentoo-dev 2014-04-15 00:39:11 UTC

Created attachment 374934 [details, diff]
fix-openvz.patch

All,

this is the latest version of this patch. I know it works on non-openvz
systems, but I need someone to verify that it works on OpenVZ systems.

Thanks,

William

Comment 24 Daniel Robbins 2014-05-20 03:35:10 UTC

The patch segfaults because rc_getline() calls xrealloc(), which may allocate memory, but rc_getline() may not get called at all. In that case, free() is still called, potentially on a NULL pointer. So to fix this patch, you should just check that line != NULL before calling free().

Comment 25 Daniel Robbins 2014-05-20 19:46:28 UTC

Okay, turns out there were more problems with this patch. A working version is here:

http://git.funtoo.org/funtoo-overlay/tree/sys-apps/openrc/files/fix-openvz.patch?id=e7b8b50970907a56b85e3757b09986096069d17a

The free() call was inside a larger while loop. That was the bigger problem. Moving it to the end of the rc_find_pids() function resolves the double-free. I also optimized the scanning for envID: in the status file so it completes once a match is found.
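The buffer-lifetime pattern described here can be illustrated with a small, self-contained C sketch. It is not the Funtoo patch: it assumes POSIX getline() as a stand-in for rc_getline(), and count_streams_with_prefix() is a hypothetical function. The line buffer starts as NULL, is reused across iterations of the outer loop, and is freed exactly once after the loop, so the final free() is safe even if no line was ever read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical example: count how many of the n streams contain a line
 * that starts with prefix. NULL entries in the array are skipped. */
size_t count_streams_with_prefix(FILE **streams, size_t n, const char *prefix)
{
	char *line = NULL; /* starts as NULL so the final free() is always valid */
	size_t cap = 0;
	size_t hits = 0;
	size_t plen;

	assert(prefix != NULL);
	plen = strlen(prefix);
	for (size_t i = 0; i < n; i++) {
		if (streams[i] == NULL)
			continue; /* getline() never runs for this entry */
		while (getline(&line, &cap, streams[i]) != -1) {
			if (strncmp(line, prefix, plen) == 0) {
				hits++;
				break; /* stop scanning once a match is found */
			}
		}
		/* No free() here: getline() reuses the buffer next time.
		 * Freeing inside the loop without resetting line and cap
		 * would leave a dangling pointer for the next iteration. */
	}
	free(line); /* released exactly once, after the outer loop */
	return hits;
}
```

The same shape applies to any getline-style helper that grows a caller-owned buffer: initialize the pointer before the first call and release it at a single exit point.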

Comment 26 Peter Volkov (RETIRED) gentoo-dev 2014-06-02 08:14:45 UTC

There is nothing for cron-bugs to do here; this bug now tracks an OpenRC feature.

Comment 27 Daniel Robbins 2014-06-18 21:48:59 UTC

Created attachment 379228 [details, diff]
Working, in production patch for OpenVZ pid filtering from Funtoo Linux.

Comment 28 William Hubbs gentoo-dev 2014-06-20 21:32:47 UTC

This is fixed in commit 9eb9b28 and will be included in OpenRC-0.13.