Bug 595044

Summary: sys-backup/bacula needs updated service scripts
Product: Gentoo Hosted Projects Reporter: Karl-Johan Karlsson <creideiki+gentoo-bugzilla>
Component: OpenRC Assignee: Thomas Beierlein <tomjbe>
Status: RESOLVED FIXED    
Severity: major CC: openrc
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: bacula-dir.initd
bacula-fd.initd
bacula-sd.initd

Description Karl-Johan Karlsson 2016-09-24 16:41:10 UTC
I run a lot of LXC containers. After rebooting for the first time in a while (so I'm not sure when this behaviour was introduced), onto sys-apps/openrc-0.21.7, I noticed that the app-backup/bacula client didn't start on the host, although it did start in each container.

This seems to be because start-stop-daemon considers the service already running if it can see any process with a matching command line, regardless of whether it is running in the current system or in a container.

Here's what the init script prints on the host:


 ~ # /etc/init.d/bacula-fd status
 * status: stopped

 ~ # /etc/init.d/bacula-fd start
 * Starting bacula file daemon ...
 * start-stop-daemon: /usr/sbin/bacula-fd is already running
 * ERROR: bacula-fd failed to start

 ~ # /etc/init.d/bacula-fd stop
 * WARNING: bacula-fd is already stopped


So the RC system knows the correct status of the service, it's just start-stop-daemon that's being difficult.

Here's what start-stop-daemon prints when run manually on the host:


 ~ # start-stop-daemon --test --verbose --start --exec /usr/sbin/bacula-fd -- -u root -g bacula -c /etc/bacula/bacula-fd.conf
 * Would send signal 0 to PID 20512
 * Would send signal 0 to PID 18722
[...]
 * start-stop-daemon: /usr/sbin/bacula-fd is already running


With those enumerated processes being the backup clients running in each container:


 ~ # ls -l /proc/20512/exe
lrwxrwxrwx 1 root root 0 24 sep 18.15 /proc/20512/exe -> /usr/sbin/bacula-fd
 ~ # ls -l /proc/20512/ns/pid
lrwxrwxrwx 1 root root 0 24 sep 18.16 /proc/20512/ns/pid -> 'pid:[4026533774]'

 ~ # ls -l /proc/18722/exe
lrwxrwxrwx 1 root root 0 24 sep 18.24 /proc/18722/exe -> /usr/sbin/bacula-fd
 ~ # ls -l /proc/18722/ns/pid
lrwxrwxrwx 1 root root 0 24 sep 18.16 /proc/18722/ns/pid -> 'pid:[4026532611]'
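
The namespace comparison above can be scripted. This is a minimal sketch, assuming a Linux /proc that exposes the ns/pid links and enough privilege to read them for the target process. The helper name same_pidns is made up for illustration and is not part of OpenRC or start-stop-daemon.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenRC): succeed only if PID $1 lives in
# the same PID namespace as the caller. Returns 2 if a link cannot be read.
same_pidns() {
    # Each link target reads like 'pid:[4026533774]'; equal targets mean
    # both processes are in the same PID namespace.
    mine=$(readlink /proc/self/ns/pid) || return 2
    theirs=$(readlink "/proc/$1/ns/pid") || return 2
    [ "$mine" = "$theirs" ]
}

# The calling shell always shares a namespace with itself.
same_pidns "$$" && echo "same PID namespace"
```

On the host, a call such as `same_pidns 20512` would be expected to fail for the container processes listed above, since their link targets differ from the host's.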


Changing the command line slightly makes start-stop-daemon no longer consider the service to be running:


 ~ # start-stop-daemon --test --verbose --start --exec /usr/sbin/bacula-fd -- -u root -g bacula -c /etc/bacula/bacula-fd.conf2
 * Would start /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf2

Reproducible: Always

Steps to Reproduce:
1. In one root shell, run:
   unshare --pid sleep 17
2. In another root shell, run:
   start-stop-daemon --verbose sleep -- 17
Actual Results:  
start-stop-daemon prints:
 * Sending signal 0 to PID 20698 ...                                                          [ ok ]
 * start-stop-daemon: sleep is already running

Expected Results:  
start-stop-daemon should ignore the process running in the other PID namespace and start its own copy.
Comment 1 William Hubbs gentoo-dev 2016-09-27 17:42:28 UTC
Hi Tom,

this was reported as a bug in start-stop-daemon. However, I took a look
at the bacula init scripts, and it looks like they should be updated to
use the default start/stop functions in OpenRC.

Can you please take a look at man openrc-run and fix the scripts?

I will also assist where I can, so feel free to ask questions.

Thanks.

William
Comment 2 William Hubbs gentoo-dev 2016-09-27 18:53:06 UTC
The first things I see are:

- you probably can remove your start and stop functions and just use the
  defaults depending on how you set up the variables.

- You should not use wildcards in the name of the pid file.
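
As a rough sketch of what relying on the default start and stop functions could look like: the variables command, command_args and pidfile are the ones described in man openrc-run, and the command line is the one shown in this report. The fixed pidfile path and the depend block are assumptions, and this is not the content of any attached script.

```shell
#!/sbin/openrc-run
# Sketch only; not the attached bacula-fd.initd.
# With these variables set, the default start() and stop() from openrc-run
# are used, so the script defines neither.

description="bacula file daemon"
command=/usr/sbin/bacula-fd
command_args="-u root -g bacula -c /etc/bacula/bacula-fd.conf"
# Assumed fixed path with no wildcard; it must match the file the daemon
# actually writes.
pidfile=/var/run/bacula-fd.9102.pid

depend() {
    # Assumption: the daemon needs networking before it starts.
    need net
}
```

With a pidfile set, the running check is based on the PID recorded in that file rather than on a scan for processes with a matching executable.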
Comment 3 William Hubbs gentoo-dev 2016-10-20 13:48:13 UTC
I am adding the updated service scripts to this bug.
I know the pidfile setting is not correct. That will need to be set to
the correct path, and I am not sure what that should be since I do not
use bacula.
Comment 4 William Hubbs gentoo-dev 2016-10-20 13:51:55 UTC
Created attachment 450830
bacula-dir.initd
Comment 5 William Hubbs gentoo-dev 2016-10-20 13:52:15 UTC
Created attachment 450832
bacula-fd.initd
Comment 6 William Hubbs gentoo-dev 2016-10-20 13:52:35 UTC
Created attachment 450834
bacula-sd.initd
Comment 7 Thomas Beierlein gentoo-dev 2016-10-24 17:02:30 UTC
(In reply to William Hubbs from comment #2)
> The first things I see are:
> 
> - you probably can remove your start and stop functions and just use the
>   defaults depending on how you set up the variables.
> 
> - You should not use wildcards in the name of the pid file.

I double checked the wildcard problem and found the following:

- bacula allows the user to start its daemons multiple times, if needed with different configurations.
- one of the configuration options is the port number on which the daemon can be reached.
- That port number becomes part of the pid-file name.
- to catch all possible port numbers, the former ebuild author used the xx.*.pid syntax
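
To illustrate why a wildcard in the pid-file name is fragile, here is a small self-contained sketch. It uses a temporary directory and throwaway files rather than real bacula pid files, and the second port number (9112) is made up. The point is that a pattern such as bacula-fd.*.pid can expand to no file, one file, or several files, whereas a pidfile setting needs exactly one path.

```shell
#!/bin/sh
# Demonstration with throwaway files; nothing here touches /var/run.
dir=$(mktemp -d)

count_matches() {
    # Count how many existing files the wildcard pattern expands to.
    n=0
    for f in "$dir"/bacula-fd.*.pid; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

none=$(count_matches)                 # no daemon has written a pid file yet
touch "$dir/bacula-fd.9102.pid"
one=$(count_matches)                  # single instance on port 9102
touch "$dir/bacula-fd.9112.pid"       # hypothetical second instance
two=$(count_matches)

echo "matches: $none, then $one, then $two"
rm -rf "$dir"
```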

I see the following possibility to fix that:

Do a standard configuration with a hard-coded, fixed port number in the config and initd files. Add a description of how to set up more than one bacula instance (by copying, renaming and adapting the relevant configuration files).

The solution loses some flexibility but covers the standard use case for most users. Anyone who wants to run more than one bacula instance will have to do some extra work. The readme file should give them an idea of how to proceed.

So I will go that route for now.
Comment 8 Thomas Beierlein gentoo-dev 2016-10-26 05:24:58 UTC
(In reply to Karl-Johan Karlsson from comment #0)
Can you please provide some more information about the problem?

- What version of bacula do you run?
- Do you use the standard installation in each container and the host or do you adapt the config files and start-up scripts to allow multiple bacula daemons running in parallel?

Thanks for the help.
Comment 9 Karl-Johan Karlsson 2016-10-26 05:38:28 UTC
(In reply to Thomas Beierlein from comment #8)
> (In reply to Karl-Johan Karlsson from comment #0)
> Can you please provide some more information about the problem?

I still think start-stop-daemon does things horribly wrong in the presence of containers, but anyway:

> - What version of bacula do you run?

app-backup/bacula-7.4.4, built like this on the container running the Bacula director and storage daemon:

USE="-X acl -bacula-clientonly -bacula-nodir -bacula-nosd -examples ipv6 -libressl -logwatch -mysql postgres -qt4 readline -sqlite ssl -static -tcpd -vim-syntax" ABI_X86="64"

and like this on the host and all the other containers:

USE="-X acl bacula-clientonly -bacula-nodir -bacula-nosd -examples ipv6 -libressl -logwatch -mysql -postgres -qt4 readline -sqlite ssl -static -tcpd -vim-syntax" ABI_X86="64"

> - Do you use the standard installation in each container and the host or do
> you adapt the config files and start-up scripts to allow multiple bacula
> daemons running in parallel?

All standard. I have not changed either /etc/conf.d/bacula-* or /etc/init.d/bacula-* anywhere.
Comment 10 Thomas Beierlein gentoo-dev 2016-10-26 07:33:10 UTC
(In reply to Karl-Johan Karlsson from comment #9)
> (In reply to Thomas Beierlein from comment #8)
> > (In reply to Karl-Johan Karlsson from comment #0)
> > Can you please provide some more information about the problem?
> 
> I still think start-stop-daemon does things horribly wrong in the presence
> of containers, but anyway:
> 
Yes, that may be so. But as we also need to sort things out for bacula as a whole, let us try that first.

> > - What version of bacula do you run?
> 
> app-backup/bacula-7.4.4, built like this on the container running the Bacula
> director and storage daemon:
> 
> USE="-X acl -bacula-clientonly -bacula-nodir -bacula-nosd -examples ipv6
> -libressl -logwatch -mysql postgres -qt4 readline -sqlite ssl -static -tcpd
> -vim-syntax" ABI_X86="64"
> 
> and like this on the host and all the other containers:
> 
> USE="-X acl bacula-clientonly -bacula-nodir -bacula-nosd -examples ipv6
> -libressl -logwatch -mysql -postgres -qt4 readline -sqlite ssl -static -tcpd
> -vim-syntax" ABI_X86="64"
> 
Ok, I see. You have one specialised container with the whole bacula machinery (director and storage daemon) and from there you back up all containers and the host. So there is a bacula-fd running in each container.

As I am not quite familiar with the use of containers myself - how do you address the different file daemons from the central director? Do they get different IP addresses?

> > - Do you use the standard installation in each container and the host or do
> > you adapt the config files and start-up scripts to allow multiple bacula
> > daemons running in parallel?
> 
> All standard. I have not changed either /etc/conf.d/bacula-* or
> /etc/init.d/bacula-* anywhere.

Ok. From my point of view it looks as if the containers are isolated from each other but are shining through onto the host system. So you can have the same file daemon running on each container, who sees only his own files. But the host sees his own files and the files from the container(s).

What is not quite clear to me atm is how start-stop-daemon checks if a daemon is already running. Maybe WilliamH can comment here.
Comment 11 Thomas Beierlein gentoo-dev 2016-10-26 16:51:46 UTC
Well Karl-Johan, could you please do a test for me?

Download the bacula-fd.initd from attachment 450832 and replace the pidfile line with

pidfile=/var/run/bacula-fd.9102.pid

Try what happens if you just replace the init file in the host with that file.
If it works please replace at least one of the init files in a container with that file too.

Please be aware that I am away for the rest of the week and will not be back before the weekend.

Thanks for the help.
Comment 12 Karl-Johan Karlsson 2016-10-26 17:33:40 UTC
(In reply to Thomas Beierlein from comment #10)

> Ok, I see. You have one specialised container with the whole bacula
> machinery (director and storage daemon) and from there you back up all
> containers and the host. So there is a bacula-fd running in each container.

Correct. Each container is in fact its own minimal Gentoo system, running one "real" service (e.g. an Apache, or the Bacula SD and DIR) plus a few maintenance services (Bacula FD, Syslog, Salt...).

> As I am not quite familiar with the use of containers myself - how do you
> address the different file daemons from the central director? Do they get
> different IP addresses?

Yes. Each container has a veth interface connected to a software bridge, which is also connected to one of the host's physical interfaces. Each container then gets its own IPv4 and IPv6 address. Traffic between containers, and between host and container, goes through the bridge.

> Ok. From my point of view it looks as if the containers are isolated from
> each other but are shining through onto the host system. So you can have the
> same file daemon running on each container, who sees only his own files. But
> the host sees his own files and the files from the container(s).

Correct. You may look at it like an augmented chroot(). Processes running inside a chroot() see only their own files, but processes on the outside can see everything. Containers give you similar facilities for processes, users, network connections, etc.

Here's how the process tree looks from the outside, in the host:

# ps -A --forest -o pid,user,command | grep bacula
 4708 root     /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
 4460 root              \_ grep --colour=auto bacula
 5809 root          \_ /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
20396 root          \_ /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
21175 root          \_ /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
[...]
 6864 root     /usr/bin/lxc-start -l WARN -n bacula -f /export/lxc/bacula/config -d -o /var/log/lxc/bacula.log
18646 root          \_ /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
18671 root          \_ /usr/sbin/bacula-sd -u root -g bacula -c /etc/bacula/bacula-sd.conf
29092 root          \_ /usr/sbin/bacula-dir -u root -g bacula -c /etc/bacula/bacula-dir.conf

The lines are:
1: The host's bacula-fd.
2: The grep command.
3-5: Individual containers' bacula-fd:s.
6: A lot more identical bacula-fd:s omitted.
7: The LXC container management system master process for the container called "bacula", which runs my Bacula servers.
8: The Bacula container's bacula-fd.
9: bacula-sd, running in the container "bacula".
10: bacula-dir, running in the container "bacula".

From inside the Bacula server container, the process tree looks like this:

# ps -A --forest -o pid,user,command | grep bacula
 6987 root      \_ grep --colour=auto bacula
  610 root     /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
  631 root     /usr/sbin/bacula-sd -u root -g bacula -c /etc/bacula/bacula-sd.conf
10378 root     /usr/sbin/bacula-dir -u root -g bacula -c /etc/bacula/bacula-dir.conf

These are the same processes as lines 8-10 from the outside, but with different PID:s on the inside. The bacula-fd processes running in other containers are not visible.
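
On kernels that expose the NSpid field in /proc/<pid>/status (an assumption about the running kernel; the field does not appear in any output in this report), the mapping between outside and inside PIDs can be read directly. A minimal sketch, with a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of process $1 as seen in each nested
# PID namespace, outermost first. Prints nothing if the kernel has no NSpid.
nspids() {
    awk '$1 == "NSpid:" { $1 = ""; sub(/^ +/, ""); print }' "/proc/$1/status"
}

# For the calling shell, the last number is its PID in its own namespace.
nspids "$$"
```

Run on the host against one of the container processes listed above, this should show the host-side PID followed by the PID the same process has inside its container.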

> What is not quite clear to me atm is how start-stop-daemon checks if a
> daemon is already running. Maybe WilliamH can comment here.

It looks like simple command line comparison. Not even the running user is taken into account: If I run, as a regular user:

  sleep 17

and in another terminal, as root:

  start-stop-daemon --verbose sleep -- 17

start-stop-daemon claims that the process is already running:

   * Sending signal 0 to PID 13658 ...   [ ok ]
   * start-stop-daemon: sleep is already running
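
The "Sending signal 0" line is an existence probe: signal 0 delivers nothing and only reports whether the target PID exists and may be signalled. Below is a minimal sketch of that probe using the shell's kill (the helper name pid_alive is made up). It says nothing about which user or PID namespace the process belongs to, and when run as root it succeeds for any visible process.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PID $1 exists and we may signal it.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Start a throwaway process and probe it while it runs.
sleep 17 &
child=$!
pid_alive "$child" && before=running || before=gone

# Terminate and reap it, then probe again.
kill "$child"
wait "$child" 2>/dev/null || true
pid_alive "$child" && after=running || after=gone

echo "before: $before, after: $after"
```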

(In reply to Thomas Beierlein from comment #11)
> Well Karl-Johan, could you please do a test for me?

Sure, but probably not today.
Comment 13 Karl-Johan Karlsson 2016-10-30 10:03:04 UTC
(In reply to Thomas Beierlein from comment #11)
> Well Karl-Johan, could you please do a test for me?
> 
> Download the bacula-fd.initd from attachment 450832 and replace
> the pidfile line with
> 
> pidfile=/var/run/bacula-fd.9102.pid
> 
> Try what happens if you just replace the init file in the host with that
> file.

It seems to work just fine.

Here's the standard script, still failing:

# ./bacula-fd stop
 * Stopping bacula file daemon ...                            [ ok ]
# ./bacula-fd start
 * Caching service dependencies ...                           [ ok ]
 * Starting bacula file daemon ...
 * start-stop-daemon: /usr/sbin/bacula-fd is already running  [ !! ]


And here's with the script from the attachment above:

# cp bacula-fd.bugzilla bacula-fd
# ./bacula-fd status
 * status: stopped
# ./bacula-fd start
 * Caching service dependencies ...                           [ ok ]
 * Starting bacula-fd ...                                     [ ok ]


The process is running, with the correct arguments:

# ps -A -o pid,user,command | grep $(cat /var/run/bacula-fd.9102.pid)
27226 root     /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf


The director is happy:

*status client=xyz-fd
Connecting to Client xyz-fd at xyz:9102

xyz-fd Version: 7.4.4 (20 September 2016)  x86_64-pc-linux-gnu gentoo 
Daemon started 30-okt-16 10:46. Jobs: run=0 running=0.


And the process stops when asked to:

# ./bacula-fd stop
 * Stopping bacula-fd ...                                     [ ok ]
# cat /var/run/bacula-fd.9102.pid
cat: /var/run/bacula-fd.9102.pid: No such file or directory
# ps -p 27226
  PID TTY          TIME CMD
#

> If it works please replace at least one of the init files in a container
> with that file too.

That works too; starting, checking status, and stopping.
Comment 14 Thomas Beierlein gentoo-dev 2016-11-08 12:09:55 UTC
Very well Karl-Johan, thanks for the test report (and also for the information about the containers before).

I will now prepare a 7.4.4-r1 in the evening with fixes for the problem.
Comment 15 Thomas Beierlein gentoo-dev 2016-11-10 16:10:49 UTC
Fixed in 7.4.4-r1.

>    app-backup/bacula: Update init.d service scripts bug #595044
>    and fix slot operator for dev-db/postgresql bug #597666