Bug 293090 - mail-filter/sid-milter-1.0.0-r3 reports status "crashed" if multiple domains are used
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High major
Assignee: Gentoo's Team for Core System packages
Reported: 2009-11-13 18:07 UTC by steveb
Modified: 2009-12-13 16:41 UTC

Attachments
Fix init script (sid-milter.patch, 734 bytes, patch)
2009-12-05 20:31 UTC, Roy Marples

Description steveb 2009-11-13 18:07:26 UTC
According to the documentation one can use a comma separated list of domains to be ignored by sid-milter:
-d domain[,...]
  A comma-separated list of domains whose mail should be ignored by this filter.

If I add multiple domains in /etc/conf.d/sid-filter, the init.d script starts sid-filter but the service status always stays "crashed". Removing the additional domains and keeping just a single domain works without issues.

Not working:
DOMAIN="localhost.localdomain,example.com"

Working:
DOMAIN="example.com"

sid-filter itself is started correctly, but the Gentoo init.d system reports the wrong status.

Reproducible: Always
Comment 1 Alin Năstac (RETIRED) gentoo-dev 2009-11-16 22:50:12 UTC
It works for me on a box with baselayout-1.12.11.1 and I have 4 entries in my DOMAIN list.
What baselayout are you using?
Comment 2 steveb 2009-11-17 09:20:54 UTC
(In reply to comment #1)
> It works for me on a box with baselayout-1.12.11.1 and I have 4 entries in my
> DOMAIN list.
> What baselayout are you using?
> 
sys-apps/baselayout: 2.0.1


Maybe that's the issue? And sys-apps/openrc-0.5.2-r2?

// Steve
Comment 3 Alin Năstac (RETIRED) gentoo-dev 2009-11-17 18:42:22 UTC
Reassigned to base-system team.
Roy, can you help us in this matter?
Comment 4 Roy Marples 2009-11-24 14:10:05 UTC
I suspect it's because it's using the --background flag.
The code looks like it forks, so it's safe to remove.

Also, remove the --quiet flag unless it's too chatty on start.

This looks like a copy n paste init script :/
Comment 5 steveb 2009-11-24 17:26:54 UTC
(In reply to comment #4)
> I suspect it's because it's using the --background flag.
> The code looks like it forks, it's safe to remove.
> 
> Also, remove the --quiet flag unless it's too chatty on start.
> 
That did not help. Still in state crashed after removing --background flag (and the --quiet flag too).


> This looks like a copy n paste init script :/
> 
Comment 6 Roy Marples 2009-11-24 18:06:48 UTC
Does the service create a pidfile in /var/run ?
Comment 7 steveb 2009-11-24 18:40:48 UTC
(In reply to comment #6)
> Does the service create a pidfile in /var/run ?
> 
Yes.


Maybe I should explain what the issue is? Okay. There is a configuration option that accepts multiple values, separated by commas. As soon as I set more than one value and restart the service, Gentoo tells me that the service has crashed.

If I look how the service is started I see this here:
milter   19042  0.0  0.0  18412   816 ?        Ssl  19:34   0:00 /usr/bin/sid-filter -P /var/run/sid-filter.pid -B -T 10 -a /etc/mail/opendkim/PeerList -d vunet.local localhost.localdomain -h -l -p inet:8026@127.0.0.1 -u milter

See that -d followed by two domains? The init.d script is removing those commas and that should not happen. Don't ask me why the script is removing them.

However... the pid file is correctly created:
theia ~ # cat /var/run/sid-filter.pid
19042
theia ~ # ps auxw|grep 19042
milter   19042  0.0  0.0  35084  1176 ?        Ssl  19:34   0:00 /usr/bin/sid-filter -P /var/run/sid-filter.pid -B -T 10 -a /etc/mail/opendkim/PeerList -d vunet.local localhost.localdomain -h -l -p inet:8026@127.0.0.1 -u milter
root     19236  0.0  0.0   1936   576 pts/1    S+   19:38   0:00 grep --colour=auto 19042
theia ~ #

I don't know if this is related to the crashed state Gentoo is reporting?


// Steve
Comment 8 steveb 2009-11-24 19:12:08 UTC
Wait a minute. I am not sure the application creates a pid file by itself. I instructed it to create one by appending -P and a path to the pid file to the parameters. What puzzles me is that with a single domain everything works, but as soon as I add more than one domain (separated by commas) things don't work.
Comment 9 Roy Marples 2009-11-24 20:18:01 UTC
Does it work if you change the start command to this?

start-stop-daemon --start \
     --exec /usr/bin/sid-filter --name sid-filter \
     -- ${SID_FILTER_OPTS}
Comment 10 steveb 2009-11-24 22:39:26 UTC
(In reply to comment #9)
> Does it work if you change the start command to this?
> 
> start-stop-daemon --start \
>      --exec /usr/bin/sid-filter --name sid-filter \
>      -- ${SID_FILTER_OPTS}
> 
Nope. Same problem.
Comment 11 steveb 2009-11-24 22:48:18 UTC
The comma being replaced by a space is not the real issue. The application does that itself: if you start it by hand and then check with ps, you see the process running with parameters that have no comma, and the application works. So that's normal.
Comment 12 steveb 2009-12-05 13:13:01 UTC
(In reply to comment #4)
> I suspect it's because it's using the --background flag.
> The code looks like it forks, it's safe to remove.
> 
> Also, remove the --quiet flag unless it's too chatty on start.
> 
> This looks like a copy n paste init script :/
> 
Even when using this init.d script here -> http://bugs.gentoo.org/attachment.cgi?id=212149 <- it does not work.

I still have the same issue:
--------------------------------------
theia ~ # /etc/init.d/sid-filter restart
* Caching service dependencies...             [ ok ]
* Stopping Sender-ID Filter...                [ ok ]
* Starting Sender-ID Filter...                [ ok ]
theia ~ # /etc/init.d/sid-filter status
* status: crashed
theia ~ # cat /var/run/sid-filter.pid
3092
theia ~ # ps auxww|grep -i sid
milter    3092  0.0  0.0  26888  1108 ?        Ssl  12:53   0:00 sid-filter -P /var/run/sid-filter.pid -B -T 10 -a /etc/mail/opendkim/PeerList -d vunet.local localhost.localdomain -h -l -p inet:8026@127.0.0.1 -u milter
root      3174  0.0  0.0   1940   572 pts/0    S+   12:54   0:00 grep --colour=auto -i sid
theia ~ # ls -lah /var/run/sid-filter.pid
-rw-r--r-- 1 milter root 5 Dec  5 12:53 /var/run/sid-filter.pid
theia ~ # find /lib/rc/init.d/ -name "sid-filter"
/lib/rc/init.d/daemons/sid-filter
/lib/rc/init.d/started/sid-filter
theia ~ #
--------------------------------------

To me this looks like a bug in sys-apps/openrc (I use 0.5.2-r2 but updated it to 0.5.3 now) when used with sys-apps/baselayout (I use 2.0.1).

@Roy: Can you explain to me how Gentoo is checking for the pid? Why is service_crashed() reporting the service as crashed?

I still think the replacement of the "," in the switch -d is the issue. If I leave everything as it is then it's not working:
--------------------------------------
theia ~ # cat /lib/rc/init.d/daemons/sid-filter/001
exec=/usr/bin/sid-filter
argv_0=sid-filter
argv_1=-P
argv_2=/var/run/sid-filter.pid
argv_3=-B
argv_4=-T
argv_5=10
argv_6=-a
argv_7=/etc/mail/opendkim/PeerList
argv_8=-d
argv_9=vunet.local,localhost.localdomain
argv_10=-h
argv_11=-l
argv_12=-p
argv_13=inet:8026@127.0.0.1
argv_14=-u
argv_15=milter
pidfile=
theia ~ # /etc/init.d/sid-filter status
* status: crashed
theia ~ #
--------------------------------------

Now if I modify that parameter list by hand to split at the "," then I get this result (look at the lines from argv_9 onward, which I changed manually and renumbered):
--------------------------------------
theia ~ # cat /lib/rc/init.d/daemons/sid-filter/001
exec=/usr/bin/sid-filter
argv_0=sid-filter
argv_1=-P
argv_2=/var/run/sid-filter.pid
argv_3=-B
argv_4=-T
argv_5=10
argv_6=-a
argv_7=/etc/mail/opendkim/PeerList
argv_8=-d
argv_9=vunet.local
argv_10=localhost.localdomain
argv_11=-h
argv_12=-l
argv_13=-p
argv_14=inet:8026@127.0.0.1
argv_15=-u
argv_16=milter
pidfile=
theia ~ # /etc/init.d/sid-filter status
* status: started
theia ~ #
--------------------------------------

Aha! So OpenRC has an issue if the process has been started with parameters that get changed by the running process:
--------------------------------------
theia ~ # hexdump -C /proc/3092/cmdline
00000000  73 69 64 2d 66 69 6c 74  65 72 00 2d 50 00 2f 76  |sid-filter.-P./v|
00000010  61 72 2f 72 75 6e 2f 73  69 64 2d 66 69 6c 74 65  |ar/run/sid-filte|
00000020  72 2e 70 69 64 00 2d 42  00 2d 54 00 31 30 00 2d  |r.pid.-B.-T.10.-|
00000030  61 00 2f 65 74 63 2f 6d  61 69 6c 2f 6f 70 65 6e  |a./etc/mail/open|
00000040  64 6b 69 6d 2f 50 65 65  72 4c 69 73 74 00 2d 64  |dkim/PeerList.-d|
00000050  00 76 75 6e 65 74 2e 6c  6f 63 61 6c 00 6c 6f 63  |.vunet.local.loc|
00000060  61 6c 68 6f 73 74 2e 6c  6f 63 61 6c 64 6f 6d 61  |alhost.localdoma|
00000070  69 6e 00 2d 68 00 2d 6c  00 2d 70 00 69 6e 65 74  |in.-h.-l.-p.inet|
00000080  3a 38 30 32 36 40 31 32  37 2e 30 2e 30 2e 31 00  |:8026@127.0.0.1.|
00000090  2d 75 00 6d 69 6c 74 65  72 00                    |-u.milter.|
0000009a
theia ~ #
--------------------------------------

Is this a bug in OpenRC, or do you consider it a bug in the started application/service?
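
The mismatch visible in the hexdump can be reproduced from a shell. The following is only a minimal sketch of the idea, not OpenRC's actual code: the helper names proc_args and args_match are hypothetical, it assumes a Linux /proc, and it would be confused by arguments that contain newlines.

```shell
# Print the argument vector the kernel currently reports for a PID,
# one argument per line (/proc/PID/cmdline is NUL-separated).
proc_args() {
    tr '\0' '\n' < "/proc/$1/cmdline"
}

# Succeed only if the live argv equals the expected arguments exactly.
# Usage: args_match PID ARG0 ARG1 ...
args_match() {
    pid=$1
    shift
    expected=$(printf '%s\n' "$@")
    [ "$(proc_args "$pid")" = "$expected" ]
}
```

With the recorded argv from the daemons file (argv_9=vunet.local,localhost.localdomain) such a comparison would fail against the running process, because the process now reports the two domains as separate arguments.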

Not modifying the parameters but telling OpenRC where to find the pid solves the issue as well:
--------------------------------------
theia ~ # cat /lib/rc/init.d/daemons/sid-filter/001
exec=/usr/bin/sid-filter
argv_0=sid-filter
argv_1=-P
argv_2=/var/run/sid-filter.pid
argv_3=-B
argv_4=-T
argv_5=10
argv_6=-a
argv_7=/etc/mail/opendkim/PeerList
argv_8=-d
argv_9=vunet.local,localhost.localdomain
argv_10=-h
argv_11=-l
argv_12=-p
argv_13=inet:8026@127.0.0.1
argv_14=-u
argv_15=milter
pidfile=/var/run/sid-filter.pid
theia ~ # /etc/init.d/sid-filter status
* status: started
theia ~ #
--------------------------------------

How can I force the entry pidfile to be written by the init.d script?
Comment 13 Roy Marples 2009-12-05 20:31:47 UTC
Created attachment 212176
Fix init script

This patches the referenced init script to work on pidfiles and should fix the problem.
Comment 14 Roy Marples 2009-12-05 20:33:45 UTC
(In reply to comment #12)
> Aha! So OpenRC has an issue if the process has been started with parameters
> that get changed by the running process:

...

> Is this something that is a bug in OpenRC or do you consider this a bug of
> the started application/service?

That's a bug in the init script as it then cannot reliably find the process. Using pidfiles all around addresses this.
Comment 15 steveb 2009-12-05 23:06:47 UTC
(In reply to comment #13)
> Created an attachment (id=212176) [details]
> Fix init script
> 
> This patches the referenced init script to work on pidfiles and should fix the
> problem.
> 
I don't need to apply the patches to tell you this:
Nope. It's not fixing the issue.

I know that because I have already tried what you are trying to do with the patched init.d script.

The problem (and later a possible solution) is this:
If you start sid-filter with start-stop-daemon and use start-stop-daemon's --pidfile option, then the following scenarios are possible:

1) You HAVE NOT instructed sid-filter to create a pid file (you DON'T USE -P switch in sid-filter)

2) You HAVE instructed sid-filter to create a pid file (you USE -P switch in sid-filter)

In both situations the pid file does not get created. So let's try to use --make-pidfile to force the creation. Doing so then results in: A pid file gets created but the pid referenced in the pid file is not a running process.

The daemon (sid-filter), however, is running with PID+1 (where PID is the content of the pid file created by start-stop-daemon). This leads me to believe that start-stop-daemon is indeed starting sid-filter, that the pid it records is the one sid-filter had before it forked, and that the forked daemon is one pid away from that.

This behavior seems logical to me, and if my logic is not fooling me then let's instruct sid-filter NOT to fork and let start-stop-daemon start it in the background and manage the pid file. To test that I added the -f switch (don't fork-and-exit) to the sid-filter options and added --pidfile, --make-pidfile and --background to the init.d script. And guess what? IT WORKS.
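
A rough sketch of what such an init.d script could look like follows. It is an illustration under assumptions, not the script that ships with the package: the PIDFILE variable is hypothetical, SID_FILTER_OPTS is assumed to come from /etc/conf.d/sid-filter, ebegin/eend are the usual helpers provided by the init system, and the runscript shebang line that a real init script needs is left out.

```shell
# Hypothetical variable; the path matches the one used earlier in this report.
PIDFILE=/var/run/sid-filter.pid

start() {
    ebegin "Starting Sender-ID Filter"
    # -f keeps sid-filter in the foreground (no fork-and-exit), so
    # start-stop-daemon backgrounds it itself and records the right PID.
    # SID_FILTER_OPTS is left unquoted on purpose so it splits into words.
    start-stop-daemon --start --background \
        --make-pidfile --pidfile "${PIDFILE}" \
        --exec /usr/bin/sid-filter \
        -- -f ${SID_FILTER_OPTS}
    eend $?
}

stop() {
    ebegin "Stopping Sender-ID Filter"
    start-stop-daemon --stop --pidfile "${PIDFILE}"
    eend $?
}
```

Because the pid file is written by start-stop-daemon, the status check no longer has to rely on matching the command line that the daemon rewrites.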

So from my viewpoint we have two issues:

1) start-stop-daemon cannot easily handle a forking daemon that needs to have its pid file created by start-stop-daemon.

My answer/statement to this: Well... the man page does say that not everything will work right if the daemon to be started forks and you instruct start-stop-daemon to create a pid file for it. => Works as designed.

2) If start-stop-daemon starts a daemon with parameters, but the daemon (for whatever reason) reports to the OS other parameters than the ones it was started with, then OpenRC has trouble finding the process and claims the service has crashed.

My answer/statement to this: You said that this is a bug in the application and should be fixed. Okay. Point taken. I am going to report that upstream. One thing you should think about, however: without OpenRC this works perfectly. Alin does not have the issue on his Gentoo box, which does not use OpenRC. Now I do understand your point, and strictly speaking (from a mathematical viewpoint) you are right: the parameters "-a -b x,y" are not equal to "-a -b x y".

I don't know how the old Gentoo setup searched for the pid. Maybe it just read the pid file, looked in /proc to find the executable and, if found, marked the service as running without checking whether the parameters were the same as at start. You probably added stricter checking to OpenRC, and that's a good thing, but I would appreciate it if you could add a way to tell OpenRC to fall back to less strict checking of the process id. I don't mean that this should be on by default, but it would be very user friendly if one could switch such behavior on or off from within the init.d script. Maybe something like this:
status() {
  RC_LAZY_PID_CHECK=1
  _status
}

But it's up to you. I don't want to tell you what you should and should not do. For me it's clear now what the problem is, and I know how to handle the issue until it gets fixed upstream. And if upstream does not fix it, then I have a workaround for the problem.

// Steve
Comment 16 steveb 2009-12-05 23:33:46 UTC
(In reply to comment #14)
> (In reply to comment #12)
> > Aha! So OpenRC has an issue if the process has been started with parameters
> > that get changed by the running process:
> 
> ...
> 
> > Is this something that is a bug in OpenRC or do you consider this a bug of
> > the started application/service?
> 
> That's a bug in the init script as it then cannot reliably find the process.
> Using pidfiles all around addresses this.
> 
Come on. It can find the process. The binary started by start-stop-daemon and the one referenced in the pid file are the same. The only difference is one single character in the parameter list. If we say that the parameters need to be 100% equal, then you are right: the referenced process does not match the one currently running. If I were to code something like that, I would:

1) check if the pid is there -> [yes = go to 2 | no = fail]

2) check if the exe referenced by the pid is the same as recorded by start-stop-daemon -> [yes = go to 3 | no = fail]

3) check if the parameters recorded by start-stop-daemon are the same as in /proc -> [yes = success | no = go to 4]

4) are we instructed by start-stop-daemon to use some kind of legacy/lazy pid checking -> [yes = go to 8 | no = go to 5]

5) look if any other daemon started with start-stop-daemon is claiming to have that pid -> [yes = fail | no = go to 6]

6) is the service using --name -> [yes = go to 7 | no = fail]

7) check if the --name matches the one recorded by start-stop-daemon -> [yes = success | no = fail]

8) look if any other daemon started with start-stop-daemon is claiming to have that pid -> [yes = fail | no = go to 9]

9) is the service using --name -> [yes = go to 10 | no = success]

10) check if the --name matches the one recorded by start-stop-daemon -> [yes = success | no = fail]
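
The ten steps above collapse into a fairly small decision function. The sketch below is purely illustrative and is not OpenRC code: the function name check_service is hypothetical, and each argument is simply the yes/no answer to one of the questions in the list.

```shell
# Arguments are the answers (yes/no) to the questions in the list:
#   $1 pid exists        $2 exe matches       $3 parameters match
#   $4 lazy checking     $5 pid claimed by another daemon
#   $6 --name is used    $7 --name matches
check_service() {
    [ "$1" = yes ] || return 1      # step 1
    [ "$2" = yes ] || return 1      # step 2
    [ "$3" = yes ] && return 0      # step 3: exact match, done
    [ "$5" = no ] || return 1       # steps 5 and 8
    if [ "$6" = yes ]; then         # steps 6 and 9
        [ "$7" = yes ]              # steps 7 and 10
    else
        [ "$4" = yes ]              # without --name only lazy mode passes
    fi
}
```

Written this way, the only difference between strict and lazy checking is the case where the parameters differ and no --name was given: strict mode fails there, lazy mode succeeds.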


// Steve
Comment 17 Roy Marples 2009-12-12 16:46:55 UTC
Lazy matching(In reply to comment #15)
> 1) You HAVE NOT instructed sid-filter to create a pid file (you DON'T USE -P
> switch in sid-filter)
> 
> 2) You HAVE instructed sid-filter to create a pid file (you USE -P switch in
> sid-filter)
> 
> In both situations the pid file does not get created. So let's try to use
> --make-pidfile to force the creation. Doing so then results in: A pid file gets
> created but the pid referenced in the pid file is not a running process.

--make-pidfile is to be used when the daemon itself cannot fork. The man page does state this; as you see, it's PID+1. Some systems assign PIDs randomly, though, so we cannot guarantee PID+1.

If sid-milter is not making a valid pidfile when instructed to do so then it's a bug in sid-milter. Go fix it.
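
Whether a pid file is valid in this sense can be checked by hand. A minimal sketch, assuming a Linux /proc; the helper name pidfile_alive is hypothetical:

```shell
# Succeed if FILE contains the PID of a process that currently exists.
pidfile_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    case $pid in
        ''|*[!0-9]*) return 1 ;;    # empty or not a plain number
    esac
    [ -d "/proc/$pid" ]
}
```

For example, `pidfile_alive /var/run/sid-filter.pid && echo valid` tells whether the file written via -P points at a live process. It does not tell whether that process is actually sid-filter.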
Comment 18 steveb 2009-12-13 16:41:09 UTC
(In reply to comment #17)
> Lazy matching(In reply to comment #15)
> > 1) You HAVE NOT instructed sid-filter to create a pid file (you DON'T USE -P
> > switch in sid-filter)
> > 
> > 2) You HAVE instructed sid-filter to create a pid file (you USE -P switch in
> > sid-filter)
> > 
> > In both situations the pid file does not get created. So let's try to use
> > --make-pidfile to force the creation. Doing so then results in: A pid file gets
> > created but the pid referenced in the pid file is not a running process.
> 
> --make-pidfile is to be used when the daemon itself cannot fork. The man pages
> does state this, as you see it's PID+1. Although some systems have random PID
> assignments so we cannot guarantee PID+1.
> 
> If sid-milter is not making a valid pidfile when instructed to do so then it's
> a bug in sid-milter. Go fix it.
> 
Okay. I think everything regarding this bug and OpenRC has been discussed and since this is not considered a bug in Gentoo nor in OpenRC I am setting the status to "UPSTREAM".