Bug 347209 - app-admin/syslog-ng: init.d script should depend on started network
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High normal
Assignee: Mr. Bones. (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-29 19:33 UTC by Mathias Weigt
Modified: 2011-01-19 04:33 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
Patch for /etc/init.d/syslog-ng and /etc/init.d/rsyslog (syslog-net.patch,583 bytes, patch)
2010-11-29 19:35 UTC, Mathias Weigt
nsswitch.conf (nsswitch.conf,534 bytes, text/plain)
2010-11-30 12:52 UTC, Mathias Weigt
syslog-ng.conf (syslog-ng.conf,1.09 KB, text/plain)
2010-12-07 04:52 UTC, Mathias Weigt
Another syslog-ng.conf which is resulting in syslog-ng failing to start (syslog-ng.conf,5.38 KB, text/plain)
2010-12-11 11:49 UTC, Reuben Farrelly

Description Mathias Weigt 2010-11-29 19:33:09 UTC
Without the attached patch, my LDAP-based systems die after running for some time because no process can write to the syslog (/var/log/messages is empty), even though the syslog daemon is in the process list. It can't be stopped or restarted, only killed with -9. After a while one can't log in anymore and the system behaves very "strangely" (e.g. a lot of cron zombies are lying around and dmesg is full of backtraces). This started after the last update of my systems...

To fix this, one needs to add
 "after net"
inside the
 "depend() {"
block of the start-up script of syslog-ng, rsyslog, or whichever
syslogger is in use.

(See attached scripts).
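
For illustration, the proposed change might look roughly like the sketch below. The `need`, `use`, and `after` stubs are hypothetical stand-ins so that the fragment runs outside the init system, which normally provides these keywords. The `need clock hostname` line is an assumed placeholder for whatever the shipped script already declares, not a quote from it.

```shell
#!/bin/sh
# Stand-ins for the dependency keywords that the init system normally
# provides; here they only print what was requested (assumption).
need()  { echo "need $*"; }
use()   { echo "use $*"; }
after() { echo "after $*"; }

depend() {
    # Placeholder for the script's existing dependencies (assumed).
    need clock hostname
    # Proposed addition: order the logger after networking without
    # turning networking into a hard requirement.
    after net
}

# Show the dependencies this sketch would declare.
depend
```

Because `after` only affects ordering, a machine without any configured network service would still be able to start the logger.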

Here is a (maybe completely stupid) theory about it:
 
Modern sysloggers such as syslog-ng and rsyslog may need network access to
work (... of course...).
But the startup script of syslog-ng fails to detect that network could
be necessary, and rsyslog's start-up script doesn't even take this into
account.

By doing some experimenting (including failing to get a meaningful stack
trace from GDB), I found out that they start without hanging only after
network is up. After checking on other machines, it seems that
openSuSE and Debian are configured exactly that way (they start-up
rsyslog only after the network connection is up). So adding "after net"
to their requirement did fix everything.

And as it is an "after" clause and not a "need" clause, it waits for the
net only on start-up. Service restart and shut-down aren't affected.

I strongly suspect that this is due to the fact that these modern
loggers need some information which is ultimately pulled from the LDAP
in my situation, though I haven't exactly tracked down which piece of
information is needed (lack time and energy to do complete regression
testing properly).
"getent passwd" freezes in a similar manner at that point of the booting
process, so it's good enough for me.

A good suspect is the "hosts: files dns ldap" sequence in nsswitch.conf.
A modern network-aware syslogger will very likely try to get names of
networked devices. But as no DNS is reachable at that point, the DNS lookup fails.
And LDAP needs both network to work (should timeout after an eternity)
and name resolution to find the server (which loops back to "hosts:
files dns ldap" - creating an infinite loop, and explaining the
repeated/corrupted backtrace I get when I try observing the problem with
GDB).
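
One way to check whether a machine is exposed to this is to see if the "hosts:" line in nsswitch.conf lists ldap at all. The following is a minimal sketch. The function names `nss_sources` and `hosts_uses_ldap` are made up for this example, and it assumes the usual "database: source source ..." line layout.

```shell
#!/bin/sh
# Print the lookup sources configured for one database in an
# nsswitch-style file. Usage: nss_sources <database> <file>
nss_sources() {
    awk -v db="$1:" '$1 == db { for (i = 2; i <= NF; i++) print $i }' "$2"
}

# Succeed if host lookups can fall through to LDAP, which in turn
# needs a working network to answer.
hosts_uses_ldap() {
    nss_sources hosts "$1" | grep -qx ldap
}

conf=${1:-/etc/nsswitch.conf}
if [ -r "$conf" ] && hosts_uses_ldap "$conf"; then
    echo "hosts lookups may fall through to ldap"
fi
```

If the check succeeds, any name lookup made before the network is up can end up waiting on LDAP.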


Reproducible: Always

Steps to Reproduce:
1. start the system
2. wait some time (at least one day)
3. try to login (with ssh / console / kdm)
4. killall -9 syslog-ng and restart syslog to get the system working again
Actual Results:  
/var/log/messages stays empty
A lot of cron zombie processes are in the process list: because cron couldn't write to syslog, it is presumed dead and a new cron is started every few minutes...
After a while no login is possible

Expected Results:  
/var/log/messages should be populated after bootup
Comment 1 Mathias Weigt 2010-11-29 19:35:06 UTC
Created attachment 255869 [details, diff]
Patch for /etc/init.d/syslog-ng and /etc/init.d/rsyslog

Don't know if this is the best solution but it works for me...
Comment 2 Jeroen Roovers (RETIRED) gentoo-dev 2010-11-29 19:52:34 UTC
The syslog-ng init.d script already tries to find a network dependency in syslog-ng.conf, and sets "need net" accordingly, so I don't see how it would end up being started before networking, unless the sed script or the case .. esac is bad and doesn't set the dependency. If you depend on LDAP to acquire credentials/privileges, then relying on DNS may render a less than dependable experience, so you should probably go for a static networking setup, and add the LDAP host to /etc/hosts as well, or specify its IP address.

Also, if the syslogger isn't running, then /dev/log simply isn't being read, and as it's a ring buffer, I doubt the syslog being "full" is causing your problems.

Could you attach your syslog-ng.conf and tell us some more about your LDAP setup?

Meanwhile, please file a separate bug report for app-admin/rsyslog as it isn't maintained by the same Gentoo developers. In that bug report, you can refer to the patch attached here and other information, if you like.
Comment 3 Mathias Weigt 2010-11-30 12:51:43 UTC
(In reply to comment #2)
> The syslog-ng init.d script already tries to find a network dependency in
> syslog-ng.conf, and sets "need net" accordingly, so I don't see how it would
> end up being started before networking, unless the sed script or the case ..
> esac is bad and doesn't set the dependency. 

When I follow the boot sequence: first net.lo is started, then, among some other services, syslog-ng is started, and then DHCP starts getting an IP address for my network card. I checked /etc/conf.d/rc and switched RC_NET_STRICT_CHECKING from "no" to "yes", but it didn't help.

> If you depend on LDAP to acquire
> credentials/privileges, then relying on DNS may render a less than dependable
> experience, so you should probably go for a static networking setup, and add
> the LDAP host to /etc/hosts as well, or specify its IP address.

Of course the LDAP host is already in /etc/hosts, and a static networking setup is clearly no alternative as I can't manage >50 PCs in my group with static networking. The "D" in DHCP stands for Dynamic, and it has been easing administration for decades now...

> Could you attach your syslog-ng.conf and tell us some more about your LDAP
> setup?

I haven't modified the syslog-ng.conf.
Well I used to have a NIS master/slave server system running for years - before it stopped working after one of the last ypserv updates (around a year ago).
This basically served user data, passwords, autofs maps, hostnames and a bit more.
Because no one seemed to care about NIS anymore, I decided to go for an LDAP setup (with replication), which then replaced the NIS servers successfully.
Till now the clients used "nss_ldap" and query every little thing directly from the server. For this reason I tried the still experimental "nss_ldapd" module instead of nss_ldap which seems to be a kind of caching daemon. And although this is started rather at the very end of the boot process - now (unmodified) syslog-ng seems to get what it wants from the module (even though the nslcd is not started yet) and does not freeze anymore after startup.

So stabilizing nss_ldapd would also be an option to solve this. 
Comment 4 Mathias Weigt 2010-11-30 12:52:40 UTC
Created attachment 255941 [details]
nsswitch.conf
Comment 5 Jeroen Roovers (RETIRED) gentoo-dev 2010-12-07 02:32:26 UTC
Are you running syslog-ng as non-root? Maybe it tries to obtain credentials through ldap when dropping its root privileges? Assigning anyway as this is over my head already.
Comment 6 Mr. Bones. (RETIRED) gentoo-dev 2010-12-07 02:57:14 UTC
Post your syslog-ng.conf please.
Comment 7 Mathias Weigt 2010-12-07 04:52:39 UTC
Created attachment 256571 [details]
syslog-ng.conf
Comment 8 Mathias Weigt 2010-12-07 04:59:45 UTC
No - syslog-ng seems to be running as root (according to the process list).
Also, I did not change anything in the standard Gentoo syslog-ng installation (stable baselayout -> emerge syslog-ng and rc-update)
Comment 9 Mr. Bones. (RETIRED) gentoo-dev 2010-12-07 05:03:37 UTC
You aren't using any net-related destinations in your config so there's no reason for the service to depend on the net service.
Comment 10 Reuben Farrelly 2010-12-11 11:47:41 UTC
+1 here, seeing the exact same problem, namely syslog-ng failing to start on account of the network interfaces specified in the config file not yet being up.
I will post my config shortly.
Comment 11 Reuben Farrelly 2010-12-11 11:49:05 UTC
Created attachment 256888 [details]
Another syslog-ng.conf which is resulting in syslog-ng failing to start

Error message on startup:

Error binding socket; addr='AF_INET(192.168.10.12:601)', error='Cannot assign requested address (99)'
Error initializing source driver; source='net', id='net#0'
Error initializing message pipeline;
 * start-stop-daemon: failed to start `/usr/sbin/syslog-ng'
 * Failed to start syslog-ng
 [ !! ]
 * ERROR: syslog-ng failed to start
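
Error 99 is EADDRNOTAVAIL on Linux: the address in the bind call was not configured on any local interface at that moment. A rough way to confirm this by hand is sketched below. The helper name `addr_is_assigned` is invented for this example, and it assumes the one-line-per-address output of iproute2's `ip -o -4 addr show`.

```shell
#!/bin/sh
# Succeed if the IPv4 address given as $1 appears in `ip -o -4 addr show`
# style output read from stdin. In that output the field following
# "inet" has the form address/prefix.
addr_is_assigned() {
    awk -v want="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), a, "/")
                    if (a[1] == want) found = 1
                }
        }
        END { exit !found }'
}

# Example with the address from the error message above:
# ip -o -4 addr show | addr_is_assigned 192.168.10.12 && echo "address is up"
```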
Comment 12 Mr. Bones. (RETIRED) gentoo-dev 2011-01-19 04:33:07 UTC
The syslog-ng init service will add "need net" and "use stunnel" to the deps for the service if you have a net-related source or destination in the config file.  If you need it to start after the network is completely up, you should set RC_NET_STRICT_CHECKING=yes in /etc/conf.d/rc; otherwise, it's possible syslog-ng will be started before the necessary net device is up.  If RC_NET_STRICT_CHECKING=yes isn't working correctly, that sounds like a base-system issue unrelated to syslog-ng.
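
To make the idea of a config-dependent dependency concrete, here is a simplified sketch. It is not the logic of the shipped init script. The function names and the pattern (a tcp() or udp() driver outside comments) are assumptions chosen for illustration only.

```shell
#!/bin/sh
# Hypothetical detection: succeed if a syslog-ng style config file
# contains a tcp() or udp() driver outside of comments.
config_uses_network() {
    sed 's/#.*//' "$1" | grep -Eq '(^|[^[:alnum:]_])(tcp|udp)[[:space:]]*\('
}

# Print the dependency lines a script could declare for the given config;
# prints nothing for a purely local configuration.
net_deps_for() {
    if config_uses_network "$1"; then
        echo "need net"
        echo "use stunnel"
    fi
}

# Example: net_deps_for /etc/syslog-ng/syslog-ng.conf
```

A config that only writes to local files produces no output here, so such a setup gets no network dependency.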

The initially reported issue sounds like a dep problem with the ldap service.

In any case, making the syslog-ng service unconditionally be "after net" isn't the right thing.  It's clearly not true for people using syslog-ng only locally.

I'm marking this bug invalid since it seems to be either user error or a problem with a package other than syslog-ng (in which case a separate bug should be filed against that other package).