| Summary: | app-admin/syslog-ng: init.d script should depend on started network | ||
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Mathias Weigt <weigt.mathias> |
| Component: | [OLD] Core system | Assignee: | Mr. Bones. (RETIRED) <mr_bones_> |
| Status: | RESOLVED INVALID | ||
| Severity: | normal | CC: | reuben-gentoo-bugzilla |
| Priority: | High | ||
| Version: | unspecified | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Package list: | Runtime testing required: | --- | |
| Attachments: |
Patch for /etc/init.d/syslog-ng and /etc/init.d/rsyslog
nsswitch.conf syslog-ng.conf Another syslog-ng.conf which is resulting in syslog-ng failing to start |
||
Created attachment 255869 [details, diff]
Patch for /etc/init.d/syslog-ng and /etc/init.d/rsyslog
Don't know if this is the best solution but it works for me...
The syslog-ng init.d script already tries to find a network dependency in syslog-ng.conf, and sets "need net" accordingly, so I don't see how it would end up being started before networking, unless the sed script or the case .. esac is bad and doesn't set the dependency.

If you depend on LDAP to acquire credentials/privileges, then relying on DNS may render a less than dependable experience, so you should probably go for a static networking setup, and add the LDAP host to /etc/hosts as well, or specify its IP address.

Also, if the syslogger isn't running, then /dev/log simply isn't being read, and as it's a ring buffer, I doubt the syslog being "full" is causing your problems.

Could you attach your syslog-ng.conf and tell us some more about your LDAP setup?

Meanwhile, please file a separate bug report for app-admin/rsyslog as it isn't maintained by the same Gentoo developers. In that bug report, you can refer to the patch attached here and other information, if you like.

(In reply to comment #2)
> The syslog-ng init.d script already tries to find a network dependency in
> syslog-ng.conf, and sets "need net" accordingly, so I don't see how it would
> end up being started before networking, unless the sed script or the case ..
> esac is bad and doesn't set the dependency.

When I follow the boot sequence: first net.lo is started, then among some other services syslog-ng is started, and only then does DHCP start obtaining an IP address for my network card. I checked /etc/conf.d/rc and switched RC_NET_STRICT_CHECKING from "no" to "yes", but it didn't help.

> If you depend on LDAP to acquire
> credentials/privileges, then relying on DNS may render a less than dependable
> experience, so you should probably go for a static networking setup, and add
> the LDAP host to /etc/hosts as well, or specify its IP address.
Of course the LDAP host is already in /etc/hosts, and a static networking setup is clearly no alternative, as I can't manage >50 PCs in my group with static networking. The "D" in DHCP stands for Dynamic, and DHCP and DNS have been easing administration for decades now...

> Could you attach your syslog-ng.conf and tell us some more about your LDAP
> setup?

I haven't modified the syslog-ng.conf.

Well, I used to have a NIS master/slave server system running for years, before it stopped working after one of the last ypserv updates (around a year ago). This basically served user data, passwords, autofs maps, hostnames and a bit more. Because no one seemed to care about NIS anymore, I decided to go for an LDAP setup (with replication), which then replaced the NIS servers successfully.

Until now the clients used "nss_ldap" and queried every little thing directly from the server. For this reason I tried the still experimental "nss_ldapd" module, which seems to be a kind of caching daemon, instead of nss_ldap. And although this is started rather at the very end of the boot process, (unmodified) syslog-ng now seems to get what it wants from the module (even though nslcd is not started yet) and does not freeze anymore after startup. So stabilizing nss_ldapd would also be an option to solve this.

Created attachment 255941 [details]
nsswitch.conf
Are you running syslog-ng as non-root? Maybe it tries to obtain credentials through ldap when dropping its root privileges? Assigning anyway as this is over my head already.

Post your syslog-ng.conf please.

Created attachment 256571 [details]
syslog-ng.conf
No - syslog-ng seems to be running as root (according to the process list). Also, I did not change anything in the standard Gentoo syslog-ng installation (stable baselayout -> emerge syslog-ng and rc-update).

You aren't using any net-related destinations in your config, so there's no reason for the service to depend on the net service.

+1 here, seeing the exact same problem, namely syslog-ng failing to start on account of the network interfaces specified in the config file not yet being up. I will post my config shortly.

Created attachment 256888 [details]
Another syslog-ng.conf which is resulting in syslog-ng failing to start
Error message on startup:
Error binding socket; addr='AF_INET(192.168.10.12:601)', error='Cannot assign requested address (99)'
Error initializing source driver; source='net', id='net#0'
Error initializing message pipeline;
* start-stop-daemon: failed to start `/usr/sbin/syslog-ng'
* Failed to start syslog-ng
[ !! ]
* ERROR: syslog-ng failed to start
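On Linux, error 99 is `EADDRNOTAVAIL`: the address being bound is not configured on any local interface at the time of the call, which is consistent with an interface that has not received its address yet. A minimal sketch of a check for that condition follows, assuming iproute2's `ip` is installed; `has_addr` is a hypothetical helper name for illustration and is not part of syslog-ng or the init system.

```shell
# Hypothetical helper: reads `ip -o -4 addr show` output on stdin and
# succeeds only if the given IPv4 address is assigned to some interface.
has_addr() {
    awk -v want="$1" '
        # With -o, field 4 is ADDRESS/PREFIXLEN, e.g. 192.168.10.12/24.
        { split($4, parts, "/"); if (parts[1] == want) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

It could be used as `ip -o -4 addr show | has_addr 192.168.10.12 && echo "address is configured"` shortly before the service starts, to confirm whether the bind address from the error message is present yet.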
The syslog-ng init service will add "need net" and "use stunnel" to the deps for the service if you have a net-related source or destination in the config file. If you need it to start after the network is completely up, you should set RC_NET_STRICT_CHECKING=yes in /etc/conf.d/rc; otherwise, it's possible syslog-ng will be started before the necessary net device is up. If RC_NET_STRICT_CHECKING=yes isn't working correctly, that sounds like a base-system issue unrelated to syslog-ng.

The initially reported issue sounds like a dep problem with the ldap service. In any case, making the syslog-ng service unconditionally be "after net" isn't the right thing. It's clearly not true for people using syslog-ng only locally.

I'm marking this bug invalid since it seems like either user error or a problem with some package other than syslog-ng (in which case a separate bug should be filed against that other package). |
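The conditional dependency described in this comment (scan syslog-ng.conf and add "need net" and "use stunnel" only when a network source or destination is configured) could be sketched roughly as follows. This is an illustration and not the shipped Gentoo init script: the helper name `net_deps_for_config` and the exact match patterns are assumptions, and only the general sed-plus-case approach is mentioned in this thread.

```shell
# Hypothetical helper: print the dependency lines a given config calls for.
# Prints nothing when the config has no tcp/udp source or destination.
net_deps_for_config() {
    conf=$1
    # Strip comments first so a commented-out tcp()/udp() driver is ignored.
    case $(sed 's/#.*//' "$conf") in
        *source*tcp*|*source*udp*|*destination*tcp*|*destination*udp*)
            echo "need net"
            echo "use stunnel"
            ;;
    esac
}
```

Running `net_deps_for_config /etc/syslog-ng/syslog-ng.conf` (the path is assumed) on a purely local configuration should print nothing, which is why a local-only setup would get no network dependency at all. A coarse textual match like this can also miss network drivers that are spelled differently, so an empty result is only a hint.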
Without the attached patch my LDAP-based systems die after running for some time, because no process can write to the syslog (/var/log/messages is empty) even though the syslogger is in the process list. It can't be stopped or restarted - only killed with -9. After a while one can't log in anymore and the system behaves very "strangely" (e.g. a lot of cron zombies are lying around and dmesg is full of backtraces). This started after the last update of my systems...

To fix this, one needs to add "after net" inside the "depend() {" block of the start-up script of syslog-ng, rsyslog, or whatever the syslogger is. (See attached scripts.)

Here is a (maybe completely stupid) theory about it: modern sysloggers such as syslog-ng and rsyslog may need network access to work (... of course...). But the start-up script of syslog-ng fails to detect that the network could be necessary, and rsyslog's start-up script doesn't even take this into account. By doing some experimenting (including failing to get a meaningful stack trace from GDB), I found out that they start without hanging only after the network is up. After checking on other machines, it seems that openSuSE and Debian are configured exactly that way (they start rsyslog only after the network connection is up). So adding "after net" to their requirements did fix everything. And as it is an "after" clause and not a "need" clause, it waits for the net only on start-up. Service restart and shut-down aren't affected.

I strongly suspect that this is because these modern loggers need some information which is ultimately pulled from LDAP in my situation, though I haven't tracked down exactly which piece of information is needed (I lack the time and energy to do complete regression testing properly). "getent passwd" freezes in a similar manner at that point of the boot process, so it's good enough for me. A good suspect is the "hosts: files dns ldap" sequence in nsswitch.conf.
A modern network-aware syslogger will very likely try to get the names of networked devices. But as no DNS is reachable at that point, the DNS lookup fails. And LDAP needs both the network to work (it should time out after an eternity) and name resolution to find the server (which loops back to "hosts: files dns ldap" - creating an infinite loop, and explaining the repeated/corrupted backtrace I get when I try observing the problem with GDB).

Reproducible: Always

Steps to Reproduce:
1. start the system
2. wait some time (at least one day)
3. try to log in (with ssh / console / kdm)
4. killall -9 syslog-ng and restart syslog to get the system working again

Actual Results:
/var/log/messages stays empty.
A lot of cron zombie processes are in the process list - because cron couldn't write to syslog, it is presumed dead and a new cron is started every few minutes...
After a while no login is possible.

Expected Results:
/var/log/messages should be populated after bootup
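For reference, the workaround proposed in this report, an "after net" line in the init script's dependency function (named `depend` in these scripts), would look roughly like the fragment below. This is a sketch and not the attached patch: the real script's other dependency lines are omitted, and `after` is a keyword supplied by the init system when it sources the script, so nothing here defines it.

```shell
# Fragment of an init script's dependency function (sketch only; the
# other dependency lines of the real script are omitted here).
depend() {
    # Ordering hint: if net is being started, start this service after it.
    # Unlike "need net", this does not make net a hard requirement.
    after net
}
```

The choice of `after` rather than `need` matches the reporter's point that only start-up ordering changes. As the maintainer's reply notes, adding the line unconditionally would also apply to systems that only log locally.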