I cannot say which version started failing, but restarting syslog-ng

# pgrep -lfa syslog
3797 /usr/bin/python3.6 /usr/bin/fail2ban-server --async -b -s /run/fail2ban/fail2ban.sock -p /run/fail2ban/fail2ban.pid --loglevel INFO --logtarget /var/log/fail2ban.log --syslogsocket auto
4775 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng/syslog-ng.conf --control /run/syslog-ng.ctl --persist-file /var/lib/syslog-ng/syslog-ng.persist --pidfile /run/syslog-ng.pid
29625 supervising syslog-ng
# /etc/init.d/syslog-ng stop
 * Stopping fail2ban ... [ ok ]
 * Stopping syslog-ng ...
 * start-stop-daemon: no matching processes found [ ok ]
# pgrep -lfa syslog
5177 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng/syslog-ng.conf --control /run/syslog-ng.ctl --persist-file /var/lib/syslog-ng/syslog-ng.persist --pidfile /run/syslog-ng.pid
29625 supervising syslog-ng

After manually killing all related processes:

# pgrep -lfa syslog
5253 supervising syslog-ng
5254 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng/syslog-ng.conf --control /run/syslog-ng.ctl --persist-file /var/lib/syslog-ng/syslog-ng.persist --pidfile /run/syslog-ng.pid
# cat /run/syslog-ng.pid
5303
# cat /run/syslog-ng.pid
5366
# cat /run/syslog-ng.pid
5370

The process writing to /run/syslog-ng.pid is apparently not the supervisor.
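For anyone who wants to check the same thing on their own system, here is a rough sketch of a helper that prints the PID stored in a PID file together with its parent PID and command line, which shows at a glance whether the recorded process is the supervisor or one of its children. The function name describe_pidfile is made up for illustration, and the sketch assumes a procps-style ps that accepts -o and -p.

```shell
#!/bin/sh
# Hypothetical helper (not part of syslog-ng or OpenRC):
# show PID, parent PID and command line of the process named in a PID file.
describe_pidfile() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "cannot read $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    # ps exits non-zero when the PID does not exist, i.e. the PID file is stale
    if ! ps -o pid= -o ppid= -o args= -p "$pid"; then
        echo "stale PID file: no process $pid" >&2
        return 2
    fi
}
```

Something like "describe_pidfile /run/syslog-ng.pid" can then be compared against the output of "pgrep -lfa syslog-ng".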
Are you using the default configuration?
(In reply to Tomáš Mózes from comment #1)
> Are you using the default configuration?

No, this is a local service that additionally collects syslogs from other systems in the vicinity, so it spawns child processes as needed when these systems send log entries. That may be related, but I shouldn't think catching the proper supervisor PID would have any bearing on that.
Can you please share your configuration so I can reproduce it locally? Thank you
Created attachment 589758: syslog-ng.conf
I see the same problem on other systems with a less complicated configuration.

karsten ~ # /etc/init.d/syslog-ng --nodeps restart
 * Stopping syslog-ng ... [ ok ]
 * Checking your configfile (/etc/syslog-ng/syslog-ng.conf) ...
[2019-09-14T13:15:55.474483] Plugin module not found in 'module-path'; module-path='/usr/lib/syslog-ng', module='http'
 [ ok ]
 * Starting syslog-ng ...
[2019-09-14T13:15:55.773470] Plugin module not found in 'module-path'; module-path='/usr/lib/syslog-ng', module='http'
 [ ok ]
karsten ~ #
karsten ~ # pgrep -lfa syslog-ng
3529 supervising syslog-ng
15062 supervising syslog-ng
15063 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng/syslog-ng.conf --control /run/syslog-ng.ctl --persist-file /var/lib/syslog-ng/syslog-ng.persist --pidfile /run/syslog-ng.pid
15072 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng/syslog-ng.conf --control /run/syslog-ng.ctl --persist-file /var/lib/syslog-ng/syslog-ng.persist --pidfile /run/syslog-ng.pid
karsten ~ # cat /run/syslog-ng.pid
15872
karsten ~ # pgrep -lfa syslog-ng
15130 supervising syslog-ng
15872 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng/syslog-ng.conf --control /run/syslog-ng.ctl --persist-file /var/lib/syslog-ng/syslog-ng.persist --pidfile /run/syslog-ng.pid
I guess this has to do with opening and closing TCP connections:

Sep 14 13:32:03 karsten syslog-ng[16057]: Syslog connection established; fd='17', server='AF_INET(192.168.242.68:6666)', local='AF_INET(0.0.0.0:0)'
Sep 14 13:32:03 karsten syslog-ng[16057]: EOF occurred while idle; fd='17'
Sep 14 13:32:03 karsten syslog-ng[16057]: Syslog connection closed; fd='17', server='AF_INET(192.168.242.68:6666)', time_reopen='60'
Sep 14 13:35:04 karsten syslog-ng[16057]: Syslog connection failed; fd='4', server='AF_INET(192.168.242.68:6666)', error='Connection refused (239)', time_reopen='60'
Sep 14 13:35:21 karsten dhcpcd[15903]: eth0: DHCPv6 REPLY: No Addresses Available
Sep 14 13:36:04 karsten syslog-ng[16057]: Syslog connection failed; fd='18', server='AF_INET(192.168.242.68:6666)', error='Connection refused (239)', time_reopen='60'
Sep 14 13:37:04 karsten syslog-ng[16057]: Syslog connection established; fd='18', server='AF_INET(192.168.242.68:6666)', local='AF_INET(0.0.0.0:0)'
Sep 14 13:38:04 karsten syslog-ng[16057]: EOF occurred while idle; fd='18'
Sep 14 13:38:04 karsten syslog-ng[16057]: Syslog connection closed; fd='18', server='AF_INET(192.168.242.68:6666)', time_reopen='60'
Sep 14 13:39:04 karsten syslog-ng[16057]: Syslog connection failed; fd='4', server='AF_INET(192.168.242.68:6666)', error='Connection refused (239)', time_reopen='60'
Sep 14 13:40:01 karsten CROND[16145]: (root) CMD (/usr/lib/sa/sa1 1 1)
Sep 14 13:40:02 karsten syslog-ng[16148]: syslog-ng starting up; version='3.22.1'
Sep 14 13:40:02 karsten syslog-ng[16148]: Syslog connection failed; fd='14', server='AF_INET(192.168.242.68:6666)', error='Connection refused (239)', time_reopen='60'
Sep 14 13:41:02 karsten syslog-ng[16148]: Syslog connection established; fd='4', server='AF_INET(192.168.242.68:6666)', local='AF_INET(0.0.0.0:0)'
Sep 14 13:41:04 karsten syslog-ng[16148]: EOF occurred while idle; fd='4'
Sep 14 13:41:04 karsten syslog-ng[16148]: Syslog connection closed; fd='4', server='AF_INET(192.168.242.68:6666)', time_reopen='60'

but that still wouldn't explain why the supervisor is no longer in control of the PID file. Bad (re)design?
I have what is probably a related bug/problem. I use a "destination program();" facility. Starting with 3.22.1, the standard /etc/init.d/syslog-ng script behaved correctly on start, but on stop the supervising daemon restarted syslog-ng (checked with "ps aux | grep syslog-ng"). Also, the program that was spawned by use of "destination program();" kept running. Even after killing syslog-ng with pkill or similar, the program spawned by syslog-ng kept running.

I keyworded 3.23.1-r1 and both symptoms were resolved: (1) syslog-ng wasn't really stopped (the PID file was removed, but the supervising daemon respawned syslog-ng), and (2) the destination program was not killed when killing all of syslog-ng.

I believe the issue is limited to 3.22. It appeared for me sometime after an August 14 upgrade from 3.17.2.
Details with remarks. Prompt is user, machine, screen window [8], history number, working directory, #

The relevant syslog-ng.conf entry that uses a homebrew "watch-logs" program is

destination watch-logs { program("/usr/local/sbin/watch-logs" ts_format(unix)); };

root@hypoid-2 [8] 121 /root # /etc/init.d/syslog-ng start
 * Caching service dependencies ... [ ok ]
 * Checking your configfile (/etc/syslog-ng/syslog-ng.conf) ... [ ok ]
 * Starting syslog-ng ... [ ok ]
root@hypoid-2 [8] 122 /root # ps aux | grep -e syslog-ng -e watch-l
root 18802 0.0 0.0 8128 356 ? S 10:58 0:00 supervising syslog-ng
root 18803 0.2 0.0 27816 6564 ? Ssl 10:58 0:00 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng syslog-ng.conf --control /run/syslog-ng.ctl --persist-file /var/lib/syslog-ng/syslog-ng.persist --pidfile /run/syslog-ng.pid
root 18804 0.8 0.0 8684 3512 ? S 10:58 0:00 /bin/bash /usr/local/sbin/watch-logs
root 18959 0.0 0.0 7816 792 pts/31 S+ 10:58 0:00 grep --color=auto -e syslog-ng -e watch-l

[note sequential PID, 18802, 18803, 18804]

root@hypoid-2 [8] 123 /root # /etc/init.d/syslog-ng stop
 * Stopping syslog-ng ... [ ok ]
root@hypoid-2 [8] 124 /root # ps aux | grep -e syslog-ng -e watch-l
root 18802 0.0 0.0 8128 2148 ? S 10:58 0:00 supervising syslog-ng
root 18804 16.9 0.0 8804 3604 ? R 10:58 0:03 /bin/bash /usr/local/sbin/watch-logs
root 19006 0.3 0.0 27816 6572 ? Ssl 10:58 0:00 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng syslog-ng.conf --control /run/syslog-ng.ctl --persist-file /var/lib/syslog-ng/syslog-ng.persist --pidfile /run/syslog-ng.pid
root 19007 0.3 0.0 8684 3552 ? S 10:58 0:00 /bin/bash /usr/local/sbin/watch-logs
root 19152 0.0 0.0 7816 772 pts/31 S+ 10:58 0:00 grep --color=auto -e syslog-ng -e watch-l

[supervising syslog-ng, same PID as before, spawned a new instance of syslog-ng [19006], and now there are TWO watch-logs [18804 and 19007], the second one spawned by syslog-ng 19006]

root@hypoid-2 [8] 125 /root # pkill watch-logs
root@hypoid-2 [8] 126 /root # ps aux | grep -e syslog-ng -e watch-l
root 18802 0.0 0.0 8128 2148 ? S 10:58 0:00 supervising syslog-ng
root 19006 0.0 0.0 37036 6780 ? Ssl 10:58 0:00 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng syslog-ng.conf --control /run/syslog-ng.ctl --persist-file /var/lib/syslog-ng/syslog-ng.persist --pidfile /run/syslog-ng.pid
root 19163 0.6 0.0 8684 3484 ? S 10:58 0:00 /bin/bash /usr/local/sbin/watch-logs
root 19309 0.0 0.0 7816 732 pts/31 S+ 10:58 0:00 grep --color=auto -e syslog-ng -e watch-l

[original supervisor 18802 still at work; syslog-ng [19006] correctly noticed its "watch-logs" was gone and spawned a fresh instance [19163]. Remember, all this after "/etc/init.d/syslog-ng stop"]

A few commands to get everything to stop: kill syslog-ng first, then watch-logs.

root@hypoid-2 [8] 127 /root # /etc/init.d/syslog-ng status
 * status: stopped
root@hypoid-2 [8] 128 /root # /etc/init.d/syslog-ng zap
 * Manually resetting syslog-ng to stopped state
root@hypoid-2 [8] 129 /root # pkill syslog-ng watch-logs
pkill: only one pattern can be provided
root@hypoid-2 [8] 130 /root # pkill syslog-ng
root@hypoid-2 [8] 131 /root # pkill watch-logs

At this point a workaround proved possible using "--process-mode background" (the default is "safe-background") and a stop_post routine in the init script that manually kills watch-logs, but I was not happy with the mess. After keywording and installing 3.23.1-r1 with the default init script, "init stop" now kills all of syslog-ng and watch-logs, as it did before 3.22.
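For reference, the stop_post part of that workaround could look roughly like the sketch below. This is not the stock Gentoo init script and not necessarily the exact code used here; it assumes an OpenRC service script (where stop_post is run after the service has been stopped) and a helper whose process name is watch-logs, as in the configuration above.

```shell
# Sketch of an addition to /etc/init.d/syslog-ng (OpenRC), not the stock script.
# Assumption: the program() destination runs a helper named "watch-logs".
stop_post() {
    # -x matches the exact process name only.
    # pkill exits 1 when nothing matched; that is not an error here.
    pkill -x watch-logs || true
    return 0
}
```

The obvious drawback is that the helper's name is hardcoded, so the script has to be edited for every different program() destination.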
Please attach your emerge --info app-admin/syslog-ng (mainly to see the USE flags for syslog-ng for better reproducibility). Jeroen, does it help if you start with process-mode=background?
I've failed to reproduce this issue so far (tested on 3.22.1 and 3.23.1-r1).

app-admin/syslog-ng-3.22.1::gentoo was built with the following:
USE="caps ipv6 tcpd -amqp -dbi -geoip -geoip2 -http -json -kafka -libressl -mongodb -pacct -python -redis -smtp -snmp -spoof-source -systemd" ABI_X86="(64)" PYTHON_SINGLE_TARGET="python3_6 -python2_7 -python3_5 -python3_7" PYTHON_TARGETS="python2_7 python3_6 python3_7 -python3_5"

I started sending messages to this instance via tcp/udp in an endless loop (from 3 machines) while restarting syslog-ng on the central logging server.

# /etc/init.d/syslog-ng restart
 * Stopping syslog-ng ... [ ok ]
 * Checking your configfile (/etc/syslog-ng/syslog-ng.conf) ... [ ok ]
 * Starting syslog-ng ...
[2019-09-22T05:30:51.541812] WARNING: window sizing for tcp sources were changed in syslog-ng 3.3, the configuration value was divided by the value of max-connections(). The result was too small, clamping to value of min_iw_size_per_reader. Ensure you have a proper log_fifo_size setting to avoid message loss.; orig_log_iw_size='20', new_log_iw_size='100', min_iw_size_per_reader='100', min_log_fifo_size='5000'

Same with --process-mode=background.
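In case someone wants to repeat this kind of test, a load generator along these lines should do. It is only a sketch, not the exact loop used above: it assumes the util-linux logger (with its --server, --port, --tcp and --udp options), the function name send_test_messages and the LOGGER override are made up for illustration, and it sends a bounded number of messages rather than looping forever.

```shell
#!/bin/sh
# Hypothetical helper: send COUNT test messages to HOST:PORT over TCP and UDP.
# Set LOGGER=echo for a dry run that only prints the commands' arguments.
send_test_messages() {
    host=$1
    port=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        "${LOGGER:-logger}" --server "$host" --port "$port" --tcp "repro tcp message $i"
        "${LOGGER:-logger}" --server "$host" --port "$port" --udp "repro udp message $i"
        i=$((i + 1))
    done
}
```

For example, "send_test_messages loghost 514 1000" run from each client while the central server is being restarted; loghost and the port are placeholders.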
Portage 2.3.76 (python 3.6.9-final-0, default/linux/amd64/17.1, gcc-9.2.0, glibc-2.29-r5, 4.14.143-gentoo x86_64)
=================================================================
System uname: Linux-4.14.143-gentoo-x86_64-Intel-R-_Xeon-R-_CPU_E5-2620_v3_@_2.40GHz-with-gentoo-2.6
KiB Mem: 9971832 total, 913756 free
KiB Swap: 0 total, 0 free
Timestamp of repository gentoo: Sun, 22 Sep 2019 00:15:01 +0000
Head commit of repository gentoo: e7cf6aa4ee5d7e8a34dc026a82ea3dba48015c86
sh bash 5.0_p11
ld GNU ld (Gentoo 2.32 p2) 2.32.0
app-shells/bash: 5.0_p11::gentoo
dev-java/java-config: 2.2.0-r4::gentoo
dev-lang/perl: 5.30.0::gentoo
dev-lang/python: 2.7.16::gentoo, 3.6.9::gentoo, 3.7.4-r1::gentoo
dev-util/cmake: 3.15.3::gentoo
sys-apps/baselayout: 2.6-r1::gentoo
sys-apps/openrc: 0.42.1::gentoo
sys-apps/sandbox: 2.18::gentoo
sys-devel/autoconf: 2.69-r4::gentoo
sys-devel/automake: 1.16.1-r1::gentoo
sys-devel/binutils: 2.32-r1::gentoo
sys-devel/gcc: 9.2.0::gentoo
sys-devel/gcc-config: 2.1::gentoo
sys-devel/libtool: 2.4.6-r5::gentoo
sys-devel/make: 4.2.1-r4::gentoo
sys-kernel/linux-headers: 5.3::gentoo (virtual/os-headers)
sys-libs/glibc: 2.29-r5::gentoo
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-mtune=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php7.2/ext-active/ /etc/php/apache2-php7.3/ext-active/ /etc/php/cgi-php7.2/ext-active/ /etc/php/cgi-php7.3/ext-active/ /etc/php/cli-php7.2/ext-active/ /etc/php/cli-php7.3/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-mtune=native -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-mtune=native -O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-mtune=native -O2 -pipe"
GENTOO_MIRRORS="http://tux.rainside.sk/gentoo/ http://gentoo.wheel.sk/"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j10"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="acl amd64 berkdb bzip2 cli crypt cxx dri fortran gdbm iconv ipv6 libtirpc multilib ncurses nptl openmp pam pcre readline seccomp split-usr ssl tcpd unicode xattr zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2 avx avx2 f16c fma3 pclmul popcnt sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NETBEANS_MODULES="apisupport cnd groovy gsf harness ide identity j2ee java mobility nb php profiler soa visualweb webcommon websvccommon xml" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6 python3_7" RUBY_TARGETS="ruby24 ruby25" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, LINGUAS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
(In reply to Tomáš Mózes from comment #9)
> Please attach your emerge --info app-admin/syslog-ng (mainly to see the USE
> flags for syslog-ng for better reproducibility).

app-admin/syslog-ng-3.23.1-r1::gentoo was built with the following:
USE="-amqp -caps -dbi -geoip -geoip2 -http -ipv6 -json -kafka -libressl -mongodb -pacct -python -redis -smtp -snmp -spoof-source -systemd -tcpd" PYTHON_SINGLE_TARGET="-python2_7 -python3_5 python3_6 (-python3_7)" PYTHON_TARGETS="python2_7 -python3_5 python3_6 (-python3_7)"

This is the same across three machines here, all approximately stock (a few keyworded apps, not the same across machines), all up to date. Two Intel machines (Lenovo T420, Lenovo X10) and an AMD machine.

The syslog-ng configuration does include network traffic, but that part works fine in all cases. For clients:

destination d_rlogger {udp("cboldt.is-a-geek.net" port(514));};
filter f_rlogger { program("Rootkit|smartd|syslog-ng|tripwire"); };
log { source(src); filter(f_rlogger); destination(d_rlogger); };

and for host:

source s_network { udp(port(514)); };
destination d_clients { file("/var/log/CLIENTS/$HOST"); };
log { source(s_network); destination(d_clients); };

The only failure symptom I get on "init restart" is an added instance of the companion watch-logs program that is used as a destination for log messages. In your config, everything will look normal on "init restart": the running instance of syslog-ng will be replaced by a new one. I get that behavior on "init stop," which is clearly not correct.

Changing to --process-mode=background, the supervising part of syslog-ng is eliminated on startup, and "init stop" works to stop syslog-ng. But, only with 3.22, "init stop" does not kill the child watch-logs process.

Whatever changed appears to me to be a process spawning and process killing issue. With the default --process-mode=safe-background, syslog-ng is two processes: supervisor and syslog-ng. The PID recorded by OpenRC init is NOT the supervisor, but a child process of the supervisor. The child process is the operating syslog-ng. With 3.22, "init stop" kills syslog-ng, and the supervisor starts a new one, making sure syslog-ng keeps running. With 3.23, the PID recorded by OpenRC is the same as with 3.22 (NOT the supervisor), but after "init stop," both the supervisor and syslog-ng are gone. Try "init stop" followed by "ps ax | grep syslog-ng".

I can provide the rest of my emerge --info if you want. I kept it out of here to avoid clutter. The process spawn/kill behavior was the same across all three machines.

I see the issue as having a "process spawn/kill" root. In my case, killing syslog-ng (it didn't matter whether it was started under --process-mode=background or as a child under --process-mode=safe-background) did not kill syslog-ng's child, watch-logs. It took specialized code in the init script to deal with that. Specialized in that it had to use the program name, watch-logs. Messy. Not portable to others.
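One way to avoid hardcoding the program name might be to ask for the children of whatever PID is in the PID file before stopping the service. The following is only a sketch of that idea, which I have not tried against the init script: children_of_pidfile is a made-up helper name, and it assumes a pgrep that supports -P (select by parent PID), as the procps one does.

```shell
#!/bin/sh
# Hypothetical helper: list the direct children of the PID stored in a PID file.
children_of_pidfile() {
    pid=$(cat "$1") || return 1
    # pgrep -P selects processes by parent PID;
    # it exits 1 when there are no children, which is not an error here.
    pgrep -P "$pid" || true
}
```

The list would have to be collected before syslog-ng is stopped, since orphaned children are reparented once their parent exits and can then no longer be found this way.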
Is it fixed with 3.24.1? Do you still need the "process-mode=background" workaround?
(In reply to Tomáš Mózes from comment #12)
> Is it fixed with 3.24.1? Do you still need the "process-mode=background"
> workaround?

It is fixed for me, no workaround at all, with 3.23.1-r1, the most recent version since I sync'd on Saturday, October 19th. I'm happy to try 3.24.1 and report. Maybe you posted a link to or an ebuild for that and I haven't seen it yet.
Thanks, I'll close the bug in that case. Please resync the tree if you want to try 3.24.1.