watchdog-5.4 or watchdog-5.6 hangs when stopping the service. This is a big problem when doing a system reboot. kill -9 stops the daemon instantly, but "/etc/init.d/watchdog stop" just hangs at "* Stopping watchdog ..." forever, while it uses 100% CPU.

Reproducible: Always

emerge --info:

Portage 2.1.8.3 (default/linux/x86/10.0/server, gcc-4.4.3, glibc-2.11.2-r0, 2.6.34-gentoo-r1 i686)
=================================================================
System uname: Linux-2.6.34-gentoo-r1-i686-AMD_Athlon-tm-_64_X2_Dual_Core_Processor_4200+-with-gentoo-1.12.13
Timestamp of tree: Thu, 19 Aug 2010 13:00:21 +0000
app-shells/bash: 4.0_p37
dev-java/java-config: 2.1.11
dev-lang/python: 2.6.5-r3, 3.1.2-r4
sys-apps/baselayout: 1.12.13
sys-apps/sandbox: 1.6-r2
sys-devel/autoconf: 2.65
sys-devel/automake: 1.11.1
sys-devel/binutils: 2.20.1-r1
sys-devel/gcc: 4.4.3-r2
sys-devel/gcc-config: 1.4.1
sys-devel/libtool: 2.2.6b
virtual/os-headers: 2.6.30-r1
ACCEPT_KEYWORDS="x86"
ACCEPT_LICENSE="* -@EULA"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -march=i686 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests distlocks fixpackages news parallel-fetch protect-owned sandbox sfperms strict unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="de"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="acl apache2 bash-completion bzip2 cli cracklib cups cxx dri fam iconv modules mudflap mysql network-cron nptl nptlonly openmp pcre pppd readline reflection session snmp spl sysfs truetype unicode urandom x86 xml xorg zlib"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
ELIBC="glibc"
INPUT_DEVICES="keyboard mouse evdev"
KERNEL="linux"
LCD_DEVICES="pyramid"
LINGUAS="de"
RUBY_TARGETS="ruby18"
USERLAND="GNU"
VIDEO_CARDS="fbdev glint intel mach64 mga neomagic nv r128 radeon savage sis tdfx trident vesa via vmware voodoo"
XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
1. What watchdog hardware+driver are you using?
2. Do older versions of the daemon work?
3. Please include your watchdog.conf: egrep -v '^#|^$' /etc/watchdog.conf
1) Just the software watchdog :-) (running in a VirtualBox VM, current version)
2) I tried the watchdog for the first time, so I could only use version 5.4 or 5.6. There is a new version on the website already, but renaming the ebuild accordingly did not work: the emerge failed with errors while applying the patch.
3) watchdog.conf:

max-load-1 = 24
max-load-5 = 18
max-load-15 = 12
watchdog-device = /dev/watchdog
realtime = yes
priority = 1

I had the same problem without any max-load settings enabled as well. Thank you!
what process exactly is using 100% CPU ? what does the init.d process tree look like exactly ? what if you strace the cpu-hungry process ? or gdb it ?
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11053 root      20   0  3004  924  412 R 99.9  0.1   2:12.60 /bin/bash /sbin/runscript.sh /etc/init.d/watchdog stop

strace:

rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
...repeating permanently.

I do not know exactly how to run "/etc/init.d/watchdog stop" via gdb. If needed, could you give me the commands? If I just attach to the process, I get:

Attaching to process 1792
Reading symbols from /bin/bash...(no debugging symbols found)...done.
Reading symbols from /lib/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
0xb771d2e9 in malloc () from /lib/libc.so.6
what does `cat /proc/self/status` show ? also, try running `/etc/init.d/watchdog --debug stop >& log` and post that log as an attachment
# cat /proc/self/status
Name: cat
State: R (running)
Tgid: 7205
Pid: 7205
PPid: 7049
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0 1 2 3 4 6 10 11 20 26 27
VmPeak: 1780 kB
VmSize: 1780 kB
VmLck: 0 kB
VmHWM: 236 kB
VmRSS: 236 kB
VmData: 188 kB
VmStk: 132 kB
VmExe: 40 kB
VmLib: 1392 kB
VmPTE: 12 kB
VmSwap: 0 kB
Threads: 1
SigQ: 0/8048
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
Cpus_allowed: 1
Cpus_allowed_list: 0
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 2

I added the log as an attachment. I pressed CTRL-C after a few seconds, before the log got too big.
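Note that /proc/self/status describes the process that reads it, which is why the dump above has "Name: cat". To look at the spinning runscript process instead, one could read /proc/<PID>/status with the PID that top reports. A minimal sketch, assuming Linux procfs; show_proc_state is a hypothetical helper name and not part of baselayout or watchdog:

```shell
#!/bin/sh
# Print selected fields from /proc/<pid>/status for a given PID (Linux only).
# show_proc_state is a hypothetical helper, used here only for illustration.
show_proc_state() {
	pid=$1
	# Keep the lines describing run state, signal masks and context switches.
	grep -E '^(Name|State|SigBlk|SigIgn|SigCgt|voluntary_ctxt_switches|nonvoluntary_ctxt_switches):' "/proc/$pid/status"
}

# Example: inspect the current shell; replace $$ with the PID shown by top.
show_proc_state "$$"
```

A process stuck in a pure shell loop would be expected to stay in state R with few voluntary context switches.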
Created attachment 244341: output of `/etc/init.d/watchdog --debug stop >& log`
that trace is helpful. it shows that the code hanging isnt in the watchdog or baselayout. it's a function called get_config which is called from get_delay. i noticed you have bootchart enabled ... i'd suggest you `emerge -C` that package and see if things work better. not that i can find what package exactly is declaring these functions as they dont appear to be part of bootchart either.
Removing bootchart did not help. But I took a deeper look at the /etc/init.d/watchdog file. The functions get_config and get_delay are defined there. If I remove "--retry $(get_delay)" from the stop command in

stop() {
	ebegin "Stopping watchdog"
	start-stop-daemon --stop \
		--exec /usr/sbin/watchdog --pidfile /var/run/watchdog.pid --retry $(get_delay)
	eend $?
}

then watchdog sometimes gets stopped, sometimes not. I guess that the "sometimes not" case is caused by the missing delay, since the delay function does not work as it should. When the stop fails, watchdog still gets stopped a bit later, it seems.
yeah ok, i'm dumb. you're right of course. try this patch:

--- files/watchdog-init.d	16 May 2009 16:59:26 -0000	1.2
+++ files/watchdog-init.d	24 Aug 2010 20:18:07 -0000
@@ -15,6 +15,7 @@ get_config() {
 			echo $2
 			return
 		fi
+		shift
 	done
 	echo /etc/watchdog.conf
 }
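Why the single added "shift" matters can be sketched as follows. This is only an illustration: the patch context shows just the tail of get_config, so the loop header and the option names (-c, --config-file) below are assumptions, and both function names are hypothetical. The second function caps the loop at 1000 iterations so that it terminates; the real unpatched loop has no such cap.

```shell
#!/bin/sh
# Sketch of an argument scan like the one the patch repairs. Everything not
# visible in the patch context (loop header, option names) is an assumption.

# Hypothetical patched version: prints the path given with -c/--config-file,
# otherwise the default path that appears in the patch context.
get_config_fixed() {
	while [ $# -gt 0 ]; do
		if [ "$1" = "-c" ] || [ "$1" = "--config-file" ]; then
			echo "$2"
			return
		fi
		shift	# the line the patch adds: drop $1 so the loop advances
	done
	echo /etc/watchdog.conf
}

# Hypothetical unpatched loop, capped so it ends: without "shift", "$1" never
# changes, so the loop only exits if the very first word already matches.
count_spins_without_shift() {
	spins=0
	while [ $# -gt 0 ] && [ "$spins" -lt 1000 ]; do
		[ "$1" = "-c" ] && return
		spins=$((spins + 1))
	done
	echo "$spins"
}

get_config_fixed -v -c /tmp/example.conf
get_config_fixed -v
count_spins_without_shift -v -c /tmp/example.conf
```

Since the scan runs inside $(get_delay) in stop(), a loop that never advances would keep the init script busy in bash without ever reaching start-stop-daemon, which would match the 100% CPU runscript.sh process reported earlier.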
yaaaaaaaaaaaa! It seems to work now! :-)
thanks for testing ... ive committed it. there's a new version out i believe, so i'll save the rev bump for that. http://sources.gentoo.org/sys-apps/watchdog/files/watchdog-init.d?r1=1.2&r2=1.3