Bug 333441 - sys-apps/watchdog: system hangs at "stopping watchdog"
Summary: sys-apps/watchdog: system hangs at "stopping watchdog"
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High critical
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-19 13:39 UTC by mario
Modified: 2010-08-24 21:02 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
output of `/etc/init.d/watchdog --debug stop >& log` (log.zip, 14.34 KB, text/plain)
2010-08-24 07:36 UTC, mario

Description mario 2010-08-19 13:39:14 UTC
watchdog-5.4 and watchdog-5.6 both hang when stopping the service.
This is a big problem when doing a system reboot.

kill -9 stops the daemon instantly, but "/etc/init.d/watchdog stop" just hangs at "* Stopping watchdog ..." forever, while it uses 100% CPU.





Reproducible: Always




emerge --info
Portage 2.1.8.3 (default/linux/x86/10.0/server, gcc-4.4.3, glibc-2.11.2-r0, 2.6.34-gentoo-r1 i686)
=================================================================
System uname: Linux-2.6.34-gentoo-r1-i686-AMD_Athlon-tm-_64_X2_Dual_Core_Processor_4200+-with-gentoo-1.12.13
Timestamp of tree: Thu, 19 Aug 2010 13:00:21 +0000
app-shells/bash:     4.0_p37
dev-java/java-config: 2.1.11
dev-lang/python:     2.6.5-r3, 3.1.2-r4
sys-apps/baselayout: 1.12.13
sys-apps/sandbox:    1.6-r2
sys-devel/autoconf:  2.65
sys-devel/automake:  1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       4.4.3-r2
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6b
virtual/os-headers:  2.6.30-r1
ACCEPT_KEYWORDS="x86"
ACCEPT_LICENSE="* -@EULA"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -march=i686 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests distlocks fixpackages news parallel-fetch protect-owned sandbox sfperms strict unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="de"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="acl apache2 bash-completion bzip2 cli cracklib cups cxx dri fam iconv modules mudflap mysql network-cron nptl nptlonly openmp pcre pppd readline reflection session snmp spl sysfs truetype unicode urandom x86 xml xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="pyramid" LINGUAS="de" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga neomagic nv r128 radeon savage sis tdfx trident vesa via vmware voodoo" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-08-23 05:12:14 UTC
1. what watchdog hardware+driver are you using?
2. Do older versions of the daemon work?
3. Please include your watchdog.conf: egrep -v '^#|^$' /etc/watchdog.conf
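The filter in question 3 strips comment lines and blank lines so that only the active settings remain. A minimal sketch of what it does, run against a throwaway sample file rather than the real /etc/watchdog.conf (the helper name `show_active_settings` and the sample contents are made up for illustration; `grep -E` is the current spelling of `egrep`):

```shell
#!/bin/sh
# Hypothetical helper: print only the active lines of a config file.
show_active_settings() {
    # -E: extended regex, -v: invert the match.
    # '^#' matches lines that start with '#', '^$' matches empty lines.
    grep -Ev '^#|^$' "$1"
}

# Sample file standing in for /etc/watchdog.conf (illustrative contents only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# load limits
max-load-1 = 24

#interval = 10
realtime = yes
EOF

show_active_settings "$sample"
rm -f "$sample"
```

Note that the pattern only catches a `#` in the first column, so an indented comment would still be printed.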
Comment 2 mario 2010-08-23 07:39:58 UTC
1) just the software watchdog :-)

(running in a virtualbox vm, current version)


2) This is the first time I have tried watchdog, so I could only use version 5.4 or 5.6.
There is a new version on the website already, but renaming the ebuild accordingly did not work: the patch failed to apply during the emerge.

3) watchdog.conf
max-load-1              = 24
max-load-5              = 18
max-load-15             = 12
watchdog-device = /dev/watchdog
realtime                = yes
priority                = 1


I had the same hang with the max-load settings disabled as well.



Thank you!
Comment 3 SpanKY gentoo-dev 2010-08-23 07:53:37 UTC
what process exactly is using 100% CPU ?  what does the init.d process tree look like exactly ?

what if you strace the cpu-hungry process ?  or gdb it ?
Comment 4 mario 2010-08-23 09:29:13 UTC
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11053 root      20   0  3004  924  412 R 99.9  0.1   2:12.60 /bin/bash /sbin/runscript.sh /etc/init.d/watchdog stop


strace:

rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0

...repeating permanently.


I do not know exactly how to run "/etc/init.d/watchdog stop" via gdb.
If needed, could you give me the commands?

If I just attach to the process, I get:

Attaching to process 1792
Reading symbols from /bin/bash...(no debugging symbols found)...done.
Reading symbols from /lib/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
0xb771d2e9 in malloc () from /lib/libc.so.6

Comment 5 SpanKY gentoo-dev 2010-08-24 02:05:16 UTC
what does `cat /proc/self/status` show ?

also, try running `/etc/init.d/watchdog --debug stop >& log` and post that log as an attachment
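A trace of a script that is stuck in a loop grows without bound, so it helps to cap how much gets captured. A rough sketch of that idea using plain `sh -x` tracing on a stand-in script (the looping script body and the `capture_trace` helper are hypothetical and are not the Gentoo init script; `timeout` from coreutils is assumed to be installed):

```shell
#!/bin/sh
# Hypothetical helper: trace a script with `sh -x`, keeping only the first N trace lines.
# usage: capture_trace LINES SCRIPT [ARGS...]
capture_trace() {
    lines=$1
    shift
    # `timeout 5` is only a safety net; normally the traced shell is killed by
    # SIGPIPE as soon as `head` has read enough lines and exits.
    timeout 5 sh -x "$@" 2>&1 | head -n "$lines"
}

# Stand-in for a script that never leaves its loop:
# "$1" is tested on every pass but never shifted away.
looper=$(mktemp)
cat > "$looper" <<'EOF'
while [ -n "$1" ]; do
    :
done
EOF

capture_trace 6 "$looper" -v
rm -f "$looper"
```

The same two trace lines repeating over and over are the signature of a loop whose condition never changes.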
Comment 6 mario 2010-08-24 07:35:39 UTC
# cat /proc/self/status
Name:   cat
State:  R (running)
Tgid:   7205
Pid:    7205
PPid:   7049
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 256
Groups: 0 1 2 3 4 6 10 11 20 26 27
VmPeak:     1780 kB
VmSize:     1780 kB
VmLck:         0 kB
VmHWM:       236 kB
VmRSS:       236 kB
VmData:      188 kB
VmStk:       132 kB
VmExe:        40 kB
VmLib:      1392 kB
VmPTE:        12 kB
VmSwap:        0 kB
Threads:        1
SigQ:   0/8048
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
Cpus_allowed:   1
Cpus_allowed_list:      0
voluntary_ctxt_switches:        0
nonvoluntary_ctxt_switches:     2



Added the log-attachment.

I pressed CTRL-C after a few seconds, before the log got too big.
Comment 7 mario 2010-08-24 07:36:48 UTC
Created attachment 244341
output of `/etc/init.d/watchdog --debug stop >& log`
Comment 8 SpanKY gentoo-dev 2010-08-24 19:25:39 UTC
that trace is helpful.  it shows that the code hanging isnt in the watchdog or baselayout.  it's a function called get_config which is called from get_delay.  i noticed you have bootchart enabled ... i'd suggest you `emerge -C` that package and see if things work better.

not that i can find what package exactly is declaring these functions as they dont appear to be part of bootchart either.
Comment 9 mario 2010-08-24 19:46:51 UTC
removing bootchart did not help.
But I took a deeper look in the /etc/init.d/watchdog file.


The functions get_config and get_delay are defined there.

If I remove "--retry $(get_delay)" from the stop command in


stop() {
        ebegin "Stopping watchdog"
        start-stop-daemon --stop \
                --exec /usr/sbin/watchdog --pidfile /var/run/watchdog.pid \
                --retry $(get_delay)
        eend $?
}



Then watchdog sometimes gets stopped, sometimes not.

I guess the "sometimes not" cases are what the delay function is there for, and it does not work as it should.


When the stop fails, watchdog still seems to get stopped a bit later.
Comment 10 SpanKY gentoo-dev 2010-08-24 20:18:20 UTC
yeah ok, i'm dumb.  you're right of course.  try this patch:

--- files/watchdog-init.d       16 May 2009 16:59:26 -0000      1.2
+++ files/watchdog-init.d       24 Aug 2010 20:18:07 -0000
@@ -15,6 +15,7 @@ get_config() {
                        echo $2
                        return
                fi
+               shift
        done
        echo /etc/watchdog.conf
 }
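
The one-line fix matters because a `while` loop that tests `$1` can only terminate if each pass consumes an argument. A minimal sketch of the patched loop, reconstructed from the diff context above: the loop condition, the `-c`/`--config-file` comparison and the function name `find_config` are assumptions made for illustration, not the actual contents of the init script.

```shell
#!/bin/sh
# Hypothetical reconstruction: scan option words for a config-file flag and
# fall back to the default path shown in the patch context.
find_config() {
    while [ -n "$1" ]; do
        if [ "$1" = "-c" ] || [ "$1" = "--config-file" ]; then
            echo "$2"
            return
        fi
        # Without this shift, "$1" never changes: any first word other than
        # the flag keeps the test true forever and the shell spins at 100% CPU.
        shift
    done
    echo /etc/watchdog.conf
}

find_config -v -c /tmp/custom.conf   # flag found after skipping -v
find_config -v                       # no flag given: default path
find_config                          # no options at all: default path
```

A loop like this makes only cheap builtin calls per pass, which would fit a trace that shows nothing but the same few lines repeating.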
Comment 11 mario 2010-08-24 20:53:58 UTC
yaaaaaaaaaaaa!

It seems to work now! :-)
Comment 12 SpanKY gentoo-dev 2010-08-24 21:02:09 UTC
thanks for testing ... ive committed it.  there's a new version out i believe, so i'll save the rev bump for that.

http://sources.gentoo.org/sys-apps/watchdog/files/watchdog-init.d?r1=1.2&r2=1.3