Bug 551900 - sys-apps/watchdog-5.14 clamps hardware timeout to 254s but not the interval
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Normal major
Assignee: Gentoo's Team for Core System packages
URL: https://sourceforge.net/p/watchdog/bu...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-12 16:53 UTC by Jan Fikar
Modified: 2015-07-13 06:05 UTC
CC List: 0 users

See Also:
Package list:
Runtime testing required: ---


Description Jan Fikar 2015-06-12 16:53:00 UTC
Today, after upgrading to sys-apps/watchdog-5.14 and running /etc/init.d/watchdog restart, my machine rebooted multiple times.

My /etc/watchdog.conf looked like this:

max-load-1              = 512
min-memory              = 1
allocatable-memory      = 1
watchdog-device         = /dev/watchdog
watchdog-timeout        = 1000
interval                = 400
realtime                = yes
priority                = 1
pidfile                 = /var/run/sshd.pid


It seems that the new watchdog-5.14 does not set watchdog-timeout to the requested 1000s; instead it caps it at a maximum of 254s, as can be seen here (luckily I have an IPMI watchdog):

>ipmitool mc watchdog get
Watchdog Timer Use:     SMS/OS (0x44)
Watchdog Timer Is:      Started/Running
Watchdog Timer Actions: Power Cycle (0x03)
Pre-timeout interval:   0 seconds
Timer Expiration Flags: 0x10
Initial Countdown:      254 sec
Present Countdown:      252 sec


It definitely worked with 1000s on watchdog-5.13-r1.

A temporary workaround is to set interval to less than about half of 254s, so I have it at 100s now.
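
To illustrate why the configuration above reboots, here is a minimal sketch in Python. It is only a model of the behaviour described in this report, not code from the watchdog daemon: the 254s cap is the value shown by ipmitool above, and the names CLAMP_LIMIT, effective_timeout and survives are hypothetical.

```python
# Illustrative model only; not code from the watchdog daemon.
CLAMP_LIMIT = 254  # seconds; the cap observed via ipmitool in this report


def effective_timeout(requested: int, limit: int = CLAMP_LIMIT) -> int:
    """Hardware timeout after the silent clamp described in this bug."""
    return min(requested, limit)


def survives(requested_timeout: int, interval: int) -> bool:
    """True if a keepalive sent every `interval` seconds arrives before
    the (clamped) hardware timer expires."""
    return interval < effective_timeout(requested_timeout)


if __name__ == "__main__":
    # (watchdog-timeout, interval): original config, then the workaround
    for timeout, interval in [(1000, 400), (1000, 100)]:
        print(timeout, interval,
              effective_timeout(timeout), survives(timeout, interval))
```

With the original settings the keepalive would arrive every 400s against an effective 254s timer, so the machine resets; with interval = 100 the keepalive arrives in time.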
Comment 1 SpanKY gentoo-dev 2015-07-08 17:32:25 UTC
the change to add a limit of 254 happened here:
  http://sourceforge.net/p/watchdog/code/ci/12583e81eaa093dc1224df08c7de62541142c6c2/

although it's confusingly (wrongly?) listed as a "readability" commit

later on the limit has been raised to 600 seconds:
  http://sourceforge.net/p/watchdog/code/ci/1eee507a1fb7eb6a13a11816ed999b0271f3c613/

either way, it's clearly wrong to silently do a min() on the timeout and ignore the interval/etc...  i've reported this upstream so let's see what they have to say.
Comment 2 Jan Fikar 2015-07-09 13:13:35 UTC
Let's see what they do. IMHO they should drop the max().

It seems it is not clear to them why someone would use large values for timeout and interval, so I'll try to explain my point of view; maybe it helps:

1. I don't want a small watchdog-timeout. I don't care if the server is down for 15-30 minutes; I can wait. What is important is that in the meantime I have time to look at the KVM console. The default 60s gives me almost no time to do so, even if I notice immediately that the server is dead (which is seldom the case).

2. If watchdog-timeout is 1000s, I feel that the default interval of 1s is just a waste of electric power. Better to let the CPUs sleep. Theoretically it would be enough to set it to something like 998s, but in practice I saw that sometimes a single interval is missed (e.g. under heavy load?) and you get an unwanted reboot. So I ended up with a very stable formula:

interval = watchdog-timeout/2.5
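
As a worked example of the formula, the sketch below computes the interval for a given timeout and counts how many consecutive keepalives can be missed before the timer expires. It assumes a simple model in which the hardware timer is reset on every keepalive; the function names interval_for and tolerated_misses are hypothetical and not part of watchdog.

```python
def interval_for(timeout: int, factor: float = 2.5) -> int:
    """Interval in whole seconds from: interval = watchdog-timeout / 2.5."""
    return int(timeout / factor)


def tolerated_misses(timeout: int, interval: int) -> int:
    """Consecutive missed keepalives that still avoid a reset.

    After k missed keepalives the gap between two successful ones is
    (k + 1) * interval, which must stay strictly below the timeout.
    """
    return max((timeout - 1) // interval - 1, 0)


if __name__ == "__main__":
    for timeout in (1000, 254):
        interval = interval_for(timeout)
        print(timeout, interval, tolerated_misses(timeout, interval))
    # An interval just under the timeout leaves no slack at all.
    print(1000, 998, tolerated_misses(1000, 998))
```

Under this model a factor of 2.5 tolerates one missed keepalive (gap of 0.8 x timeout), while an interval of 998s with a 1000s timeout tolerates none.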
Comment 3 Jan Fikar 2015-07-09 14:10:38 UTC
I meant "..drop the min()" of course :)
Comment 4 SpanKY gentoo-dev 2015-07-13 06:05:25 UTC
latest upstream git repo has deleted the max limit if you want to give it a try.  looks pretty straightforward to me.