01:11 <@radek> looks like our /usr/sbin/run-crons does locking in such a way that if, for example, anything from cron.daily/* runs longer than 60 minutes, subsequent cron.hourly/* scripts are not run.
01:12 <@radek> in other words, as long as cron.daily runs, no cron.hourly (or weekly or monthly) scripts will be run. this applies to all of them.
01:12 <@radek> should we consider it a bug?
01:12 <@radek> or a feature? or maybe my conclusion is wrong?
01:12 <@beu> i would say that this is definitely bug-worthy.
01:13 <@radek> for me it is. it's caused by the fact that locking is global for the whole of run-crons, while it should be separate for each class (hourly, weekly, etc.).
Guys, any progress on this?
I had rewritten it on my machine when you bugged me on IRC, but my src hard drive crashed, taking the changes with it :/
Ouch, I missed the two-year anniversary ;-)
vapier: could you outline your new implementation? Maybe I'll get around to coding it...
ping
Guys, this is a CRITICAL bug. It results in ANY script in cron.hourly not being executed as long as a script from cron.daily is being executed (and takes longer than an hour). Other cross-locking issues are also caused by this bug, because there is a SINGLE lock shared by every type of cron (hourly, daily, weekly, monthly). This produces _many_ different, hard-to-debug problems, because scripts in cron are NOT being run even though there is nothing wrong with them. No one is attributing those problems to this bug, because it's hard to track down.
Created attachment 280333
First draft of patch for per-base lockfiles

Would something like this be suitable? I'm open to suggestions/comments. :)
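The difference between the old single lock and per-base lockfiles can be sketched as below. This is a minimal Python illustration, not the attached patch: it uses `fcntl.flock` on hypothetical lock files (`cron.lock` for the global scheme, `cron.<class>.lock` for the per-class scheme) in a caller-supplied directory, whereas run-crons itself is a shell script with its own lock handling. It pretends cron.daily is still running and reports which other classes could start.

```python
import errno
import fcntl
import os
import tempfile

CLASSES = ("hourly", "daily", "weekly", "monthly")


def try_lock(path):
    """Take an exclusive, non-blocking flock on path.

    Returns the open fd on success (the lock is held until the fd is closed),
    or None if another open file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise
    return fd


def lock_path(lockdir, cls, per_class):
    """Hypothetical lock file layout for the two schemes."""
    if per_class:
        return os.path.join(lockdir, "cron.%s.lock" % cls)  # one lock per cron.xxx dir
    return os.path.join(lockdir, "cron.lock")  # one lock shared by every class


def classes_that_can_start(lockdir, per_class):
    """Simulate a long cron.daily run holding its lock, then report which
    other classes would be able to acquire theirs."""
    daily = try_lock(lock_path(lockdir, "daily", per_class))
    if daily is None:
        raise RuntimeError("lock already held; use a fresh lock directory")
    started = []
    try:
        for cls in CLASSES:
            if cls == "daily":
                continue
            fd = try_lock(lock_path(lockdir, cls, per_class))
            if fd is not None:
                started.append(cls)
                os.close(fd)  # this class's scripts "finished"; release its lock
    finally:
        os.close(daily)  # cron.daily finishes last
    return started


if __name__ == "__main__":
    for per_class in (False, True):
        with tempfile.TemporaryDirectory() as d:
            print("per_class=%s:" % per_class, classes_that_can_start(d, per_class))
```

With the single shared lock, no other class gets in while cron.daily holds it; with per-class locks, hourly, weekly and monthly proceed. `flock` is used here rather than `lockf` because closing the failed second descriptor does not drop the lock held through the first one. Per-class locks only confine the starvation, though: a hung cron.daily script still keeps every later cron.daily run from acquiring `cron.daily.lock`.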
Hmm, indeed, I had a cron job unexpectedly hang on me, which resulted in no cron jobs being run for two weeks, until I noticed something was up because the usual e-mails had stopped coming in. I liked the silence :) but obviously some sort of notification mechanism for long-term malfunctions would be nice, at the very least.
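One possible shape for such a notification is a periodic staleness check, sketched here in Python. It assumes, hypothetically, that the runner touches a stamp file named `cron.<class>` in some directory after each completed run of that class; the thresholds in `MAX_AGE` and the script name in the usage comment are illustrative guesses, not anything run-crons provides.

```python
import os
import sys
import time

# Illustrative thresholds (seconds): how long a class may go without a
# completed run before it is reported as stale. Not taken from run-crons.
MAX_AGE = {
    "hourly": 2 * 3600,
    "daily": 2 * 86400,
    "weekly": 8 * 86400,
    "monthly": 32 * 86400,
}


def stale_classes(stampdir, now=None):
    """Return the classes whose stamp file stampdir/cron.<class> is missing
    or older than its MAX_AGE threshold, in MAX_AGE order."""
    now = time.time() if now is None else now
    stale = []
    for cls, limit in MAX_AGE.items():
        path = os.path.join(stampdir, "cron." + cls)
        try:
            age = now - os.path.getmtime(path)
        except FileNotFoundError:
            stale.append(cls)  # never completed, or stamp removed
            continue
        if age > limit:
            stale.append(cls)
    return stale


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_cron_stamps.py STAMPDIR
    stale = stale_classes(sys.argv[1])
    if stale:
        print("stale cron classes: " + ", ".join(stale), file=sys.stderr)
        sys.exit(1)
```

Because the check only reads file timestamps, it cannot itself be blocked by a hung job's lock, so running it from a separate crontab line or a monitoring system would turn weeks of silence into a message.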
Doing per-dir locks really changes the problem from "jobs can starve all dirs indefinitely" to "jobs can starve their own class indefinitely". So if you screw up cron.daily, instead of blocking all of hourly/daily/weekly/monthly, you block only cron.daily, forever.

Ubuntu has the same issue: there is a separate job line for each cron.xxx dir (each just runs `run-parts` on it), so if one cron.hourly script gets hung up, no more cron.hourly jobs will run, but cron.daily and the others keep running independently. If anacron is used to launch the cron.xxx jobs, it has the same issue: no timeout is applied, and each category is simply allowed to run forever. Fedora appears to behave the same way here: it uses anacron without timeouts.

Adding per-category locks should be cheap and at least gives us parity with other distros. We can then look at adding a timeout option where every script is run through `timeout` with a value befitting its category.
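The timeout idea could look roughly like this Python sketch: every executable in a cron.xxx directory is run in name order, and any single script that exceeds its category's limit is killed so the rest of the class still runs. The limits in `CATEGORY_TIMEOUTS` and the function name `run_parts_with_timeout` are hypothetical; the sketch stands in for wrapping each script in coreutils `timeout` and does not reproduce `run-parts` filename filtering.

```python
import os
import subprocess
import sys

# Hypothetical per-category limits in seconds, each shorter than the
# category's period so one stuck script cannot hold the class's lock forever.
CATEGORY_TIMEOUTS = {
    "hourly": 55 * 60,
    "daily": 23 * 3600,
    "weekly": 6 * 86400,
    "monthly": 27 * 86400,
}


def run_parts_with_timeout(directory, timeout):
    """Run each executable regular file in directory, sorted by name.

    Returns {name: exit status}, with None for scripts that were killed
    after exceeding timeout seconds.
    """
    results = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue  # skip subdirectories and non-executable files
        try:
            results[name] = subprocess.run([path], timeout=timeout).returncode
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed and reaped the script
            results[name] = None
    return results


if __name__ == "__main__":
    # Usage (hypothetical script name): python run_class.py daily
    category = sys.argv[1]
    results = run_parts_with_timeout("/etc/cron." + category, CATEGORY_TIMEOUTS[category])
    for name, status in results.items():
        if status is None:
            print("%s: killed after timeout" % name, file=sys.stderr)
```

A hung script then costs its class at most one timeout period instead of every future run, which is the gap per-category locks alone leave open.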
Should be all set now in the tree; thanks for the report!

Commit message: Split global lock up into one lock per /etc/cron.xxx dir

http://sources.gentoo.org/sys-process/cronbase/cronbase-0.3.7.ebuild?rev=1.1
http://sources.gentoo.org/sys-process/cronbase/files/run-crons-0.3.7?rev=1.1