98189 – sys-process/cronbase: run-crons race conditions and other deficiencies

Bug 98189 - sys-process/cronbase: run-crons race conditions and other deficiencies

Summary: sys-process/cronbase: run-crons race conditions and other deficiencies

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system (show other bugs)
Hardware:	All Linux

Importance:	High minor
Assignee:	Cron Team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	169449
	Show dependency tree

Reported:	2005-07-07 00:45 UTC by Jaco Kroon
Modified:	2015-07-22 08:00 UTC (History)
CC List:	4 users (show)

See Also:	157547
Package list:
Runtime testing required:	---

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jaco Kroon 2005-07-07 00:45:22 UTC

There are at least one race condition between the default crontab and
/usr/sbin/run-crons.  This comes from the find ${LOCKDIR} -name cron.${BASE}
-exec rm {} \;, if the file gets rm'ed by one of the other entries in
/etc/crontab between it being found and -exec then I get bogus emails (500
machines mailing me with this is irritating to say the least).

Additionally there are other problems.  For example, say I need to modify my
system time back over the hour, then run-crons will rm that file and re-run
those crons.  This isn't a problem in most cases but it is deffinately bad.

A much simpler, and better solution imho is to use the run-parts program as
already suggested in two other bugs regarding run-crons.  This is as simple as
replacing

0  *  * * *     root    rm -f /var/spool/cron/lastrun/cron.hourly
1  3  * * *     root    rm -f /var/spool/cron/lastrun/cron.daily
15 4  * * 6     root    rm -f /var/spool/cron/lastrun/cron.weekly
30 5  1 * *     root    rm -f /var/spool/cron/lastrun/cron.monthly
*/10  *  * * *  root    test -x /usr/sbin/run-crons && /usr/sbin/run-crons 

in /etc/crontab with

0  *  * * *     root    /bin/run-parts /etc/cron.hourly
1  3  * * *     root    /bin/run-parts /etc/cron.daily
15 4  * * 6     root    /bin/run-parts /etc/cron.weekly
30 5  1 * *     root    /bin/run-parts /etc/cron.monthly

I suppose this lacks some locking, but even that is not foolproof in the
existing run-crons, the trap line

trap "rm -f ${LOCKFILE}" 0 1 2 3 15

can be replaced with

trap "rm -f ${LOCKFILE}" EXIT

to effectively cover all cases.  One can also use the dotlockfile utility to
create lock files:

LOCKFILE=${LOCKDIR}/lock
dotlockfile -l -r 2 -p "${LOCKFILE}" || exit $?
trap "rm -f ${LOCKFILE}" EXIT

This still has a small race-condition where the lock file can be created and we
get killed, but it really is miniscule.  This is what -p is for though, it does
the pid-checking thing.

Reproducible: Always
Steps to Reproduce:

Comment 1 SpanKY gentoo-dev

2005-07-07 05:48:14 UTC

that would mean that every cron package would require debian-utils ... and as
we've already seen, run-parts imposes obscure naming limitations

Comment 2 Jaco Kroon 2005-07-07 07:33:09 UTC

Fair enough.  Then at least add a &>/dev/null onto the end of the find command
to supress the obvious errors.

Also, to emulate the not running crons twice effect only delete files that are
more than three hours in the future.  Why three hours?  Well, from cron(8):

"Local time changes of less than three hours, such as those caused by the start
or end of Daylight Saving Time, are handled specially.   This only  applies  to
 jobs that run at a specific time and jobs that are run with a granularity
greater than one hour.  Jobs that run more frequently are scheduled normally."

It then proceeds to explain what happens with forward and backword movements as
well as adjustments greater than three hours.

As for "every cron would require debianutils", the following is provided by
"qpkg -q debianutils":

DEPENDED ON BY:
        app-portage/gentoolkit-0.2.0
        sys-apps/portage-2.0.51.19

So I don't think that is a problem.  As for the obscure naming restrictions:

files that end in a tilde:  These are generated by some editors as backup files.

.bak:  same thing.

.dpkg-*:  These are debian spesific and I can't see any reason why we should
ever encounter them

.pre_fcopy and .notslocate:  not sure what these are for but I really can't see
how we will ever encounter them.

.disabled:  this makes sense to ignore.

As for the fact that they should be executable, that imho makes a great deal of
sense.

Anyhow, just a suggestion.  Something to chew on.

Comment 3 SpanKY gentoo-dev

2005-07-07 08:08:56 UTC

read the run-parts manpage

by default it will only respect files that match ^[a-zA-Z0-9_-]$ ... that means
it ignores files that have a . or a space in them ... i'm not aware of cron
enforcing such restrictions

Comment 4 Jaco Kroon 2005-07-10 00:06:20 UTC

Ooh ok.  I must've been skimming again.  That is not good.  Ok.

The question then becomes what effects does that race cause, well, there are
three situations (taking the hourly crons as an example):

1.  rm's the lockfile before find finds it.
2.  rm's the lockfile after find found it but before -exec kicks in.
3.  rm's the lockfile after find rm'ed it.

No 1 is the desired behaviour.  However, as I can note from some of the mails
that get sent to me by hourly crons they sometimes runs on the hour, and
sometimes 10 minutes after the hour.  So rather than fix the race condition,
here is an alternative that avoids it.  Don't rm the files when the cron is
intended to be run, rm it one minute before.  For example, rm the file for
hourly crons 1 minute before the hour, as in:

59  *  * * *     root    rm -f /var/spool/cron/lastrun/cron.hourly

Since run-crons is only run every 10 minutes it will be run on the hour, and
since there is a minute between the two being executed there is no way (except
perhaps on an extremely busy machine that takes more than a minute to schedule a
task) that these two will ever interfere.

I do note that daily and weekly already has something to this effect, and on
monthly 10 minutes isn't going to make that big a difference, but I'd suggest
changing the 30 for monthly to 29 in any case.

Wasn't the granularity of run-crons /15 at some point?

Comment 5 Jaco Kroon 2008-05-31 23:06:41 UTC

Not fixed at all, but seemingly no one is going to actually do something about this, in spite of some dead-simple solutions provided.

Comment 6 SpanKY gentoo-dev

2008-06-01 01:15:27 UTC

closing the bug wont make it anymore "fixed"

Comment 7 Jaco Kroon 2008-06-01 11:56:06 UTC

You're absolutely correct, it won't.  However, it's been open for three years.  So that really suggests a WONTFIX.

Comment 8 Thilo Bangert (RETIRED) (RETIRED) gentoo-dev

2009-04-13 20:38:52 UTC

i will put in this for all system crontabs installed by the various crons...

--- crontab     26 Aug 2007 20:03:07 -0000      1.2
+++ crontab     13 Apr 2009 20:37:32 -0000
@@ -11,8 +11,8 @@
 HOME=/

 # check scripts in cron.hourly, cron.daily, cron.weekly and cron.monthly
-0  *  * * *    root    rm -f /var/spool/cron/lastrun/cron.hourly
-1  3  * * *    root    rm -f /var/spool/cron/lastrun/cron.daily
-15 4  * * 6    root    rm -f /var/spool/cron/lastrun/cron.weekly
-30 5  1 * *    root    rm -f /var/spool/cron/lastrun/cron.monthly
+59  *  * * *   root    rm -f /var/spool/cron/lastrun/cron.hourly
+9  3  * * *    root    rm -f /var/spool/cron/lastrun/cron.daily
+19 4  * * 6    root    rm -f /var/spool/cron/lastrun/cron.weekly
+29 5  1 * *    root    rm -f /var/spool/cron/lastrun/cron.monthly
 */10  *  * * * root    test -x /usr/sbin/run-crons && /usr/sbin/run-crons

that should make the mentioned race conditions a lot less likely.
objections?

Comment 9 Jaco Kroon 2009-04-13 21:17:48 UTC

No objections from my side.  Chances that the rm isn't scheduled in a minute is next to none (and if it doesn't get scheduled you've got much bigger problems than missing a cron).

Comment 10 SpanKY gentoo-dev

2009-04-13 23:53:38 UTC

might as well do it right the first time ...

as such, getting rid of the notion of "file exists and thus is a lock" is where we should start.  there is a `flock` utility we should leverage which is installed by util-linux and that gives us perfect control against races.  and it pushes the stale lock problem off into the kernel -- when the process dies/exits, the kernel releases the file lock.

Comment 11 Thilo Bangert (RETIRED) (RETIRED) gentoo-dev

2009-05-12 09:23:55 UTC

changed default system crontab for vixie-cron, bcron, fcron, dcron and cronie.

spanky: looking forward to the flock solution, though.

Comment 12 Jaco Kroon 2009-05-12 13:09:37 UTC

(In reply to comment #10)
> as such, getting rid of the notion of "file exists and thus is a lock" is where
> we should start.  there is a `flock` utility we should leverage which is
> installed by util-linux and that gives us perfect control against races.  and
> it pushes the stale lock problem off into the kernel -- when the process
> dies/exits, the kernel releases the file lock.

No, the file in this case is not a lock file, it's an indicator of when last the crons has run.  I originally recommended a run-parts solution, but alas, there were two GOOD reasons for NOT using it.  One is the naming limitations, and the other is described in the next paragraph.

The idea with the file is that we "touch" the file whenever the crons is run, and rm it at certain times, allowing the fact that it doesn't exist to allow the crons to run again.  This has the advantage that if cron misses a run (due to not being running or the server being off) it'll run at a later stage (ok, the rm didn't run either in this case, but this is what lines 65 thru 87 in /usr/sbin/run-crons takes care of).  It's that piece of code that was racing with the rm's at fixed times.  No amount of flock will stop this race, all it'll do is clear out the error message as produced by line 86 of the script (already taken care of with &>/dev/null, could also simply have passed -f to the rm).

What Thilo has done is the correct solution, imho.  I'd recommend resolving this bug at this time.

Comment 13 Thilo Bangert (RETIRED) (RETIRED) gentoo-dev

2009-05-12 19:05:01 UTC

jaco - thanks for sharing your thoughts. your analysis is spot on. hence i am  resolving this bug at this time.

the flock solution is still needed though, due to all the other races in run-crons. see also bug #169449.

thanks for your contribution.

Comment 14 SpanKY gentoo-dev

2009-12-19 14:47:42 UTC

Thilo: your changes to dcron broke it.  dcron does not support the random extensions that vcron and such have implemented; it only supports POSIX.

http://sources.gentoo.org/sys-process/dcron/files/crontab?r1=1.2&r2=1.3