170802 – sys-process/cronbase - run-crons script to resume cancelled cron-runs

Summary: sys-process/cronbase - run-crons script to resume cancelled cron-runs

Status:	RESOLVED WONTFIX

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	New packages
Hardware:	All Linux

Importance:	High normal
Assignee:	Cron Team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	169449

Reported:	2007-03-13 23:30 UTC by Christof Schulze
Modified:	2008-06-26 10:59 UTC
CC List:	1 user

See Also:
Package list:
Runtime testing required:	---

Attachments
run-crons script (run-crons, 3.61 KB, text/plain) 2007-03-14 03:58 UTC, Christof Schulze

Description Christof Schulze 2007-03-13 23:30:59 UTC

I patched the run-crons script so that it can detect whether a cron job did not run to completion. It will then re-run that job until it finishes with exit code 0.

This is especially useful for machines with frequent reboots.

If you have concerns about this, please let me know. I'd love this to be more official since it has proven useful to me.


Reproducible: Always




#!/bin/bash
#
# $Header: /var/cvsroot/gentoo-x86/sys-process/cronbase/files/run-crons-0.3.2,v 1.1 2005/03/09 12:51:34 ka0ttic Exp $
#
# 08 Mar 2005; Aaron Walker <ka0ttic@gentoo.org> run-crons:
#     Ignore the error messages from find caused by race conditions, since
#     we could care less about the error as long as the file has been removed.
#     See bug 8506.
#
# 06 May 2004; Aron Griffis <agriffis@gentoo.org> run-crons:
#     Make the locking actually work.  The old code was racy.
#     Thanks to Mathias Gumz in bug 45155 for some cleanups.
#
# 23 Jun 2002; Jon Nelson <jnelson@gentoo.org> run-crons:
#     fixed a race condition, where cron jobs and run-crons wanted to
#     delete touch files
#
# 20 Apr 2002; Thilo Bangert <bangert@gentoo.org> run-crons:
#     moved lastrun directory to /var/spool/cron/lastrun
#
# Author: Achim Gottinger <achim@gentoo.org>
#
# Mostly copied from SuSE
#
# this script looks into /etc/cron.[hourly|daily|weekly|monthly]
# for scripts to be executed. The info about last run is stored in
# /var/spool/cron/lastrun

LOCKDIR=/var/spool/cron/lastrun
LOCKFILE=${LOCKDIR}/lock

mkdir -p ${LOCKDIR}

trap 'trapfunc' 1 2 3 6 9 13 14 15


function trapfunc() {
	rm -f "${LOCKFILE}"
	exit 1
}

# Make sure we're not running multiple instances at once.
# Try twice to lock, otherwise give up.
for ((i = 0; i < 2; i = i + 1)); do
    ln -sn $$ ${LOCKFILE} 2>/dev/null && break

	# lock failed, check for a running process.
	# handle both old- and new-style locking.
	cronpid=$(readlink ${LOCKFILE} 2>/dev/null) ||
	cronpid=$(cat ${LOCKFILE} 2>/dev/null) ||
	continue	# lockfile disappeared? try again

	# better than kill -0 because we can verify that it's really
	# another run-crons process
	if [[ $(</proc/${cronpid}/cmdline) == $(</proc/$$/cmdline) ]] 2>/dev/null; then
		# whoa, another process is really running
		exit 0
	else
		rm -f ${LOCKFILE}
	fi
done

# Check to make sure locking was successful
if [[ ! -L ${LOCKFILE} ]]; then
	echo "Can't create or read existing ${LOCKFILE}, giving up"
	exit 1
fi

function isnotcomplete() {
	# checks if all jobs for the lockfile $@ were completed
	# returns 0 if there are cron jobs that were scheduled but not run eg 
	# due to downtime
	
	local SCRIPT
	
	# if lockfile does not exist the cron run was not complete for sure
	[[ -e ${LOCKDIR}/cron.$BASE ]] || {
		echo 0
		return
	}

	for SCRIPT in $CRONDIR/*
	do
		# if at least one script was not run - exit
		[[ $(grep -c $SCRIPT ${LOCKDIR}/cron.$BASE) -eq 0 ]] && {
			echo 0
			return
		}
	done

	echo 1
	return 1
}

for BASE in hourly daily weekly monthly
do
    CRONDIR=/etc/cron.${BASE}

    test -d $CRONDIR || continue

    if [ -e ${LOCKDIR}/cron.$BASE ]
    then
	case $BASE in
	    hourly)
		#>= 1 hour, 5 min -=> +65 min
		TIME="-cmin +65" ;;
	    daily)
		#>= 1 day, 5 min -=> +1445 min
		TIME="-cmin +1445"  ;;
	    weekly)
		#>= 1 week, 5 min -=> +10085 min
		TIME="-cmin +10085"  ;;
	    monthly)
		#>= 31 days, 5 min -=> +44645 min
		TIME="-cmin +44645" ;;
	esac
        find ${LOCKDIR} -name cron.$BASE $TIME -exec rm {} \; &>/dev/null || true
    fi

    # if there is no touch file, make one then run the scripts
    [ "$(isnotcomplete cron.$BASE)" -eq 0 ] && {
	touch ${LOCKDIR}/cron.$BASE

        set +e
        for SCRIPT in $CRONDIR/*
        do
            if [[ -x $SCRIPT && ! -d $SCRIPT ]]; then
		[[ $(grep -c $SCRIPT ${LOCKDIR}/cron.$BASE) -lt 1 ]] && {
			$SCRIPT && echo $SCRIPT >>${LOCKDIR}/cron.$BASE
		}
            fi
        done   
    }
done

# Clean out bogus cron.$BASE files with future times
touch ${LOCKDIR}
find ${LOCKDIR} -newer ${LOCKDIR} -exec /bin/rm -f {} \; &>/dev/null || true
rm -f "${LOCKFILE}"

Comment 1 Christof Schulze 2007-03-14 03:58:56 UTC

Created attachment 113236
run-crons script

Comment 2 Thilo Bangert (RETIRED) gentoo-dev 2007-03-14 07:16:31 UTC

please attach your changes as a patch...
thanks!

Comment 3 Christof Schulze 2007-03-14 12:48:48 UTC

> 
> 
> function trapfunc() {
>       rm -f "${LOCKFILE}"
>       exit 1
> }
> 
61,62c68
< # Set a trap to remove the lockfile when we're finished
< trap "rm -f ${LOCKFILE}" 0 1 2 3 15
---
> trap 'trapfunc' 1 2 3 6 9 13 14 15
63a70,94
> function isnotcomplete() {
>       # checks if all jobs for the lockfile $@ were completed
>       # returns 0 if there are cron jobs that were scheduled but not run eg 
>       # due to downtime
> 
>       local SCRIPT
> 
>       # if lockfile does not exist the cron run was not complete for sure
>       [[ -e ${LOCKDIR}/cron.$BASE ]] || {
>               echo 0
>               return
>       }
> 
>       for SCRIPT in $CRONDIR/*
>       do
>               # if at least one script was not run - exit
>               [[ $(grep -c $SCRIPT ${LOCKDIR}/cron.$BASE) -eq 0 ]] && {
>                       echo 0
>                       return
>               }
>       done
> 
>       echo 1
>       return 1
> }
89c120
< 
---
>     
91,93c122,123
<     if [ ! -e ${LOCKDIR}/cron.$BASE ]
<     then
<         touch ${LOCKDIR}/cron.$BASE
---
>     [ "$(isnotcomplete cron.$BASE)" -eq 0 ] && {
>       touch ${LOCKDIR}/cron.$BASE
99c129,131
<                 $SCRIPT
---
>               [[ $(grep -c $SCRIPT ${LOCKDIR}/cron.$BASE) -lt 1 ]] && {
>                       $SCRIPT && echo $SCRIPT >>${LOCKDIR}/cron.$BASE
>               }
101,102c133,134
<         done
<     fi
---
>         done   
>     }
107a140
> rm -f "${LOCKFILE}"

Comment 4 Thilo Bangert (RETIRED) gentoo-dev 2007-03-14 20:43:52 UTC

sorry for not being explicit: a unified diff (diff -u) as an attachment would help us a lot - in fact i think it is the preferred way.
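For reference, a unified diff of the kind requested here could be produced roughly as follows. This is only a sketch: the file names run-crons.orig and run-crons.patch and the temporary working directory are assumptions made for illustration, and the two small files stand in for the original and patched scripts.

```shell
#!/bin/bash
# Hypothetical stand-ins for the original and the patched script.
workdir=$(mktemp -d)
printf 'echo one\necho two\n' > "${workdir}/run-crons.orig"
printf 'echo one\necho three\n' > "${workdir}/run-crons"

# diff exits with status 1 when the files differ, which is expected here,
# so that status is not treated as a failure.
diff -u "${workdir}/run-crons.orig" "${workdir}/run-crons" \
    > "${workdir}/run-crons.patch" || true

# The resulting patch file is what would be attached to the bug.
cat "${workdir}/run-crons.patch"
```

Compared with the normal diff format, the unified format carries context lines around each change, which makes a patch easier to review and to apply to a slightly different version of the file.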

anyway - this will do this time around.

Comment 5 Christof Schulze 2007-04-15 11:39:55 UTC

I do not want to get on your nerves because of this, but it's a rather small change and there has been a cron release.
What can I do to contribute in order for this code to be accepted?

Comment 6 Florian D. 2007-06-24 14:37:26 UTC

(In reply to comment #0)
> I patched the run-crons script so that it is able to detect if a cron-job was
> not  run completely. It will then re-run it until it finishes with exit code 
> 0.

with the provided script, I still get an email every 10 minutes:

Job /usr/bin/test -x /usr/sbin/run-crons && /usr/sbin/run-crons terminated (exit status: 1) (mailing output)

I'm using fcron-3.0.2-r1

Comment 7 Christof Schulze 2007-06-24 17:29:41 UTC

Which is absolutely correct. The mail informs you that your cron script exited with exit code 1. Please check the exit status of your cron scripts.

Of course a bug in the script is not completely unlikely, but it does exactly what is intended here, so I think it is more likely that your cron script (or mail command?) is broken.

Comment 8 SpanKY gentoo-dev 2007-06-24 18:43:45 UTC

lemme see if i get this ... you think cron should continue executing the same command over and over until it finishes with 0 exit status ?

Comment 9 Christof Schulze 2007-06-24 19:52:10 UTC

I do not want to state this in general, but there should be a way to determine if a job was complete or not.
If you just want to run a job and don't care if it is complete/runs without errors, then add exit 0 to the script.

Comment 10 SpanKY gentoo-dev 2007-06-24 19:55:16 UTC

in other words, your answer is "yes"

this is a break in convention from how cronjobs have always been handled and i really think it's pretty unintuitive

plus i dont see how this addresses the "i just rebooted in the middle of a cronjob run" issue you stated in the original comment

Comment 11 Christof Schulze 2007-06-24 20:13:56 UTC

Please be constructive about this issue. What would be more intuitive?

Just to clarify my point of view:
Yes, cron jobs were handled differently before, but personally I find it a lot better to get this kind of feedback about whether the script was successful or not. The user should know if there was an error.

This addresses the issue because a cron job that is terminated via SIGTERM/SIGKILL will usually not exit with exit code 0, so it will be restarted after the next reboot.
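The claim about signals can be checked with a small experiment. The sketch below is not part of the proposed patch; the sleep command merely stands in for a long-running cron job, and the variable names are made up for illustration.

```shell
#!/bin/bash
# Start a stand-in for a long-running cron job in the background.
sleep 30 &
job_pid=$!

# Terminate it the way a shutdown typically would.
kill -TERM "${job_pid}"

# Collect the job's exit status without aborting on a non-zero value.
status=0
wait "${job_pid}" || status=$?

echo "job exit status: ${status}"
```

The shell reports a job killed by a signal as 128 plus the signal number, so SIGTERM (15) gives 143. Because that is non-zero, the patched run-crons would not record such a job as finished.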

Comment 12 SpanKY gentoo-dev 2007-06-24 20:38:46 UTC

if you cant handle alternative viewpoints, then dont post bugs

there is a system for reporting errors ... it's called e-mail.  failed cronjobs send out e-mails to the administrator and that administrator checks the output.

simply putting "exit 0" at the end of the script accomplishes nothing as that doesnt take into consideration unexpected crashes in the script ... in other words, changing long standing behavior which affects.

this could also introduce subtle problems for people ... you cant assume anything about the script and whether it could be re-run so soon without admin intervention

what your proposed idea could address is half run sets of cronjobs ... in other words, if a cron set (say "daily") has twenty jobs and you reboot in the middle, then the restart would allow you to execute the ones that hadnt even been started yet
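One possible reading of this narrower idea, sketched under assumptions: each job is recorded in a state file just before it is started, so a later pass only launches jobs that were never started, and a failing job is not retried. The function name run_pending_jobs and its two arguments are hypothetical and do not appear in run-crons.

```shell
#!/bin/bash
# run_pending_jobs JOBDIR STATEFILE   (hypothetical helper, not from run-crons)
# Runs every executable file in JOBDIR that is not yet listed in STATEFILE.
run_pending_jobs() {
    local jobdir=$1 statefile=$2 job
    touch "${statefile}"
    for job in "${jobdir}"/*; do
        # Skip directories and non-executable files.
        [[ -x ${job} && ! -d ${job} ]] || continue
        # -F fixed string, -x whole line: avoids partial or regex matches.
        grep -Fxq -- "${job}" "${statefile}" && continue
        # Record the job before starting it, so it is never started twice.
        echo "${job}" >> "${statefile}"
        # A failure is reported but does not cause a re-run.
        "${job}" || echo "${job} exited with status $?" >&2
    done
}
```

A call such as `run_pending_jobs /etc/cron.daily /var/spool/cron/lastrun/cron.daily` would be the intended use; the state file still has to be expired once per period, as run-crons already does with find -cmin.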

Comment 13 Christof Schulze 2007-06-25 17:24:55 UTC

> if you cant handle alternative viewpoints, then dont post bugs
No need to get offensive. I just wanted a hint about what to do instead, rather than a comment like "this does not work". And I would certainly still like to read your ideas about this.

> there is a system for reporting errors ... it's called e-mail.  failed cronjobs
> send out e-mails to the administrator and that administrator checks the output.
The standard cron jobs that come with most Gentoo packages (makewhatis, slocate, etc.) do not do this. They just bail out with an exit code. So it's imho arguable whether emails generated by each cron script are the "standard" for notifications. Instead, cron tells the user by combining the output of all jobs into one mail. Nothing in the implementation changes anything about that scheme.

> simply putting "exit 0" at the end of the script accomplishes nothing as that
> doesnt take into consideration unexpected crashes in the script ... in other
> words, changing long standing behavior which affects.
Yes, to do it properly you should also use a trap. The point is: usually one wants to have his cronjobs run successfully so imho it makes sense to evaluate the status of the commands that were just run.

> this could also introduce subtle problems for people ... you cant assume
> anything about the script and whether it could be re-run so soon without admin
> intervention
I can see a problem there, but usually one crafts cron scripts so that they end with a proper exit code. If not, you will find out within a very short time that you should. Imho it can be expected that if a cron job fails, it is due to some kind of temporary error. Otherwise one would not want it to run in the unsupervised way cron runs things.
 
> that your proposed idea could address is half run sets of cronjobs ... in other
> words, if a cron set (say "daily") has twenty jobs and you reboot in the
> middle, then the restart would allow you to execute the ones that hadnt even
> been started yet
That is one part of it. I certainly would like to see both parts solved (also the killed cronjob part). 

In my opinion the approach I implemented is better in some ways than the old approach. But changing things that people have gotten used to over time should not be done hastily. I can see your point, but I would be interested in what others think about this too.

Comment 14 Thilo Bangert (RETIRED) gentoo-dev 2008-06-26 10:59:33 UTC

i am with vapier on this one. changing the current behavior of cron is a big no-go.

so - having established that: is there anything left on this bug that needs fixing? please reopen then. thanks.