Under heavy load onb a Cygwin machine, I have seen the error message for removing the old-style lock displayed (when it should not be displayed). This is because the ln -s locking method fails because ln -s in not atomic. The following patch provides what I believe is a safer locking mechanism. The check for and removal of a stale lock in the current version still has a race condition. I have altered the timing of the code to further reduce this race, but not eliminate it. This type of operation (check for and removal of a stale lock) is always difficult. Thanks, ...Karl --- keychain-orig 2005-03-10 10:33:18.000000000 -0800 +++ keychain 2005-03-30 14:18:56.205011200 -0800 @@ -322,41 +322,38 @@ tl_faking=true tl_start=0 tl_end=`expr $lockwait \* 10` tl_current=0 fi # Try to lock for $lockwait seconds while [ $lockwait -eq 0 -o $tl_current -lt $tl_end ]; do - if tl_error=`ln -s $$ "$lockf" 2>&1`; then + if tl_error=`(umask 0377; echo $$ > "$lockf") 2>&1`; then havelock=true return 0 fi # advance our timer if [ $lockwait -gt 0 ]; then if $tl_faking; then tl_current=`expr $tl_current + 1` else tl_current=`now $tl_current` fi fi # check for old-style lock; unlikely - if [ -f "$lockf" ]; then + if [ -h "$lockf" ]; then error "please remove old-style lock: $lockf" return 1 fi # read the lock - tl_pid=`readlink "$lockf" 2>/dev/null` - if [ -z "$tl_pid" ]; then - tl_pid=`ls -l "$lockf" 2>/dev/null | awk '{print $NF}'` - fi + tl_pid=`cat "$lockf" 2>/dev/null` if [ -z "$tl_pid" ]; then # lock seems to have disappeared, try again continue fi # test for a stale lock kill -0 "$tl_pid" 2>/dev/null if [ $? != 0 ]; then @@ -365,18 +362,16 @@ # checked, then go ahead and remove the stale lock. Otherwise # remember the pid and try again. if [ "$tl_pid" = "$tl_oldpid" ]; then warn "removing stale lock for pid $tl_pid" rm -f "$lockf" else tl_oldpid="$tl_pid" fi - # try try again - continue fi # sleep for a bit to wait for the keychain process holding the lock sleep 0.1 >/dev/null 2>&1 && continue perl -e 'select(undef, undef, undef, 0.1)' \ >/dev/null 2>&1 && continue # adjust granularity of tl_current stepping $tl_faking && tl_current=`expr $tl_current + 9`
Created attachment 54902 [details, diff] improved patch to avoid the race in removing a stale lock
Created attachment 54908 [details, diff] improved patch...don't release the lock unless we have it
Karl, I finally integrated this patch into keychain-2.5.4, now released. I found a few problems in your patch, though... First, there is no need to subshell inside backquotes. The backquotes themselves introduce a subshell. So `(do something)` is not required, even if you want the umask to be restored following the backquoted expression. Second, you had tl_error=`(umask 0377; echo $$ > "$lockf") 2>&1` which redirects stderr after redirecting stdout. The result is that stderr is lost instead of landing in tl_error. I changed this to tl_error=`umask 0377; echo $$ 2>&1 >"$lockf"` Third, you check for an old-style lock after you attempt to acquire the lock. This is a problem because the old-style lock in this case is a symlink, so you will find that you are able to write to the file! I moved the check for an old-style lock above the loop since there's really no reason to check it every time through the loop. I also renamed the file-based lock so that there's no chance of accidentally writing through the symlink. Fourth, I added a check for an empty lockfile. It would be an aberrant case, but the code as you had written it would spin on an empty lockfile and never be resolved. Or perhaps not so aberrant, since creating and writing to a lockfile is not an atomic operation any more than creating a symbolic link. One keychain could be scheduled out between file creation and writing. So it's good to catch this case. Fifth, I believe the lockr mechanism is unnecessary. Since the stale lock isn't removed until a second pass through the loop (using the $tl_oldpid check), there is already a read-test-read-test going on that should eliminate the chance of incorrectly removing a lock. Thanks for your contribution. You can see my final patch at http://www.gentoo.org/cgi-bin/viewcvs.cgi/keychain/keychain.sh?r1=1.20&r2=1.21&root=gentoo-src