87340 – keychain locking fault

Bug 87340 - keychain locking fault

Summary: keychain locking fault

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Hosted Projects
Classification:	Unclassified
Component:	Keychain (show other bugs)
Hardware:	All Linux

Importance:	High normal (vote)
Assignee:	Keychain Team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2005-03-30 14:47 UTC by Karl Moerder
Modified:	2005-05-11 17:11 UTC (History)
CC List:	0 users

See Also:
Package list:
Runtime testing required:	---

Attachments
improved patch to avoid the race in removing a stale lock (keychain-patch,3.79 KB, patch) 2005-03-30 17:27 UTC, Karl Moerder	Details \| Diff
improved patch...don't release the lock unless we have it (keychain-patch,4.43 KB, patch) 2005-03-30 20:20 UTC, Karl Moerder	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Karl Moerder 2005-03-30 14:47:59 UTC

Under heavy load onb a Cygwin machine, I have seen the error message for removing the old-style lock displayed (when it should not be displayed). This is because the ln -s locking method fails because ln -s in not atomic. The following patch provides what I believe is a safer locking mechanism.

The check for and removal of a stale lock in the current version still has a race condition. I have altered the timing of the code to further reduce this race, but not eliminate it. This type of operation (check for and removal of a stale lock) is always difficult.

Thanks,

...Karl

--- keychain-orig	2005-03-10 10:33:18.000000000 -0800
+++ keychain	2005-03-30 14:18:56.205011200 -0800
@@ -322,41 +322,38 @@
         tl_faking=true
         tl_start=0
         tl_end=`expr $lockwait \* 10`
         tl_current=0
     fi
 
     # Try to lock for $lockwait seconds
     while [ $lockwait -eq 0 -o $tl_current -lt $tl_end ]; do
-        if tl_error=`ln -s $$ "$lockf" 2>&1`; then
+        if tl_error=`(umask 0377; echo $$ > "$lockf") 2>&1`; then
             havelock=true
             return 0
         fi
 
         # advance our timer
         if [ $lockwait -gt 0 ]; then
             if $tl_faking; then
                 tl_current=`expr $tl_current + 1`
             else
                 tl_current=`now $tl_current`
             fi
         fi
 
         # check for old-style lock; unlikely
-        if [ -f "$lockf" ]; then
+        if [ -h "$lockf" ]; then
             error "please remove old-style lock: $lockf"
             return 1
         fi
 
         # read the lock
-        tl_pid=`readlink "$lockf" 2>/dev/null`
-        if [ -z "$tl_pid" ]; then 
-            tl_pid=`ls -l "$lockf" 2>/dev/null | awk '{print $NF}'`
-        fi
+        tl_pid=`cat "$lockf" 2>/dev/null`
         if [ -z "$tl_pid" ]; then
             # lock seems to have disappeared, try again
             continue
         fi
 
         # test for a stale lock
         kill -0 "$tl_pid" 2>/dev/null
         if [ $? != 0 ]; then
@@ -365,18 +362,16 @@
             # checked, then go ahead and remove the stale lock.  Otherwise
             # remember the pid and try again.
             if [ "$tl_pid" = "$tl_oldpid" ]; then
                 warn "removing stale lock for pid $tl_pid"
                 rm -f "$lockf"
             else
                 tl_oldpid="$tl_pid"
             fi
-            # try try again
-            continue
         fi
 
         # sleep for a bit to wait for the keychain process holding the lock
         sleep 0.1 >/dev/null 2>&1 && continue
         perl -e 'select(undef, undef, undef, 0.1)' \
             >/dev/null 2>&1 && continue
         # adjust granularity of tl_current stepping
         $tl_faking && tl_current=`expr $tl_current + 9`

Comment 1 Karl Moerder 2005-03-30 17:27:21 UTC

Created attachment 54902 [details, diff]
improved patch to avoid the race in removing a stale lock

Comment 2 Karl Moerder 2005-03-30 20:20:40 UTC

Created attachment 54908 [details, diff]
improved patch...don't release the lock unless we have it

Comment 3 Aron Griffis (RETIRED) gentoo-dev

2005-05-11 17:11:24 UTC

Karl, I finally integrated this patch into keychain-2.5.4, now released. I found a few problems in your patch, though...

First, there is no need to subshell inside backquotes. The backquotes themselves introduce a subshell. So `(do something)` is not required, even if you want the umask to be restored following the backquoted expression.

Second, you had
tl_error=`(umask 0377; echo $$ > "$lockf") 2>&1`
which redirects stderr after redirecting stdout. The result is that stderr is lost instead of landing in tl_error. I changed this to
tl_error=`umask 0377; echo $$ 2>&1 >"$lockf"`

Third, you check for an old-style lock after you attempt to acquire the lock. This is a problem because the old-style lock in this case is a symlink, so you will find that you are able to write to the file! I moved the check for an old-style lock above the loop since there's really no reason to check it every time through the loop. I also renamed the file-based lock so that there's no chance of accidentally writing through the symlink.

Fourth, I added a check for an empty lockfile. It would be an aberrant case, but the code as you had written it would spin on an empty lockfile and never be resolved. Or perhaps not so aberrant, since creating and writing to a lockfile is not an atomic operation any more than creating a symbolic link. One keychain could be scheduled out between file creation and writing. So it's good to catch this case.

Fifth, I believe the lockr mechanism is unnecessary. Since the stale lock isn't removed until a second pass through the loop (using the $tl_oldpid check), there is already a read-test-read-test going on that should eliminate the chance of incorrectly removing a lock.

Thanks for your contribution. You can see my final patch at

http://www.gentoo.org/cgi-bin/viewcvs.cgi/keychain/keychain.sh?r1=1.20&r2=1.21&root=gentoo-src