Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 87340 - keychain locking fault
Summary: keychain locking fault
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: Keychain (show other bugs)
Hardware: All Linux
: High normal (vote)
Assignee: Keychain Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-03-30 14:47 UTC by Karl Moerder
Modified: 2005-05-11 17:11 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
improved patch to avoid the race in removing a stale lock (keychain-patch,3.79 KB, patch)
2005-03-30 17:27 UTC, Karl Moerder
Details | Diff
improved patch...don't release the lock unless we have it (keychain-patch,4.43 KB, patch)
2005-03-30 20:20 UTC, Karl Moerder
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Karl Moerder 2005-03-30 14:47:59 UTC
Under heavy load onb a Cygwin machine, I have seen the error message for removing the old-style lock displayed (when it should not be displayed). This is because the ln -s locking method fails because ln -s in not atomic. The following patch provides what I believe is a safer locking mechanism.

The check for and removal of a stale lock in the current version still has a race condition. I have altered the timing of the code to further reduce this race, but not eliminate it. This type of operation (check for and removal of a stale lock) is always difficult.

Thanks,

...Karl

--- keychain-orig	2005-03-10 10:33:18.000000000 -0800
+++ keychain	2005-03-30 14:18:56.205011200 -0800
@@ -322,41 +322,38 @@
         tl_faking=true
         tl_start=0
         tl_end=`expr $lockwait \* 10`
         tl_current=0
     fi
 
     # Try to lock for $lockwait seconds
     while [ $lockwait -eq 0 -o $tl_current -lt $tl_end ]; do
-        if tl_error=`ln -s $$ "$lockf" 2>&1`; then
+        if tl_error=`(umask 0377; echo $$ > "$lockf") 2>&1`; then
             havelock=true
             return 0
         fi
 
         # advance our timer
         if [ $lockwait -gt 0 ]; then
             if $tl_faking; then
                 tl_current=`expr $tl_current + 1`
             else
                 tl_current=`now $tl_current`
             fi
         fi
 
         # check for old-style lock; unlikely
-        if [ -f "$lockf" ]; then
+        if [ -h "$lockf" ]; then
             error "please remove old-style lock: $lockf"
             return 1
         fi
 
         # read the lock
-        tl_pid=`readlink "$lockf" 2>/dev/null`
-        if [ -z "$tl_pid" ]; then 
-            tl_pid=`ls -l "$lockf" 2>/dev/null | awk '{print $NF}'`
-        fi
+        tl_pid=`cat "$lockf" 2>/dev/null`
         if [ -z "$tl_pid" ]; then
             # lock seems to have disappeared, try again
             continue
         fi
 
         # test for a stale lock
         kill -0 "$tl_pid" 2>/dev/null
         if [ $? != 0 ]; then
@@ -365,18 +362,16 @@
             # checked, then go ahead and remove the stale lock.  Otherwise
             # remember the pid and try again.
             if [ "$tl_pid" = "$tl_oldpid" ]; then
                 warn "removing stale lock for pid $tl_pid"
                 rm -f "$lockf"
             else
                 tl_oldpid="$tl_pid"
             fi
-            # try try again
-            continue
         fi
 
         # sleep for a bit to wait for the keychain process holding the lock
         sleep 0.1 >/dev/null 2>&1 && continue
         perl -e 'select(undef, undef, undef, 0.1)' \
             >/dev/null 2>&1 && continue
         # adjust granularity of tl_current stepping
         $tl_faking && tl_current=`expr $tl_current + 9`
Comment 1 Karl Moerder 2005-03-30 17:27:21 UTC
Created attachment 54902 [details, diff]
improved patch to avoid the race in removing a stale lock
Comment 2 Karl Moerder 2005-03-30 20:20:40 UTC
Created attachment 54908 [details, diff]
improved patch...don't release the lock unless we have it
Comment 3 Aron Griffis (RETIRED) gentoo-dev 2005-05-11 17:11:24 UTC
Karl, I finally integrated this patch into keychain-2.5.4, now released.  I found a few problems in your patch, though...

First, there is no need to subshell inside backquotes.  The backquotes themselves introduce a subshell.  So `(do something)` is not required, even if you want the umask to be restored following the backquoted expression.

Second, you had
   tl_error=`(umask 0377; echo $$ > "$lockf") 2>&1`
which redirects stderr after redirecting stdout.  The result is that stderr is lost instead of landing in tl_error.  I changed this to
   tl_error=`umask 0377; echo $$ 2>&1 >"$lockf"`

Third, you check for an old-style lock after you attempt to acquire the lock.  This is a problem because the old-style lock in this case is a symlink, so you will find that you are able to write to the file!  I moved the check for an old-style lock above the loop since there's really no reason to check it every time through the loop.  I also renamed the file-based lock so that there's no chance of accidentally writing through the symlink.

Fourth, I added a check for an empty lockfile.  It would be an aberrant case, but the code as you had written it would spin on an empty lockfile and never be resolved.  Or perhaps not so aberrant, since creating and writing to a lockfile is not an atomic operation any more than creating a symbolic link.  One keychain could be scheduled out between file creation and writing.  So it's good to catch this case.

Fifth, I believe the lockr mechanism is unnecessary.  Since the stale lock isn't removed until a second pass through the loop (using the $tl_oldpid check), there is already a read-test-read-test going on that should eliminate the chance of incorrectly removing a lock.

Thanks for your contribution.  You can see my final patch at

http://www.gentoo.org/cgi-bin/viewcvs.cgi/keychain/keychain.sh?r1=1.20&r2=1.21&root=gentoo-src