Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug
Bug#: 87340
Alias:
Product:
Component:
Status: RESOLVED
Resolution: FIXED
Assigned To: Keychain Team <keychain@gentoo.org>
Hardware:
OS:
Version:
Priority:
Severity:
Reporter: Karl Moerder <kmoerder@broadbandinnovations.com>
Add CC:
CC:
URL:
Summary:
Status Whiteboard:
Keywords:

Filename Description Type Creator Created Size Actions
keychain-patch improved patch to avoid the race in removing a stale lock patch Karl Moerder 2005-03-30 17:27 0000 3.79 KB Details | Diff
keychain-patch improved patch...don't release the lock unless we have it patch Karl Moerder 2005-03-30 20:20 0000 4.43 KB Details | Diff
Create a New Attachment (proposed patch, testcase, etc.) View All

Bug 87340 depends on: Show dependency tree
Bug 87340 blocks:
Votes: 0    Show votes for this bug    Vote for this bug

Additional Comments: (this is where you put emerge --info)


Not eligible to see or edit group visibility for this bug.






View Bug Activity   |   Format For Printing   |   XML   |   Clone This Bug


Description:   Opened: 2005-03-30 14:47 0000
Under heavy load onb a Cygwin machine, I have seen the error message for
removing the old-style lock displayed (when it should not be displayed). This
is because the ln -s locking method fails because ln -s in not atomic. The
following patch provides what I believe is a safer locking mechanism.

The check for and removal of a stale lock in the current version still has a
race condition. I have altered the timing of the code to further reduce this
race, but not eliminate it. This type of operation (check for and removal of a
stale lock) is always difficult.

Thanks,

...Karl

--- keychain-orig       2005-03-10 10:33:18.000000000 -0800
+++ keychain    2005-03-30 14:18:56.205011200 -0800
@@ -322,41 +322,38 @@
         tl_faking=true
         tl_start=0
         tl_end=`expr $lockwait \* 10`
         tl_current=0
     fi

     # Try to lock for $lockwait seconds
     while [ $lockwait -eq 0 -o $tl_current -lt $tl_end ]; do
-        if tl_error=`ln -s $$ "$lockf" 2>&1`; then
+        if tl_error=`(umask 0377; echo $$ > "$lockf") 2>&1`; then
             havelock=true
             return 0
         fi

         # advance our timer
         if [ $lockwait -gt 0 ]; then
             if $tl_faking; then
                 tl_current=`expr $tl_current + 1`
             else
                 tl_current=`now $tl_current`
             fi
         fi

         # check for old-style lock; unlikely
-        if [ -f "$lockf" ]; then
+        if [ -h "$lockf" ]; then
             error "please remove old-style lock: $lockf"
             return 1
         fi

         # read the lock
-        tl_pid=`readlink "$lockf" 2>/dev/null`
-        if [ -z "$tl_pid" ]; then 
-            tl_pid=`ls -l "$lockf" 2>/dev/null | awk '{print $NF}'`
-        fi
+        tl_pid=`cat "$lockf" 2>/dev/null`
         if [ -z "$tl_pid" ]; then
             # lock seems to have disappeared, try again
             continue
         fi

         # test for a stale lock
         kill -0 "$tl_pid" 2>/dev/null
         if [ $? != 0 ]; then
@@ -365,18 +362,16 @@
             # checked, then go ahead and remove the stale lock.  Otherwise
             # remember the pid and try again.
             if [ "$tl_pid" = "$tl_oldpid" ]; then
                 warn "removing stale lock for pid $tl_pid"
                 rm -f "$lockf"
             else
                 tl_oldpid="$tl_pid"
             fi
-            # try try again
-            continue
         fi

         # sleep for a bit to wait for the keychain process holding the lock
         sleep 0.1 >/dev/null 2>&1 && continue
         perl -e 'select(undef, undef, undef, 0.1)' \
             >/dev/null 2>&1 && continue
         # adjust granularity of tl_current stepping
         $tl_faking && tl_current=`expr $tl_current + 9`

------- Comment #1 From Karl Moerder 2005-03-30 17:27:21 0000 -------
Created an attachment (id=54902) [details]
improved patch to avoid the race in removing a stale lock

------- Comment #2 From Karl Moerder 2005-03-30 20:20:40 0000 -------
Created an attachment (id=54908) [details]
improved patch...don't release the lock unless we have it

------- Comment #3 From Aron Griffis (RETIRED) 2005-05-11 17:11:24 0000 -------
Karl, I finally integrated this patch into keychain-2.5.4, now released.  I
found a few problems in your patch, though...

First, there is no need to subshell inside backquotes.  The backquotes
themselves introduce a subshell.  So `(do something)` is not required, even if
you want the umask to be restored following the backquoted expression.

Second, you had
   tl_error=`(umask 0377; echo $$ > "$lockf") 2>&1`
which redirects stderr after redirecting stdout.  The result is that stderr is
lost instead of landing in tl_error.  I changed this to
   tl_error=`umask 0377; echo $$ 2>&1 >"$lockf"`

Third, you check for an old-style lock after you attempt to acquire the lock. 
This is a problem because the old-style lock in this case is a symlink, so you
will find that you are able to write to the file!  I moved the check for an
old-style lock above the loop since there's really no reason to check it every
time through the loop.  I also renamed the file-based lock so that there's no
chance of accidentally writing through the symlink.

Fourth, I added a check for an empty lockfile.  It would be an aberrant case,
but the code as you had written it would spin on an empty lockfile and never be
resolved.  Or perhaps not so aberrant, since creating and writing to a lockfile
is not an atomic operation any more than creating a symbolic link.  One
keychain could be scheduled out between file creation and writing.  So it's
good to catch this case.

Fifth, I believe the lockr mechanism is unnecessary.  Since the stale lock
isn't removed until a second pass through the loop (using the $tl_oldpid
check), there is already a read-test-read-test going on that should eliminate
the chance of incorrectly removing a lock.

Thanks for your contribution.  You can see my final patch at

http://www.gentoo.org/cgi-bin/viewcvs.cgi/keychain/keychain.sh?r1=1.20&r2=1.21&root=gentoo-src

Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug