Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug
Bug#: 58137
Alias:
Product:
Component:
Status: RESOLVED
Resolution: FIXED
Assigned To: Gentoo's Team for Core System packages <base-system@gentoo.org>
Hardware:
OS:
Version:
Priority:
Severity:
Reporter: Nahor <nahor.j+gentoo@gmail.com>
Add CC:
CC:
URL:
Summary:
Status Whiteboard:
Keywords:

Filename Description Type Creator Created Size Actions
Create a New Attachment (proposed patch, testcase, etc.) View All

Bug 58137 depends on: Show dependency tree
Bug 58137 blocks:
Votes: 0    Show votes for this bug    Vote for this bug

Additional Comments: (this is where you put emerge --info)


Not eligible to see or edit group visibility for this bug.






View Bug Activity   |   Format For Printing   |   XML   |   Clone This Bug


Description:   Opened: 2004-07-23 18:21 0000
This is about the fix in bug 45155.

I wanted to use the lock mechanism in my own script and I noticed something wrong. Every other time, the script failed to run. Here is what happens:

1. Starts with a clean state (no lock file)
2. First iteration starts
   a. Since there is no lock yet, ln succeeds
3. Since there is no break, there is a second iteration
   a. ln now fails because the lock exists from the first iteration
   b. The test to see if the script is running succeeds because the script tests itself.
   c. the scrip aborts and the lock is present
4. Now we run the script again (the lock still exits)
5. First iteration
   a. ln fails because of the lock file
   b. the pid doesn't match anything so the lock is deleted
6. Second iteration
   a. ln succeeds
7. We are done with the two iteration, now the remaining of the script can run
8. At the end of the run, the trap runs and deletes the lock

For the next run, there won't be any lock so we start at step 1.

I fixed it on my end by adding a break in the case "$? == 0" after the ln command:
  ln ....
  if [[ $? != 0 ]]; then
    ...
  else
    break
  fi


Also (maybe I should enter a second bug for that?), AFAIK, Linux reuses the pids after a while. So maybe the lock is there and points to a pid that belongs to another process than a run-crons instance. Given that run-crons run every minutes, it's unlikely to happen but that doesn't make it impossible. What about checking the /proc/<pid>/exe and check that it points to /usr/sbin/run-crons instead of using "kill -0"?

------- Comment #1 From Aron Griffis (RETIRED) 2004-07-25 08:58:59 0000 -------
Thanks for pointing out these problems!  I've rewritten the locking a bit to
handle what you mentioned and a couple other problems.  Please let me know what
you think... it's in cronbase-0.3.1

Regarding your second point, /proc/$$/exe isn't quite right because it will
point to bash as the script interpreter.  I used /proc/$$/fd instead to see if
there is a file descriptor pointing to run-crons.  Nice idea. :-)

------- Comment #2 From Nahor 2004-07-25 12:12:18 0000 -------
There are still problems with the file descriptor: what if a process like
"less" for instance is displaying the file or even displaying the file
"/home/joe/run-crons.sh~"?

I noticed that the script is always descriptor 255. And then the exe should be
bash. So a test like 
  "$(readlink /proc/${cronpid}/exe)" == "/bin/bash" && "$(readlink
/proc/${cronpid}/fd/255)" == $(readlink /proc/$$/fd/255)" 
would be better I think (I don't know how to expand $0 to the fullpath, that
why I used "$(readlink /proc/$$/fd/255)")

Otherwise, what about "`cat /proc/${cronpid}/cmdline`" == "`cat
/proc/$$/cmdline`"? Simpler and should work as long as we can assume that
run-crons is always started the same way. 


On a side note, is the line "cronpid=$(cat ${LOCKFILE} 2>/dev/null) ||" really
necessary? I understand that it could help for people who are upgrading from an
old version but the likelyhood to have an old undeleted lock is low and if
there is one, they will get the message after the loop.

------- Comment #3 From Aron Griffis (RETIRED) 2004-07-26 16:11:46 0000 -------
> I noticed that the script is always descriptor 255. And then the exe should be
> bash.

That is bash internal implementation specific, so I'd rather not depend on it.

> Otherwise, what about "`cat /proc/${cronpid}/cmdline`" == "`cat 
> /proc/$$/cmdline`"? Simpler and should work as long as we can assume that 
> run-crons is always started the same way. 

Good idea, I changed it to

[[ $(</proc/${cronpid}/cmdline) == $(</proc/$$/cmdline) ]] 2>/dev/null

> On a side note, is the line "cronpid=$(cat ${LOCKFILE} 2>/dev/null) ||" really
> necessary?

Yes, backward compatibility is good when it isn't painful to implement, and
this is hardly painful.  The message is definitely not as good as handling the
situation transparently.

Thanks for the suggestions.  I didn't bother to bump the package version since
I don't believe this improvement warrants it.

------- Comment #4 From Nahor 2004-07-26 16:22:28 0000 -------
> Yes, backward compatibility is good when it isn't painful to implement
Hmm, I would add that it depend if it is a normal situation or not too. IMHO, having to cat is not normal (except, and I'm not even sure that it is, the first time after an upgrade, all the other time it means that another process is playing around with the lock).

But well, that's not my code and I can live with it :)

Thanks for fixing the bugs

Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug