Bug 63443 - portage can not obtain a lock when distfiles is on nfs
Summary: portage can not obtain a lock when distfiles is on nfs
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All All
Importance: High major
Assignee: Portage team
URL:
Whiteboard:
Keywords: Bug
Duplicates: 64397 64625 65426
Depends on:
Blocks:
 
Reported: 2004-09-09 09:47 UTC by Sebastian Dröge
Modified: 2008-08-04 21:34 UTC
CC List: 7 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Workaround for distfiles directory on a vfat partition; requires an additional FEATURES flag (portage-2.51_rc1-pi.patch,815 bytes, patch)
2004-09-23 14:05 UTC, Jan Pieczkowski
Workaround for distfiles directory on a vfat partition, second instance (portage-2.51_rc1-pi2.patch,1.42 KB, patch)
2004-09-23 15:01 UTC, Jan Pieczkowski
portage-2.0.51_rc4 lockfile debug output (portage_lock_debug.out,3.90 KB, text/plain)
2004-09-26 14:06 UTC, Herbie Hopkins (RETIRED)
portage-2.0.51_rc4 lockfile debug output (portage_lock_debug.out,3.90 KB, text/plain)
2004-09-26 14:41 UTC, Herbie Hopkins (RETIRED)
debug output (debug_output.txt,111.79 KB, text/plain)
2004-09-26 15:00 UTC, Charlie Brackett
weeve's ncftp emerge log using second patch (ncftp-emerge.log,141.37 KB, text/plain)
2004-09-26 15:33 UTC, Jason Wever (RETIRED)

Description Sebastian Dröge 2004-09-09 09:47:44 UTC
Hi,
when using portage 2.0.51_pre20 you cannot have your distfiles on NFS, because NFS doesn't support locking with flock() AFAIK.
At least, I get an error message saying that it can't create a lock before downloading anything.
Comment 1 Brian Harring (RETIRED) gentoo-dev 2004-09-09 11:24:14 UTC
Please post the error message, permissions, etc., if you would.
Basically, info please.
Comment 2 Sebastian Dröge 2004-09-09 11:42:19 UTC
ok, sorry

the distfiles on the server:
exportfs: /usr/portage/distfiles  192.168.0.0/24(all_squash,anonuid=250,anongid=250,rw,sync)

drwxrwx---   4 portage  portage 53248  9. Sep 20:37 distfiles

emerge gcc
Calculating dependencies ...done!
>>> emerge (1 of 1) sys-devel/gcc-3.4.1-r2 to /
Traceback (most recent call last):
  File "/usr/bin/emerge", line 2815, in ?
    mydepgraph.merge(mydepgraph.altlist())
  File "/usr/bin/emerge", line 1725, in merge
    retval=portage.doebuild(y,"merge",myroot,self.pkgsettings,edebug)
  File "/usr/lib/portage/pym/portage.py", line 2702, in doebuild
    if not fetch(fetchme, mysettings, listonly, fetchonly):
  File "/usr/lib/portage/pym/portage.py", line 1979, in fetch
    file_lock = portage_locks.lockfile(mysettings["DISTDIR"]+"/"+locks_in_subdir+"/"+myfile,wantnewlockfile=1)
  File "/usr/lib/portage/pym/portage_locks.py", line 74, in lockfile
    raise ie
IOError: [Errno 37] No locks available


When I unmount the distfiles on the client, everything works without problems. IMHO the NFS problem can be worked around by using fcntl() with F_SETLK instead of flock().
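A minimal Python sketch of that suggestion, assuming the lock lives in a file the process can open read-write. Python's fcntl.lockf() wraps POSIX fcntl() record locking, and with LOCK_NB it issues the non-blocking F_SETLK mentioned above. try_posix_lock and release_posix_lock are hypothetical names, not portage functions, and whether such locks work over NFS still depends on the client kernel and the NFS lock services, as the rest of this bug shows.

```python
import errno
import fcntl
import os


# Hypothetical helpers for illustration; not portage's API.
def try_posix_lock(path):
    """Try to take an exclusive POSIX record lock on `path` without blocking.

    Returns the open file descriptor on success (keep it open to hold the
    lock) or None if another process already holds it. Any other error,
    such as ENOLCK ("No locks available"), is re-raised to the caller.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o660)
    try:
        # LOCK_EX | LOCK_NB makes lockf() use fcntl(F_SETLK) rather than
        # the blocking F_SETLKW.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return None  # lock is held by another process
        raise
    return fd


def release_posix_lock(fd):
    """Drop the lock and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Keep the descriptor open for as long as the lock is needed: a process loses all of its POSIX locks on a file as soon as it closes any descriptor for that file, one of several semantic differences from flock().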
Comment 3 SpanKY gentoo-dev 2004-09-10 22:25:01 UTC
Perhaps your NFS permissions are too strict to allow locking.

I've used NFS distfiles for a very long time now and never had a problem... but I mount it without squash options and as root.
Comment 4 Sebastian Dröge 2004-09-11 04:40:40 UTC
Even with no_root_squash and 777 permissions I get this error message
Can you give me your exports line that works?

(BTW: with portage 2.0.50 it works this way)
Comment 5 Sebastian Dröge 2004-09-12 11:24:03 UTC
Solved... 2.6.9-rc1-mm1 was broken for nfs locks...

Btw, this was reported before in #37344
Comment 6 SpanKY gentoo-dev 2004-09-12 12:01:55 UTC
works for me
Comment 7 SpanKY gentoo-dev 2004-09-19 15:13:20 UTC
*** Bug 64625 has been marked as a duplicate of this bug. ***
Comment 8 Ulrich Plate (RETIRED) gentoo-dev 2004-09-21 08:47:19 UTC
Still trying to get my brain around this, but at least I'm seeing a consistent pattern. I'm having lock errors, all right, slightly different from the one the original poster reported, and they start as soon as I have a freshly installed portage-2.0.51_rc1:

daimyo ~ # emerge =sys-apps/portage-2.0.50-r11
Calculating dependencies ...done!
>>> emerge (1 of 1) sys-apps/portage-2.0.50-r11 to /
*** Adjusting cvs-src permissions for portage user...
!!! Unable to chgrp of /usr/portage/distfiles to portage, continuing

Traceback (most recent call last):
  File "/usr/bin/emerge", line 2826, in ?
    mydepgraph.merge(mydepgraph.altlist())
  File "/usr/bin/emerge", line 1733, in merge
    retval=portage.doebuild(y,"merge",myroot,self.pkgsettings,edebug)
  File "/usr/lib/portage/pym/portage.py", line 2369, in doebuild
    if not fetch(fetchme, mysettings, listonly=listonly, fetchonly=fetchonly):
  File "/usr/lib/portage/pym/portage.py", line 1637, in fetch
    file_lock = portage_locks.lockfile(mysettings["DISTDIR"]+"/"+locks_in_subdir+"/"+myfile,wantnewlockfile=1)
  File "/usr/lib/portage/pym/portage_locks.py", line 48, in lockfile
    os.chown(lockfilename,os.getuid(),portage_data.portage_gid)
OSError: [Errno 1] Operation not permitted: '/usr/portage/distfiles/.locks/portage-2.0.50-r11.tar.bz2.portage_lockfile'

Then I just do the exact same thing again, and it works: compiles, installs, everything. If I re-emerge the current portage-2.0.51_rc1 and then emerge the old portage again, it starts all over, just like above: first time failure, second time success.


Kernel config (2.6.9-rc2-mm1):
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V4 is not set
# CONFIG_NFS_DIRECTIO is not set
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V4 is not set
# CONFIG_NFS_TCP is not set

The NFS share is on a corporate FreeBSD 4.6.2-release server, NFS version unknown. I don't have admin rights on that machine. The exports file looks like this:

/data2 -alldirs -maproot=pkgshare -network 193.41.125.0 -mask 255.255.255.0

rpcinfo -p:

   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100005    3   udp   1023  mountd
    100005    3   tcp   1023  mountd
    100005    1   udp   1023  mountd
    100005    1   tcp   1023  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100024    1   udp   1011  status
    100024    1   tcp   1022  status

nfsstat shows traffic only for a v3 client, I suppose that means the server speaks v3, too.

Tell me if you need any additional information. Thanks for looking into this again!
Comment 9 SpanKY gentoo-dev 2004-09-21 16:55:25 UTC
Like Nick and the reporter mention, flock() isn't NFS-friendly.
Comment 10 Ulrich Plate (RETIRED) gentoo-dev 2004-09-22 02:10:33 UTC
So I've now tested this on my laptop for comparison, and the error is reproducible there, too: different kernel (2.6.9-rc1-mm5) but an almost identical config, distfiles on the same NFS server, shorter cable (the desktop is on an almost 100-meter CAT5 run to the adjacent building), same effect. Except that I've now discovered I can repeat the emerge command as often as I want: the failure --> success pattern only shows when a single file is taken from distfiles. gcc, for example, never gets there, because the lockfile error just rotates through the list of files to be uncompressed. I get 

OSError: [Errno 1] Operation not permitted: '/usr/portage/distfiles/.locks/gcc-3.4.2-manpages.tar.bz2.portage_lockfile'

first for manpages, then patches, then gcc proper, and back to manpages, etc. Never the same file twice in a row, and not always in the same order either, but always one file that cannot get locked. I can now only emerge things that rely on a single distfile (like groff a couple of minutes ago, which compiled and installed cleanly on the second attempt, just like portage in comment #8 above).
Comment 11 Jason Stubbs (RETIRED) gentoo-dev 2004-09-22 18:27:09 UTC
*** Bug 64397 has been marked as a duplicate of this bug. ***
Comment 12 Paul Slinski 2004-09-23 10:04:12 UTC
I played a hunch and restarted the nfs daemon on the server and the problem was gone. Portage works as expected here now.
Comment 13 Jan Pieczkowski 2004-09-23 13:06:28 UTC
The same error appears if the distfiles are stored on a FAT32 partition and symlinked to. As chown is not allowed on such a filesystem, portage fails. A workaround could of course be passing the portage group ID while mounting, but I have other things on that disk and don't want to change the group settings (that would totally break my system's config).

Hacking the portage script in the appropriate lines (in pym/portage_locks.py) might do the trick, but I don't really want to mess around with it if there's another way.

It would be really nice if you could solve that issue; I don't have the option of moving my distfiles dir to an ext2/3 partition. My suggestion is to provide an additional flag for /etc/make.conf (e.g. FEATURES="distfilesonfat") so that the chown safety check is skipped.
Comment 14 Jan Pieczkowski 2004-09-23 14:05:11 UTC
Created attachment 40245 [details, diff]
Workaround for distfiles directory on a vfat partition; requires an additional FEATURES flag

Only slight modifications were made to pym/portage_locks.py:
it now searches for the flag "vfatdistfiles" in FEATURES (/etc/make.conf) and
won't try to chown the .locks/<filename> then. If the flag is not set, the
behaviour is unchanged. It works for me, so if you like it, use it :)
(Please tell me if you do; I'm just curious whether it was helpful!)

 pi~
Comment 15 Jan Pieczkowski 2004-09-23 14:55:51 UTC
Comment on attachment 40245 [details, diff]
Workaround for distfiles directory on a vfat partition; requires an additional FEATURES flag

In pym/portage.py, there's another use of chown which breaks emerge directly
after downloading the ebuild's files (same error). I corrected this so that one
doesn't have to restart emerge again for the install to work. See new Patch.
Comment 16 Jan Pieczkowski 2004-09-23 15:01:31 UTC
Created attachment 40250 [details, diff]
Workaround for distfiles directory on a vfat partition, second instance

Directly after downloading the source files, the emerge process would abort
because chown is called from pym/portage.py as well. I fixed that, also using
the 'vfatdistfiles' flag.
If it works for you as well, or if you run into trouble, please also stop by
here: http://forums.gentoo.org/viewtopic.php?t=224166 and post about it there as
well.

regards,
 pi~
Comment 17 Marius Mauch (RETIRED) gentoo-dev 2004-09-23 16:01:02 UTC
Any reason you can't just make an ext2 loopback filesystem on your FAT disk?
Comment 18 Nicholas Jones (RETIRED) gentoo-dev 2004-09-23 22:36:39 UTC
The idea behind the portage_* files is to remove the portage module itself
from the picture. So your circular dep there is bad.

Besides that... you've just potentially annihilated all lockfiles for
non-root users, including userpriv.

NFS does support locking, but NFSv2 is not very good at it.
NFSv3 is fine. You have to enable that when you build your kernel.

I'm working out a fix, but I need testers periodically.

Test #1: Change all calls to 'flock' to 'lockf'. Search and replace.

Dumb FS Option #1: Check the result from chown's exception.

Comment 19 Charlie Brackett 2004-09-23 23:01:57 UTC
I'm not completely sure which file you meant to do the search/replace in, but I replaced all instances of "fcntl.flock" with "fcntl.lockf" in /usr/lib/portage/pym/portage_locks.py. It didn't solve the problem, but at least it gives me a traceback now.

emerge -f gnome
Calculating dependencies ...done!
>>> emerge (1 of 76) gnome-base/gail-1.6.6 to /
Traceback (most recent call last):
  File "/usr/bin/emerge", line 2826, in ?
    mydepgraph.merge(mydepgraph.altlist())
  File "/usr/bin/emerge", line 1694, in merge
    retval=portage.doebuild(y,"fetch",myroot,self.pkgsettings,edebug,("--pretend" in myopts),fetchonly=1)
  File "/usr/lib/portage/pym/portage.py", line 2369, in doebuild
    if not fetch(fetchme, mysettings, listonly=listonly, fetchonly=fetchonly):
  File "/usr/lib/portage/pym/portage.py", line 1637, in fetch
    file_lock = portage_locks.lockfile(mysettings["DISTDIR"]+"/"+locks_in_subdir+"/"+myfile,wantnewlockfile=1)
  File "/usr/lib/portage/pym/portage_locks.py", line 74, in lockfile
    raise ie
IOError: [Errno 37] No locks available
Comment 20 Nicholas Jones (RETIRED) gentoo-dev 2004-09-23 23:25:46 UTC
That's a lovely result. I guess I'll have to implement the hardlink test.

As for the vfat issues, I'm working that out with a "friendly_chown" function.
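A minimal sketch of the "check chown's exception" idea for filesystems such as vfat, which refuse ownership changes with EPERM (the same errno as the NFS traceback in comment 8). The helper below is hypothetical and only illustrates the approach; it is not the friendly_chown that portage ended up with. Only EPERM is tolerated, so real errors such as a missing file still surface.

```python
import errno
import os
import sys


# Hypothetical helper for illustration; not portage's friendly_chown.
def friendly_chown(path, uid, gid):
    """chown() that tolerates filesystems which refuse ownership changes.

    Returns True if the ownership was changed, False if the filesystem
    refused with EPERM (as vfat does). Any other error is re-raised.
    """
    try:
        os.chown(path, uid, gid)
        return True
    except OSError as e:
        if e.errno != errno.EPERM:
            raise
        # Warn and carry on instead of aborting the whole emerge.
        sys.stderr.write("Cannot chown %s; continuing without it.\n" % path)
        return False
```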
Comment 21 Jan Pieczkowski 2004-09-24 01:31:18 UTC
Yeah, well, it was only a quick'n'dirty hack, not a full-blown solution. But I needed something that would work _now_, and it does. Still, I'm eagerly waiting for a "real" solution from you devs.
I'm not very experienced in Python (doing mostly C++/bash/PHP stuff), and I hadn't looked into portage before ("just used it"). But from what I've seen, I really like Python so far, and portage is HUGE! (in terms of "being good")

Anyway, waiting for the fix now.
 pi~
Comment 22 Ulrich Plate (RETIRED) gentoo-dev 2004-09-24 02:18:14 UTC
While I was fiddling around with this, my filesystem went limp when an 'emerge gcc' died horribly ("filesystem is read-only") while unpacking to /var/tmp/portage. I sighed and went on a business trip, only to return last night... $deity bless ext3's recovery mechanism: I've got 2000 entries in lost+found, but at least I still have a working laptop. :) Meanwhile, the desktop is apparently sturdier and still open for suggestions. I've checked two things:

1. Changed all occurrences of fcntl.flock to fcntl.lockf

Same error as reported earlier: OSError: [Errno 1] Operation not permitted: '/usr/portage/distfiles/.locks/blahblah.tar.gz.portage_lockfile'

2. Changed the share from NFS to Samba (the server allows both)

Same error, both with the old fcntl.flock and the new fcntl.lockf.
Comment 23 Nicholas Jones (RETIRED) gentoo-dev 2004-09-25 01:30:14 UTC
Locks are valid on samba... There is an issue with reexporting an
NFS share through samba that creates locking issues. Are you doing that?

OriginalServer <------------NFS---------------> You
OriginalServer <--NFS--> SomeServer <--Samba--> You




The lockf fix will be included in _rc2 along with the hardlink-shuffle.
Comment 24 Ulrich Plate (RETIRED) gentoo-dev 2004-09-25 13:53:25 UTC
I don't think it's doing that sort of ricocheted export. It's the same machine that serves the distfiles directory (just below /data2/SHARE/PKGSHARE) as both an NFS and a Samba share. The smb.conf entry looks like this:

[PKGSHARE]
    comment = common download pool
    browseable = no
    writable = yes
    only user = no
    create mask = 0666
    path = /data2/SHARE/PKGSHARE
    guest ok = no
    oplocks = False
    hide dot files = yes
    valid users = +pkgshare

while /etc/exports states:

/data2 -alldirs -maproot=pkgshare -network 193.41.125.0 -mask 255.255.255.0

I only have read permission on those files, but I can ask the admin to help if there's anything that needs to be done server-side.
Comment 25 Nicholas Jones (RETIRED) gentoo-dev 2004-09-25 19:34:51 UTC
Well, that config makes it pretty obvious.

You disabled oplocks. Is there a particular reason?



portage-2.0.51_rc3 is out with NFSv2 fixes.
Comment 26 Nicholas Jones (RETIRED) gentoo-dev 2004-09-25 22:13:17 UTC
Ok. So NFSv2 fixes aren't in yet.
Working on that right now.
Comment 27 Charlie Brackett 2004-09-25 22:15:12 UTC
RC3 is actually worse for me.  Before the only issue was with downloading the distfiles, but now it won't let me emerge a package at all.

emerge -uD world
Calculating world dependencies                              ...done!            
>>> emerge (1 of 1) sys-apps/vixie-cron-4.1-r1 to /
Traceback (most recent call last):
  File "/usr/bin/emerge", line 2844, in ?
    mydepgraph.merge(mydepgraph.altlist())
  File "/usr/bin/emerge", line 1737, in merge
    retval=portage.doebuild(y,"merge",myroot,self.pkgsettings,edebug)
  File "/usr/lib/portage/pym/portage.py", line 2370, in doebuild
    if not fetch(fetchme, mysettings, listonly=listonly, fetchonly=fetchonly):
  File "/usr/lib/portage/pym/portage.py", line 1639, in fetch
    file_lock = portage_locks.lockfile(mysettings["DISTDIR"]+"/"+locks_in_subdir+"/"+myfile,wantnewlockfile=1)
  File "/usr/lib/portage/pym/portage_locks.py", line 80, in lockfile
    raise ie
IOError: [Errno 37] No locks available
Comment 28 Herbie Hopkins (RETIRED) gentoo-dev 2004-09-26 06:26:44 UTC
Just upgraded to portage-2.0.51_rc4 (from rc1) and am now experiencing locking problems with my NFS-mounted distdir. I get no error as such, but portage seems to be waiting on its own lockfile.

>>> emerge (1 of 5) x11-misc/shared-mime-info-0.15 to /
Hardlink lockfile: /mnt/nfs/portage/distfiles/.locks/shared-mime-info-0.15.tar.gz.portage_lockfile.hardlock-terminus-21045

Waiting on (hardlink) lockfile: (one '.' per 3 seconds)
   /mnt/nfs/portage/distfiles/.locks/shared-mime-info-0.15.tar.gz.portage_lockfile
......................

It will then just sit there forever. If I start an emerge, then open another terminal and delete the lockfile, portage carries on as normal; however, cleaning the locks before I start an emerge has no effect. I had no problems with rc1 and have no problems if DISTDIR is on a local filesystem.
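The wait loop in the output above prints a dot every 3 seconds and has no upper bound, which is why a lock that is never released hangs the emerge forever. A sketch of the same polling loop with an optional timeout, assuming a caller-supplied try_acquire() that returns True once the lock is held (wait_for_lock and its parameters are hypothetical, not portage's code):

```python
import sys
import time


# Hypothetical helper for illustration; not portage's wait loop.
def wait_for_lock(try_acquire, interval=3.0, timeout=None, out=sys.stderr):
    """Poll try_acquire() until it succeeds, printing one '.' per failed try.

    Returns True once the lock is held, or False if `timeout` seconds pass
    first. timeout=None waits forever, matching the behaviour reported here.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not try_acquire():
        if deadline is not None and time.monotonic() >= deadline:
            out.write("\n")
            return False
        out.write(".")
        out.flush()
        time.sleep(interval)
    return True
```

A finite timeout turns a silent hang into an error the user can act on, for example by running the lock-cleaning tool mentioned later in this bug.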
Comment 29 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 10:37:56 UTC
Just had a possible solution proposed:

emerge nfs-utils
Comment 30 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 10:46:34 UTC
You have to restart nfs/nfsmount after installing that package.

FEATURES=-distlocks
will let you merge whatever you'd like without concern for locks.


Herbie: You had no problems on _rc1? You're certain that you were on
_rc1? _rc1 had a much less robust/NFS-stable locking scheme. I'd
be quite surprised if it was actually working. Can you post your mount options?
Comment 31 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 10:47:58 UTC
*** Bug 65426 has been marked as a duplicate of this bug. ***
Comment 32 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 10:48:13 UTC
------- Additional Comment #1 From Marien Zwart 2004-09-26 10:37 PST -------

I had the same issue. 'FEATURES="-distlocks" emerge package' disables locking, allowing me to merge things. After merging nfs-utils on the client and doing /etc/init.d/netmount restart (and using /usr/lib/portage/bin/clean_locks --force /usr/portage/distfiles/.locks, but not sure that was necessary) things started working again. If this works for other people too, I suggest adding this to portage's output when waiting on locks.

Comment 33 Charlie Brackett 2004-09-26 11:13:56 UTC
RC4 looks a lot better, but still not quite right for me.

 emerge -uD world
Calculating world dependencies      ...done!                                  
>>> emerge (1 of 2) x11-misc/shared-mime-info-0.15 to /
Hardlink lockfile: /usr/portage/distfiles/.locks/shared-mime-info-0.15.tar.gz.portage_lockfile.hardlock-hostname-10138

Waiting on (hardlink) lockfile: (one '.' per 3 seconds)
   /usr/portage/distfiles/.locks/shared-mime-info-0.15.tar.gz.portage_lockfile
...............................


Portage doesn't appear to ever stop waiting (I waited probably more than 5 minutes on a different attempt).  I tried deleting shared-mime-info-0.15.tar.gz.portage_lockfile during the wait and the emerge then resumed normally.  Doing an emerge -f on the NFS server while the NFS client is waiting on the lockfile will also clear the lock and allow the emerge to resume.

This is occurring both when the distfiles exist and when they need to be downloaded.
Comment 34 Herbie Hopkins (RETIRED) gentoo-dev 2004-09-26 11:18:39 UTC
Yep, quite sure. I just tried going back to 2.0.51_rc1, which solved the problem. After emerging _rc4 again I get the output I posted above. Emerging/re-emerging nfs-utils has no effect here.

server export options: rw,no_subtree_check,no_root_squash,async
client mount options: rw,rsize=8192,wsize=8192,nfsvers=3,hard
Comment 35 Sebastian Dröge 2004-09-26 11:26:59 UTC
Same here... rc1 works, rc4 doesn't
Comment 36 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 11:44:01 UTC
Everyone reporting that _rc4 is broken for them:

emerge nfs-utils ; /etc/init.d/nfsmount restart

If it doesn't work, then please post the following:
uname -a
mount | grep mountpoint_goes_here
How is it mounted? Re-export of another share?
Comment 37 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 11:44:59 UTC
You may need to run

/usr/lib/portage/bin/clean-locks --force
Comment 38 Charlie Brackett 2004-09-26 12:20:15 UTC
I re-emerged nfs-utils and started nfsmount.  I also ran /usr/lib/portage/bin/clean-locks --force.  Now the emerge hangs whether the distfile exists or not with the following output:

emerge -f gnome
Calculating dependencies ...done!                                     
>>> emerge (1 of 76) gnome-base/gail-1.6.6 to /

CTRL-C now has no effect, nor does deleting the lockfile (or doing an emerge -f on the server).


uname -a
Linux wod28910rn 2.6.8-gentoo-r4 #1 Mon Sep 13 04:53:12 EST 2004 i686 Intel(R) Pentium(R) 4 CPU 2.00GHz GenuineIntel GNU/Linux

mount | grep /usr/portage
192.168.0.3:/usr/portage on /usr/portage type nfs (rw,hard,intr,tcp,nfsvers=3,addr=192.168.0.3)

(on NFS server)
grep /usr/portage /etc/exports
/usr/portage 192.168.0.2(rw,no_root_squash,sync)

The NFS share is just the /usr/portage directory on the server's / partition.  The partition is reiser4 (kernel is gentoo-dev-sources with reiser4 patched in).
Comment 39 Herbie Hopkins (RETIRED) gentoo-dev 2004-09-26 12:22:24 UTC
uname -a: Linux terminus 2.6.8-gentoo-r3 #3 Wed Sep 22 21:06:20 BST 2004 x86_64 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux
client mount opts: rw,rsize=8192,wsize=8192,nfsvers=3,hard
server export opts: rw,no_subtree_check,no_root_squash,async

Tried re-emerging nfs-utils, restarting services, changing various mount options, and running clean-locks, all on both client and server. I always get the same result: portage waits indefinitely for its own lockfile to disappear.
Comment 40 Herbie Hopkins (RETIRED) gentoo-dev 2004-09-26 12:37:15 UTC
Not sure if it's relevant or whether it is intended behaviour, but upon starting an emerge, portage creates two lockfiles in distfiles/.locks. For example:

$ rm -f distfiles/.locks/*
$ emerge -f nfs-utils
Calculating dependencies  ...done!
>>> emerge (1 of 1) net-fs/nfs-utils-1.0.6-r4 to /
Hardlink lockfile: /mnt/nfs/portage/distfiles/.locks/nfs-utils-1.0.6.tar.gz.portage_lockfile.hardlock-terminus-10187

Waiting on (hardlink) lockfile: (one '.' per 3 seconds)
   /mnt/nfs/portage/distfiles/.locks/nfs-utils-1.0.6.tar.gz.portage_lockfile
......
(pressed Ctrl-C to stop the emerge here)
$ ls distfiles/.locks
nfs-utils-1.0.6.tar.gz.portage_lockfile
nfs-utils-1.0.6.tar.gz.portage_lockfile.hardlock-terminus-10208
Comment 41 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 13:34:41 UTC
Herbie: Two lockfiles is a locking technique. Create a system+process unique file and then using the desired lockfile name, hardlink it. If it succeeds and/or the link count on the unique file is 2, then you have the lock.

What is the underlying FS for the NFS mount?


Charlie: Could you try a non-reiser4 partition as a test please?



I'll post a patch shortly with tons of debug.
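The two-lockfile technique described in this comment can be sketched as follows. It is an illustrative reimplementation, not portage's portage_locks.py: the unique name follows the <lockfile>.hardlock-<hostname>-<pid> pattern visible in the output earlier in this bug, and the function names are chosen here for illustration. Success is judged by inode and link count rather than by link()'s return value alone, which is the approach the Linux open(2) manual page recommends for lockfiles on NFS, since over NFS a link() can be reported as failed even though the link was created.

```python
import errno
import os
import socket


# Hypothetical helpers for illustration; not portage's implementation.
def hardlink_lockfile(lockfile):
    """Try once to take `lockfile` via the link-count technique.

    Returns the path of our unique file if we now hold the lock, or None
    if another process holds it.
    """
    unique = "%s.hardlock-%s-%d" % (lockfile, socket.gethostname(), os.getpid())
    # Create our system+process unique file (harmless if it already exists).
    fd = os.open(unique, os.O_WRONLY | os.O_CREAT, 0o660)
    os.close(fd)
    try:
        os.link(unique, lockfile)
    except OSError as e:
        if e.errno != errno.EEXIST:
            os.unlink(unique)
            raise
        # The link may still be ours (e.g. a lost NFS reply); check below.
    if hardlink_is_mine(unique, lockfile):
        return unique
    os.unlink(unique)
    return None


def hardlink_is_mine(unique, lockfile):
    """We hold the lock if both names share one inode whose link count is 2."""
    try:
        u = os.stat(unique)
        lk = os.stat(lockfile)
    except FileNotFoundError:
        return False
    return u.st_nlink == 2 and (u.st_dev, u.st_ino) == (lk.st_dev, lk.st_ino)


def hardlink_unlock(unique, lockfile):
    """Release: remove the shared name first, then our unique file."""
    if hardlink_is_mine(unique, lockfile):
        os.unlink(lockfile)
    os.unlink(unique)
```

A caller would retry hardlink_lockfile() until it returns a name and call hardlink_unlock() when done. If the process is killed in between, both names stay behind, which matches the stale-lock behaviour reported after Ctrl-C later in this bug.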
Comment 42 Jason Wever (RETIRED) gentoo-dev 2004-09-26 13:42:26 UTC
Linux excelsior.weeve.org 2.6.9-rc2 #9 Fri Sep 24 16:37:35 MDT 2004 sparc64 sun4u TI UltraSparc IIe (Hummingbird) GNU/Linux

mounted with options (rw,soft,intr,addr=192.168.0.1)

Using autofs to manage the mount point rather than nfs-utils.

Underlying FS is ext3
Comment 43 Herbie Hopkins (RETIRED) gentoo-dev 2004-09-26 13:51:31 UTC
underlying filesystem is reiserfs (v3) here.
Comment 44 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 13:52:48 UTC
http://zarquon.twobit.net/gentoo/portage/portage_locks.py-2.0.51_rc4-debug.diff

patch /usr/lib/portage/pym/portage_locks.py < portage_locks.py-2.0.51_rc4-debug.diff

That will produce a tremendous amount of output on most portage operations.
Just log it all, and post it for me.
Comment 45 Herbie Hopkins (RETIRED) gentoo-dev 2004-09-26 14:06:47 UTC
Created attachment 40478 [details]
portage-2.0.51_rc4 lockfile debug output
Comment 46 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 14:16:08 UTC
Ok. There's a typo in the original patch, I changed the patch on my server.

Herbie:
Look for mylsd in /usr/lib/portage/pym/portage_locks.py and change it to mylsf.


Before starting that output, please run /usr/lib/portage/bin/clean_locks --force
Comment 47 Herbie Hopkins (RETIRED) gentoo-dev 2004-09-26 14:41:05 UTC
Created attachment 40479 [details]
portage-2.0.51_rc4 lockfile debug output
Comment 48 Charlie Brackett 2004-09-26 15:00:34 UTC
Created attachment 40481 [details]
debug output
Comment 49 Charlie Brackett 2004-09-26 15:02:22 UTC
I get the same results when using reiserfs v3.
Comment 50 Jason Wever (RETIRED) gentoo-dev 2004-09-26 15:33:24 UTC
Created attachment 40483 [details]
weeve's ncftp emerge log using second patch
Comment 51 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 15:55:14 UTC
http://zarquon.twobit.net/gentoo/portage/portage_locks.py-2.0.51_rc4-debug2.diff

All I really need is the 'Exception' line after "Lock failed"
Comment 52 Jason Wever (RETIRED) gentoo-dev 2004-09-26 16:25:31 UTC
What I get here is:

Exception:  [Errno 17] File exists
Comment 53 Herbie Hopkins (RETIRED) gentoo-dev 2004-09-26 16:30:43 UTC
with debug2:
lockfile(): Calling hardlink_lockfile()
lockfile(): Hardlink: Attempting link.
lockfile(): Hardlink: Link failed.
Exception:  [Errno 17] File exists
lockfile(): hardlink_is_mine() Entered
Comment 54 Doug Goldstein (RETIRED) gentoo-dev 2004-09-26 16:45:44 UTC
Well... I had NFSv2 and now NFSv3, and up to rc1 everything was fine. With rc3 (I never used rc2) I couldn't download or emerge anything: the same problems everyone is getting with NFS, in both v2 and v3. Now with rc4 I can't even emerge sync. Here's the error I get with that:

receiving file list ...
98728 files to consider
delete_one: rmdir "/usr/portage/distfiles" failed: Device or resource busy

Number of files: 98728
Number of files transferred: 0
Total file size: 76526147 bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 2293018
Total bytes written: 184
Total bytes read: 2293139

wrote 184 bytes  read 2293139 bytes  97588.21 bytes/sec
total size is 76526147  speedup is 33.37
rsync error: some files could not be transferred (code 23) at main.c(1064)
Comment 55 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 16:52:20 UTC
Ok. On a clean _rc4, try this. I think it'll fix it.

http://zarquon.twobit.net/gentoo/portage/portage_locks.py-2.0.51_rc4.diff
Comment 56 Herbie Hopkins (RETIRED) gentoo-dev 2004-09-26 17:11:46 UTC
yep, that works for me :)
Comment 57 Charlie Brackett 2004-09-26 17:40:15 UTC
This patch works for me, except for when I use ctrl-c or kill the process while the lock is in place.  If this happens, the lock is not removed and I see the same behavior as before.  Executing "/usr/lib/portage/bin/clean_locks --force" does not remove the lock, but deleting the files in /usr/portage/distfiles/.locks does.


Also, is the lock intending to prevent multiple machines from downloading the same file at once, or just multiple processes on the same machine?  If it is the former, then I do see one other issue.  Neither the NFS server nor client honor the lock put in place by the other machine.  I haven't tested whether or not the NFS clients will honor locks by other clients.
Comment 58 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 17:49:29 UTC
That would be the catch with NFS locks and killing an app. Once we get portage
fully handling signals, it should always clean up after itself, but you can
still do things to prevent/break that.

The hardlink method uses reference/link counts. So it's not a FS layer lock.
It's an atomic lock due to the nature of linking. All locking is cooperative
anyway, but portage, assuming you update all boxes, will pay attention to
the lock.

I'll add another message about the clean_locks tool.
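Since the hardlink lock's unique name records the host and PID of its owner, a cleanup tool can in principle tell whether a leftover lock belongs to a process that no longer exists. The sketch below is a hypothetical illustration of that check, not what clean_locks actually does. It refuses to judge locks created by other hosts, because a PID only means something on the machine that issued it, so locks left by other NFS clients still need a human decision.

```python
import errno
import os
import socket

MARKER = ".hardlock-"


# Hypothetical helper for illustration; not portage's clean_locks logic.
def is_stale(unique_name, hostname=None):
    """Return True if known stale, False if its owner is alive, None if unknown.

    Expects names of the form <lockfile>.hardlock-<host>-<pid>; the host may
    itself contain '-', so the PID is split off from the right.
    """
    base = os.path.basename(unique_name)
    idx = base.rfind(MARKER)
    if idx < 0:
        return None
    host, sep, pid_text = base[idx + len(MARKER):].rpartition("-")
    if not sep or not pid_text.isdigit() or int(pid_text) <= 0:
        return None
    if host != (hostname or socket.gethostname()):
        return None  # a PID from another machine tells us nothing here
    try:
        os.kill(int(pid_text), 0)  # signal 0: existence check, nothing sent
    except OSError as e:
        if e.errno == errno.ESRCH:
            return True   # no such process: the lock is stale
        if e.errno == errno.EPERM:
            return False  # process exists but belongs to another user
        raise
    return False
```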
Comment 59 Jason Wever (RETIRED) gentoo-dev 2004-09-26 17:56:41 UTC
Works for me as well.
Comment 60 Paul Slinski 2004-09-26 18:37:58 UTC
rc4 fails here now. 

rc1 was broken, but all I had to do was reload the NFS daemon and all was well (portage mounted via NFSv3, on ext3).

Now when I try to emerge it just hangs with: 

Waiting on (hardlink) lockfile: (one '.' per 3 seconds)
   /usr/portage/distfiles/.locks/shared-mime-info-0.15.tar.gz.portage_lockfile
.........(infinity)

Any insight guys?
Comment 61 Nicholas Jones (RETIRED) gentoo-dev 2004-09-26 22:38:09 UTC
Paul: 2.0.51_rc5
Comment 62 Sebastian Dröge 2004-09-27 08:51:45 UTC
rc6 works partially: you can now merge with distfiles on NFS, but when you try emerge sync you get "delete_one: rmdir "/usr/portage/distfiles" failed: Device or resource busy" and emerge keeps retrying the sync until it reaches the maximum number of retries.

any ideas?
Comment 63 Jason Stubbs (RETIRED) gentoo-dev 2004-09-27 16:16:51 UTC
bug 65519
Comment 64 Ulrich Plate (RETIRED) gentoo-dev 2004-09-28 02:13:33 UTC
Just tried _rc6, and it's still yelling at me if I run emerge -v, but it does the trick:

Calculating world dependencies ...done!
>>> emerge (1 of 10) media-libs/libexif-0.6.10 to /
*** Adjusting cvs-src permissions for portage user...
!!! Unable to chgrp of /usr/portage/distfiles to portage, continuing

Cannot chown a lockfile. This could cause inconvenience later.

That last line gets repeated for each file being unpacked whenever there are multiple files called by an ebuild. But it works over NFS again, that's the important bit. :)
Comment 65 Ulrich Plate (RETIRED) gentoo-dev 2004-09-28 12:57:34 UTC
Rotating through the different hosts I have running Gentoo, there's one that doesn't behave. It has an identical setup, with distfiles on the same NFS share as the two others that now work fine; the only significant differences are a 2.4.27 kernel rather than 2.6.9.something on the others, and Reiser3 as the underlying filesystem (instead of XFS and ext3 respectively). This is the result:

# emerge -uDv world
Calculating world dependencies ...done!
>>> emerge (1 of 17) dev-libs/expat-1.95.8 to /
*** Adjusting cvs-src permissions for portage user...
!!! Unable to chgrp of /usr/portage/distfiles to portage, continuing

Cannot chown a lockfile. This could cause inconvenience later.
Traceback (most recent call last):
  File "/usr/bin/emerge", line 2885, in ?
    mydepgraph.merge(mydepgraph.altlist())
  File "/usr/bin/emerge", line 1776, in merge
    retval=portage.doebuild(y,"merge",myroot,self.pkgsettings,edebug)
  File "/usr/lib/portage/pym/portage.py", line 2380, in doebuild
    if not fetch(fetchme, mysettings, listonly=listonly, fetchonly=fetchonly):
  File "/usr/lib/portage/pym/portage.py", line 1649, in fetch
    file_lock = portage_locks.lockfile(mysettings["DISTDIR"]+"/"+locks_in_subdir+"/"+myfile,wantnewlockfile=1)
  File "/usr/lib/portage/pym/portage_locks.py", line 114, in lockfile
    raise e
IOError: [Errno 13] Permission denied

Same thing after an 'emerge metadata', portage-2.0.51_rc6. Any ideas?
Comment 66 Nicholas Jones (RETIRED) gentoo-dev 2004-10-02 21:08:18 UTC
If you are still on 51_rc6: FEATURES=-distlocks emerge portage

If you are on other versions and this doesn't work, use a rescue
portage, and then update to _rc7 or later.
Comment 67 Charlie Brackett 2004-10-11 15:54:14 UTC
rc9 breaks things for me again.

emerge --oneshot binutils
Calculating dependencies ...done!
>>> emerge (1 of 1) sys-devel/binutils-2.15.92.0.2-r1 to 

Portage hangs here (the file already exists).

Could this (from the ChangeLog) have anything to do with it:
" 08 Oct 2004; Brian Harring <ferringb@gentoo.org> portage_locks.py: Reverted 
  to using flock by default- if it fails (unavailable), -then- use lockf, then
  hardlink."
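The ChangeLog entry describes a fallback order: try flock(), fall back to lockf() if flock is unavailable, and only then use a hardlink scheme. Below is a minimal Python sketch of the first two steps. It assumes "unavailable" shows up as an OSError such as ENOLCK or EOPNOTSUPP. `acquire_lock` is a hypothetical name, and the hardlink step is left out.

```python
import errno
import fcntl
import os

# Errors treated as "this locking primitive is not supported here" (an assumption).
UNSUPPORTED = {errno.ENOLCK, errno.EOPNOTSUPP, errno.EINVAL}

def acquire_lock(path):
    """Return (fd, method) for an exclusive lock on `path`."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o660)
    for method, lock in (("flock", fcntl.flock), ("lockf", fcntl.lockf)):
        try:
            lock(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
            return fd, method
        except OSError as e:
            if e.errno not in UNSUPPORTED:
                os.close(fd)
                raise
    os.close(fd)
    raise OSError(errno.ENOLCK, "neither flock nor lockf works; hardlink fallback needed", path)
```

A blocking `LOCK_EX` also means a caller waits indefinitely if another process (or a stale server-side lock) holds the file, which is one way a fetch can appear to hang.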
Comment 68 Ulrich Plate (RETIRED) gentoo-dev 2004-10-12 06:49:27 UTC
I don't have that problem with rc9. Things work as expected on all three hosts I've tested it on, provided I have FEATURES="-distlocks" in /etc/make.conf. 

From seeing distlocks mentioned in /etc/make.conf.example, I suppose you intend to keep this behaviour indefinitely. Any plans to silence that "!!! Unable to chgrp" warning, then? Portage just isn't its same old friendly self when it yells at me every time I emerge something... :) 
Comment 69 Charlie Brackett 2004-10-12 13:16:28 UTC
Of course it works correctly when using -distlocks; that prevents the problem code from executing. It doesn't bother me at all that the locks aren't working correctly, since I can just use the -distlocks feature, but this is still a bug, so I'm reporting it.
Comment 70 Nicholas Jones (RETIRED) gentoo-dev 2004-10-14 03:20:33 UTC
Charlie, if you could post the full output, that would be appreciated.

ls -li /usr/portage/distfiles/.locks

Comment 71 Ulrich Plate (RETIRED) gentoo-dev 2004-10-14 04:47:43 UTC
I'm not Charlie, but here's what it looks like on my share:

$ ls -li /usr/portage/distfiles/.locks/
total 0
963201 -rw-rw----  1  600 600 0 Sep 28 21:30 expat-1.95.8.tar.gz.portage_lockfile
963206 -rw-rw-rw-  1 1012 600 0 Sep 24 11:10 frozen-bubble-client-0.0.3.tar.bz2.portage_lockfile
963205 -rw-rw----  1  600 600 0 Sep 22 10:53 gcc-3.4.2-manpages.tar.bz2.portage_lockfile
963204 -rw-rw----  1  600 600 0 Sep 22 10:52 gcc-3.4.2.tar.bz2.portage_lockfile
963209 -rw-rw----  1  600 600 0 Sep 25 01:29 gpgme-0.9.0.tar.gz.portage_lockfile
963214 -rw-rw----  1  600 600 0 Oct 14 11:34 kdebase-3.3.1.tar.bz2.portage_lockfile
963207 -rw-rw-rw-  1 1012 600 0 Sep 24 11:14 matritsa-0.1.2.tar.gz.portage_lockfile
963208 -rw-rw----  1  600 600 0 Sep 25 01:15 modutils-2.4.26.tar.bz2.portage_lockfile
963213 -rw-rw----  1  600 600 0 Oct 12 09:51 portage-2.0.51_rc7.tar.bz2.portage_lockfile
963212 -rw-rw----  1  600 600 0 Sep 28 10:50 ppp-2.4.2-mppe-mppc-1.1.patch.gz.portage_lockfile
963210 -rw-rw----  1  600 600 0 Oct  1 23:44 readline50-004.portage_lockfile
963211 -rw-rw----  1  600 600 0 Oct 12 09:28 shadow-4.0.4.1.tar.bz2.portage_lockfile
963202 -rw-rw----  1  600 600 0 Sep 24 10:51 winesetuptk-0.7.tar.gz.portage_lockfile
963203 -rw-rw-rw-  1 1012 600 0 Sep 24 11:04 xdelta-1.1.3.tar.gz.portage_lockfile
Comment 72 Nicholas Jones (RETIRED) gentoo-dev 2004-10-14 18:50:01 UTC
Ulrich: You can delete all those.
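Deleting the leftover lockfiles by hand works. Below is a hedged sketch of doing it a little more carefully: a lockfile is removed only if a non-blocking flock() on it succeeds, meaning no process visible to this client currently holds it. This assumes flock() locks are honoured on the share, which is not guaranteed. The check is also not race-free, so run it only when no emerge is active on any host. `remove_stale_locks` is a hypothetical helper.

```python
import fcntl
import glob
import os

def remove_stale_locks(distdir):
    """Delete *.portage_lockfile entries in DISTDIR/.locks that nobody holds.

    Returns the list of removed paths. Not race-free: run it only when no
    emerge is active.
    """
    removed = []
    pattern = os.path.join(distdir, ".locks", "*.portage_lockfile")
    for path in sorted(glob.glob(pattern)):
        try:
            fd = os.open(path, os.O_RDWR)
        except OSError:
            continue  # vanished or not writable by us; leave it alone
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os.close(fd)  # held by someone (or flock unsupported): skip it
            continue
        try:
            os.unlink(path)
            removed.append(path)
        finally:
            os.close(fd)  # closing the descriptor also releases the lock
    return removed
```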
Comment 73 Charlie Brackett 2004-10-15 09:51:19 UTC
That was the entire output of the failed emerge.

Here is what my .locks dir looks like after a failed emerge of sgml-common:

ls -li /usr/portage/distfiles/.locks
total 0
5473702 -rw-rw----  1 root portage 0 Oct 15 12:46 sgml-common-0.6.3.tgz.portage_lockfile
Comment 74 Nicholas Jones (RETIRED) gentoo-dev 2004-10-18 19:16:17 UTC
I unfixed it for _rc10. Strange stuff really.
It shouldn't be broken.
Comment 75 Nicholas Jones (RETIRED) gentoo-dev 2004-10-20 14:32:13 UTC
Can anyone verify that CURRENT (2.0.51 or 2.0.51_rc10) fixes the
problem once and for all?
Comment 76 Bjarke Istrup Pedersen (RETIRED) gentoo-dev 2004-10-20 14:35:41 UTC
It runs fine, except that the lock isn't removed if I cancel a download with CTRL+C
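A lock left behind after Ctrl+C usually means the cleanup code did not run on KeyboardInterrupt. One way to make cleanup unconditional is a context manager whose `finally` clause also runs when the interrupt propagates. This is a minimal sketch. `distfile_lock` is hypothetical and always deletes the lockfile on exit, which is a simplification; Portage's real behaviour may differ.

```python
import contextlib
import fcntl
import os

@contextlib.contextmanager
def distfile_lock(distdir, myfile):
    """Hold an exclusive lock on DISTDIR/.locks/<myfile>.portage_lockfile."""
    lockdir = os.path.join(distdir, ".locks")
    os.makedirs(lockdir, exist_ok=True)
    path = os.path.join(lockdir, myfile + ".portage_lockfile")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o660)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield path
    finally:
        # Runs on normal exit, on exceptions, and on KeyboardInterrupt (Ctrl+C).
        try:
            os.unlink(path)
        except OSError:
            pass
        os.close(fd)
```

Deleting the lockfile while another process is already waiting on it can let two processes lock different inodes. A real implementation has to handle that race; this sketch does not.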
Comment 77 Charlie Brackett 2004-10-21 17:55:21 UTC
It works for me too.
Comment 78 Ulrich Plate (RETIRED) gentoo-dev 2004-10-22 01:02:16 UTC
No problems here, either, except for the warnings ("!!! Unable to chgrp..." and "Cannot chown a lockfile. This could cause inconvenience..."). Anyway, as you've released 2.0.51 already, I suppose you know it works... Congratulations!
Comment 79 TJH 2004-10-25 10:58:51 UTC
A similar bug exists when /usr/portage/distfiles is mounted via cifs. The emerge process fails with the following output.

>>> emerge (1 of 1) sys-kernel/gentoo-dev-sources-2.6.9-r1 to /
*** Adjusting cvs-src permissions for portage user...
Traceback (most recent call last):
  File "/usr/bin/emerge", line 2991, in ?
    mydepgraph.merge(mydepgraph.altlist())
  File "/usr/bin/emerge", line 1839, in merge
    retval=portage.doebuild(y,"merge",myroot,self.pkgsettings,edebug)
  File "/usr/lib/portage/pym/portage.py", line 2506, in doebuild
    if not fetch(fetchme, mysettings, listonly=listonly, fetchonly=fetchonly):
  File "/usr/lib/portage/pym/portage.py", line 1849, in fetch
    portage_locks.unlockfile(file_lock)
  File "/usr/lib/portage/pym/portage_locks.py", line 162, in unlockfile
    raise IOError, "Failed to unlock file '%s'\n" % lockfilename
IOError: Failed to unlock file '/usr/portage/distfiles/.locks/linux-2.6.9.tar.bz2.portage_lockfile'

This is under portage-2.0.51-r2
Comment 80 Jason Stubbs (RETIRED) gentoo-dev 2004-10-26 04:51:21 UTC
For the CIFS case, the traceback implies that a standard lockf lock was obtainable but was not unlockable afterward. This seems like either a bug or possibly just a deficiency in the file system itself. You can of course get around it by using FEATURES="-distlocks".
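To check whether a mount like this CIFS share has that deficiency, you can test whether a lockf() lock can be both taken and released on a scratch file in the target directory. Below is a hedged diagnostic sketch. `probe_lockf` is hypothetical, and it only exercises this client's view of the file system.

```python
import fcntl
import os
import tempfile

def probe_lockf(directory):
    """Report whether lockf() can lock and then unlock a file in `directory`.

    Returns "ok", "lock-failed" or "unlock-failed".
    """
    fd, path = tempfile.mkstemp(dir=directory, suffix=".lockprobe")
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return "lock-failed"
        try:
            fcntl.lockf(fd, fcntl.LOCK_UN)
        except OSError:
            return "unlock-failed"
        return "ok"
    finally:
        # Always clean up the scratch file, whatever the outcome.
        os.close(fd)
        os.unlink(path)
```

An "unlock-failed" result on the distfiles mount would match the traceback above and suggest FEATURES="-distlocks" for that host.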
Comment 81 Brian Harring (RETIRED) gentoo-dev 2005-02-28 01:00:47 UTC
This was fixed a while back.
flock, then lockf, then hardlink is the locking approach.  If hardlink isn't possible, well, you're screwed :)
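The last resort in that chain is a hardlink lock. Below is a minimal sketch of the classic technique described in the Linux open(2) manual page for NFS. It creates a uniquely named file on the same file system, link(2)s it to the lockfile name, and treats the lock as acquired if the unique file's link count has become 2, even when link() itself reported an error. `hardlink_lock` and `hardlink_unlock` are hypothetical names; this is not Portage's code.

```python
import os
import socket

def hardlink_lock(lockpath):
    """Try once to take a hardlink lock; return True if we now own it."""
    # Unique per host and process, in the same directory as the lockfile.
    unique = "%s.%s.%d" % (lockpath, socket.gethostname(), os.getpid())
    fd = os.open(unique, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)
    try:
        try:
            os.link(unique, lockpath)
        except OSError:
            pass  # over NFS, link() can report failure even if it succeeded
        # If our link exists, the unique file now has two names.
        return os.stat(unique).st_nlink == 2
    finally:
        os.unlink(unique)

def hardlink_unlock(lockpath):
    """Release the lock by removing the shared name."""
    os.unlink(lockpath)
```

On a file system without hardlinks (vfat, for example), `os.link` always fails and the link count never reaches 2. In that case this scheme never acquires the lock, which is the "you're screwed" case.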
Comment 82 Archimedes Trajano 2008-08-04 20:36:00 UTC
I seem to get this problem on the latest ~x86 version of portage.
Comment 83 Archimedes Trajano 2008-08-04 21:15:09 UTC
The -distlocks in Comment #80 helped fix the problem on 2.2, though.
Comment 84 Archimedes Trajano 2008-08-04 21:15:51 UTC
Speaking on the topic of locks, why not put the locks in /var/lock?
Comment 85 Alec Warner (RETIRED) archtester gentoo-dev Security 2008-08-04 21:34:03 UTC
(In reply to comment #84)
> Speaking on the topic of locks, why not put the locks in /var/lock?
> 

Locks are supposed to prevent readers from reading incomplete files and to prevent multiple writers. If the locks were in /var/lock instead of on the NFS share, the entire purpose of locking would be lost when distfiles are shared over the network: /var/lock is not a shared resource, so multiple writers or uninformed readers become possible.
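The two roles described here can be sketched with a lockfile that lives next to the distfile on the share. The fetcher takes an exclusive lock while writing, and a reader takes a shared lock, so it cannot observe a half-written file. This is a minimal sketch assuming flock() locks are honoured across all clients of the share. The helper names are hypothetical.

```python
import fcntl
import os

def _open_locked(lockpath, mode, blocking):
    """Open (creating if needed) `lockpath` and flock it with `mode`."""
    fd = os.open(lockpath, os.O_CREAT | os.O_RDWR, 0o660)
    try:
        fcntl.flock(fd, mode if blocking else mode | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd

def lock_for_write(lockpath, blocking=True):
    """Exclusive lock: only one fetcher may write the distfile."""
    return _open_locked(lockpath, fcntl.LOCK_EX, blocking)

def lock_for_read(lockpath, blocking=True):
    """Shared lock: many readers at once, but none while a writer holds it."""
    return _open_locked(lockpath, fcntl.LOCK_SH, blocking)
```

If `lockpath` pointed into a host-local directory such as /var/lock, two hosts would each lock their own private file. Both could then write the same distfile at once, which is exactly the failure described above.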