I was trying to upgrade my system and having parallel-fetch enabled for a few months without problems, this turned out: ~ # emerge -vDatu system These are the packages that would be merged, in reverse order: Calculating system dependencies... done! [nomerge ] media-fonts/font-alias-1.0.1 USE="-debug" [nomerge ] media-fonts/font-adobe-75dpi-1.0.0 USE="nls" [nomerge ] x11-apps/bdftopcf-1.0.0 USE="-debug" [ebuild U ] x11-libs/libXfont-1.2.2 [1.2.0-r2] USE="ipv6 -debug" 583 kB Total size of downloads: 583 kB Would you like to merge these packages? [Yes/No] >>> Emerging (1 of 1) x11-libs/libXfont-1.2.2 to / >>> Downloading 'ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/libXfont-1.2.2.tar.bz2' --22:45:13-- ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/libXfont-1.2.2.tar.bz2 => `/usr/portage-distfiles/libXfont-1.2.2.tar.bz2' Resolving ftp.ecc.u-tokyo.ac.jp... 133.11.205.121 Connecting to ftp.ecc.u-tokyo.ac.jp|133.11.205.121|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD /GENTOO/distfiles ... done. ==> PASV ... done. ==> RETR libXfont-1.2.2.tar.bz2 ... done. Length: 597,652 (584K) (unauthoritative) 0% [ ] 0 --.--K/s 22:45:43 (0.00 B/s) - Data connection: Connection timed out; Data transfer aborted. Retrying. --22:45:58-- ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/libXfont-1.2.2.tar.bz2 (try: 2) => `/usr/portage-distfiles/libXfont-1.2.2.tar.bz2' ==> CWD not required. ==> PASV ... done. ==> RETR libXfont-1.2.2.tar.bz2 ... done. /usr/portage-distfiles/libXfont-1.2.2.tar.bz2 has sprung into existence. Retrying. --22:46:00-- ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/libXfont-1.2.2.tar.bz2 (try: 3) => `/usr/portage-distfiles/libXfont-1.2.2.tar.bz2.1' Connecting to ftp.ecc.u-tokyo.ac.jp|133.11.205.121|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD /GENTOO/distfiles ... done. ==> PASV ... done. ==> RETR libXfont-1.2.2.tar.bz2 ... done. Length: 597,652 (584K) (unauthoritative) 100%[==========================================================================================================================================================================>] 597,652 3.22M/s 22:46:00 (3.21 MB/s) - `/usr/portage-distfiles/libXfont-1.2.2.tar.bz2.1' saved [597652] >>> Downloading 'ftp://ftp.gg3.net/pub/linux/gentoo/distfiles/libXfont-1.2.2.tar.bz2' --22:46:00-- ftp://ftp.gg3.net/pub/linux/gentoo/distfiles/libXfont-1.2.2.tar.bz2 => `/usr/portage-distfiles/libXfont-1.2.2.tar.bz2.2' Resolving ftp.gg3.net... 220.157.235.239, 2001:240:516:1::e2 Connecting to ftp.gg3.net|220.157.235.239|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD /pub/linux/gentoo/distfiles ... done. ==> PASV ... done. ==> RETR libXfont-1.2.2.tar.bz2 ... done. Length: 597,652 (584K) (unauthoritative) 100%[==========================================================================================================================================================================>] 597,652 --.--K/s 22:46:03 (5.07 MB/s) - `/usr/portage-distfiles/libXfont-1.2.2.tar.bz2.2' saved [597652] >>> Downloading 'http://xorg.freedesktop.org/releases/individual/lib/libXfont-1.2.2.tar.bz2' --22:46:03-- http://xorg.freedesktop.org/releases/individual/lib/libXfont-1.2.2.tar.bz2 => `/usr/portage-distfiles/libXfont-1.2.2.tar.bz2.3' Resolving xorg.freedesktop.org... 131.252.208.36 Connecting to xorg.freedesktop.org|131.252.208.36|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 597,652 (584K) [application/x-tar] 100%[==========================================================================================================================================================================>] 597,652 429.24K/s 22:46:09 (428.44 KB/s) - `/usr/portage-distfiles/libXfont-1.2.2.tar.bz2.3' saved [597652/597652] !!! Couldn't download 'libXfont-1.2.2.tar.bz2'. Aborting. On a second try, libXfont-1.2.2.tar.bz2 was successfully (one more time) fetched and compiled. I guess it is some kind of concurrent issue (i.e. parallel :-) or not understanding wget's error codes. Any ideas? ~ # emerge --info Portage 2.1.1 (default-linux/x86/2006.1/desktop, gcc-4.1.1, glibc-2.4-r3, 2.6.17-K01_PIII_laptop i686) ================================================================= System uname: 2.6.17-K01_PIII_laptop i686 Intel(R) Pentium(R) III Mobile CPU 800MHz Gentoo Base System version 1.12.4 Last Sync: Wed, 13 Sep 2006 05:20:01 +0000 distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled] app-admin/eselect-compiler: [Not Present] dev-lang/python: 2.3.5, 2.4.3-r1 dev-python/pycrypto: 2.0.1-r5 dev-util/ccache: [Not Present] dev-util/confcache: [Not Present] sys-apps/sandbox: 1.2.17 sys-devel/autoconf: 2.13, 2.59-r7 sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2 sys-devel/binutils: 2.16.1-r3 sys-devel/gcc-config: 1.3.13-r3 sys-devel/libtool: 1.5.22 virtual/os-headers: 2.6.11-r5 ACCEPT_KEYWORDS="x86" ACCEPT_LICENSE="" ANT_HOME="/usr/share/ant-core" ARCH="x86" AUTOCLEAN="yes" CBUILD="i686-pc-linux-gnu" CFLAGS="-O2 -march=pentium3" CHOST="i686-pc-linux-gnu" ...
parallel-fetch works essentially by forking and running `emerge -e [all other non-interactive parameters]`. Distfiles should be locked before running FETCH_COMMAND unless FEATURES="-distlocks". Is that the case? (PS: Likely see you at OSC!)
parallel-fetch automatically disables itself if distlock is disabled, but maybe something else has gone wrong with the file locking. What type of filesystem is /usr/portage-distfiles/ on? Is the problem reproducible?
The problem seems to be not reproducible, probably it was a very sopecific timing when the file "appeared". ~ # portageq envvar FEATURES autoconfig ccache distlocks metadata-transfer noinfo parallel-fetch sandbox sfperms strict userpriv ~ # mount |grep portage gentoo:/usr/portage on /usr/portage type nfs (ro,sync,nfsvers=3,addr=192.168.1.1) gentoo:/usr/portage-distfiles on /usr/portage-distfiles type nfs (rw,intr,nfsvers=3,addr=192.168.1.1) I didn't have problems with this setup for a few months already... (P.S. jstubbs: See ya @OSC!)
The plain file locking code is well tested (since bug #121819), so it's apparently falling back to hardlink locking on your nfs filesystem.
In svn r4446 I've fixed a flaw in the unlocking code that could cause a lockfile to be removed even though it is currently locked by another process. Other than that, I don't see anything else that would potentially lead to what you've observed. You can verify that you're really using the hardlink code if you look and the names of the lock files that are created in $DISTDIR/.locks during fetch. You should see files there that end with .hardlock-<your hostname>-<emerge pid>.
OK, so it seems partly reproducible here, I had it again. I still cannot see why. Any way to turn more debugging on? Here it comes: >>> Emerging (1 of 1) media-fonts/kochi-substitute-20030809-r3 to / >>> Downloading 'ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/kochi-substitute-20030809.tar.bz2' --00:43:20-- ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/kochi-substitute-20030809.tar.bz2 => `/usr/portage-distfiles/kochi-substitute-20030809.tar.bz2' Resolving ftp.ecc.u-tokyo.ac.jp... 133.11.205.121 Connecting to ftp.ecc.u-tokyo.ac.jp|133.11.205.121|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD /GENTOO/distfiles ... done. ==> PASV ... done. ==> RETR kochi-substitute-20030809.tar.bz2 ... done. Length: 8,675,822 (8.3M) (unauthoritative) 0% [ ] 0 --.--K/s 00:43:50 (0.00 B/s) - Data connection: Connection timed out; Control connection closed. Retrying. --00:44:22-- ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/kochi-substitute-20030809.tar.bz2 (try: 2) => `/usr/portage-distfiles/kochi-substitute-20030809.tar.bz2' Connecting to ftp.ecc.u-tokyo.ac.jp|133.11.205.121|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD /GENTOO/distfiles ... done. ==> PASV ... done. ==> RETR kochi-substitute-20030809.tar.bz2 ... done. /usr/portage-distfiles/kochi-substitute-20030809.tar.bz2 has sprung into existence. Retrying. --00:44:24-- ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/kochi-substitute-20030809.tar.bz2 (try: 3) => `/usr/portage-distfiles/kochi-substitute-20030809.tar.bz2.1' Connecting to ftp.ecc.u-tokyo.ac.jp|133.11.205.121|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD /GENTOO/distfiles ... done. ==> PASV ... done. ==> RETR kochi-substitute-20030809.tar.bz2 ... done. Length: 8,675,822 (8.3M) (unauthoritative) 100%[==================================================================================>] 8,675,822 3.46M/s [snip a few times] 00:44:46 (5.44 MB/s) - `/usr/portage-distfiles/kochi-substitute-20030809.tar.bz2.5' saved [8675822/8675822] !!! Couldn't download 'kochi-substitute-20030809.tar.bz2'. Aborting. As long as locks are concerned, I stopped a now succeeding download and this is what I had: /usr/portage-distfiles/.locks/kochi-substitute-20030809.tar.bz2.portage_lockfile At this time no other system was using the (NFS shared) /usr/portage-distfiles directory. It seems that I started having this since upgrade to portage-2.1.1 will try tomorrow on another pair of boxen with similar (NFS exported) setup. Is it wise to revert to portage-2.1?
(In reply to comment #6) > Any way to turn more debugging on? Not really. You can look at /var/log/emerge-fetch.log to see what the parallel-fetch output was though. > As long as locks are concerned, I stopped a now succeeding download and this is > what I had: > /usr/portage-distfiles/.locks/kochi-substitute-20030809.tar.bz2.portage_lockfile > If it's using hardlink locking, you'll see an additional lockfile the starts with the same name as above but has .hardlock-<your hostname>-<emerge pid> appended to the end. If you don't see the hardlock file, then it's using lockf rather than hardlinks. > At this time no other system was using the (NFS shared) /usr/portage-distfiles > directory. > > It seems that I started having this since upgrade to portage-2.1.1 will try > tomorrow on another pair of boxen with similar (NFS exported) setup. > > Is it wise to revert to portage-2.1? > There have been no changes in the file locking code between 2.1-r2 and 2.1.1.
Hmm, I tried to emerge something else (dia) on the same system and check what is going on... So: ~ # rm /usr/portage-distfiles/dia* /var/log/emerge* ~ # emerge dia -a These are the packages that would be merged, in order: Calculating dependencies... done! [ebuild R ] app-office/dia-0.95.1 USE="png zlib -debug -gnome -python" 4,104 kB Total size of downloads: 4,104 kB Would you like to merge these packages? [Yes/No] >>> Emerging (1 of 1) app-office/dia-0.95.1 to / >>> Downloading 'ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/dia-0.95-1.tar.bz2' --07:11:26-- ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/dia-0.95-1.tar.bz2 => `/usr/portage-distfiles/dia-0.95-1.tar.bz2' Resolving ftp.ecc.u-tokyo.ac.jp... 133.11.205.121 Connecting to ftp.ecc.u-tokyo.ac.jp|133.11.205.121|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD /GENTOO/distfiles ... done. ==> PASV ... done. ==> RETR dia-0.95-1.tar.bz2 ... done. Length: 4,203,188 (4.0M) (unauthoritative) 0% [ ] 0 --.--K/s [1]+ Stopped emerge -v dia -a ~ # pstree -Aau 19719 emerge -O /usr/bin/emerge -v dia -a `-wget --passive --dns-timeout=10 --connect-timeout=15 --read-timeout=30 ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/dia-0.95-1.tar.bz2 -P /usr/portage-distfiles ~ # fg emerge -v dia -a Control connection closed. Retrying. --07:15:20-- ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/dia-0.95-1.tar.bz2 (try: 2) => `/usr/portage-distfiles/dia-0.95-1.tar.bz2' Connecting to ftp.ecc.u-tokyo.ac.jp|133.11.205.121|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD /GENTOO/distfiles ... done. ==> PASV ... done. ==> RETR dia-0.95-1.tar.bz2 ... done. /usr/portage-distfiles/dia-0.95-1.tar.bz2 has sprung into existence. Retrying. --07:15:22-- ftp://ftp.ecc.u-tokyo.ac.jp/GENTOO/distfiles/dia-0.95-1.tar.bz2 (try: 3) => `/usr/portage-distfiles/dia-0.95-1.tar.bz2.1' [1]+ Stopped emerge -v dia -a ~ # ll /usr/portage-distfiles/.locks/ total 0 -rw-rw---- 1 root portage 0 Sep 15 07:11 dia-0.95-1.tar.bz2.portage_lockfile ~ # portageq envvar FEATURES autoconfig ccache distlocks metadata-transfer noinfo parallel-fetch sandbox sfperms strict userpriv ~ # ll /var/log/ -d drwxr-xr-x 11 root root 832 Sep 15 07:08 /var/log/ ~ # ll /var/log/emerge* -rw-rw---- 1 portage portage 1355501 Sep 15 07:11 /var/log/emerge.log ~ # date Fri Sep 15 07:28:10 JST 2006 ~ # ll /usr/portage-distfiles/.locks/ total 0 -rw-rw---- 1 root portage 0 Sep 15 07:11 dia-0.95-1.tar.bz2.portage_lockfile The points: 1. /var/log/emerge-fetch.log does not get created after I deleted it. 2. I am not using hardlinking aparently (one file in the .locks dir) 3. After "has sprung into existence", there is "Retrying." ... why? 4. The lock file does not get cleared after the "has sprung into existence" MSG
(In reply to comment #8) > The points: > 1. /var/log/emerge-fetch.log does not get created after I deleted it. parallel-fetch is only triggered when there is more than one package in the merge list. > 2. I am not using hardlinking aparently (one file in the .locks dir) > 3. After "has sprung into existence", there is "Retrying." ... why? > 4. The lock file does not get cleared after the "has sprung into existence" MSG wget hasn't returned control to portage yet, so the same lockfile remains. Anyway, this doesn't seems to have anything to do with file locking or parallel-fetch. It's just that wget isn't behaving the way that we want it to.
Are you using something special for FETCHCOMMAND and RESUMECOMMAND? Something must have changed to make wget behave differently for you than it used to, and it's got nothing to do with portage.
# grep COMMAND /etc/make.conf FETCHCOMMAND="/usr/bin/wget --passive --dns-timeout=10 --connect-timeout=15 --read-timeout=30 \${URI} -P \${DISTDIR}" RESUMECOMMAND="${FETCHCOMMAND} --continue" This has not changed for at least a year for me. So it is wget or the NFS code itself... Any more pointers to explore?
Somebody's reported a similar issue in bug 145601.