>>> www-client/firefox-4.0-r3 merged.
rm: cannot remove `/home/jolexa/portage/linux-64/var/tmp/portage/www-client/firefox-4.0-r3/temp': Directory not empty

This could be related to PORTAGE_TMPDIR being on NFS; regardless, it is a regression. I'm not exactly sure where the regression started, though.
+ ls -la /home/jolexa/portage/linux-64/var/tmp/portage/sys-apps/less-440/temp
total 228
drwxrwxr-x 3 jolexa minstaff 4096 Apr 20 17:44 .
drwxrwxr-x 6 jolexa minstaff 4096 Apr 20 2011 ..
-rw-r--r-- 1 jolexa minstaff 50 Apr 20 17:44 70less
-rw-rw---- 1 jolexa minstaff 127041 Apr 20 17:44 build.log
-rw-rw-r-- 1 jolexa minstaff 600 Apr 20 17:44 eclass-debug.log
-rw-r--r-- 1 jolexa minstaff 5 Apr 20 17:44 .ecompress.suffix
-rw-rw-r-- 1 jolexa minstaff 89222 Apr 20 17:44 environment
-rw-r--r-- 1 jolexa minstaff 7245 Apr 20 17:44 lesspipe
drwxrwxr-x 2 jolexa minstaff 4096 Apr 20 2011 logging
-rw-r--r-- 1 jolexa minstaff 113 Apr 20 17:44 prepallman.filelist
+ rm -rf /home/jolexa/portage/linux-64/var/tmp/portage/sys-apps/less-440/temp
rm: cannot remove `/home/jolexa/portage/linux-64/var/tmp/portage/sys-apps/less-440/temp': Directory not empty
If no one else has this problem, I'll assume it is NFS-related. However, there are no .nfs files, so I'm not sure.
Ah, now I'm seeing this with stable portage on Gentoo Linux too. I guess it is a regression. @dev-portage team, ideas?
Maybe a regression in coreutils. It's 'rm' that's failing, so there's nothing portage can do about it.
hmmm, is this an rm on NFS by chance?
(In reply to comment #5) > hmmm, is this an rm on NFS by chance? Yes (see comment #2).
Then it's probably the NFS handles that prevent rm from completing (the .nfsXXXXXX files that you sometimes see, which almost always disappear automatically as well).
That's what I assumed too, but 1) this is a regression caused by something, because it only started around the 18252 timeframe, and 2) comment #1 is me debugging by placing an ls call right before the rm, and that listing shows no .nfs files.
Maybe there's some way to configure NFS so that it doesn't create these .nfsXXXXXX files. Aside from that, the only other solution that I can think of is to call rm multiple times, but that's kind of ugly.
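The "call rm multiple times" idea could look roughly like the following minimal sketch, assuming a POSIX shell. `rm_retry` is a hypothetical helper (not part of portage), and the retry count and delay are arbitrary example values. It only papers over the symptom: if a process still holds a file open inside the directory, every retry will keep failing until that file is closed.

```shell
#!/bin/sh
# Hypothetical helper (not part of portage): retry "rm -rf" a few times so a
# lingering .nfsXXXX file gets a chance to disappear between attempts.
rm_retry() {
    dir=$1
    attempts=${2:-5}   # example value, not a portage default
    delay=${3:-1}      # seconds between attempts, example value
    i=1
    while [ "$i" -le "$attempts" ]; do
        # "|| :" keeps a failed attempt from aborting a "set -e" caller
        rm -rf -- "$dir" 2>/dev/null || :
        # Done as soon as the directory is gone.
        if [ ! -e "$dir" ]; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "rm_retry: could not remove '$dir' after $attempts attempts" >&2
    return 1
}
```

In a phase function this would be called as something like `rm_retry "${T}"`; a loop like this is the "kind of ugly" part, since it trades a clear failure for a delay.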
I have the same problem. However, by the time I look in that temp directory, it IS empty. If it was an 'rm -r' that failed, then for me, a second rm would apparently succeed. Is there perhaps a timing/caching issue that the content files are all deleted, but when it goes to remove the directory itself, it appears that something is still there? I also don't see any .nfs files anywhere - where should I be looking for them? I really wonder if the underlying cause of the various errors with nfs4 mounted PORTAGE_TMPDIR are somehow related? (I'm currently on portage 2.1.10.144 on amd64.)
(In reply to comment #10) > Is there perhaps a timing/caching issue that the content > files are all deleted, but when it goes to remove the directory itself, it > appears that something is still there? I guess it's either those .nfsXXXXXX files that I'm not really familiar with, or the actual files that were supposed to have been removed. I doubt that it's the files that are being removed, because it seems obvious that those directory entries should be eliminated before unlink() returns (though it's possible that an NFS bug prevents that from happening). > I also don't see any .nfs files anywhere - where should I be looking for them? I'm not really familiar with these files. If they exist, my guess is that they are hard to observe because they are likely to disappear before you get a chance to observe them. > I really wonder if the underlying cause of the various errors with nfs4 mounted > PORTAGE_TMPDIR are somehow related? Well, you really ought to check with upstream NFS developers. If anyone would know, it would be them.
Is there any AIX machine involved in your NFS setup? Just curious, as I only remember seeing those .nfsXXXXXX files on AIX...
I've seen them on Solaris (very rare) and Linux too.
(In reply to comment #11) > Well, you really ought to check with upstream NFS developers. If anyone would > know, it would be them. Have the Gentoo NFS packagers looked at any of these problems yet? No, I don't think it's a packaging issue, but they might be more likely to focus in on exactly where the problem is, and be a better link to upstream. These bugs (338547, 400679, and this one) are now against portage - is there any point in opening one against nfs?
(In reply to comment #14) > (In reply to comment #11) > > Well, you really ought to check with upstream NFS developers. If anyone would > > know, it would be them. > Have the Gentoo NFS packagers looked at any of these problems yet? No, I don't > think it's a packaging issue, but they might be more likely to focus in on > exactly where the problem is, and be a better link to upstream. These bugs > (338547, 400679, and this one) are now against portage - is there any point in > opening one against nfs? Okay, adding net-fs herd to CC.
(In reply to comment #14) > (In reply to comment #11) > > Well, you really ought to check with upstream NFS developers. If anyone would > > know, it would be them. > Have the Gentoo NFS packagers looked at any of these problems yet? No, I don't > think it's a packaging issue, but they might be more likely to focus in on > exactly where the problem is, and be a better link to upstream. These bugs > (338547, 400679, and this one) are now against portage - is there any point in > opening one against nfs? The NFS implementation is not to blame, so the Gentoo NFS team will not be of any help.
1) I started this report, and my NFS is provided by a NetApp filer... NOT Gentoo-related.
2) In comment #1, I placed an "ls" call right before the rm call; there were no .nfs files at *that* time of execution.
3) .nfs files exist on all OSes; they are not AIX/Solaris/Linux/etc. specific.
4) A useful test would be to debug portage with a "lsof" call before/during/after the rm call. That is, .../usr/lib/portage/bin/phase-functions.sh, there is a rm -rf "${T}" call.
Sorry Jack, you are jumping to the wrong conclusions/assumptions.
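The before/after check from item 4 can be wrapped in a small helper. This is a minimal sketch assuming a POSIX shell; `debug_rm` is a hypothetical name, not a portage function, and it relies on `lsof +D`, which searches a directory recursively and can be slow on large trees. If lsof is not installed, only the listings are printed.

```shell
#!/bin/sh
# Hypothetical debug wrapper for an 'rm -rf "${T}"' call: list the directory
# and any open files below it, remove it, and repeat the checks on failure.
debug_rm() {
    dir=$1
    echo "== before rm: $dir"
    ls -la "$dir"
    if command -v lsof >/dev/null 2>&1; then
        # lsof exits non-zero when nothing is open; that is not an error here
        lsof +D "$dir" 2>/dev/null || :
    fi
    rm -rf "$dir" && status=0 || status=$?
    if [ -e "$dir" ]; then
        echo "== after rm (exit $status): $dir still exists"
        ls -la "$dir"
        if command -v lsof >/dev/null 2>&1; then
            lsof +D "$dir" 2>/dev/null || :
        fi
    fi
    return "$status"
}
```

Temporarily calling `debug_rm "${T}"` in place of the plain removal would produce output comparable to the trace in the next comment.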
(In reply to comment #16) > 4) A useful test would be to debug portage with a "lsof" call > before/during/after the rm call. That is, > .../usr/lib/portage/bin/phase-functions.sh, there is a rm -rf "${T}" call. And voila, here we see that emerge itself is keeping a file open while it is trying to rm the directory.
+ ls -la /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp
total 32
drwxrwxr-x 3 jolexa minstaff 4096 Feb 2 10:58 .
drwxrwxr-x 6 jolexa minstaff 4096 Feb 2 10:58 ..
-rw-rw---- 1 jolexa minstaff 2875 Feb 2 10:58 build.log
-rw-rw-r-- 1 jolexa minstaff 33 Feb 2 10:58 eclass-debug.log
-rw-rw-r-- 1 jolexa minstaff 12582 Feb 2 10:58 environment
drwxrwxr-x 2 jolexa minstaff 4096 Feb 2 10:58 logging
-rw-r--r-- 1 jolexa minstaff 0 Feb 2 10:58 prepallman.filelist
+ rm -rf /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp
rm: cannot remove `/home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp': Directory not empty
+ ls -la /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp
total 12
drwxrwxr-x 2 jolexa minstaff 4096 Feb 2 10:58 .
drwxrwxr-x 6 jolexa minstaff 4096 Feb 2 10:58 ..
-rw-rw---- 1 jolexa minstaff 3552 Feb 2 10:58 .nfs000000000188df5f00000268
+ lsof /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp/.nfs000000000188df5f00000268
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
emerge 4948 jolexa 6w REG 0,25 3839 25747295 /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp/.nfs000000000188df5f00000268
+ set +x
Okay, so it's $T/build.log then? We can fix that.
(In reply to comment #18) > Okay, so it's $T/build.log then? We can fix that. Yup, I've just done some fancy lsof checking after removing individual files. It is definitely $T/build.log; I can provide (lengthy) proof if needed. I'll test a patch when available.
darkside - I certainly admit I'm close to being in over my head, but I've taken several of Zac's comments to imply he thinks at least some of these problems (remember, there are now three different bugs for portage problems with nfs) are due to nfs behaving differently from a local file system. Also - I was thinking in terms of the Gentoo nfs client, not necessarily server. Anyway - it sounds like you now have a handle on the specific issue in this case. I'd love to do the same level of debugging on the other two, but they do not occur as consistently. However, the issue with sudo getting installed without setuid set if PORTAGE_TMPDIR is nfs, IS consistent, and probably can be tracked down in a similar manner.
(In reply to comment #20) > are due to nfs behaving differently from a local file system. Also - I was Naturally. NFS is a whole different beast, mainly due to file locking and latency. You will nearly *always* hit edge cases in your code once you introduce NFS. (A clear example of this is that python flock() doesn't work on NFS, but I digress from the topic now)
(In reply to comment #11) > (In reply to comment #10) > > Is there perhaps a timing/caching issue that the content > > files are all deleted, but when it goes to remove the directory itself, it > > appears that something is still there? > > I guess it's either those .nfsXXXXXX files that I'm not really familiar with, > or the actual files that were supposed to have been removed. I doubt that it's > the files that are being removed, because it seems obvious that those directory > entries should be eliminated before unlink() returns (though it's possible that > an NFS bug prevents that from happening). > > > I also don't see any .nfs files anywhere - where should I be looking for them? I had these .nfs files appear in $EPREFIX/usr/portage when using Gentoo Prefix on Solaris 10. My solution was to use emerge-webrsync. (In reply to comment #11) > I'm not really familiar with these files. If they exist, my guess is that they > are hard to observe because they are likely to disappear before you get a > chance to observe them. > > > I really wonder if the underlying cause of the various errors with nfs4 mounted > > PORTAGE_TMPDIR are somehow related? > > Well, you really ought to check with upstream NFS developers. If anyone would > know, it would be them. I can state that NFS issues are reproducible on Solaris 10, so any issues with .nfsXXXXXX files are not specific to Gentoo's NFS implementation.
It turns out that there was a fix committed for the same issue a few years ago, but apparently it's not functioning as intended: http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=d82225a6c9e6702bcdb4022649f6666981565359
For testing purposes, I put a line like this in /etc/portage/bashrc:
elog EBUILD_PHASE=$EBUILD_PHASE lsof: $(lsof | grep "$PORTAGE_LOG_FILE")
I didn't see any lsof output during the clean phase. I wonder if it's a latency issue with NFS, which causes the .nfsXXX file to continue to exist after portage has closed build.log?
As a workaround, you can set PORT_LOGDIR so that the build logs are written in a separate directory.
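As a minimal sketch, the workaround amounts to one line in /etc/portage/make.conf. The path below is only an example; the directory has to exist and be writable by portage.

```shell
# /etc/portage/make.conf
# Example path: have portage write build logs under this directory,
# the separate location suggested by the workaround above.
PORT_LOGDIR="/var/log/portage"
```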
(In reply to comment #24) > For testing purposes, I put a line like this in /etc/portage/bashrc: > > elog EBUILD_PHASE=$EBUILD_PHASE lsof: $(lsof | grep "$PORTAGE_LOG_FILE") > > I didn't see any lsof output during he clean phase. I wonder if it's a latency > issue with NFS, which causes the .nfsXXX file to continue to exist after > portage has closed build.log? * EBUILD_PHASE=clean lsof: emerge 23123 jolexa 6u REG 0,25 4195 25747295 /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp/build.log (<redacted>)
There is an explanation of the cause of this issue at the following blog post: http://logicalshift.blogspot.com/2010/07/left-over-nfs-dot-nfs-files.html
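The mechanism usually described for this is the NFS client's "silly rename": when a file that is still open gets unlinked, the client renames it to .nfsXXXX instead of removing it, and deletes it only when the last open descriptor is closed, so the parent directory is not empty yet. The POSIX sh sketch below reproduces the build.log pattern from this bug under that assumption. Passing a directory on an NFS mount as the first argument (an assumed way to test it) should leave a .nfs file behind while the log is open; without an argument it uses a local temp directory, where the first removal is expected to succeed. It also shows the ordering that avoids the problem: close the log, then remove the directory.

```shell
#!/bin/sh
# Reproduce the pattern from this bug: a log file held open while its
# directory is removed. $1 (optional) is a base directory, e.g. on NFS.
base=${1:-$(mktemp -d)}
t=$(mktemp -d "$base/silly.XXXXXX")   # fresh scratch dir, never user data

exec 3>"$t/build.log"                 # keep build.log open, like emerge did
echo "log line" >&3

if rm -rf "$t" 2>/dev/null && [ ! -e "$t" ]; then
    echo "removed while log was open"
else
    # On an NFS client, expect a leftover .nfsXXXX file to be listed here.
    echo "not removed while log was open:"
    ls -A "$t"
fi

exec 3>&-                             # close the log first...
rm -rf "$t"                           # ...then the removal should succeed
[ ! -e "$t" ] && echo "removed after closing the log"
```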
*** Bug 537116 has been marked as a duplicate of this bug. ***
Believe it or not, this still happens. It's really annoying. I read somewhere that it shouldn't happen any more with NFSv4.1. You have to force that version with vers=4.1, but even when I tried that, it still happened. :(
Still happens when mounted via NFS to another computer for (much faster) build.
...
>>> dev-lang/rust-1.58.0 merged.
>>> Regenerating /etc/ld.so.cache...
/bin/rm: cannot remove '/var/tmp/portage/dev-lang/rust-1.58.0/temp': Directory not empty
>>> Auto-cleaning packages...