Bug 364143 - sys-apps/portage-2.2.01.18252 leaves temp dir on NFS after successful emerge
Summary: sys-apps/portage-2.2.01.18252 leaves temp dir on NFS after successful emerge
Status: CONFIRMED
Alias: None
Product: Gentoo/Alt
Classification: Unclassified
Component: Prefix Support
Hardware: All Linux
Importance: Normal normal with 1 vote
Assignee: Portage team
URL:
Whiteboard:
Keywords:
Duplicates: 537116
Depends on:
Blocks:
 
Reported: 2011-04-19 16:43 UTC by Jeremy Olexa (darkside) (RETIRED)
Modified: 2022-01-15 08:32 UTC
CC List: 6 users

See Also:
Package list:
Runtime testing required: ---


Description Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2011-04-19 16:43:12 UTC
>>> www-client/firefox-4.0-r3 merged.
rm: cannot remove `/home/jolexa/portage/linux-64/var/tmp/portage/www-client/firefox-4.0-r3/temp': Directory not empty

This could be related to PORTAGE_TMPDIR being on NFS; regardless, it is a regression. I'm not sure exactly where the regression started, though.
Comment 1 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2011-04-20 22:49:06 UTC
+ ls -la /home/jolexa/portage/linux-64/var/tmp/portage/sys-apps/less-440/temp
total 228
drwxrwxr-x 3 jolexa minstaff   4096 Apr 20 17:44 .
drwxrwxr-x 6 jolexa minstaff   4096 Apr 20  2011 ..
-rw-r--r-- 1 jolexa minstaff     50 Apr 20 17:44 70less
-rw-rw---- 1 jolexa minstaff 127041 Apr 20 17:44 build.log
-rw-rw-r-- 1 jolexa minstaff    600 Apr 20 17:44 eclass-debug.log
-rw-r--r-- 1 jolexa minstaff      5 Apr 20 17:44 .ecompress.suffix
-rw-rw-r-- 1 jolexa minstaff  89222 Apr 20 17:44 environment
-rw-r--r-- 1 jolexa minstaff   7245 Apr 20 17:44 lesspipe
drwxrwxr-x 2 jolexa minstaff   4096 Apr 20  2011 logging
-rw-r--r-- 1 jolexa minstaff    113 Apr 20 17:44 prepallman.filelist
+ rm -rf /home/jolexa/portage/linux-64/var/tmp/portage/sys-apps/less-440/temp
rm: cannot remove `/home/jolexa/portage/linux-64/var/tmp/portage/sys-apps/less-440/temp': Directory not empty
Comment 2 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2011-05-03 16:20:58 UTC
If no one else has this problem, I'll assume it is NFS related. There are no .nfs files, though, so I'm not sure.
Comment 3 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2011-08-22 15:58:37 UTC
Ah, now seen in stable portage on Gentoo Linux. I guess it is a regression.

@dev-portage team, ideas?
Comment 4 Zac Medico gentoo-dev 2011-08-22 16:35:09 UTC
Maybe a regression in coreutils. It's 'rm' that's failing, so there's nothing portage can do about it.
Comment 5 Fabian Groffen gentoo-dev 2011-09-02 17:25:55 UTC
hmmm, is this an rm on NFS by chance?
Comment 6 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2011-09-02 17:32:24 UTC
(In reply to comment #5)
> hmmm, is this an rm on NFS by chance?

Yes, (comment #2)
Comment 7 Fabian Groffen gentoo-dev 2011-09-02 17:38:28 UTC
Then it's probably the NFS handles that prevent rm from completing (the .nfsXXXXXX files that you sometimes see, which almost always disappear automatically as well).
Comment 8 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2011-09-02 17:46:32 UTC
That's what I assumed too, but 1) this is a regression caused by something, because it only started around the 18252 timeframe, and 2) comment #1 is me debugging by placing an ls call right before the rm.
Comment 9 Zac Medico gentoo-dev 2011-09-02 17:58:59 UTC
Maybe there's some way to configure NFS so that it doesn't create these .nfsXXXXXX files. Aside from that, the only other solution that I can think of is to call rm multiple times, but that's kind of ugly.
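The "call rm multiple times" idea can be sketched in Python roughly as follows. This is only an illustration under assumptions: `rmtree_with_retry`, `attempts`, and `delay` are hypothetical names, not portage code, and retrying can only help if whatever keeps a file open releases it between attempts.

```python
import errno
import shutil
import time


def rmtree_with_retry(path, attempts=5, delay=0.5):
    """Remove a directory tree, retrying while it is reported as not empty.

    Hypothetical helper, not part of portage. Returns the number of the
    attempt that finished the job.
    """
    for attempt in range(1, attempts + 1):
        try:
            shutil.rmtree(path)
            return attempt
        except FileNotFoundError:
            # The tree is already gone, so there is nothing left to do.
            return attempt
        except OSError as exc:
            # ENOTEMPTY is the error that rm prints as "Directory not empty".
            if exc.errno != errno.ENOTEMPTY or attempt == attempts:
                raise
            # Give a lingering entry a chance to disappear before retrying.
            time.sleep(delay)
```

On a local filesystem the first attempt normally succeeds, so the retry path only matters when directory entries linger after the first pass.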
Comment 10 Jack 2012-02-02 13:40:32 UTC
I have the same problem.  However, by the time I look in that temp directory, it IS empty.  If it was an 'rm -r' that failed, then for me, a second rm would apparently succeed.  Is there perhaps a timing/caching issue that the content files are all deleted, but when it goes to remove the directory itself, it appears that something is still there?

I also don't see any .nfs files anywhere - where should I be looking for them?

I really wonder if the underlying cause of the various errors with nfs4 mounted PORTAGE_TMPDIR are somehow related?  

(I'm currently on portage 2.1.10.144 on amd64)
Comment 11 Zac Medico gentoo-dev 2012-02-02 13:57:26 UTC
(In reply to comment #10)
> Is there perhaps a timing/caching issue that the content
> files are all deleted, but when it goes to remove the directory itself, it
> appears that something is still there?

I guess it's either those .nfsXXXXXX files that I'm not really familiar with, or the actual files that were supposed to have been removed. I doubt that it's the files that are being removed, because it seems obvious that those directory entries should be eliminated before unlink() returns (though it's possible that an NFS bug prevents that from happening).

> I also don't see any .nfs files anywhere - where should I be looking for them?

I'm not really familiar with these files. If they exist, my guess is that they are hard to observe because they are likely to disappear before you get a chance to observe them.

> I really wonder if the underlying cause of the various errors with nfs4 mounted
> PORTAGE_TMPDIR are somehow related?  

Well, you really ought to check with upstream NFS developers. If anyone would know, it would be them.
Comment 12 Michael Haubenwallner (RETIRED) gentoo-dev 2012-02-02 14:27:31 UTC
Is there any AIX machine involved in your NFS setup?

Just curious, as I remember those .nfsXXXXXX files with AIX only...
Comment 13 Fabian Groffen gentoo-dev 2012-02-02 14:28:46 UTC
I've seen them on Solaris (very rare) and Linux too.
Comment 14 Jack 2012-02-02 15:21:24 UTC
(In reply to comment #11)
> Well, you really ought to check with upstream NFS developers. If anyone would
> know, it would be them.
Have the Gentoo NFS packagers looked at any of these problems yet?  No, I don't think it's a packaging issue, but they might be more likely to focus in on exactly where the problem is, and be a better link to upstream.  These bugs (338547, 400679, and this one) are now against portage - is there any point in opening one against nfs?
Comment 15 Zac Medico gentoo-dev 2012-02-02 16:18:03 UTC
(In reply to comment #14)
> (In reply to comment #11)
> > Well, you really ought to check with upstream NFS developers. If anyone would
> > know, it would be them.
> Have the Gentoo NFS packagers looked at any of these problems yet?  No, I don't
> think it's a packaging issue, but they might be more likely to focus in on
> exactly where the problem is, and be a better link to upstream.  These bugs
> (338547, 400679, and this one) are now against portage - is there any point in
> opening one against nfs?

Okay, adding net-fs herd to CC.
Comment 16 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2012-02-02 16:46:43 UTC
(In reply to comment #14)
> (In reply to comment #11)
> > Well, you really ought to check with upstream NFS developers. If anyone would
> > know, it would be them.
> Have the Gentoo NFS packagers looked at any of these problems yet?  No, I don't
> think it's a packaging issue, but they might be more likely to focus in on
> exactly where the problem is, and be a better link to upstream.  These bugs
> (338547, 400679, and this one) are now against portage - is there any point in
> opening one against nfs?

The NFS implementation is not to blame. Therefore the Gentoo NFS team will not be of any help.

1) I started this report, my NFS is provided by a NetApp filer...NOT Gentoo related.
2) In comment #1, I placed an "ls" call right before the rm call; there were no .nfs files at *that* time of execution.
3) .nfs files exist on all OSes; they are not specific to AIX/Solaris/Linux/etc.
4) A useful test would be to debug portage with a "lsof" call before/during/after the rm call. That is, .../usr/lib/portage/bin/phase-functions.sh, there is a rm -rf "${T}" call.

Sorry Jack, you are jumping to wrong conclusions/assumptions.
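For anyone who wants the check from point 4 without lsof, a rough Python equivalent might look like the sketch below. It is an assumption-laden illustration: `open_files_under` is a hypothetical helper, it only works on Linux because it reads the `/proc/<pid>/fd` symlinks, and inspecting another user's process needs matching permissions.

```python
import os


def open_files_under(directory, pid="self"):
    """Return paths that process `pid` holds open below `directory`.

    Hypothetical, Linux-only helper: it resolves the symlinks in
    /proc/<pid>/fd instead of calling lsof.
    """
    root = os.path.realpath(directory)
    fd_dir = "/proc/%s/fd" % pid
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed while we were iterating.
            continue
        if target == root or target.startswith(root + os.sep):
            found.append(target)
    return sorted(found)
```

Calling it with the PID of the running emerge and the package's temp directory, before and after the rm, would show whether a descriptor is still open there.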
Comment 17 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2012-02-02 17:00:22 UTC
(In reply to comment #16)

> 4) A useful test would be to debug portage with a "lsof" call
> before/during/after the rm call. That is,
> .../usr/lib/portage/bin/phase-functions.sh, there is a rm -rf "${T}" call.

And voila, here we see that emerge itself is keeping a file open while it is trying to rm the directory.

+ ls -la /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp
total 32
drwxrwxr-x 3 jolexa minstaff  4096 Feb  2 10:58 .
drwxrwxr-x 6 jolexa minstaff  4096 Feb  2 10:58 ..
-rw-rw---- 1 jolexa minstaff  2875 Feb  2 10:58 build.log
-rw-rw-r-- 1 jolexa minstaff    33 Feb  2 10:58 eclass-debug.log
-rw-rw-r-- 1 jolexa minstaff 12582 Feb  2 10:58 environment
drwxrwxr-x 2 jolexa minstaff  4096 Feb  2 10:58 logging
-rw-r--r-- 1 jolexa minstaff     0 Feb  2 10:58 prepallman.filelist
+ rm -rf /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp
rm: cannot remove `/home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp': Directory not empty
+ ls -la /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp
total 12
drwxrwxr-x 2 jolexa minstaff 4096 Feb  2 10:58 .
drwxrwxr-x 6 jolexa minstaff 4096 Feb  2 10:58 ..
-rw-rw---- 1 jolexa minstaff 3552 Feb  2 10:58 .nfs000000000188df5f00000268
+ lsof /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp/.nfs000000000188df5f00000268
COMMAND  PID   USER   FD   TYPE DEVICE SIZE     NODE NAME
emerge  4948 jolexa    6w   REG   0,25 3839 25747295 /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp/.nfs000000000188df5f00000268
+ set +x
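The trace above suggests an ordering problem: the directory is removed while a descriptor inside it is still open, so the NFS client keeps the data around under a .nfsXXXX name. A minimal Python sketch of the safer ordering, close first and remove afterwards, is shown below. `remove_tree_after_closing` is a hypothetical name and this is not how portage implements its cleanup; on a local filesystem both orderings succeed, so the difference only shows up on NFS.

```python
import os
import shutil
import tempfile


def remove_tree_after_closing(log_file, tree):
    """Close an open log handle, then remove the tree that contains it.

    Hypothetical helper for illustration. Returns True once `tree` is gone.
    """
    log_file.close()      # release the descriptor before unlinking anything
    shutil.rmtree(tree)   # nothing below `tree` is held open by us any more
    return not os.path.exists(tree)


if __name__ == "__main__":
    temp_dir = tempfile.mkdtemp(prefix="temp-")  # stand-in for ${T}
    log = open(os.path.join(temp_dir, "build.log"), "w")
    log.write("build output\n")
    print(remove_tree_after_closing(log, temp_dir))
```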
Comment 18 Zac Medico gentoo-dev 2012-02-02 17:13:58 UTC
Okay, so it's $T/build.log then? We can fix that.
Comment 19 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2012-02-02 17:17:55 UTC
(In reply to comment #18)
> Okay, so it's $T/build.log then? We can fix that.

Yup, I've just done some fancy lsof checking after removing individual files. It is definitely $T/build.log; I can provide (lengthy) proof if needed. I'll test a patch when available.
Comment 20 Jack 2012-02-02 21:07:24 UTC
darkside - I certainly admit I'm close to being in over my head, but I've taken several of Zac's comments to imply he thinks at least some of these problems (remember, there are now three different bugs for portage problems with nfs) are due to nfs behaving differently from a local file system.  Also - I was thinking in terms of the Gentoo nfs client, not necessarily server.

Anyway - it sounds like you now have a handle on the specific issue in this case.  I'd love to do the same level of debugging on the other two, but they do not occur as consistently.  However, the issue with sudo getting installed without setuid set if PORTAGE_TMPDIR is on NFS IS consistent, and can probably be tracked down in a similar manner.
Comment 21 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2012-02-02 21:16:53 UTC
(In reply to comment #20)
> are due to nfs behaving differently from a local file system.  Also - I was

Naturally. NFS is a whole different beast, mainly due to file locking and latency. You will nearly *always* hit edge cases in your code once you introduce NFS. (A clear example of this is that python flock() doesn't work on NFS, but I digress from the topic now)
Comment 22 Richard Yao (RETIRED) gentoo-dev 2012-02-02 23:56:37 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > Is there perhaps a timing/caching issue that the content
> > files are all deleted, but when it goes to remove the directory itself, it
> > appears that something is still there?
> 
> I guess it's either those .nfsXXXXXX files that I'm not really familiar with,
> or the actual files that were supposed to have been removed. I doubt that it's
> the files that are being removed, because it seems obvious that those directory
> entries should be eliminated before unlink() returns (though it's possible that
> an NFS bug prevents that from happening).
> 
> > I also don't see any .nfs files anywhere - where should I be looking for them?

I had these .nfs files appear in $EPREFIX/usr/portage when using Gentoo Prefix on Solaris 10. My solution was to use emerge-websync.

(In reply to comment #11)
> I'm not really familiar with these files. If they exist, my guess is that they
> are hard to observe because they are likely to disappear before you get a
> chance to observe them.
> 
> > I really wonder if the underlying cause of the various errors with nfs4 mounted
> > PORTAGE_TMPDIR are somehow related?  
> 
> Well, you really ought to check with upstream NFS developers. If anyone would
> know, it would be them.

I can state that NFS issues are reproducible on Solaris 10, so any issues with .nfsXXXXXX files are not specific to Gentoo's NFS implementation.
Comment 23 Zac Medico gentoo-dev 2012-02-03 20:34:04 UTC
It turns out that there was a fix committed for the same issue a few years ago, but apparently it's not functioning as intended:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=d82225a6c9e6702bcdb4022649f6666981565359
Comment 24 Zac Medico gentoo-dev 2012-02-03 21:17:19 UTC
For testing purposes, I put a line like this in /etc/portage/bashrc:

  elog EBUILD_PHASE=$EBUILD_PHASE lsof: $(lsof | grep "$PORTAGE_LOG_FILE")

I didn't see any lsof output during the clean phase. I wonder if it's a latency issue with NFS, which causes the .nfsXXX file to continue to exist after portage has closed build.log?
Comment 25 Zac Medico gentoo-dev 2012-02-03 21:18:25 UTC
As a workaround, you can set PORT_LOGDIR so that the build logs are written in a separate directory.
Comment 26 Jeremy Olexa (darkside) (RETIRED) archtester gentoo-dev Security 2012-02-06 14:42:39 UTC
(In reply to comment #24)
> For testing purposes, I put a line like this in /etc/portage/bashrc:
> 
>   elog EBUILD_PHASE=$EBUILD_PHASE lsof: $(lsof | grep "$PORTAGE_LOG_FILE")
> 
> I didn't see any lsof output during the clean phase. I wonder if it's a latency
> issue with NFS, which causes the .nfsXXX file to continue to exist after
> portage has closed build.log?

 * EBUILD_PHASE=clean lsof: emerge 23123 jolexa 6u REG 0,25 4195 25747295 /home/jolexa/portage/linux-64/var/tmp/portage/virtual/pager-0/temp/build.log (<redacted>)
Comment 27 Richard Yao (RETIRED) gentoo-dev 2012-02-06 21:05:00 UTC
There is an explanation of the cause of this issue at the following blog post:

http://logicalshift.blogspot.com/2010/07/left-over-nfs-dot-nfs-files.html
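In short, the .nfsXXXX entries come from what is commonly called NFS "silly rename": when a file is unlinked while a process on the same client still has it open, the client renames it to a .nfs name and removes it only once the last descriptor is closed. To check a build directory for such leftovers, something like the following Python sketch could be used. `find_nfs_placeholders` is a hypothetical helper, and matching on the `.nfs` prefix is an assumption based on the names seen in this bug.

```python
import os


def find_nfs_placeholders(directory):
    """Return sorted paths of .nfsXXXX entries found below `directory`.

    Hypothetical helper: it only looks at file names and does not check
    whether the directory is actually on an NFS mount.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            if name.startswith(".nfs"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

An empty result right after a failed rm would point towards a timing issue rather than a handle that is still open.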
Comment 28 Octavian 2015-01-20 17:33:54 UTC
*** Bug 537116 has been marked as a duplicate of this bug. ***
Comment 29 James Le Cuirot gentoo-dev 2021-09-06 22:01:17 UTC
Believe it or not, this still happens. It's really annoying. I read somewhere that it shouldn't happen any more with NFSv4.1. You have to force that version with vers=4.1, but even when I tried that, it still happened. :(
Comment 30 Morton Pellung 2022-01-15 08:32:45 UTC
This still happens when mounted via NFS to another computer for a (much faster) build.

...
>>> dev-lang/rust-1.58.0 merged.
>>> Regenerating /etc/ld.so.cache...
/bin/rm: cannot remove '/var/tmp/portage/dev-lang/rust-1.58.0/temp': Directory not empty
>>> Auto-cleaning packages...