Bug 282928

Summary:	dev-util/catalyst doesn't work over NFS
Product:	Gentoo Hosted Projects	Reporter:	Raúl Porcel (RETIRED) <armin76>
Component:	Catalyst	Assignee:	Gentoo Catalyst Developers <catalyst>
Status:	CONFIRMED ---
Severity:	normal	CC:	darkside, kumba, robink
Priority:	High
Version:	unspecified
Hardware:	All
OS:	Linux
Whiteboard:
Package list:		Runtime testing required:	---

Description Raúl Porcel (RETIRED) gentoo-dev

2009-08-27 16:38:13 UTC

To reproduce, run catalyst -f stage1.spec with /var/tmp/catalyst/tmp on an NFS mount.

Referenced SEEDCACHE does not appear to be a directory, trying to untar...
No Valid Resume point detected, cleaning up...
Removing AutoResume Points: ...
Emptying directory /var/tmp/catalyst/tmp/default/.autoresume-stage1-armv4l-20090827/
Emptying directory /var/tmp/catalyst/tmp/default/stage1-armv4l-20090827/
Traceback (most recent call last):
File "/usr/lib/catalyst/catalyst", line 208, in build_target
    mytarget.run()
File "modules/generic_stage_target.py", line 1260, in run
    apply(getattr(self,x))
File "modules/generic_stage_target.py", line 712, in unpack
    self.clear_chroot()
File "modules/generic_stage_target.py", line 1532, in clear_chroot
    shutil.rmtree(myemp)
File "/usr/lib/python2.5/shutil.py", line 184, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
File "/usr/lib/python2.5/shutil.py", line 182, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/var/tmp/catalyst/tmp/default/stage1-armv4l-20090827/'
!!! catalyst: Error encountered during run of target stage1
Catalyst aborting....
lockfile does not exist '/var/tmp/catalyst/tmp/default/stage1-armv4l-20090827/.catalyst_lock'

This happens with catalyst-9999 and previous versions. From searching the web, it looks like a shutil.rmtree issue. However, removing the directory with the system's rm works fine.

Thanks

Comment 1 Raúl Porcel (RETIRED) gentoo-dev

2009-08-27 16:48:05 UTC

See http://bugs.skolelinux.no/show_bug.cgi?id=1025
http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/rdiff-backup-23/solution-for-the-nfs-related-problems-with-rmdir-63122/

Comment 2 Andrew Gaffney (RETIRED) gentoo-dev

2009-08-27 20:26:37 UTC

The problem here is probably the .nfsXXXXXXXXXXXXXX files that appear "randomly" in NFS-mounted directories. There are two different solutions proposed in those 2 links, and neither of them are particularly "good".

1) Catch the OSError and ignore it. This obviously isn't a good idea. The directory is cleared for a reason, and we can't guarantee it's a .nfsXXXXXX file causing the issue without walking the directory tree

2) Try again and hope it works. For this one, we could have it try 5 times, perhaps with a 1s delay in between, which *should* give enough time for the .nfsXXXXX file to disappear. If we can't remove the dir after 5 tries, bail
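
A rough sketch of what option 2 could look like. The function name and its parameters are made up for illustration and are not existing catalyst code; the 5 tries and 1s delay are the values suggested above.

```python
import shutil
import time


def rmtree_with_retries(path, attempts=5, delay=1.0):
    """Remove a directory tree, retrying when shutil.rmtree raises OSError.

    Hypothetical helper: 'attempts' and 'delay' mirror the "5 tries, 1s
    apart" idea and are not tuned values.
    """
    for attempt in range(1, attempts + 1):
        try:
            shutil.rmtree(path)
            return
        except FileNotFoundError:
            # The tree is already gone, so there is nothing left to do.
            return
        except OSError:
            if attempt == attempts:
                # Out of tries: bail and let the caller see the error.
                raise
            # Give a transient .nfsXXXX entry a chance to disappear.
            time.sleep(delay)
```

This only helps if whatever holds the .nfsXXXX file open lets go of it within the retry window; otherwise the last attempt re-raises the same OSError.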

Comment 3 Raúl Porcel (RETIRED) gentoo-dev

2009-09-07 13:11:53 UTC

(In reply to comment #2)
> The problem here is probably the .nfsXXXXXXXXXXXXXX files that appear
> "randomly" in NFS-mounted directories. There are two different solutions
> proposed in those 2 links, and neither of them are particularly "good".
> 
> 1) Catch the OSError and ignore it. This obviously isn't a good idea. The
> directory is cleared for a reason, and we can't guarantee it's a .nfsXXXXXX
> file causing the issue without walking the directory tree
> 
> 2) Try again and hope it works. For this one, we could have it try 5 times,
> perhaps with a 1s delay in between, which *should* give enough time for the
> .nfsXXXXX file to disappear. If we can't remove the dir after 5 tries, bail
> 

2 looks okay, at least better than 1.

Comment 4 Raúl Porcel (RETIRED) gentoo-dev

2010-05-29 15:38:10 UTC

*poke*

Comment 5 Raúl Porcel (RETIRED) gentoo-dev

2010-12-05 16:59:03 UTC

slacker!

Comment 6 Jeremy Olexa (darkside) (RETIRED) archtester

2011-02-08 21:10:58 UTC

(In reply to comment #2)
> The problem here is probably the .nfsXXXXXXXXXXXXXX files that appear
> "randomly" in NFS-mounted directories. There are two different solutions
> proposed in those 2 links, and neither of them are particularly "good".
> 
> 1) Catch the OSError and ignore it. This obviously isn't a good idea. The
> directory is cleared for a reason, and we can't guarantee it's a .nfsXXXXXX
> file causing the issue without walking the directory tree
> 
> 2) Try again and hope it works. For this one, we could have it try 5 times,
> perhaps with a 1s delay in between, which *should* give enough time for the
> .nfsXXXXX file to disappear. If we can't remove the dir after 5 tries, bail

Neither will work, because catalyst itself is holding the file open:

% lsof /mnt/stagebuilding/tmp/catalyst/tmp/default/stage1-armv7a-20110208/.nfs000000000006106800000017
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
catalyst 20736 root    3uW  REG   0,17        0 397416 /mnt/stagebuilding/tmp/catalyst/tmp/default/stage1-armv7a-20110208/.nfs000000000006106800000017

% ps aux|grep 20736
root     20736  0.1  1.7  11260  7376 pts/1    S+   20:59   0:00 /usr/bin/python2.6 -OO /usr/lib/catalyst/catalyst -a -p -c /etc/catalyst/catalyst.conf -f stage1.spec

Other ideas?
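
For reference, the behaviour behind the lsof output can be shown with a small sketch. It assumes only standard POSIX semantics: a file that is unlinked while a process still has it open stays alive until the last close, and an NFS client typically keeps it around under a .nfsXXXX name in the meantime. The function name is made up, and the lock file name is borrowed from the error output in the description.

```python
import os
import tempfile


def unlink_while_open(directory):
    """Unlink a file that is still open and report what remains.

    Returns (directory entries after the unlink, bytes read back through
    the still-open descriptor). Hypothetical helper for illustration only.
    """
    path = os.path.join(directory, ".catalyst_lock")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.write(fd, b"held")
        os.unlink(path)  # the name goes away, the open file does not
        leftovers = sorted(os.listdir(directory))
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 4)  # still readable through the descriptor
    finally:
        os.close(fd)
    return leftovers, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(unlink_while_open(scratch))
```

On a local filesystem the directory listing is expected to be empty after the unlink. On an NFS mount the listing would be expected to contain a .nfsXXXX entry for as long as the descriptor is open, which is the state in which rmdir reports "Directory not empty".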

Comment 7 Jeremy Olexa (darkside) (RETIRED) archtester

2012-03-06 15:51:35 UTC

For a long time, I've made the lockdir call quite stupid in my overlay's version of catalyst.

http://git.overlays.gentoo.org/gitweb/?p=dev/darkside.git;a=blob;f=dev-util/catalyst/files/0001-generic_stage_target.py-stupify-the-LockDir-call.patch

Comment 8 SpanKY gentoo-dev

2015-10-11 17:28:47 UTC

there was some initial hardlink support in the lockdir module, but it's been removed to simplify things greatly:
http://gitweb.gentoo.org/proj/catalyst.git/commit/?id=f083637554bf5668ec856c56cfaaa76bb343d941

if we want to restore that, it should be by:
 - create a new HardlinkLock class in snakeoil.osutils
 - use same API as snakeoil.osutils.FsLock
 - change LockDir to pick FsLock or HardlinkLock based on FS type in __init__
 - nothing else needs to change :)
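
A minimal sketch of the hardlink technique such a class could use. Everything here is an assumption for illustration: the method names (try_acquire, release) are invented and do not reproduce the snakeoil.osutils.FsLock API, and choosing between lock types by filesystem is left out.

```python
import os
import uuid


class HardlinkLock:
    """Hypothetical lock built on link(2); not part of snakeoil or catalyst."""

    def __init__(self, path):
        self.path = path
        # A per-instance name in the same directory as the lock file.
        self._unique = "%s.%s" % (path, uuid.uuid4().hex)
        self.held = False

    def try_acquire(self):
        """Try once to take the lock; return True on success."""
        if self.held:
            return True
        fd = os.open(self._unique, os.O_CREAT | os.O_WRONLY, 0o644)
        os.close(fd)
        try:
            try:
                os.link(self._unique, self.path)
            except OSError:
                # Do not trust the return of link() alone; the link count
                # checked below decides who owns the lock.
                pass
            # Two names for our file means the link landed and we own it.
            self.held = os.stat(self._unique).st_nlink == 2
        finally:
            os.unlink(self._unique)
        return self.held

    def release(self):
        """Drop the lock if this instance holds it."""
        if self.held:
            os.unlink(self.path)
            self.held = False
```

No descriptor stays open while the lock is held, so the lock file can be unlinked without leaving an open-but-deleted file behind. Stale locks left by a crashed process are not handled in this sketch.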

Comment 9 Matt Turner gentoo-dev

2020-03-28 23:25:31 UTC

*** Bug 707698 has been marked as a duplicate of this bug. ***