Bug 137269 - infinite recursion at line 128 of portage_locks
Summary: infinite recursion at line 128 of portage_locks
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core - Interface (emerge)
Hardware: x86 Linux
Importance: High critical
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS, REGRESSION
Depends on:
Blocks: 136244 137445
Reported: 2006-06-19 09:12 UTC by Benedicto Sérgio de Almeida Santiago
Modified: 2006-06-21 12:42 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
dmesg (dmesg.txt, 20.06 KB, text/plain)
2006-06-19 09:14 UTC, Benedicto Sérgio de Almeida Santiago
only recurse for exactly 0 hardlinks (nlink.patch, 413 bytes, patch)
2006-06-19 13:45 UTC, Zac Medico
snippet more complete (bug5.txt, 10.87 KB, text/plain)
2006-06-20 00:48 UTC, Benedicto Sérgio de Almeida Santiago
use wantnewlockfile=1 for /usr/lib/portage/config (wantnewlockfile.patch, 482 bytes, patch)
2006-06-20 11:31 UTC, Zac Medico

Comment 1 Benedicto Sérgio de Almeida Santiago 2006-06-19 09:12:13 UTC
The new install CD is really faster, though not actually easier.
After everything was working fine and I had customized a lot of things, I ran the emerge sync command.
After that, I believe, things got complicated.
I have already recompiled Python 2 manually, but the problem persists.
Whenever I run an emerge command, whether a simple emerge bin86, emerge transcode, or emerge -uaDV world, I get no results and always obtain the following error:
...
File "/usr/lib/portage/pym/portage_locks.py", line 128, in lockfile
    lockfilename,myfd,unlinkfile,locking_method = lockfile(mypath,wantnewlockfile,unlinkfile)
  File "/usr/lib/portage/pym/portage_locks.py", line 64, in lockfile
    if not os.path.exists(os.path.dirname(mypath)):
  File "/usr/lib/python2.4/posixpath.py", line 119, in dirname
    return split(p)[0]
  File "/usr/lib/python2.4/posixpath.py", line 79, in split
    if head and head != '/'*len(head):
RuntimeError: maximum recursion depth exceeded in cmp

emerge info:
Gentoo Base System version 1.6.14
Portage 2.1 (default-linux/x86/2006.0, gcc-3.4.4, glibc-2.3.5-r2, 2.6.15-gentoo-r1bsasantiago i686)
=================================================================
System uname: 2.6.15-gentoo-r1bsasantiago i686 Intel(R) Pentium(R) 4 CPU 1.60GHz
dev-lang/python:     2.4.2
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.16.1
sys-devel/gcc-config: 1.3.12-r6
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=i686 -O2 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/lib/X11/xkb /usr/lib/mozilla/defaults/pref /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/terminfo"
CXXFLAGS="-march=i686 -O2 -pipe"
DISTDIR="/distfiles"
FEATURES="autoconfig ccache collision-protect digest distcc distlocks fixpackages metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X alsa apache2 apm arts audiofile avi berkdb bitmap-fonts bzip2 cdr cli crypt ctype cups dba dri eds elibc_glibc emboss encode esd ethereal exif expat fam fastbuild foomaticdb force-cgi-redirect fortran ftp gd gdbm gif glut gmp gnome gpm gstreamer gtk gtk2 gtkhtml guile idn imlib ipv6 isdnlog jpeg kde kernel_linux lcms libg++ libwww mad memlimit mikmod mng motif mozilla mp3 mpeg ncurses nls nptl ogg opengl oss pam pcre pdflib perl png posix pppd python qt quicktime readline reflection samba sdl session simplexml slang soap sockets spell spl ssl tcltk tcpd tiff tokenizer truetype truetype-fonts type1-fonts udev userland_GNU vorbis x86 xml xml2 xmms xorg xsl xv zlib"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS

Regards and congratulations to all on the new, much faster install CD.
Comment 2 Benedicto Sérgio de Almeida Santiago 2006-06-19 09:14:52 UTC
Created attachment 89541 [details]
dmesg
Comment 3 Mike Doty (RETIRED) gentoo-dev 2006-06-19 09:16:11 UTC
not infra
Comment 4 Jason Stubbs (RETIRED) gentoo-dev 2006-06-19 09:45:57 UTC
(In reply to comment #0)
> File "/usr/lib/portage/pym/portage_locks.py", line 128, in lockfile
>     lockfilename,myfd,unlinkfile,locking_method =
> lockfile(mypath,wantnewlockfile,unlinkfile)
>   File "/usr/lib/portage/pym/portage_locks.py", line 64, in lockfile
>     if not os.path.exists(os.path.dirname(mypath)):
>   File "/usr/lib/python2.4/posixpath.py", line 119, in dirname
>     return split(p)[0]
>   File "/usr/lib/python2.4/posixpath.py", line 79, in split
>     if head and head != '/'*len(head):
> RuntimeError: maximum recursion depth exceeded in cmp

In the case of "maximum recursion depth exceeded" you should get many (100s) more lines than you've shown here. Is the above traceback all that was reported?
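
To illustrate why a genuine recursion overflow should come with a very long traceback, here is a minimal Python sketch. The function `recurse_forever` is a hypothetical stand-in and is not Portage code. On Python 2.4 the error is a plain RuntimeError; newer interpreters raise RecursionError, which is a subclass of RuntimeError, so the same except clause catches both.

```python
import sys
import traceback


def recurse_forever(path):
    # Hypothetical stand-in for a function that keeps calling itself,
    # in the way lockfile() does in the traceback quoted above.
    return recurse_forever(path)


def count_traceback_frames():
    """Trigger the overflow and count the frames in the resulting traceback."""
    try:
        recurse_forever("/some/path")
    except RuntimeError:
        # RecursionError (newer Pythons) is a subclass of RuntimeError.
        tb = sys.exc_info()[2]
        return len(traceback.extract_tb(tb))
    return 0


frames = count_traceback_frames()
print("frames in traceback:", frames)
print("recursion limit:", sys.getrecursionlimit())
```

The frame count should be close to the interpreter's recursion limit, which is why a traceback of only a handful of lines suggests that part of the output was cut off.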
Comment 5 Zac Medico gentoo-dev 2006-06-19 11:16:18 UTC
Run /usr/lib/portage/bin/clean_locks to see if that helps.  What type of filesystem is /distfiles on?
Comment 6 Benedicto Sérgio de Almeida Santiago 2006-06-19 12:18:00 UTC
(In reply to comments #3 and #4)
comment #3
more lines (pay attention please -> ... , and so on)

comment #4
> Run /usr/lib/portage/bin/clean_locks to see if that helps.  What type of
> filesystem is /distfiles on?

I've run emerge -u portage (I think it is now the latest portage).

Here is the clean_locks result:
localhost / # /usr/lib/portage/bin/clean_locks
PORTAGE_GPG_DIR is invalid. Removing gpg from FEATURES.

You must specify directories with hardlink-locks to clean.
You may optionally specify --force, which will remove all
of the locks, even if we can't establish if they are in use.
Please attempt cleaning without force first.

/usr/lib/portage/bin/clean_locks /distfiles/.locks
/usr/lib/portage/bin/clean_locks --force /distfiles/.locks
Comment 7 Benedicto Sérgio de Almeida Santiago 2006-06-19 12:46:01 UTC
Excuse me, I forgot to add: I ran both of those commands, including the one with --force, and both report 0 locks.
Comment 8 Zac Medico gentoo-dev 2006-06-19 13:15:42 UTC
I didn't see an answer for the "What type of filesystem is /distfiles on?" question.  To work around this, you can add -distlocks to FEATURES in make.conf.
Comment 9 Zac Medico gentoo-dev 2006-06-19 13:45:56 UTC
Created attachment 89567 [details, diff]
only recurse for exactly 0 hardlinks

I've seen a report of similar endless recursion due to a stale lock on an nfs filesystem.  Anyway, this patch should prevent recursion in some cases where it is inappropriate.  Please test it (without -distlocks in FEATURES, of course).  If you save the patch as /tmp/nlink.patch then you can apply it as follows:

cd /usr/lib/portage
patch -p0 < /tmp/nlink.patch

Does that solve the problem?
Comment 10 Benedicto Sérgio de Almeida Santiago 2006-06-19 13:56:45 UTC
(In reply to comment #8)
> Created an attachment (id=89567) [edit]
> only recurse for exactly 0 hardlinks
> I've seen a report of similar endless recursion due to a stale lock on an nfs
> filesystem.  Anyway, this patch should prevent recursion in some cases where it
> is inappropriate.  Please test it (without -distlocks in FEATURES, of course).
>  If you save the patch as /tmp/nlink.patch then you can apply it as follows:
> cd /usr/lib/portage
> patch -p0 < /tmp/nlink.patch
> Does that solve the problem?

I've added -distlocks to FEATURES, but the same problem persists.
My filesystem is ext3 (the default on the new install CD).
I will apply the patch shortly and report back afterwards.
Comment 11 Benedicto Sérgio de Almeida Santiago 2006-06-19 14:14:27 UTC
(In reply to comment #8)
...
> I've seen a report of similar endless recursion due to a stale lock on an nfs
> filesystem.  Anyway, this patch should prevent recursion in some cases where it
> is inappropriate.  Please test it (without -distlocks in FEATURES, of course).
>  If you save the patch as /tmp/nlink.patch then you can apply it as follows:
> 
> cd /usr/lib/portage
> patch -p0 < /tmp/nlink.patch
> 
> Does that solve the problem?
> 

Sorry, but I don't have nlink.patch (it is not in /tmp, and the slocate command doesn't find it either).
Can I search for it and download it?
Comment 12 Zac Medico gentoo-dev 2006-06-19 14:24:30 UTC
(In reply to comment #0)
>   File "/usr/lib/python2.4/posixpath.py", line 79, in split
>     if head and head != '/'*len(head):
> RuntimeError: maximum recursion depth exceeded in cmp

That seems like a python bug.  Apparently, the recursion problem isn't in portage itself.

> I already recompiled manually the python 2, but the problem persists.

Did you recompile python by hand or what (since portage isn't working)?

(In reply to comment #10)
> Sorry, but I haven't the nlink.patch

It's attached to this bug, but I don't think it will help you.  Your python seems to be broken.

Comment 13 Benedicto Sérgio de Almeida Santiago 2006-06-19 14:34:39 UTC
(In reply to comment #11)
> (In reply to comment #0)
> >   File "/usr/lib/python2.4/posixpath.py", line 79, in split
> >     if head and head != '/'*len(head):
> > RuntimeError: maximum recursion depth exceeded in cmp
> 
> That seems like a python bug.  Apparently, the recursion problem isn't in
> portage itself.
> 
> > I already recompiled manually the python 2, but the problem persists.
> 
> Did you recompile python by hand or what (since portage isn't working)?
> 
> (In reply to comment #10)
> > Sorry, but I haven't the nlink.patch
> 
> It's attached to this bug, but I don't think it will help you.  Your python
> seems to be broken.
> 
I've recompiled Python 2 manually with:
tar xzf /usr/portage/distfiles/Python-2 ...
cd /Python-2 ..
./configure --with-fpectl --ifodir=/usr/share/info/ --mandir=/usr/share/man
make
make install prefix==/usr
rm /usr/bin/python 2>/dev/null

Is that OK?
(Did you see the filesystem above? It is ext3, the default on the new install CD.)

Comment 14 Jason Stubbs (RETIRED) gentoo-dev 2006-06-19 19:16:27 UTC
(In reply to comment #5)
> (In reply to comments #3 and #4)
> comment #3
> more lines (pay attention please -> ... , and so on)

I was paying attention. The snippet you've provided doesn't show where the recursion loop starts and ends.
Comment 15 Benedicto Sérgio de Almeida Santiago 2006-06-20 00:48:39 UTC
Created attachment 89605 [details]
snippet more complete

The beginning, the loop in portage_locks.py (line 128), and the end
Comment 16 Benedicto Sérgio de Almeida Santiago 2006-06-20 00:56:19 UTC
Comment on attachment 89605 [details]
snippet more complete

Jason, I feel a lot, I recognize your interest and that is paying attention
Comment 17 Benedicto Sérgio de Almeida Santiago 2006-06-20 00:59:20 UTC
Comment on attachment 89605 [details]
snippet more complete

Jason, I'm sorry, I recognize your interest and that you are paying attention.
Comment 18 Zac Medico gentoo-dev 2006-06-20 01:33:58 UTC
(In reply to comment #14)

The attachment clearly shows a recursion loop at line 128 of portage_locks, so it's clearly a portage issue (your python install seems fine).  Have you tried the patch that I've attached to this bug yet? It may help.  It seems that you have a stale lockfile for /var/lib/portage/config.
Comment 19 Jason Stubbs (RETIRED) gentoo-dev 2006-06-20 07:46:38 UTC
What does 'ls -l /var/lib/portage/config' give you? Looking at the code, it would seem that it is hardlinked somewhere.

        if type(lockfilename) == types.StringType and \
                myfd != HARDLINK_FD and os.fstat(myfd).st_nlink != 1:
                # The file was deleted on us... Keep trying to make one...

st_nlink != 1 translating to the file being deleted seems like a bad assumption to me. If we're not using hardlinks and st_nlink somehow becomes 2 or more, it might be better to dump an inconsistency error with the location of the lockfile and then fail... More importantly, a user might have a valid reason for hardlinking a file that portage might want to lock. Perhaps we should be using auxiliary lock files (wantnewlockfile=1) rather than locking files directly?
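
A minimal Python sketch of the difference between the two checks follows. The helper names `looks_deleted_old` and `looks_deleted_new` are hypothetical and not taken from portage_locks.py; the stricter variant follows the idea in the title of the attached nlink.patch ("only recurse for exactly 0 hardlinks"). It assumes a filesystem that supports hard links and reports a link count of 0 for an open file whose names have all been removed.

```python
import os
import tempfile


def looks_deleted_old(fd):
    # Check quoted above: any link count other than 1 counts as "deleted".
    return os.fstat(fd).st_nlink != 1


def looks_deleted_new(fd):
    # Stricter check: only a link count of exactly 0 means the file is gone.
    return os.fstat(fd).st_nlink == 0


def demo():
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "config")
    extra = os.path.join(workdir, "config.second-name")
    with open(target, "w") as handle:
        handle.write("data\n")
    os.link(target, extra)  # the file now has two names (st_nlink == 2)

    fd = os.open(target, os.O_RDONLY)
    try:
        hardlinked = (looks_deleted_old(fd), looks_deleted_new(fd))
        os.unlink(target)
        os.unlink(extra)  # no names left; the open fd keeps the inode alive
        unlinked = (looks_deleted_old(fd), looks_deleted_new(fd))
    finally:
        os.close(fd)
        os.rmdir(workdir)
    return hardlinked, unlinked


hardlinked, unlinked = demo()
print("hardlinked file (old, new):", hardlinked)
print("unlinked file (old, new):  ", unlinked)
```

With the old check, a file that merely has a second name is reported as deleted on every attempt, which is the kind of condition that can keep a retry loop going forever.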
Comment 20 Benedicto Sérgio de Almeida Santiago 2006-06-20 08:25:37 UTC
(In reply to comment #18)
> What does 'ls -l /var/lib/portage/config' give you? Looking at the code, it

It is a file, not a link:
santiago@localhost / $ ls -l /var/lib/portage/config
-rw-rw----  2 root portage 438 Jun 16 20:55 /var/lib/portage/config

Comment 21 Jason Stubbs (RETIRED) gentoo-dev 2006-06-20 08:37:43 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > What does 'ls -l /var/lib/portage/config' give you? Looking at the code, it
> 
> It is a file, not a link:
> santiago@localhost / $ ls -l /var/lib/portage/config
> -rw-rw----  2 root portage 438 Jun 16 20:55 /var/lib/portage/config

That first 2 there indicates that the physical file is available via two names within the file system - what is commonly known as hardlinked. I take it there is no /var/lib/portage/config.hardlink-$(hostname)-#### file? If not, you may want to fsck that filesystem. If that still doesn't help, I'm not sure how to go about finding the second link... A quick workaround in that case would be to:

# cd /var/lib/portage
# mv config config.old
# cp config.old config
# chown root:portage config
# chmod 660 config
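
The effect of the rename-and-copy steps can be sketched in Python as follows. The helper `break_hardlink` and the file names in the temporary directory are hypothetical; the chown and chmod steps are left out because they need root, and `shutil.copy2` is used here, which also carries over mode and timestamps, unlike a plain cp.

```python
import os
import shutil
import tempfile


def break_hardlink(path):
    """Rename the file aside and copy it back, giving path a fresh inode."""
    old = path + ".old"
    os.rename(path, old)      # mv config config.old
    shutil.copy2(old, path)   # cp config.old config
    return old


workdir = tempfile.mkdtemp()
config = os.path.join(workdir, "config")
with open(config, "w") as handle:
    handle.write("example\n")
os.link(config, os.path.join(workdir, "stray-link"))  # simulate the extra link

links_before = os.stat(config).st_nlink
break_hardlink(config)
links_after = os.stat(config).st_nlink
print("links before:", links_before, "links after:", links_after)
shutil.rmtree(workdir)
```

The copy is a new file with a single name, while the old inode keeps its two names (config.old and the stray link).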
Comment 22 Jason Stubbs (RETIRED) gentoo-dev 2006-06-20 08:45:57 UTC
Ok, a quick search showed that find's -samefile option will do it. So, switch to the root of that filesystem (for example /var if /var is mounted separately) and then run `find . -samefile /var/lib/portage/config`. You'd probably be better off timewise doing the fsck first though.
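
For readers without find at hand, the same search can be sketched in Python by comparing device and inode numbers. The function `find_same_file` is a hypothetical helper written for illustration and is not part of Portage; the demonstration uses a temporary directory rather than the real /var.

```python
import os
import shutil
import tempfile


def find_same_file(root, target):
    """Return every path under root that shares device and inode with target."""
    wanted = os.stat(target)
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            try:
                info = os.lstat(candidate)  # do not follow symlinks
            except OSError:
                continue  # vanished or unreadable entry; skip it
            if (info.st_dev, info.st_ino) == (wanted.st_dev, wanted.st_ino):
                matches.append(candidate)
    return sorted(matches)


workdir = tempfile.mkdtemp()
os.mkdir(os.path.join(workdir, "elsewhere"))
config = os.path.join(workdir, "config")
open(config, "w").close()
os.link(config, os.path.join(workdir, "elsewhere", "second-name"))
open(os.path.join(workdir, "unrelated"), "w").close()

found = find_same_file(workdir, config)
print(found)
shutil.rmtree(workdir)
```

On the affected system the equivalent call would be `find_same_file("/var", "/var/lib/portage/config")`, assuming /var is the root of that filesystem; hard links cannot cross filesystems, so there is no need to search anywhere else.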
Comment 23 Zac Medico gentoo-dev 2006-06-20 09:53:24 UTC
(In reply to comment #18)
> st_nlink != 1 translating to the file being deleted seems like a bad assumption

I've already changed it to st_nlink == 0 in svn r3540.

> to me. If we're not using hardlinks and st_nlink somehow becomes 2 or more, it
> might be better to dump an inconsistency error with the location of the
> lockfile and then fail... More importantly, a user might have a valid reason
> for hardlinking a file that portage might want to lock. Perhaps we should be
> using auxiliary lock files (wantnewlockfile=1) rather than locking files
> directly?

Yeah.  Without an auxiliary lockfile, the number of hardlinks is not dependable, so I think wantnewlockfile=1 is a good idea.
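
The idea of an auxiliary lockfile can be sketched in Python as below. This is not the portage_locks implementation: the helper names and the `.portage_lockfile` suffix are assumptions made for illustration, and `fcntl.lockf` is used only as one possible locking primitive on a POSIX system.

```python
import fcntl
import os
import shutil
import tempfile


def lock_with_auxiliary_file(path, suffix=".portage_lockfile"):
    """Lock a separate file next to path instead of locking path itself.

    The suffix is an assumed name chosen for this sketch.
    """
    lock_path = path + suffix
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o660)
    fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
    return lock_path, fd


def unlock(fd):
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)


workdir = tempfile.mkdtemp()
config = os.path.join(workdir, "config")
open(config, "w").close()
os.link(config, config + ".second-name")  # a user-made hard link to the data

lock_path, fd = lock_with_auxiliary_file(config)
data_links = os.stat(config).st_nlink  # 2, but no longer relevant to locking
lock_links = os.fstat(fd).st_nlink     # link count of the file the locker owns
unlock(fd)
shutil.rmtree(workdir)
print("data file links:", data_links, "lock file links:", lock_links)
```

Because only the locking code creates and touches the auxiliary file, its link count is something the code can reason about, whereas the link count of the data file depends on whatever the user has done with it.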
Comment 24 Zac Medico gentoo-dev 2006-06-20 11:31:07 UTC
Created attachment 89650 [details, diff]
use wantnewlockfile=1 for /usr/lib/portage/config

This is fixed in svn r3543.
Comment 25 Zac Medico gentoo-dev 2006-06-20 12:20:30 UTC
(In reply to comment #18)
> might be better to dump an inconsistency error with the location of the
> lockfile and then fail...

I don't like dumping errors unless it's absolutely necessary, because error messages lead to bug reports.  If we can handle the situation sanely without dumping an error, that would be nice.  We shouldn't be relying on the number of hardlinks unless wantnewlockfile=1, in which case a consistency check + error message is probably a good idea.

Comment 26 Jason Stubbs (RETIRED) gentoo-dev 2006-06-20 17:56:43 UTC
How do files get hardlink-locked now? The code that does the automatic switch seems to be removed... This is a good thing seeing that local vs NFSv3 clients vs NFSv4 clients all accessing the same location won't all choose the same locking mechanism under the automatic-switch scheme.

I agree that dumping of error messages should be prevented wherever possible, but if something is inconsistent in an unforeseen way... However, in this case there don't seem to be any unknown states. I was just thinking out loud before.
Comment 27 Zac Medico gentoo-dev 2006-06-21 00:13:51 UTC
This has been released in 2.1-r1.
Comment 29 Benedicto Sérgio de Almeida Santiago 2006-06-21 08:49:21 UTC
(In reply to comment #20 and others)

Excuse me, but yesterday I was very busy and I could not participate. 
Today, while researching, and in addition to trying the workaround from comment #20 (Jason), I believe I got closer to the problem. I suspect an ambiguous reference in the config hard link (one for the folder linux.2.15.r5, the other for Linux.2.16.20, because I have both and must have attempted an update to the newest kernel).
I don't know whether this was caused by my own mistake, or...
OK,
Please note make.conf.example instead of make.conf below:

cat /var/lib/portage

/etc/etc-update.conf 7507d3a31a80c6ddd3e91673b8e36a46
/etc/skel/.bashrc 31989efc0a6237652d344f7f6fce15cc
/etc/make.conf.example ab74092bd8bfe30db5528d19582bf357
/etc/bash/bashrc addd22b9c7174a5220a350dc84cb738f
/etc/X11/gdm/gdm.conf 9b7d5b5b5bd8be39880524954eeeea80
/etc/cups/printers.conf 636d04ba41e72924d7e8ca8d5e5b7989
/etc/make.globals aae270f80bf3b9ae25254736f29a10b6
/etc/cups/classes.conf d8b385817fb41a6466686ccd97a61dca

The workaround:
mv ..old ;  cp cp make.conf make.conf.example
seems to have solved it.
When I tried an emerge u xxx, it seemed OK.

It remains to find out how I should correct this and why it happened.
Should I create a new hard link from config to make.conf instead of make.conf.example (the copy I made of make.conf), or...?

Comment 30 Benedicto Sérgio de Almeida Santiago 2006-06-21 08:52:31 UTC
(In reply to comment #27)
Editing:
I forgot to say that config.old didn't exist
Comment 31 Zac Medico gentoo-dev 2006-06-21 12:42:35 UTC
(In reply to comment #27)
> Remains to know how I will correct and why this happened.
> If I will create a new hard link of the config done address for make.conf
> instead of make.conf.example (copy that I created of make.conf), or...??

make.conf and make.conf.example have absolutely nothing to do with this.  The only unresolved question is how you got 2 hardlinks to /var/lib/portage/config.  It's not important to us how that happened to you.  If you're interested in finding out why that file had 2 hardlinks, then use the find command as suggested in comment #21.