Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!

Bug 80846

Summary: collision-protect does not handle symlinks
Product: Portage Development Reporter: Thomas Bettler <thomas.bettler>
Component: Core - ConfigurationAssignee: Portage team <dev-portage>
Status: RESOLVED FIXED    
Severity: normal CC: aklhfex, ansla80, askwar, cedk, corsair, dmakovey, gentoo, gentoo, jieryn, m.langer798, pupeno, roma1390, sascha-gentoo-bugzilla, shrdlu, SourLover, tcunha
Priority: High Keywords: InVCS
Version: unspecified   
Hardware: All   
OS: All   
See Also: https://bugs.gentoo.org/show_bug.cgi?id=927640
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 193766, 147007, 149110    
Attachments: collision-samefile-check.patch
avoid unnecessary collisions via device and i-node numbers

Description Thomas Bettler 2005-02-05 04:27:07 UTC
collision-protect feature in portage should - IMHO - handle softlinks the same as the actual files.

collision-protect interrupts the current installation if a softlink (not owned by a package i.e. /usr/X11R6/lib/somewhere) pointing to a file (owned by a package i.e. /usr/lib/somewhere) would be overwritten. Instead I expect that the installation proceeds, since there will only be a link overwritten where its target is owned by the package currently installing

Happened while remerging kdelibs-3.3.2-r2

This is my opinion, but maybe there is a reason for current behaviour or an even better solution.
btw. I have no clue how portage handles installing to already linked targets. Is the link killed and file installed or is the file written to the target?
And what happens if not the file itself but one of the parent directories is a soft link?

Portage 2.0.51-r15 (default-linux/x86/2004.3, gcc-3.3.5, glibc-2.3.4.20040808-r1, 2.6.10-gentoo-r6+win4lin i686)
=================================================================
System uname: 2.6.10-gentoo-r6+win4lin i686 Mobile Intel(R) Pentium(R) 4 - M CPU 1.80GHz
Gentoo Base System version 1.6.8
Python:              dev-lang/python-2.3.4 [2.3.4 (#1, Nov 30 2004, 04:40:47)]
distcc 2.16 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
dev-lang/python:     2.3.4
sys-devel/autoconf:  2.59-r6, 2.13
sys-devel/automake:  1.7.9-r1, 1.8.5-r3, 1.5, 1.4_p6, 1.6.3, 1.9.4
sys-devel/binutils:  2.15.92.0.2-r1
sys-devel/libtool:   1.5.10-r4
virtual/os-headers:  2.6.8.1-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O3 -march=pentium4 -fomit-frame-pointer -mmmx -msse -msse2 -mfpmath=sse -funroll-loops -fprefetch-loop-arrays -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -march=pentium4 -fomit-frame-pointer -mmmx -msse -msse2 -mfpmath=sse -funroll-loops -fprefetch-loop-arrays -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="collision-protect autoaddcvs autoconfig ccache distcc distlocks fixpackages maketest parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="ftp://mirror.switch.ch/mirror/gentoo/ http://distfiles.gentoo.org http://www.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j2 -s"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 X aalib acl acpi alsa apache2 apm arts audiofile avi bash-completion berkdb bitmap-fonts canna cdr crypt cscope cups curl dga directfb divx4linux dvd encode esd f77 fam fbcon flac flash font-server foomaticdb fortran freewnn gdbm ggi gif gphoto2 gpm gstreamer gtk gtk2 gtkhtml guile imagemagick imap imlib ipv6 java jikes jpeg junit kde libg++ libwww mad maildirmbox mcal mikmod mmx mng motif motiv mozilla mpeg mysql nas ncurses nls nptl oggvorbis opengl oss pam pcmcia pda pdflib perl pic png pnp ppds prelude python qt quicktime readline ruby samba scanner sdl slang slp speex spell sse ssl svga symlink tcltk tcpd tetex theora tiff truetype truetype-fonts trusted type1-fonts usb v4l v4l2 wxwindows xine xinerama xml xml2 xmms xosd xprint xv xvid zlib"
Unset:  ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS
Comment 1 Marius Mauch (RETIRED) gentoo-dev 2005-02-06 06:25:04 UTC
There will always be problems with collision-protect, and symlinks are a nasty issue. The problems with following symlinks are circular symlinks and symlinks pointing to top level directories (and probably a few more). Depending how symlinks are used by the individual package you need one way or the other, but for implementation you have to pick one of them, and generally the current solution causes less problems.
The third option would be to completely ignore all symlinks (and their subtrees).
Comment 2 Thomas Bettler 2005-02-06 17:08:14 UTC
But won't it be an acceptable solution for collision-protect to check if the two different paths point to the same inode on the same filesystem? In this manner that bug could be resolved AFAIK and collision-protect would become a usable feature again.
Comment 3 Marius Mauch (RETIRED) gentoo-dev 2005-02-07 08:43:02 UTC
Well, they won't point to the same inode (that's only only true for hardlinks).
Comment 4 Thomas Bettler 2005-02-09 10:08:34 UTC
Note for softlinks:
This will also aply for softlinks, but only if they are on a upper (parent) level - and that's exactly the issue I mean.
But if you insist we close this bug again....and imagine there is no problem.
Comment 5 Donnie Berkholz (RETIRED) gentoo-dev 2005-03-14 19:09:27 UTC
*** Bug 85066 has been marked as a duplicate of this bug. ***
Comment 6 Thomas Bettler 2006-01-02 14:21:42 UTC
So it might be wise to dereference the links
(as it is done by 'ls -L')
Comment 7 Alec Warner (RETIRED) archtester gentoo-dev Security 2006-01-02 14:52:10 UTC
generally not a good idea to CC random developers, especially when they are already on the alias that the bug is assigned to ;)
Comment 8 Alexander Skwar 2006-01-02 21:53:18 UTC
*** Bug 117363 has been marked as a duplicate of this bug. ***
Comment 9 Thomas Bettler 2006-01-03 09:44:39 UTC
Pardon, I did't realize that.
Comment 10 Thomas Bettler 2006-01-03 09:52:36 UTC
Created attachment 76091 [details, diff]
collision-samefile-check.patch

After messing round some hours in the portage code I got this.
For me it works fine for the described scenario.

Can you please review my patch? I'm not a coder.
Thank you very much - hope it is of use
Comment 11 Brian Harring (RETIRED) gentoo-dev 2006-01-05 00:53:56 UTC
thomas, can you fire this off to gentoo-portage-dev@gentoo.org ml and shop it around for some feedback?

Bugs is high traffic with only portage devs on the alias, I'd like to see this patch tested by devs/users a bit for feedback.

Thanks
Comment 12 SpanKY gentoo-dev 2006-05-25 14:37:33 UTC
*** Bug 134215 has been marked as a duplicate of this bug. ***
Comment 13 Jakub Moc (RETIRED) gentoo-dev 2006-06-22 12:41:48 UTC
*** Bug 137644 has been marked as a duplicate of this bug. ***
Comment 14 Jakub Moc (RETIRED) gentoo-dev 2006-06-23 12:37:10 UTC
*** Bug 137767 has been marked as a duplicate of this bug. ***
Comment 15 Jakub Moc (RETIRED) gentoo-dev 2006-06-27 11:52:43 UTC
*** Bug 138239 has been marked as a duplicate of this bug. ***
Comment 16 Jakub Moc (RETIRED) gentoo-dev 2006-08-25 05:53:44 UTC
*** Bug 145063 has been marked as a duplicate of this bug. ***
Comment 17 Andrei Slavoiu 2006-08-27 01:54:36 UTC
Looking at the proposed patch, it looks pretty CPU intensive for packages with thousands of files. Wouldn't a better approach (though more intrusive) be to only store real paths and then also use real paths when searching for the duplicates? This would only have a O(n) complexity compared to the O(n*n) of the suggested patch.
Comment 18 Thomas Bettler 2006-08-28 02:11:01 UTC
Thank you for this concept. Sounds great - feel free to get in touch with the developers and integrate it.

Just a short note to my patch: It checks all the others package's files (n) only for the files not matching the recorded file (m).
Conclusion: So it won't be (n*n) - it is (n*m) where m is the count of files moved from the original path and linked back. [But yes, it will be n==m if all the files have moved]

Anyway, I prefer your proposed optimal solution, as I like the idea of recording [and comparing] only the absolute path and not any symlinks. As this needs only a change in the recording, it seems compatible with earlier and later versions of portage. Go for it...
Comment 19 Andrei Slavoiu 2006-08-28 04:55:48 UTC
(In reply to comment #18)
> Go for it...
Ahem, ok, maybe I'll give it a try one of this days, though my experience with python (and portage) programming is practically non existent. But there must be a first time for everything, just don't hold your breath waiting for the patch :)
Comment 20 Zac Medico gentoo-dev 2006-09-03 22:22:44 UTC
(In reply to comment #17)
> Looking at the proposed patch, it looks pretty CPU intensive for packages with
> thousands of files. Wouldn't a better approach (though more intrusive) be to
> only store real paths and then also use real paths when searching for the
> duplicates?

It's not necessary to store all the real paths in order to improve performance.  We can resolve all the real paths on the fly and cache them for the duration of the collision-protect phase of he merge.  That should be more robust and less intrusive that storing the real paths.
Comment 21 Andrei Slavoiu 2006-09-04 07:00:23 UTC
(In reply to comment #20)
> It's not necessary to store all the real paths in order to improve performance.
Yes, good point. This will eliminate the need for a `FEATURES=-collision-protect emerge -e world` to update all the paths in the database.
Comment 22 Zac Medico gentoo-dev 2006-09-09 16:19:43 UTC
Created attachment 96534 [details, diff]
avoid unnecessary collisions via device and i-node numbers

This is fixed in svn r4431 (patch applies against 2.1.1).  The concept is a identical to Thomas's patch except that device and inode numbers are cached for performance reasons.
Comment 23 Thomas Bettler 2006-09-10 04:00:37 UTC
Great! I love to see this getting fixed. Thank you.
Comment 24 Jakub Moc (RETIRED) gentoo-dev 2006-09-15 11:37:07 UTC
*** Bug 147711 has been marked as a duplicate of this bug. ***
Comment 25 Zac Medico gentoo-dev 2006-09-15 20:30:50 UTC
This has been released in 2.1.2_pre1.
Comment 26 Marius Mauch (RETIRED) gentoo-dev 2006-09-19 05:14:30 UTC
Haven't followed this bug closely, bu I really hope you don't follow absolute symlinks as that causes a very nasty performance hit (and was one of the reasons why symlink checking was disabled in the first place).
Comment 27 Zac Medico gentoo-dev 2006-09-19 11:00:14 UTC
(In reply to comment #26)
> Haven't followed this bug closely, bu I really hope you don't follow absolute
> symlinks as that causes a very nasty performance hit (and was one of the
> reasons why symlink checking was disabled in the first place).

The performance hit of the chosen implementation is negligible.  It uses st_dev and st_ino pairs as unique file identifiers in the dblink.isowner() method.  The st_dev and st_ino pair is exactly what os.path.samefile() uses to check if two files are the same (neglecting path).  The st_dev and st_ino pairs (from stat results) are cached inside of the dblink class so that the stat calls are reduced to a minimum (and us a set for fast lookup).
Comment 28 SpanKY gentoo-dev 2006-09-21 06:46:30 UTC
*** Bug 148476 has been marked as a duplicate of this bug. ***
Comment 29 Jakub Moc (RETIRED) gentoo-dev 2006-09-23 02:44:44 UTC
*** Bug 148512 has been marked as a duplicate of this bug. ***
Comment 30 Jakub Moc (RETIRED) gentoo-dev 2006-10-09 11:50:10 UTC
*** Bug 150574 has been marked as a duplicate of this bug. ***
Comment 31 Jakub Moc (RETIRED) gentoo-dev 2007-03-26 20:57:22 UTC
*** Bug 172341 has been marked as a duplicate of this bug. ***