The collision-protect feature in Portage should, IMHO, handle softlinks the same way as the actual files.
collision-protect interrupts the current installation if a softlink that is not owned by any package (e.g. /usr/X11R6/lib/somewhere) and points to a file owned by a package (e.g. /usr/lib/somewhere) would be overwritten. Instead, I would expect the installation to proceed, since only a link would be overwritten, and its target is owned by the package currently being installed.
This happened while remerging kdelibs-3.3.2-r2.
This is my opinion, but maybe there is a reason for the current behaviour, or an even better solution.
BTW, I have no idea how Portage handles installing to targets that are already symlinks. Is the link removed and the file installed, or is the file written to the link's target?
And what happens if it is not the file itself but one of its parent directories that is a softlink?
Portage 2.0.51-r15 (default-linux/x86/2004.3, gcc-3.3.5, glibc-2.3.4.20040808-r1, 2.6.10-gentoo-r6+win4lin i686)
System uname: 2.6.10-gentoo-r6+win4lin i686 Mobile Intel(R) Pentium(R) 4 - M CPU 1.80GHz
Gentoo Base System version 1.6.8
Python: dev-lang/python-2.3.4 [2.3.4 (#1, Nov 30 2004, 04:40:47)]
distcc 2.16 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
sys-devel/autoconf: 2.59-r6, 2.13
sys-devel/automake: 1.7.9-r1, 1.8.5-r3, 1.5, 1.4_p6, 1.6.3, 1.9.4
CFLAGS="-O3 -march=pentium4 -fomit-frame-pointer -mmmx -msse -msse2 -mfpmath=sse -funroll-loops -fprefetch-loop-arrays -pipe"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -march=pentium4 -fomit-frame-pointer -mmmx -msse -msse2 -mfpmath=sse -funroll-loops -fprefetch-loop-arrays -pipe"
FEATURES="collision-protect autoaddcvs autoconfig ccache distcc distlocks fixpackages maketest parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="ftp://mirror.switch.ch/mirror/gentoo/ http://distfiles.gentoo.org http://www.ibiblio.org/pub/Linux/distributions/gentoo"
USE="x86 X aalib acl acpi alsa apache2 apm arts audiofile avi bash-completion berkdb bitmap-fonts canna cdr crypt cscope cups curl dga directfb divx4linux dvd encode esd f77 fam fbcon flac flash font-server foomaticdb fortran freewnn gdbm ggi gif gphoto2 gpm gstreamer gtk gtk2 gtkhtml guile imagemagick imap imlib ipv6 java jikes jpeg junit kde libg++ libwww mad maildirmbox mcal mikmod mmx mng motif motiv mozilla mpeg mysql nas ncurses nls nptl oggvorbis opengl oss pam pcmcia pda pdflib perl pic png pnp ppds prelude python qt quicktime readline ruby samba scanner sdl slang slp speex spell sse ssl svga symlink tcltk tcpd tetex theora tiff truetype truetype-fonts trusted type1-fonts usb v4l v4l2 wxwindows xine xinerama xml xml2 xmms xosd xprint xv xvid zlib"
Unset: ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS
There will always be problems with collision-protect, and symlinks are a nasty issue. The problems with following symlinks include circular symlinks and symlinks pointing to top-level directories (and probably a few more). Depending on how an individual package uses symlinks, you need one behaviour or the other, but an implementation has to pick one, and in general the current solution causes fewer problems.
A third option would be to ignore all symlinks (and their subtrees) completely.
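The circular-symlink problem mentioned above is easy to reproduce. Below is a minimal Python sketch (the helper name `file_identity` is hypothetical, not part of Portage): `os.stat()` follows every symlink in a path, and on a symlink cycle it fails with `ELOOP`, so any code that dereferences links has to decide what to do in that case. This version simply reports `None`.

```python
import errno
import os
import tempfile


def file_identity(path):
    """Return the (st_dev, st_ino) pair of the file that path resolves to,
    or None if resolving it runs into a symlink loop (hypothetical helper)."""
    try:
        st = os.stat(path)  # follows every symlink in the path
    except OSError as e:
        if e.errno == errno.ELOOP:
            return None  # circular symlinks: there is no real file to compare
        raise
    return (st.st_dev, st.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        os.symlink(b, a)  # a -> b (dangling for now)
        os.symlink(a, b)  # b -> a, which closes the loop
        print(file_identity(a))  # None
```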
But wouldn't it be an acceptable solution for collision-protect to check whether the two different paths point to the same inode on the same filesystem? That way, AFAIK, this bug could be resolved and collision-protect would become a usable feature again.
Well, they won't point to the same inode (that's only true for hardlinks).
Note for softlinks:
This also applies to softlinks, but only if they are at an upper (parent) level, and that's exactly the issue I mean.
But if you insist, close this bug again... and imagine there is no problem.
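Whether the two paths "point to the same inode" depends on which stat call is used. A minimal Python sketch, with made-up paths under a temporary directory that mirror the /usr/X11R6/lib and /usr/lib example: `os.lstat()` resolves symlinks in parent directories but not in the last path component, while `os.stat()` (and therefore `os.path.samefile()`) follows both. The helper `same_lstat_inode` is hypothetical.

```python
import os
import tempfile


def same_lstat_inode(p1, p2):
    """Compare two paths by (st_dev, st_ino) using lstat(), which resolves
    symlinked parent directories but not a symlink in the final component."""
    s1, s2 = os.lstat(p1), os.lstat(p2)
    return (s1.st_dev, s1.st_ino) == (s2.st_dev, s2.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        real_dir = os.path.join(root, "usr", "lib")
        os.makedirs(real_dir)
        real_file = os.path.join(real_dir, "libfoo.so")
        open(real_file, "w").close()

        # Symlinked parent directory: usr/X11R6/lib -> usr/lib
        os.makedirs(os.path.join(root, "usr", "X11R6"))
        link_dir = os.path.join(root, "usr", "X11R6", "lib")
        os.symlink(real_dir, link_dir)

        # Symlinked file itself, placed in an ordinary directory
        link_file = os.path.join(root, "libfoo-link.so")
        os.symlink(real_file, link_file)

        print(same_lstat_inode(os.path.join(link_dir, "libfoo.so"), real_file))  # True
        print(same_lstat_inode(link_file, real_file))  # False
        print(os.path.samefile(link_file, real_file))  # True: stat() follows the link
```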
*** Bug 85066 has been marked as a duplicate of this bug. ***
So it might be wise to dereference the links (as 'ls -L' does).
It's generally not a good idea to CC random developers, especially when they are already on the alias that the bug is assigned to ;)
*** Bug 117363 has been marked as a duplicate of this bug. ***
Pardon, I didn't realize that.
Created attachment 76091
After messing around in the Portage code for a few hours, I came up with this.
It works fine for me in the described scenario.
Can you please review my patch? I'm not a coder.
Thank you very much; I hope it is of use.
Thomas, can you send this to the email@example.com mailing list and shop it around for some feedback?
Bugzilla is high traffic with only Portage devs on the alias; I'd like to see this patch tested a bit by devs and users for feedback.
*** Bug 134215 has been marked as a duplicate of this bug. ***
*** Bug 137644 has been marked as a duplicate of this bug. ***
*** Bug 137767 has been marked as a duplicate of this bug. ***
*** Bug 138239 has been marked as a duplicate of this bug. ***
*** Bug 145063 has been marked as a duplicate of this bug. ***
Looking at the proposed patch, it seems pretty CPU-intensive for packages with thousands of files. Wouldn't a better (though more intrusive) approach be to store only real paths and then also use real paths when searching for duplicates? That would have only O(n) complexity, compared to the O(n*n) of the suggested patch.
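A minimal sketch of the "store only real paths" idea, assuming (hypothetically) that the package database already holds `os.path.realpath()` results for every recorded file; `find_collisions` is an illustrative name, not a Portage function. Each new file is resolved once and looked up in a set, so the check stays roughly linear in the number of new files instead of comparing every pair of paths.

```python
import os
import tempfile


def find_collisions(new_files, stored_real_paths):
    """Return the new files whose resolved path is already recorded as owned.

    Assumes stored_real_paths was filled with os.path.realpath() results when
    the other packages were merged, so only the n new files need resolving
    and each check is an average O(1) set lookup: O(n) overall."""
    return [p for p in new_files if os.path.realpath(p) in stored_real_paths]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        real_lib = os.path.join(root, "usr", "lib")
        os.makedirs(real_lib)
        open(os.path.join(real_lib, "libfoo.so"), "w").close()
        os.makedirs(os.path.join(root, "usr", "X11R6"))
        os.symlink(real_lib, os.path.join(root, "usr", "X11R6", "lib"))

        # What a hypothetical database would record: resolved paths only.
        stored = {os.path.realpath(os.path.join(real_lib, "libfoo.so"))}
        new = [os.path.join(root, "usr", "X11R6", "lib", "libfoo.so")]
        print(find_collisions(new, stored) == new)  # True
```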
Thank you for this concept. It sounds great; feel free to get in touch with the developers and integrate it.
A short note on my patch: it checks all of the other packages' files (n) only for those files that don't match the recorded file (m).
Conclusion: it won't be O(n*n) but O(n*m), where m is the number of files moved from their original path and linked back. (But yes, m == n if all the files have moved.)
Anyway, I prefer your proposed solution, as I like the idea of recording (and comparing) only the real absolute path and not any symlinks. Since this only requires a change in what is recorded, it seems compatible with both earlier and later versions of Portage. Go for it...
(In reply to comment #18)
> Go for it...
Ahem, OK, maybe I'll give it a try one of these days, though my experience with Python (and Portage) programming is practically nonexistent. But there must be a first time for everything; just don't hold your breath waiting for the patch :)
(In reply to comment #17)
> Looking at the proposed patch, it looks pretty CPU intensive for packages with
> thousands of files. Wouldn't a better approach (though more intrusive) be to
> only store real paths and then also use real paths when searching for the
It's not necessary to store all the real paths in order to improve performance. We can resolve the real paths on the fly and cache them for the duration of the collision-protect phase of the merge. That should be more robust and less intrusive than storing the real paths.
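A minimal sketch of resolving paths on the fly with a cache that lives only for the collision-check phase. The names are hypothetical, and `functools.lru_cache` is just one convenient way to memoize; nothing new is written to the package database.

```python
import functools
import os


@functools.lru_cache(maxsize=None)
def cached_realpath(path):
    """Resolve path once; repeated lookups of the same string hit the cache."""
    return os.path.realpath(path)


def collision_phase(new_files, other_owned_paths):
    """Hypothetical collision-protect phase: recorded paths are resolved on
    the fly instead of being stored resolved, and the cache is discarded
    when the phase ends."""
    try:
        owned = {cached_realpath(p) for p in other_owned_paths}
        return [p for p in new_files if cached_realpath(p) in owned]
    finally:
        # Symlinks may change after this merge, so the cache must not outlive it.
        cached_realpath.cache_clear()
```

Clearing the cache in `finally` keeps stale resolutions from leaking into a later merge, which is what makes this less intrusive than persisting resolved paths.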
(In reply to comment #20)
> It's not necessary to store all the real paths in order to improve performance.
Yes, good point. That also eliminates the need to run `FEATURES=-collision-protect emerge -e world` to update all the paths in the database.
Created attachment 96534
avoid unnecessary collisions via device and i-node numbers
This is fixed in svn r4431 (the patch applies against 2.1.1). The concept is identical to Thomas's patch, except that device and inode numbers are cached for performance reasons.
Great! I love to see this getting fixed. Thank you.
*** Bug 147711 has been marked as a duplicate of this bug. ***
This has been released in 2.1.2_pre1.
I haven't followed this bug closely, but I really hope you don't follow absolute symlinks, as that causes a very nasty performance hit (and was one of the reasons why symlink checking was disabled in the first place).
(In reply to comment #26)
> Haven't followed this bug closely, but I really hope you don't follow absolute
> symlinks as that causes a very nasty performance hit (and was one of the
> reasons why symlink checking was disabled in the first place).
The performance hit of the chosen implementation is negligible. It uses st_dev and st_ino pairs as unique file identifiers in the dblink.isowner() method. The st_dev and st_ino pair is exactly what os.path.samefile() uses to check whether two files are the same (regardless of path). The st_dev and st_ino pairs (from stat results) are cached inside the dblink class, so the number of stat calls is kept to a minimum, and they are stored in a set for fast lookup.
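A minimal sketch of the idea described above. It is not Portage's actual dblink code; the class `OwnerIndex` and its methods are hypothetical. Each owned path is stat()ed once, the `(st_dev, st_ino)` pairs are kept in a set, and ownership queries become set lookups. Because `os.stat()` follows symlinks, a path that reaches an owned file through a linked directory, or a file symlink pointing at an owned file, is recognised as the same file.

```python
import os
import tempfile


class OwnerIndex:
    """Hypothetical stand-in for the caching described above (not dblink):
    owned files are identified by their (st_dev, st_ino) pairs."""

    def __init__(self, owned_paths):
        self._ids = set()
        for path in owned_paths:
            try:
                st = os.stat(path)  # follows symlinks, like os.path.samefile()
            except OSError:
                continue  # missing or unresolvable files cannot collide
            self._ids.add((st.st_dev, st.st_ino))

    def is_owned(self, path):
        """True if path resolves to one of the indexed files."""
        try:
            st = os.stat(path)
        except OSError:
            return False
        return (st.st_dev, st.st_ino) in self._ids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        real_lib = os.path.join(root, "usr", "lib")
        os.makedirs(real_lib)
        owned = os.path.join(real_lib, "libfoo.so")
        open(owned, "w").close()
        os.makedirs(os.path.join(root, "usr", "X11R6"))
        link_lib = os.path.join(root, "usr", "X11R6", "lib")
        os.symlink(real_lib, link_lib)

        index = OwnerIndex([owned])
        print(index.is_owned(os.path.join(link_lib, "libfoo.so")))  # True
        print(index.is_owned(os.path.join(root, "missing")))  # False
```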
*** Bug 148476 has been marked as a duplicate of this bug. ***
*** Bug 148512 has been marked as a duplicate of this bug. ***
*** Bug 150574 has been marked as a duplicate of this bug. ***
*** Bug 172341 has been marked as a duplicate of this bug. ***