Summary: | unpacking a tar with a "./" entry into $WORKDIR changes dir timestamp | ||
---|---|---|---|
Product: | Portage Development | Reporter: | Hugo Mildenberger <Hugo.Mildenberger> |
Component: | Core - Ebuild Support | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | Hugo.Mildenberger, kanelxake, pms |
Priority: | High | Keywords: | InVCS |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00021.html | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 431026 | ||
Attachments: |
Output of emerge --info =sys-devel/portage-2.2_rc67
original lzma archive
fix WORDIR ownership after unpack
fix WORDIR ownership after unpack |
Created attachment 242385 [details]
Output of emerge --info =sys-devel/portage-2.2_rc67
Well no, it isn't portage, epatch or patch. "tar -xof", called from inside _unpack_tar, changes the mtime of the parent directory, but only when confronted with the decompressed output of vapier's "gdb-7.1-patches-1.tar.lzma" archive. tar then not only modifies the mtime but also changes the ownership of the parent directory. I'm using app-arch/tar-1.23-r4.

tar was called from _unpack_tar() residing in /usr/lib/portage/bin/ebuild.sh:

    _unpack_tar() {
        if [ "${y}" == "tar" ]; then
            $1 -dc "$srcdir$x" | tar xof -
            assert "$myfail"
        else
            $1 -dc "${srcdir}${x}" > ${x%.*} || die "$myfail"
        fi
    }

reproduce.sh:

    #!/bin/sh
    mkdir -p tmp/work
    pushd tmp/work
    lzma -t /usr/portage/distfiles/gdb-7.1-patches-1.tar.lzma \
        && echo "lzma test ok" || echo "lzma test error $?"
    lzma -dc /usr/portage/distfiles/gdb-7.1-patches-1.tar.lzma \
        > gdb-7.1-patches-1.tar \
        && echo "lzma decompress ok" || echo "lzma error $?"
    tar -tvf gdb-7.1-patches-1.tar \
        && echo "tar test ok" || echo "tar test failed: $?"
    touch ../work && ls -ld ../work && tar -xof gdb-7.1-patches-1.tar \
        && ls -ld ../work
    echo
    echo "if you run this as as root, tar -xf (without -o) even "
    echo "changes ownership of .. and all newly created "
    echo "created files therein to 8282:users"
    echo
    touch ../work && ls -ld ../work && tar -xf gdb-7.1-patches-1.tar \
        && ls -ld ../work
    ls -l ../work
    popd

If you run the above script as a normal user, only the timestamp of the work directory changes. But if you run it as root, tar -xf (without -o) also changes the ownership of work and of all newly created files therein to 8282:users. I doubt that this is normal behaviour. At least none of this happens when running tar -xjf /usr/portage/distfiles/gdb-7.1.tar.bz2.
    $ sudo sh reproduce.sh
    /home/hm/tmp/work /home/hm
    lzma test ok
    lzma decompress ok
    drwxr-xr-x vapier/users 0 2010-03-19 02:51 ./
    drwxr-xr-x vapier/users 0 2010-03-19 02:51 ./extra/
    -rw-r--r-- vapier/users 23113 2010-03-19 02:51 ./extra/gdbinit.sample
    drwxr-xr-x vapier/users 0 2010-03-19 02:51 ./patch/
    -rw-r--r-- vapier/users 773 2010-03-19 02:51 ./patch/05_all_readline-headers.patch
    -rw-r--r-- vapier/users 1815 2010-03-19 02:51 ./patch/80_all_gdb-6.5-dwarf-stack-overflow.patch
    -rw-r--r-- vapier/users 959 2010-03-19 02:51 ./patch/20_all_gdb-tdep-opcode-include-workaround.patch
    -rw-r--r-- vapier/users 2734 2010-03-19 02:51 ./README.Gentoo.patches
    tar test ok
    drwxr-xr-x 2 root root 4096 Aug 13 13:52 ../work
    drwxr-xr-x 4 root root 4096 Mar 19 02:51 ../work

    if you run this as as root, tar -xf (without -o) even
    changes ownership of .. and all newly created
    created files therein to 8282:users

    drwxr-xr-x 4 root root 4096 Aug 13 13:52 ../work
    drwxr-xr-x 4 8282 users 4096 Mar 19 02:51 ../work
    total 52
    -rw-r--r-- 1 8282 users 2734 Mar 19 02:51 README.Gentoo.patches
    drwxr-xr-x 2 8282 users 4096 Mar 19 02:51 extra
    -rw-r--r-- 1 root root 40960 Aug 13 13:52 gdb-7.1-patches-1.tar
    drwxr-xr-x 2 8282 users 4096 Mar 19 02:51 patch
    /home/hm
    $ sha256sum /usr/portage/distfiles/gdb-7.1-patches-1.tar.lzma
    d2efe1ee66110e4e0c55bbe4365380bdb6e159c45ea849a1e329ac293b4e7e3c /usr/portage/distfiles/gdb-7.1-patches-1.tar.lzma

I'm attaching gdb-7.1-patches-1.tar.lzma. Could somebody please reassign this bug accordingly?

Created attachment 242759 [details]
original lzma archive
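The reproduction above depends on one particular distfile. A smaller sketch of the same effect, assuming GNU tar and GNU coreutils (`stat -c`, `touch -d`), builds a throwaway archive containing a "./" member and compares the target directory's mtime before and after extraction. The file and variable names here are made up for illustration and do not come from the report.

```shell
#!/bin/sh
# Minimal reproduction sketch (hypothetical paths, GNU tar assumed):
# an archive created from "." stores a "./" member, and extracting it
# overwrites the mtime of the directory it is extracted into.
set -eu

scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
mkdir "$scratch/src" "$scratch/work"
echo "sample" > "$scratch/src/file.txt"

# Give the source directory an old timestamp so the effect is visible.
touch -d '2010-03-19 02:51:00' "$scratch/src"

# Archiving "." records the directory itself as the first member.
(cd "$scratch/src" && tar -cf "$scratch/demo.tar" .)
first_member=$(tar -tf "$scratch/demo.tar" | head -n 1)

# Compare the work directory's mtime (seconds since epoch) around extraction.
before=$(stat -c %Y "$scratch/work")
(cd "$scratch/work" && tar -xof "$scratch/demo.tar")
after=$(stat -c %Y "$scratch/work")

echo "first member: $first_member"
echo "work mtime before: $before, after: $after"
```

With `-o` the ownership is left alone, so this only demonstrates the timestamp change; the ownership change described above additionally requires running the extraction as root without `-o`.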
this isn't anything special to tar or any tarball. any tarball that was made by doing `tar cf foo.tar .` will exhibit this issue. not a bug in tar, or the tarball in question, nor is the changed user/group behavior unexpected. git has this, gdb has this, gnuconfig has this, jpeg has this, and who knows how much more. i thought there had been a similar bug in the past, but can't seem to find it. i'll leave it to the portage devs to decide whether to codify expected behavior in PMS or just change things or say "WORKSFORME".

(In reply to comment #4)
Duh, that dot entry is a subtle thing. Running tar --exclude='.' --no-wildcards or similar options does not work, because as soon as tar hits the '.' directory, all files belonging to that directory are excluded as well, which is the entire archive in this case. That is actually documented behaviour: http://www.gnu.org/software/tar/manual/tar.html#SEC111

"Periods (‘.’) or forward slashes (‘/’) are not considered special for wildcard matches. However, if a pattern completely matches a directory prefix of a matched string, then it matches the full matched string: thus, excluding a directory also excludes all the files beneath it."

So either a tar switch addressing this is needed, or portage could touch $WORKDIR after unpacking any archive, and might check ownership too. Besides the annoyances stemming from a time-warped workdir, I really don't like that unpacking a tar archive containing a dot entry changes the ownership of, e.g., root's home directory. Color me naive, but I doubt that most people would expect this when unpacking a tar archive.

> git has this, gdb has this, gnuconfig has this, jpeg has this, and who knows how much more.

The current gdb-7.1 archive at least does not contain a dot entry in the archive's root directory. Certainly you can't stop archive maintainers from explicitly including the dot directory entry. However, if they do, the ".."
entry will still be stripped from pathnames in the resulting archive, for good reasons.

if you want to ask upstream tar to extend support, then go for it. personally, i don't have a problem with this behavior. as for portage behavior, that still comes back to the PMS.

In response to my question on the bug-tar mailing list, Sergey Poznyakoff proposed the following solution:

    tar -xf some-archive.tar --no-wildcards --no-recursion --exclude '.'

Asked if this could be made the default behaviour of tar one day, he said: "No, it would create a major backward incompatibility. What I can do is to create a shortcut option, which would stand for these three, e.g. --exclude-dot."

OK, but the question of expected behavior wrt PMS is still in the air

(In reply to comment #8)
> OK, but the question of expected behavior wrt PMS is still in the air

$ ping pms-bugs@gentoo.org ...

I'd be inclined to say that you can't rely upon any particular behaviour here.

(In reply to comment #10)
> I'd be inclined to say that you can't rely upon any particular behaviour here.

+1. PMS doesn't say anything about it. Do we need to specify that it's not specified? ;-)

Unfortunately, some people seem to think that "PMS says nothing about it" means "it's legal to rely upon whatever a particular version of Portage does on one particular machine", so we probably should...

I have been hit by this too, most commonly in ebuilds that need a bootstrap tarball created by "emerge -b", since it also creates those "./" entries in the tarballs.

(In reply to comment #11)
> (In reply to comment #10)
> > I'd be inclined to say that you can't rely upon any particular behaviour here.
>
> +1. PMS doesn't say anything about it.
> Do we need to specify that it's not specified? ;-)

How intelligent. If you guys ever get your hands on an old DEC manual, you could instantaneously learn what a clean system architecture looks like.
And obviously one needs to specify access rights for the build tree in a general way. That point is only very sloppily addressed on page 63 of http://distfiles.gentoo.org/distfiles/pms-3.pdf :

unpack
Unpacks one or more source archives, in order, into the current directory. After unpacking, must ensure that all filesystem objects inside the current working directory (but not the current working directory itself) have permissions a+r,u+w,go-w and that all directories under the current working directory additionally have permissions a+x.

But permissions of ${WORKDIR} aren't specified at all.

(In reply to comment #14)
> How intelligent. If you guys ever get your hands on an old DEC manual, you
> could instantaneously learn how a clean system architecture looks like.

We don't have a clean system architecture to begin with. Everything we're doing is built on top of semi-POSIX-compliant tools, and POSIX is chock full of implementation-defined behaviour. There's no point us trying to mandate a particular behaviour when 'tar' is free to do whatever it feels like.

(In reply to comment #15)
> We don't have a clean system architecture to begin with. Everything we're doing
> is built on top of semi-POSIX-compliant tools, and POSIX is chock full of
> implementation defined behaviour. There's no point us trying to mandate a
> particular behaviour when 'tar' is free to do whatever it feels like.

Regarding tar, relief is already available (see Comment #7). But specifying a sound permission scheme is independent of tar doing things this way or that. The permissions of ${WORKDIR} are currently only implicitly defined by portage's implementation (0700, with owner portage or root). Turning this into an explicit standard would therefore fill one of the many gaps in the PMS.

Tar is free to change its behaviour whenever it wants. And workdir's permissions are not implicitly defined. They're undefined.
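The PMS wording quoted above (contents get a+r,u+w,go-w, directories additionally a+x, the current directory itself excluded) can be sketched in a few lines of shell. This is only an illustration of the rule as quoted, not what any package manager actually runs; `normalize_unpacked_perms` is a hypothetical helper name and `-mindepth` assumes GNU find.

```shell
#!/bin/sh
# Sketch of the quoted PMS rule: fix permissions of everything *inside*
# a directory while leaving the directory itself untouched.
set -eu

normalize_unpacked_perms() {
    # $1: directory whose contents are normalized (hypothetical helper).
    # -mindepth 1 (GNU find) skips the starting directory itself.
    find "$1" -mindepth 1 -exec chmod a+r,u+w,go-w {} +
    find "$1" -mindepth 1 -type d -exec chmod a+x {} +
}

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo/sub"
echo data > "$demo/sub/file"
chmod 0600 "$demo/sub/file"
chmod 0700 "$demo/sub" "$demo"

normalize_unpacked_perms "$demo"

top_mode=$(stat -c %a "$demo")
sub_mode=$(stat -c %a "$demo/sub")
file_mode=$(stat -c %a "$demo/sub/file")
echo "top=$top_mode sub=$sub_mode file=$file_mode"
```

The top-level directory keeps its 0700 mode here, which is exactly the gap being argued about: the rule says nothing about the working directory itself.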
There isn't even a POSIX specification for tar, only for pax (which isn't really popular). Besides, there are other unpack utilities like unzip, and we have to live with their behaviour too. Therefore, I see only two possibilities:
a) WORKDIR timestamp is unspecified. This is the current behaviour.
b) The PM should update the WORKDIR's timestamp (and permissions?) after the unpack phase. This would probably require an EAPI bump.

(In reply to comment #17)
> Tar is free to change its behaviour whenever it wants.

Much like PMS and her authors, I presume.

> And workdir's permissions are not implicitly defined. They're undefined.

This must be so that /etc does not feel like the sole directory in the tree without permissions. But given that indeterminacy, exactly what particular problem was addressed by the exclusion "but not the current working directory itself" within the definition of unpack in PMS-3:

unpack
Unpacks one or more source archives, in order, into the current directory. After unpacking, must ensure that all filesystem objects inside the current working directory (but not the current working directory itself) have permissions a+r,u+w,go-w and that all directories under the current working directory additionally have permissions a+x.

(In reply to comment #18)
> There's even no POSIX specification for tar at all, only for pax (which isn't
> really popular). Besides, there are other unpack utilities like unzip, and we
> have to live with their behaviour too.

That is not at all an argument against a specification. A specification defines a desired situation. How to cope with obstructionist tools is purely a technical question, which finally boils down to a maximum of one line of shell code.

> Therefore, I see only two possibilities:
> a) WORKDIR timestamp is unspecified. This is the current behaviour.

To the contrary, portage actually relies on a correct timestamp.

> b) The PM should update the WORKDIR's timestamp (and permissions?) after the unpack phase.
> This would probably require an EAPI bump.

Why? Unix tools should avoid unexpected behaviour. 98% of the packages don't touch $WORKDIR itself. No interface is changed, but a bug is corrected.

(In reply to comment #19)
> (In reply to comment #17)
> > Tar is free to change its behaviour whenever it wants.
>
> Much like PMS and her authors, I presume.

Nope. PMS documents what ebuilds can rely upon and what package managers must ensure. Since there's no defined behaviour here, and behaviour isn't consistent, ebuilds cannot rely upon any particular behaviour and the spec does not mandate any particular behaviour. If you want a certain behaviour to become mandatory, that's something for a new EAPI.

(In reply to comment #19)
> (In reply to comment #17)
> > Tar is free to change its behaviour whenever it wants.
>
> Much like PMS and her authors, I presume.

First rule if you want to convince someone: * Be snarky. All the time.

This is fixed in git: http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=7a54703a8b28326bd428327f68e726238eba02df

*** Bug 318325 has been marked as a duplicate of this bug. ***

(In reply to comment #22)
> This is fixed in git:
>
> http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=7a54703a8b28326bd428327f68e726238eba02df

Zac, a second problem not yet addressed by your patch is that the owner of the unpacked directory is changed along with the timestamp. Could you also add a chown to portage if the owner does not match?
Best

Created attachment 323348 [details, diff]
fix WORDIR ownership after unpack

(In reply to comment #24)
> Zac, a second problem not yet addressed by your patch is that the owner of
> the unpacked directory is changed along with the timestamp. Could you also
> add a chown to portage if the owner does not match?

That seems more like an EAPI extension, so I'd like to hear what the pms-bugs people have to say about it.
The timestamp thing was a slightly different story, since it was interfering with portage's code which compares distfiles timestamps to the WORKDIR timestamp.

Created attachment 323350 [details, diff]
fix WORDIR ownership after unpack
The previous patch was the wrong one.
(In reply to comment #26)
> Created attachment 323350 [details, diff]
> fix WORDIR ownership after unpack

This is in git now: http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=ea54077b59d2aec35add5c3f6779b6772f3127a5

This is fixed in 2.1.11.15 and 2.2.0_alpha126.
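The two commits linked above are not reproduced here. As a rough sketch of the kind of fixup discussed in this thread (touch WORKDIR after unpacking, and chown it back if the owner no longer matches), the following assumes GNU coreutils; `fix_workdir_after_unpack` and its arguments are hypothetical names, and the time warp is simulated with `touch -d` instead of an actual tar extraction.

```shell
#!/bin/sh
# Hypothetical post-unpack fixup, NOT the code from the portage commits.
set -eu

fix_workdir_after_unpack() {
    # $1: work directory, $2: owner uid recorded before unpacking
    workdir=$1
    expected_uid=$2
    # Reset the mtime to "now" so newer-than checks against distfiles behave.
    touch "$workdir"
    # Restore ownership only if the archive's "./" entry changed it.
    if [ "$(stat -c %u "$workdir")" != "$expected_uid" ]; then
        chown "$expected_uid" "$workdir"
    fi
}

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
uid_before=$(stat -c %u "$demo")

# Simulate the time warp caused by extracting a "./" member.
touch -d '2010-03-19 02:51:00' "$demo"
warped=$(stat -c %Y "$demo")

fix_workdir_after_unpack "$demo" "$uid_before"
fixed=$(stat -c %Y "$demo")
uid_after=$(stat -c %u "$demo")
echo "mtime: $warped -> $fixed, uid: $uid_before -> $uid_after"
```

Recording the owner before unpacking and comparing afterwards keeps the chown a no-op in the common case where the archive has no "./" entry or tar ran with `-o`.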
After adding some traces to files below $WORKDIR of sys-devel/gdb-7.1, I tried to recompile the package with ebuild gdb-7.1.ebuild compile. Instead of just compiling, portage said:

    >>> gdb-7.1.tar.bz2 has been updated; recreating WORKDIR...

That is from /usr/lib/portage/bin/ebuild.sh:

    for x in $A ; do
        vecho ">>> Checking ${x}'s mtime..."
        if [ "${PORTAGE_ACTUAL_DISTDIR:-${DISTDIR}}/${x}" -nt "${WORKDIR}" ]; then
            vecho ">>> ${x} has been updated; recreating WORKDIR..."
            newstuff="yes"
            break
        fi
    done

But gdb-7.1.tar.bz2 had not been updated:

    # ls -l /usr/portage/distfiles/gdb-7.1*
    -rw-rw-r-- 1 portage portage 9207 10. Aug 09:02 /usr/portage/distfiles/gdb-7.1-patches-1.tar.lzma
    -rw-rw-r-- 1 portage portage 17977195 10. Aug 09:02 /usr/portage/distfiles/gdb-7.1.tar.bz2

However, $WORKDIR had been warped into the past:

    # ls -ld /var/tmp/portage/sys-devel/gdb-7.1/
    drwxrwxr-x 7 portage portage 4096 11. Aug 11:57
    # ls -ld /var/tmp/portage/sys-devel/gdb-7.1/work
    drwx------ 5 portage portage 4096 19. Mär 02:51
                                      ^------------

Looking for who is responsible for what I classify as an unfriendly act, I next found that user vapier was likely involved in delivering the timestamp value:

    # tar --use-compress-program=/usr/bin/lzma \
        -tvf /usr/portage/distfiles/gdb-7.1-patches-1.tar.lzma
    drwxr-xr-x vapier/users 0 2010-03-19 02:51 ./
    drwxr-xr-x vapier/users 0 2010-03-19 02:51 ./extra/
    -rw-r--r-- vapier/users 23113 2010-03-19 02:51 ./extra/gdbinit.sample
    drwxr-xr-x vapier/users 0 2010-03-19 02:51 ./patch/
    -rw-r--r-- vapier/users 773 2010-03-19 02:51 ./patch/05_all_readline-headers.patch
    -rw-r--r-- vapier/users 1815 2010-03-19 02:51 ./patch/80_all_gdb-6.5-dwarf-stack-overflow.patch
    -rw-r--r-- vapier/users 959 2010-03-19 02:51 ./patch/20_all_gdb-tdep-opcode-include-workaround.patch
    -rw-r--r-- vapier/users 2734 2010-03-19 02:51 ./README.Gentoo.patches

But neither the archive nor any patch contained therein addresses "work".
So this should be a problem of eutils.eclass or sys-devel/patch-2.6.1 itself. Below is the relevant part from sys-devel/gdb-7.1.ebuild:

    src_unpack() {
        unpack ${A}
        cd "${S}"
        use vanilla || [[ -n ${PATCH_VER} ]] && EPATCH_SUFFIX="patch" epatch "${WORKDIR}"/patch
        strip-linguas -u bfd/po opcodes/po
    }
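The mtime check quoted from ebuild.sh above can be tried in isolation. The sketch below uses made-up file names in a temporary directory and sets the timestamps seen in this report with GNU `touch -d`: a distfile from 10 Aug and a work directory warped back to 19 Mar. The `-nt` test is supported by bash and most other shells but is not required by POSIX.

```shell
#!/bin/sh
# Sketch of why a time-warped WORKDIR triggers "recreating WORKDIR":
# the distfile looks newer than the directory it was unpacked into.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo/work"
: > "$demo/distfile.tar.bz2"

# Timestamps as in the report: distfile fetched in August, WORKDIR set to
# March by the "./" entry of the patches tarball.
touch -d '2010-08-10 09:02:00' "$demo/distfile.tar.bz2"
touch -d '2010-03-19 02:51:00' "$demo/work"

check_workdir() {
    # Same comparison as the quoted loop, reduced to a single distfile.
    if [ "$demo/distfile.tar.bz2" -nt "$demo/work" ]; then
        echo "recreate"
    else
        echo "keep"
    fi
}

verdict_warped=$(check_workdir)

# Touching the work directory after unpacking restores the expected result.
touch "$demo/work"
verdict_touched=$(check_workdir)

echo "warped: $verdict_warped, after touch: $verdict_touched"
```

This is why the timestamp matters to portage even though PMS leaves it unspecified: the comparison only works if WORKDIR is at least as new as every distfile unpacked into it.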