Bug 14983

Summary: CONTENTS use spaces as delimiters
Product: Portage Development
Reporter: Arkadi Shishlov <arkadi>
Component: Core
Assignee: Portage team <dev-portage>
Status: RESOLVED FIXED
Severity: normal
CC: ciaran.mccreesh, gentoo-bugzilla, tristan, vhata-gentoo
Priority: High    
Version: unspecified   
Hardware: All   
OS: Linux   
See Also: https://bugs.gentoo.org/show_bug.cgi?id=671622
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 193766, 198097    
Attachments: double-spaces-fix.patch
Patch to provide proper CONTENTS file parsing
patch to .50_pre10 to use tabs for the delimiter, with a bit of display cleanup.
update_vardb.py
contents_demo-0.ebuild
.50_pre10 tab based CONTENTS

Description Arkadi Shishlov 2003-02-02 12:12:50 UTC
The quickpkg script cannot add a file with whitespace in its pathname to a
package, because it uses cut -f 2 -d " ". Trying to quickpkg winex therefore
results in:
usr/lib/winex/.data/fake_windows/
tar: /usr/lib/winex/.data/fake_windows/Program: Cannot stat: No such file or directory
tar: /usr/lib/winex/.data/fake_windows/Program: Cannot stat: No such file or directory
The package is built, but not all files are inside it. Portage version 2.0.46-r9.

Reproducible: Always
Steps to Reproduce:
1. emerge winex
2. quickpkg /var/db/pkg../winex..
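
The truncation seen in the tar errors can be reproduced outside quickpkg. The following is a minimal Python sketch, not the quickpkg code itself: it assumes a space-delimited CONTENTS line of the form `obj <path> <md5> <mtime>`, the sample paths, md5 and mtime values are made up, and `second_field` is a hypothetical helper that mimics `cut -f 2 -d " "`.

```python
def second_field(line):
    """Mimic `cut -f 2 -d " "`: return the second space-delimited field.

    Like cut, return the whole line when it contains no delimiter.
    """
    fields = line.split(" ")
    return fields[1] if len(fields) > 1 else line


# Made-up sample entries; only their shape matters here.
plain = "obj /usr/bin/foo d41d8cd98f00b204e9800998ecf8427e 1044187970"
spaced = (
    "obj /usr/lib/winex/.data/fake_windows/Program Files/x.dll "
    "d41d8cd98f00b204e9800998ecf8427e 1044187970"
)

print(second_field(plain))   # the full path
print(second_field(spaced))  # the path cut off at the first space
```

The second call returns only the part of the path before the first space, which matches the name tar fails to stat.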
Comment 1 SpanKY gentoo-dev 2003-02-03 02:28:39 UTC
nick: what if we change the delimiter to a tab? i couldn't imagine a package using a directory/filename with a tab in it ...
Comment 2 Wout Mertens (RETIRED) gentoo-dev 2003-02-06 08:59:39 UTC
I wondered about how to solve that problem as well. The only surefire way is delimiting with ",
since that is not allowed in files, but \t practically doesn't happen either, and it allows most tools
to just work.

This will still cause the tools which assume general whitespace delimitation to fail, but those
can be fixed.

Good idea, let's do it :)

Comment 3 Andrew Cooks (RETIRED) gentoo-dev 2003-11-29 14:01:42 UTC
This bug has been inactive for more than 180 days. 

winex ebuild should probably be removed - it died long ago anyway. I'm not sure where else to look to try to reproduce this bug.

Can somebody post an update please?
Comment 4 SpanKY gentoo-dev 2003-11-29 16:59:34 UTC
yes, most ebuilds are 'fixed', but the bug still exists
Comment 5 Marius Mauch (RETIRED) gentoo-dev 2004-02-07 19:12:22 UTC
*** Bug 40744 has been marked as a duplicate of this bug. ***
Comment 6 TGL 2004-02-07 19:36:21 UTC
Created attachment 25157 [details, diff]
double-spaces-fix.patch

Seems that this bug was already fixed for simple spaces in filenames. The
duplicate bug above was about double spaces: here is a minimalistic fix for
this issue. I know that was a very minor issue tho, but it's bugday and that's
what the "choose a random bug" algorithm returned :/

If a cleaner fix is needed, I can think of using backslashes to quote white
chars in filenames written to CONTENTS, and then to split the string with
shlex.split() instead of string.split(). Or also to split only the fields from
the end of the line, and leave the filenames untouched (we know how many fields
there should be by the type of contents entry).
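
Both alternatives can be sketched in a few lines of Python. This is only an illustration under assumptions, not the attached patch: `escape_path` and `split_obj_from_end` are hypothetical helpers, the sample entry is made up, and the `obj <path> <md5> <mtime>` layout is assumed.

```python
import shlex


def escape_path(path):
    """Backslash-escape whitespace, quotes and backslashes in a path.

    These are the characters shlex.split() would otherwise treat specially.
    """
    out = []
    for ch in path:
        if ch.isspace() or ch in "\\'\"":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def split_obj_from_end(line):
    """Split an unescaped 'obj' line, taking md5 and mtime from the end."""
    kind, rest = line.split(" ", 1)
    path, md5, mtime = rest.rsplit(" ", 2)
    return [kind, path, md5, mtime]


path = "/opt/My App/read  me.txt"  # note the double space
md5, mtime = "d41d8cd98f00b204e9800998ecf8427e", "1076182581"

# Option 1: escape on write, shlex.split() on read.
escaped_line = " ".join(["obj", escape_path(path), md5, mtime])
print(shlex.split(escaped_line))

# Option 2: write the path untouched, split the fixed fields off the end.
raw_line = " ".join(["obj", path, md5, mtime])
print(split_obj_from_end(raw_line))
```

Both calls recover the same four fields. The second option leaves existing CONTENTS files readable as they are, while the first changes what is written to disk.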
Comment 7 Jonathan Hitchcock 2004-02-08 07:00:44 UTC
Created attachment 25188 [details, diff]
Patch to provide proper CONTENTS file parsing

This is all... Well... Silly.

When I look at the CONTENTS file, I can see which parts are the filenames, and
which parts are the MD5 sums.  It's a very rigid structure, and is very easy to
parse.  The code that split on spaces should never have been written, frankly
;-)

If the line begins with 'dir', 'dev' or 'fif', it is very easy to parse:
everything after the type is the filename.  (On a side note, I've never seen a
package use 'dev' or 'fif'?)
If the line begins with 'obj', then the last two space-separated fields are the
md5 and mtime, and everything after the first space, and before the second last
space, is the filename.
This "everything" can contain spaces, commas, tabs, whatever you want.  It
matters not.

Using any separator, and relying on it, will cause problems one day, if a file
happens to want that separator in its filename.  The only way around this is to
use escape characters, which complexify the code unnecessarily.  Since we have
a fixed structure of the line already, we should use it.

Attached is a patch which uses regular expressions to do this easily.
(It's a patch to portage.py,v 1.385 2004/01/27 14:37:53)
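
For readers without the attachment, the idea can be sketched as follows. This is not the attached patch: the pattern names and `parse_contents_line` are hypothetical, the sample lines are made up, and only the entry types described above (`obj`, `dir`, `dev`, `fif`) are handled; symlink entries are left out of this sketch.

```python
import re

# obj: the last two space-separated fields are md5 and mtime; the path is
# everything between the first space and the second-to-last space.
OBJ_RE = re.compile(r"^obj (?P<path>.+) (?P<md5>\S+) (?P<mtime>\d+)$")
# dir/dev/fif: everything after the type is the path.
SIMPLE_RE = re.compile(r"^(?P<kind>dir|dev|fif) (?P<path>.+)$")


def parse_contents_line(line):
    """Parse one CONTENTS line by its fixed structure, not by a delimiter."""
    line = line.rstrip("\n")
    m = OBJ_RE.match(line)
    if m:
        return ("obj", m.group("path"), m.group("md5"), int(m.group("mtime")))
    m = SIMPLE_RE.match(line)
    if m:
        return (m.group("kind"), m.group("path"))
    raise ValueError("unrecognised CONTENTS line: %r" % line)


print(parse_contents_line("dir /opt/My App\n"))
print(parse_contents_line(
    "obj /opt/My App/read  me,\tfirst.txt d41d8cd98f00b204e9800998ecf8427e 1076223644\n"
))
```

Because `.+` is greedy, the path group absorbs any spaces, commas or tabs, and only the final two fields are peeled off as md5 and mtime.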

It's difficult to test a part of portage, but my script that uses the same code
works ;-)

One thing that is in portage.py.orig that is not handled in my patch is some
odd code that looks like:

x=len(mydat)-1
if (x >= 13) and (mydat[-1][-1]==')'): # Old/Broken symlink entry
   mydat = mydat[:-10]+[mydat[-10:][ST_MTIME][:-1]]
   writemsg("FIXED SYMLINK LINE: %s\n" % mydat, 1)
   x=len(mydat)-1

It appears on lines 5035 to 5038 of portage.py, and I can't work it out, nor
can I find anything about it in bugs.gentoo.org.  I don't handle this
"Old/Broken symlink entry", but I'm not sure how to?
Comment 8 Brian Harring (RETIRED) gentoo-dev 2004-06-20 03:54:17 UTC
Created attachment 33618 [details, diff]
patch to .50_pre10 to use tabs for the delimiter, with a bit of display cleanup.

Note that the old symlink fix has been moved out of portage; this is handled by
an external script that needs to be added to the portage ebuild pkg_postinst
function.
Comment 9 Brian Harring (RETIRED) gentoo-dev 2004-06-20 04:08:30 UTC
Created attachment 33619 [details]
update_vardb.py

This is a simple script that should be updated/included in portage versions;
its intended use is for performing the necessary transformations on
/var/db/pkg/* as needed.
In this case, it is *only* working on CONTENTS, converting existing files to
tab based delimiters, also performing the old symlink fix for content entries.
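
As a rough illustration of the conversion step only (this is not the attached update_vardb.py, and the old symlink fix is not covered), a single line could be converted like this. `to_tab_delimited` is a hypothetical helper, the tab-delimited layout is assumed to keep the same field order as the space-delimited one, and the sample entry is made up.

```python
def to_tab_delimited(line):
    """Convert one space-delimited CONTENTS line to tab-delimited form.

    Assumption: a line that already contains a tab has been converted
    before and is returned unchanged, so the function can be re-run.
    A legacy path that itself contains a tab would defeat this check.
    """
    line = line.rstrip("\n")
    if "\t" in line:
        return line
    kind, _, rest = line.partition(" ")
    if kind == "obj":
        parts = rest.rsplit(" ", 2)  # path, md5, mtime
        if len(parts) == 3:
            return "\t".join([kind] + parts)
        return line  # malformed entry: leave it alone
    if kind in ("dir", "dev", "fif"):
        return "\t".join((kind, rest))
    return line  # other entry types are not handled in this sketch


old = "obj /opt/My App/read me.txt d41d8cd98f00b204e9800998ecf8427e 1087704510"
new = to_tab_delimited(old)
print(repr(new))
print(to_tab_delimited(new) == new)  # running it twice changes nothing
```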

Moving this out of portage and into this file simplifies portage's getcontents
function; the main benefit is that this script can be added to pkg_postinst, and
perform the updates *once* when updating portage versions.

Either way, this has been tested: this file transforming /var/db/pkg/*/*/CONTENTS
(note, it's backwards compatible) and the patch posted above address and should
fix the issue.

This ought to be fixed- I'm attaching an ebuild that portage is still screwing
up on, which is fixed via these updates.
Comment 10 Brian Harring (RETIRED) gentoo-dev 2004-06-20 04:09:58 UTC
Created attachment 33620 [details]
contents_demo-0.ebuild

Example ebuild for which portage still exhibits broken behaviour: specifically,
the file is merged, but portage is unable to remove it because of the spaces in
its path.
Comment 11 Brian Harring (RETIRED) gentoo-dev 2004-06-20 04:12:15 UTC
Err... one note: update_vardb.py expects one argument, the vardb base directory to work within.
So, update_vardb.py /var/db/pkg is the appropriate way to execute it.
Comment 12 Brian Harring (RETIRED) gentoo-dev 2004-06-20 19:08:32 UTC
Created attachment 33723 [details, diff]
.50_pre10 tab based CONTENTS

Minor typo in the original patch, corrected now (inverted mtime/md5 positions)
Comment 13 Alec Warner (RETIRED) archtester gentoo-dev Security 2005-04-25 21:29:09 UTC
*bump*, Did this ever get committed? *notes the numerous patches by ferringb*
Comment 14 Simon Stelling (RETIRED) gentoo-dev 2006-09-11 07:42:42 UTC
*** Bug 147154 has been marked as a duplicate of this bug. ***
Comment 15 Zac Medico gentoo-dev 2007-11-11 20:31:52 UTC
In portage-2.1.3.19 the CONTENTS parser is able to handle practically anything except newlines in filenames. It seems like any CONTENTS format changes are beyond the scope of this bug, so we can consider this fixed.
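
The newline exception follows from CONTENTS holding one entry per line. A small sketch of why, with a made-up entry and no portage code involved:

```python
# A made-up 'obj' entry whose path contains a newline.
path = "/tmp/odd\nname"
record = "obj %s %s %d\n" % (path, "d41d8cd98f00b204e9800998ecf8427e", 1194813112)

# Reading the file back line by line splits the single entry in two,
# and neither half is a complete entry on its own.
lines = record.splitlines()
print(len(lines))
for entry in lines:
    print(repr(entry))
```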
Comment 16 Zac Medico gentoo-dev 2023-12-10 01:19:49 UTC
It was added in this commit:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=1486f1b6e33d7b29d39660fdd873fac796944430

commit 1486f1b6e33d7b29d39660fdd873fac796944430
Author: Zac Medico <zmedico@gentoo.org>
Date:   2007-10-29 05:22:39 +0000

    Rewrite the dblink.getcontents() code to use str.split(" ")
    for splitting CONTENTS lines so that even file paths that
    end with spaces can be handled. This patch makes the fix for
    bug #196836#c6 more complete. Some code for parsing old
    malformed symlink entries has been removed sinces it's
    probably not useful or worth maintaining anymore.
    
    svn path=/main/trunk/; revision=8337
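
The difference between `str.split(" ")` and a plain `str.split()` is what makes paths ending in spaces recoverable. A minimal sketch of the effect, assuming an `obj <path> <md5> <mtime>` line; `path_from_obj` is a hypothetical helper and not the dblink.getcontents() code:

```python
def path_from_obj(line, sep):
    """Rebuild the path of an 'obj' line from the fields between type and md5."""
    fields = line.split(sep)
    return " ".join(fields[1:-2])


# Made-up entry: the path is "/tmp/trailing " (it ends with a space),
# so two spaces precede the md5 field.
line = "obj /tmp/trailing  d41d8cd98f00b204e9800998ecf8427e 1193635359"

# split() with no argument collapses runs of whitespace: the space is lost.
print(repr(path_from_obj(line, None)))
# split(" ") keeps an empty field for the extra space, so join() restores it.
print(repr(path_from_obj(line, " ")))
```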

Then it was changed to use a regex in this commit:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=7295411a389c89418b460c059e2a222475dd1ec4

commit 7295411a389c89418b460c059e2a222475dd1ec4
Author: Zac Medico <zmedico@gentoo.org>
Date:   2010-08-06 03:48:26 -0700

    Use a regular expression to simplify dblink.getcontents().