Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!

Bug 465180

Summary: python-utils-r1.eclass - python_optimize: correctly get a nice path without "//"
Product: Gentoo Linux Reporter: Greg Turner <gmturner007>
Component: EclassesAssignee: Python Gentoo Team <python>
Status: RESOLVED TEST-REQUEST    
Severity: normal CC: esigra
Priority: Normal Keywords: PATCH
Version: unspecified   
Hardware: All   
OS: All   
See Also: https://bugs.gentoo.org/show_bug.cgi?id=465772
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: Correctly remove duplicate slashes in path
updated python_optimize patch

Description Greg Turner 2013-04-09 00:28:07 UTC
In python-utils-r1 we attempt to strip leading slashes but there is a bash thinko.  To see why, consider that if 

  foo=XXXyz

then

  ${foo##X} is "XXyz", not "yz"

To achieve the intended result in this way, we would need to use extended shell globbing.

If we don't want to mess with shopt's, our best bet is probably something like:

  while [[ ${foo} == X* ]] ; do
    foo=${foo#X}
  done

as implemented, we achieve the opposite of the intended effect, reliably creating a path with two initial slashes, causing trouble on cygwin and perhaps other platforms too clever for their own good.

patch to be enclosed

Reproducible: Always
Comment 1 Greg Turner 2013-04-09 00:29:45 UTC
(In reply to comment #0)
> In python-utils-r1

For clarity, this is in python-utils-r1's python_optimize function.  The same thinko might well exist in 1000 other places but for now this is the one I'm after :)
Comment 2 Greg Turner 2013-04-09 00:31:20 UTC
Created attachment 344904 [details, diff]
Correctly remove duplicate slashes in path
Comment 3 Greg Turner 2013-04-13 05:09:45 UTC
(In reply to comment #0)
> reliably creating a path with two initial slashes

This may or may not have been a thinko.  I'm not sure, anymore, and I want to do some experiments to figure it out.

The enclosed patch is correct and represents a more complete solution to the problem of duplicate leading slashes than what's in the eclass now.  But maybe there is no actual behavioral problem... just a bit of inelegance... 

I'm just not sure.  But after realizing I was mistaken in some 'causal attributions' that led me to the above statement, I wanted to clarify before someone wastes half a day scratching their head over this.

I'll follow up here with either a confirmation or a disconfirmation of the need (validity is not in question, I don't think) for the above patch soon.
Comment 4 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-04-13 06:04:31 UTC
Hm, you seem to be correct...

$ foo=///bash
$ echo ${foo##/}
//bash

However, I'd rather avoid putting a whole 'while' loop there.

The general idea of that code is that we don't know if ${D} is going to end with a slash or not. If it doesn't, we want to prepend the slash. If it did and we prepend, we'll get a second one. As far as this part works, I don't know if there's anything really worth fixing there.

If someone passes path with double slashes, that's likely to be fixed in the ebuild, not in the eclass.
Comment 5 Greg Turner 2013-04-13 11:35:10 UTC
(In reply to comment #4)
> Hm, you seem to be correct...
> 
> $ foo=///bash
> $ echo ${foo##/}
> //bash
> 
> However, I'd rather avoid putting a whole 'while' loop there.

Sux, dunnit?  How about:

  [[ ${instpath} =~ ^/*(.*)$ ]] && instpath="/${BASH_REMATCH[1]}"

which does it at the expense of some clarity and downlevel compatibility?

> The general idea of that code is that we don't know if ${D} is going to end
> with a slash or not. If it doesn't, we want to prepend the slash. If it did
> and we prepend, we'll get a second one. As far as this part works, I don't
> know if there's anything really worth fixing there.
> 
> If someone passes path with double slashes, that's likely to be fixed in the
> ebuild, not in the eclass.

Actually we do know whether or not ${D} ends in a slash: see: http://git.overlays.gentoo.org/gitweb/?p=proj/pms.git;a=blob;f=ebuild-env-vars.tex;h=8969dd8c8a80f6b06da192a1598f33338694b99c;hb=HEAD#l138

In short, it always ends with a slash.

This is a common source of trouble for cygwin, as code like like x=${D}/foo/bar results in i.e., /var/tmp/portage/category/package/image//foo/bar which then occasionally ends up getting stripped down to "//foo/bar", which, in cygwin, refers to the "bar" SMB share on the "foo" host.  In order to maximize confusion and mistakes, cygwin will interpret any number of preceding slashes greater than or equal to three as equivalent to a single slash :)

Thinking this through, explicitly: if no arguments are provided, then we take sys.path, prepend ${D} to each component.  This briefly creates a path like /var/tmp/portage/category/package/image//usr/lib64/python2.7.

But, next, precisely ${D} is stripped by instpath=${d#${D}}, leaving just the raw paths from sys.path, which look like: '/usr/lib64/python2.7'.  Next we run instpath=/${instpath##/} which is a noop and does indeed result in a beautiful path.

If the ebuild invokes:

  python_optimize "${D}/foo/bar"

(a very common, understandable mistake ebuilds make when they mean "${D}foo/bar" -- indeed, more-or-less the very same mistake is made above by python_optimize), then, once again, we get nice pretty /foo/bar paths by the same process.

If the user correctly provides the argument "${D}foo/bar", then instpath=${d#${D}} results in the relative looking path "foo/bar" but the next statement will fix it.

So, in short, the code works, although it probably doesn't work how it thinks it works :P

One silly corner case where the code wouldn't work would be if, for some weird reason an ebuild wanted to put some .pyo files into ${ROOT} and so ran

  python_optimize "${D%/}"

But that is a pretty strange thing to do and probably not worth worrying about.

The other case, where it's going to fail on cygwin, is if the user put three or more slashes after "image".  Which, yeah, I kinda have to agree, starts to veer off into GIGO territory.  But, you'd be surprised how often stuff like that comes up.

So, I guess, this could be closed as a non-bug.  Alternatively, we could try to clean up this code a bit so that it does what it appears to do.  How about something like fixing ${D}${f} to ${D%/}${f} in the sys.path loop, and changing the beginning of the "for d" loop to use:

  instpath=${d#${D%/}}
  [[ ${instpath} =~ ^/*(.*)$ ]] && instpath="/${BASH_REMATCH[1]}"

It'd be rock solid, I think, if a bit hard on the eyes.  I'll re-spin a patch for that below.
Comment 6 Greg Turner 2013-04-13 11:50:21 UTC
Created attachment 345468 [details, diff]
updated python_optimize patch

Implement the plan described in the preceding comment as a patch
Comment 7 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-04-13 12:01:27 UTC
I've just wrote a comment and this stupid bugzilla ate it due to collision. Utterly stupid thing.

Long story short, PMS was written by someone who is not mentally sane and there's no reason for ${D} to end with a slash. It just makes it use totally counter-intuitive. But it's too late to fix this now.

That said, eclass is not the place to fix it. If user specifies invalid path, the invalid path shall be used. If we fix, it will eventually fail somewhere else and it's better to fail early than in some obscure and unexpected way.

I'll probably remove the block completely and rewrite all the code to assume that ${D} ends with a slash.
Comment 8 Greg Turner 2013-04-13 12:29:25 UTC
(In reply to comment #7)
> Long story short, PMS was written by someone who is not mentally sane and
> there's no reason for ${D} to end with a slash. It just makes it use totally
> counter-intuitive. But it's too late to fix this now.

Couldn't agree more.  This tiny quirk has led to huge amounts of pain and suffering in my cygwin port and probably elsewhere.
Comment 9 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-10-19 08:58:31 UTC
Please re-check the current code of the eclasses and feel free to provide further patches if something's broken again.