
Bug 423893

Summary: [Future EAPI] Correctly deduce filenames from URLs with urlencode
Product: Gentoo Hosted Projects Reporter: Michał Górny <mgorny>
Component: PMS/EAPI Assignee: PMS/EAPI <pms>
Status: RESOLVED WONTFIX    
Severity: enhancement CC: esigra
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 174380    

Description Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2012-06-27 20:27:11 UTC
P=mpfrc++-20120622

SRC_URI="mirror://github/jauhien/sources/${P//+/%2B}.tar.gz"

The pluses in the URI need to be urlencoded for github to work correctly. Sadly, this results in the PM saving the file as mpfrc%2B%2B..., so '-> ${P}.tar.gz' is necessary.
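
For reference, the '->' workaround written out as a full assignment might look like the sketch below. It only combines the P and SRC_URI values from this report with the rename target; the echo is there purely to show the expanded value.

```bash
# bash, as used in ebuilds. Values taken from this report.
P="mpfrc++-20120622"

# Fetch from the urlencoded URI, but store the distfile under the plain name.
SRC_URI="mirror://github/jauhien/sources/${P//+/%2B}.tar.gz -> ${P}.tar.gz"

# Show what the package manager would see after expansion.
echo "${SRC_URI}"
```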

Considering that URL encoding is quite clear and for most cases %xx and expanded forms are treated exactly the same, it may be a good idea if PM actually urldecoded filenames generated from URI.
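
A minimal sketch of what such decoding could look like, in bash. The helper names urldecode and distfile_from_uri are hypothetical and not part of any package manager; the sketch assumes well-formed %XX sequences and no backslashes in the input, and it does nothing about decoded slashes or whitespace.

```bash
# Hypothetical helper: percent-decode a string.
# Assumes every '%' starts a valid %XX sequence and the input has no backslashes.
urldecode() {
    local s=$1
    # Turn each %XX into \xXX, then let printf's %b expand the escapes.
    printf '%b' "${s//%/\\x}"
}

# Hypothetical helper: take the last path component of a URI and decode it.
distfile_from_uri() {
    local uri=$1
    urldecode "${uri##*/}"
}

distfile_from_uri "mirror://github/jauhien/sources/mpfrc%2B%2B-20120622.tar.gz"
echo
```

With the URI from this report, the derived name would be mpfrc++-20120622.tar.gz instead of mpfrc%2B%2B-20120622.tar.gz.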
Comment 1 Ulrich Müller gentoo-dev 2012-06-28 07:46:32 UTC
I think we first need to define what characters should be allowed in a filename, otherwise decoding may lead to undesired results.

Certainly null and slash are forbidden, also whitespace (which is used as separator in SRC_URI itself). Non-ASCII and control characters are problematic too, and some additional restrictions may exist on Prefix systems.
Comment 2 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2012-06-28 08:05:28 UTC
(In reply to comment #1)
> I think we first need to define what characters should be allowed in a
> filename, otherwise decoding may lead to undesired results.
> 
> Certainly null and slash are forbidden, also whitespace (which is used as
> separator in SRC_URI itself). Non-ASCII and control characters are
> problematic too, and some additional restrictions may exist on Prefix
> systems.

We should probably start off with the Portable Filename Character Set [1], but that's a bit narrower than what packages actually install. There's no point in restricting download filenames beyond installed filenames, which we don't standardize, I think.

[1]:http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_276
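
To illustrate how narrow that set is, here is a rough bash check against the Portable Filename Character Set (A-Z, a-z, 0-9, period, underscore, hyphen). The function name is_portable_filename is made up for this sketch; nothing in PMS defines such a check.

```bash
# Hypothetical check: succeed only if the name is non-empty and consists
# solely of characters from the POSIX Portable Filename Character Set.
is_portable_filename() {
    local name=$1
    # Force ASCII semantics so that A-Z and a-z do not match accented letters.
    local LC_ALL=C
    [[ -n $name && $name != *[!A-Za-z0-9._-]* ]]
}

for f in "mpfrc-20120622.tar.gz" "mpfrc++-20120622.tar.gz"; do
    if is_portable_filename "$f"; then
        echo "portable:     $f"
    else
        echo "not portable: $f"
    fi
done
```

Note that '+' itself falls outside the set, so the decoded distfile name from this report would already be rejected by such a restriction.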
Comment 3 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2012-06-28 08:20:08 UTC
$ echo $(qlist -a | grep -o . | sort -u)
_ - , : ! / . ~ ' ( ) [ ] { } @ $ & % + = 0 1 2 3 4 5 6 7 8 9 a A á b B c C d D e E é f F g G ğ h H i I í ı İ j J k K l L m M n N o O ö ő p P q Q r R s S t T u U ú ü Ü v V w W x X y Y z Z

Those are the characters used in filenames installed on my system. '/' we shall skip here. Some of them were installed by games, {} is used by firefox, = by ca certs, ! by themes, ~ by libtool, ',' by i18n.
Comment 4 SpanKY gentoo-dev 2012-06-28 16:04:31 UTC
(In reply to comment #0)

i wonder why the fetch step didn't encode the + for you.  is that a bug in wget/curl/whatever or our usage of it ?

(In reply to comment #2)

yeah, we've adjusted portage in the past to handle unicode characters in installed filenames, so this is already happening in practice.  having the two standards be different doesn't make much sense.
Comment 5 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2012-06-28 18:10:24 UTC
(In reply to comment #4)
> (In reply to comment #0)
> 
> i wonder why the fetch step didn't encode the + for you.  is that a bug in
> wget/curl/whatever or our usage of it ?

We're building the filename from URI, and enforcing it to wget/curl/whatever.
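
Roughly, the behaviour described here could be sketched as follows. This is an illustration only, not Portage's actual fetch code: build_fetch_command, the example.org URL and the distfiles directory are all assumptions, and wget's -O option is used to force the output filename.

```bash
# Hypothetical sketch: derive the local filename from the last path
# component of the URI and force it on the fetcher via wget -O.
build_fetch_command() {
    local uri=$1 distdir=$2
    local file=${uri##*/}          # taken verbatim, so %2B stays %2B
    FETCH_CMD=(wget -O "${distdir}/${file}" "${uri}")
}

# Example URL and directory are placeholders.
build_fetch_command "https://example.org/sources/mpfrc%2B%2B-20120622.tar.gz" "/var/cache/distfiles"

# Print the command one argument per line instead of running it.
printf '%s\n' "${FETCH_CMD[@]}"
```

Because the name is fixed before the fetcher runs, the fetcher never gets a chance to pick a decoded name on its own.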
Comment 6 SpanKY gentoo-dev 2012-06-29 04:28:59 UTC
(In reply to comment #5)

i guess changing that algorithm is too late now

(In reply to comment #1)

since we need to urldecode before passing the result to the shell, i don't think we need to explicitly mention slashes.  it'll be the same as if they specified the slash in the path itself.  that just leaves whitespace as the only thing we truly need to block (whatever bash splits on by default, so ' \t\n').  the rest we can leave up to portability land for now.  i don't think we need PMS to restrict the filename space for systems where people might never care.  i.e. if i wrote my own ebuild that uses : in the filename for my own setups, i could care less that it doesn't work on OS X.
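
One way to read this proposal, as a bash sketch: decode the whole URI first, so an encoded slash behaves like a literal one, then take the last path component and reject it if it is empty or contains a space, tab or newline. The function name distfile_from_uri is hypothetical, and the sketch assumes well-formed %XX sequences and no backslashes in the URI.

```bash
# Hypothetical sketch: decode first, then split on '/', then block whitespace.
distfile_from_uri() {
    local uri=$1 decoded name
    local ws=$' \t\n'

    # Decode %XX sequences; printf -v avoids a subshell, so a decoded
    # trailing newline is not silently stripped.
    printf -v decoded '%b' "${uri//%/\\x}"

    # A decoded %2F is now an ordinary path separator.
    name=${decoded##*/}

    # Reject empty names and names containing space, tab or newline.
    if [[ -z $name || $name == *[$ws]* ]]; then
        return 1
    fi
    printf '%s\n' "$name"
}

distfile_from_uri "mirror://github/jauhien/sources/mpfrc%2B%2B-20120622.tar.gz"
distfile_from_uri "http://example.org/a%20b.tar.gz" || echo "rejected: whitespace in filename"
```

Everything else, such as ':' or non-ASCII characters, passes through unchanged, which matches the idea of leaving those to portability concerns.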
Comment 7 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2014-09-06 22:13:52 UTC
Achievable using '->'. Interest lost.