P=mpfrc++-20120622
SRC_URI="mirror://github/jauhien/sources/${P//+/%2B}.tar.gz"

The pluses in the URI need to be URL-encoded for GitHub to work correctly. Sadly, this results in the PM saving the file as mpfrc%2B%2B..., so '-> ${P}.tar.gz' is necessary.

Considering that URL encoding is quite clear and that in most cases %xx and expanded forms are treated exactly the same, it may be a good idea for the PM to actually urldecode filenames generated from the URI.
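To make the proposal concrete, here is a minimal bash sketch of how a filename could be derived from a URI with percent-decoding applied. The function names urldecode and distfile_name are hypothetical and not part of any package manager. The sketch assumes the input contains no backslashes and that every '%' starts a valid %XX escape; it decodes before taking the last path component, so an encoded slash behaves like a literal one. A '+' is left alone, since it is a literal character in a URI path.

```bash
#!/usr/bin/env bash

# Hypothetical helper: percent-decode a string.
# Assumes no backslashes in the input and only well-formed %XX escapes.
urldecode() {
    local encoded=${1}
    # Rewrite every %XX as \xXX, then let printf's %b expand the escapes.
    printf '%b' "${encoded//%/\\x}"
}

# Hypothetical helper: derive the saved filename from a URI by decoding
# first and then keeping only the last path component.
distfile_name() {
    local decoded
    decoded=$(urldecode "${1}")
    printf '%s\n' "${decoded##*/}"
}

distfile_name "mirror://github/jauhien/sources/mpfrc%2B%2B-20120622.tar.gz"
# prints: mpfrc++-20120622.tar.gz
```

With something like this in the PM, the '-> ${P}.tar.gz' part of SRC_URI would no longer be needed for this package.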
I think we first need to define what characters should be allowed in a filename, otherwise decoding may lead to undesired results. Certainly null and slash are forbidden, also whitespace (which is used as separator in SRC_URI itself). Non-ASCII and control characters are problematic too, and some additional restrictions may exist on Prefix systems.
(In reply to comment #1)
> I think we first need to define what characters should be allowed in a
> filename, otherwise decoding may lead to undesired results.
>
> Certainly null and slash are forbidden, also whitespace (which is used as
> separator in SRC_URI itself). Non-ASCII and control characters are
> problematic too, and some additional restrictions may exist on Prefix
> systems.

We should probably start from the Portable Filename Character Set [1], but that set is narrower than what packages actually install. There's no point in restricting download filenames beyond installed filenames, which we don't standardize, I think.

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_276
$ echo $(qlist -a | grep -o . | sort -u)
_ - , : ! / . ~ ' ( ) [ ] { } @ $ & % + = 0 1 2 3 4 5 6 7 8 9 a A á b B c C d D e E é f F g G ğ h H i I í ı İ j J k K l L m M n N o O ö ő p P q Q r R s S t T u U ú ü Ü v V w W x X y Y z Z

Those are the characters used in installed filenames on my system. We can skip '/' here. Some of them were installed by games; {} is used by firefox, = by ca certs, ! by themes, ~ by libtool, and ',' by i18n.
(In reply to comment #0)
i wonder why the fetch step didn't encode the + for you. is that a bug in wget/curl/whatever or our usage of it ?

(In reply to comment #2)
yeah, we've adjusted portage in the past to handle unicode characters in installed filenames, so this is already happening in practice. having the two standards be different doesn't make much sense.
(In reply to comment #4)
> (In reply to comment #0)
>
> i wonder why the fetch step didn't encode the + for you. is that a bug in
> wget/curl/whatever or our usage of it ?

We're building the filename from the URI ourselves and forcing it on wget/curl/whatever.
(In reply to comment #5)
i guess changing that algorithm is too late now

(In reply to comment #1)
since we need to urldecode before passing the result to the shell, i don't think we need to explicitly mention slashes. it'll be the same as if they specified the slash in the path itself. that just leaves whitespace as the only thing we truly need to block (whatever bash splits on by default, so ' \t\n'). the rest we can leave up to portability land for now.

i don't think we need PMS to restrict the filename space for systems where people might never care. i.e. if i wrote my own ebuild that uses : in the filename for my own setups, i couldn't care less that it doesn't work on OS X.
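As a rough illustration of the whitespace-only rule, a check on the already decoded filename could look like the bash sketch below. The function name filename_ok is hypothetical, and rejecting the empty string is an extra assumption of this sketch rather than something proposed in the thread.

```bash
#!/usr/bin/env bash

# Hypothetical check: succeed only if the decoded filename is non-empty
# and contains none of bash's default word-splitting characters
# (space, tab, newline).
filename_ok() {
    case ${1} in
        ''|*[$' \t\n']*) return 1 ;;
    esac
    return 0
}

for name in 'mpfrc++-20120622.tar.gz' 'foo bar.tar.gz' 'weird:name.tar.gz'; do
    if filename_ok "${name}"; then
        echo "accepted: ${name}"
    else
        echo "rejected: ${name}"
    fi
done
```

Note that the name containing ':' is accepted here: anything beyond whitespace is left to portability concerns, as suggested above.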
Achievable using '->'. Interest lost.
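For reference, the '->' workaround from comment #0 can be sketched as below. In a real ebuild P is set by the package manager; it is assigned by hand here only so that the snippet runs standalone in bash.

```bash
#!/usr/bin/env bash

# Set by hand for this sketch; in an ebuild the PM provides P.
P=mpfrc++-20120622

# Encode the pluses for the download URI, then rename the fetched file
# back to the plain name with '->'.
SRC_URI="mirror://github/jauhien/sources/${P//+/%2B}.tar.gz -> ${P}.tar.gz"

echo "${SRC_URI}"
# prints: mirror://github/jauhien/sources/mpfrc%2B%2B-20120622.tar.gz -> mpfrc++-20120622.tar.gz
```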