The documentation does not define URL strings. I suggest adding a note to SRC_URI and HOMEPAGE about the exact requirements for a URL string; at least, I could not find one. Who is responsible for proper encoding? Must the developer recode the URL, and will repoman check the URL encoding? Or is Portage responsible for this? The question came up after reading https://bugs.gentoo.org/show_bug.cgi?id=597648 So in this ticket we are talking about URLs like these: SRC_URI="http://köln.de/l i n u x.tar.gz" Reproducible: Always
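To make "recoding" concrete, here is a minimal Python sketch of what an ASCII-only form of the example URL could look like. It is only an illustration: the helper name `to_ascii_url` is hypothetical, it is not something Portage or repoman is known to do, and it assumes the host is converted with Python's built-in `idna` codec and the path is percent-encoded with `urllib.parse.quote`.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Hypothetical helper: return an ASCII-only form of `url`.

    Assumptions: no userinfo or port in the URL (both are dropped here),
    and the path is not already percent-encoded (a literal "%" would be
    encoded again as "%25").
    """
    parts = urlsplit(url)
    # Internationalized host name -> ASCII-compatible ("xn--...") form.
    host = parts.hostname.encode("idna").decode("ascii")
    # Percent-encode everything in the path except unreserved chars and "/".
    path = quote(parts.path)
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


print(to_ascii_url("http://köln.de/l i n u x.tar.gz"))
```

Whether such a conversion belongs in the ebuild, in repoman, or in Portage is exactly the open question of this bug; the sketch only shows the target form.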
Sounds like something that PMS should cover.
(In reply to Mike Gilbert from comment #1)
> Sounds like something that PMS should cover.

URI syntax is defined by a whole bunch of RFCs (like RFC 3986), and I would say that these specifics are outside the scope of PMS.

(In reply to Jonas Stein from comment #0)
> So in this ticket we are talking about URLs like these:
> SRC_URI="http://köln.de/l i n u x.tar.gz"

Not a valid URL by <https://tools.ietf.org/html/rfc3986#appendix-A>:
- "köln.de" is not a valid reg-name because "ö" is not a valid character.
- "l i n u x.tar.gz" is not a valid segment because " " is not a valid pchar.
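The two objections above can be checked mechanically. The following Python sketch transcribes the `reg-name` and `segment` rules from RFC 3986, Appendix A into regular expressions; the function names `is_reg_name` and `is_segment` are made up for this illustration and do not come from any existing tool.

```python
import re

# Character sets from RFC 3986, Appendix A.
UNRESERVED = r"A-Za-z0-9\-._~"      # ALPHA / DIGIT / "-" / "." / "_" / "~"
SUB_DELIMS = r"!$&'()*+,;="         # sub-delims
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"    # "%" HEXDIG HEXDIG

# reg-name = *( unreserved / pct-encoded / sub-delims )
REG_NAME = re.compile(rf"(?:[{UNRESERVED}{SUB_DELIMS}]|{PCT_ENCODED})*")
# segment = *pchar, with pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
SEGMENT = re.compile(rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})*")


def is_reg_name(host):
    """True if `host` matches the RFC 3986 reg-name production."""
    return REG_NAME.fullmatch(host) is not None


def is_segment(segment):
    """True if `segment` matches the RFC 3986 segment production."""
    return SEGMENT.fullmatch(segment) is not None


print(is_reg_name("köln.de"))            # "ö" is outside all allowed sets
print(is_segment("l i n u x.tar.gz"))    # " " is not a pchar
```

This only covers the two productions mentioned here, not the full URI grammar (IP literals, userinfo, query, and fragment are left out).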
What ulm says. URIs are well-defined by RFCs. The only thing that needs to be done in PMS is restricting the allowed protocols, and it does that in 7.3 (at least for SRC_URI; dunno if we specifically need to restrict HOMEPAGE, since it's not required directly for the package to work).

Additionally, 8.2 seems to have some URI definition suggestion:

| A URI, in the form proto://host/path. Permitted in SRC_URI and HOMEPAGE.

Not sure if that should be considered merely as a remark, or treated as a specific restriction of the URI syntax. Maybe we ought to drop that.

Possibly adding a few RFC references wouldn't hurt. RFC 3986 specifies the generic URI syntax. Since we're restricting it to a few specific protocols, we may also add RFC 2616 (sect. 3.2) for HTTP. I don't see any specific RFC on FTP URIs -- but possibly RFC 3986 is good enough for it.
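If the proto://host/path wording were read as a restriction rather than a remark, a check along these lines would express it. This is a rough Python sketch only: `has_allowed_form` is a hypothetical name, and the scheme set below is an assumption chosen for illustration (HTTP and FTP are the protocols named in this comment), not a quotation of the list in PMS.

```python
from urllib.parse import urlsplit

# Assumption for illustration only; consult PMS for the authoritative list.
ALLOWED_SCHEMES = frozenset({"http", "https", "ftp"})


def has_allowed_form(uri, allowed=ALLOWED_SCHEMES):
    """True if `uri` looks like proto://host/path with a permitted proto."""
    parts = urlsplit(uri)
    return (
        parts.scheme in allowed          # urlsplit lowercases the scheme
        and bool(parts.netloc)           # a host part is present
        and parts.path.startswith("/")   # a path follows the host
    )


print(has_allowed_form("http://example.org/foo.tar.gz"))
print(has_allowed_form("file:///tmp/foo.tar.gz"))
```

Note that this checks only the overall shape and the scheme; it says nothing about whether the individual components use valid characters per RFC 3986.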
RFC reference(s) seems reasonable to me.