Summary: | Portage should accept category / package names with non-ASCII characters | ||
---|---|---|---|
Product: | Portage Development | Reporter: | Michał Górny <mgorny> |
Component: | Core - Ebuild Support | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED WONTFIX | ||
Severity: | enhancement | CC: | hasufell, m.seifert, pms |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=411127 https://bugs.gentoo.org/show_bug.cgi?id=563984 | ||
Whiteboard: | |||
Package list: | | Runtime testing required: | --- |
Description
Michał Górny
What does the error look like?

We can add a layout.conf setting, which allows you to configure this for the repository.

(In reply to comment #1)
> What does the error look like?

!!! 'aęł' is not a valid package atom.

> We can add a layout.conf setting, which allows you to configure this for the
> repository.

A layout.conf setting would be useful for repoman. Still, you should be liberal in what you accept. We're not Ciaranis to shoot at people for not using the official language.

Hopefully this fixes all but the repoman file.name check:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=fdd7d8cfcfb3055ba755273b684ef4e02b99c14c

By the way, I hoped this would also revert the non-version-ending enforcement for bug 174536. Portage not doing that would be the first step towards lifting the restriction.

I did a fixup on the previous commit to include dbapi._category_re:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=0d5b0fbd79ba8b2e7dd5d2f2db7d69cad3e56766

(In reply to comment #4)
> By the way, I hoped this would also revert the non-version-ending
> enforcement for bug 174536. Portage not doing that would be the first step
> towards lifting the restriction.

Please file a separate bug for that, because the two things are only vaguely related.

Binds filename validation to RepoConfig, so that eventually we'll be able to control it via a layout.conf setting:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=6d8d0c02457c2e94c759fe89db0bef196b78158a

Portage should print fatal error when a repository contains 2 different directories (of categories or packages), whose names are equivalent in Unicode. E.g. b"\xc3\xb3" and b"o\xcc\x81"

    >>> import unicodedata
    >>> unicodedata.name(b"\xc3\xb3".decode())
    'LATIN SMALL LETTER O WITH ACUTE'
    >>> unicodedata.name(b"o\xcc\x81".decode()[0])
    'LATIN SMALL LETTER O'
    >>> unicodedata.name(b"o\xcc\x81".decode()[1])
    'COMBINING ACUTE ACCENT'
    >>> b"\xc3\xb3" == b"o\xcc\x81"
    False
    >>> b"\xc3\xb3".decode() == b"o\xcc\x81".decode()
    False
    >>> unicodedata.normalize("NFD", b"\xc3\xb3".decode()) == unicodedata.normalize("NFD", b"o\xcc\x81".decode())
    True

http://en.wikipedia.org/wiki/Unicode_equivalence
http://en.wikipedia.org/wiki/Combining_character
http://en.wikipedia.org/wiki/Precomposed_character

When ebuild of package b"app-misc/a" from repository X contains b"DEPEND=app-misc/\xc3\xb3" and b"app-misc/\xc3\xb3" directory exists in repository Y and b"app-misc/o\xcc\x81" directory exists in repository Z, then ebuilds from both b"app-misc/\xc3\xb3" (from repository Y) and b"app-misc/o\xcc\x81" (from repository Z) directories should be able to satisfy this dependency.

(X can be Y or X can be Z, but Y cannot be Z.)

unless i missed something, this is just a "nice to have" since such characters are forbidden by PMS

(In reply to Arfrever Frehtes Taifersar Arahesis from comment #8)
> When ebuild of package b"app-misc/a" from repository X contains
> b"DEPEND=app-misc/\xc3\xb3" and b"app-misc/\xc3\xb3" directory exists in
> repository Y and b"app-misc/o\xcc\x81" directory exists in repository Z,
> then ebuilds from both b"app-misc/\xc3\xb3" (from repository Y) and
> b"app-misc/o\xcc\x81" (from repository Z) directories should be able to
> satisfy this dependency.
>
> (X can be Y or X can be Z, but Y cannot be Z.)

Yeah, that's the kind of problems that would result from allowing arbitrary chars in package names. Should app-misc/A, app-misc/А, and app-misc/Α map to the same package, too (that's latin, cyrillic, and greek A, respectively)? And how about app-misc/abcd, app-misc/dcba, and app-misc/abcd?

(The last one is "abcd" with right-to-left directional override, i.e b"app-misc/\xe2\x80\xaeabcd\xe2\x80\xac".)

Can Portage follow the spec, please? PMS is quite explicit about what characters are allowed in package names. Also GLEP 31 limits filenames to ASCII.

(In reply to Michał Górny from comment #0)
> Citing the robustness principle[1]:
>
> Be conservative in what you send, liberal in what you accept

Nope, this might apply to user input, but certainly it doesn't apply to the tree. We aim for interoperability between different package managers, therefore PMs should be rather strict about what they accept as valid ebuilds.

> !!! 'aęł' is not a valid package atom.

Right, it isn't.

(In reply to Ulrich Müller from comment #10)

i don't think PMS is as explicit as you describe. example:

A package name may contain any of the characters [A-Za-z0-9+_-]. It must not begin with a hyphen or a plus sign, and must not end in a hyphen followed by anything matching the version syntax described in section 3.2.

that does not state the package name is limited to that regex. i.e. it doesn't say "may only contain" or otherwise say that other characters are forbidden. that might have been the intention, but it isn't what the spec says ;).

(In reply to SpanKY from comment #11)

PMS is written with an informed and well-disposed reader in mind. So sometimes the wording is not absolutely watertight, in order to keep the spec readable.

> A package name may contain any of the characters [A-Za-z0-9+_-].

There cannot be any reasonable doubt about the intended meaning of this.

*** Bug 563984 has been marked as a duplicate of this bug. ***

Closing, as discussed in #gentoo-dev.
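
For readers following the Unicode-equivalence argument in the thread, the proposed check (flagging directory names that differ byte-wise but are canonically equivalent) can be sketched roughly as below. This is only an illustration under stated assumptions: `find_equivalent_names` is a hypothetical helper, not Portage code, and the choice of NFC as the default form is arbitrary (NFD groups canonical equivalents the same way).

```python
import unicodedata
from collections import defaultdict


def find_equivalent_names(names, form="NFC"):
    """Group names that become identical after Unicode normalization.

    Hypothetical helper for illustration; not part of Portage.
    Returns a list of groups (each a sorted list) that contain
    two or more distinct spellings of the same normalized name.
    """
    groups = defaultdict(set)
    for name in names:
        groups[unicodedata.normalize(form, name)].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# The two spellings from the thread: precomposed vs. combining sequence.
precomposed = b"\xc3\xb3".decode()  # U+00F3
decomposed = b"o\xcc\x81".decode()  # U+006F U+0301

collisions = find_equivalent_names([precomposed, decomposed, "abc"])

# Latin A, Cyrillic A (U+0410), Greek Alpha (U+0391): visually alike,
# but not equivalent under normalization, so no group is reported.
homoglyphs = find_equivalent_names(["A", "\u0410", "\u0391"])
```

Here `collisions` holds a single group with both spellings of "ó", while `homoglyphs` is empty: normalization handles canonical equivalence only, so the look-alike Latin, Cyrillic, and Greek letters raised later in the thread would pass such a check undetected.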