Bug 550760

Summary: sys-apps/portage-2.2.20 - /usr/bin/ebuild - Missing error message when skipping files with special characters in manifest creation
Product: Portage Development
Reporter: Michael Seifert <m.seifert>
Component: Core - Ebuild Support
Assignee: Portage team <dev-portage>
Status: CONFIRMED ---
Severity: normal
CC: kfm, ysottre
Priority: Normal
Version: unspecified
Hardware: All
OS: Linux
URL: https://www.gentoo.org/glep/glep-0031.html#suitable-characters-for-file-and-directory-names
See Also: https://bugs.gentoo.org/show_bug.cgi?id=411127
https://bugs.gentoo.org/show_bug.cgi?id=435934
Whiteboard:
Package list:
Runtime testing required: ---

Description Michael Seifert 2015-05-29 18:22:37 UTC
When calling "/usr/bin/ebuild xyz-1.0.ebuild manifest", files with non-ASCII characters are skipped as discussed in bugs #411127 and #435934. As I see it, these two tickets are feature requests for adding support for Unicode file names.

However, if a file name is not valid, /usr/bin/ebuild should notify the user about that, which it currently does not.

Reproducible: Always

Steps to Reproduce:
Create a Manifest for an ebuild where the name of one of the files in $FILESDIR contains a character such as '@'.
Actual Results:
/usr/bin/ebuild terminates without any error or warning, creating a Manifest that contains an entry for the ebuild but not for the file in $FILESDIR.

Expected Results:  
/usr/bin/ebuild should exit with an error message that names the invalid file.
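
A minimal sketch of the expected behaviour, written in Python (the language Portage itself is written in). The allowed character set [A-Za-z0-9._+-] is an assumption for illustration only; GLEP 31 (the URL above) is the authoritative rule. invalid_names, check_filesdir and InvalidFileNameError are hypothetical names, not Portage APIs. The only point is that every rejected name is collected and reported instead of being silently left out of the Manifest.

```python
import re
from pathlib import Path

# ASSUMPTION: illustrative allowed set (ASCII letters, digits and "._+-").
# GLEP 31 defines the real rule; adjust this pattern to match it.
ALLOWED_NAME_RE = re.compile(r"[A-Za-z0-9._+-]+")


class InvalidFileNameError(Exception):
    """Hypothetical error raised instead of silently skipping a file."""


def invalid_names(filesdir: Path) -> list:
    """Return every path under filesdir whose own name fails the check."""
    bad = []
    for path in sorted(filesdir.rglob("*")):
        if not ALLOWED_NAME_RE.fullmatch(path.name):
            bad.append(str(path.relative_to(filesdir)))
    return bad


def check_filesdir(filesdir: Path) -> None:
    """Fail loudly, listing all offending names at once."""
    bad = invalid_names(filesdir)
    if bad:
        raise InvalidFileNameError(
            "refusing to create Manifest, invalid file names in FILESDIR: "
            + ", ".join(bad)
        )
```

With this assumed set, a FILESDIR holding foo.patch, a@b.patch and fix(1).patch makes check_filesdir raise an error naming a@b.patch and fix(1).patch, rather than producing a Manifest that quietly lacks them.
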
Comment 1 Andrew Miller 2015-05-29 21:12:32 UTC
The "repoman manifest" command also doesn't tell the user that files with non-ASCII characters have been skipped.

I'm using sys-apps/portage-9999 built on May 27 (up to date with git).
Comment 2 kfm 2019-07-19 12:49:32 UTC
I just encountered this issue. One thing to keep in mind is that pathname components are arbitrary byte sequences, in which any byte is legal except 0x00 (NUL) and 0x2f ('/'). No assumption can be made about their encoding: they could be UTF-8, ISO-8859-1, pure ASCII or even random bytes churned out by an RNG, and they would still be legal names. Consulting LC_CTYPE doesn't help either, because there is no guarantee that any name was written in the encoding it implies.
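
To illustrate the point above, the Python sketch below (is_legal_component and decodings are hypothetical helper names) checks a name against the two forbidden bytes and then tries several decodings. The same bytes decode to different text under UTF-8 and ISO-8859-1, and ISO-8859-1 decodes any byte sequence at all, so a successful decode says nothing about the encoding the name was actually written in.

```python
def is_legal_component(name: bytes) -> bool:
    """Any non-empty byte string without NUL (0x00) or '/' (0x2f)."""
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


def decodings(name: bytes, codecs=("ascii", "utf-8", "iso-8859-1")) -> dict:
    """Decode name with each codec; None marks a codec that rejects it."""
    result = {}
    for codec in codecs:
        try:
            result[codec] = name.decode(codec)
        except UnicodeDecodeError:
            result[codec] = None
    return result


name = b"caf\xc3\xa9.patch"
print(is_legal_component(name))  # True
print(decodings(name))
# {'ascii': None, 'utf-8': 'café.patch', 'iso-8859-1': 'cafÃ©.patch'}
print(decodings(b"\xff\xfe.patch")["utf-8"])  # None
```
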

I'm not suggesting that portage/repoman should not have any constraints. However, as far as encodings are concerned, it either needs to:

a) Dictate an encoding and rigidly enforce it (UTF-8, for example)
b) Not assume any encoding and treat the names as just byte sequences

Just not anything in between.
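
A rough Python sketch of the two consistent policies. NamePolicy and rejected_names are hypothetical names; the input is a list of raw names such as os.listdir() returns when it is given a bytes path. Under option a) anything that is not strict UTF-8 is rejected and reported; under option b) every name is accepted, although a Manifest writer would then still need some way to represent arbitrary bytes in a text file, which this sketch does not address.

```python
from enum import Enum


class NamePolicy(Enum):
    STRICT_UTF8 = "a"  # option a): dictate UTF-8 and enforce it
    RAW_BYTES = "b"    # option b): assume no encoding, names are opaque bytes


def rejected_names(names, policy):
    """Return every name the policy rejects, so all of them can be reported."""
    if policy is NamePolicy.RAW_BYTES:
        return []  # any name the OS handed back is acceptable as-is
    bad = []
    for name in names:
        try:
            name.decode("utf-8")  # errors="strict" is the default
        except UnicodeDecodeError:
            bad.append(name)
    return bad


names = [b"ok.patch", b"caf\xc3\xa9.patch", b"\xff.patch"]
print(rejected_names(names, NamePolicy.STRICT_UTF8))  # [b'\xff.patch']
print(rejected_names(names, NamePolicy.RAW_BYTES))    # []
```
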

P.S. In my case, the offending characters were (ASCII) parentheses, which are sometimes used in kernel patch names.