Bug 876193 - app-misc/ca-certificates creates files with special characters below /etc
Summary: app-misc/ca-certificates creates files with special characters below /etc
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-08 17:30 UTC by Jonas Stein
Modified: 2023-12-11 16:28 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Description Jonas Stein gentoo-dev 2022-10-08 17:30:04 UTC
files below /etc should not have UTF-8 filenames without a good reason

'NetLock_Arany_=Class_Gold=_Főtanúsítvány.pem' -> '../../../usr/share/ca-certificates/mozilla/NetLock_Arany_=Class_Gold=_Főtanúsítvány.crt'
is created by certdata2pem.py

Ideally, certdata2pem.py would generate filenames containing only printable ASCII characters.

Reproducible: Always
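One possible way to get ASCII-only names, sketched in Python under stated assumptions: `ascii_filename` is a hypothetical helper and is not part of certdata2pem.py. It decomposes each letter with NFKD and drops the resulting combining marks, so accented Latin letters fold to their base letter.

```python
import unicodedata


def ascii_filename(label: str) -> str:
    """Hypothetical helper: fold a certificate label to ASCII-only characters.

    NFKD splits letters such as "ő" into a base letter plus a combining mark;
    encoding to ASCII with errors="ignore" then drops the combining marks.
    Characters without an ASCII base letter (e.g. "ł" or CJK ideographs) are
    dropped entirely, so real code would need a fallback for names that become
    empty or collide with each other.
    """
    decomposed = unicodedata.normalize("NFKD", label)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


if __name__ == "__main__":
    # The label from the example, spelled with escapes: "Főtanúsítvány".
    print(ascii_filename("NetLock_Arany_=Class_Gold=_F\u0151tan\u00fas\u00edtv\u00e1ny"))
```

The folding is lossy: it works reasonably for accented Latin letters but would strip most of a name written in another script.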
Comment 1 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-10-08 17:32:07 UTC
>files below /etc should not have UTF-8 filenames without a good reason

Please cite the source of this rule/policy.
Comment 2 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2022-10-08 22:39:23 UTC
We discussed it in #gentoo-dev, and being unable to type a filename in /etc seems reasonable enough as motivation.
Comment 3 Ulrich Müller gentoo-dev 2022-10-09 11:12:36 UTC
So, what characters should be allowed? Any ASCII character except NUL and / (which would still include control characters)? Or only printable ASCII, U+0021 to U+007E (note that /\:*"?<>| may be problematic on some filesystems)?

Or only the POSIX Portable Filename Character Set as defined in https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282? That may be too limited since it excludes e.g. the plus sign.
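The three candidate rules can be compared with a small Python sketch. The function names are illustrative only; none of this is an existing Portage or PMS check.

```python
import string

# POSIX Portable Filename Character Set: A-Z, a-z, 0-9, ".", "_", "-".
POSIX_PORTABLE = frozenset(string.ascii_letters + string.digits + "._-")


def any_ascii(name: str) -> bool:
    """Option 1: any ASCII character except NUL and "/" (control chars allowed)."""
    return all(ord(c) < 0x80 and c not in "\0/" for c in name)


def printable_ascii(name: str) -> bool:
    """Option 2: only U+0021..U+007E (no space, no control characters).

    "/" is in this range but can never appear inside a Linux filename anyway.
    """
    return all(0x21 <= ord(c) <= 0x7E for c in name)


def posix_portable(name: str) -> bool:
    """Option 3: only the POSIX Portable Filename Character Set."""
    return all(c in POSIX_PORTABLE for c in name)


if __name__ == "__main__":
    for candidate in ("NetLock_Arany_=Class_Gold=_F\u0151tan\u00fas\u00edtv\u00e1ny.pem",
                      "NetLock_Arany_=Class_Gold=_Fotanusitvany.pem",
                      "ca+cert.pem"):
        print(candidate, any_ascii(candidate), printable_ascii(candidate),
              posix_portable(candidate))
```

Note that even the ASCII-folded certificate name fails option 3 because of the "=" signs, and "ca+cert.pem" fails it because of the plus sign.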
Comment 4 Ulrich Müller gentoo-dev 2022-10-09 11:41:27 UTC
Another argument against UTF-8 filenames is that even if you can type these characters on your keyboard, the name still may not match, because it may be in a different normalization form (see https://unicode.org/reports/tr15/).

For example, the "á" from the example could be either "á" (NFC, U+00e1 LATIN SMALL LETTER A WITH ACUTE) or "á" (NFD, U+0061 LATIN SMALL LETTER A followed by U+0301 COMBINING ACUTE ACCENT). It happens to be the first one, but there's no direct way to tell the two apart visually.

$ touch Főtanúsítvány
$ touch Főtanúsítvány
$ ls -1
Főtanúsítvány
Főtanúsítvány
$ ls -1 | hexdump -C
00000000  46 6f cc 8b 74 61 6e 75  cc 81 73 69 cc 81 74 76  |Fo..tanu..si..tv|
00000010  61 cc 81 6e 79 0a 46 c5  91 74 61 6e c3 ba 73 c3  |a..ny.F..tan..s.|
00000020  ad 74 76 c3 a1 6e 79 0a                           |.tv..ny.|
00000028
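The same effect can be reproduced in Python with the standard `unicodedata` module; this sketch simply mirrors the shell session above.

```python
import unicodedata

# Spelled with escapes so the normalization form is unambiguous.
nfc = "F\u0151tan\u00fas\u00edtv\u00e1ny"    # precomposed letters (NFC)
nfd = unicodedata.normalize("NFD", nfc)      # base letters + combining marks

print(nfc == nfd)                    # False: different code point sequences
print(len(nfc.encode("utf-8")))      # 17 bytes
print(len(nfd.encode("utf-8")))      # 21 bytes
# Same bytes as the first name in the hexdump, minus the trailing newline.
print(nfd.encode("utf-8").hex(" "))

# Normalizing both spellings to one form before comparing makes them match.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

The kernel compares the raw bytes, so both spellings can coexist as separate files in one directory.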

There is also the issue of confusables, e.g. A (U+0041 LATIN CAPITAL LETTER A), Α (U+0391 GREEK CAPITAL LETTER ALPHA), and А (U+0410 CYRILLIC CAPITAL LETTER A) which might even have a security impact.
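A short Python sketch of why normalization does not help with confusables, plus a crude mixed-script heuristic. `scripts_in` is hypothetical and only approximates script detection by the first word of each letter's Unicode name; Python's `unicodedata` has no Script property, and this is not the Unicode UTS #39 algorithm.

```python
import unicodedata

LOOKALIKES = ("\u0041", "\u0391", "\u0410")

for ch in LOOKALIKES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# None of these letters has a compatibility decomposition, so even NFKC
# keeps all three distinct: normalization does not catch confusables.
print(len({unicodedata.normalize("NFKC", ch) for ch in LOOKALIKES}))


def scripts_in(name: str) -> set:
    """Crude heuristic: first word of each letter's Unicode name.

    Yields values such as "LATIN", "GREEK" or "CYRILLIC". A name mixing
    several of them is suspicious; a single-script name is not flagged.
    """
    return {unicodedata.name(c, "UNKNOWN").split()[0] for c in name if c.isalpha()}


print(scripts_in("F\u0151tan\u00fas\u00edtv\u00e1ny"))  # {'LATIN'}
print(scripts_in("\u0410pple"))  # Cyrillic A followed by Latin letters
```

A legitimate single-script name such as "Főtanúsítvány" passes this check, while a Latin word with one Cyrillic letter mixed in does not.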
Comment 5 Arfrever Frehtes Taifersar Arahesis 2022-10-09 15:55:12 UTC
There are known problems with non-UTF-8 filenames (e.g. bug #690480), but UTF-8 filenames containing non-ASCII characters should work well.
I am against the anglocentric assumption that only the 26+26 Latin letters can be used.

If you want to type a filename, use tab completion, or run 'ls' and copy and paste it.
Comment 6 Arfrever Frehtes Taifersar Arahesis 2022-10-09 16:19:14 UTC
Portage could detect the situation where two different installed filenames are identical after NFD normalization and print a warning or error.
(Linux treats filenames as byte strings, not Unicode characters. As long as no non-Linux filesystems or other operating systems are involved, the only problem for users is visual confusion.)

(Such a check obviously would not make "Főtanúsítvány" invalid in any way.)
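The suggested check could look roughly like this Python sketch. `normalization_collisions` is a hypothetical name, not a Portage function, and a real implementation would also have to decide how to handle names that are not valid UTF-8 in the first place.

```python
import unicodedata
from collections import defaultdict


def normalization_collisions(names):
    """Hypothetical check: group names that are identical after NFD.

    Returns {nfd_form: [original spellings]} for every group that contains
    more than one distinct original spelling.
    """
    groups = defaultdict(set)
    for name in names:
        groups[unicodedata.normalize("NFD", name)].add(name)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


nfc = "F\u0151tan\u00fas\u00edtv\u00e1ny"
installed = [
    nfc,
    unicodedata.normalize("NFD", nfc),
    "NetLock_Arany_=Class_Gold=_" + nfc + ".pem",
]
for key, spellings in normalization_collisions(installed).items():
    print("warning: names differ only in normalization:",
          [s.encode("utf-8") for s in spellings])
```

Only the NFC/NFD pair is reported; the certificate name on its own is not flagged.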
Comment 7 Mike Gilbert gentoo-dev 2022-10-09 17:42:21 UTC
Regarding ca-certificates:

Nobody actually types these file names, so that argument makes no sense to me.

I could see making that argument for config files that the sysadmin commonly has some need to edit. However, I haven't seen any examples of config files with foreign characters in the filename.

Regarding installed files in general:

I don't think it is practical to limit the character set to some subset of printable ASCII. That will just lead to conflicts with upstream developers, and we will probably end up patching things downstream. It seems rather pointless to do this for filenames that people rarely look at or type out anyway.

In the rare case that somebody actually needs to manipulate files with characters that they can't type, shells offer tab-completion and terminals offer copy/paste functions.
Comment 8 Ulrich Müller gentoo-dev 2022-10-11 18:43:07 UTC
Debian's policy is this: https://www.debian.org/doc/debian-policy/ch-files.html#file-names

In a nutshell, they require ASCII-only for binaries in PATH but UTF-8 elsewhere. I'd guess that their motivation is similar, i.e. names are restricted to ASCII if the user must type them.
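A Debian-style rule could be checked on raw byte paths as in the Python sketch below. `policy_problem` is hypothetical, and the directory list reflects our reading of the linked Debian policy section (/bin, /sbin, /usr/bin, /usr/sbin, /usr/games); it should be verified against that text before relying on it.

```python
# Directories treated as the system PATH (assumed from the Debian policy text).
SYSTEM_PATH_DIRS = (b"/bin/", b"/sbin/", b"/usr/bin/", b"/usr/sbin/", b"/usr/games/")


def policy_problem(path: bytes):
    """Return a description of the problem, or None if the path is acceptable.

    Works on bytes because Linux filenames are byte strings.
    """
    if path.startswith(SYSTEM_PATH_DIRS):
        if not path.isascii():
            return "non-ASCII name in system PATH"
        return None
    try:
        path.decode("utf-8")
    except UnicodeDecodeError:
        return "name is not valid UTF-8"
    return None


if __name__ == "__main__":
    for p in (b"/usr/bin/ls",
              "/usr/bin/F\u0151".encode("utf-8"),
              "/etc/ssl/certs/F\u0151.pem".encode("utf-8"),
              b"/etc/\xff.pem"):
        print(p, policy_problem(p))
```

Under this rule the certificate links below /etc would be acceptable, since they are valid UTF-8 and outside the system PATH.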
Comment 9 Enrique Domínguez 2023-11-18 02:15:13 UTC
(In reply to Mike Gilbert from comment #7)
> Regarding ca-certificates:
> 
> Nobody actually types these file names, so that argument makes no sense to
> me.

The ca-certificates package fails to build with a locale other than a UTF-8 one; see https://bugs.gentoo.org/show_bug.cgi?id=916504
I dug a little further, and a libxcb bug was reported upstream too. I see it has been a known bug since 2022. Is there any point in causing trouble for other users?
Nice to see you here too, Sam!
+1 for Jonas.
Comment 10 Mike Gilbert gentoo-dev 2023-12-11 16:28:05 UTC
(In reply to Enrique Domínguez from comment #9)

The solution we usually implement for such problems is to force UTF-8 encoding for installed file names.
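In a Python script, one generic way to force UTF-8 for file names regardless of the locale is to encode the name explicitly and pass bytes paths to the OS, as in this sketch. `write_utf8_named_file` is hypothetical; it illustrates the technique and is not taken from the ca-certificates ebuild or from certdata2pem.py. Python's UTF-8 Mode (`PYTHONUTF8=1` or `-X utf8`) is an alternative that changes the filesystem encoding for the whole interpreter.

```python
import os
import tempfile


def write_utf8_named_file(directory: str, name: str, data: bytes) -> bytes:
    """Create directory/name with the name encoded as UTF-8 bytes.

    Bytes paths are passed to the OS unchanged, so the on-disk name does not
    depend on sys.getfilesystemencoding() or the caller's locale.
    """
    path = os.path.join(os.fsencode(directory), name.encode("utf-8"))
    with open(path, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_utf8_named_file(tmp, "F\u0151tan\u00fas\u00edtv\u00e1ny.pem", b"")
        # os.listdir() with a bytes argument returns the raw byte names.
        print(os.listdir(os.fsencode(tmp)))
```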