C.UTF-8 is in glibc as of 2.35 so we didn't need to patch it in anymore (which plenty of other distros did too).

From #gentoo:
[07:06:45] <ormaaaj> sam_: I noticed eselect locale gets its list by parsing the `locale -a` output, and locale gets them from the locale-archive. Since glibc versions started including the C.UTF-8 locale, it outputs "C.utf8". I logged into a debian system from before that was added and theirs uniquely didn't accept C.utf8. I didn't dig into how locale-gen gets the
[07:06:45] <ormaaaj> names - I assume either the locale defs or an internal enum.
[07:07:38] <ormaaaj> So the way we're generating these seems to not even be universally compatible with all glibc versions. `locale -m` output is inconsistent with `locale -a`, but looks more correct and in line with e.g. https://encoding.spec.whatwg.org/#names-and-labels. it might be better to map the locale -a names to those to construct the string.
[07:19:10] <tirnanog> arch is like that too. it doesn't tolerate C.utf8, does tolerate C.UTF-8, yet for everything else - e.g. en_US.utf8 vs en_US.UTF-8 - it doesn't matter at all (which is traditional glibc behaviour). that gentoo tolerates C.utf8 is, at least, consistent. I don't understand why these differences exist.
[07:23:07] <ormaaaj> I think it's due to the way distros were "patching" in their own definition before it went upstream relatively recently. IIRC gentoo was one of them.
[07:25:49] <tirnanog> yeah. still, C.UTF-8 is officially supported as of glibc 2.35, I think. so why would arch and gentoo, taking those two as an example, be different now? the arch behaviour seems off to me. ".utf8" has always worked; it's odd for C to be treated any differently from the others.
[07:26:35] <tirnanog> I didn't look into it yet so I don't have any answer.
[07:34:05] <tirnanog> one visible artifact of that distinction is that, in the affected distros, locale -a appears to show "C.UTF-8" while showing the ".utf8" suffix for other locales, whereas gentoo shows only the ".utf8" suffix and always accepts it both ways.
[07:51:29] <ormaaaj> Looks like it just greps them out of the locale-archive file. Just a sloppy implementation.
[07:58:32] <ormaaaj> http://dpaste.com/HFWS4HXEG
[08:06:43] <tirnanog> hmm. it gets weirder. gentoo has a novel locale-gen which always includes "C.UTF-8 UTF-8" in the course of generating an archive. arch doesn't. basically: if you put "C.UTF-8 UTF-8" in locale.gen, either of C.UTF-8 or C.utf8 are accepted as valid locale names. if you don't, only C.UTF-8 is (in glibc 2.35+, that is, whether it be visible in the locale archive or not).
[08:06:59] <ormaaaj> $() part is the intended paste. broken alias.
[08:08:08] <ormaaaj> hm
[08:08:44] <tirnanog> that explains why C.utf8 works in gentoo then. if I add "C.UTF-8 UTF-8" to locale.gen in arch and run locale-gen, C.utf8 suddenly starts working there too, in addition to C.UTF-8. oh, and you get _both_ representations showing up in locale -a thereafter. seems rather messy.
[08:09:46] <tirnanog> in short, glibc 2.35 and onwards will always support C.UTF-8 but not C.utf8 unless the locale was explicitly generated and incorporated into the archive.
[08:15:04] <tirnanog> I think gentoo used to patch support for C.UTF-8 in prior to 2.35. I suppose shoehorning it in via the locale-gen script is now an anachronism.
[08:15:16] <tirnanog> still, it all seems pretty messy on the glibc side.
[08:15:54] <tirnanog> ultimately, a solid case for not always writing it out properly as "UTF-8".
[08:16:02] <tirnanog> er, for always, I mean.
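
To make the "always write it out as UTF-8" idea from the log concrete, here is a minimal sketch of rewriting the names printed by `locale -a` before they are offered to the user. It is not what eselect or locale-gen actually do; the helper names `canonical_locale_name` and `canonicalise_locale_list` are made up for illustration, and only the UTF-8 codeset is handled.

```python
import re

# Hypothetical helpers, not part of eselect, locale-gen or glibc.
# Matches a ".utf8" / ".UTF8" / ".utf-8" codeset at the end of the name
# or directly before an "@modifier".
_UTF8_CODESET = re.compile(r"\.utf-?8(?=@|$)", re.IGNORECASE)


def canonical_locale_name(name):
    """Spell a UTF-8 codeset as ".UTF-8"; leave every other name untouched."""
    return _UTF8_CODESET.sub(".UTF-8", name, count=1)


def canonicalise_locale_list(locale_a_output):
    """Canonicalise each line of `locale -a`-style output and drop duplicates.

    Order of first appearance is preserved, so "C.UTF-8" and "C.utf8"
    showing up together collapse into a single "C.UTF-8" entry.
    """
    seen = []
    for line in locale_a_output.splitlines():
        name = canonical_locale_name(line.strip())
        if name and name not in seen:
            seen.append(name)
    return seen
```

Feeding it the text of `locale -a` from a system that lists both C.UTF-8 and C.utf8 would yield one C.UTF-8 entry, which per the discussion above is the spelling accepted by glibc 2.35+ whether or not the locale was generated into the archive.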
There's two things here.

(In reply to Sam James from comment #0)
> C.UTF-8 is in glibc as of 2.35 so we didn't need to patch it in anymore
> (which plenty of other distros did too).

And we dropped our patch.

> [08:06:43] <tirnanog> hmm. it gets weirder. gentoo has a novel locale-gen
> which always includes "C.UTF-8 UTF-8" in the course of generating an
> archive. arch doesn't. basically: if you put "C.UTF-8 UTF-8" in locale.gen,
> either of C.UTF-8 or C.utf8 are accepted as valid locale names. if you
> don't, only C.UTF-8 is (in glibc 2.35+, that is, whether it be visible in
> the locale archive or not).
[...]
> [08:15:04] <tirnanog> I think gentoo used to patch support for C.UTF-8 in
> prior to 2.35. I suppose shoehorning it in via the locale-gen script is now
> an anachronism.
> [08:15:16] <tirnanog> still, it all seems pretty messy on the glibc side.

We still need this because too many things break if *no* UTF-8 locale is available. Think python.

[And someone would remove it for sure. "Mah don't need no stinkin unicode."]
The bug was for the utf8 vs UTF-8 issue.
(In reply to Sam James from comment #2)
> The bug was for the utf8 vs UTF-8 issue.

Then I don't understand what the problem is; for some reason we are just more permissive?
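
For anyone wanting to compare how permissive a given system is, a rough probe along these lines should do. It only uses Python's standard `locale` module; the function name `accepted_spellings` is made up here, and the results depend entirely on the local glibc version and on what has been generated into the locale archive.

```python
import locale


def accepted_spellings(names):
    """Report which locale names setlocale() accepts on this system.

    Returns a dict mapping each name to True (accepted) or False (rejected).
    The process's original LC_CTYPE setting is restored afterwards.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, changes nothing
    results = {}
    try:
        for name in names:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                results[name] = True
            except locale.Error:
                results[name] = False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
    return results


if __name__ == "__main__":
    for name, ok in accepted_spellings(
        ["C.UTF-8", "C.utf8", "en_US.UTF-8", "en_US.utf8"]
    ).items():
        print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Going by the IRC discussion, one would expect C.utf8 to be rejected on a glibc 2.35+ system where "C.UTF-8 UTF-8" was never put through locale-gen, and accepted where it was, but that is exactly what the script is meant to check rather than assume.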