C.UTF-8 is in glibc as of 2.35 so we didn't need to patch it in anymore (which plenty of other distros did too).
[07:06:45] <ormaaaj> sam_: I noticed eselect locale gets its list by parsing the `locale -a` output, and locale gets them from the locale-archive. Since glibc versions started including the C.UTF-8 locale, it outputs "C.utf8". I logged into a debian system from before that was added and theirs uniquely didn't accept C.utf8. I didn't dig into how locale-gen gets the
[07:06:45] <ormaaaj> names - I assume either the locale defs or an internal enum.
[07:07:38] <ormaaaj> So the way we're generating these seems to not even be universally compatible with all glibc versions. `locale -m` output is inconsistent with `locale -a`, but looks more correct and in line with e.g. https://encoding.spec.whatwg.org/#names-and-labels. it might be better to map the locale -a names to those to construct the string.
[07:19:10] <tirnanog> arch is like that too. it doesn't tolerate C.utf8, does tolerate C.UTF-8, yet for everything else - e.g. en_US.utf8 vs en_US.UTF-8 - it doesn't matter at all (which is traditional glibc behaviour). that gentoo tolerates C.utf8 is, at least, consistent. I don't understand why these differences exist.
[07:23:07] <ormaaaj> I think it's due to the way distros were "patching" in their own definition before it went upstream relatively recently. IIRC gentoo was one of them.
[07:25:49] <tirnanog> yeah. still, C.UTF-8 is officially supported as of glibc 2.35, I think. so why would arch and gentoo, taking those two as an example, be different now? the arch behaviour seems off to me. ".utf8" has always worked; it's odd for C to be treated any differently from the others.
[07:26:35] <tirnanog> I didn't look into it yet so I don't have any answer.
[07:34:05] <tirnanog> one visible artifact of that distinction is that, in the affected distros, locale -a appears to show "C.UTF-8" while showing the ".utf8" suffix for other locales, whereas gentoo shows only the ".utf8" suffix and always accepts it both ways.
[07:51:29] <ormaaaj> Looks like it just greps them out of the locale-archive file. Just a sloppy implementation.
[07:58:32] <ormaaaj> http://dpaste.com/HFWS4HXEG
[08:06:43] <tirnanog> hmm. it gets weirder. gentoo has a novel locale-gen which always includes "C.UTF-8 UTF-8" in the course of generating an archive. arch doesn't. basically: if you put "C.UTF-8 UTF-8" in locale.gen, either of C.UTF-8 or C.utf8 are accepted as valid locale names. if you don't, only C.UTF-8 is (in glibc 2.35+, that is, whether it be visible in the locale archive or not).
[08:06:59] <ormaaaj> $() part is the intended paste. broken alias.
[08:08:08] <ormaaaj> hm
[08:08:44] <tirnanog> that explains why C.utf8 works in gentoo then. if I add "C.UTF-8 UTF-8" to locale.gen in arch and run locale-gen, C.utf8 suddenly starts working there too, in addition to C.UTF-8. oh, and you get _both_ representations showing up in locale -a thereafter. seems rather messy.
[08:09:46] <tirnanog> in short, glibc 2.35 and onwards will always support C.UTF-8 but not C.utf8 unless the locale was explicitly generated and incorporated into the archive.
[08:15:04] <tirnanog> I think gentoo used to patch support for C.UTF-8 in prior to 2.35. I suppose shoehorning it in via the locale-gen script is now an anachronism.
[08:15:16] <tirnanog> still, it all seems pretty messy on the glibc side.
[08:15:54] <tirnanog> ultimately, a solid case for not always writing it out properly as "UTF-8".
[08:16:02] <tirnanog> er, for always, I mean.
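To reproduce the behaviour described in the log on a given system, a small probe along these lines may help. It is only a sketch: the function name `probe_locale_names` is made up for illustration, and it relies on Python's `locale.setlocale()` raising `locale.Error` when the C library rejects a name. Results will differ between distros and glibc versions, which is the point of running it.

```python
import locale


def probe_locale_names(names):
    """Return {name: bool}: True if setlocale() accepts the name for LC_CTYPE."""
    # Remember the current LC_CTYPE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    results = {}
    try:
        for name in names:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                results[name] = True
            except locale.Error:
                # The C library does not know this spelling.
                results[name] = False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
    return results


if __name__ == "__main__":
    candidates = ["C.UTF-8", "C.utf8", "en_US.UTF-8", "en_US.utf8"]
    for name, ok in probe_locale_names(candidates).items():
        print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Running it before and after adding "C.UTF-8 UTF-8" to locale.gen and re-running locale-gen should show whether the C.utf8 spelling changes from rejected to accepted, as reported above for arch.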
There are two things here.
(In reply to Sam James from comment #0)
> C.UTF-8 is in glibc as of 2.35 so we didn't need to patch it in anymore
> (which plenty of other distros did too).
And we dropped our patch.
> [08:06:43] <tirnanog> hmm. it gets weirder. gentoo has a novel locale-gen
> which always includes "C.UTF-8 UTF-8" in the course of generating an
> archive. arch doesn't. basically: if you put "C.UTF-8 UTF-8" in locale.gen,
> either of C.UTF-8 or C.utf8 are accepted as valid locale names. if you
> don't, only C.UTF-8 is (in glibc 2.35+, that is, whether it be visible in
> the locale archive or not).
> [08:15:04] <tirnanog> I think gentoo used to patch support for C.UTF-8 in
> prior to 2.35. I suppose shoehorning it in via the locale-gen script is now
> an anachronism.
> [08:15:16] <tirnanog> still, it all seems pretty messy on the glibc side.
We still need this because too many things break if *no* UTF-8 locale is available. Think Python.
[And someone would remove it for sure. "Mah don't need no stinkin unicode."]
The bug was for the utf8 vs UTF-8 issue.
(In reply to Sam James from comment #2)
> The bug was for the utf8 vs UTF-8 issue.
Then I don't understand what the problem is; for some reason we are just more permissive?
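For context on the utf8 vs UTF-8 point: a tool that builds locale strings from `locale -a` output could rewrite the codeset suffix to the "UTF-8" spelling, which the log reports as accepted in every case tested. The following is a minimal sketch of that idea, not what eselect locale actually does; `normalize_locale_name` is a hypothetical helper, and it assumes names of the form `language_TERRITORY.codeset@modifier`.

```python
import re

# Matches the codeset spellings "utf8" and "utf-8" in any letter case.
_UTF8_CODESET = re.compile(r"^utf-?8$", re.IGNORECASE)


def normalize_locale_name(name):
    """Rewrite a 'utf8'-style codeset as 'UTF-8'; leave everything else alone."""
    base, dot, rest = name.partition(".")
    if not dot:
        # No codeset part at all, e.g. "C" or "POSIX".
        return name
    # Keep an optional "@modifier" (such as "@euro") intact.
    codeset, at, modifier = rest.partition("@")
    if _UTF8_CODESET.match(codeset):
        codeset = "UTF-8"
    return f"{base}.{codeset}{at}{modifier}"


if __name__ == "__main__":
    for raw in ["C.utf8", "en_US.utf8", "en_US.UTF-8", "C"]:
        print(raw, "->", normalize_locale_name(raw))
```

Only the UTF-8 codeset is touched here; other codesets are passed through unchanged, since the thread does not establish a canonical spelling for them.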