It appears that Plasma's Region and Language KCM is able to effect locale changes for some distributions, with Gentoo being one of them. Specifically, in bug 945269 - which is otherwise unrelated - Bernd Buschinski reported that Plasma goes as far as to inject locales in /etc/locale.gen and execute the locale-gen utility. Moreover, it is prepared to inject locales that are invalid to glibc, such as "en_DE.UTF-8 UTF-8". Of course, there is no such locale as "en_DE".

$ grep -lxFZ LC_IDENTIFICATION /usr/share/i18n/locales/* | xargs -0 basename -a | grep en_DE
$

Bernd, please provide/attach information here regarding your system (emerge --info output, Plasma version etc), along with the necessary steps to be taken in Plasma to reproduce the issue.
I mean. This is quite heavy-handed, but from upstream's POV, it will work as expected save for the bug of course. We do tell users to configure their locales using locale-gen, but /etc/locale.gen might be empty by the time Plasma is installed, either due to neglect or following some odd YouTube tutorial. At least for me, running without root privileges, I didn't manage to have this KCM do anything with /etc/locale.gen, no matter if the file existed (owned by root:root) or if temporarily moved away.
I'm quite surprised it knows about /etc/locale.gen and locale-gen at all.
(In reply to Andreas Sturmlechner from comment #1)
> I mean. This is quite heavy-handed, but from upstream's POV, it will work as
> expected save for the bug of course.

At the time of filing, I was not aware that kde-workspaces contained the code responsible. Now that I am, I would say that it cannot be described as anything that approximates working.

Firstly, to inject malformed locale entries into a file that is adjacent to glibc, without due regard as to how locales actually work in glibc, is absurd. In the case of a glibc-based system, it's not especially difficult to discern which locales and codesets are valid. There is even a trivially-parseable "SUPPORTED" file that documents the supported combinations. (I am surprised that a desktop environment is prepared to act as though locale.gen is within its purview to begin with. It's not a standard utility and locale.gen doesn't even belong to glibc, so to speak. Ad-hoc editing of a single file is hardly robust, for that matter.)

Secondly, Gentoo's locale-gen program - like various other programs named locale-gen - is poorly written and unsafe. I doubt that it is the conviction of upstream that their KCM works as expected merely because this particular locale-gen tolerates the invalid input and doesn't bother to exit non-zero.

It's not as though other distributions necessarily fare any better either. I just tried running locale-gen with "en_DE.UTF-8 UTF-8" as an input in an Arch Linux installation that I have to hand and it:

- bothered to exit non-zero upon encountering the invalid input (good)
- wrecked the existing locale-archive (bad)

Specifically, it wrecked the locale-archive by losing its existing entries and only incorporating whichever valid locales had been passed to localedef up until that point. I haven't yet looked at the implementations of any other distributions and would rather not, in the same sense that I would sooner not observe that which is beyond the trap of my lavatory.
Overall, I would say that the behaviour of the KCM hinges upon a questionable premise and is poorly implemented. It also relies upon a non-standard utility that tends towards being poorly implemented - dangerous even. In turn, this non-standard utility depends on the non-standard GNU options of a standard utility (localedef) that has some rather unfortunate bugs of its own. I would go so far as to suggest that the feature be outright disarmed by the kde-workspaces ebuild. Meanwhile, I have written a locale-gen replacement that rejects invalid inputs and operates in a way that is guaranteed never to corrupt the existing locale-archive. I do not wish for my efforts to be stymied by this matter. I will refer the matter to the KDE project if needs be, though it would very much help if someone were to post the exact steps to reproduce this in Gentoo.
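For illustration only, here is a minimal Python sketch of the kind of validation that the "SUPPORTED" file makes possible. It is not the code of the locale-gen replacement mentioned above, nor anything found in Plasma. It assumes the installed file format of one "locale charmap" pair per line with "#" comments; the function names and the sample data are made up for the example.

```python
def parse_supported(text):
    """Return the set of (locale, charmap) pairs listed in SUPPORTED-style text."""
    pairs = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments and blank lines
        if len(fields) == 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def split_entries(locale_gen_text, supported):
    """Partition locale.gen entries into (valid, invalid) lists, keeping order."""
    valid, invalid = [], []
    for line in locale_gen_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue  # comment-only or empty line
        target = valid if tuple(fields) in supported else invalid
        target.append(" ".join(fields))
    return valid, invalid


# Hypothetical excerpt in the style of /usr/share/i18n/SUPPORTED.
SUPPORTED_SAMPLE = """\
de_DE.UTF-8 UTF-8
de_DE ISO-8859-1
en_US.UTF-8 UTF-8
"""

# The locale.gen contents reported in this bug.
LOCALE_GEN_SAMPLE = """\
en_US.UTF-8 UTF-8
# generated by KDE Plasma Region & Language KCM
en_DE.UTF-8 UTF-8
"""

supported = parse_supported(SUPPORTED_SAMPLE)
valid, invalid = split_entries(LOCALE_GEN_SAMPLE, supported)
print("valid:  ", valid)
print("invalid:", invalid)
```

Whether an invalid entry ought to abort the run or merely produce a warning is a separate policy question; the sketch only separates the two groups.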
Pardon me. Naturally, I meant to write plasma-workspace, not kde-workspaces, in my prior comment.
I can think of an additional reason that the approach taken by the KCM is unacceptable. Consider an ordinary GNU/Linux distribution that has not been obsessively modified, duly providing all 500 supported locales (not counting C and POSIX). The distribution can even be Gentoo. Now let's say that someone interacts with the regional settings in Plasma and, for argument's sake, ends up with the exact same locales being injected into locale.gen as were for Bernd. Keep in mind that the locale.gen file had only consisted of comments up until that point. So, we end up with:

en_DE.UTF-8 UTF-8
de_DE.UTF-8 UTF-8
en_US.UTF-8 UTF-8
de_DE.UTF-8 UTF-8

The KCM proceeds to execute Gentoo's locale-gen. Assuming that it 'succeeds', the consequence is that the user has gained support for no new locales while having just *lost* support for exactly 498 locales, with all manner of potential consequences. Mind boggling.
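The arithmetic behind that figure can be spelled out as a small Python sketch. It assumes, as the scenario does, that locale-gen skips the invalid entry, compiles each remaining entry once, and leaves only those in the archive. The count of 500 is the figure quoted above rather than something measured here, and the KNOWN_TO_GLIBC set is a hypothetical stand-in listing only the entries relevant to the example.

```python
SUPPORTED_COUNT = 500  # figure quoted in the comment; not measured here

# Entries as they ended up in locale.gen, in order.
injected = [
    "en_DE.UTF-8 UTF-8",
    "de_DE.UTF-8 UTF-8",
    "en_US.UTF-8 UTF-8",
    "de_DE.UTF-8 UTF-8",
]

# Hypothetical stand-in for "is this entry known to glibc?".
KNOWN_TO_GLIBC = {"de_DE.UTF-8 UTF-8", "en_US.UTF-8 UTF-8"}

# The duplicate collapses and the invalid entry contributes nothing.
remaining = {entry for entry in injected if entry in KNOWN_TO_GLIBC}
lost = SUPPORTED_COUNT - len(remaining)
print(f"{len(remaining)} locales remain, {lost} lost")
```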
(In reply to kfm from comment #3) > At the time of filing, I was not aware that kde-workspaces contained the > code responsible. Now that I am, I would say that it cannot be described as > anything that approximates working. > [...] > I will refer the matter to the KDE project if needs > be, though it would very much help if someone were to post the exact steps > to reproduce this in Gentoo. Shouldn't we wait for the latter to establish the former? Unless you can point at the code that is responsible for injecting wrong locales already? *I'm* not saying the Plasma KCM's code works correctly. But they felt the need to implement that thing, and if we are to hard disable it, we should at least establish why and how we are above it.
(In reply to Andreas Sturmlechner from comment #6)
> Shouldn't we wait for the latter to establish the former?

Yes. I was hoping for the original reporter to have at least confirmed which version of Plasma they were using by now.

> Unless you can point at the code that is responsible for injecting wrong locales already?

While I maintain that it's a bad idea at large, I skimmed the code. The "kcms/region_language/kcmregionandlang.cpp" unit contains the "KCMRegionAndLang::constructGlibcLocaleMap" method. It attempts to collect a list of regular files from /usr/share/i18n/locales. If the resulting list is empty, it falls back to executing `localectl list-locales --no-pager`. Eventually, it interacts with the helper service over a D-Bus socket, wherein `locale-gen` may be executed.

Incidentally, the file collection routine is not strictly correct because it does not inspect the contents of the file. It really ought to look for the presence of the LC_IDENTIFICATION field. This is another matter, however.

As of master at least, it seems unlikely that the aforementioned method was responsible. The only way that it could have been is for either:

- /usr/share/i18n/locales/en_DE to exist as a file
- `localectl list-locales` to print a locale incorporating "en_DE"

As far as I know, it isn't possible for either condition to arise in Gentoo under normal circumstances. Although it has previously been requested for "en_DE" to be supported, the request was never fulfilled.

https://www.sourceware.org/bugzilla/show_bug.cgi?id=22535

As of yet, I haven't looked into how the composed locale map is operated on in any detail. It could be that something goes amiss at that point. I'll attempt to evaluate that possibility.
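To make the point about LC_IDENTIFICATION concrete, here is a rough Python sketch of a stricter collection routine. It mirrors the `grep -x -F LC_IDENTIFICATION` test used in the opening comment and is not a translation of the KCM's C++ code. The function names are hypothetical, and the demonstration runs against a temporary directory rather than /usr/share/i18n/locales.

```python
import os
import tempfile


def looks_like_locale_source(path):
    """True if some line of the file consists solely of LC_IDENTIFICATION."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            return any(line.rstrip("\n") == "LC_IDENTIFICATION" for line in handle)
    except OSError:
        return False


def collect_locale_sources(directory):
    """Return sorted names of regular files in directory that pass the check."""
    names = []
    for entry in os.scandir(directory):
        if entry.is_file() and looks_like_locale_source(entry.path):
            names.append(entry.name)
    return sorted(names)


# Self-contained demonstration with two made-up files: one shaped like a
# locale definition, one shaped like a transliteration table.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "en_US"), "w") as f:
        f.write("comment_char %\nLC_IDENTIFICATION\nEND LC_IDENTIFICATION\n")
    with open(os.path.join(tmp, "translit_combining"), "w") as f:
        f.write("LC_CTYPE\ntranslit_start\ntranslit_end\nEND LC_CTYPE\n")
    found = collect_locale_sources(tmp)
    print(found)
```

A directory listing alone would have reported both files; the content check keeps only the one that declares an LC_IDENTIFICATION section.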
Another thing to consider is https://invent.kde.org/plasma/plasma-workspace/-/commit/5e2c176, which is only 2 months old. And 6.2.4 was only stabilised 22 days ago for the mainstream arches (amd64 and arm64).
s/2 months/3 months/. The point is that the matching against the gathered glibc locale names was wholly broken, leading to the involvement of heuristics (cough).
Reproduced in Plasma 6.2.4 with the following steps.

1) Have "en_US.UTF-8 UTF-8" be the sole entry in both locale.gen and the archive
2) Start a Plasma session
3) Open the Region & Language settings
4) Modify the "Numbers" row, setting it to "English (Germany)"
5) Apply the changes
6) Provide password if prompted (for polkit interaction)
7) Examine locale.gen

$ cat /etc/locale.gen
en_US.UTF-8 UTF-8
# generated by KDE Plasma Region & Language KCM
en_DE.UTF-8 UTF-8

I was also able to induce other silly bugs, such as having it add the same entry to locale.gen many times over. For that, elect to modify the "Language" and add at least one secondary language.

By the way, "English (Germany)" describes a locale that is valid in terms of the Unicode CLDR, and is therefore known to QLocale.
Oh, and one can skip step #1 even. Indeed, the KCM will blithely populate an empty locale.gen with erroneous entries, even exclusively. Thus, it wields the power to transform a system from a state whereby it supports every locale to one that supports precisely none in one fell swoop. That includes Gentoo's locale-gen, which is adequately atrocious. I've decided that I'm not going to take this upstream because I don't wish to sanction any of this nonsense as a matter of principle. Just look at https://invent.kde.org/plasma/plasma-workspace/-/commit/85ffb10 - sunk cost fallacy at work. The appropriate course of action is for vendors to say "no" and specify -DGLIBC_LOCALE_GEN=OFF, and perhaps consider what problem locale-gen was supposed to be solving in the longer term. If I am unable to convince Gentoo of this then so be it. I may then have to adjust my reimplementation of locale-gen so as to simply warn of bogus locales rather than immediately reject them and exit 1. I can do that and still have it afford all of the safety guarantees that the present implementation does not.
(In reply to kfm from comment #11) > If I am unable to convince Gentoo of this then so be it. Where did you get that notion? Please don't preempt our reaction to this bug just because we do not immediately change things in a knee-jerk reaction. I'd like to understand what it is they tried to fix on their end with that code - while changing to GLIBC_LOCALE_GEN=OFF, if need be, we could check for a certain misconfiguration in pkg_setup() and throw back at the user what is ultimately part of their responsibilities of setting up a system. You've clearly investigated this thoroughly, so it would be a pity not to report it upstream. They might as well come to the conclusion that their idea of manipulating locales is futile.
(from upstream root CMakeLists.txt) > # notes for packager: > [...] > # For Glibc systems that don't come with pre-generated locales, such as ArchLinux > # This KCM uses "/etc/locale.gen" and "locale-gen" to generate configured locales > # and display a note to let user install fonts themselves if required > # You shouldn't required to do anything in this case > # > # For Glibc systems that come with pre-generated locales, such as Fedora and openSUSE > # a note to let user install fonts themselves if required is displayed > # You should enable GLIBC_LOCALE_PREGENERATED option > # > # For non-glibc systems such as VoidLinux and *BSD > # A warning of configure locale manually is displayed (although the relevant ENVs are set by Plasma) > # You should disable GLIBC_LOCALE_GEN option Shouldn't we do something like 1.) check for glibc 2.) if glibc, set -DGLIBC_LOCALE_PREGENERATED=ON + add a (non-)(?)fatal check to pkg_setup() if no locales yet generated 3.) if not glibc, set -DGLIBC_LOCALE_GEN=OFF + relay upstream's warning to the user to configure locale manually And it is entirely possible that I am misunderstanding GLIBC_LOCALE_PREGENERATED as well, so if you've come across this when looking at the code, and think it is also a non-starter, please let me know.
6.2.5 release is scheduled for Tue 2024-12-31, so whatever we decide to do can be tested from kde overlay stable branch ebuilds for a couple of days.
(In reply to Andreas Sturmlechner from comment #12)
> (In reply to kfm from comment #11)
> > If I am unable to convince Gentoo of this then so be it.
> Where did you get that notion? Please don't preempt our reaction to this bug

Note that I wrote "if", neither "since" nor "because". There are various potential outcomes from having reported this bug, and it is simply a fact that this may be one of them. Granted, I was preempting a possible outcome. Fair enough. But do try to see it from my perspective also.

- I wrote an implementation of locale-gen which addresses all of its issues
- it required effort and time to do it properly
- my foremost concern is for my work to be incorporated
- not all effort is boundlessly fungible

Now, my implementation presently rejects all invalid inputs at an early stage, with useful diagnostics but without further processing. When Bernd raised this issue, I realised it as presenting a serious problem: the implementation cannot be so strict while co-existing with a component of a popular desktop environment that expects for a mixture of valid and invalid inputs to be tolerated. Hence my vested interest in this issue; I am primarily interested in it in so far as it conflicts with my intent. My goal is to improve locale-gen and render it submissible, not to improve Plasma per se.

> just because we do not immediately change things in a knee-jerk reaction.

In point of fact, I issued no demand of you in your capacity as a maintainer. On the matter of change, I wrote:

"I would go so far as to suggest that the feature be outright disarmed by the kde-workspaces ebuild."

This is self-evidently a suggestion, not a demand. Later, I put it more forthrightly.

"The appropriate course of action is for vendors to say "no" [...]"

This is an opinion, not a demand. Further, at no point did I imply that the decision is mine to make.
Ultimately, I have no more of a right to tell you what to do with your time than you would have in telling me what to do with mine. Should I reach the conclusion - as I did - that I presently do not wish to argue with an upstream about code that I think ought to be jettisoned then that is that. I wish to get locale-gen done, not argue with people about code that I am culturally expected to pretend isn't claptrap.

> I'd like to understand what it is they tried to fix on their end with that

OK. To begin with, consider the origins of locale-gen as a utility developed by Debian, to address a problem experienced by Debian. Thanks to Debian actually shipping man pages, we may consult locale-gen(8) for the rationale.

"By default, the locale package which provides the base support for localisation of libc-based programs does not contain usable localisation files for every supported language. This limitation has became necessary because of the substantial size of such files and the large number of languages supported by libc. As a result, Debian uses a special mechanism where we prepare the actual localisation files on the target host and distribute only the templates for them."

So, what is it? An erstwhile space-saving measure. How relevant is it in 2024? I'll leave that latter question as a rhetorical one because to attempt to address it would go well beyond the purview of this bug, and I don't have a concrete answer myself at the present time.

Now, while the situation varies by GNU/Linux distribution, there are some whose installation process naturally yields a system where all 500 supported glibc locales are available from the outset. And there are others which limit the initial set of supported locales, in which case the installation process would have required for the user to specify their desired locales in some capacity - or, at the very least - their primary locale.
In a sense, Gentoo exists in both categories because it defaults to providing all of the supported locales, with no interaction required on the part of the user. Yet, the user may modify the system so as to wilfully reduce that set of 500 to some particular subset thereof. Gentoo just so happens to use a Debian-style locale-gen utility to facilitate this. It stands to reason that one cannot have a mechanism by which something is subtracted from the whole without entailing loss. It is an incontrovertible fact that, in reducing the set of supported locales to a smaller subset, the system becomes *less* functional.

Given all this, it's not especially difficult to surmise the thought process of the Plasma authors. I can imagine an internal monologue that unfolded roughly as follows.

"Our Region & Language KCM allows for arbitrary locales to become an operational requirement as an immediate consequence of user interaction. But what if they are using an operating system for which the complete set of locales was not immediately made available? Calls to setlocale(3) might fail, and applications might not work as expected. Well, we could display a diagnostic message in the case that any of the mapped libc locales prove not to be available, advising the user to consult the documentation of their distribution for further guidance ..."

And here is where the thought process goes off the rails.

"... but wait! We already have a buggy CLDR to glibc locale mapper that we can never seem to get right. Would it not be a fine thing for us to probe for the existence of a distro-specific utility that affects the set of available glibc locales, quietly modify a root-owned file that we presume to govern its input, hook our mapper up to that and pray that it shall not be interrupted? Why yes, it would be quite splendid!"

I rather think not.
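The diagnostic-only approach from the first half of that imagined monologue needs nothing more than a probe of setlocale(3). The following is a minimal Python sketch of such a probe, not a description of what the KCM does. The function names are hypothetical. Note that setlocale() changes process-wide state, so a real implementation would have to take care in a threaded program, and that some non-glibc C libraries accept names that glibc would refuse.

```python
import locale


def locale_is_available(name, category=locale.LC_CTYPE):
    """Return True if setlocale() accepts name for the given category."""
    previous = locale.setlocale(category)  # query only; changes nothing
    try:
        locale.setlocale(category, name)
    except locale.Error:
        return False
    finally:
        locale.setlocale(category, previous)  # restore the caller's locale
    return True


def unavailable(names):
    """Filter names down to those that setlocale() refuses."""
    return [name for name in names if not locale_is_available(name)]


# "C" is always present; the other results depend on the machine this runs on.
print(unavailable(["C", "en_US.UTF-8", "en_DE.UTF-8"]))
```

Any name that ends up in the returned list is a candidate for a message pointing the user at the documentation of their distribution.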
Besides, and again, it is a fact that the mere act of executing locale-gen may reduce a (potentially complete) set of available locales to a smaller (sadly, potentially empty) set of available locales, as things stand. That does not seem like a positive outcome for having innocently interacted with the KCM's UI. There are various other problems with it but I have said enough. It has no credibility whatsoever as a serious 'solution'.

> code - while changing to GLIBC_LOCALE_GEN=OFF, if need be, we could check
> for a certain misconfiguration in pkg_setup() and throw back at the user
> what is ultimately part of their responsibilities of setting up a system.

What would it check for exactly? Incidentally, I experimented with a -DGLIBC_LOCALE_GEN=OFF build of plasma-workspace. In a context where it would otherwise have chosen to execute locale-gen, it shows a banner stating:

"Locale has been configured, but this KCM currently doesn't support auto locale generation on your system, please refer to your distribution's manual to install fonts and generate locales."

Not bad as far as diagnostic messages go. However, it doesn't seem to bother checking whether the newly selected locale(s) can be genuinely effected before determining whether the warning needs to be shown. That much seems like a missed opportunity, given all the other proxy checks and heuristics encapsulated by the module.

I look at it this way: he who modifies /etc/locale.gen ought to know what he is getting into. If there is evidence to suggest that Gentoo users tend to - or are even guided - into doing so without being duly forewarned, then it may be that there is an issue of documentation and/or culture that needs addressing. The simplest way for the hypothetical average Gentoo user to avoid locale-related problems is to never touch /etc/locale.gen, ever. Except for plasma-workspace[policykit] users ...

>
> You've clearly investigated this thoroughly, so it would be a pity not to
> report it upstream. They might as well come to the conclusion that their
> idea of manipulating locales is futile.
(In reply to Andreas Sturmlechner from comment #13) > (from upstream root CMakeLists.txt) > > # notes for packager: > > [...] > > # For Glibc systems that don't come with pre-generated locales, such as ArchLinux > > # This KCM uses "/etc/locale.gen" and "locale-gen" to generate configured locales > > # and display a note to let user install fonts themselves if required > > # You shouldn't required to do anything in this case > > # > > # For Glibc systems that come with pre-generated locales, such as Fedora and openSUSE > > # a note to let user install fonts themselves if required is displayed > > # You should enable GLIBC_LOCALE_PREGENERATED option > > # > > # For non-glibc systems such as VoidLinux and *BSD > > # A warning of configure locale manually is displayed (although the relevant ENVs are set by Plasma) > > # You should disable GLIBC_LOCALE_GEN option > > Shouldn't we do something like > 1.) check for glibc > 2.) if glibc, set -DGLIBC_LOCALE_PREGENERATED=ON > + add a (non-)(?)fatal check to pkg_setup() if no locales yet generated > 3.) if not glibc, set -DGLIBC_LOCALE_GEN=OFF > + relay upstream's warning to the user to configure locale manually > > And it is entirely possible that I am misunderstanding > GLIBC_LOCALE_PREGENERATED as well, so if you've come across this when > looking at the code, and think it is also a non-starter, please let me know. I would need to inspect a Fedora system again to be sure, but I think it's the case that Fedora ships all locales as a matter of course. At least, there is no locale-gen utility. In turn, there is no supported way of reducing the available set of locales. Presumably, -DGLIBC_LOCALE_PREGENERATED exists so that vendors such as Red Hat can signify to plasma-workspace that it can presume that the complete set of locales is always available. I see what you're getting at, though. 
In that case, the prospective pkg_setup() check might be more useful if it were to warn in the case that the locale archive consists of any smaller subset of the complete set of locales defined by /usr/share/i18n/SUPPORTED. Not necessarily in an overbearing manner but along the lines of being on your own if you've reduced the set of available locales and intend to interact with the KCM component. Just a thought.
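As a rough idea of what such a check would compare, here is a Python sketch that reports which SUPPORTED entries are absent from a listing of the locale archive. An actual pkg_setup() check would be written in bash; this only illustrates the comparison. It assumes that the listing comes from something like `localedef --list-archive`, and that archive names carry a normalised codeset (lowercased, with non-alphanumeric characters removed, so that "en_US.UTF-8" appears as "en_US.utf8"). The function names and sample data are hypothetical.

```python
import re


def archive_name(locale_name):
    """Approximate the archive name for a SUPPORTED locale name (assumption)."""
    match = re.fullmatch(r"([^.@]+)(?:\.([^@]+))?(@.+)?", locale_name)
    if match is None:
        return locale_name
    base, codeset, modifier = match.groups()
    name = base
    if codeset:
        # Assumed normalisation: "UTF-8" -> "utf8", "ISO-8859-1" -> "iso88591".
        name += "." + re.sub(r"[^0-9a-z]", "", codeset.lower())
    return name + (modifier or "")


def missing_locales(supported_text, archive_listing):
    """Return SUPPORTED locale names that have no counterpart in the listing."""
    present = set(archive_listing.split())
    missing = []
    for line in supported_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 2 and archive_name(fields[0]) not in present:
            missing.append(fields[0])
    return missing


# Hypothetical inputs: a four-entry SUPPORTED excerpt and an archive that
# holds a single locale.
SUPPORTED_SAMPLE = """\
de_DE.UTF-8 UTF-8
de_DE ISO-8859-1
de_DE@euro ISO-8859-15
en_US.UTF-8 UTF-8
"""
ARCHIVE_SAMPLE = "en_US.utf8\n"

missing = missing_locales(SUPPORTED_SAMPLE, ARCHIVE_SAMPLE)
print(f"{len(missing)} supported locale(s) not in the archive: {missing}")
```

A non-empty result would be the cue for the warning; an empty one would mean that the complete set is present and nothing needs to be said.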
(In reply to Andreas Sturmlechner from comment #14)
> 6.2.5 release is scheduled for Tue 2024-12-31, so whatever we decide to do
> can be tested from kde overlay stable branch ebuilds for a couple of days.

Thanks. I don't consider myself to be blocked on my locale-gen work at this point. The situation was less clear at the time of opening.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/kde.git/commit/?id=c32480e88d551853c179ddcf84e264df51a3c1b6

commit c32480e88d551853c179ddcf84e264df51a3c1b6
Author:     Andreas Sturmlechner <asturm@gentoo.org>
AuthorDate: 2024-12-29 19:00:11 +0000
Commit:     Andreas Sturmlechner <asturm@gentoo.org>
CommitDate: 2024-12-29 19:24:09 +0000

    kde-plasma/plasma-workspace: Set GLIBC_LOCALE_PREGENERATED=ON

    ... and GLIBC_LOCALE_GEN=OFF.

    Bug: https://bugs.gentoo.org/946289
    Signed-off-by: Andreas Sturmlechner <asturm@gentoo.org>

 kde-plasma/plasma-workspace/plasma-workspace-6.2.49.9999.ebuild | 8 +++-----
 kde-plasma/plasma-workspace/plasma-workspace-9999.ebuild | 8 +++-----
 2 files changed, 6 insertions(+), 10 deletions(-)