I have generated some locales with locale-gen for localization tests (from sys-libs/glibc-2.6.1 (/usr/sbin/locale-gen)).

The system is:
Linux 2.6.18-gentoo-r4 #3 SMP Wed May 9 23:35:47 CEST 2007 i686 Mobile Genuine Intel(R) processor 1500MHz GenuineIntel GNU/Linux

When I generate ja_JP.JIS_X0201 it breaks the previously generated ja_JP.SHIFT_JIS.

Reproducible: Always

Steps to Reproduce:

1. In my /etc/locale.gen, if I have this line (and some others):

ja_JP SHIFT_JIS

then locale-gen displays some strange message about this locale, but apparently it works all the same:

# LC_ALL=POSIX locale-gen
 * Generating 5 locales (this might take a while) with 1 jobs
 * (1/5) Generating ja_JP.UTF-8 ... [ ok ]
 * (2/5) Generating fr_FR.ISO-8859-1 ... [ ok ]
 * (3/5) Generating fr_FR.UTF-8 ... [ ok ]
 * (4/5) Generating fr_FR.ISO-8859-15@euro ... [ ok ]
 * (5/5) Generating ja_JP.SHIFT_JIS ...
character map `SHIFT_JIS' is not ASCII compatible, locale not ISO C compliant [ ok ]
 * Generation complete

2. # locale -a
[... some locales ...]
ja_JP.shiftjis
[...]

The encoding is:

# LC_ALL=ja_JP.shiftjis locale charmap
SHIFT_JIS

3. Now I add a new line (and keep the previous ones) in /etc/locale.gen:

ja_JP JIS_X0201

4. While running locale-gen, this new locale does not seem to generate well (I don't know if copying here the whole display is good, sorry if it is too big):

 * (6/6) Generating ja_JP.JIS_X0201 ...
character map `JIS_X0201' is not ASCII compatible, locale not ISO C compliant /usr/share/i18n/locales/ja_JP:14877: LC_MESSAGES: unknown character in field `yesexpr' /usr/share/i18n/locales/ja_JP:14879: LC_MESSAGES: unknown character in field `noexpr' /usr/share/i18n/locales/ja_JP:14880: LC_MESSAGES: unknown character in field `yesstr' /usr/share/i18n/locales/ja_JP:14881: LC_MESSAGES: unknown character in field `nostr' /usr/share/i18n/locales/ja_JP:14914: LC_TIME: unknown character in field `abday' /usr/share/i18n/locales/ja_JP:14914: LC_TIME: unknown character in field `abday' /usr/share/i18n/locales/ja_JP:14914: LC_TIME: unknown character in field `abday' /usr/share/i18n/locales/ja_JP:14914: LC_TIME: unknown character in field `abday' /usr/share/i18n/locales/ja_JP:14914: LC_TIME: unknown character in field `abday' /usr/share/i18n/locales/ja_JP:14914: LC_TIME: unknown character in field `abday' /usr/share/i18n/locales/ja_JP:14914: LC_TIME: unknown character in field `abday' /usr/share/i18n/locales/ja_JP:14916: LC_TIME: unknown character in field `day' /usr/share/i18n/locales/ja_JP:14916: LC_TIME: unknown character in field `day' /usr/share/i18n/locales/ja_JP:14917: LC_TIME: unknown character in field `day' /usr/share/i18n/locales/ja_JP:14917: LC_TIME: unknown character in field `day' /usr/share/i18n/locales/ja_JP:14918: LC_TIME: unknown character in field `day' /usr/share/i18n/locales/ja_JP:14918: LC_TIME: unknown character in field `day' /usr/share/i18n/locales/ja_JP:14919: LC_TIME: unknown character in field `day' /usr/share/i18n/locales/ja_JP:14921: LC_TIME: unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14921: LC_TIME: unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14922: LC_TIME: unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14922: LC_TIME: unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14923: LC_TIME: unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14923: LC_TIME: 
unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14924: LC_TIME: unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14924: LC_TIME: unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14925: LC_TIME: unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14925: LC_TIME: unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14926: LC_TIME: unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14926: LC_TIME: unknown character in field `abmon' /usr/share/i18n/locales/ja_JP:14928: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14928: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14929: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14929: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14930: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14930: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14931: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14931: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14932: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14932: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14933: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14933: LC_TIME: unknown character in field `mon' /usr/share/i18n/locales/ja_JP:14935: LC_TIME: unknown character in field `d_t_fmt' /usr/share/i18n/locales/ja_JP:14937: LC_TIME: unknown character in field `d_fmt' /usr/share/i18n/locales/ja_JP:14939: LC_TIME: unknown character in field `t_fmt' /usr/share/i18n/locales/ja_JP:14941: LC_TIME: unknown character in field `am_pm' /usr/share/i18n/locales/ja_JP:14941: LC_TIME: unknown character in field `am_pm' /usr/share/i18n/locales/ja_JP:14943: LC_TIME: unknown character in field `t_fmt_ampm' /usr/share/i18n/locales/ja_JP:14945: 
LC_TIME: unknown character in field `era' /usr/share/i18n/locales/ja_JP:14955: LC_TIME: unknown character in field `era_d_fmt' /usr/share/i18n/locales/ja_JP:14957: LC_TIME: unknown character in field `era_d_t_fmt' /usr/share/i18n/locales/ja_JP:14962: LC_TIME: unknown character in field `date_fmt' /usr/share/i18n/locales/ja_JP:14964: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14964: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14964: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14964: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14964: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14964: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14965: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14965: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14965: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14965: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14965: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14966: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14966: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14966: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14967: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14967: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14967: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14968: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14968: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14968: LC_TIME: unknown character in 
field `alt_digits' /usr/share/i18n/locales/ja_JP:14969: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14969: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14970: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14970: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14971: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14971: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14972: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14972: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14973: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14973: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14974: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14974: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14975: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14975: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14976: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14976: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14977: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14977: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14978: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14978: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14979: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14979: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14980: LC_TIME: unknown character in field `alt_digits' 
/usr/share/i18n/locales/ja_JP:14980: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14981: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14981: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14982: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14982: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14983: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14983: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14984: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14984: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14985: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14985: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14986: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14986: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14987: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14987: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14988: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14988: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14989: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14989: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14990: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14990: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14991: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14991: LC_TIME: unknown character in field `alt_digits' 
/usr/share/i18n/locales/ja_JP:14992: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14992: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14993: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14993: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14994: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14994: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14995: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14995: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14996: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14996: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14997: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14997: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14998: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14998: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14999: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:14999: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:15000: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:15000: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:15001: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:15001: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:15002: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:15002: LC_TIME: unknown character in field `alt_digits' /usr/share/i18n/locales/ja_JP:15003: LC_TIME: unknown character in field `alt_digits' 
/usr/share/i18n/locales/ja_JP:15003: LC_TIME: unknown character in field `alt_digits'
/usr/share/i18n/locales/ja_JP:15004: LC_TIME: unknown character in field `alt_digits'
/usr/share/i18n/locales/ja_JP:15004: LC_TIME: unknown character in field `alt_digits'
/usr/share/i18n/locales/ja_JP:15005: LC_TIME: unknown character in field `alt_digits'
/usr/share/i18n/locales/ja_JP:15005: LC_TIME: unknown character in field `alt_digits'
/usr/share/i18n/locales/ja_JP:15006: LC_TIME: unknown character in field `alt_digits'
/usr/share/i18n/locales/ja_JP:15006: LC_TIME: unknown character in field `alt_digits'
/usr/share/i18n/locales/ja_JP:15007: LC_TIME: unknown character in field `alt_digits'
/usr/share/i18n/locales/ja_JP:15007: LC_TIME: unknown character in field `alt_digits'
/usr/share/i18n/locales/ja_JP:15008: LC_TIME: unknown character in field `alt_digits'
/usr/share/i18n/locales/ja_JP:15008: LC_TIME: unknown character in field `alt_digits'
/usr/share/i18n/locales/ja_JP:15027: LC_NAME: unknown character in field `name_gen'
LC_MESSAGES: value for field `yesexpr' must not be an empty string
LC_MESSAGES: value for field `noexpr' must not be an empty string [ !! ]
 * Generation complete

Actual Results:

The strangest part is that locale -a then shows both locales:

ja_JP.jisx0201
ja_JP.shiftjis

and, in particular, they now both report the same JIS_X0201 charmap:

# LC_ALL=ja_JP.jisx0201 locale charmap
JIS_X0201
# LC_ALL=ja_JP.shiftjis locale charmap
JIS_X0201

Expected Results:

Of course the expected result is:
- first, that there should not be these errors while generating ja_JP.JIS_X0201;
- also, it is not clear whether the generation really failed: it finished with [ !! ], but the locale is still in my updated list of available locales, and no error is shown when I set this locale;
- and finally, why did it "break" the SHIFT_JIS encoding? This is not very good, as the encoding is one of the most important parts of the locale...
I have just made a small test. If I do the same, but invert the lines in locale.gen:

ja_JP JIS_X0201
ja_JP SHIFT_JIS

then I of course get the same errors on screen (and only JIS_X0201 "apparently" fails at the end), but now both encodings are shown as SHIFT_JIS:

# LC_ALL=ja_JP.shiftjis locale charmap
SHIFT_JIS
# LC_ALL=ja_JP.jisx0201 locale charmap
SHIFT_JIS

It is as though the last one is the "winner". I also have a ja_JP.UTF-8, but it still shows a UTF-8 charmap, and I have several fr_FR locales with different encodings, without any such "encoding stealing". It happens only for these 2 lines.

I guess this may be related to the errors displayed? Or maybe it is because JIS_X0201 is "used" in ISO_2022, on which SHIFT_JIS is apparently based (as far as I understood). Anyway, there is an error somewhere.
You definitely know what is going on here better than I do. Could you please file a bug upstream as this behaviour seems to have existed for a very long time? http://sources.redhat.com/bugzilla/
i'll take care of triaging/moving upstream. re-opening until that happens.
the warning from trying to generate ja_JP.SHIFT_JIS should be there. if you look at the character map, it is slightly modified from standard ASCII:

  byte 0x5C should be \ but it's ¥ instead
  byte 0x7E should be ~ but it's ‾ instead

ISO C requires the bytes in the ASCII range (0x00 through 0x7F) to have the same values as ASCII. this charmap does not, hence you get a warning.
https://en.wikipedia.org/wiki/Shift_JIS#Shift_JIS_byte_map

for the 2nd part, your config file is invalid. the first col needs to be unique because that's the value used when setting locale variables. so when you do:

  ja_JP SHIFT_JIS

this allows you to do LANG=ja_JP and it'll be the same as ja_JP.SHIFT_JIS. but when you then do:

  ja_JP SHIFT_JIS
  ja_JP JIS_X0201

the 2nd entry clobbers the first one. you instead want to do:

  ja_JP.SHIFT_JIS SHIFT_JIS
  ja_JP.JIS_X0201 JIS_X0201

now you can do LANG=ja_JP.SHIFT_JIS and LANG=ja_JP.JIS_X0201. i think the default for LANG=ja_JP should be:

  ja_JP EUC-JP

although you're free to set it however you like on your system.

the locale.gen config file is misleading here in its comments so i cleaned that up:
http://sources.gentoo.org/gentoo/src/patchsets/glibc/extra/locale/locale.gen?r1=1.1&r2=1.2
http://sources.gentoo.org/gentoo/src/patchsets/glibc/extra/locale/locale.gen.5?r1=1.3&r2=1.4

also the locale-gen tool should catch & warn about this, so i fixed that too:
http://sources.gentoo.org/gentoo/src/patchsets/glibc/extra/locale/locale-gen?r1=1.37&r2=1.38

for the last part, all the spew when trying to generate ja_JP.JIS_X0201 is correct. let's break it down one at a time.
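as a side note, the "first col must be unique" rule is easy to check by hand. the snippet below is only a rough sketch and not the actual check that went into locale-gen: the function name find_duplicate_locales and the sample file are made up for illustration, and it assumes the locale name is simply the first whitespace-separated field of each non-comment line.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of locale-gen): print every locale name
# (first column) that occurs more than once in a locale.gen-style file.
find_duplicate_locales() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
        { count[$1]++ }                      # tally the first column
        count[$1] == 2 { print $1 }          # report each name once, on its 2nd sighting
    ' "$1"
}

# Sample input reproducing the configuration from this report.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# sample locale.gen
ja_JP.UTF-8 UTF-8
ja_JP SHIFT_JIS
ja_JP JIS_X0201
EOF

find_duplicate_locales "$sample"   # prints: ja_JP
rm -f "$sample"
```

with the corrected entries (ja_JP.SHIFT_JIS and ja_JP.JIS_X0201 in the first column) the same check prints nothing.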
the first warning:

  /usr/share/i18n/locales/ja_JP:14877: LC_MESSAGES: unknown character in field `yesexpr'

if we look at yesexpr in that file, it has:

  yesexpr "<U005E><U0028><U005B><U0079><U0059><UFF59><UFF39><U005D>/
           <U007C><U306F><U3044><U007C><U30CF><U30A4><U0029>"

if we look at all the characters defined in /usr/share/i18n/charmaps/JIS_X0201, we see that it does not define these two that are used in the yesexpr:

  は <U306F> /xe3/x81/xaf HIRAGANA LETTER HA
  い <U3044> /xe3/x81/x84 HIRAGANA LETTER I

and if we consult the encoding for JIS_X_0201, we see that while it provides the katakana alphabet, it does not provide any hiragana characters:
https://en.wikipedia.org/wiki/JIS_X_0201

so localedef complains that it is not possible to create a "yesexpr" because it wants to include hiragana, but the encoding only supports katakana. while the warning is confusing, it's more or less working as intended (WAI).

i think i covered everything, albeit not exactly timely ;).
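for anyone who wants to pick the hiragana out of such a field without reading the <Uxxxx> codes by eye, here is a small sketch. it is not something localedef does: the function name list_hiragana_codepoints is made up, it does not read the charmap file at all, and it only flags code points in the Unicode Hiragana block (U+3040 to U+309F).

```shell
#!/bin/sh
# Hypothetical helper: list the <Uxxxx> code points of a locale source
# string that lie in the Unicode Hiragana block (U+3040..U+309F).
# It does not consult any charmap; it only classifies by code point range.
list_hiragana_codepoints() {
    printf '%s\n' "$1" |
        tr '<' '\n' |                                   # one <Uxxxx> token per line
        sed -n 's/^U\([0-9A-Fa-f]\{4\}\)>.*/\1/p' |     # keep just the 4 hex digits
        while read -r hex; do
            cp=$((0x$hex))                              # hex digits -> integer
            if [ "$cp" -ge $((0x3040)) ] && [ "$cp" -le $((0x309F)) ]; then
                echo "U+$hex"
            fi
        done
}

# The yesexpr value quoted above, joined onto one line.
yesexpr='<U005E><U0028><U005B><U0079><U0059><UFF59><UFF39><U005D><U007C><U306F><U3044><U007C><U30CF><U30A4><U0029>'
list_hiragana_codepoints "$yesexpr"
```

for the yesexpr above this lists U+306F and U+3044, i.e. the two hiragana letters named in the explanation.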