At the moment the relevant USE flags seem to be:

cjk - Adds support for Multi-byte character languages (Chinese, Japanese, Korean)
nls - Adds Native Language Support (using gettext - GNU locale utilities)

If I want to enable Unicode support or similar for an application (say, vim), only cjk seems applicable at the moment, and it is a tad misleading (there are hundreds of other languages in Unicode!).

I'd like to request a multibyte or unicode USE flag, defined as:

Adds support for Multi-byte characters

and to redefine cjk as:

Adds support for Chinese, Japanese, Korean

preferably deprecating cjk, since only in a few rare cases would it not be covered by a combination of nls and unicode.
Hello. Have you seen Bug 9988?
Yes. While similar, it didn't seem quite applicable. Setting a language locks one to a specific language. Enabling multibyte character support allows a variety of languages, even if one's primary language is English. One could, for example, want multibyte characters in order to view Tengwar in vi (at least I seem to remember Tengwar is in Unicode :) ).

Setting a LANG variable, on the other hand, is specific. I don't particularly want to localize to, say, Japanese, since my Japanese is quite weak. I *do* want to be able to type in French (which is much stronger) and view French accents and Japanese characters. I don't want a French keyboard layout.

Multibyte is a fairly specific option. I added this in part due to bug 26558, which I just filed, requesting that +multi_byte in Vim be flaggable (it is on by default right now).

Thanks for your interest, though! (ありがとう ございます ? ) :)
"Enabling multibyte character support allows a variety of languages, even if one's primary language is english." Yes and no!. Mutibyte character support is by definition different from unicode support. Multibyte characters are those in which the number of characters is too big to fit in one byte - Chinese, Japanese, and Korean (possibly Thai too) are examples. French and tengwar are NOT multibyte character sets as the symbols of the language can be represented by one byte. I support the call for a unicode flag - Gentoo currently has very poor support for unicode outside of English and the main European languages However if you want to use a flag for setting unicode "nls" is much better current flag to use as "cjk" was designed to add multibyte character support fro encodings such as euc, sjis, jis, gbk, big5, euc_kr. I'm unaware of any software package that lumps unicode (utf08, utf16 etc) in with them as the coding to support unicode is often very different than that for the CJK encodings.
stubear, thank god you showed up :)
It appears that I was wrong and that vim does lump Unicode together with CJK support. I haven't had a chance to look at the source directly, but from what I have read online this seems to be the case. However, this will cause problems if someone has "+cjk -unicode" as flags, as your suggested flag will remove support for both Unicode and CJK from vim.

Why would someone want "+cjk" and "-unicode"? Well, a lot of Japanese users aren't happy with the unification of Chinese and Japanese characters in Unicode, and a number are quite strongly against Unicode.

Again, I strongly support a "unicode" flag, but care must be taken not to deprecate the "cjk" flag, because for most software that has a "cjk" flag, "unicode" + "nls" does not equal "cjk"!
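The "+cjk -unicode" problem can be modelled with a toy sketch. It assumes, as described above, that vim exposes Unicode and CJK support through a single build feature, and that the proposed "unicode" flag alone would control it. The function names are hypothetical and this is not how Portage itself evaluates USE flags.

```python
def vim_multibyte_single_flag(use_flags):
    """Toy model: only the 'unicode' flag decides whether the multibyte feature is built."""
    return "unicode" in use_flags


def lost_cjk(use_flags, multibyte_enabled):
    """True when the user asked for cjk but the multibyte feature was left out."""
    return "cjk" in use_flags and not multibyte_enabled


if __name__ == "__main__":
    flags = {"cjk"}  # stands for USE="+cjk -unicode"
    enabled = vim_multibyte_single_flag(flags)
    print("multibyte built:", enabled)
    print("cjk support lost:", lost_cjk(flags, enabled))
```

In this model a user with "+cjk -unicode" ends up without the multibyte feature at all, which is the case the comment warns about.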
Aren't we really asking for NOT multibyte and NOT unicode but multilingualization? I think having a situation that allows "+cjk -unicode" is somewhat dangerous, as I don't think the interpretation of this will be clear to everyone.

I think what we need to stress are patches/options that add m18n, as opposed to ones that enhance a particular language or set of languages. I think a lot of the stuff that comes in under "cjk" really just means "here is a patch/option that supports cjk, but other langs/encodings as well." With the m18n support in glibc a lot better than it was before, there is not much reason to write patches that support only one language (except for the fact that they might be slightly easier).

On the other hand, there are patches that one wants to use only when a particular language is involved. For example, for Japanese, there is a useful patch that does encoding detection for mp3 id3 tags in xmms, because most mp3s created under Windows use sjis, whereas most people on unix run under euc-jp. Clearly we don't want a patch like this under m18n.

What I would lean towards is using terms that are well established and have a clear meaning: multilingualization (m18n) and localization (l10n). I think most of the cjk stuff can be switched to using "+m18n". If there really are many patches that enable only cjk, then perhaps "+m18n-cjk", but I don't think the distinction really exists nowadays. For things that add gettext type stuff we could use "+nls" as we do, or use "+l10n" (I don't see the reason to change anything). As for language specific things, we need another system, in the spirit of Bug 9988.

Now as for "+unicode", I think adding this is dangerous, because people will assume that unicode == m18n. Clearly, many systems still use legacy encodings, and so m18n refers to something that can handle many languages and many encodings, of which unicode is simply one.
That being said, there are programs that seem to take the "unicode or else" approach, and in this case "+unicode" might be appropriate, or perhaps something like "+m18n-unicode", so that we can differentiate between full m18n and partial m18n using unicode.
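To make the id3 tag case mentioned in this comment concrete, here is a minimal sketch of encoding detection by trial decoding. It is not the xmms patch itself, whose method is not described here; the function name `guess_japanese_encoding` and the candidate order are assumptions made for illustration.

```python
def guess_japanese_encoding(raw, candidates=("ascii", "euc_jp", "shift_jis")):
    """Return the first candidate codec that decodes raw without error, else None.

    This is a heuristic: the same bytes can be valid in more than one
    encoding, so the order of candidates matters.
    """
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


if __name__ == "__main__":
    title = "\u65e5\u672c\u8a9e"  # a short Japanese sample string
    print(guess_japanese_encoding(title.encode("shift_jis")))
    print(guess_japanese_encoding(title.encode("euc_jp")))
```

euc_jp is tried before shift_jis because many EUC-JP byte pairs also decode without error as half-width katakana in Shift_JIS, so the reverse order would mislabel EUC-JP tags. A real detector would need stronger heuristics than this.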
not releng stuff, reassigning to bug-wranglers
# euse -i unicode
global use flags (searching: unicode)
************************************************************
[+ C ] unicode - Adds support for Unicode

Obsolete bug.