26561 – more specific USE flag for multibyte character support

Summary: more specific USE flag for multibyte character support

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	Portage Development
Classification:	Unclassified
Component:	Enhancement/Feature Requests
Hardware:	All Linux

Importance:	High normal
Assignee:	CJK Team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2003-08-13 11:10 UTC by bugs
Modified:	2011-10-30 22:35 UTC
CC List:	2 users

See Also:
Package list:
Runtime testing required:	---

Description bugs 2003-08-13 11:10:41 UTC

At the moment the relevant USE flags seem to be:
cjk  -	Adds support for Multi-byte character languages (Chinese, Japanese, Korean) 	
nls  -	Adds Native Language Support (using gettext - GNU locale utilities) 

If I want to enable Unicode support or similar for an application (say, vim),
only cjk seems applicable at the moment, and it is a tad misleading (there are
hundreds of other languages in Unicode!).

I'd like to request a:
multibyte
or
unicode
USE flag, defined as "Adds support for multi-byte characters"

and redefine cjk to be:
"Adds support for Chinese, Japanese, Korean", preferably deprecating it, since
only in a few rare cases would it not be covered by a combination of nls and unicode.

Comment 1 Masatomo Nakano (RETIRED) gentoo-dev 2003-08-13 11:18:54 UTC

Hello

Have you seen Bug 9988?

Comment 2 bugs 2003-08-13 11:57:57 UTC

Yes, while similar, it didn't seem quite applicable.
Setting a language locks one to a specific language.

Enabling multibyte character support allows a variety of languages, even if one's primary language is English.
One could, for example, want multibyte characters in order to view Tengwar in vi (at least I seem to remember Tengwar is in Unicode :) )

Setting a LANG variable, on the other hand, is specific. I don't particularly want to localize to, say, Japanese, since my Japanese is quite weak. I *do* want to be able to type in French (which is much stronger) and view French accents and Japanese characters. I don't want a French keyboard layout.

Multibyte is a fairly specific option. I opened this request partly because of bug 26558, which I just filed, asking that +multi_byte in Vim be controllable through a USE flag (it is enabled by default right now).

Thanks for your interest, though!
(ありがとう ございます? "Thank you very much?") :)

Comment 3 Stuart Bouyer 2003-08-13 20:05:43 UTC

"Enabling multibyte character support allows a variety of languages, even if one's primary language is english."

Yes and no! Multibyte character support is by definition different from Unicode support. Multibyte character sets are those in which the number of characters is too large to fit in one byte; Chinese, Japanese, and Korean (possibly Thai too) are examples.

French and Tengwar are NOT multibyte character sets, as the symbols of these languages can be represented in one byte.

I support the call for a unicode flag. Gentoo currently has very poor support for Unicode outside of English and the main European languages.

However, if you want to use a flag to enable Unicode, "nls" is the much better flag to use at present, as "cjk" was designed to add multibyte character support for encodings such as EUC, SJIS, JIS, GBK, Big5 and EUC-KR. I'm unaware of any software package that lumps Unicode (UTF-8, UTF-16, etc.) in with them, as the code to support Unicode is often very different from that for the CJK encodings.
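To make the distinction in this comment concrete, the following Python sketch (an illustration added here, not part of the original discussion) encodes a couple of characters with codecs that ship with CPython and reports how many bytes each one needs. The sample characters and the helper name `byte_lengths` are assumptions chosen for illustration.

```python
# Illustrative sketch: compare how many bytes a character needs in a
# legacy single-byte encoding, legacy CJK multibyte encodings, and UTF-8.
# Uses only codecs bundled with CPython.

SAMPLES = {
    "é": ["latin-1", "utf-8"],                # French accented letter
    "あ": ["euc_jp", "shift_jis", "utf-8"],    # Japanese hiragana "a"
}


def byte_lengths(char: str, encodings: list[str]) -> dict[str, int]:
    """Return the encoded length of `char` in each named encoding."""
    return {enc: len(char.encode(enc)) for enc in encodings}


if __name__ == "__main__":
    for char, encodings in SAMPLES.items():
        for enc, n in byte_lengths(char, encodings).items():
            print(f"{char!r} in {enc}: {n} byte(s)")
```

Here "é" takes one byte in ISO-8859-1 (Latin-1) but two in UTF-8, while "あ" takes two bytes in both EUC-JP and Shift_JIS and three in UTF-8. That is the sense in which French fits a single-byte legacy encoding, Japanese needs a multibyte one, and Unicode support is a separate encoding scheme again.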

Comment 4 Seemant Kulleen (RETIRED) gentoo-dev 2003-08-13 20:20:10 UTC

stubear, thank god you showed up :)

Comment 5 Stuart Bouyer 2003-08-13 22:15:35 UTC

It appears that I was wrong and that vim does lump Unicode support together with CJK support. I haven't had a chance to look at the source directly, but from what I have read online this seems to be the case. However, this will cause problems if someone has "+cjk -unicode" as flags, as your suggested flag would remove support for both Unicode and CJK from vim.

Why would someone want "+cjk" and "-unicode"? Well, a lot of Japanese users aren't happy with the unification of Chinese and Japanese characters in Unicode (Han unification), and a number are quite strongly against Unicode.

Again, I strongly support a "unicode" flag, but care must be taken not to deprecate the "cjk" flag, because for most software that has a "cjk" flag, "unicode" + "nls" does not equal "cjk"!

Comment 6 Ken Deeter 2003-11-04 01:07:24 UTC

Aren't we really asking NOT for multibyte and NOT for Unicode, but for multilingualization?
I think having a situation that allows "+cjk -unicode" is somewhat dangerous,
as I don't think the interpretation of this will be clear to everyone.

I think what we need to stress are patches/options that add m18n, as opposed
to ones that enhance a particular language or set of languages. I think
a lot of the stuff that comes in under "cjk" really just means "here
is a patch/option that supports CJK, but other languages/encodings as well."
With m18n support in glibc a lot better than it was before, there is
not much reason to write patches that support only one language (except for
the fact that they might be slightly easier).

On the other hand, there are patches that one wants to use only when a particular
language is involved. For example, for Japanese there is a useful patch
that does encoding detection for MP3 ID3 tags in xmms, because most MP3s
created under Windows use Shift_JIS, whereas most people on Unix run under EUC-JP.
Clearly, a patch like this is not something we want under m18n.
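The encoding-detection problem described above can be illustrated with a deliberately naive Python sketch. It is a hypothetical illustration, not the xmms patch itself: the function name `guess_decode`, the candidate order, and the Latin-1 fallback are assumptions made here. It tries a strict decode with each candidate encoding and returns the first one that succeeds.

```python
# Hypothetical sketch of naive tag-encoding detection: try a strict decode
# with each candidate encoding in order and keep the first success.
# NOT the xmms patch; real detectors use statistical heuristics, because
# many byte strings are valid in both encodings.

CANDIDATES = ("euc_jp", "shift_jis")  # EUC-JP first (see note below)


def guess_decode(raw: bytes, candidates=CANDIDATES) -> tuple[str, str]:
    """Return (encoding, text) for the first candidate that decodes `raw` strictly."""
    for enc in candidates:
        try:
            return enc, raw.decode(enc)  # errors="strict" is the default
        except UnicodeDecodeError:
            continue
    # Nothing matched: fall back to Latin-1, which accepts any byte sequence.
    return "latin-1", raw.decode("latin-1")


if __name__ == "__main__":
    title = "あいう"
    for source in ("shift_jis", "euc_jp"):
        enc, text = guess_decode(title.encode(source))
        print(source, "->", enc, text)
```

The candidate order matters. Shift_JIS hiragana start with the byte 0x82, which is not a valid EUC-JP lead byte, so a strict EUC-JP decode of Shift_JIS text fails and the sketch falls through to Shift_JIS. The reverse does not hold: EUC-JP hiragana bytes (0xA4 followed by another high byte) are also valid Shift_JIS half-width katakana, so trying Shift_JIS first would silently produce garbage.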

What I would lean towards is using terms that are well established and have
clear meanings: multilingualization (m18n) and localization (l10n).

I think most of the cjk stuff can be switched to using "+m18n". If there
really are many patches that enable only CJK, then perhaps "+m18n-cjk", but
I don't think the distinction really exists nowadays. For things that add
gettext-type stuff we could keep using "+nls" as we do now, or use "+l10n" (I don't
see a reason to change anything).

As for language specific things, we need another system, in the spirit of
Bug 9988.

Now, as for "+unicode", I think adding this is dangerous, because people will
assume that unicode == m18n. Clearly, many systems still use legacy encodings,
and so m18n refers to something that can handle many languages and many encodings,
of which Unicode is simply one. That being said, there are programs that
seem to take a Unicode-or-nothing approach, and in such cases "+unicode"
might be appropriate, but maybe something like "+m18n-unicode" would let us
differentiate between full m18n and partial m18n using Unicode.

Comment 7 John Davis (zhen) (RETIRED) gentoo-dev 2004-01-23 17:37:18 UTC

not releng stuff, reassigning to bug-wranglers

Comment 8 Jakub Moc (RETIRED) gentoo-dev 2005-12-10 06:13:00 UTC

# euse -i unicode
global use flags (searching: unicode)
************************************************************

[+ C  ] unicode - Adds support for Unicode


Obsolete bug.