Bug#: 18375
Alias:
Product:
Component:
Status: RESOLVED
Resolution: FIXED
Assigned To: utf8 <utf8@gentoo.org>
Hardware:
OS:
Version:
Priority:
Severity:
Reporter: Danny Milosavljevic <danny_milo@yahoo.com>
URL:
Summary:
Status Whiteboard:
Keywords:


Bug 18375 depends on: 15880
Bug 18375 blocks: 24267
Votes: 0



Description:   Opened: 2003-03-28 10:56 0000
Hi!

Is it me or are most of the packages compiled without UTF-8 support (even glibc 2.2 *and* 2.3's locales for that are not installed! O_o)... If there already is a USE flag for that, I apologize.

Otherwise I'd suggest adding "utf8" to the list of valid USE options...
(intended to be set before bootstrap for it to take global effect?)

then add to ncurses ebuilds:
  IUSE="utf8"
  use utf8 && myconf="${myconf} --enable-widec"

and to dialog ebuilds:
  IUSE="utf8"
  w=""
  use utf8 && w="w"
  # instead of just "--with-ncurses"
  econf --with-ncurses"${w}" || die
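
For illustration, the two ebuild fragments above can be tried outside Portage with a stand-in for the `use` helper. This is only a sketch: the `use` function below is a hypothetical re-implementation that looks for a word in a space-separated `USE` variable (the real helper is provided by Portage), and `ncurses_conf_args` and `dialog_conf_args` are made-up names. The flag name `utf8` follows this proposal; Comment #42 reports that the committed ebuilds use `unicode`.

```shell
# Hypothetical stand-in for Portage's `use` helper: succeeds when the
# given flag appears as a whole word in the space-separated USE variable.
use() {
    case " ${USE:-} " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Extra configure arguments for ncurses (illustrative helper name).
ncurses_conf_args() {
    myconf=""
    if use utf8; then myconf="${myconf} --enable-widec"; fi
    # Drop the leading space left by the append above.
    printf '%s\n' "${myconf# }"
}

# Configure argument for dialog: --with-ncursesw when utf8 is set,
# plain --with-ncurses otherwise.
dialog_conf_args() {
    w=""
    if use utf8; then w="w"; fi
    printf '%s\n' "--with-ncurses${w}"
}
```

Setting `USE="utf8"` and calling `dialog_conf_args` prints the wide-character argument; without the flag it prints the plain one.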
  
btw Dialog's dialogs in utf8 still look like poo, but at least the windows are properly aligned now (actually rectangular :->)... maybe font issue.

The /etc/rc.conf's KEYMAP has already been changed to support unicode (KEYMAP="-u <whatevermap>").

As most of the newer packages already incorporate optional utf8 support, all that is left is turning it on (if optional at all, and not already required).

Although ncurses for one does the weird thing of renaming its library when utf8 is turned on O_o oh well, I guess it can't be helped. (I linked libncursesw.so to libncurses.so and now it works even with apps not recompiled after that - why shouldn't it, all programs have to check the locale (e.g. the LANG variable) on STARTUP to decide whether to use utf8 or not, not at compile time [see POSIX standard] ;))
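
The startup check mentioned above can be sketched in shell. POSIX gives the locale variables a precedence (LC_ALL, then LC_CTYPE, then LANG, with empty values treated as unset); the function names here are made up for illustration, and judging by the locale name alone is a heuristic, since a real program would ask the C library instead.

```shell
# Print the locale name that governs character handling.
# $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG; an empty value counts as unset.
effective_ctype() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"
    elif [ -n "${2:-}" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "${3:-C}"
    fi
}

# Succeed when the locale name carries a UTF-8 codeset, e.g. de_AT.UTF-8
# or de_AT.utf8, optionally followed by an @modifier.
name_says_utf8() {
    case "${1:-}" in
        *.[Uu][Tt][Ff]-8|*.[Uu][Tt][Ff]8)     return 0 ;;
        *.[Uu][Tt][Ff]-8@*|*.[Uu][Tt][Ff]8@*) return 0 ;;
        *)                                    return 1 ;;
    esac
}
```

A caller would pass the current environment, for example `name_says_utf8 "$(effective_ctype "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}")"`.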

I "fixed" glibc's utf8 locales for the time being using:
localedef -v -c -i de_AT -f UTF-8 de_AT.UTF-8
that did not work(?), even with cwd /usr/share/locale

but then with
localedef -v -c -i de_AT -f UTF-8 /usr/share/locale/de_AT.UTF-8
which did partially work (Xfree/Gtk/Qt/all stopped complaining about unknown locale, and umlauts are now two bytes in length as they should be - checked using jstar in a utf8 xterm, which isn't utf8 aware ;))
I don't understand the difference between these two commands in terms of executed stuff, but there seems to be an inconsistency, undocumented change or bug somewhere... (or I just overlooked something ;))

Bash has an outstanding "bug" of not respecting the LANG="de_AT.UTF-8" set within a shell (where to set it earlier?) for its own readline "lib", until spawning of a child shell. In the next subshell(s) it works like a breeze. Maybe there is a "reread config" builtin command though ? *lost*

Also, for setting the LANG environment variable, is there a config file?
If not, I'd suggest putting:
  /etc/profile.d/lang: 
     source /etc/conf.d/lang
and
  /etc/conf.d/lang:
     LANG="whatever"

A per-user setting would also be possible, but that seems overkill to me.
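
A minimal sketch of what the proposed /etc/profile.d/lang could contain. The function name `load_lang_conf` and the optional path argument are assumptions added so the snippet can be tried against a scratch file; the default path is the /etc/conf.d/lang from the proposal above.

```shell
# Source the system-wide language setting and export LANG from it.
# $1 is an optional config path (illustrative; defaults to the proposal's
# /etc/conf.d/lang).
load_lang_conf() {
    conf="${1:-/etc/conf.d/lang}"
    # A missing or unreadable config leaves the environment untouched.
    if [ -r "$conf" ]; then
        . "$conf"
        if [ -n "${LANG:-}" ]; then
            export LANG
        fi
    fi
    return 0
}

load_lang_conf
```

Guarding on readability keeps a login from failing on systems where the config file has not been created yet.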

That's all I can think of for now :->

laters :)

------- Comment #1 From Danny Milosavljevic 2003-04-06 05:24:34 0000 -------
also for the program "screen" to work with color, create file
/etc/profile.d/linux-utf8:
[ "${TERM}" = "linux" ] && export TERM="linux-utf8"

This of course assumes that utf8 is used, which isn't the case when the
proposed "USE" flag is not set. Dunno how to check that in bash. Maybe checking
in the corresponding ebuild and just creating the file linux-utf8 when the use
flag is set is enough.


Hey, is anyone actually reading this ? :D

------- Comment #2 From Danny Milosavljevic 2003-04-06 05:49:53 0000 -------
also for curses related programs to work with the last comment,

cp /usr/share/terminfo/l/linux /usr/share/terminfo/l/linux-utf8

phew...

------- Comment #3 From Danny Milosavljevic 2003-04-06 06:04:10 0000 -------
still looking into why the border characters (of the program "dialog", for
example) look fine in a utf8 xterm, but wrong in a utf8 console... weird...
Can't seem to find any font which does it right(tm).

------- Comment #4 From Danny Milosavljevic 2003-04-17 12:39:26 0000 -------
To make screen work correctly in a utf8 xterm I had to add the following to
/etc/termcap and remove the original "v0|term" entry:

v0|xterm|xterm in Unicode (UTF-8) mode:\
        :am:km:mi:ms:xn:\
        :co#80:it#8:li#24:\
        :AL=\E[%dL:DC=\E[%dP:DL=\E[%dM:DO=\E[%dB:IC=\E[%d@:\
        :K1=\EOw:K2=\EOu:K3=\EOy:K4=\EOq:K5=\EOs:LE=\E[%dD:\
        :RI=\E[%dC:UP=\E[%dA:ae=^O:al=\E[L:as=^N:bl=^G:bt=\E[Z:\
        :cd=\E[J:ce=\E[K:cl=\E[H\E[2J:cm=\E[%i%d;%dH:cr=^M:\
        :cs=\E[%i%d;%dr:ct=\E[3g:dc=\E[P:dl=\E[M:do=^J:ec=\E[%dX:\
        :ei=\E[4l:ho=\E[H:ic=\E[@:im=\E[4h:\
        :is=\E7\E[r\E[m\E[?7h\E[?1;3;4;6l\E[4l\E8\E>:\
        :k0=\E[21~:k1=\E[11~:k2=\E[12~:k3=\E[13~:k4=\E[14~:\
        :k5=\E[15~:k6=\E[17~:k7=\E[18~:k8=\E[19~:k9=\E[20~:\
        :kD=\E[3~:kI=\E[2~:kN=\E[6~:kP=\E[5~:kb=\177:kd=\EOB:\
        :ke=\E[?1l\E>:kh=\EOH:kl=\EOD:kr=\EOC:ks=\E[?1h\E=:\
        :ku=\EOA:le=^H:md=\E[1m:me=\E[m\017:mr=\E[7m:nd=\E[C:\
        :rc=\E8:sc=\E7:se=\E[27m:sf=^J:so=\E[7m:sr=\EM:st=\EH:ta=^I:\
        :te=\E[2J\E[?47l\E8:ti=\E7\E[?47h:ue=\E[24m:up=\E[A:\
        :us=\E[4m:vb=\E[?5h\E[?5l:ve=\E[?25h:vi=\E[?25l:\
        :vs=\E[?25h:

where the whitespace at the beginning of every continuation line here is "<space><tab>"

Now I can do:
xterm -u8 -e screen
and that's good :D

------- Comment #5 From Markus Bertheau 2003-04-17 15:21:03 0000 -------
Danny: doesn't screen -U do what you want?

------- Comment #6 From Danny Milosavljevic 2003-04-25 12:36:49 0000 -------
no. it's the same as without -U.
What does partially do what I want is
export TERM="xterm-utf8"
then screen
but many MANY programs can't cope with that setting (did I mention MANY already? ;))

However, with that termcap modification I didn't have a single problem until now (my "xterm" launcher does now include "screen" per default, pretty cool ;))


------- Comment #7 From Danny Milosavljevic 2003-06-01 07:06:21 0000 -------
to fix the de-latin1 keymap, comment out the line

alt keycode 13 = Meta_acute

in /usr/share/keymaps/i386/qwertz/de-latin1.map so that it looks like so:

#alt keycode 13 = Meta_acute
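
The manual edit above could also be scripted. This is a sketch only: `comment_out_meta_acute` is a made-up name, and it assumes the keymap is an uncompressed text file as in the path given above.

```shell
# Comment out the "alt keycode 13 = Meta_acute" line in a keymap file.
# $1 = path to the .map file. Lines that are already commented start
# with '#' and therefore do not match the anchored pattern.
comment_out_meta_acute() {
    map="$1"
    tmp="$(mktemp)" || return 1
    sed 's/^alt[[:space:]][[:space:]]*keycode[[:space:]][[:space:]]*13[[:space:]]*=[[:space:]]*Meta_acute/#&/' \
        "$map" > "$tmp" && cat "$tmp" > "$map"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Running it a second time leaves the file unchanged, so it should be safe to call from a setup script more than once.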


------- Comment #8 From Danny Milosavljevic 2003-06-01 07:14:19 0000 -------
What happened? Nobody interested in UTF-8 support?

If you are interested, tell me how I can help better :)

I've already helped three people set up their gentoo with utf-8 support, but after the third time it gets kinda repetitive and boring ;) what about fixing the official gentoo huh ? :) (ok, maybe moving into stable *after* 1.4, but what about experimental ebuilds for glibc, baselayout, ncurses and dialog which fix utf-8 ?)

Thanks...

------- Comment #9 From Thomas Scheffler 2003-07-11 12:19:27 0000 -------
Hi,

I'm interested in utf-8 support, too. I really cannot imagine why gentoo lacks support for it as every major distribution has implemented it. Is this a feature?
Does anybody at gentoo already care about it? I mean this bug is quite "old" now.

------- Comment #10 From Alastair Tse (RETIRED) 2003-07-11 13:08:07 0000 -------
i'm interested in getting utf-8 support on gentoo. i was wondering how far
ska-fan has got with this. if he's not interested, i wouldn't mind this being
assigned to me so i can work on this a bit more.

------- Comment #11 From Markus Bertheau 2003-07-11 13:37:37 0000 -------
Feel free.

------- Comment #12 From Thomas Scheffler 2003-07-12 14:15:47 0000 -------
How long do you think it would take to implement this and provide masked
ebuilds? I'm willing to test them when it comes to that point. I find it great
that somebody seems to care. Post a message when you're ready.

------- Comment #13 From Danny Milosavljevic 2003-07-18 09:42:19 0000 -------
yay yay :)  Hi, liquidx

Ok, since the above stuff is rather unordered, I will make a summary of the 
valid points to ease the task:
1) The ncurses ebuild needs --enable-widec, and a link from libncursesw.so to libncurses.so
2) The /etc/rc.conf's KEYMAP has already been changed to support unicode (KEYMAP="-u <whatevermap>").
3) to create a UTF-8 locale: localedef -v -c -i de_AT -f UTF-8 de_AT.UTF-8 (DOES work now)
 (of course replace de_AT by your favourite language ^^)
4) set LANG to "??_??.UTF-8"; strictly speaking, there should be a way to set LANG before bash is even started, i.e. between login and starting bash, dunno how. bash internal readline is rather spooky with this.
5) Modified termcap entry for xterm (Comment #4), this fixed loads of things
6) to fix the de-latin1 keymap, comment out the line "alt keycode 13 = Meta_acute" in "/usr/share/keymaps/i386/qwertz/de-latin1.map" (Comment #7)
7) dialog ebuild needs --with-ncursesw, but is evil enough in console even with it. (not a single problem in xterms tho)

Now invalid points:
- Forget that cp terminfo crap I wrote (Comment #2)
- localedef miraculously does not need full path anymore now ;)
- forget that export TERM="linux-utf8" (Comment #1), I was just stupid

TODO:
- Console UTF-8 border characters (Comment #3)... console is evil...

I hope this helps :)




------- Comment #14 From Zhen Lin 2003-07-19 09:19:45 0000 -------
Unicode in the 0xb8000/0xb0000 console is not needed. The console only supports
256 glyphs!

Plan 9 (plan9.bell-labs.com) takes a different approach - EVERYTHING supports
UTF-8 and only UTF-8 (and therefore, 7-bit ASCII). They don't have a text
console,  only a graphical (pixel, not character) terminal, in order to render
Unicode glyphs.

Does the framebuffer support Unicode glyphs? If not, maybe UTF-8 in console is
not worth it.

A reason why UTF-8 dialog breaks so badly may be because the box drawing
characters are represented differently in UTF-8, compared to 8-bit OEM/DOS

------- Comment #15 From Markus Bertheau 2003-07-19 09:34:23 0000 -------
The vga text console on x86 supports at least 512 glyphs.

------- Comment #16 From Carlos Henrique Bauer 2003-07-29 08:06:17 0000 -------
I'm trying to make UTF-8 work on my machine, too. Here is what I have
worked out so far.
 
1) According to some of the above comments, the correct locale data
directory path is /usr/share/locale/*, but when I run "strace program" I
can see the program is looking for the locale files in
/usr/lib/locale/* (glibc-2.3.2-1).
 
For example, if I run "strace vi" I can see the following in the output:

open("/usr/lib/locale/pt_BR.UTF-8/LC_IDENTIFICATION", O_RDONLY)
 
As a matter of fact, after I changed the output of localedef to
/usr/lib/locale, I no longer got "Locale not supported by C library"
messages from programs.

In RedHat 7.2 and 9.0, the UTF-8 data locale directories are stored in /usr/lib/locale.
 
So, what is the correct path for UTF-8 locales?
 
2) I configured the console font as LatArCyrHeb-16. I tested the
border characters with alsamixer and they seem to be fine.
 
The problem: the font is being set by /etc/init.d/consolefont just for
the 1st virtual terminal. In the other ones, all the colored
characters are incorrectly displayed.
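
Regarding point 1) above, a small sketch for checking where compiled locale data for a given name actually lives on a system. The function name and the optional root prefix are assumptions for illustration; the two directories are the ones discussed in this bug, and `locale-archive` is the single-file database newer glibc versions can use.

```shell
# Print every known location that holds compiled data for a locale.
# $1 = locale name (e.g. pt_BR.UTF-8), $2 = optional root prefix,
# which exists only so the function can be tried on a scratch tree.
find_locale_data() {
    name="$1"
    root="${2:-}"
    for dir in /usr/lib/locale /usr/share/locale; do
        if [ -f "${root}${dir}/${name}/LC_IDENTIFICATION" ]; then
            printf '%s\n' "${dir}/${name}"
        fi
    done
    # The archive holds many locales at once, so its presence alone
    # does not prove that this particular locale is inside it.
    if [ -f "${root}/usr/lib/locale/locale-archive" ]; then
        printf '%s\n' "/usr/lib/locale/locale-archive"
    fi
}
```

Calling `find_locale_data pt_BR.UTF-8` with no second argument inspects the running system.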

------- Comment #17 From Danny Milosavljevic 2003-07-31 00:29:30 0000 -------
1) in current glibc versions, localedef supports creating locales without
needing the full path (just the name is fine), they also changed the real path
for locales to /usr/lib/locale.

2) ow, I had a problem of not setting every console to utf-8... as a (hacky
whacky) workaround I used:
unicode_start >>/etc/issue

then it worked ;)
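
A less hacky variant of the same idea, as a sketch: the Linux console switches to UTF-8 mode when it receives the escape sequence ESC % G, so that sequence can be written to each virtual terminal directly. The function names and the tty1 to tty6 range are assumptions about the local setup; the keyboard side would still need `kbd_mode -u` on each console.

```shell
# Write the "select UTF-8" escape sequence (ESC % G) to a target.
# $1 = a tty device, or a plain file for a dry run.
console_utf8_on() {
    printf '\033%%G' > "$1"
}

# Apply it to a range of virtual terminals (usually needs root).
# Terminals that are not writable are skipped silently.
utf8_all_consoles() {
    for n in 1 2 3 4 5 6; do
        tty="/dev/tty${n}"
        if [ -w "$tty" ]; then
            console_utf8_on "$tty"
        fi
    done
}
```

`utf8_all_consoles` is deliberately not called here; it would be invoked from an init script or by hand as root.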

But the font was always correctly set here... do you mind attaching your
/etc/init.d/consolefont here so I can take a look what it is doing ?  :)

------- Comment #18 From Neil Watson 2003-08-18 08:46:07 0000 -------
I definitely agree that UTF-8 support should be added to Gentoo.  Having it
as the default would be best.  I've managed to get xterm (using uxterm) to view
UTF-8, but things like the console and even glibc seem to be compiled without
support.

------- Comment #20 From Alexander Winston 2003-08-29 22:40:26 0000 -------
UTF-8 support is certainly one area that is lacking in Gentoo that should not
be. Recently, due to some inexplicable hardware problems with my Gentoo install,
I decided to survey the competition, namely Red Hat Linux 9, aka Shrike. (For
some reason this seems to happen about once a week; I suppose I’m really good at
breaking operating systems.) The UTF-8 support is absolutely impeccable; I have yet
to find a flaw. Consoles, terminals, libraries, applications… They all
seem to have perfect UTF-8 support. How Red Hat manages to pull off this trick I
may never know, but I sincerely hope it can be mirrored in Gentoo Linux.

------- Comment #22 From Alexander Winston 2003-08-29 22:43:50 0000 -------
DISCLAIMER: Red Hat is not the “competition”. I’m just using the term so I can
weave an interesting yarn for you folks. :)

------- Comment #23 From Zhen Lin 2003-09-01 05:58:29 0000 -------
How RedHat pulls off this trick? Massive patches. If only there was an easy way
to systematically pull patches out of a SRPM...

In any case, look at bug #27700 for some ebuilds. slang is the first one to
utilise the RH9 patches

------- Comment #24 From Patrick Kursawe 2003-09-11 23:41:28 0000 -------
Danny: I don't find an entry starting with "v0" in my /etc/termcap. On my
system, this file is provided by libtermcap-compat-1.2.3

------- Comment #25 From Danny Milosavljevic 2003-09-25 02:41:56 0000 -------
Patrick: 
Hmm, you are right... massive overhaul in termcap, I'll try if the new termcap
fixes utf-8 for good when I get home today...



------- Comment #26 From Thomas Raschbacher 2003-10-04 07:52:48 0000 -------
hey dannym... any news? :)

------- Comment #27 From Thomas Scheffler 2003-10-28 04:07:28 0000 -------
There is one console input issue (32111) with utf-8 so I add dependency to
that bug here.

------- Comment #28 From Thomas Raschbacher 2003-12-08 23:13:12 0000 -------
is anyone actually working on this?

regards

------- Comment #29 From Rui Malheiro 2003-12-28 06:24:45 0000 -------
I'm also working on getting a full UTF-8 gentoo box. My main complaint is that glibc
should include as many *.UTF-8 locales as possible.

Even if precompiling all possible locales is not the solution, during compile
we could test for USE=utf8 and $LANG in order to run localedef as an
ending step of emerge glibc. Or maybe use a make.conf variable LOCALEDEF.
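
A sketch of how such a step could derive the localedef call from a locale name like the one in $LANG. The helper name is made up, the command is only printed rather than executed, and names with an @modifier are rejected to keep the example short; the -c, -i and -f options are the ones already used earlier in this bug.

```shell
# Split a locale name such as pt_BR.UTF-8 into the pieces localedef
# expects and print the resulting command line.
localedef_cmd_for() {
    name="$1"
    case "$name" in
        *@*) echo "modifier not handled in this sketch: $name" >&2; return 1 ;;
        *.*) ;;
        *)   echo "no codeset in locale name: $name" >&2; return 1 ;;
    esac
    input="${name%%.*}"     # language_TERRITORY part, e.g. pt_BR
    charmap="${name#*.}"    # codeset part, e.g. UTF-8
    printf 'localedef -c -i %s -f %s %s\n' "$input" "$charmap" "$name"
}
```

An ebuild hook could run the printed command for each locale listed in a user-chosen variable, though no such variable exists in make.conf as of this comment.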

------- Comment #30 From Danny Milosavljevic 2004-02-12 05:41:53 0000 -------
Current status: with these steps, a fully working utf-8 based system can be
achieved. Now someone modify the ebuilds in portage already :->

Still needed is an ugly bash workaround to make bash's internal readline
recognize utf-8 (in .bash_login: (call a new subshell) bash; that's it)

Also the de keymap has some weirdness with Meta_acute... I don't really know
what it is about, I saw some comment about "the key below the 4", if that's it,
it's "E", and AltGr + E = Euro-Sign. This should not be present in de, but only
in de@euro, though. Speculating here.
Can someone shed some light on this?

As for to-be-used locale configuration, are there plans to add a variable to
make.conf for limiting to-be-installed locales ?
I have dozens of locales I'll never use, however, the locales I want to use
(de_AT.UTF-8) are missing.

comments?

------- Comment #31 From Alexander Jenisch 2004-02-18 08:19:09 0000 -------
i've tried localedef -v -c -i de_AT -f UTF-8 de_AT.UTF-8. shouldn't this make a
de_AT.UTF-8 directory in /usr/share/locale?

------- Comment #32 From Sergey Kuleshov (RETIRED) 2004-02-22 10:29:00 0000 -------
I am also interested in UTF8. So far I have patched slang and mc to get them
working with UTF. But I ran into the following problems:

1) In MC I can't enter multibyte characters (those not from the ASCII set)
2) In some applications compiled against ncursesw some (actually most) of the
gettext translated strings are not displayed at all - try centericq with
LC_ALL=ru_RU.UTF-8
3) Border chars still don't work :(

------- Comment #33 From Danny Milosavljevic 2004-03-04 07:48:16 0000 -------
Alexander:

no, newer stuff goes into a database file in /usr/lib/locale

You should be able to tell if it works by doing
LANG="de_AT.UTF-8" anygtk2program

if it complains, it doesn't :)

------- Comment #34 From Danny Milosavljevic 2004-03-07 02:37:02 0000 -------
LANG="blahblah" locale charmap
is a good way to see if it works
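
One way to script this check is to ask the C library which character map a locale resolves to, using the `locale charmap` keyword. The function name is made up for illustration; when a locale is not installed, glibc typically falls back to the C locale, whose character map is not UTF-8, so the check fails as intended.

```shell
# Succeed when the named locale resolves to the UTF-8 character map.
# $1 = locale name, e.g. de_AT.UTF-8
locale_is_utf8() {
    # LC_ALL overrides every other locale variable for this one call.
    charmap="$(LC_ALL="$1" locale charmap 2>/dev/null)" || return 1
    [ "$charmap" = "UTF-8" ]
}
```

For example, `locale_is_utf8 de_AT.UTF-8 && echo ok` prints `ok` only once the locale has been generated.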

------- Comment #35 From Danny Milosavljevic 2004-03-07 02:40:37 0000 -------
Svyatogor:

hmm... confirming mc problem... maybe the author knows details ?

as for borderchars, there are two ways to get them working in a linux console:
1) use framebuffer or
2) use a special font, sacrifice the bold attribute and have 256 normal chars and 256 line drawing chars...


------- Comment #36 From Lars Weiler (RETIRED) 2004-03-17 18:05:44 0000 -------
I'm also testing utf-8 on my machine.  The problem I currently run into is
described in bug 20006 (mutt with ncursesw).  mutt compiles fine with it, but
vim does not.  So I decided to install both versions of ncurses in parallel on my
system.  I don't know if that is possible with the ebuild, as it has to be
compiled twice (once with --enable-widec and once without).  I read that this
should work on LFS, maybe it's working for us, too?

------- Comment #37 From Heinrich Wendel (RETIRED) 2004-04-11 05:51:00 0000 -------
ncurses and slang change the names of their library when compiled with utf-8
support, should we

a.) symlink them to the old names
b.) install the non-utf8 version as well (like redhat)
c.) don't do anything

------- Comment #38 From Zhen Lin 2004-04-11 05:53:49 0000 -------
Option (b) is good for backwards compatibility, but it means that apps will
compile with non UTF-8 ncurses by default.

Option (a) is not very good... I prefer to use a ldscript. See bug #27700

------- Comment #39 From Heinrich Wendel (RETIRED) 2004-04-11 06:51:11 0000 -------
what are the advantages of an ld-script over a symlink?

------- Comment #40 From Zhen Lin 2004-04-11 08:04:37 0000 -------
An ldscript will cause resulting binaries to be linked to the *w name. It will
also cause binaries linked to the non-wide character version to break. The good
thing about this is, if the wide character version is source-compatible but
binary-incompatible (like slang), it will force recompilation, instead of
causing strange bugs. ldscripts are already used by ncurses -- see bug #4411.

But that's just my view.
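
To make the ldscript option concrete, here is a sketch of what such a file could look like. GNU ld treats a text file found in place of a library as a linker script, and `INPUT(-lncursesw)` is standard linker-script syntax; whether the ebuilds in bug #27700 use exactly this form is not stated here, and the function name and directory argument are illustrative.

```shell
# Write a one-line GNU ld linker script in place of the libncurses.so
# development symlink. At link time, -lncurses then resolves to the
# wide-character library, so new binaries record the *w name.
# $1 = library directory (e.g. /usr/lib on a real system).
write_ncurses_ldscript() {
    libdir="$1"
    printf 'INPUT(-lncursesw)\n' > "${libdir}/libncurses.so"
}
```

This only affects linking of new binaries; programs already linked against the old shared object still look it up by its recorded name at run time.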

------- Comment #41 From Heinrich Wendel (RETIRED) 2004-04-24 08:12:57 0000 -------
ok, we really need this settled, can anybody confirm what Zhen said?

------- Comment #42 From Heinrich Wendel (RETIRED) 2004-08-19 09:41:50 0000 -------
i committed utf-8 enabled ncurses, slang and dialog ebuilds. you have to set
unicode in your USE flags and emerge them, please test them.

------- Comment #43 From Heinrich Wendel (RETIRED) 2004-08-19 09:43:16 0000 -------
i couldn't patch mc yet, the utf-8 patch seems to be incompatible with the
latest security fixes

------- Comment #44 From Danny Milosavljevic 2004-09-05 11:00:42 0000 -------
bash 2 has a problem with initial locale setting which I only circumvented
earlier.

The correct solution is this patch:
http://lists.debian.or.jp/debian-devel/200210/msg00047.html

symptoms are that in a newly started login shell, backspace behaves wrongly in bash,
and if one starts another sub-bash *without* changing anything in that one, it
suddenly works.

bash 3 has fixed that already.
bash 2 ebuilds should incorporate this patch.
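
Until a patched bash 2 is installed, the subshell workaround from Comment #30 can be limited to the affected versions. This is a sketch with a made-up function name; it relies only on the statement above that bash 3 already contains the fix.

```shell
# Succeed when the given bash major version predates the fix.
# $1 = major version number; an empty value (not running under bash)
# counts as "no respawn needed".
needs_locale_respawn() {
    major="${1:-}"
    [ -n "$major" ] && [ "$major" -lt 3 ]
}

# Intended use at the end of ~/.bash_login, after LANG has been exported:
#   if needs_locale_respawn "${BASH_VERSINFO[0]}"; then exec bash; fi
```

Using `exec` replaces the login shell instead of nesting a second one, which is a small variation on the original workaround; the child is not a login shell, so it does not read .bash_login again.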

------- Comment #45 From Heinrich Wendel (RETIRED) 2004-09-16 06:45:42 0000 -------
i opened a new bug for the bash problem and fixed mc meanwhile. i will close
this meta bug then. if you have problems with utf8 open a new bug and assign
it to utf8@gentoo.org
