Hi! Is it just me, or are most of the packages compiled without UTF-8 support? Even the UTF-8 locales of glibc 2.2 *and* 2.3 are not installed! O_o If there already is a USE flag for that, I apologize. Otherwise I'd suggest adding "utf8" to the list of valid USE options (intended to be set before bootstrap so that it takes effect globally?), then adding to the ncurses ebuilds:

    IUSE="utf8"
    use utf8 && myconf="${myconf} --enable-widec"

and to the dialog ebuilds:

    IUSE="utf8"
    w=""
    use utf8 && w="w"
    econf --with-ncurses"${w}" || die    # instead of just "--with-ncurses"

By the way, dialog's dialogs in UTF-8 still look like poo, but at least the windows are properly aligned now (actually rectangular :->)... maybe a font issue. The KEYMAP setting in /etc/rc.conf has already been changed to support Unicode (KEYMAP="-u <whatevermap>"). As most of the newer packages already incorporate optional UTF-8 support, all that is left is turning it on (if it is optional at all, and not already required). ncurses, for one, does the weird thing of renaming its library when UTF-8 is turned on O_o Oh well, I guess it can't be helped. (I linked libncursesw.so to libncurses.so and now it works even with apps not recompiled after that. Why shouldn't it? All programs have to check the locale (e.g. the LANG variable) at STARTUP to decide whether to use UTF-8, not at compile time [see the POSIX standard] ;))

I "fixed" glibc's UTF-8 locales for the time being. First I tried:

    localedef -v -c -i de_AT -f UTF-8 de_AT.UTF-8

That did not work(?), even with the cwd set to /usr/share/locale. But then I used:

    localedef -v -c -i de_AT -f UTF-8 /usr/share/locale/de_AT.UTF-8

which did partially work (XFree/Gtk/Qt/all stopped complaining about an unknown locale, and umlauts are now two bytes long, as they should be; checked using jstar, which isn't UTF-8 aware, in a UTF-8 xterm ;)) I don't understand the difference between these two commands in terms of what gets executed, but there seems to be an inconsistency, undocumented change or bug somewhere...
(or I just overlooked something ;)) Bash has an outstanding "bug" of not respecting LANG="de_AT.UTF-8" set within a shell (where should it be set earlier?) for its own readline "lib" until a child shell is spawned. In the next subshell(s) it works like a breeze. Maybe there is a "reread config" builtin command, though? *lost* Also, is there a config file for setting the LANG environment variable? If not, I'd suggest two files. /etc/profile.d/lang would contain:

    source /etc/conf.d/lang

and /etc/conf.d/lang would contain:

    LANG="whatever"

A per-user setting would also be possible, but that seems overkill to me. That's all I can think of for now :-> laters :)
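The two-file proposal above could be sketched as follows. This is only an illustration under assumptions: the function name `load_lang` is made up, and the explicit `export` is added here because a plain assignment in /etc/conf.d/lang would not reach child processes on its own.

```shell
# Sketch of what /etc/profile.d/lang might contain.
# load_lang is a hypothetical helper; the default path is the one proposed above.
load_lang() {
    conf="${1:-/etc/conf.d/lang}"
    # Only source the file if it exists and is readable, so a missing
    # config does not break every login shell.
    if [ -r "${conf}" ]; then
        . "${conf}"
        export LANG
    fi
}

# In the real profile.d script this would simply be followed by:
#   load_lang
```

Keeping the value in /etc/conf.d/lang and the logic in /etc/profile.d/lang means the admin only ever edits a one-line file.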
Also, for the program "screen" to work with color, create the file /etc/profile.d/linux-utf8 containing:

    [ "${TERM}" = "linux" ] && export TERM="linux-utf8"

This of course assumes that UTF-8 is in use, which isn't the case when the proposed USE flag is not set. Dunno how to check that in bash. Maybe it is enough to check in the corresponding ebuild and only create the linux-utf8 file when the USE flag is set. Hey, is anyone actually reading this? :D
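One possible way to do that check in the shell is to look at the locale variables in their POSIX order of precedence (LC_ALL, then LC_CTYPE, then LANG). This is a minimal sketch; the function names `is_utf8_locale` and `pick_term` are made up for illustration, and the check only inspects the variable names, not whether the locale is actually installed.

```shell
# Succeeds if the effective character-type locale names a UTF-8 codeset.
# Precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
is_utf8_locale() {
    loc="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
    case "${loc}" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the TERM value to use: switch to linux-utf8 only on the Linux
# console and only when a UTF-8 locale is selected.
pick_term() {
    if [ "${TERM:-}" = "linux" ] && is_utf8_locale; then
        echo "linux-utf8"
    else
        echo "${TERM:-}"
    fi
}
```

A profile.d script could then run `export TERM="$(pick_term)"` instead of overriding TERM unconditionally.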
Also, for curses-related programs to work with the last comment:

    cp /usr/share/terminfo/l/linux /usr/share/terminfo/l/linux-utf8

phew...
still looking into why the border characters (of the program "dialog", for example) look fine in a utf8 xterm, but wrong in a utf8 console... weird... Can't seem to find any font which does it right(tm).
To make screen work correctly in a UTF-8 xterm I had to add the following to /etc/termcap and remove the original "v0|term" entry:

v0|xterm|xterm in Unicode (UTF-8) mode:\
        :am:km:mi:ms:xn:\
        :co#80:it#8:li#24:\
        :AL=\E[%dL:DC=\E[%dP:DL=\E[%dM:DO=\E[%dB:IC=\E[%d@:\
        :K1=\EOw:K2=\EOu:K3=\EOy:K4=\EOq:K5=\EOs:LE=\E[%dD:\
        :RI=\E[%dC:UP=\E[%dA:ae=^O:al=\E[L:as=^N:bl=^G:bt=\E[Z:\
        :cd=\E[J:ce=\E[K:cl=\E[H\E[2J:cm=\E[%i%d;%dH:cr=^M:\
        :cs=\E[%i%d;%dr:ct=\E[3g:dc=\E[P:dl=\E[M:do=^J:ec=\E[%dX:\
        :ei=\E[4l:ho=\E[H:ic=\E[@:im=\E[4h:\
        :is=\E7\E[r\E[m\E[?7h\E[?1;3;4;6l\E[4l\E8\E>:\
        :k0=\E[21~:k1=\E[11~:k2=\E[12~:k3=\E[13~:k4=\E[14~:\
        :k5=\E[15~:k6=\E[17~:k7=\E[18~:k8=\E[19~:k9=\E[20~:\
        :kD=\E[3~:kI=\E[2~:kN=\E[6~:kP=\E[5~:kb=\177:kd=\EOB:\
        :ke=\E[?1l\E>:kh=\EOH:kl=\EOD:kr=\EOC:ks=\E[?1h\E=:\
        :ku=\EOA:le=^H:md=\E[1m:me=\E[m\017:mr=\E[7m:nd=\E[C:\
        :rc=\E8:sc=\E7:se=\E[27m:sf=^J:so=\E[7m:sr=\EM:st=\EH:ta=^I:\
        :te=\E[2J\E[?47l\E8:ti=\E7\E[?47h:ue=\E[24m:up=\E[A:\
        :us=\E[4m:vb=\E[?5h\E[?5l:ve=\E[?25h:vi=\E[?25l:\
        :vs=\E[?25h:

The whitespace at the beginning of every continuation line here is "<space><tab>" in the actual file. Now I can do:

    xterm -u8 -e screen

and that's good :D
Danny: doesn't screen -U do what you want?
No, it's the same as without -U. What partially does what I want is export TERM="xterm-utf8" followed by screen, but many MANY programs can't cope with that setting (did I mention MANY already? ;)) However, with that termcap modification I haven't had a single problem so far (my "xterm" launcher now includes "screen" by default, pretty cool ;))
To fix the de-latin1 keymap, comment out the line

    alt keycode 13 = Meta_acute

in /usr/share/keymaps/i386/qwertz/de-latin1.map so that it looks like this:

    #alt keycode 13 = Meta_acute
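The same edit could be scripted roughly like this. It is a sketch, not a tested fix: the function name `comment_meta_acute` is made up, and it assumes the keymap is a plain, uncompressed .map file as in the path above.

```shell
# Comment out the "alt keycode 13 = Meta_acute" line in a keymap file.
# Usage: comment_meta_acute /usr/share/keymaps/i386/qwertz/de-latin1.map
comment_meta_acute() {
    map="$1"
    tmp="${map}.tmp.$$"
    # Prefix the matching line with '#'. The pattern is anchored at the start
    # of the line, so a line that is already commented out is left alone.
    sed 's/^\([[:space:]]*alt[[:space:]][[:space:]]*keycode[[:space:]][[:space:]]*13[[:space:]]*=[[:space:]]*Meta_acute\)/#\1/' "${map}" > "${tmp}" &&
        mv "${tmp}" "${map}"
}
```

Running it twice is harmless, and the keymap has to be reloaded (for example with loadkeys) before the change takes effect.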
What happened? Nobody interested in UTF-8 support? If you are interested, tell me how I can help better :) I've already helped three people set up their Gentoo with UTF-8 support, but after the third time it gets kinda repetitive and boring ;) What about fixing the official Gentoo, huh? :) (OK, maybe moving into stable *after* 1.4, but what about experimental ebuilds for glibc, baselayout, ncurses and dialog which fix UTF-8?) Thanks...
Hi, I'm interested in UTF-8 support, too. I really cannot imagine why Gentoo lacks support for it when every major distribution has implemented it. Is this a feature? Does somebody at Gentoo already care about it? I mean, this bug is quite "old" now.
i'm interested in getting utf-8 support on gentoo. i was wondering how far ska-fan has got with this. if he's not interested, i wouldn't mind this being assigned to me so i can work on this a bit more.
Feel free.
How long do you think it would take to implement this and provide masked ebuilds? I'm willing to test it when it comes to that point. I find it great that somebody seems to care. Post a message when you're ready.
yay yay :) Hi, liquidx. OK, since the above stuff is rather unordered, I will make a summary of the valid points to ease the task:

1) The ncurses ebuild needs --enable-widec, and a link from libncursesw.so to libncurses.so.
2) The KEYMAP setting in /etc/rc.conf has already been changed to support Unicode (KEYMAP="-u <whatevermap>").
3) To create a UTF-8 locale: localedef -v -c -i de_AT -f UTF-8 de_AT.UTF-8 (DOES work now; of course, replace de_AT with your favourite language ^^).
4) Set LANG to "??_??.UTF-8". Strictly speaking, there should be a way to set LANG before bash is even started, i.e. between login and starting bash; dunno how. bash's internal readline is rather spooky with this.
5) Modified termcap entry for xterm (Comment #4); this fixed loads of things.
6) To fix the de-latin1 keymap, comment out the line "alt keycode 13 = Meta_acute" in "/usr/share/keymaps/i386/qwertz/de-latin1.map" (Comment #7).
7) The dialog ebuild needs --with-ncursesw, but dialog is evil enough in the console even with it (not a single problem in xterms, though).

Now the invalid points:
- Forget that cp terminfo crap I wrote (Comment #2).
- localedef miraculously does not need the full path anymore ;)
- Forget that export TERM="linux-utf8" (Comment #1); I was just stupid.

TODO:
- Console UTF-8 border characters (Comment #3)... console is evil...

I hope this helps :)
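Points 1 and 7 of the summary could look roughly like this in the ebuilds. This is a minimal sketch, not actual ebuild code from the tree: the "utf8" flag name is the one proposed at the top of this bug, `use` is stubbed so the fragment runs outside Portage, and the function names `ncurses_myconf` and `dialog_myconf` are made up for illustration.

```shell
# Stand-in for Portage's `use` helper: succeeds if the flag is listed in $USE.
# (Real ebuilds get `use` from Portage; this stub only makes the sketch runnable.)
use() {
    case " ${USE:-} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# ncurses: add --enable-widec when the proposed "utf8" flag is set.
ncurses_myconf() {
    myconf=""
    use utf8 && myconf="${myconf} --enable-widec"
    echo "${myconf# }"
}

# dialog: configure against ncursesw instead of ncurses when "utf8" is set.
dialog_myconf() {
    w=""
    use utf8 && w="w"
    echo "--with-ncurses${w}"
}
```

With USE="utf8" the two helpers print `--enable-widec` and `--with-ncursesw`; without the flag they print an empty string and `--with-ncurses`.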
Unicode in the 0xb8000/0xb0000 console is not needed. The console only supports 256 glyphs! Plan 9 (plan9.bell-labs.com) takes a different approach - EVERYTHING supports UTF-8 and only UTF-8 (and therefore, 7-bit ASCII). They don't have a text console, only a graphical (pixel, not character) terminal, in order to render Unicode glyphs. Does the framebuffer support Unicode glyphs? If not, maybe UTF-8 in console is not worth it. A reason why UTF-8 dialog breaks so badly may be because the box drawing characters are represented differently in UTF-8, compared to 8-bit OEM/DOS
The vga text console on x86 supports at least 512 glyphs.
I'm trying to make UTF-8 work on my machine, too. Here is what I have worked out so far.

1) According to some of the above comments, the correct locale data directory is /usr/share/locale/*, but when I run "strace program" I can see the program looking for the locale files in /usr/lib/locale/* (glibc-2.3.2-1). For example, if I run "strace vi" I can see the following in the output: open("/usr/lib/locale/pt_BR.UTF-8/LC_IDENTIFICATION", O_RDONLY) As a matter of fact, after I changed the output of localedef to /usr/lib/locale, I no longer got "Locale not supported by C library" messages from programs. In Red Hat 7.2 and 9.0, the UTF-8 locale data directories are stored in /usr/lib/locale. So, what is the correct path for UTF-8 locales?

2) I configured the console font as LatArCyrHeb-16. I tested the border characters with alsamixer and they seem to be fine. The problem: the font is being set by /etc/init.d/consolefont for the first virtual terminal only. In the other ones, all the colored characters are displayed incorrectly.
1) In current glibc versions, localedef can create locales without the full path (just the name is fine); they also changed the real path for locales to /usr/lib/locale. 2) Ow, I had a problem with not every console being set to UTF-8... as a (hacky whacky) workaround I used: unicode_start >>/etc/issue and then it worked ;) But the font was always set correctly here... do you mind attaching your /etc/init.d/consolefont here so I can take a look at what it is doing? :)
I definitely agree that UTF-8 support should be added to Gentoo. Having it as the default would be best. I've managed to get xterm (using uxterm) to display UTF-8, but things like the console and even glibc seem to be compiled without support.
UTF-8 support is certainly one area that is lacking in Gentoo that should not be. Recently, due to some inexplicable hardware problems with my Gentoo install, I decided to survey the competition, namely Red Hat Linux 9, aka Shrike. (For some reason this seems to happen about once a week; I suppose I'm really good at breaking operating systems.) The UTF-8 support is absolutely impeccable; I've yet to find a flaw. Consoles, terminals, libraries, applications: they all seem to have perfect UTF-8 support. How Red Hat manages to pull off this trick I may never know, but I sincerely hope it can be mirrored in Gentoo Linux.
DISCLAIMER: Red Hat is not the competition. I'm just using the term so I can weave an interesting yarn for you folks. :)
How does Red Hat pull off this trick? Massive patches. If only there was an easy way to systematically pull patches out of an SRPM... In any case, look at bug #27700 for some ebuilds. slang is the first one to utilise the RH9 patches.
Danny: I don't find an entry starting with "v0" in my /etc/termcap. On my system, this file is provided by libtermcap-compat-1.2.3
Patrick: Hmm, you are right... massive overhaul in termcap. I'll check whether the new termcap fixes UTF-8 for good when I get home today...
hey dannym... any news? :)
There is one console input issue (32111) with UTF-8, so I'm adding a dependency on that bug here.
is anyone actually working on this? regards
I'm also working on getting a full UTF-8 Gentoo box. My main complaint is that glibc should include as many *.UTF-8 locales as possible. Even if precompiling all possible locales is not the solution, during compile we could test for USE=utf8 and $LANG in order to run localedef as a final step of emerge glibc. Or maybe use a make.conf variable, LOCALEDEF.
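To make the LOCALEDEF idea concrete, here is a rough sketch of how such a variable might be turned into localedef calls. Everything here is an assumption for illustration: the variable format (a space-separated list like "de_AT.UTF-8 pt_BR.UTF-8"), the function name `localedef_commands`, and the choice to only print the commands rather than run them. Entries with modifiers such as "@euro" are not handled.

```shell
# Print one localedef command per entry of a space-separated locale list.
# Entries without a ".CHARMAP" part (e.g. plain "C") are skipped.
localedef_commands() {
    for entry in $1; do
        case "${entry}" in
            *.*)
                input="${entry%%.*}"    # locale source name, e.g. de_AT
                charmap="${entry#*.}"   # character map, e.g. UTF-8
                echo "localedef -i ${input} -f ${charmap} ${entry}"
                ;;
        esac
    done
}
```

For example, `localedef_commands "de_AT.UTF-8"` prints `localedef -i de_AT -f UTF-8 de_AT.UTF-8`, which matches the command used earlier in this bug minus the `-v -c` options.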
Status being, with these steps, a fully working UTF-8 based system can be achieved. Now someone modify the ebuilds in portage already :-> Still needed is an ugly bash workaround to make bash's internal readline recognize UTF-8 (in .bash_login, call a new subshell: bash; that's it). Also, the de keymap has some weirdness with Meta_acute... I don't really know what it is about. I saw some comment about "the key below the 4"; if that's it, it's "E", and AltGr + E = Euro sign. This should not be present in de, but only in de@euro, though. Speculating here. Can someone shed some light on this? As for the locale configuration, are there plans to add a variable to make.conf for limiting which locales get installed? I have dozens of locales I'll never use, yet the locale I want to use (de_AT.UTF-8) is missing. Comments?
i've tried localedef -v -c -i de_AT -f UTF-8 de_AT.UTF-8. shouldn't this make a de_AT.UTF-8 directory in /usr/share/locale?
I am also interested in UTF-8. So far I have patched slang and mc to get them working with UTF-8, but I ran into the following troubles: 1) In mc I can't enter multibyte characters (those not from the ASCII set). 2) In some applications compiled against ncursesw, some (actually most) of the gettext-translated strings are not displayed at all; try centericq with LC_ALL=ru_RU.UTF-8. 3) Border chars still don't work :(
Alexander: no, newer stuff goes into a database file in /usr/lib/locale. You should be able to tell if it works by doing LANG="de_AT.UTF-8" anygtk2program; if it complains, it doesn't :)
LANG="blahblah" locale charmap is a good way to see if it works: it should print UTF-8 when the locale is set up correctly.
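That check can be wrapped in a small helper. This is a sketch with a made-up function name, `locale_is_utf8`; it relies on the `charmap` keyword of the standard `locale` utility and sets LC_ALL rather than LANG so that other LC_* variables in the environment cannot mask the result.

```shell
# Succeeds if the C library reports UTF-8 as the character map for the
# locale named in $1. A locale that is not installed typically falls back
# to the C locale, whose charmap is not UTF-8, so the check fails.
locale_is_utf8() {
    [ "$(LC_ALL="$1" locale charmap 2>/dev/null)" = "UTF-8" ]
}
```

For example: `locale_is_utf8 de_AT.UTF-8 && echo "locale works"`.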
Svyatogor: hmm... confirming the mc problem... maybe the author knows details? As for border chars, there are two ways to get them working in a Linux console: 1) use the framebuffer, or 2) use a special font, sacrifice the bold attribute and have 256 normal chars and 256 line-drawing chars...
I'm also testing UTF-8 on my machine. The problem I'm currently running into is described in bug 20006 (mutt with ncursesw). mutt compiles fine with it, but vim does not. So I decided to install both versions of ncurses in parallel on my system. I don't know if that is possible with the ebuild, as ncurses has to be compiled twice (once with --enable-widec and once without). I read that this should work on LFS; maybe it works for us, too?
ncurses and slang change the names of their libraries when compiled with UTF-8 support. Should we a.) symlink them to the old names, b.) install the non-UTF-8 version as well (like Red Hat), or c.) not do anything?
Option (b) is good for backwards compatibility, but it means that apps will compile with non UTF-8 ncurses by default. Option (a) is not very good... I prefer to use a ldscript. See bug #27700
What are the advantages of an ldscript over a symlink?
An ldscript will cause resulting binaries to be linked to the *w name. It will also cause binaries linked to the non-wide-character version to break. The good thing about this is that, if the wide-character version is source-compatible but binary-incompatible (like slang), it will force recompilation instead of causing strange bugs. ldscripts are already used by ncurses; see bug #4411. But that's just my view.
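For readers who have not seen one, an ldscript in this sense is a small text file that sits where the linker expects the .so and redirects it to another library. A minimal sketch of generating one follows; the function name `write_ldscript` and the example paths are illustrative assumptions, not what the ncurses ebuild actually installs.

```shell
# Write a GNU ld script at $1 that points the linker at the library $2.
# Usage (illustrative paths):
#   write_ldscript /usr/lib/libncurses.so /lib/libncursesw.so.5
write_ldscript() {
    cat > "$1" <<EOF
/* GNU ld script: link against the wide-character library instead. */
GROUP ( $2 )
EOF
}
```

Because only the linker reads this file, a program built with `-lncurses` records the *w library as its dependency, while old binaries that still look for the non-wide soname at run time are not silently redirected, unlike with a symlink.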
ok, we really need this settled, can anybody confirm what Zhen said?
i committed utf-8 enabled ncurses, slang and dialog ebuilds. you have to set "unicode" in your USE flags and emerge them; please test them.
i couldn't patch mc yet, the utf-8 patch seems to be incompatible with the latest security fixes
bash 2 has a problem with the initial locale setting, which I only circumvented earlier. The correct solution is this patch: http://lists.debian.or.jp/debian-devel/200210/msg00047.html The symptoms are that in a newly started login shell, backspace works wrong, and if one starts another sub-bash *without* changing anything, it suddenly works in that one. bash 3 has fixed that already. bash 2 ebuilds should incorporate this patch.
i opened a new bug for the bash problem and fixed mc meanwhile. i will close this meta bug then. if you have problems with utf8, open a new bug and assign it to utf8@gentoo.org