This is my biggest enhancement request yet. Basically:
USE="utf8" builds ncursesw. (Compatibility symlinks are installed, no worries.)
USE="utf8" enables ncursesw support in dialog.
Slang is installed with mandatory UTF-8 support; it might be necessary to rebuild apps because of slight ABI changes.
libiterm and fbiterm give the console the ability to display halfwidth and fullwidth Unicode characters, with more than 512 glyphs.
unifont is a Unicode bitmap font for X11 and a dependency of libiterm.
netkit-telnetd is hardcoded to use ncurses (or was it curses?); fixed.
There might be other apps that are hardcoded to use the non-w version of ncurses. Please provide an ldscript to fix them; I don't grasp how they work.
The ncursesw component can be considered a dup of bug 25992.
I think Gentoo should also form a project to enable Unicode/UCS support across the board.
Reproducible: Always
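To illustrate what "halfwidth and fullwidth" means for a console renderer such as fbiterm: a wide or fullwidth character occupies two terminal cells, a halfwidth one occupies one. The sketch below estimates cell width with Python's standard `unicodedata.east_asian_width`. It is a simplified model (combining marks, control characters and ambiguous-width characters are not handled) and is not how libiterm itself computes widths.

```python
import unicodedata


def cell_width(text: str) -> int:
    """Estimate how many terminal cells `text` occupies.

    Simplified model: East Asian Wide ("W") and Fullwidth ("F")
    characters take two cells; everything else takes one.
    Combining marks and ambiguous-width ("A") characters get no
    special treatment.
    """
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


if __name__ == "__main__":
    # ASCII, wide katakana, halfwidth katakana, fullwidth Latin letters.
    for sample in ["UTF-8", "テスト", "ｱｲｳ", "ＡＢＣ"]:
        print(repr(sample), cell_width(sample))
```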
Created attachment 16902: ebuilds.tar.bz2
Created attachment 16945: ncurses-5.3-r1.ebuild
Updated ebuild that generates libxxx.so -> libxxxw.so ldscripts for more libraries.
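For readers who, like the reporter, aren't sure how these ldscripts work: when GNU ld opens a file it cannot recognize as an object or archive, it reads it as a linker script, and `INPUT(-lNAME)` in such a script behaves like passing `-lNAME` on the command line. So a text file installed as `libncurses.so` containing `INPUT(-lncursesw)` makes `-lncurses` link against ncursesw. The Python sketch below only generates that script text; the helper name and the library list are hypothetical and not taken from the attached ebuild.

```python
def make_ldscript(lib: str, suffix: str = "w") -> str:
    """Return GNU ld script text redirecting lib<lib>.so to lib<lib><suffix>.so.

    Installed under the name lib<lib>.so, this text file is read by GNU ld
    as a linker script; INPUT(-lfoo) acts like -lfoo on the command line.
    """
    return (
        f"/* GNU ld script: redirect -l{lib} to -l{lib}{suffix} */\n"
        f"INPUT(-l{lib}{suffix})\n"
    )


# Hypothetical list of libraries that ship both narrow and wide variants.
LIBS = ["ncurses", "panel", "menu", "form"]

if __name__ == "__main__":
    for name in LIBS:
        print(f"--- lib{name}.so ---")
        print(make_ldscript(name), end="")
```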
the forming of a group/herd to manage utf8 issues seems like a good idea. it would also be related to cjk stuff i believe. i'll adopt this bug for the moment, but if anyone has any ideas or would like to implement this before me, i would also appreciate it.
That sounds interesting. I couldn't make fbiterm display Unicode characters (although its X counterpart, xiterm, works fine), so I'd be pleased if these ebuilds contribute to UTF-8 support in Gentoo. I had stopped working on an fbiterm ebuild because jfbterm (app-i18n/jfbterm) now supports UTF-8 along with SJIS and other encodings (jfbterm is much faster than fbiterm; fbiterm is very slow). Alastair, I'll take over the libiterm, fbiterm and unifont parts for you.
I emerged your libiterm, fbiterm and unifont, but no multibyte characters are displayed. When I run fbiterm, it opens unifont (the full path to unifont is hardcoded) and loads it into memory, but I see only ASCII characters. (I get exactly the same behaviour with my own version of fbiterm.) Are you able to display multibyte characters with this ebuild?
Did you remember to run unicode_start? And how did you test it; with cat on a UTF-8 file? (That seems to work best.) It works for me.
Thanks. I didn't know about unicode_start. Anyhow, unicode_start doesn't solve the problem with fbiterm. After running unicode_start I can see UTF-8 text on the console (but I'm using the jconsole patches, which add EUC-JP and partial UTF-8 support to the native framebuffer), but once I run fbiterm, I can't see any multibyte characters. I tested it with cat UTF-8.txt and with w3m-m17n using the UTF-8 display code.
Well, it works for me, with out-of-the-box UTF-8 support. Just for information, I set LANG=en_GB.UTF-8 LANGUAGE=en_GB.UTF-8 LC_ALL=en_GB.UTF-8 and run unicode_start before fbiterm.
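The environment in the comment above sets LANG, LANGUAGE and LC_ALL together. As a rough guide to which variable actually decides the character encoding: POSIX gives a non-empty LC_ALL priority over the category variable (here LC_CTYPE), which in turn overrides LANG; LANGUAGE is a GNU gettext extension that only affects message translations. A simplified Python sketch of that lookup (it does not check that the locale is actually installed, and the UTF-8 test is a name-based heuristic):

```python
from typing import Mapping


def effective_ctype(env: Mapping[str, str]) -> str:
    """Return the locale name that governs LC_CTYPE for `env`.

    Simplified POSIX precedence: LC_ALL, then LC_CTYPE, then LANG,
    falling back to "C". Empty values are ignored. LANGUAGE is not
    consulted because it only affects gettext message lookup.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"


def looks_utf8(locale_name: str) -> bool:
    """Heuristic: does the locale name carry a UTF-8 codeset suffix?"""
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    import os

    name = effective_ctype(os.environ)
    print(name, "UTF-8" if looks_utf8(name) else "not UTF-8")
```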
Created attachment 17554: sample text for UTF-8
It doesn't work for me either (I tested both ja_JP.UTF-8 and en_GB.UTF-8). Can you see the Japanese text in the attached file with fbiterm? I can see only ' UTF-8' in the second line with fbiterm. (I wrote 2 lines, and each line contains Japanese characters.) Just FYI:
rico% emerge info
Portage 2.0.49-r3 (default-x86-1.4, gcc-3.2.3, glibc-2.3.2-r1, 2.4.22)
=================================================================
System uname: 2.4.22 i686 Pentium III (Coppermine)
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O -mcpu=pentium3 -march=i586 -funroll-loops -fomit-frame-pointer -pipe"
CHOST="i686-gentoo-linux"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /var/qmail/control /usr/share/config /usr/kde/2/share/config /usr/kde/3/share/config"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-O -mcpu=pentium3 -march=i586 -funroll-loops -fomit-frame-pointer -pipe"
DISTDIR="/home/distfiles"
FEATURES="sandbox buildpkg ccache digest cvs -autoaddcvs"
GENTOO_MIRRORS="ftp://sb.itc.u-tokyo.ac.jp/GENTOO http://gentoo.oregonstate.edu"
MAKEOPTS="-j2"
PKGDIR="/home/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/home/gentoo-x86"
SYNC="rsync://sb.itc.u-tokyo.ac.jp/gentoo-portage"
USE="x86 oss apm avi crypt cups encode foomaticdb gif jpeg libg++ mad mikmod mpeg ncurses nls pdflib png quicktime spell truetype xml2 xmms xv zlib gdbm berkdb slang readline arts svga tcltk java ruby sdl gpm tcpd pam libwww perl python esd imlib oggvorbis qt kde opengl cdr X gtk gtk2 -gnome -alsa cjk maildir usagi ipv6 -motif canna -freewnn ssl mmx sse emacs tetex"
rico% ldd /usr/bin/fbiterm
        libm.so.6 => /lib/libm.so.6 (0x412a4000)
        libXfont.so.1 => /usr/X11R6/lib/libXfont.so.1 (0x41145000)
        libiterm.so.1 => /usr/lib/libiterm.so.1 (0x40012000)
        libz.so.1 => /usr/lib/libz.so.1 (0x4133c000)
        libc.so.6 => /lib/libc.so.6 (0x41016000)
        libfribidi.so.0 => /usr/lib/libfribidi.so.0 (0x40026000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x41000000)
Created attachment 17561: photo
Did you actually create the en_GB.UTF-8 locale? Check. For some reason, it works for me. (In fact, most things do - I wonder why.) Photographic evidence. Could you at least try to explain the (mis)output?
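On "Did you actually create the en_GB.UTF-8 locale? Check.": from the shell, `locale -a` lists the installed locales. From a program, one way to check is to ask the C library to switch to the locale and see whether it refuses. A Python sketch using the standard `locale` module; it temporarily changes LC_CTYPE for the whole process and then restores it, so it is not thread-safe, and some C libraries (musl, for example) may accept names glibc would reject.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` as an LC_CTYPE locale."""
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Always put the process back the way we found it.
        locale.setlocale(locale.LC_CTYPE, previous)


if __name__ == "__main__":
    for candidate in ("en_GB.UTF-8", "ja_JP.UTF-8"):
        print(candidate, locale_available(candidate))
```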
Finally I succeeded! Thanks a lot.
I did create the ja_JP.UTF-8 locale with
# localedef -v -c -i jp_JP -f UTF-8 ja_JP.UTF-8
but got the output described in Comment #10. It looked like this:
% cat utf-8      # prompt
                 # blank line
 UTF-8           # only see ' UTF-8'
%                # prompt again
I tried it again today. Instructions follow:
1. create en_GB.UTF-8
   # localedef -v -c -i en_GB -f UTF-8 en_GB.UTF-8
2. set env
   % LC_CTYPE=en_GB.UTF-8
   % LC_ALL=en_GB.UTF-8
   % LANG=en_GB.UTF-8
   % LANGUAGE=en_GB.UTF-8
   % export LC_CTYPE LC_ALL LANG LANGUAGE
3. unicode_start
   % unicode_start
4. fbiterm
   % fbiterm
5. cat ;-)
   % cat utf-8.txt
   ほげほげ
   これは UTF-8 のテストです。
   ("hogehoge" is placeholder text; the second line reads "This is a UTF-8 test.")
Once I was able to see UTF-8 with en_GB.UTF-8, I could also do the same thing with ja_JP.UTF-8 ... strange, but it works. I don't know why things didn't work yesterday (I did the same thing, except that today I created the en_GB.UTF-8 locale). I'll commit libiterm and fbiterm shortly.
Three (libiterm, fbiterm, unifont) down, three (ncursesw, slang, dialog) to go. Related bugs: bug #17282 for TeX. bug #18735 as a meta-bug. bug #20006 for ncursesw (Duped here) bug #20854 Multibyte encoding for ghostscript (no fix).
Oops, bug #18375 not #18735
I looked into your unifont ebuild. I think there are several things we should solve before I commit it to the Portage tree.
First, I think it is better to set HOMEPAGE (perhaps http://czyborra.com/ ? http://dvdeug.dhis.org/unifont.html isn't available atm). Sometimes it is not clear which is the main homepage of a piece of software, but we should try to find one.
Second, you must choose at least one license from /usr/portage/licenses and set LICENSE (I suggest "freedist" in this case). It must match a filename in that directory (see man 5 ebuild).
Third, you must set DEPEND to ensure everyone can build the software. If you look at the Makefile created by Debian's patch, you will find that it uses perl to convert the hex file into BDF (requires dev-lang/perl) and bdftopcf to convert BDF into PCF (requires virtual/x11).
Fourth, it is not a good idea to install the font into /usr/X11R6/lib/X11/fonts/misc, because Gentoo policy follows the FHS. The FHS requires that all files under /usr/X11R6 belong only to the XFree86 distribution. So we chose /usr/share/fonts for bitmap fonts and /usr/share/fonts/ttf for TrueType fonts respectively (not all fonts in Portage follow the FHS, but that doesn't mean we can ignore it).
Lastly, I think you forgot to run mkfontdir in this ebuild (Debian does it in their postrm sh script). You need to add `mkfontdir ${D}/usr/share/fonts/${PN}` or similar to your ebuild to get the right fonts.dir. fbiterm doesn't need fonts.dir to use unifont, but I think it is worth creating one since fbiterm won't be the only program using unifont (for example, I used it with Opera 5.x at that time).
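The LICENSE rule above (every license name must match a file in /usr/portage/licenses) can be sketched as a small check. This Python version is only an illustration, not repoman's actual check. It assumes LICENSE is a whitespace-separated list and simply skips `||`, parentheses and USE-conditional tokens instead of interpreting them; `known_licenses` and its default path are assumptions for a Gentoo system.

```python
import os
from typing import List, Set


def known_licenses(licenses_dir: str = "/usr/portage/licenses") -> Set[str]:
    """Return the license names available as files in licenses_dir."""
    return {name for name in os.listdir(licenses_dir)
            if os.path.isfile(os.path.join(licenses_dir, name))}


def unknown_licenses(license_value: str, known: Set[str]) -> List[str]:
    """Return the names in a LICENSE string that are not in `known`.

    Simplified: '||', '(', ')' and USE-conditional tokens ending in '?'
    are skipped rather than interpreted.
    """
    return [tok for tok in license_value.split()
            if tok not in ("||", "(", ")")
            and not tok.endswith("?")
            and tok not in known]


if __name__ == "__main__":
    # On a Gentoo box `known_licenses()` would supply this set; a small
    # literal set keeps the demo self-contained.
    demo_known = {"CPL-0.5", "CPL-1.0", "freedist"}
    print(unknown_licenses("CPL", demo_known))      # ['CPL']
    print(unknown_licenses("CPL-1.0", demo_known))  # []
```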
I examined libiterm. In these ebuilds you are right to set LICENSE to CPL, but as I wrote in the previous comment, you must choose one from /usr/portage/licenses. It should be either "CPL-0.5" or "CPL-1.0"; plain "CPL" is not allowed (in this case, CPL-1.0 is the right one).
I noticed that you applied a patch from Debian, but having looked through the whole patch, I don't think we need it for Gentoo. The patch exists so that iterm compiles correctly on Debian (including Debian GNU/NetBSD ;-p). If you find something in this patch that significantly improves the software for Gentoo, please correct me.
Also, you correctly set dev-libs/fribidi as a dependency for libiterm, but you cleared it in RDEPEND. If you run ldd on libiterm, you will get
% ldd /usr/lib/libiterm.so
        libfribidi.so.0 => /usr/lib/libfribidi.so.0 (0x40025000)
        libc.so.6 => /lib/libc.so.6 (0x41016000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000)
and this is why you have to have dev-libs/fribidi in the RDEPEND list.
As for fribidi, iterm supports either pls (which comes with the iterm distribution) or fribidi. We don't need to force people to use fribidi because pls is enabled by default. Rather, I think we will use fribidi if the bidi USE flag is set. Currently, this is only a local USE flag for fvwm, but we might ask gentoo-dev to make it a global USE flag.
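The ldd argument above (libfribidi shows up in libiterm's runtime linkage, so dev-libs/fribidi belongs in RDEPEND) can be partly automated. A Python sketch that extracts library names and paths from ldd-style output. It assumes the `name => path (address)` line format shown in this bug, and it leaves out mapping a library back to its package (e.g. libfribidi.so.0 to dev-libs/fribidi).

```python
import re
from typing import Dict

# Matches lines such as "libfribidi.so.0 => /usr/lib/libfribidi.so.0 (0x40025000)".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")


def parse_ldd(output: str) -> Dict[str, str]:
    """Map each shared library name in ldd output to its resolved path."""
    libs = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


# The libiterm output quoted in the comment above.
SAMPLE = """\
        libfribidi.so.0 => /usr/lib/libfribidi.so.0 (0x40025000)
        libc.so.6 => /lib/libc.so.6 (0x41016000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000)
"""

if __name__ == "__main__":
    for name, path in parse_ldd(SAMPLE).items():
        print(name, "->", path)
```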
Well, let's turn to fbiterm. Almost everything looks fine to me. DEPEND should be >=app-i18n/libiterm-${PV} rather than >=app-i18n/libiterm-${PV}*. See http://dev.gentoo.org/~liquidx/ebuildmistakes.html for common mistakes ;-) In addition, I think it's better to have >=sys-apps/sed-4 as a dependency because you used sed's -i (in-place) option in the src_unpack() section. Not every sed supports -i (for example, FreeBSD's sed didn't support -i), and considering that Portage will extend to other platforms, we had better have >=sys-apps/sed-4 as a dependency (it's not so hard to rewrite the ebuild without -i, though; it's up to you).
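On the sed -i point: the portable way around it is to write the edited text to a temporary file and rename it over the original (in shell, roughly `sed 's/a/b/' file > file.tmp && mv file.tmp file`), which is also how GNU sed implements -i. The same pattern as a Python sketch; the function name, file name and substitution are hypothetical placeholders, not the ones from the fbiterm ebuild.

```python
import os
import re
import shutil
import tempfile


def edit_in_place(path: str, pattern: str, replacement: str) -> int:
    """Apply a regex substitution to `path` via a temp file plus rename.

    Returns the number of substitutions made. The temporary file lives in
    the same directory so os.replace() is a rename on the same filesystem.
    """
    with open(path, encoding="utf-8") as src:
        text = src.read()
    new_text, count = re.subn(pattern, replacement, text)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as dst:
            dst.write(new_text)
        shutil.copymode(path, tmp_path)  # keep the original permissions
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up if anything failed before the rename
        raise
    return count
```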
Sorry, I've become a little lax when it comes to writing ebuilds. RDEPEND="" should be RDEPEND="${DEPEND}" or not there at all. fbiterm needs libXfont - I have left out X as a dependency. Also, it might be useful to have a USE flag for utempter for utmp access. The debian patch seems to only apply to configure - I came across this through debian, may as well give it the benefit of the doubt. My >=app-i18n/libiterm-${PV}* dependency is incorrect - use =app-i18n/libiterm-${PV}* instead - the two should be in lockstep, like gnustep-{make,base}. Rewrite the sed modification for sed 3, if what you say is true.
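To make the `>=...${PV}*` versus `=...${PV}*` distinction concrete: in Portage a trailing `*` is only valid with the `=` operator, where it matches any version starting with the given components, which is what keeps libiterm and fbiterm in lockstep. The Python sketch below is a deliberately simplified model (plain dotted numeric versions, component-wise prefix matching, no suffixes or revisions), not Portage's actual matching code.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted numeric version such as '0.5' or '0.5.1'."""
    return tuple(int(part) for part in version.split("."))


def atom_matches(op: str, spec: str, candidate: str) -> bool:
    """Simplified model of the '=', '>=' and '=...*' version operators.

    A trailing '*' on `spec` is accepted only together with '=' and
    matches any candidate whose leading components equal spec's.
    """
    wildcard = spec.endswith("*")
    want = parse_version(spec.rstrip("*"))
    have = parse_version(candidate)
    if wildcard:
        if op != "=":
            raise ValueError(f"'*' is only valid with '=', not {op!r}")
        return have[:len(want)] == want
    if op == "=":
        return have == want
    if op == ">=":
        return have >= want
    raise ValueError(f"unsupported operator {op!r}")
```

With `>=`, a newer libiterm would satisfy fbiterm even after the two drift apart; `=${PV}*` only accepts libiterm releases that share fbiterm's version.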
Committed media-fonts/unifont. I'll commit libiterm/fbiterm later.
has the working group on gentoo UTF support been formed? i hadn't realized that poor UTF support was gentoo's fault and not mine until i read this and other bugs. the reaction of the Latin-script community in many related bugs is disappointing, but i guess it's our job, as non-Latin users, to make gentoo as internationalized as RedHat and friends. i wonder why the ebuilds proposed here do not appear in the portage tree? is there some central UTF-related place out there? BTW, i have utf support in xterm and had no problem displaying a little japanese fragment (well, at least i think it was japanese and i think i had no problem ;). i can see all the examples in quickbrownfox.txt and in UTF-8-demo.txt (don't remember where i d/l'd them from); of course i cannot judge whether they are displayed correctly, but they look good.
Created attachment 19731: utf-8 text in many different languages + special symbols. use -misc-fixed-medium-r-*-*-18-*-*-*-*-*-iso10646-* to display this text in e.g. xterm
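The font name above is an X Logical Font Description (XLFD) pattern. Splitting it into the 14 standard XLFD fields shows what it asks for: an 18-pixel misc/fixed medium roman font whose charset registry is iso10646 (i.e. Unicode). A Python sketch; `*` is kept as a literal wildcard, and patterns where one `*` spans several fields are not supported.

```python
from typing import Dict

XLFD_FIELDS = (
    "foundry", "family_name", "weight_name", "slant", "setwidth_name",
    "add_style_name", "pixel_size", "point_size", "resolution_x",
    "resolution_y", "spacing", "average_width", "charset_registry",
    "charset_encoding",
)


def parse_xlfd(name: str) -> Dict[str, str]:
    """Split a full 14-field XLFD name or pattern into a field dictionary."""
    if not name.startswith("-"):
        raise ValueError("XLFD names start with '-'")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError(f"expected {len(XLFD_FIELDS)} fields, got {len(parts)}")
    return dict(zip(XLFD_FIELDS, parts))


if __name__ == "__main__":
    fields = parse_xlfd("-misc-fixed-medium-r-*-*-18-*-*-*-*-*-iso10646-*")
    for key, value in fields.items():
        print(f"{key:17} {value}")
```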
Created attachment 19732: another sample utf-8 text
i'm afraid we've been falling behind in UTF-8 support. luckily an i18n subproject is being formed, and this is definitely one of its todos. i too would really like some decent UTF-8 support, and i've already got some patches ready for utf-8 locale generation in glibc. so rest assured, we are trying to sort out this UTF-8 business as soon as we can.
Thanks, that's great to know. http://www.columbia.edu/kermit/utf8.html http://www.cl.cam.ac.uk/~mgk25/ucs/examples/ http://www.macchiato.com/unicode/Unicode_transcriptions.html Some "nice" example pages, from the UTF-8 FAQ. An "I'm Feeling Lucky" search should find it.
Yeah, I'd like to commit libiterm and fbiterm as soon as I get a response from seemant about his intention to put libiterm-mbt in x11-libs. I sent him a mail about it three weeks ago and I'm waiting for him to reply... Also, I want to verify the utf8 patch for ncurses, since recent nvi seems to need it in order to display UTF-8 text (but I haven't tested it).
ok, sorry this has been sitting around for so long; i think after 2004.0 we need to target this. the unicode USE flag has been approved, so i think we can use it to signify utf-8 support. i would much prefer these ebuilds be submitted as separate bug reports and linked to this one rather than having one massive report. also shortening the bug title since unifont and fbiterm are in portage now.
*** Bug 51634 has been marked as a duplicate of this bug. ***
Does anyone in the cjk herd want to add a unicode USE flag for ncurses and slang? If not (and there are no objections), I'll add them this weekend.
I believe that nano has some bugs when used with ncursesw. I've since switched to slang.
ncurses, slang and dialog are now in portage with utf-8 support; the useflag for that is unicode. iterm doesn't seem directly related, it's just an extra ebuild, so i think you should file a new bug report for it.
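For reference, ebuilds usually turn a USE flag like this into a configure switch with the `use_enable` helper, and ncurses' own configure script builds the wide-character ncursesw library when given `--enable-widec`. Below is a Python model of that helper's behaviour. That the ncurses ebuild maps `unicode` to `widec` in exactly this way is an assumption for illustration, not something stated in this bug.

```python
from typing import AbstractSet, Optional


def use_enable(use: AbstractSet[str], flag: str,
               option: Optional[str] = None,
               value: Optional[str] = None) -> str:
    """Model of the ebuild helper `use_enable flag [option] [value]`.

    Returns '--enable-<option>' (plus '=<value>' if given) when `flag`
    is in `use`, otherwise '--disable-<option>'. `option` defaults to
    the flag name.
    """
    option = option or flag
    if flag in use:
        return f"--enable-{option}" + (f"={value}" if value else "")
    return f"--disable-{option}"


if __name__ == "__main__":
    # Hypothetical mapping: USE=unicode -> ncurses' --enable-widec.
    print(use_enable({"unicode", "nls"}, "unicode", "widec"))
    print(use_enable({"nls"}, "unicode", "widec"))
```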