Bug 77242 - Submission: "Using UTF-8 with Gentoo"
Summary: Submission: "Using UTF-8 with Gentoo"
Status: RESOLVED FIXED
Alias: None
Product: [OLD] Docs-user
Classification: Unclassified
Component: Submit New
Hardware: All All
Importance: High enhancement
Assignee: Xavier Neys (RETIRED)
URL: http://dev.gentoo.org/~slarti/utf-8-g...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-01-09 09:02 UTC by Tom Martin (RETIRED)
Modified: 2005-02-03 04:40 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Description Tom Martin (RETIRED) gentoo-dev 2005-01-09 09:02:48 UTC
Hi all,

In the past few weeks I've been working on documentation for UTF-8 with Gentoo. I think it is more or less feature complete, but I would really appreciate an editor, and then hopefully it can get into the GDP.

Thanks,
Tom
Comment 1 Tom Martin (RETIRED) gentoo-dev 2005-01-09 09:03:48 UTC
Err, just in case anyone misses it, the draft of the document is in the URL field, and I may as well paste it here:

http://dev.gentoo.org/~slarti/utf-8-guide/utf-8.html

Comment 3 Flammie Pirinen (RETIRED) gentoo-dev 2005-01-09 13:15:00 UTC
GTK+ version 1 needs similar hackery with UTF-8 fonts as Xlib, i.e., you have to specify an iso10646-1 fontspec in ~/.gtkrc or things will break. Things will also break if application settings specify other fonts; you can test this with the GTK+ 1 versions of xmms or sylpheed (the latter are becoming obsolete, though).
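For reference, a core X font name (XLFD) has 14 dash-separated fields, and the last two (charset registry and charset encoding) are what mark a Unicode font: they read iso10646-1. The sketch below only inspects a font name string, as it might appear in a font or fontset setting in ~/.gtkrc; it does not parse gtkrc syntax, and the helper names are hypothetical.

```python
# XLFD field names, in order, per the X Logical Font Description conventions.
XLFD_FIELDS = (
    "foundry", "family_name", "weight_name", "slant", "setwidth_name",
    "add_style_name", "pixel_size", "point_size", "resolution_x",
    "resolution_y", "spacing", "average_width", "charset_registry",
    "charset_encoding",
)


def parse_xlfd(name):
    """Split a fully specified XLFD name (wildcards allowed) into its 14 fields."""
    if not name.startswith("-"):
        raise ValueError("XLFD names start with '-'")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError(f"expected {len(XLFD_FIELDS)} fields, got {len(parts)}")
    return dict(zip(XLFD_FIELDS, parts))


def is_unicode_font(name):
    """True if the font's charset is ISO 10646 (Unicode), i.e. it ends in iso10646-1."""
    fields = parse_xlfd(name)
    registry = fields["charset_registry"].lower()
    return (registry, fields["charset_encoding"]) == ("iso10646", "1")


if __name__ == "__main__":
    for font in ("-misc-fixed-medium-r-normal--13-120-75-75-c-70-iso10646-1",
                 "-misc-fixed-medium-r-normal--13-120-75-75-c-70-iso8859-1"):
        print(font, is_unicode_font(font))
```

An ISO 8859-1 font name fails the check, which is the situation where GTK+ 1 applications show broken text for non-Latin-1 characters.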

Also
"""
By pressing AltGr, Shift and [ at once, releasing them, and then pressing a, a Scandinavian 'å' is produced. Similarly, when I press AltGr, Shift and [ at once, release only the [, and then press it again, '˚' is produced. This is useful as a degree symbol.
"""
The degree symbol part is incorrect. At least on my keyboard, pressing the dead key for ring above and then space produces U+02DA (RING ABOVE), which in some fonts might look like the degree sign (U+00B0), but it isn't, and software looking for a degree sign wouldn't recognize it. Furthermore, the two symbols have different line-breaking properties in Unicode, which may cause typographical problems.
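A quick way to confirm that these are different characters is to ask Python's unicodedata module for their names and look at their UTF-8 bytes. This is only an illustration of the point above, not something from the guide.

```python
import unicodedata

ring_above = "\u02da"   # produced by the ring-above dead key followed by space
degree_sign = "\u00b0"  # the real degree sign

for ch in (ring_above, degree_sign):
    # Code point, official Unicode name, and UTF-8 encoding as hex.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} {ch.encode('utf-8').hex()}")

# They may look alike in some fonts, but they are different characters.
print(ring_above == degree_sign)
```

This prints U+02DA RING ABOVE (bytes cb9a), U+00B0 DEGREE SIGN (bytes c2b0), and False, so a search for the degree sign will not match the ring.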

I don't know if it's worth mentioning, but many East Asian languages, which haven't moved to Latin characters, would benefit from UTF-16 instead of UTF-8.
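The benefit is mainly storage size: CJK characters in the Basic Multilingual Plane take three bytes each in UTF-8 but two in UTF-16, while ASCII text doubles in size in UTF-16. A small sketch comparing the two; the sample strings are arbitrary, and characters outside the BMP take four bytes in both encodings.

```python
samples = {
    "Japanese": "日本語のテキスト",
    "English": "plain text",
}

sizes = {}
for label, text in samples.items():
    utf8_len = len(text.encode("utf-8"))
    # "utf-16-le" omits the 2-byte byte order mark that plain "utf-16" prepends.
    utf16_len = len(text.encode("utf-16-le"))
    sizes[label] = (len(text), utf8_len, utf16_len)
    print(f"{label}: {len(text)} chars, UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

The Japanese sample is 8 characters: 24 bytes in UTF-8 versus 16 in UTF-16. The English sample goes the other way: 10 bytes versus 20.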

For the applications part, it might be worth noting that UTF-8 is still regarded as abusive in most of Usenet and IRC, for example.

Otherwise it looks good to me. I didn't check the grammar, though, because I'm probably not the best person for that.
Comment 4 Tom Martin (RETIRED) gentoo-dev 2005-01-10 08:42:36 UTC
Okay, I added the things you mentioned, Flammie; I really appreciate the suggestions.

Still TODO:
-- Other input methods (?)
-- Tidy section on dead keys; it's a grammatical mess
-- Any other suggestions for things that need to be covered (?)
Comment 6 Xavier Neys (RETIRED) gentoo-dev 2005-01-10 09:28:00 UTC
Very nice guide indeed.

You might want to mention recode after iconv. recode does in-place recoding, which iconv does not, IIRC.

About utf8 support on the console: can you type utf8 characters? Last time I tried, it did not work. If it does not, it should be mentioned. If it does work, by all means, please, tell me. FYI, displaying utf8 on my console does work.

Input methods: please mention which keyboard/config they apply to. There are so many possibilities.
FYI, none of that works here. I'm still searching for the 5 non-alpha keys to the left of my enter key :-) (I have   J K L ; ' <enter>  )
I type åÅ (angstrom symbols) with meta-a-a and meta-A-A, and ° (degree) with dead_circumflex-0 or meta-*-0
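Whatever key sequence produces them, å and Å normally come out as single precomposed characters (U+00E5 and U+00C5). These are canonically equivalent to a base letter followed by U+030A COMBINING RING ABOVE, and the separate ANGSTROM SIGN (U+212B) normalizes to U+00C5. The sketch below shows this with NFC/NFD normalization. Which exact code points a particular keymap emits is an assumption to check on your own setup.

```python
import unicodedata

precomposed = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "a\u030a"  # a + COMBINING RING ABOVE

# Different code point sequences, so a plain comparison fails...
print(precomposed == decomposed)
# ...but NFC composes the pair into the single precomposed character,
print(unicodedata.normalize("NFC", decomposed) == precomposed)
# and NFD splits it back apart.
print(unicodedata.normalize("NFD", precomposed) == decomposed)

# The uppercase letter and the ANGSTROM SIGN both normalize to U+00C5.
print(unicodedata.normalize("NFC", "A\u030a") == "\u00c5")
print(unicodedata.normalize("NFC", "\u212b") == "\u00c5")

# The spacing RING ABOVE (U+02DA) is not a combining mark, so it never composes.
print(len(unicodedata.normalize("NFC", "a\u02da")))
```

Everything except the first comparison prints True, and the last line prints 2. This is another reason the spacing ring from the dead key is not interchangeable with the letters or the degree sign.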
Comment 7 Tom Martin (RETIRED) gentoo-dev 2005-01-10 12:37:11 UTC
I still need to tinker around with the console to get dead keys or something similar working there -- I'm in the same boat as you as far as that goes.

Writing the input methods section is very confusing. I'm going to experiment with some other European layouts in the next couple of days and try to figure out Xkb some more. The non-alphabetical keys to the left of the Enter key that I was referring to vary with layout; on a UK layout, they are:

[ ] ; ' #

This is a flaw in the guide. It's too layout-specific, and I don't know how dead keys vary between locales.

I'm really struggling with this input methods section. I'm not sure how to set it out and what to put in it.
Comment 8 Tom Martin (RETIRED) gentoo-dev 2005-01-10 12:46:26 UTC
Okay, I found five minutes free, and ran "setxkbmap se" as an experiment.

Dead keys work almost completely differently on a Swedish layout. I can only presume there is this much variation between other layouts too. I'm honestly not sure what to do about the input methods section.

Does anyone have any ideas?
Comment 9 Flammie Pirinen (RETIRED) gentoo-dev 2005-01-10 13:35:28 UTC
If you want to study different keymaps without changing them, you can use the xkeycaps application to locate the keys. Of course, it might also be useful to explain how people can customize their keymaps and use multiple keymaps with modern X servers (well, now that XFree is gone, all X servers support that).

And the Swedish/Finnish keymap isn't all that different with regard to dead keys; at least I didn't have many problems following the explanation.

As far as other input methods are concerned, I've only used the pluggable ones that Gnome has to offer (you get a list of them from the context menu of almost every text input element in HIGged Gnome apps), mostly just the standard and IPA ones, but also a bit of Cyrillic (transliterated) and such. Plugin input methods such as uim might of course be worth mentioning.

Of course, this is another quite large subject, and it might go a bit off topic to cover it so widely in a UTF-8 document.
Comment 10 Philip Nilsson 2005-01-24 05:42:08 UTC
"app-misc/screen supports UTF-8 too, when invoked as screen -u or the following is put into the ~/.screenrc: defutf8 on"

defutf8 is not enough; one must invoke screen with the -U option to get full UTF-8 support, at least on the console.
Comment 11 Tom Martin (RETIRED) gentoo-dev 2005-01-24 08:53:05 UTC
Philip, I'm not sure this is the case. From screen(1):

        -U   Run screen in UTF-8 mode. This option tells screen that your
             terminal sends and understands UTF-8 encoded characters. It also
             sets the default encoding for new windows to `utf8'.

And:

        defutf8 on|off

        Same as the utf8 command except that the default setting for new
        windows is changed. Initial setting is `on' if screen was started
        with "-U", otherwise `off'.
Comment 12 Alexander Simonov 2005-01-31 13:17:06 UTC
I have written a guide as a recruitment test for the utf8 team :)
http://nk.ukrpack.net/~alexx/unicode-guide.html
Comment 13 Xavier Neys (RETIRED) gentoo-dev 2005-01-31 15:55:45 UTC
Simonov's xml is at http://nk.ukrpack.net/~alexx/unicode-guide.xml

So we now have two competing guides.
Please sort them out and integrate them.
We should end up with a very nice guide indeed.

Simonov, please do read http://www.gentoo.org/doc/en/xml-guide.xml if you want to submit documentation to the doc team. If you have any questions or need any help, please subscribe to our ML (gentoo-doc@g.o) and ask.

Thanks for your efforts.
Comment 14 Alexander Simonov 2005-02-01 04:53:05 UTC
Can I integrate these guides?
Comment 15 Philip Nilsson 2005-02-01 08:31:35 UTC
> Philip, I'm not sure this is the case. From screen(1):

>    -U   Run screen in UTF-8 mode. This option tells screen that your
>         terminal sends and understands UTF-8 encoded characters. It also
>         sets the default encoding for new windows to `utf8'.

Note the 'also'.

And from personal experience, UTF-8 does not work very well under screen without the flag set.
Comment 16 Tom Martin (RETIRED) gentoo-dev 2005-02-01 09:34:21 UTC
Philip,

Well spotted, I should read more closely next time. Thanks :)

Xavier,

I have added some things from Alexander's guide to mine; it's still at the original URL.


Thanks,
Tom
Comment 17 Xavier Neys (RETIRED) gentoo-dev 2005-02-01 10:27:29 UTC
Well done.

Only one complaint about iconv: it will not change the file's contents as the text suggests; all it does is write the converted file to STDOUT. iconv cannot do in-place recoding as recode does.
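Since iconv only writes to standard output, the usual workaround is to redirect into a new file and then move it over the original, for example `iconv -f ISO-8859-1 -t UTF-8 file.txt > file.txt.new && mv file.txt.new file.txt`. Redirecting straight back onto the input file would truncate it before iconv reads it. The same idea as a small Python sketch follows; `recode_in_place` is a hypothetical helper, not part of recode or iconv, and it assumes the source encoding is known in advance.

```python
import os
import shutil
import tempfile


def recode_in_place(path, from_encoding, to_encoding="utf-8"):
    """Hypothetical helper: convert a file's encoding in place.

    Like iconv, it writes the converted text to a separate file first; it then
    atomically replaces the original, which is the step iconv leaves to you.
    """
    with open(path, "rb") as src:
        # Raises UnicodeDecodeError if the data is not valid in from_encoding.
        text = src.read().decode(from_encoding)

    # Create the temporary file next to the original so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst:
            dst.write(text.encode(to_encoding))
        # mkstemp creates the file with mode 0600; keep the original permissions.
        shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.txt")
        with open(sample, "wb") as f:
            f.write("Smörgåsbord".encode("iso-8859-1"))
        recode_in_place(sample, "iso-8859-1")
        with open(sample, "rb") as f:
            print(f.read().decode("utf-8"))
```

Writing to a temporary file and renaming it means a failed conversion leaves the original untouched. That is the main reason to prefer this pattern over rewriting the file directly.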

I suggest publishing the guide as soon as this has been fixed so that it can be translated and we can start getting bugs about it :)

BTW, kudos on the coding style.
Comment 18 Tom Martin (RETIRED) gentoo-dev 2005-02-02 10:58:15 UTC
Okay, I updated the section on iconv; it should be okay now. Thanks for getting this ready to go :)
Comment 19 Xavier Neys (RETIRED) gentoo-dev 2005-02-02 13:54:39 UTC
Thanks a lot. It'll go live tomorrow morning.
Comment 20 Xavier Neys (RETIRED) gentoo-dev 2005-02-03 04:40:37 UTC
Guide is now in CVS; give it a few minutes to go online.

File edited for coding style although it was already very good, and made DTD-compliant (it was not valid).

Please submit patches against the on-line file if you need to amend it.

Many thanks for your efforts.