| Summary: | UTF-8 Guide uses 'xx_YY.UTF-8' while only 'xx_YY.utf8' works for newer glibc | ||
|---|---|---|---|
| Product: | [OLD] Docs-user | Reporter: | Wiktor Wandachowicz <wiktorw> |
| Component: | Localisation Guide | Assignee: | Docs Team <docs-team> |
| Status: | RESOLVED INVALID | ||
| Severity: | enhancement | CC: | jakub, truedfx |
| Priority: | High | ||
| Version: | unspecified | ||
| Hardware: | All | ||
| OS: | Linux | ||
| URL: | http://www.gentoo.org/doc/en/utf-8.xml#doc_chap2 | ||
| Whiteboard: | |||
| Package list: | Runtime testing required: | --- | |
|
Description
Wiktor Wandachowicz
2005-11-22 02:18:36 UTC
Works fine here:

<snip>
# locale
LANG=cs_CZ.UTF-8
...
# locale -a
C
cs_CZ
cs_CZ.utf8
en_US
en_US.utf8
POSIX

Besides, per yesterday's conversation on #-dev, this turns out to be an ncurses issue, not a glibc one. CCing truedfx for some comments...

Maybe that's the problem. I'll try to remerge "mc" with "-ncurses +slang" and see what happens. I'll also do everything with LANG="pl_PL.UTF-8" to see if it makes any difference.

I wouldn't bother ... the slang in portage is old and broken ... no one has updated it to the 2.0 version, which has fixed UTF-8 handling. Try upgrading to ncurses 5.5.

> Besides, per yesterday's conversation on #-dev, this turns out to be ncurses
> issue, not a glibc one.
> CCing truedfx for some comments...

Actually, the ncurses issue was the exact opposite: .utf8 didn't work, .UTF-8 did. ncurses 5.4 in the Linux console needs .UTF-8 locales; with .utf8 locales, it would not realise not to use the terminfo description, and would try to print lines using the wrong character codes, which would lead to screen corruption. As for this bug, I'm not sure what's wrong, but both .UTF-8 and .utf8 work here (glibc 2.3.6-r1, ncurses 5.5-r1), so I'll also suggest checking how ncurses 5.5 behaves.

Now I've checked several things and have a better overview. I created several text files with different encodings (I used "iconv" to convert between charsets). I think that I finally got UTF-8 running on the console, because the tests proved it. All of you were right, I just didn't believe I got what I wanted.

I just want to ask what you think about this:

- I set the font and translation in /etc/conf.d/consolefont
- I set LC_ALL="pl_PL.UTF-8"
- This gives me a good result, because files with UTF-8 characters are displayed correctly on the console (cat)
- less works less well, because sometimes the output is garbled, but I can control its behaviour through the LESSCHARSET variable

<now the tricky part>

- I start Midnight Commander (compiled with "-ncurses +slang") and suddenly the hints right over its "command line" are garbled: they are sometimes cut in the middle or give funny visual effects on the background. Test files are displayed correctly (F4 - Edit).
- I suspect that this is because the translated file (hints.mc.pl) uses the ISO-8859-2 encoding
- I convert the original file (using iconv) to the UTF-8 encoding, which fixes this problem
- The on-line help misses all the localized characters and displays spaces instead
- I couldn't figure out how to fix that; using iconv didn't help

<and another one>

- I have man pages translated into Polish, so I try "man bash"
- Lots of localized characters are displayed incorrectly
- I played with /etc/man.conf and tried all possible combinations of the NROFF setting, but this didn't improve the situation
- I suspect that this is also caused by the fact that the man pages use ISO-8859-2

Now my questions:

* Is it really necessary to convert all ISO-8859-2 encoded files to UTF-8 just in order to display them correctly on a UTF-8 enabled console? (and I'm not asking about an X terminal of any kind)
* Should the man pages be converted from ISO-8859-2 to UTF-8 just in order to display them correctly on the UTF-8 enabled console?

If the answer to both questions is "yes", then it looks like changing the locale to *.UTF-8 is not worth the trouble right now. Lots of resources use the ISO-8859-* standard encodings, and dealing with them on a UTF-8 console is troublesome. Of course, such documents can be converted both ways, but the conversion still needs to be done (the worst-case scenario: convert from ISO-8859-* to UTF-8 just to see or edit the file, and convert back afterwards). What is your opinion on this?

Uhm, from the above, I pretty much see this as an mc-specific issue and a Polish manpages issue. I can't see any of the mentioned problems with cs_CZ.UTF-8 (the manpages are definitely displayed correctly), and mc works as well. IMHO, you should file new bugs about mc and man-pages-pl; this bug looks INVALID to me (read: not a documentation issue).

Ok, that's perfectly reasonable. I withdraw my request and mark the bug invalid. I'll do more tests and file new bugs if appropriate, as you suggest. Thanks for your time! |
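
For reference, the round trip described in the thread (convert an ISO-8859-2 file to UTF-8 to view or edit it, then convert it back) can be sketched in a few lines of Python. This is only an illustration of the same idea as running iconv in both directions, not something posted in the thread. The function names `is_valid_utf8`, `to_utf8` and `to_legacy` are made up for this example, and the "already UTF-8?" check is a heuristic: ISO-8859-2 text with non-ASCII characters almost never happens to be valid UTF-8, but that is not guaranteed.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str = "iso8859_2") -> bytes:
    """Re-encode legacy bytes as UTF-8, leaving input that is already UTF-8 alone.

    Skipping valid UTF-8 input avoids double conversion, which would turn
    every accented character into two garbage characters.
    """
    if is_valid_utf8(data):
        return data
    return data.decode(source_encoding).encode("utf-8")


def to_legacy(data: bytes, target_encoding: str = "iso8859_2") -> bytes:
    """Convert UTF-8 bytes back to the legacy encoding (the 'convert back' step)."""
    return data.decode("utf-8").encode(target_encoding)


if __name__ == "__main__":
    # A Polish pangram; every letter in it exists in ISO-8859-2.
    sample = "zażółć gęślą jaźń"
    legacy_bytes = sample.encode("iso8859_2")

    utf8_bytes = to_utf8(legacy_bytes)
    print(utf8_bytes.decode("utf-8"))
    print(to_legacy(utf8_bytes) == legacy_bytes)
```

To convert a whole file, read it in binary mode, pass the contents through `to_utf8`, and write the result out in binary mode as well. `to_legacy` raises `UnicodeEncodeError` if the text contains characters that ISO-8859-2 cannot represent, so the trip back is only lossless for text that stays within that character set.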