Some manual pages in /usr/share/man/ru/man5 are encoded in UTF-8 (e.g., shadow.5.gz), while others are in KOI8-R (e.g., passwd.5.gz). This mismatch is not limited to Russian. It means that the default configuration misformats some manual pages, so users have to reconfigure man to guess the encoding. See the example use of enconv (from enca) in http://ru.gentoo-wiki.com/wiki/HOWTO_ru_RU.utf8_Gentoo_way#man , but any guessing is a hack.

Let's standardize the encoding of the installed manual pages (thus removing the need for guessing) and make portage check it while installing packages.

Reproducible: Always
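To see how many installed pages are affected, a rough check along the following lines might help. It is only a sketch: the function names are made up for illustration, and it merely tests whether each gzip-compressed page decodes as UTF-8. Pure-ASCII pages pass trivially, and the check says nothing about which legacy encoding a failing page actually uses.

```python
import gzip
from pathlib import Path
from typing import List


def is_utf8(data: bytes) -> bool:
    """Return True if the byte string is valid UTF-8 (ASCII counts as valid)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_pages(man_dir: str) -> List[str]:
    """List the *.gz files below man_dir whose contents are not valid UTF-8."""
    bad = []
    for path in sorted(Path(man_dir).rglob("*.gz")):
        with gzip.open(path, "rb") as fh:
            if not is_utf8(fh.read()):
                bad.append(str(path))
    return bad


if __name__ == "__main__":
    # Directory taken from the report above.
    for page in non_utf8_pages("/usr/share/man/ru"):
        print(page)
```

A check of this kind is what a portage install-time QA test could build on, although how portage would actually hook it in is not covered here.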
groff since version 1.20 comes with preconv, so I guess that's what should be used (instead of iconv). In fact, there are several possibilities to activate it for *roff:
- explicitly pipe /usr/bin/preconv | /usr/bin/nroff
- call groff with the -k or the -D option
- set the GROFF_ENCODING environment variable (it can even be the empty string, in which case preconv will try to detect the encoding)

Both of the following settings in man.conf work fine for the German man pages (which can be encoded in latin1 or utf-8):

NROFF /usr/bin/preconv | /usr/bin/nroff -mandoc
NROFF /usr/bin/nroff -mandoc -- -k

preconv will also recognise a coding tag like "-*- coding: koi8-r -*-" or "-*- coding: utf-8 -*-" in the first or second line of a man page. I'd think that it would be easier to add such tags (which can be done with a simple patch or sed tweak) than to recode the file.

Reassigning to base-system.
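The tagging idea could be scripted roughly as follows. This is a sketch under assumptions, not a tested patch: add_coding_tag is a hypothetical helper, wrapping the tag in a roff comment (.\") is my choice of placement, and putting the tag on the second line when the first line is a preprocessor hint such as '\" t is an assumption based on the "first or second line" rule mentioned above. The page is handled as bytes so that its existing encoding is left untouched.

```python
import re

# Matches an existing Emacs-style coding tag such as "-*- coding: koi8-r -*-".
TAG_RE = re.compile(rb"-\*-.*coding:\s*[\w.-]+.*-\*-")


def add_coding_tag(page: bytes, encoding: str) -> bytes:
    """Insert a coding tag as a roff comment unless one is already present.

    The encoding name is supplied by the caller; nothing is detected here.
    """
    lines = page.split(b"\n")
    # Leave the page alone if either of the first two lines carries a tag.
    if any(TAG_RE.search(line) for line in lines[:2]):
        return page
    tag = b'.\\" -*- coding: ' + encoding.encode("ascii") + b" -*-"
    # Assumption: keep a leading preprocessor hint ('\" t) on line 1
    # and place the tag on line 2 instead.
    pos = 1 if lines and lines[0].startswith(b"'\\\"") else 0
    lines.insert(pos, tag)
    return b"\n".join(lines)
```

For a compressed page, the caller would decompress it, pass the bytes through add_coding_tag with the known encoding (for example "koi8-r"), and recompress the result.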
What you suggest will probably work. However, all of this logic (and even encoding autodetection) is already implemented in recent versions of man-db, so switching to man-db is also a solution.
requiring groff-1.20.x is fine by me. files/man-1.6f-unicode.patch already tweaks the nroff line to drop the -Tlatin1 ... i guess we can have it insert -k at the same time ...
Created attachment 262011 [details, diff]
updated man-1.6f-unicode.patch

(In reply to comment #3)
> requiring groff-1.20.x is fine by me.

In fact, there are no older versions of groff in the tree any more.

> files/man-1.6f-unicode.patch already tweaks the nroff line to drop the
> -Tlatin1 ... i guess we can have it insert -k at the same time ...

Updated patch is attached.
Comment on attachment 262011 [details, diff]
updated man-1.6f-unicode.patch

seems straightforward enough. what man pages are you testing against with this ?
Comment on attachment 262011 [details, diff]
updated man-1.6f-unicode.patch

Please hold on. For the general case, when the user's locale indicates an encoding different from that of the man page, the patch seems not to help much. There are two problems:
- The encoding isn't autodetected. We would need explicit coding tags in all non-ascii man pages.
- With the -k option for *roff, preconv is called too late. It should be first in the pipeline, before gtbl and geqn.

(In reply to comment #5)
> seems straight forward enough. what man pages are you testing against with
> this ?

man-pages-de, man-pages-pl, and man-pages-ru for latin-1, latin-2, and koi8-r, respectively. Plus some utf-8 encoded ones (e.g. realpath.1).
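To illustrate the ordering problem, here is a small sketch that assembles a pipeline with preconv as the first stage, ahead of the table and equation preprocessors. build_pipeline is a hypothetical helper, the tbl and eqn command names stand in for the gtbl and geqn mentioned above, and device or encoding flags are left out entirely, so treat the resulting string as an outline rather than a ready-made man.conf value.

```python
import shlex
from typing import List, Sequence


def build_pipeline(preprocessors: Sequence[str] = ("tbl", "eqn"),
                   formatter: Sequence[str] = ("nroff", "-mandoc")) -> str:
    """Return a shell pipeline string in which preconv always runs first."""
    # preconv converts the input encoding before any preprocessor sees it.
    stages: List[List[str]] = [["preconv"]]
    stages += [[name] for name in preprocessors]
    stages.append(list(formatter))
    return " | ".join(shlex.join(stage) for stage in stages)


if __name__ == "__main__":
    print(build_pipeline())
```

By contrast, with -k passed through nroff, the conversion happens inside the groff invocation, after the external preprocessors have already processed the unconverted input.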
I think this should be marked as resolved; I can't reproduce this problem any more.
Indeed, I could not find any Russian manpage in KOI8-R on my system. All of them are in UTF-8, and, given that man-db is now the default man page viewer, the original bug would be irrelevant for the default installation even if it existed.