Bug 259176 - standardise man-page encoding
Summary: standardise man-page encoding
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High enhancement with 1 vote
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-16 06:01 UTC by Alexander E. Patrakov
Modified: 2014-07-25 17:05 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Attachments
updated man-1.6f-unicode.patch (man-1.6f-unicode.patch,755 bytes, patch)
2011-02-10 08:25 UTC, Ulrich Müller

Description Alexander E. Patrakov 2009-02-16 06:01:56 UTC
Some manual pages in /usr/share/man/ru/man5 (e.g., shadow.5.gz) are in UTF-8, while others (e.g., passwd.5.gz) are in KOI8-R. This mismatch is not limited to Russian. It means that the default configuration misformats some manual pages, and users have to reconfigure man to guess the encoding. See the example use of enconv (from enca) in http://ru.gentoo-wiki.com/wiki/HOWTO_ru_RU.utf8_Gentoo_way#man , but any guessing is a hack. Let's standardize the encoding of the installed manual pages (thus removing the need for guessing) and make Portage check it when installing packages.
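
To illustrate the kind of install-time check proposed above, here is a minimal sketch in Python. It assumes the agreed encoding is UTF-8 and that pages are either plain or gzip-compressed. The function name is_utf8_manpage is hypothetical and this is not an existing Portage check.

```python
import gzip


def is_utf8_manpage(raw: bytes) -> bool:
    """Return True if the (optionally gzip-compressed) page decodes as UTF-8.

    Hypothetical helper for illustration; not part of Portage.
    """
    # gzip streams start with the magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    word = "Пароль"
    print(is_utf8_manpage(word.encode("utf-8")))   # UTF-8 page
    print(is_utf8_manpage(word.encode("koi8-r")))  # legacy KOI8-R page
```

Pure ASCII pages pass this check as well, since ASCII is a subset of UTF-8. A check like this can only reject pages that are not valid UTF-8; it cannot tell which legacy encoding a rejected page actually uses.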

Reproducible: Always
Comment 1 Ulrich Müller gentoo-dev 2011-02-08 16:26:45 UTC
groff since version 1.20 comes with preconv, so I guess that's what should be used (instead of iconv). In fact, there are several ways to activate it for *roff:

- explicitly pipe /usr/bin/preconv | /usr/bin/nroff
- call groff with the -k or the -D option.
- set the GROFF_ENCODING environment variable (can be the empty string even,
  in which case preconv will try to detect the encoding)

Both of the following settings in man.conf work fine for the German man pages (which can be encoded in latin1 or utf-8):
NROFF    /usr/bin/preconv | /usr/bin/nroff -mandoc
NROFF    /usr/bin/nroff -mandoc -- -k

preconv will also recognise a coding tag like "-*- coding: koi8-r -*-" or
"-*- coding: utf-8 -*-" in the first or second line of a man page. I'd think that it would be easier to add such tags (which can be done with a simple patch or sed tweak) than to recode the file.
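
As a rough sketch of the tagging approach, the Python below looks for a coding tag in the first two lines of a page and adds one if it is missing. The helper names find_coding_tag and add_coding_tag are hypothetical, the regular expression only approximates what preconv accepts, and keeping an existing '\" first line in place (commonly used to name preprocessors) is an assumption about what is safest.

```python
import re

# Emacs-style tag, e.g. "-*- coding: koi8-r -*-" (approximation of preconv's rule).
CODING_TAG = re.compile(rb"-\*-.*?coding:\s*([\w.-]+).*?-\*-")


def find_coding_tag(page: bytes):
    """Return the encoding named in the first two lines, or None."""
    for line in page.split(b"\n", 2)[:2]:
        match = CODING_TAG.search(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def add_coding_tag(page: bytes, encoding: str) -> bytes:
    """Add a roff comment line carrying the coding tag, unless one exists."""
    if find_coding_tag(page) is not None:
        return page
    tag = b'.\\" -*- coding: ' + encoding.encode("ascii") + b' -*-\n'
    first, _sep, rest = page.partition(b"\n")
    # Keep a leading '\" line first; the tag then goes on line two.
    if first.startswith(b"'\\\""):
        return first + b"\n" + tag + rest
    return tag + page


if __name__ == "__main__":
    print(add_coding_tag(b".TH PASSWD 5\n.SH NAME\n", "koi8-r").decode("ascii"))
```

The page is handled as bytes throughout, so the body is never decoded or recoded; only an ASCII comment line is added.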

Reassigning to base-system.
Comment 2 Alexander E. Patrakov 2011-02-09 10:19:50 UTC
What you suggest will probably work. However, all of this logic (and even encoding autodetection) is already implemented in recent versions of man-db, so switching to man-db is also a solution.
Comment 3 SpanKY gentoo-dev 2011-02-10 03:16:26 UTC
requiring groff-1.20.x is fine by me.  files/man-1.6f-unicode.patch already tweaks the nroff line to drop the -Tlatin1 ... i guess we can have it insert -k at the same time ...
Comment 4 Ulrich Müller gentoo-dev 2011-02-10 08:25:23 UTC
Created attachment 262011
updated man-1.6f-unicode.patch

(In reply to comment #3)
> requiring groff-1.20.x is fine by me.

In fact, there are no older versions of groff in the tree any more.

> files/man-1.6f-unicode.patch already tweaks the nroff line to drop the
> -Tlatin1 ... i guess we can have it insert -k at the same time ...

Updated patch is attached.
Comment 5 SpanKY gentoo-dev 2011-02-11 00:34:41 UTC
Comment on attachment 262011
updated man-1.6f-unicode.patch

seems straight forward enough.  what man pages are you testing against with this ?
Comment 6 Ulrich Müller gentoo-dev 2011-02-11 23:24:27 UTC
Comment on attachment 262011
updated man-1.6f-unicode.patch

Please hold on. For the general case, when the user's locale indicates an encoding different from that of the man page, the patch seems not to help much. There are two problems:
- The encoding isn't autodetected. We would need explicit coding tags in all
  non-ascii man pages.
- With the -k option for *roff, preconv is called too late. It should be first
  in the pipeline, before gtbl and geqn.
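
The ordering point can be sketched as follows: a small Python helper that assembles a formatter command line with preconv as the first stage. The function build_pipeline is hypothetical and only builds a string. The stage names are assumptions too: tbl may be installed as gtbl on some systems, and the exact nroff arguments depend on the man configuration.

```python
import shlex


def build_pipeline(encoding: str, use_tbl: bool = True) -> str:
    """Build a shell pipeline string with preconv ahead of the preprocessors."""
    # preconv -e forces the input encoding instead of relying on detection.
    stages = [["preconv", "-e", encoding]]
    if use_tbl:
        stages.append(["tbl"])
    stages.append(["nroff", "-mandoc"])
    return " | ".join(shlex.join(stage) for stage in stages)


if __name__ == "__main__":
    print(build_pipeline("koi8-r"))
```

With this ordering, every later stage receives input that preconv has already converted, rather than raw bytes in the page's original encoding.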

(In reply to comment #5)
> seems straight forward enough.  what man pages are you testing against with
> this ?

man-pages-de, man-pages-pl, and man-pages-ru for latin-1, latin-2, and
koi8-r, respectively. Plus some utf-8 encoded ones (e.g. realpath.1).
Comment 7 michal.halenka 2014-07-25 07:39:10 UTC
I think this should be marked as resolved, I can't reproduce this problem any more.
Comment 8 Alexander E. Patrakov 2014-07-25 17:05:03 UTC
Indeed, I could not find any Russian manpage in KOI8-R on my system. All of them are in UTF-8, and, given that man-db is now the default man page viewer, the original bug would be irrelevant for the default installation even if it existed.