Summary: | sys-apps/man shows mojibake when viewing localized UTF-8 man pages | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Zoltán Halassy <zhalassy> |
Component: | [OLD] Core system | Assignee: | Gentoo's Team for Core System packages <base-system> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | esigra |
Priority: | Normal | Keywords: | PATCH |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
strace of man
strace -f of man |
Created attachment 347268 [details]
strace -f of man
Actually, I could fix it. The default in /etc/man.conf is the following:

    NROFF /usr/bin/nroff -mandoc

If I change it to this:

    NROFF /usr/bin/groff -mandoc -Tutf8 -k

it fixes the accent problems and makes the manual colored. I just wonder whether it would be possible to make this work for all users, for example with a wrapper around groff that detects the terminal encoding from the locale settings, or something similar. Changing the default to this could potentially break man pages for other users.

Actually, /usr/bin/nroff itself is already a wrapper script, so the following change would suffice (line 136, simply add the -k option):

    -PATH="$GROFF_RUNTIME$PATH" groff -mtty-char $T $opts ${1+"$@"}
    +PATH="$GROFF_RUNTIME$PATH" groff -k -mtty-char $T $opts ${1+"$@"}

Could we add such a change to sys-apps/groff?

does man-db work ? `emerge -C man && emerge man-db`

Yes, it does, thanks. I don't mind migrating to man-db. (However, man-db calls preconv too, but directly, as can be seen in strace. That is the same thing the -k option makes groff do.) I used the old man because it was the default, for no other particular reason.

(In reply to comment #5)
i'm debating how much work i want to do with sys-apps/man if man-db does everything for me ;) it would be easy to update files/man-1.6f-unicode.patch to include -k ... but i think man-db does it a bit more selectively than just always running preconv. it looks for certain markers in the start of the file iirc.

I just hit this in bug #523440. Adding a default for an unconditional conversion from utf8 in /etc/man.conf, as suggested in comment 2, seems to be the most reasonable thing. This fixes most cases while not breaking working ascii files. The basic problem of borked internationalization in sys-apps/man, however, would be much more work.
Messing with the nroff script and making it something else (comment 3), completely circumventing it and therefore its use of the locale (comment 2), or even using a replacement package for man that seems to add another layer of complications by storing man pages in its own database rather than in plain files in the file system tree, does not look like a reasonable choice. As pointed out in bug #523440, setting the environment variable GROFF_ENCODING might help; the only problem is: where does man (or the nroff or groff it calls) get its environment from? Setting GROFF_ENCODING before calling man does not help!

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=ce370f012e25ad2eb756cbcaf768bf053161d067

commit ce370f012e25ad2eb756cbcaf768bf053161d067
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2020-03-07 21:54:44 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2020-03-07 21:59:07 +0000

    sys-apps/man: remove package

    Closes: https://bugs.gentoo.org/468428
    Closes: https://bugs.gentoo.org/515534
    Closes: https://bugs.gentoo.org/524588
    Closes: https://bugs.gentoo.org/589738
    Closes: https://bugs.gentoo.org/605352
    Closes: https://bugs.gentoo.org/651038
    Closes: https://bugs.gentoo.org/683494
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 profiles/package.mask | 6 -
 sys-apps/man/Manifest | 1 -
 sys-apps/man/files/makewhatis.cron | 5 -
 sys-apps/man/files/man-1.5m2-apropos.patch | 16 ---
 sys-apps/man/files/man-1.6-cross-compile.patch | 61 ----------
 .../files/man-1.6c-cut-duplicate-manpaths.patch | 83 -------------
 sys-apps/man/files/man-1.6e-headers.patch | 13 --
 .../man-1.6f-makewhatis-compression-cleanup.patch | 69 -----------
 .../files/man-1.6f-man2html-compression-2.patch | 61 ----------
 sys-apps/man/files/man-1.6f-parallel-build.patch | 78 ------------
 sys-apps/man/files/man-1.6f-so-search-2.patch | 34 ------
 sys-apps/man/files/man-1.6f-unicode.patch | 28 -----
 sys-apps/man/files/man-1.6g-compress.patch | 17 ---
 sys-apps/man/files/man-1.6g-echo-escape.patch | 15 ---
 sys-apps/man/files/man-1.6g-fbsd.patch | 15 ---
 sys-apps/man/files/man-1.6g-xz.patch | 53 ---------
 sys-apps/man/man-1.6g-r1.ebuild | 131 ---------------------
 sys-apps/man/metadata.xml | 8 --
 18 files changed, 694 deletions(-) |
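
The wrapper idea raised in the comments (a wrapper around groff that detects the encoding from the locale settings) could look roughly like the sketch below. This is only an illustration and not something that was committed anywhere: the function names groff_device_for_charmap and render_man_source are hypothetical, and the mapping assumes the groff output devices utf8, latin1 and ascii together with the -k (preconv) option mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: map a locale charmap name (as printed by
# `locale charmap`) to a groff output device name.
groff_device_for_charmap() {
    case "$1" in
        UTF-8|utf-8|utf8) echo utf8 ;;
        ISO-8859-1)       echo latin1 ;;
        *)                echo ascii ;;   # conservative fallback
    esac
}

# Hypothetical wrapper: format man page source with a device chosen
# from the current locale. -k makes groff run preconv on the input.
render_man_source() {
    device=$(groff_device_for_charmap "$(locale charmap)")
    groff -k -mandoc -T"$device" "$@"
}
```

Under these assumptions, an /etc/man.conf NROFF entry could point at a small script calling render_man_source instead of hard-coding -Tutf8, so that users with non-UTF-8 locales are not affected.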
Created attachment 347266 [details]
strace of man

I use hu_HU.UTF-8 as my current locale; LINGUAS="hu en" is also set.

Take net-analyzer/nmap-6.25. A Hungarian man page is available and is installed as /usr/share/man/hu/man1/nmap.1.bz2 (the same goes for chsh, gpasswd, groups, hunspell, login, mc, mplayer, newgrp, passwd and su). The files are natively encoded in UTF-8. For example, the nmap man page title reads (directly in the file, on the 31st line, after the .SH "NAME" directive), with correct UTF-8 encoding:

    "Hálózat feltérképező és biztonsági/kapu letapogató eszköz"

However,

    $ man -P cat nmap | fgrep -A1 NAME

gives this:

    NAME
        nmap - Hálózat feltérképezŠés biztonsági/kapu letapogató

No idea what is wrong. /etc/locale.gen contains one row:

    hu_HU.UTF-8 UTF-8

Everything else works fine in the console. Bash returns Hungarian error messages with proper accents. Readline handles Hungarian accented letters properly. Midnight Commander speaks Hungarian with accented letters properly. less (the default pager) shows UTF-8 files properly. Only man is broken: somehow it thinks it should convert the man pages to something, although they are already in a proper form. Actually, I have never seen man work properly with Unicode man pages.

Added strace of man.
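
To rule out a damaged source file when reproducing a report like this, the claim that the page is natively UTF-8 can be checked independently of man. The following is a minimal sketch, assuming an iconv that exits with a non-zero status on invalid input (as glibc's does); the helper name is_valid_utf8 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if standard input is valid UTF-8.
# Converting UTF-8 to UTF-8 changes nothing, but iconv reports an
# error when it meets a byte sequence that is not valid UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Example use with the path from the report (not executed here):
#   bzcat /usr/share/man/hu/man1/nmap.1.bz2 | is_valid_utf8 \
#       && echo "source is valid UTF-8"
```

If the check succeeds while man still prints mojibake, the problem lies in the formatting pipeline rather than in the page itself.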