Bug 93664 - sys-apps/man: messages look like garbage (unicode issues)
Summary: sys-apps/man: messages look like garbage (unicode issues)
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Duplicates: 211547 235305 339307 399255 491564
Depends on: 284822 349381
Blocks: 92017
Reported: 2005-05-23 04:08 UTC by Evgeniy Dushistov
Modified: 2013-12-25 22:34 UTC
CC List: 15 users

See Also:
Package list:
Runtime testing required: ---


Attachments
fixed ebuild (man-1.5p.ebuild, 2.77 KB, text/plain)
2005-05-23 04:11 UTC, Evgeniy Dushistov
man-1.5p-r1.ebuild (man-1.5p-r1.ebuild, 2.94 KB, text/plain)
2005-06-24 13:04 UTC, Evgeniy Dushistov
patched ebuild for man-1.6-r1 (man-1.6p-r1.ebuild, 3.58 KB, text/plain)
2005-09-02 12:30 UTC, Sergey Belyashov
fix garbage in man's messages (man-1.6a-messages.patch, 4.32 KB, patch)
2005-12-30 13:20 UTC, Evgeniy Dushistov
use iconv for converting catgets texts to correct charset (man-1.6d-catgets-encoding.patch, 5.26 KB, patch)
2006-11-20 13:13 UTC, Matthias Schwarzott
sys-apps/man-1.6f ebuild for modified patch (man-1.6f-r2.ebuild, 3.79 KB, text/plain)
2008-02-27 20:36 UTC, Andrian Nord
Modified patch from comment #18 (removed Makefile patching) (man-1.6f-catgets-encoding.patch, 4.34 KB, patch)
2008-02-27 21:07 UTC, Andrian Nord
merge message encoding fix to man-1.6f-r4.ebuild (man-1.6f-r5.ebuild, 4.52 KB, text/plain)
2010-07-30 20:02 UTC, Evgeniy Dushistov

Description Evgeniy Dushistov 2005-05-23 04:08:27 UTC
My locale is ru_RU.UTF-8, but the messages that the "man" utility prints are not in UTF-8.

For example, if I enter
$ man
without parameters,
it says, translated into English: "What manual page do you want?".

But I see only garbage.

But if I do this:
$ man 2>&1 | iconv -f koi8-r -t utf-8

everything is fine.

So I changed the man ebuild: if the unicode flag is in USE,
all Russian messages will be converted to UTF-8.
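
A minimal sketch of the symptom and the workaround described above, assuming an iconv that knows KOI8-R. The Russian string is only an illustrative stand-in, not the exact text from man's message catalog, and the temporary file simulates man's output.

```shell
#!/bin/sh
# Illustrative message only; a stand-in for the text in man's Russian catalog.
msg='Какая справочная страница вам нужна?'

# Simulate what man emits: the same message encoded as KOI8-R bytes.
tmp=$(mktemp)
printf '%s\n' "$msg" | iconv -f UTF-8 -t KOI8-R > "$tmp"

# These KOI8-R bytes are not a valid UTF-8 sequence, so a UTF-8 terminal
# shows garbage; validating them as UTF-8 fails.
if iconv -f UTF-8 -t UTF-8 "$tmp" >/dev/null 2>&1; then
    valid=yes
else
    valid=no
fi
echo "valid UTF-8: $valid"

# The workaround from the report: recode KOI8-R to UTF-8 before display.
fixed=$(iconv -f KOI8-R -t UTF-8 "$tmp")
printf '%s\n' "$fixed"
rm -f "$tmp"
```

The same recoding applied once at build time, rather than on every invocation, is what the attached ebuild aims for.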

Reproducible: Always
Comment 1 Evgeniy Dushistov 2005-05-23 04:11:09 UTC
Created attachment 59618 [details]
fixed ebuild
Comment 2 Alexander Simonov 2005-05-23 07:59:41 UTC
I think it's an upstream problem.
Can you file a bug in the upstream bugzilla?
Comment 3 Evgeniy Dushistov 2005-05-23 12:26:41 UTC
>I think it's an upstream problem.
>Can you file a bug in the upstream bugzilla?

You mean send the bug description to the developer (Andries Brouwer) of the "man" utility?
Comment 4 Alexander Simonov 2005-05-23 13:24:19 UTC
Yes. I think it's a bug of the Russian translator of "man".
All translations MUST be in UTF-8 encoding.
Comment 5 Evgeniy Dushistov 2005-05-23 14:58:27 UTC
>All translations MUST be in UTF-8 encoding.

Why do you think so?

If you look at the subdirectory "msgs" in the "man" source tree,
you can see this situation:

$ cat *.codeset
$ codeset=cp1251
$ codeset=iso-8859-2
$ codeset=iso-8859-1
$ codeset=iso-8859-1
$ codeset=iso-8859-7
$ codeset=iso-8859-1
$ codeset=iso-8859-1
$ codeset=iso-8859-1
$ codeset=iso-8859-1
$ codeset=iso-8859-2
$ codeset=iso-8859-1
$ codeset=euc-jp
$ codeset=euc-kr
$ codeset=iso-8859-1
$ codeset=iso-8859-2
$ codeset=iso-8859-1
$ codeset=iso-8859-2
$ codeset=koi8-r
$ codeset=iso-8859-2

Not a single message is in UTF-8.
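
A small sketch of reading those declarations programmatically. The scratch directory and the file names mess.ru.codeset and mess.pl.codeset are assumptions for illustration (the listing above does not show which file holds which codeset); only the "$ codeset=NAME" line format is taken from the output above.

```shell
#!/bin/sh
# Hypothetical scratch directory mimicking the layout of man's msgs/ directory.
dir=$(mktemp -d)
printf '$ codeset=koi8-r\n' > "$dir/mess.ru.codeset"
printf '$ codeset=iso-8859-2\n' > "$dir/mess.pl.codeset"

# Strip the literal "$ codeset=" prefix and print only the encoding name.
read_codeset() {
    sed -n 's/^\$ codeset=//p' "$1"
}

ru=$(read_codeset "$dir/mess.ru.codeset")
pl=$(read_codeset "$dir/mess.pl.codeset")
echo "ru: $ru"
echo "pl: $pl"
rm -r "$dir"
```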

Comment 6 Alexander Simonov 2005-05-24 05:25:36 UTC
(In reply to comment #5)
> >All translations MUST be in UTF-8 encoding.
> 
> Why do you think so?
> 
> If you look at the subdirectory "msgs" in the "man" source tree,
> you can see this situation:
> 
> $ cat *.codeset
> $ codeset=cp1251
> $ codeset=iso-8859-2
> $ codeset=iso-8859-1
> $ codeset=iso-8859-1
> $ codeset=iso-8859-7
> $ codeset=iso-8859-1
> $ codeset=iso-8859-1
> $ codeset=iso-8859-1
> $ codeset=iso-8859-1
> $ codeset=iso-8859-2
> $ codeset=iso-8859-1
> $ codeset=euc-jp
> $ codeset=euc-kr
> $ codeset=iso-8859-1
> $ codeset=iso-8859-2
> $ codeset=iso-8859-1
> $ codeset=iso-8859-2
> $ codeset=koi8-r
> $ codeset=iso-8859-2
> 
> Not a single message is in UTF-8.
> 

Yes, that is how it is now!
BUT! In the future all translations will be in UTF-8.
Comment 7 Evgeniy Dushistov 2005-06-24 13:04:04 UTC
Created attachment 61867 [details]
man-1.5p-r1.ebuild
Comment 8 Evgeniy Dushistov 2005-06-24 13:06:15 UTC
I received an answer from the current maintainer of the man utility.
He said that it will be fixed.

I changed the ebuild script to fix this problem for all locales if unicode is in the USE flags.
Comment 9 Evgeniy Dushistov 2005-07-01 08:40:06 UTC
Why such a big delay?

Is it really so hard to add this trivial fix to the ebuild?

Now 1.6 is stable and has the same problem.
Comment 10 Sergey Belyashov 2005-09-02 12:30:11 UTC
Created attachment 67503 [details]
patched ebuild for man-1.6-r1

I tried to solve this problem my own way, but found the solution here and used it
(after fixing a bug). I also resolved a problem with incorrect detection of
supported languages (it did not detect the Russian LINGUA).
This ebuild has been checked to work correctly with English and Russian, but I
think that it will work correctly with other languages too.
Comment 11 Alexander Simonov 2005-09-02 13:04:20 UTC
(In reply to comment #10)
> Created an attachment (id=67503) [edit]
> patched ebuild for man-1.6-r1
> 
> I tried to solve this problem my own way, but found the solution here and used it
> (after fixing a bug). I also resolved a problem with incorrect detection of
> supported languages (it did not detect the Russian LINGUA).
> This ebuild has been checked to work correctly with English and Russian, but I
> think that it will work correctly with other languages too.

I repeat: this is an UPSTREAM problem!
File a bug in the upstream bugzilla!
Comment 12 Evgeniy Dushistov 2005-09-18 11:39:56 UTC
>I repeat: this is an UPSTREAM problem!

Yes, it is. But I sent a bug report several months ago,
the maintainer confirmed that there is such a bug, and then nothing...

Why not add this as a temporary solution,
and remove these five lines from the ebuild once it is fixed?

Why say this is an upstream problem,
and wait several years for the maintainer of "man" to fix it?
Comment 13 Federico Lucifredi 2005-09-19 12:15:49 UTC
> >I repeat This is UPSTREAM problem!

I have been notified by Evgeniy, and this will be fixed with the minor release
addressing UTF-8 support for non 8859-1 languages, which will likely come in
late October.

> Why not add this as temporary solution, 
> and when it was fixed remove this five lines from ebuild?

Evgeniy is right. The established community process is to notify the maintainer
(which he did) for the final, solid fix (which takes longer), and introduce a
distro-fix until he releases the final one.
Comment 14 Evgeniy Dushistov 2005-12-30 13:16:48 UTC
Indeed, it is not exactly a "utf-8" issue; it happens for all locales
with more than one encoding (maybe reassign the bug?)

So I created a patch that converts all messages to UTF-8 (ebuild patch),
and converts from UTF-8 to the current encoding (man patch).

Here is the ebuild patch; the patch for man is in the attachment.

--- /usr/portage/sys-apps/man/man-1.6b-r2.ebuild        2005-12-25 18:36:02.000000000 +0300
+++ man-1.6b-r3.ebuild  2005-12-30 23:53:56.630521250 +0300
@@ -53,10 +53,25 @@
        epatch "${FILESDIR}"/man-1.5p-man2html.patch
        epatch "${FILESDIR}"/man-1.5p-mandirlist.patch
 
+       #fix messages encoding 
+       epatch "${FILESDIR}"/man-1.6a-messages.patch
+
        # use non-lazy binds for man
        append-ldflags $(bindnow-flags)
 
        strip-linguas $(eval $(grep ^LANGUAGES= configure) ; echo ${LANGUAGES//,/ })
+
+       cd msgs
+
+    for mess in `ls mess.* | grep -v codeset`; do
+           if [ -e ${mess}.codeset ]; then
+                   codeset=`sed s/\$\ codeset=//g ${mess}.codeset`
+                   iconv -f $codeset -t utf8 $mess > ${mess}.utf8
+                       mv $mess.utf8 $mess
+                       echo "$ codeset=utf8">${mess}.codeset
+               fi
+       done
+       cd ..
 }
 
 src_compile() {
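
For reference, the conversion loop from the ebuild patch above can be sketched as a standalone script. This is an assumption-laden illustration rather than the committed ebuild code: the function name recode_catalogs, the scratch directory, and the sample catalog text are hypothetical, and the quoting and error handling are stricter than in the original loop. The codeset name written back ("utf8") follows the patch.

```shell
#!/bin/sh
# Recode every mess.* catalog that has a .codeset companion file to UTF-8.
# recode_catalogs is a hypothetical helper name, not part of the ebuild.
recode_catalogs() {
    for mess in "$1"/mess.*; do
        # Skip the .codeset companion files themselves.
        case $mess in *.codeset) continue ;; esac
        [ -e "$mess.codeset" ] || continue
        # The companion file holds a single line: "$ codeset=NAME".
        codeset=$(sed -n 's/^\$ codeset=//p' "$mess.codeset")
        # Replace the catalog only if the conversion succeeded.
        iconv -f "$codeset" -t UTF-8 "$mess" > "$mess.utf8" &&
            mv "$mess.utf8" "$mess" &&
            printf '$ codeset=utf8\n' > "$mess.codeset"
    done
}

# Usage on a throwaway directory with one KOI8-R catalog (sample text only).
work=$(mktemp -d)
printf 'Нет справочной страницы\n' | iconv -f UTF-8 -t KOI8-R > "$work/mess.ru"
printf '$ codeset=koi8-r\n' > "$work/mess.ru.codeset"
recode_catalogs "$work"
result=$(cat "$work/mess.ru")
new_codeset=$(cat "$work/mess.ru.codeset")
rm -r "$work"
```

Writing to a temporary .utf8 file and moving it into place only on success keeps a failed iconv run from truncating the original catalog.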
Comment 15 Evgeniy Dushistov 2005-12-30 13:20:35 UTC
Created attachment 75817 [details, diff]
fix garbage in man's messages
Comment 16 Sergey Belyashov 2006-08-20 02:22:46 UTC
sys-apps/man-1.6d has the same problem
Comment 17 Matthias Schwarzott gentoo-dev 2006-11-20 09:17:59 UTC
Is there a reason to leave man-text-messages broken for more than a year without committing a working fix?
Comment 18 Matthias Schwarzott gentoo-dev 2006-11-20 13:13:22 UTC
Created attachment 102427 [details, diff]
use iconv for converting catgets texts to correct charset

Patch based on "fix garbage in man's messages"-patch
+ makefile-changes to convert catgets-files to utf8.
Comment 19 Evgeniy Dushistov 2006-12-10 09:03:00 UTC
(In reply to comment #17)
> Is there a reason to leave man-text-messages broken for more than a year
> without committing a working fix?
> 

As I understand the position of the Gentoo utf8 team, this is an upstream problem,
not a Gentoo one. But I sent a patch a year or so ago, the maintainer said that he would look at it, and that's all; there has been no further reaction.
Comment 20 Evgeniy Dushistov 2006-12-10 09:03:31 UTC
(In reply to comment #18)
> Created an attachment (id=102427) [edit]
> use iconv for converting catgets texts to correct charset
> 
> Patch based on "fix garbage in man's messages"-patch
> + makefile-changes to convert catgets-files to utf8.
> 

Works for me.

Comment 21 Vadim Efimov 2007-01-05 03:00:58 UTC
(In reply to comment #20)
> (In reply to comment #18)
> > Created an attachment (id=102427) [edit]
> > use iconv for converting catgets texts to correct charset
> > 
> > Patch based on "fix garbage in man's messages"-patch
> > + makefile-changes to convert catgets-files to utf8.
> > 
> 
> Works for me.
> 

and for me.

Opened: 2005-05-23 04:08 PST 

Has upstream died?
Comment 22 Evgeniy Dushistov 2007-01-05 03:25:36 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #18)
> > > Created an attachment (id=102427) [edit]
> > > use iconv for converting catgets texts to correct charset
> > > 
> > > Patch based on "fix garbage in man's messages"-patch
> > > + makefile-changes to convert catgets-files to utf8.
> > > 
> > 
> > Works for me.
> > 
> 
> and for me.
> 
> Opened: 2005-05-23 04:08 PST 
> 
> Has upstream died?
> 
There is a comment from the current developer of "man" in this discussion:
----- Comment #13 From Federico Lucifredi 2005-09-19 

Maybe he just forgot about this issue,

and maybe it is possible to add him to the "Cc" list,
and resend all these comments to him.
Comment 23 SpanKY gentoo-dev 2008-02-24 18:12:19 UTC

*** This bug has been marked as a duplicate of bug 126361 ***
Comment 24 Evgeniy Dushistov 2008-02-25 12:05:30 UTC
>*** This bug has been marked as a duplicate of bug 126361 ***
>please add utf8 support to groff

This bug has nothing to do with groff;
it is about "man" itself,
for example messages like "Man page not found" and so on.
Comment 25 SpanKY gentoo-dev 2008-02-26 19:27:06 UTC
*** Bug 211547 has been marked as a duplicate of this bug. ***
Comment 26 SpanKY gentoo-dev 2008-02-26 19:55:06 UTC
*** Bug 211547 has been marked as a duplicate of this bug. ***
Comment 27 SpanKY gentoo-dev 2008-02-26 20:33:00 UTC
*** Bug 211547 has been marked as a duplicate of this bug. ***
Comment 28 SpanKY gentoo-dev 2008-02-26 21:11:19 UTC
*** Bug 211547 has been marked as a duplicate of this bug. ***
Comment 29 SpanKY gentoo-dev 2008-02-26 23:22:54 UTC
*** Bug 211547 has been marked as a duplicate of this bug. ***
Comment 30 Andrian Nord 2008-02-27 20:36:19 UTC
Created attachment 144777 [details]
sys-apps/man-1.6f ebuild for modified patch

As insisted in Bug #211547 Comment #20, I will repeat everything here (but not give up, as someone, maybe, hopes). I have contacted Federico Lucifredi, the current maintainer of man, and he says that they will think about a solution. But I insist that a temporary solution for this issue is needed (I don't know how many days/months/years will pass until upstream merges it, or whether it will at all).

This ebuild converts the catgets catalogs to UTF-8 using the ebuild's built-in scripting, without patching the Makefile, and uses a modified patch from Comment #18 with the Makefile patching removed (if you don't like this way, you can just add the patch from c#18 to the already existing ebuild and it will work).
Comment 31 Andrian Nord 2008-02-27 21:07:10 UTC
Created attachment 144781 [details, diff]
Modified patch from comment #18 (removed Makefile patching)


In answer to Bug #211547 Comment #19 / #18 :
Yes, I have my answer. And, as you see, I don't like it, because it states the opinion and interests of one man, without any arguments for his position. It was my mistake to talk on the duplicate bug, but I assumed there was no need to duplicate information, sorry.

But don't try to hide behind bureaucratic excuses - I _will_ continue "wasting" your time until I get a normal, well-argued position on WHY this bug hasn't been resolved in the 3 years that have passed, while you had everything you needed to resolve it.
Comment 32 Pacho Ramos gentoo-dev 2008-03-29 20:25:11 UTC
I have mailed upstream and he replied saying that he is already working on this and, hopefully, the next release will fix it :-)
Comment 33 Alexander E. Patrakov 2009-02-16 06:07:49 UTC
I doubt that there will actually be a release that works (if one doesn't consider Man-DB). So let's just disable broken functionality by default: i.e., man should never produce any translated messages. This is already done with the "-nls" USE flag.

And, until bug #259176 is fixed, it is a good idea to provide a USE flag to disable support for translated manual pages completely, by applying the last hunk from the patch from http://www.mail-archive.com/lfs-dev@linuxfromscratch.org/msg12112.html
Comment 34 Federico Cuello 2010-07-05 13:26:33 UTC
5+ years and still no solution?
Comment 35 Simeon Maryasin 2010-07-05 17:31:04 UTC
One possible solution is to use sys-apps/man-db instead of sys-apps/man. It doesn't have such problems with character encodings...
Comment 36 Evgeniy Dushistov 2010-07-10 23:06:59 UTC
(In reply to comment #34)
> 5+ years and still no solution?
> 

There is a solution, there are patches,
but it looks like none of the Gentoo maintainers
has a whole ten minutes to push them into the portage tree.
Comment 37 Evgeniy Dushistov 2010-07-30 20:02:21 UTC
Created attachment 240751 [details]
merge message encoding fix to  man-1.6f-r4.ebuild
Comment 38 Rafał Mużyło 2010-07-30 21:22:32 UTC
*** Bug 235305 has been marked as a duplicate of this bug. ***
Comment 39 Rafał Mużyło 2010-07-30 21:24:36 UTC
I don't think iconv would be a good solution, due to things
like, e.g., uclibc.
Comment 40 Evgeniy Dushistov 2010-07-31 05:13:23 UTC
(In reply to comment #39)
> I don't think iconv would be a good solution, due to things
> like, e.g., uclibc.
> 

Man pages on an embedded system?

Anyway, with or without this patch you can build "man" with nls and with uclibc,
because it uses catopen, catclose and catgets, which are not implemented in uclibc. So I don't think that we should worry about uclibc when solving this problem.
Comment 41 Evgeniy Dushistov 2010-07-31 05:20:22 UTC
(In reply to comment #40)

> Anyway, with or without this patch you can build "man" with nls and with
> uclibc,

s/can/can not/g
Comment 42 Peter Volkov (RETIRED) gentoo-dev 2010-12-22 16:35:08 UTC
Guys, try man-db and recent man-pages-ru. For me it fixes issues out of box.
Comment 43 Ben Sagal 2011-12-20 13:53:46 UTC
looks like an old issue with comment on 2010-12-22 that it is fixed, could the bug be closed?
Comment 44 Sergey Belyashov 2011-12-20 14:19:15 UTC
Ben Sagal, this bug is not fixed. man-db is not stable (currently it is masked) and not standard.
Comment 45 SpanKY gentoo-dev 2012-01-19 03:07:27 UTC
*** Bug 399255 has been marked as a duplicate of this bug. ***
Comment 46 Pacho Ramos gentoo-dev 2012-01-28 12:19:16 UTC
Assigning to man maintainers, as the utf8 herd has been dead for ages
Comment 47 Alex Efros 2012-01-28 12:34:55 UTC
The subject change was wrong: it's not the "manpages" that look like garbage, it's the /usr/bin/man messages (like: No manual entry for ...) printed to the console that look like garbage.
Comment 48 Colin Watson 2012-01-28 12:37:10 UTC
Yes, although to an extent it's probably both (with varying levels of ease of reproduction).  Just switch to man-db already? :-)
Comment 49 Alex Efros 2012-01-28 13:16:05 UTC
(In reply to comment #48)
> Yes, although to an extent it's probably both (with varying levels of ease of
> reproduction).

Not really, that's just a question of using correct configuration in /etc/man.conf:

NROFF           /usr/bin/enconv -L ru -x KOI8-R -C iconv | /usr/bin/nroff -mandoc -Tlatin1 -c | /usr/bin/enconv -L ru -x UTF8

> Just switch to man-db already? :-)

Actually I'm using Vim (with the viewdoc plugin) to view man pages in the console (but internally it uses /usr/bin/man) - this makes syntax highlighting, search and navigation between man pages much more comfortable.

As for switching to man-db - I hope it is compatible enough with man to not break the viewdoc plugin. Currently all man-db versions in portage are ~x86. And I don't see any reason to switch, actually - I don't see any real reasons to start using berkdb instead of plain files here. So, why should I even think about switching? :)
Comment 50 Sergey Belyashov 2012-01-28 16:22:20 UTC
(In reply to comment #49)
> (In reply to comment #48)
> > Yes, although to an extent it's probably both (with varying levels of ease of
> > reproduction).
> 
> Not really, that's just a question of using correct configuration in
> /etc/man.conf:
> 
> NROFF           /usr/bin/enconv -L ru -x KOI8-R -C iconv | /usr/bin/nroff
> -mandoc -Tlatin1 -c | /usr/bin/enconv -L ru -x UTF8
> 

This does not fix man's own output (the help text, for example).
Comment 51 Colin Watson 2012-02-04 22:59:28 UTC
(In reply to comment #49)
> Not really, that's just a question of using correct configuration in
> /etc/man.conf:
> 
> NROFF           /usr/bin/enconv -L ru -x KOI8-R -C iconv | /usr/bin/nroff
> -mandoc -Tlatin1 -c | /usr/bin/enconv -L ru -x UTF8

It's pretty bizarre in this day and age that people should have to configure this manually.  man-db will generally just figure it out by itself, for all languages I've seen manual pages written in (it's easy to add new encoding support, and a lot of pages are just in UTF-8 these days anyway which will work by default), without configuration.  It has supported Russian KOI8-R pages with no configuration since 2003, and automatic detection of KOI8-R vs. UTF-8 since 2007.

> Actually I'm using Vim (with the viewdoc plugin) to view man pages in the console (but
> internally it uses /usr/bin/man) - this makes syntax highlighting, search and
> navigation between man pages much more comfortable.
> 
> As for switching to man-db - I hope it is compatible enough with man to not break
> the viewdoc plugin.

I often use man.vim myself which comes with vim; but I've just tested viewdoc and it basically works fine.  The only problem is completion, because man-db's /usr/bin/man didn't support running 'man --path' to print the manpath; of course this was a trivial fix and I've just committed compatibility code to support this.

Generally, I'd expect compatibility problems to be rare.  This is the first one I recall seeing in a couple of years.

> Currently all man-db versions in portage are ~x86.

Not so.  Since July, CVS has had:

  KEYWORDS="~alpha ~amd64 ~arm ~hppa ~ia64 ~m68k ~mips ~ppc ~ppc64 ~s390 ~sh ~sparc ~x86"

> And I don't see any reason to switch, actually - I don't see any real
> reasons to start using berkdb instead of plain files here. So, why I
> should even think about switching? :)

There seems to be an idea that just won't die that man-db is only about adding a database to man (incidentally, since 2008 I've recommended configuring man-db to use GDBM, not Berkeley DB).  Perhaps this is my fault since I haven't made much effort to emphasise real benefits in the documentation.  These days, the database is the least important of the differences between man and man-db.

I don't want to get into a giant advocacy discussion, but a few reasons to use man-db:

 * Correct encoding support out of the box that doesn't require primitive hardcoding in configuration files, supporting the use of a variety of languages and encodings without reconfiguration.

 * man uses catgets for message translations, which nearly everyone else stopped using in the 1990s, and which is the fundamental cause of this Gentoo bug.  One of the first things I did when I took over man-db in 2001 was to convert it to gettext, which is more correct and robust.

 * man has lots of code like 'command = my_xsprintf("%s%s '%S' | %s%s", ...)' which should fail any competent security review; consider the case where you're using man in a CGI script, for instance.  man-db is designed from top to bottom to have safe and correct command execution (this is the point of libpipeline).

 * man-db is actually maintained.  I mean no disrespect to Federico - we're even co-workers these days! - but man has only had one release since 2007 and that only really had a few minor changes; it doesn't look as though he has time to maintain it.  man-db has had ten full releases since then, and I follow bug reports from several distributions.
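
To illustrate the command-execution point in the list above, here is a generic sketch of why pasting a name into a shell command string is risky. It is not taken from man's or man-db's sources; the hostile page name and both invocations are hypothetical.

```shell
#!/bin/sh
# Hypothetical hostile page name; not taken from man's sources.
name='ls; echo INJECTED'

# Unsafe: the name is pasted into a command string and re-parsed by a shell,
# so the ';' ends the first command and starts a second one.
unsafe=$(sh -c "echo $name")

# Safer: pass the name as a positional argument so it is never parsed as code.
safe=$(sh -c 'echo "$1"' sh "$name")

printf 'unsafe: %s\n' "$unsafe"
printf 'safe:   %s\n' "$safe"
```

In the unsafe form the injected echo runs as its own command; in the safer form the whole string is treated as data.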

I work on distributions too; I know that there's a strong urge to fix the software you're currently using rather than to switch to a replacement.  However, I honestly think at this point man is several years behind man-db as far as i18n is concerned - both this bug and the harder problem of dealing with manual page encodings properly - and it shows no signs of catching up.  When I took over man-db it was in a state much like man is now, and it took me a few years of upstream development before I was really satisfied with how all the locale handling worked.  So, when I advise just switching to man-db, that isn't just "hey, I maintain it, it must be better", but the result of a lot of bitter experience fixing just this kind of bug.
Comment 52 Colin Watson 2012-02-24 08:12:01 UTC
I released man-db 2.6.1 last week, which provides that viewdoc plugin compatibility I mentioned.
Comment 53 Sergey Popov gentoo-dev 2013-06-26 18:10:17 UTC
*** Bug 339307 has been marked as a duplicate of this bug. ***
Comment 54 SpanKY gentoo-dev 2013-12-03 07:27:15 UTC
*** Bug 491564 has been marked as a duplicate of this bug. ***
Comment 55 SpanKY gentoo-dev 2013-12-25 22:34:52 UTC
use man-db for automatic charset conversion.  no plans on making man work.