Bug 105135 - Several files from man-pages-fr-1.64 use the wrong character set.
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High normal
Assignee: No maintainer - Look at https://wiki.gentoo.org/wiki/Project:Proxy_Maintainers if you want to take care of it
Reported: 2005-09-07 04:23 UTC by Christophe Garault
Modified: 2012-06-06 13:17 UTC
Comment 1 Christophe Garault 2005-09-07 04:23:40 UTC
It seems that every man page written in French that uses the ISO-8859-15
character set is not correctly encoded.

Reproducible: Always
Steps to Reproduce:
1. emerge man-page-fr-1.64.0
2. man iso-8859-15

Actual Results:  
 Le jeu de caractÃ¨res ISO 8859-15 en octal, dÃ©cimal et hexadÃ©cimal.

Expected Results:  
 Le jeu de caractères ISO 8859-15 en octal, décimal et hexadécimal.

This happened after emerging the latest version (1.64).
Comment 2 Jakub Moc (RETIRED) gentoo-dev 2005-09-07 04:28:35 UTC
Hmm, these manpages need conversion to utf-8 anyway (at least w/ unicode flags set).
Comment 3 gauthier 2005-09-17 22:32:59 UTC
Same here.
The previous version works great! Why did you erase it?

PLEASE STOP ERASING packages from Portage when they work! Especially when the
new version is buggy! Do I have to quickpkg before every update?

I think this is not a good way to maintain a "stable" branch...
Comment 5 sam 2005-10-03 00:31:52 UTC
You can see it in the raw bytes of some pages.

For example, man lp works: the é character is encoded as E9,
while man dd does not work: the same character is encoded as C3 A9.

I don't know more about it.
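
The two byte sequences reported above match the Latin-1 and UTF-8 encodings of é. The following is a minimal Python sketch of that difference, and of what a reader sees when UTF-8 bytes are interpreted as Latin-1. It only illustrates the encodings; it assumes nothing about how man or groff process the pages, and the variable names are illustrative.

```python
# The same character has a different byte sequence in each encoding.
e_acute = "é"
latin1_bytes = e_acute.encode("latin-1")  # one byte: E9
utf8_bytes = e_acute.encode("utf-8")      # two bytes: C3 A9

print(latin1_bytes.hex(), utf8_bytes.hex())

# Interpreting UTF-8 bytes as Latin-1 turns every accented letter
# into two unrelated characters.
garbled = "caractères".encode("utf-8").decode("latin-1")
print(garbled)
```

A page stored as UTF-8 but read as Latin-1 would therefore show two characters in place of each accented letter, which is one plausible explanation for the garbled output in this report.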
Comment 6 NiQoZ 2005-11-10 01:05:14 UTC
same here !!!!!!!!!!!!!
Comment 7 Jakub Moc (RETIRED) gentoo-dev 2005-11-10 01:18:08 UTC
(In reply to comment #4)
> same here !!!!!!!!!!!!!

Sure, and what's the point of making noise here?
Comment 8 Elie Morisse 2006-06-15 00:37:22 UTC
9 months old and still not fixed ? :\
Comment 9 Alec Warner (RETIRED) archtester gentoo-dev Security 2007-01-24 01:16:39 UTC
(In reply to comment #8)
> 9 months old and still not fixed ? :\
> 

vapier how the hell do I fix this?
Comment 10 SpanKY gentoo-dev 2007-01-24 05:15:46 UTC
dont read french man pages

there is no such thing as "wrong encoding" with man pages ... there is a desync with handling of unicode in most man pages

best to sort it out with man itself rather than screwing with man-page packages themselves, but i dont really delve into this crap so i'm just guessing
Comment 11 Denys Duchier 2007-09-11 15:49:25 UTC
> dont read french man pages

My students pretty much have to, as their command of English is not that great.

> there is no such thing as "wrong encoding" with man pages

That statement is impractical: since there is no absolutely reliable
way to guess the encoding of a file, a robust installation of man pages
ought to ensure that they all use the same encoding.  The ebuild
could use recode to ensure this.

In the meantime, I have been using a heuristic filter for man.  In /etc/man.conf I have:

NROFF /home/denys/bin/to-latin1 | /usr/bin/nroff -Tutf8 -mlatin1 -c -mandoc

and to-latin1 is the following python script:

#! /usr/bin/python

import sys, chardet, subprocess, os
text = sys.stdin.read()
enc = chardet.detect(text)['encoding']

pro = subprocess.Popen(("/usr/bin/recode",
                        "%s..Latin-1" % enc),
                       stdin=subprocess.PIPE)
pro.stdin.write(text)
pro.stdin.flush()
pro.stdin.close()
os.pidwait(pro.pid)
Comment 12 Denys Duchier 2007-09-12 19:27:25 UTC
> os.pidwait(pro.pid)

oops! that should be:

os.waitpid(pro.pid, 0)
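
For readers on Python 3, the same idea can be sketched with the standard library alone. This is not the script above: it swaps the chardet detection and the external recode call for a simpler heuristic, where bytes that decode cleanly as UTF-8 are treated as UTF-8 and anything else is assumed to be Latin-1 already. The names to_latin1 and main are illustrative.

```python
#!/usr/bin/env python3
import sys


def to_latin1(data: bytes) -> bytes:
    """Return data re-encoded as Latin-1, guessing the input encoding."""
    try:
        # Bytes that decode cleanly as UTF-8 are treated as UTF-8.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else is already Latin-1, so pass it through.
        return data
    # Characters with no Latin-1 equivalent are replaced by "?".
    return text.encode("latin-1", errors="replace")


def main() -> None:
    """Filter stdin to stdout; call this when used as an NROFF pre-filter."""
    sys.stdout.buffer.write(to_latin1(sys.stdin.buffer.read()))
```

Like any guess, this one can be wrong: a Latin-1 page whose bytes happen to form valid UTF-8 would be converted a second time.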
Comment 13 Frédéric Heulin 2008-01-31 17:27:39 UTC
As stated at http://manpagesfr.free.fr/, all versions of man-pages-fr
have been UTF-8 encoded since 1.59.0.
A quick test on the files of the 2.39 package reports only UTF-8 (and some pure ASCII):
# for i in $(equery f man-pages-fr | grep "bz2$"); do file -iz $i; done
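
A similar check can be sketched in Python without relying on file. This is only an approximation of the command above, not a replacement for it: it decompresses each .bz2 page and reports whether the bytes are pure ASCII, valid UTF-8, or something else. The function names classify and classify_bz2 are hypothetical, and the list of paths would have to come from elsewhere (for example from equery).

```python
import bz2


def classify(data: bytes) -> str:
    """Label raw bytes as 'ascii', 'utf-8', or 'other'."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: likely a legacy 8-bit encoding such as Latin-1.
        return "other"


def classify_bz2(path: str) -> str:
    """Decompress a .bz2 man page and classify its contents."""
    with bz2.open(path, "rb") as handle:
        return classify(handle.read())
```

Any page labelled "other" is a candidate for conversion before installation.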

With a UTF-8 locale such as fr_FR.UTF-8, we should have in /etc/man.conf:

NROFF           iconv -t utf-16LE | /usr/bin/groff -Tutf8 -mandoc

The conversion to UTF-16LE makes it possible to pass the -Tutf8 option to groff without a double UTF-8 conversion or destructive or erroneous conversions.

In a non-UTF-8 locale, one could just append a further iconv conversion:

NROFF           iconv -t utf-16LE | /usr/bin/groff -Tutf8 -mandoc | iconv -t iso8859-15

Maybe do this depending on a unicode USE flag?

By the way, man-pages-fr is now at 2.79, and there is also man-pages-extra-fr, again according to http://manpagesfr.free.fr/.

I think the link for this package should point to http://manpagesfr.free.fr/ instead of http://fr.tldp.org/manfr.php, which has not been updated since 2003!

I may open new bugs for the last two remarks if needed.
Comment 14 Jakub Moc (RETIRED) gentoo-dev 2008-02-28 10:05:26 UTC
Closing this bug, as it concerns an obsolete version that is no longer shipped.