<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "http://bugs.gentoo.org/bugzilla.dtd">

<bugzilla version="2.22.7"
          urlbase="http://bugs.gentoo.org/"
          maintainer="bugzilla@gentoo.org"
>

    <bug>
          <bug_id>121502</bug_id>
          
          <creation_ts>2006-02-03 21:38 0000</creation_ts>
          <short_desc>man pages with unicode give unexpected behavior with dashes</short_desc>
          <delta_ts>2008-02-25 03:04:49 0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>Gentoo Linux</product>
          <component>Core system</component>
          <version>2005.1</version>
          <rep_platform>All</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          <blocked>146315</blocked>
          
          <everconfirmed>1</everconfirmed>
          <reporter>throw_away_2002@yahoo.com</reporter>
          <assigned_to>base-system@gentoo.org</assigned_to>
          <cc>hiyuh.root@gmail.com</cc>
          <cc>radek@podgorny.cz</cc>
          <cc>release@gentoo.org</cc>
          <cc>truedfx@gentoo.org</cc>
          <long_desc isprivate="0">
            <who>throw_away_2002@yahoo.com</who>
            <bug_when>2006-02-03 21:38:15 0000</bug_when>
            <thetext>One of the strangest things I have ever seen.

ONLY a problem on amd64 (not my x86 machines).

ONLY a problem with LANG set to ANY utf-8 locale (for example, en_US.UTF-8,
but NOT simply en_US or POSIX).

(So far) ONLY a problem for ANY man page in the openssh package
(about a dozen files). Other packages (newly emerged) do not
have this problem, and I just re-emerged openssh because of the
security update. No change.

The problem: using either less or man to view any of these man pages,
the search function (&quot;/&quot;) will not find the dash character (&quot;-&quot;) in the
file (even with many of them obviously visible).

Hope somebody can duplicate this, but if not, happy to do whatever
testing I can.

$ emerge -p info
Portage 2.0.54 (default-linux/amd64/2005.1, gcc-3.4.4, glibc-2.3.5-r2, 2.6.15-gentoo-r1 x86_64)
=================================================================
System uname: 2.6.15-gentoo-r1 x86_64 AMD Athlon(tm) 64 Processor 3200+
Gentoo Base System version 1.6.14
dev-lang/python:     2.4.2
sys-apps/sandbox:    1.2.12
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS=&quot;amd64&quot;
AUTOCLEAN=&quot;yes&quot;
CBUILD=&quot;x86_64-pc-linux-gnu&quot;
CFLAGS=&quot;-march=k8 -O3 -pipe -msse2 -mfpmath=sse&quot;
CHOST=&quot;x86_64-pc-linux-gnu&quot;
CONFIG_PROTECT=&quot;/etc /usr/kde/2/share/config /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /var/qmail/control&quot;
CONFIG_PROTECT_MASK=&quot;/etc/gconf /etc/terminfo /etc/env.d&quot;
CXXFLAGS=&quot;-march=k8 -O3 -pipe -msse2 -mfpmath=sse&quot;
DISTDIR=&quot;/usr/portage/distfiles&quot;
FEATURES=&quot;autoconfig distlocks sandbox sfperms strict&quot;
GENTOO_MIRRORS=&quot;http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo&quot;
LANG=&quot;en_NZ.UTF-8&quot;
LINGUAS=&quot;en ru&quot;
PKGDIR=&quot;/usr/portage/packages&quot;
PORTAGE_TMPDIR=&quot;/var/tmp&quot;
PORTDIR=&quot;/usr/portage&quot;
PORTDIR_OVERLAY=&quot;/usr/local/portage&quot;
SYNC=&quot;rsync://rsync.gentoo.org/gentoo-portage&quot;
USE=&quot;amd64 X aac aalib acpi alsa apache2 arts audiofile avi berkdb bitmap-fonts bzip2 caps cdparanoia cdr cjk crypt css cups dga directfb divx4linux dvd dvdr emboss encode exif expat faad fam fbcon ffmpeg flac freetype gd ggi gif gmp gphoto2 gpm gstreamer gtk2 idea idn imagemagick imap imlib ipv6 javascript jikes joystick jpeg kde lcms libcaca libwww live lm_sensors lzw lzw-tiff mad matroska mbox memlimit mng motif mp3 mpeg mpi mysql nas ncurses network nls nptl nptlonly ogg opengl pcre pdflib perl png ppds qt quicktime readline real rtc samba scanner sdl silc speex spell ssl tcpd theora tiff truetype truetype-fonts type1-fonts udev unicode usb userlocales utf8 vcd vorbis wifi xinerama xml2 xmms xpm xv xvid zlib linguas_en linguas_ru userland_GNU kernel_linux elibc_glibc&quot;
Unset:  ASFLAGS, CTARGET, LC_ALL, LDFLAGS, MAKEOPTS</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>truedfx@gentoo.org</who>
            <bug_when>2006-02-22 11:54:43 0000</bug_when>
            <thetext>Could you please verify that what you&apos;re seeing is the ASCII minus sign, rather than a non-ASCII Unicode symbol which looks exactly the same? One way to find out is by viewing one of these manpages, copying the character with the mouse, and typing

echo - | cat -v

in a shell, except that instead of typing -, you paste it. I&apos;m guessing you&apos;ll see &quot;M-bM-^HM-^R&quot; instead of &quot;-&quot;. If so, could you please check whether your /etc/man.conf is the same on all your machines? If it isn&apos;t, can you reproduce this on the other systems by making it the same?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>throw_away_2002@yahoo.com</who>
            <bug_when>2006-02-23 12:20:50 0000</bug_when>
            <thetext>Oops. :(

It seems like you have it right:

$ echo − | cat -v
M-bM-^HM-^R
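
For what it is worth, the same check can be sketched in Python. This is only an illustration: cat_v and describe are hypothetical helper names, and cat_v merely approximates the M- notation that `cat -v` prints for non-ASCII bytes.

```python
import unicodedata


def cat_v(text):
    """Render text roughly the way `cat -v` shows its UTF-8 bytes."""
    out = []
    for byte in text.encode("utf-8"):
        prefix = ""
        if byte >= 0x80:
            # High bit set: cat -v prints "M-" and then the low seven bits.
            prefix = "M-"
            byte -= 0x80
        if byte == 0x7F:
            out.append(prefix + "^?")
        elif byte in (0x09, 0x0A) and not prefix:
            # Plain tab and newline are passed through unchanged.
            out.append(chr(byte))
        elif byte in range(0x20):
            # Control characters are shown in caret notation.
            out.append(prefix + "^" + chr(byte + 0x40))
        else:
            out.append(prefix + chr(byte))
    return "".join(out)


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


ascii_dash = "-"       # what gets typed after "/" in less
pasted = "\u2212"      # what was copied out of the rendered man page

print(describe(ascii_dash), cat_v(ascii_dash))
print(describe(pasted), cat_v(pasted))
# The two characters look alike but never compare equal, so the search misses.
print(ascii_dash in pasted + "1")
```

The second print shows the same M-bM-^HM-^R as above, which is the three UTF-8 bytes of U+2212 MINUS SIGN and not the single byte of the ASCII dash.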

And yes, I do have a difference in man.conf (the -Tascii option).

I guess that solves the problem.

Now I need to figure out if I even want to use utf-8 for man pages
(searching on what looks like an ascii &quot;-&quot; seems obvious to me, and
I do it all the time to find the description of an option).

Why on earth would the openssh people make those non-ascii characters
(in the middle of pure ascii text) when a far more obvious (at least
to me) alternative exists?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>truedfx@gentoo.org</who>
            <bug_when>2006-02-23 14:09:46 0000</bug_when>
            <thetext>&gt; Why on earth would the openssh people make those non-ascii characters
&gt; (in the middle of pure ascii text) when a far more obvious (at least
&gt; to me) alternative exists?

It&apos;s not their decision. The manpage contains macros that tell nroff &quot;format &apos;1&apos; as an option&quot;, but they don&apos;t tell nroff how to do that. Other manpages would contain &quot;format &apos;-1&apos; in bold&quot; instead, which is why it happens to work with them, but I actually think openssh is doing the right thing here. (If you want to be sure, you can check `gzip -dc /usr/share/man/man1/scp.1.gz` and look for the .Fl macros. Their meaning is described in the groff_mdoc manpage.) I do think this may be a groff bug though, since −1 isn&apos;t a valid scp option, only -1 is. base-system, as the team responsible for groff, added to CC for additional input. Does this description sound about right, and if so, should groff maybe be changed to force ASCII - for command-line options?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>zzam@gentoo.org</who>
            <bug_when>2006-09-04 04:12:06 0000</bug_when>
            <thetext>Does this error still exist?

Can you please tell us which versions of &quot;man&quot; and &quot;groff&quot; you have installed?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>throw_away_2002@yahoo.com</who>
            <bug_when>2006-09-04 10:50:20 0000</bug_when>
            <thetext>(In reply to comment #4)
&gt; Does this error still exist?

No. My (very recent - as in two minutes ago :) ) update of man from 1.6-r1 to 1.6d appears to have fixed the problem.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>zzam@gentoo.org</who>
            <bug_when>2006-09-04 11:51:00 0000</bug_when>
            <thetext>(In reply to comment #5)
&gt; (In reply to comment #4)
&gt; &gt; Does this error still exist?
&gt; 
&gt; No. My (very recent - as in two minutes ago :) ) update of man from 1.6-r1 to
&gt; 1.6d appears to have fixed the problem.
&gt; 
1. Please also give us your version of groff.
2. Which man page did you use to check for the error?

For me on x86, &quot;man scp&quot; still shows the error with man-1.6d and all available versions of groff (1.18.1.1, 1.19.1-r2 and 1.19.2-r1).

</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>jakub@gentoo.org</who>
            <bug_when>2006-09-04 16:14:04 0000</bug_when>
            <thetext>Still broken here (x86 and amd64): sys-apps/man-1.6d, sys-apps/groff-1.19.2-r1
</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>zzam@gentoo.org</who>
            <bug_when>2006-09-05 01:53:56 0000</bug_when>
            <thetext>This bug can be fixed by also adding the hack that currently lives in /usr/share/groff/site-tmac/man.local to /usr/share/groff/site-tmac/mdoc.local.
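
A minimal sketch in Python of what that amounts to, assuming the hack is exactly the four groff lines from the attached diff. add_dash_hack is a hypothetical helper, not part of groff or of any ebuild, and the demo only touches a temporary file, not the real site-tmac directory.

```python
import tempfile
from pathlib import Path

# The groff requests from the attached groff-man-UTF-8.diff: on the utf8
# output device, define both "\-" and "-" as character code 45 (ASCII "-").
DASH_HACK = r""".if '\*[.T]'utf8' \{\
.  char \- \N'45'
.  char  - \N'45'
.\}
"""


def add_dash_hack(tmac_local):
    """Append the hack to a *.local macro file unless it is already present."""
    path = Path(tmac_local)
    current = path.read_text() if path.exists() else ""
    if DASH_HACK in current:
        return False  # nothing to do, the file already carries the hack
    if current and not current.endswith("\n"):
        current += "\n"
    path.write_text(current + DASH_HACK)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for /usr/share/groff/site-tmac/mdoc.local
        target = Path(tmp) / "mdoc.local"
        target.write_text('.\\" Put any local modifications to doc.tmac here.\n')
        print(add_dash_hack(target))  # first call appends the hack
        print(add_dash_hack(target))  # second call finds it and does nothing
        print(target.read_text())
```

The point of the presence check is only that running it twice does not duplicate the block.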

See attached (modified) groff-man-UTF-8.diff.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>zzam@gentoo.org</who>
            <bug_when>2006-09-05 01:55:34 0000</bug_when>
            <thetext>Created an attachment (id=96041)
groff-man-UTF-8.diff-modified

</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>zzam@gentoo.org</who>
            <bug_when>2006-09-05 02:01:20 0000</bug_when>
            <thetext>Created an attachment (id=96043)
groff-man-UTF-8.diff-second-try

</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>jakub@gentoo.org</who>
            <bug_when>2007-04-04 07:28:43 0000</bug_when>
            <thetext>*** Bug 173165 has been marked as a duplicate of this bug. ***</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>sirspiritus@yandex.ru</who>
            <bug_when>2007-04-04 19:19:10 0000</bug_when>
            <thetext>I have seen that groff and man in FC and Debian Etch are patched for UTF-8 compatibility and for automatically recoding non-UTF-8 man pages (in KOI8-R, etc.) to UTF-8. The patches are inside their source packages. For example: http://mirrors.dotsrc.org/fedora/6/source/SRPMS/man-1.6d-1.1.src.rpm and http://mirrors.dotsrc.org/fedora/6/source/SRPMS/groff-1.18.1.1-11.1.src.rpm.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>rane@gentoo.org</who>
            <bug_when>2007-04-28 15:12:30 0000</bug_when>
            <thetext>*** Bug 176363 has been marked as a duplicate of this bug. ***</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>jakub@gentoo.org</who>
            <bug_when>2007-09-06 15:58:45 0000</bug_when>
            <thetext>*** Bug 191488 has been marked as a duplicate of this bug. ***</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>vapier@gentoo.org</who>
            <bug_when>2008-02-24 18:11:12 0000</bug_when>
            <thetext>

*** This bug has been marked as a duplicate of bug 126361 ***</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>vapier@gentoo.org</who>
            <bug_when>2008-02-24 18:42:46 0000</bug_when>
            <thetext>blah, goddamn mess of dupes

this bug is about the dash issue with unicode / non-unicode

it is not about anything else</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>vapier@gentoo.org</who>
            <bug_when>2008-02-24 18:57:43 0000</bug_when>
            <thetext>looks like this was halfway fixed (man.local), but the important part (mdoc.local) was left out

groff-1.19.2-r2 includes mdoc.local as well

http://sources.gentoo.org/sys-apps/groff/files/groff-1.19.2-man-unicode-dashes.patch?rev=1.1</thetext>
          </long_desc>
      
          <attachment
              isobsolete="1"
              ispatch="1"
              isprivate="0"
          >
            <attachid>96041</attachid>
            <date>2006-09-05 01:55 0000</date>
            <desc>groff-man-UTF-8.diff-modified</desc>
            <filename>groff-man-UTF-8.diff</filename>
            <type>text/plain</type>
            <data encoding="base64">ZGlmZiAtdXIgZ3JvZmYtMS4xOC4xLm9yaWcvdG1hYy9tYW4ubG9jYWwgZ3JvZmYtMS4xOC4xL3Rt
YWMvbWFuLmxvY2FsCi0tLSBncm9mZi0xLjE4LjEub3JpZy90bWFjL21hbi5sb2NhbAkyMDAwLTEw
LTI2IDE2OjE1OjE3LjAwMDAwMDAwMCArMDIwMAorKysgZ3JvZmYtMS4xOC4xL3RtYWMvbWFuLmxv
Y2FsCTIwMDMtMDMtMTYgMDI6MTU6NTAuMDAwMDAwMDAwICswMTAwCkBAIC0xLDIgKzEsNiBAQAog
LlwiIFRoaXMgZmlsZSBpcyBsb2FkZWQgYWZ0ZXIgYW4tb2xkLnRtYWMuCiAuXCIgUHV0IGFueSBs
b2NhbCBtb2RpZmljYXRpb25zIHRvIGFuLW9sZC50bWFjIGhlcmUuCisuaWYgJ1wqWy5UXSd1dGY4
JyBce1wKKy4gIGNoYXIgXC0gXE4nNDUnCisuICBjaGFyICAtIFxOJzQ1JworLlx9ZGlmZiAtdXIg
Z3JvZmYtMS4xOC4xLm9yaWcvdG1hYy9tYW4ubG9jYWwgZ3JvZmYtMS4xOC4xL3RtYWMvbWFuLmxv
Y2FsCi0tLSBncm9mZi0xLjE4LjEub3JpZy90bWFjL21kb2MubG9jYWwJMjAwMC0xMC0yNiAxNjox
NToxNy4wMDAwMDAwMDAgKzAyMDAKKysrIGdyb2ZmLTEuMTguMS90bWFjL21kb2MubG9jYWwJMjAw
My0wMy0xNiAwMjoxNTo1MC4wMDAwMDAwMDAgKzAxMDAKQEAgLTEsMiArMSw2IEBACiAuXCIgVGhp
cyBmaWxlIGlzIGxvYWRlZCBhZnRlciBkb2MudG1hYy4KIC5cIiBQdXQgYW55IGxvY2FsIG1vZGlm
aWNhdGlvbnMgdG8gZG9jLnRtYWMgaGVyZS4KKy5pZiAnXCpbLlRdJ3V0ZjgnIFx7XAorLiAgY2hh
ciBcLSBcTic0NScKKy4gIGNoYXIgIC0gXE4nNDUnCisuXH0K
</data>        

          </attachment>
          <attachment
              isobsolete="0"
              ispatch="1"
              isprivate="0"
          >
            <attachid>96043</attachid>
            <date>2006-09-05 02:01 0000</date>
            <desc>groff-man-UTF-8.diff-second-try</desc>
            <filename>groff-man-UTF-8.diff</filename>
            <type>text/plain</type>
            <data encoding="base64">ZGlmZiAtdXIgZ3JvZmYtMS4xOC4xLm9yaWcvdG1hYy9tYW4ubG9jYWwgZ3JvZmYtMS4xOC4xL3Rt
YWMvbWFuLmxvY2FsCi0tLSBncm9mZi0xLjE4LjEub3JpZy90bWFjL21hbi5sb2NhbAkyMDAwLTEw
LTI2IDE2OjE1OjE3LjAwMDAwMDAwMCArMDIwMAorKysgZ3JvZmYtMS4xOC4xL3RtYWMvbWFuLmxv
Y2FsCTIwMDMtMDMtMTYgMDI6MTU6NTAuMDAwMDAwMDAwICswMTAwCkBAIC0xLDIgKzEsNiBAQAog
LlwiIFRoaXMgZmlsZSBpcyBsb2FkZWQgYWZ0ZXIgYW4tb2xkLnRtYWMuCiAuXCIgUHV0IGFueSBs
b2NhbCBtb2RpZmljYXRpb25zIHRvIGFuLW9sZC50bWFjIGhlcmUuCisuaWYgJ1wqWy5UXSd1dGY4
JyBce1wKKy4gIGNoYXIgXC0gXE4nNDUnCisuICBjaGFyICAtIFxOJzQ1JworLlx9CmRpZmYgLXVy
IGdyb2ZmLTEuMTguMS5vcmlnL3RtYWMvbWFuLmxvY2FsIGdyb2ZmLTEuMTguMS90bWFjL21hbi5s
b2NhbAotLS0gZ3JvZmYtMS4xOC4xLm9yaWcvdG1hYy9tZG9jLmxvY2FsCTIwMDAtMTAtMjYgMTY6
MTU6MTcuMDAwMDAwMDAwICswMjAwCisrKyBncm9mZi0xLjE4LjEvdG1hYy9tZG9jLmxvY2FsCTIw
MDMtMDMtMTYgMDI6MTU6NTAuMDAwMDAwMDAwICswMTAwCkBAIC0xLDIgKzEsNiBAQAogLlwiIFRo
aXMgZmlsZSBpcyBsb2FkZWQgYWZ0ZXIgZG9jLnRtYWMuCiAuXCIgUHV0IGFueSBsb2NhbCBtb2Rp
ZmljYXRpb25zIHRvIGRvYy50bWFjIGhlcmUuCisuaWYgJ1wqWy5UXSd1dGY4JyBce1wKKy4gIGNo
YXIgXC0gXE4nNDUnCisuICBjaGFyICAtIFxOJzQ1JworLlx9Cg==
</data>        

          </attachment>
    </bug>

</bugzilla>