<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "http://bugs.gentoo.org/bugzilla.dtd">

<bugzilla version="2.22.7"
          urlbase="http://bugs.gentoo.org/"
          maintainer="bugzilla@gentoo.org"
>

    <bug>
          <bug_id>20141</bug_id>
          <creation_ts>2003-04-28 19:36 +0000</creation_ts>
          <short_desc>recode can&apos;t handle utf-8</short_desc>
          <delta_ts>2006-02-04 06:03:45 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>Gentoo Linux</product>
          <component>Applications</component>
          <version>unspecified</version>
          <rep_platform>PPC</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          <everconfirmed>1</everconfirmed>
          <reporter>pylon@gentoo.org</reporter>
          <assigned_to>ppc@gentoo.org</assigned_to>
          <long_desc isprivate="0">
            <who>pylon@gentoo.org</who>
            <bug_when>2003-04-28 19:36:06 +0000</bug_when>
            <thetext>recode-3.6 can&apos;t handle utf-8 files.  If you try to recode a file
(e.g. an XML document from gentoo.org) to latin1, it fails on the first utf-8
character with the error

recode: Invalid input in step `UTF-8..ISO-8859-1&apos;

Steps to reproduce:
recode utf-8..latin1 &lt; gentoo-x86-install.xml  (from the German doc tree)

DarkSpecter also has this problem.  On x86 this works without problems.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>pylon@gentoo.org</who>
            <bug_when>2003-04-28 19:41:50 +0000</bug_when>
            <thetext>I looked into the recode sources (especially utf8.c) and suspect that the code copying the utf-8 characters assumes little-endian byte order.  Can somebody with good C knowledge look into that file?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>pylon@gentoo.org</who>
            <bug_when>2003-04-28 20:06:03 +0000</bug_when>
            <thetext>*** Bug 20139 has been marked as a duplicate of this bug. ***</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>toojays@toojays.net</who>
            <bug_when>2003-06-13 22:56:11 +0000</bug_when>
            <thetext>Bug 20027 seems to have the solution to this problem. Attachment 11212 is a patch pulled from Debian (http://packages.debian.org/stable/text/recode.html), and attachment 11211 is the ebuild which makes use of it.

Now with this patch, recode no longer borks on the example which Lars originally wrote about. However, I don&apos;t know if the output is _correct_. :) Anyone?</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>pylon@gentoo.org</who>
            <bug_when>2003-06-28 16:35:38 +0000</bug_when>
            <thetext>Thanks for pointing out this patch.  It really does resolve the utf-8 problem :-)

So, I committed a new recode-3.6-r1 to portage and masked it ppc.</thetext>
          </long_desc>
    </bug>

</bugzilla>