Bug 20141 - recode can't handle utf-8
Bug#: 20141
Product: Gentoo Linux
Version: unspecified
Platform: PPC
OS/Version: Linux
Status: RESOLVED
Severity: normal
Priority: P2
Resolution: FIXED
Assigned To: ppc@gentoo.org
Reported By: pylon@gentoo.org
Component: Applications
Opened: 2003-04-28 19:36 0000
Description:
recode-3.6 cannot handle UTF-8 files. If you try to recode a UTF-8 file (for
example, an XML document from gentoo.org) to Latin-1, it fails on the first
non-ASCII character with the error:

recode: Invalid input in step `UTF-8..ISO-8859-1'

Steps to reproduce:
recode utf-8..latin1 < gentoo-x86-install.xml  (from the German doc tree)

DarkSpecter has the same problem. On x86 the same command works without problems.

------- Comment #1 From Lars Weiler (RETIRED) 2003-04-28 19:41:50 0000 -------
I looked into the recode sources (especially utf8.c) and suspect that the code
copying the UTF-8 characters assumes little-endian byte order. Can somebody with
good C knowledge look into that file?

------- Comment #2 From Lars Weiler (RETIRED) 2003-04-28 20:06:03 0000 -------
*** Bug 20139 has been marked as a duplicate of this bug. ***

------- Comment #3 From John Steele Scott 2003-06-13 22:56:11 0000 -------
Bug 20027 seems to have the solution to this problem. Attachment 11212 is a
patch pulled from Debian (http://packages.debian.org/stable/text/recode.html),
and attachment 11211 is the ebuild that uses it.

With this patch, recode no longer borks on the example Lars originally
reported. However, I don't know whether the output is _correct_. :)
Anyone?

------- Comment #4 From Lars Weiler (RETIRED) 2003-06-28 16:35:38 0000 -------
Thanks for pointing out this patch. It really does resolve the UTF-8 problem
:-)

So, I committed a new recode-3.6-r1 to Portage and masked it for ppc.