Bug 20141 - recode can't handle utf-8
Summary: recode can't handle utf-8
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: PPC Linux
Importance: High normal
Assignee: PPC Porters
URL:
Whiteboard:
Keywords:
Duplicates: 20139
Depends on:
Blocks:
 
Reported: 2003-04-28 19:36 UTC by Lars Weiler (RETIRED)
Modified: 2006-02-04 06:03 UTC (History)
Description Lars Weiler (RETIRED) gentoo-dev 2003-04-28 19:36:06 UTC
recode-3.6 can't handle UTF-8 files.  If you try to recode a file
(for example, an XML document from gentoo.org) to latin1, it fails on the
first UTF-8 character with the error:

recode: Invalid input in step `UTF-8..ISO-8859-1'

Steps to reproduce:
recode utf-8..latin1 < gentoo-x86-install.xml  (from the German doc tree)

DarkSpecter also has this problem.  On x86 it works without problems.
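
For anyone without the German doc tree at hand, a smaller test case may be enough. The following is a minimal sketch, assuming that any input with a multi-byte UTF-8 sequence triggers the same failure; the file names utf8-sample.txt and latin1-sample.txt are made up for illustration.

```shell
# Build a one-line UTF-8 test file: "Grüße" with ü = C3 BC and ß = C3 9F.
printf 'Gr\303\274\303\237e\n' > utf8-sample.txt

# Show the raw bytes so the two-byte sequences are visible.
od -An -tx1 utf8-sample.txt

# Same conversion as in the report. Guarded so the snippet does nothing
# harmful on a machine where recode is not installed.
if command -v recode >/dev/null 2>&1; then
    if recode utf-8..latin1 < utf8-sample.txt > latin1-sample.txt; then
        echo "recode: conversion succeeded"
    else
        echo "recode: conversion failed"
    fi
fi
```

On an affected system the recode call would be expected to fail with the "Invalid input" error quoted above.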
Comment 1 Lars Weiler (RETIRED) gentoo-dev 2003-04-28 19:41:50 UTC
I looked into the recode sources (especially utf8.c) and suspect that the code copying the UTF-8 characters assumes little-endian byte order.  Can somebody with good C knowledge look into that file?
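
Since the suspicion is about byte order, it may help to confirm which byte order a given test machine actually uses. This is only a quick host check with od and says nothing about what utf8.c itself does; the variable name value is arbitrary.

```shell
# Interpret the two bytes 01 00 as one 16-bit unsigned integer.
# od reads multi-byte units in the host's native byte order.
value=$(printf '\001\000' | od -An -tu2 | tr -d ' ')

case "$value" in
    1)   echo "little-endian host" ;;
    256) echo "big-endian host" ;;
    *)   echo "unexpected od output: $value" ;;
esac
```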
Comment 2 Lars Weiler (RETIRED) gentoo-dev 2003-04-28 20:06:03 UTC
*** Bug 20139 has been marked as a duplicate of this bug. ***
Comment 3 John Steele Scott 2003-06-13 22:56:11 UTC
Bug 20027 seems to have the solution to this problem. Attachment 11212 is a patch pulled from Debian (http://packages.debian.org/stable/text/recode.html), and attachment 11211 is the ebuild which makes use of it.

Now with this patch, recode no longer borks on the example which Lars originally wrote about. However, I don't know if the output is _correct_. :) Anyone?
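
One way to check correctness is to compare recode's output against Latin-1 bytes written out by hand. The sketch below does that for a single short string, so it is a spot check rather than a full test; the check-*.txt file names are made up for illustration.

```shell
# UTF-8 input: "Grüße" (ü = C3 BC, ß = C3 9F).
printf 'Gr\303\274\303\237e\n' > check-utf8.txt

# Expected ISO-8859-1 output, written by hand: ü = FC, ß = DF.
printf 'Gr\374\337e\n' > check-expected.txt

# Compare byte for byte. Guarded so the snippet is a no-op without recode.
if command -v recode >/dev/null 2>&1; then
    recode utf-8..latin1 < check-utf8.txt > check-actual.txt
    if cmp -s check-expected.txt check-actual.txt; then
        echo "output matches the expected Latin-1 bytes"
    else
        echo "output differs from the expected Latin-1 bytes"
    fi
fi
```

Running the same input through iconv -f UTF-8 -t ISO-8859-1 and comparing the two results would be an independent cross-check for larger files.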
Comment 4 Lars Weiler (RETIRED) gentoo-dev 2003-06-28 16:35:38 UTC
Thanks for pointing to this patch.  This really does resolve the UTF-8 problem :-)

So I committed a new recode-3.6-r1 to Portage and masked it ppc.