Bug 193739 - sys-libs/glibc-2.5 - iconv conversion from and to IBM-1047 EOL characters fails
Summary: sys-libs/glibc-2.5 - iconv conversion from and to IBM-1047 EOL characters fails
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Library
Hardware: AMD64 Linux
Importance: High minor
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-09-25 12:01 UTC by Loc_Vyler
Modified: 2009-04-20 00:52 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
text file in ibm-1047 that is used in description (ibm-1047.example,79 bytes, application/octet-stream)
2007-09-25 16:59 UTC, Loc_Vyler

Description Loc_Vyler 2007-09-25 12:01:18 UTC
Conversion from IBM-1047 to any ASCII-based codepage fails: the EOL character is converted incorrectly.


Reproducible: Always

Steps to Reproduce:
1. iconv -f IBM-1047 -t UTF8 ibm-1047.example > ibm-1047.to.utf8.converted
or
2. iconv -f UTF8 -t IBM-1047 utf8.example > utf8.to.ibm-1047.converted
etc
Actual Results:  
What is converted:
abcdefghijklmnopqrstuvwxyz01234567890!@#$%^&*()_+-=<EOL>
ABCDEFGHIJKLMNOPQRSTUVWXYZ<EOL>
<EOL> is \x0a or \x0d\x0a in ASCII, and \x15 in IBM-1047.

Expected Results:  
ibm-1047.example is

$ hexdump -C ibm-1047.example
00000000  81 82 83 84 85 86 87 88  89 91 92 93 94 95 96 97  |................|
00000010  98 99 a2 a3 a4 a5 a6 a7  a8 a9 f0 f1 f2 f3 f4 f5  |................|
00000020  f6 f7 f8 f9 f0 5a 7c 7b  5b 6c 5f 50 5c 4d 5d 6d  |.....Z|{[l_P\M]m|
00000030  4e 60 7e[15]c1 c2 c3 c4  c5 c6 c7 c8 c9 d1 d2 d3  |N`~.............|
00000040  d4 d5 d6 d7 d8 d9 e2 e3  e4 e5 e6 e7 e8 e9[15]    |...............|
[15] - is EOL

after iconv -f IBM-1047 -t UTF8 ibm-1047.example > ibm-1047.to.utf8.converted

00000000  61 62 63 64 65 66 67 68  69 6a 6b 6c 6d 6e 6f 70  |abcdefghijklmnop|
00000010  71 72 73 74 75 76 77 78  79 7a 30 31 32 33 34 35  |qrstuvwxyz012345|
00000020  36 37 38 39 30 21 40 23  24 25 5e 26 2a 28 29 5f  |67890!@#$%^&*()_|
00000030  2b 2d 3d[c2 85]41 42 43  44 45 46 47 48 49 4a 4b  |+-=..ABCDEFGHIJK|
00000040  4c 4d 4e 4f 50 51 52 53  54 55 56 57 58 59 5a[c2  |LMNOPQRSTUVWXYZ.|
00000050  85]                                               |.|
So EOL [15] is converted to the sequence [c2 85] instead of [0a].
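
The [c2 85] pair is the UTF-8 encoding of U+0085 (NEL, NEXT LINE), which suggests the glibc table maps IBM-1047 0x15 to NEL rather than to LF. As a possible workaround until the table question is settled, the converted file could be post-processed. A minimal Python sketch, assuming the text contains no NEL characters that must be preserved (the helper name nel_to_lf is hypothetical, not part of any library):

```python
# U+0085 (NEL) encodes to the two bytes C2 85 in UTF-8.
NEL_UTF8 = "\u0085".encode("utf-8")


def nel_to_lf(data: bytes) -> bytes:
    """Replace every UTF-8 encoded NEL with a plain LF (0x0a).

    Assumption: every NEL in the input stands for a line ending.
    """
    text = data.decode("utf-8")
    return text.replace("\u0085", "\n").encode("utf-8")


if __name__ == "__main__":
    import sys

    # Usage: python nel_to_lf.py < ibm-1047.to.utf8.converted > fixed.utf8
    sys.stdout.buffer.write(nel_to_lf(sys.stdin.buffer.read()))
```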

And vice-versa:
$ hexdump -C utf8.example
00000000  61 62 63 64 65 66 67 68  69 6a 6b 6c 6d 6e 6f 70  |abcdefghijklmnop|
00000010  71 72 73 74 75 76 77 78  79 7a 30 31 32 33 34 35  |qrstuvwxyz012345|
00000020  36 37 38 39 30 21 40 23  24 25 5e 26 2a 28 29 5f  |67890!@#$%^&*()_|
00000030  2b 2d 3d[0a]41 42 43 44  45 46 47 48 49 4a 4b 4c  |+-=.ABCDEFGHIJKL|
00000040  4d 4e 4f 50 51 52 53 54  55 56 57 58 59 5a[0a]    |MNOPQRSTUVWXYZ.|
[0a] - is EOL

after iconv -f UTF8 -t IBM-1047 utf8.example > utf8.to.ibm-1047.converted

$ hexdump -C utf8.to.ibm-1047.converted
00000000  81 82 83 84 85 86 87 88  89 91 92 93 94 95 96 97  |................|
00000010  98 99 a2 a3 a4 a5 a6 a7  a8 a9 f0 f1 f2 f3 f4 f5  |................|
00000020  f6 f7 f8 f9 f0 5a 7c 7b  5b 6c 5f 50 5c 4d 5d 6d  |.....Z|{[l_P\M]m|
00000030  4e 60 7e[25]c1 c2 c3 c4  c5 c6 c7 c8 c9 d1 d2 d3  |N`~%............|
00000040  d4 d5 d6 d7 d8 d9 e2 e3  e4 e5 e6 e7 e8 e9[25]    |..............%|
So EOL [0a] is converted to [25] instead of [15].

If we take an example with \x0d\x0a line endings, only the \x0a is converted to \x25; the \x0d stays in the converted file as it was in the original.
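
In the EBCDIC control set, 0x25 is LF and 0x15 is NL, so glibc appears to map U+000A to EBCDIC LF, whereas the expected output uses NL. CR is 0x0d in both encodings, which would explain why it passes through unchanged. A minimal Python sketch of a byte-level workaround for the UTF8 to IBM-1047 direction, assuming the output contains no 0x25 bytes that should remain LF (the helper name ebcdic_lf_to_nl is hypothetical):

```python
# IBM-1047 is a single-byte encoding, so a bytewise translation is safe.
# Map EBCDIC LF (0x25) to EBCDIC NL (0x15); all other bytes are unchanged.
LF_TO_NL = bytes.maketrans(b"\x25", b"\x15")


def ebcdic_lf_to_nl(data: bytes) -> bytes:
    """Rewrite every 0x25 byte as 0x15 in IBM-1047 encoded data."""
    return data.translate(LF_TO_NL)


if __name__ == "__main__":
    import sys

    # Usage: python lf_to_nl.py < utf8.to.ibm-1047.converted > fixed.ibm-1047
    sys.stdout.buffer.write(ebcdic_lf_to_nl(sys.stdin.buffer.read()))
```

For input with \x0d\x0a line endings this yields \x0d\x15; whether the \x0d should be dropped as well depends on what the consuming system expects.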

iconv from another *nix system's libc seems to work properly:
it converts \x15 to \x0a with iconv -f IBM-1047 -t UTF8,
and \x0a to \x15 with iconv -f UTF8 -t IBM-1047.

If we look at ibm1047.c (.../glibc-2.5/iconvdata/ibm1047.c)
...
#include <stdint.h>

/* Get the conversion table.  */
#include <ibm1047.h>

#define CHARSET_NAME    "IBM1047//"
#define HAS_HOLES       1       /* Not all 256 character are defined.  */

#include <8bit-generic.c>

But the file ibm1047.h doesn't exist in glibc-2.5/iconvdata/,
and I couldn't find it in any of the other glibc versions I looked at.
Comment 1 SpanKY gentoo-dev 2007-09-25 14:39:26 UTC
that's probably because the tables are generated on the fly while building

actually post the files as attachments instead of printing their hexdumps
Comment 2 Loc_Vyler 2007-09-25 16:59:42 UTC
Created attachment 131881
text file in ibm-1047 that is used in description
Comment 3 SpanKY gentoo-dev 2007-10-07 02:49:50 UTC
can you open a bug here please:
http://sources.redhat.com/bugzilla/

you know a lot more about the issue than i ;)
Comment 4 Mark Loeser (RETIRED) gentoo-dev 2009-04-20 00:52:43 UTC
Is this still an issue?  Was it reported upstream? :)