193739 – sys-libs/glibc-2.5 - iconv conversion from and to IBM-1047 EOL characters fails


Summary: sys-libs/glibc-2.5 - iconv conversion from and to IBM-1047 EOL characters fails

Status:	RESOLVED NEEDINFO

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Library
Hardware:	AMD64 Linux

Importance:	High minor
Assignee:	Gentoo Toolchain Maintainers

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2007-09-25 12:01 UTC by Loc_Vyler
Modified:	2009-04-20 00:52 UTC
CC List:	0 users

See Also:
Package list:
Runtime testing required:	---

Attachments
text file in ibm-1047 that is used in description (ibm-1047.example, 79 bytes, application/octet-stream) 2007-09-25 16:59 UTC, Loc_Vyler


Description Loc_Vyler 2007-09-25 12:01:18 UTC

Conversion from IBM-1047 to any ASCII-based code page fails: the end-of-line (EOL) character is converted incorrectly. The same happens in the reverse direction.


Reproducible: Always

Steps to Reproduce:
1. iconv -f IBM-1047 -t UTF8 ibm-1047.example > ibm-1047.to.utf8.converted
   or
2. iconv -f UTF8 -t IBM-1047 utf8.example > utf8.to.ibm-1047.converted
   (and likewise with other ASCII-based code pages)
Actual Results:  
The text being converted:
abcdefghijklmnopqrstuvwxyz01234567890!@#$%^&*()_+-=<EOL>
ABCDEFGHIJKLMNOPQRSTUVWXYZ<EOL>
<EOL> is \x0a (or \x0d\x0a) in ASCII
and \x15 in IBM-1047

Expected Results:  
The input file ibm-1047.example:

$ hexdump -C ibm-1047.example
00000000  81 82 83 84 85 86 87 88  89 91 92 93 94 95 96 97  |................|
00000010  98 99 a2 a3 a4 a5 a6 a7  a8 a9 f0 f1 f2 f3 f4 f5  |................|
00000020  f6 f7 f8 f9 f0 5a 7c 7b  5b 6c 5f 50 5c 4d 5d 6d  |.....Z|{[l_P\M]m|
00000030  4e 60 7e[15]c1 c2 c3 c4  c5 c6 c7 c8 c9 d1 d2 d3  |N`~.............|
00000040  d4 d5 d6 d7 d8 d9 e2 e3  e4 e5 e6 e7 e8 e9[15]    |...............|
[15] is the EOL character

After running iconv -f IBM-1047 -t UTF8 ibm-1047.example > ibm-1047.to.utf8.converted, the output is:

00000000  61 62 63 64 65 66 67 68  69 6a 6b 6c 6d 6e 6f 70  |abcdefghijklmnop|
00000010  71 72 73 74 75 76 77 78  79 7a 30 31 32 33 34 35  |qrstuvwxyz012345|
00000020  36 37 38 39 30 21 40 23  24 25 5e 26 2a 28 29 5f  |67890!@#$%^&*()_|
00000030  2b 2d 3d[c2 85]41 42 43  44 45 46 47 48 49 4a 4b  |+-=..ABCDEFGHIJK|
00000040  4c 4d 4e 4f 50 51 52 53  54 55 56 57 58 59 5a[c2  |LMNOPQRSTUVWXYZ.|
00000050  85]                                               |.|
So the EOL byte [15] is converted to the sequence [c2 85] (the UTF-8 encoding of U+0085, NEXT LINE) instead of [0a].

And vice versa:
$ hexdump -C utf8.example
00000000  61 62 63 64 65 66 67 68  69 6a 6b 6c 6d 6e 6f 70  |abcdefghijklmnop|
00000010  71 72 73 74 75 76 77 78  79 7a 30 31 32 33 34 35  |qrstuvwxyz012345|
00000020  36 37 38 39 30 21 40 23  24 25 5e 26 2a 28 29 5f  |67890!@#$%^&*()_|
00000030  2b 2d 3d[0a]41 42 43 44  45 46 47 48 49 4a 4b 4c  |+-=.ABCDEFGHIJKL|
00000040  4d 4e 4f 50 51 52 53 54  55 56 57 58 59 5a[0a]    |MNOPQRSTUVWXYZ.|
[0a] is the EOL character

After running iconv -f UTF8 -t IBM-1047 utf8.example > utf8.to.ibm-1047.converted:

$ hexdump -C utf8.to.ibm-1047.converted
00000000  81 82 83 84 85 86 87 88  89 91 92 93 94 95 96 97  |................|
00000010  98 99 a2 a3 a4 a5 a6 a7  a8 a9 f0 f1 f2 f3 f4 f5  |................|
00000020  f6 f7 f8 f9 f0 5a 7c 7b  5b 6c 5f 50 5c 4d 5d 6d  |.....Z|{[l_P\M]m|
00000030  4e 60 7e[25]c1 c2 c3 c4  c5 c6 c7 c8 c9 d1 d2 d3  |N`~%............|
00000040  d4 d5 d6 d7 d8 d9 e2 e3  e4 e5 e6 e7 e8 e9[25]    |..............%|
So the EOL byte [0a] is converted to [25] instead of [15].

With \x0d\x0a line endings, only \x0a is converted (to \x25); \x0d is left in the converted file unchanged, as in the original.
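The same check can be made through the iconv(3) API instead of the command-line tool. The sketch below is a minimal reproducer that is not part of the original report; the helper name `convert_bytes` is hypothetical, and it assumes the IBM1047 gconv module is installed (if `iconv_open` cannot open the converter, the helper returns -1).

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert LEN bytes of IN from encoding FROM to
 * encoding TO with iconv(3), writing at most OUTSZ bytes to OUT.
 * Returns the number of output bytes, or -1 if the converter cannot be
 * opened or the input cannot be converted in full. Both IBM1047 and
 * UTF-8 are stateless, so no final shift-state flush is needed here. */
long convert_bytes(const char *to, const char *from,
                   const unsigned char *in, size_t len,
                   unsigned char *out, size_t outsz)
{
    iconv_t cd = iconv_open(to, from);   /* note: target encoding first */
    if (cd == (iconv_t)-1)
        return -1;                       /* e.g. gconv module missing */

    char *inp = (char *)in;              /* iconv() takes char ** but does not modify the input */
    char *outp = (char *)out;
    size_t inleft = len, outleft = outsz;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1 || inleft != 0)
        return -1;
    return (long)(outsz - outleft);
}
```

On a glibc behaving as described above, converting the single byte 0x15 with `convert_bytes("UTF-8", "IBM1047", ...)` would be expected to produce the two bytes c2 85, and converting 0x0a in the other direction would be expected to produce 0x25, rather than 0x0a and 0x15 respectively.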

The iconv in the libc of another *nix system seems to work properly:
it converts \x15 to \x0a for iconv -f IBM-1047 -t UTF8,
and \x0a to \x15 for iconv -f UTF8 -t IBM-1047.
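One possible way to get that behaviour from glibc (a sketch, not something proposed in this bug) is to exchange the two EBCDIC control bytes on the IBM-1047 side of the conversion. Swapping 0x15 and 0x25 before iconv -f IBM-1047 -t UTF8 turns each 0x15 line end into 0x25, which glibc converts to 0x0a; applying the same swap after iconv -f UTF8 -t IBM-1047 turns the 0x25 bytes back into 0x15. The function name `ebcdic_swap_nl_lf` is hypothetical, and the approach assumes the data uses 0x15 only as a line terminator and contains no 0x25 bytes that must remain LF.

```c
#include <stddef.h>

/* Hypothetical helper: exchange the IBM-1047 NL (0x15) and LF (0x25)
 * bytes in BUF in place. Apply it to IBM-1047 input before converting
 * to an ASCII-based encoding, or to IBM-1047 output after converting
 * from one. Every other byte, including CR (0x0d), is left unchanged. */
void ebcdic_swap_nl_lf(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x15)
            buf[i] = 0x25;
        else if (buf[i] == 0x25)
            buf[i] = 0x15;
    }
}
```

The swap is its own inverse, so one function serves both directions. The same thing can be done from the shell with tr, e.g. `tr '\025\045' '\045\025' < ibm-1047.example | iconv -f IBM-1047 -t UTF8` (025 and 045 are the octal forms of 0x15 and 0x25).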

Looking at ibm1047.c (.../glibc-2.5/iconvdata/ibm1047.c):
...
#include <stdint.h>

/* Get the conversion table.  */
#include <ibm1047.h>

#define CHARSET_NAME    "IBM1047//"
#define HAS_HOLES       1       /* Not all 256 character are defined.  */

#include <8bit-generic.c>

But the file ibm1047.h does not exist in glibc-2.5/iconvdata/, and I could not find it in any other glibc version I checked.
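As Comment 1 notes, the table is generated during the build rather than shipped as a source file. The sketch below only illustrates the table-driven approach that a header like ibm1047.h combined with 8bit-generic.c suggests: a 256-entry table from IBM-1047 bytes to Unicode code points, where an entry of 0 marks an undefined byte (a "hole", in the sense of HAS_HOLES) except for byte 0x00 itself. The names `to_ucs4_excerpt` and `ibm1047_to_ucs4` are hypothetical and only a few entries are filled in; the two line-ending entries reflect the mapping observed in this report (0x15 to U+0085, 0x25 to U+000A), not a copy of glibc's generated file.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of an IBM-1047 -> UCS-4 table. Unlisted entries
 * default to 0, meaning "undefined" (a hole), except for byte 0x00,
 * which really maps to U+0000. */
static const uint32_t to_ucs4_excerpt[256] = {
    [0x0d] = 0x000d,                        /* CR */
    [0x15] = 0x0085,                        /* NL -> U+0085 NEXT LINE, as observed */
    [0x25] = 0x000a,                        /* LF -> U+000A LINE FEED */
    [0x81] = 0x0061, [0x82] = 0x0062,       /* a, b */
    [0xc1] = 0x0041, [0xc2] = 0x0042,       /* A, B */
    [0xf0] = 0x0030, [0xf1] = 0x0031,       /* 0, 1 */
};

/* Convert LEN IBM-1047 bytes to code points using the excerpt table.
 * OUT must have room for LEN entries. Returns LEN on success, or -1 as
 * soon as a byte falls into a hole. */
long ibm1047_to_ucs4(const unsigned char *in, size_t len, uint32_t *out)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t ch = to_ucs4_excerpt[in[i]];
        if (ch == 0 && in[i] != 0x00)
            return -1;                      /* undefined byte */
        out[i] = ch;
    }
    return (long)len;
}
```

Under this mapping an IBM-1047 line ending of 0x15 becomes U+0085, which UTF-8 encodes as c2 85, matching the output shown above.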

Comment 1 SpanKY gentoo-dev 2007-09-25 14:39:26 UTC

that's probably because the tables are generated on the fly while building

actually post the files as attachments instead of printing their hexdumps

Comment 2 Loc_Vyler 2007-09-25 16:59:42 UTC

Created attachment 131881
text file in ibm-1047 that is used in description

Comment 3 SpanKY gentoo-dev 2007-10-07 02:49:50 UTC

can you open a bug here please:
http://sources.redhat.com/bugzilla/

you know a lot more about the issue than i ;)

Comment 4 Mark Loeser (RETIRED) gentoo-dev 2009-04-20 00:52:43 UTC

Is this still an issue?  Was it reported upstream? :)