Bug 53141 - sed with nls support cannot cope with different charsets
Summary: sed with nls support cannot cope with different charsets
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All All
Importance: High normal
Assignee: Gentoo's Team for Core System packages
Reported: 2004-06-06 09:37 UTC by jochen
Modified: 2005-02-11 19:23 UTC (History)

Comment 1 jochen 2004-06-06 09:37:54 UTC
I have sed 4.0.9 compiled with NLS support enabled. However, in a UTF-8 terminal I can no longer process Latin-1 text:

Reproducible: Always
Steps to Reproduce:
1. echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*bar/bar/
2. echo -e "foo\337bar" | LC_CTYPE=C sed s/foo.*bar/bar/

Actual Results:  
fooßbar
bar

Expected Results:  
bar
bar
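
The byte-level view helps here. In the test command, `\337` is an octal escape for the single byte 0xDF, which is 'ß' in ISO-8859-1 (Latin-1) but is not a complete character in UTF-8. The sketch below assumes a POSIX `sh` with `od` and glibc's `iconv` available; it dumps the input bytes, uses a UTF-8 to UTF-8 conversion as a validity check, and shows the two bytes that 'ß' becomes once transcoded to UTF-8.

```shell
#!/bin/sh
# \337 is an octal escape for the single byte 0xDF ('ß' in ISO-8859-1).
# Expected dump: 66 6f 6f df 62 61 72 0a
printf 'foo\337bar\n' | od -An -tx1

# iconv (assumed: glibc) exits non-zero when the input is not valid in the
# source charset, so converting UTF-8 to UTF-8 works as a strict validity check.
if printf 'foo\337bar\n' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    echo "input is valid UTF-8"
else
    echo "input is NOT valid UTF-8"
fi

# Read as Latin-1, the same byte becomes the two-byte UTF-8 form of 'ß': c3 9f
printf '\337' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1
```
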
Comment 2 Ciaran McCreesh 2004-06-06 12:05:54 UTC
I'm pretty sure this is INVALID. If I'm remembering my Unicode right, \337b is one 'character' in UTF-8. What's the output of the following?

echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*ar/bar/
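
Whether `\337b` forms one character can be checked against the UTF-8 encoding rules: 0xDF (binary 11011111) is a lead byte that announces a two-byte sequence, so the next byte has to be a continuation byte between 0x80 and 0xBF. 'b' is 0x62, so a strict decoder treats the pair as malformed rather than as a single character. The sketch below uses a hypothetical helper `decodes` (not part of any tool) and assumes glibc `iconv`; for contrast it also decodes 0xDF 0x9F, a well-formed pair chosen only for illustration, which is the single code point U+07DF.

```shell
#!/bin/sh
# Hypothetical helper: decodes BYTES
# BYTES is a printf format string (e.g. '\337b'). If it is valid UTF-8, print
# its code points as big-endian UTF-32 bytes; otherwise report it as malformed.
decodes() {
    if printf "$1" | iconv -f UTF-8 -t UTF-32BE >/dev/null 2>&1; then
        printf "$1" | iconv -f UTF-8 -t UTF-32BE | od -An -tx1
    else
        echo "malformed UTF-8"
    fi
}

decodes '\337b'     # 0xDF then 'b' (0x62, not a continuation byte): malformed
decodes '\337\237'  # 0xDF 0x9F: one character, U+07DF (bytes 00 00 07 df)
```
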
Comment 4 jochen 2004-09-02 05:43:20 UTC
Sorry for the long delay.

Hmm, yes, you're right: \337b is one character.

However:

$ echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*ar/bar/
fooßbar

And even:

$ echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*/bar/
barßbar
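
These outputs fit one reading, assuming sed 4.0.9 already behaves the way current GNU sed documents: in a UTF-8 locale the regex `.` does not match an invalid multibyte sequence, so `.*` cannot step over the lone 0xDF byte. `s/foo.*/bar/` then matches only `foo` and yields `bar\337bar`, while `s/foo.*ar/bar/` finds no match and leaves the line unchanged. A common workaround is to transcode Latin-1 input to UTF-8 before running sed and to convert the result back afterwards; `latin1_sed` below is a hypothetical wrapper name for that pipeline, and glibc `iconv` is assumed.

```shell
#!/bin/sh
# Hypothetical wrapper: latin1_sed SCRIPT
# Transcode Latin-1 to UTF-8 so the input is valid in a UTF-8 locale,
# edit it with sed, then convert the result back to Latin-1.
latin1_sed() {
    iconv -f ISO-8859-1 -t UTF-8 | sed "$1" | iconv -f UTF-8 -t ISO-8859-1
}

printf 'foo\337bar\n' | latin1_sed 's/foo.*bar/bar/'    # prints: bar
```

Alternatively, setting `LC_ALL=C` for the sed call makes every byte a single character, as step 2 of the report does with `LC_CTYPE=C`; note that an `LC_ALL` already set in the environment takes precedence over `LC_CTYPE`.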

Comment 5 SpanKY gentoo-dev 2004-09-24 19:52:10 UTC
Can you try this out with sed-4.1.2?
Comment 6 SpanKY gentoo-dev 2005-02-11 19:23:36 UTC
Get back to us on whether 4.1.2 does the right thing.
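
For the retest with sed-4.1.2, a small script along these lines prints the sed version and runs the three expressions from this report under both locales, dumping each result in hex so the raw 0xDF byte is visible whatever the terminal's charset. It is a sketch with assumptions: `run_case` is a hypothetical helper, and `de_DE.UTF-8` must actually be generated on the system, since otherwise sed stays in the C locale and both sets of rows print the same bytes.

```shell
#!/bin/sh
# Hypothetical helper: run_case LOCALE SCRIPT
# Feeds the Latin-1 test line to sed under LOCALE and prints the result in hex.
run_case() {
    printf 'foo\337bar\n' | LC_ALL="$1" sed "$2" | od -An -tx1
}

# Record which sed is being tested (GNU sed supports --version).
sed --version | head -n 1

for loc in de_DE.UTF-8 C; do
    for script in 's/foo.*bar/bar/' 's/foo.*ar/bar/' 's/foo.*/bar/'; do
        printf '%-12s %-18s' "$loc" "$script"
        run_case "$loc" "$script"
    done
done
```

In the C rows every expression should print `62 61 72 0a`, i.e. `bar`; the de_DE.UTF-8 rows show whether the newer sed still stops at the 0xDF byte.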