
Bug 53141

Summary: sed with nls support cannot cope with different charsets
Product: Gentoo Linux
Reporter: jochen <jochen.eisinger>
Component: [OLD] Core system
Assignee: Gentoo's Team for Core System packages <base-system>
Status: RESOLVED NEEDINFO
Severity: normal
CC: ciaran.mccreesh
Priority: High
Version: unspecified
Hardware: All
OS: All
Whiteboard:
Package list:
Runtime testing required: ---

Comment 1 jochen 2004-06-06 09:37:54 UTC
I have sed 4.0.9 compiled with NLS support enabled. However, in a UTF-8 terminal I can no longer process Latin-1 text:

Reproducible: Always
Steps to Reproduce:
1. echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*bar/bar/
2. echo -e "foo\337bar" | LC_CTYPE=C sed s/foo.*bar/bar/

Actual Results:  
fooßbar
bar

Expected Results:  
bar
bar
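
The two commands in the report differ only in the locale sed uses to split its input into characters. Below is a minimal sketch of two common ways to get the expected result, assuming a POSIX shell and an installed iconv. The helper name latin1_line is made up for illustration, and printf replaces the less portable echo -e. LC_ALL is used instead of LC_CTYPE because it takes precedence if it is already set in the environment.

```shell
# Hypothetical helper: emit the Latin-1 test line from the report.
# \337 is the octal escape for byte 0xDF ("ß" in ISO-8859-1).
latin1_line() { printf 'foo\337bar\n'; }

# Option 1: force the C locale so sed treats every byte as one character
# and ".*" can span the 0xDF byte.
c_result=$(latin1_line | LC_ALL=C sed 's/foo.*bar/bar/')

# Option 2: transcode the input to UTF-8 first, so the data is valid
# in a UTF-8 locale before sed sees it.
utf8_result=$(latin1_line | iconv -f ISO-8859-1 -t UTF-8 | sed 's/foo.*bar/bar/')

echo "$c_result"
echo "$utf8_result"
```

Option 1 fits when the text is processed purely as bytes. Option 2 fits when the output should end up as UTF-8 anyway.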
Comment 2 Ciaran McCreesh 2004-06-06 12:05:54 UTC
I'm pretty sure this is INVALID. If I'm remembering my Unicode right, \337b is one 'character' in UTF-8. What's the output of the following?

echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*ar/bar/
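
The claim about \337b can be checked at the byte level. In UTF-8, 0xDF is the lead byte of a two-byte sequence and must be followed by a continuation byte in the range 0x80 to 0xBF. The letter b is 0x62, so the pair is expected to be reported as not valid UTF-8, not as a single character. A minimal sketch, assuming od and iconv are available and that iconv exits non-zero on input that is illegal in the source charset:

```shell
# Hex dump of the two bytes in question, with whitespace stripped.
hex=$(printf '\337b' | od -An -tx1 | tr -d ' \n')

# Ask iconv to read the bytes as UTF-8; a non-zero exit status means
# the input is not a valid UTF-8 sequence.
if printf '\337b' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
  valid=yes
else
  valid=no
fi

echo "bytes=$hex valid_utf8=$valid"
```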
Comment 4 jochen 2004-09-02 05:43:20 UTC
Sorry for the long delay.

Hm, yes, you're right: \337b is one character.

However:

$ echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*ar/bar/
fooßbar

and even:

$ echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*/bar/
barßbar

Comment 5 SpanKY gentoo-dev 2004-09-24 19:52:10 UTC
Can you try this out with sed-4.1.2?
Comment 6 SpanKY gentoo-dev 2005-02-11 19:23:36 UTC
Please get back to us on whether 4.1.2 does the right thing.