Summary: | sed with nls support cannot cope with different charsets | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | jochen <jochen.eisinger> |
Component: | [OLD] Core system | Assignee: | Gentoo's Team for Core System packages <base-system> |
Status: | RESOLVED NEEDINFO | ||
Severity: | normal | CC: | ciaran.mccreesh |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
jochen
2004-06-06 09:37:54 UTC
I have sed 4.0.9 compiled with NLS support enabled. However, in a UTF-8 terminal I can't process latin1 texts anymore.

Reproducible: Always

Steps to Reproduce:
1. echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*bar/bar/
2. echo -e "foo\337bar" | LC_CTYPE=C sed s/foo.*bar/bar/

Actual Results:
fooßbar
bar

Expected Results:
bar
bar

I'm pretty sure this is INVALID. If I'm remembering my Unicode right, \337b is one 'character' in UTF-8. What's the output of the following?

echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*ar/bar/

Sorry for the long delay. Hum, yes, you're right, \337b is one character. However:

$ echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*ar/bar/
fooßbar

and even:

$ echo -e "foo\337bar" | LC_CTYPE=de_DE.UTF-8 sed s/foo.*/bar/
barßbar

Can you try this out with sed-4.1.2?

Get back to us on whether 4.1.2 does the right thing.
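
For anyone hitting the same symptom, the two usual ways around it can be sketched as below. This is a minimal sketch, not a statement of how any sed version behaves in a UTF-8 locale: it assumes a POSIX shell, an `iconv` that knows ISO-8859-1, and `od`/`tr` for inspecting bytes. The helper name `latin1_sample` is made up for illustration. Note that byte 0xDF (octal 337) on its own is not a complete UTF-8 sequence, which is why the input and the locale disagree in the first place.

```shell
#!/bin/sh
# Hypothetical helper: emits the Latin-1 test string from the report.
# Octal 337 is byte 0xDF, which is "ß" in ISO-8859-1.
latin1_sample() { printf 'foo\337bar\n'; }

# Option 1: process the data as raw bytes.
# LC_ALL=C is used instead of LC_CTYPE=C because LC_ALL, if set in the
# environment, would override LC_CTYPE.
bytewise=$(latin1_sample | LC_ALL=C sed 's/foo.*bar/bar/')

# Option 2: convert the data to UTF-8 before handing it to tools that run
# in a UTF-8 locale. The result is dumped as hex so it can be checked
# without depending on the terminal's charset.
converted=$(latin1_sample | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "$bytewise"
echo "$converted"
```

After the conversion in option 2, the single byte 0xDF has become the two-byte UTF-8 sequence 0xC3 0x9F, so the data is well-formed for a UTF-8 locale.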