Summary: | sys-apps/sed: processes files with non-ASCII chars wrong if LC_ALL="C" | ||
---|---|---|---|
Product: | Gentoo/Alt | Reporter: | Charles Davis <cdavis5x> |
Component: | Prefix Support | Assignee: | Gentoo Prefix <prefix> |
Status: | RESOLVED UPSTREAM | ||
Severity: | normal | ||
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | OS X | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
Charles Davis
2010-11-07 05:19:59 UTC
The C locale of the Mac actually is MacRoman.

After fixing your sed, this should work for you too:

% env LC_ALL=en_GB.UTF-8 sed -e 's/\(.*\)/\"\1\",/' test.txt
"aâbc",

I don't think it's really sed's fault.

(In reply to comment #1)
> The C locale of the Mac actually is MacRoman.

That makes sense.

> After fixing your sed, this should work for you too:
> % env LC_ALL=en_GB.UTF-8 sed -e 's/\(.*\)/\"\1\",/' test.txt
> "aâbc",

That works all right.

> I don't think it's really sed's fault

I forgot to mention that this works perfectly fine with Mac OS X's built-in sed with LC_ALL=C. (Gee, that would have been helpful to know before! ;)

Personally, I think it is sed's fault. In a regexp, '.' means "match ANY character." Especially since Mac OS X's own sed has no trouble at all with this, if GNU sed isn't matching all the characters, including the non-ASCII ones, that is a problem.

In fact, I tried it with GNU sed from MacPorts, and it has the same problem. This appears to be a bug in GNU sed itself. Sorry to have wasted your time.
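The byte-versus-character point can be illustrated with a minimal Python sketch. This is not sed's implementation. It assumes test.txt holds the single line `aâbc` encoded as UTF-8, which is what the quoted output suggests. The names `LINE` and `sed_wrap_bytes` are hypothetical. In a single-byte locale every byte counts as one character, so `â` (UTF-8 bytes `C3 A2`) becomes two characters. Even so, `.` should still match each of those bytes, and the whole line should be wrapped in quotes.

```python
import re

# Assumption: test.txt contains the single line "aâbc" encoded as UTF-8.
LINE = "aâbc".encode("utf-8")


def sed_wrap_bytes(line: bytes) -> bytes:
    # Hypothetical helper emulating the substitution s/\(.*\)/"\1",/ on one line.
    # It works on raw bytes, so each byte is one "character", roughly the way a
    # single-byte locale sees the input. In a bytes pattern, '.' matches any
    # byte except a newline, including bytes >= 0x80.
    match = re.match(rb"(.*)", line)
    # (.*) can match the empty string, so match is never None here.
    return b'"' + match.group(1) + b'",'


if __name__ == "__main__":
    print(len(LINE))                  # 5: 'â' is two bytes in UTF-8
    print(len(LINE.decode("utf-8")))  # 4 characters in a UTF-8 locale
    # Per the comment above, the Mac's C locale is MacRoman. Read that way,
    # the two bytes of 'â' show up as two unrelated characters.
    print(LINE.decode("mac_roman"))   # a√¢bc
    print(sed_wrap_bytes(LINE).decode("utf-8"))  # "aâbc",
```

The sketch only shows the expected behavior. When `.` covers every byte, the byte-oriented view produces the same `"aâbc",` as the UTF-8 locale run, even though it counts five characters instead of four.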