Summary: | sed's misdoings | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Igor Golubev <ooptimum> |
Component: | Current packages | Assignee: | Gentoo's Team for Core System packages <base-system> |
Status: | RESOLVED INVALID | ||
Severity: | critical | CC: | truedfx |
Priority: | High | ||
Version: | 2006.1 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
Igor Golubev
2006-09-29 08:24:02 UTC
I forgot to indicate header for the first `emerge --info` output. Here it is:

    Portage 2.1.1 (default-linux/x86/2006.1/server, gcc-4.1.1, glibc-2.4-r3, 2.6.17-gentoo-r4 i686)
    =================================================================

    $ echo "[aA][bB][zZ]" | sed 's/[A-Z]//g'
    [a][][]
    $ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
    [][][]
    $ emerge --info | grep glibc
    Portage 2.1.2_pre1-r4 (hardened/x86/2.6, gcc-3.4.6, glibc-2.3.6-r4, 2.6.17-gentoo-r8-amd64 i686)

Really don't see how is this glibc-2.4 issue.

This gave me the wrong clue, Jacub:

    $ echo "[aA][bB][zZ]" | sed 's/[A-Z]//g'
    [a][b][z]
    $ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
    [g][i][f]
    $ emerge --info |grep glibc
    Portage 2.1.1 (hardened/x86/2.6, gcc-3.3.6, glibc-2.3.6-r4, 2.6.11-hardened-r15 i686)

There's no bug here. If you want to match only the uppercase letters of the English alphabet, set LC_ALL=C. If you want to match the uppercase letters of the current locale, use [[:upper:]]. [A-Z] means "uppercase A, uppercase Z, or any of the characters that would be sorted between them in the current locale", and in en_US.UTF-8, that includes the lowercase b through z.

    echo {A..Z} {a..z} | fmt -w 1 | sort

Ubuntu 6.06LTS:

    $ locale
    LANG=ru_RU.UTF-8
    LANGUAGE=ru_RU:ru:en_GB:en
    LC_CTYPE="ru_RU.UTF-8"
    LC_NUMERIC="ru_RU.UTF-8"
    LC_TIME="ru_RU.UTF-8"
    LC_COLLATE="ru_RU.UTF-8"
    LC_MONETARY="ru_RU.UTF-8"
    LC_MESSAGES="ru_RU.UTF-8"
    LC_PAPER="ru_RU.UTF-8"
    LC_NAME="ru_RU.UTF-8"
    LC_ADDRESS="ru_RU.UTF-8"
    LC_TELEPHONE="ru_RU.UTF-8"
    LC_MEASUREMENT="ru_RU.UTF-8"
    LC_IDENTIFICATION="ru_RU.UTF-8"
    LC_ALL=
    $ echo "[aA][bB][zZ]" | sed 's/[A-Z]//g'
    [a][b][z]
    $ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
    [g][i][f]
    $ sed --version
    GNU sed версия 4.1.4

Don't you think that this behaviour of sed in Gentoo could lead to numerous mistakes in the scripts written with this syntax in mind?
on Gentoo/Linux

    $ locale
    LANG=ru_RU.KOI8-R
    LC_CTYPE="ru_RU.KOI8-R"
    LC_NUMERIC="ru_RU.KOI8-R"
    LC_TIME="ru_RU.KOI8-R"
    LC_COLLATE="ru_RU.KOI8-R"
    LC_MONETARY="ru_RU.KOI8-R"
    LC_MESSAGES="ru_RU.KOI8-R"
    LC_PAPER="ru_RU.KOI8-R"
    LC_NAME="ru_RU.KOI8-R"
    LC_ADDRESS="ru_RU.KOI8-R"
    LC_TELEPHONE="ru_RU.KOI8-R"
    LC_MEASUREMENT="ru_RU.KOI8-R"
    LC_IDENTIFICATION="ru_RU.KOI8-R"
    LC_ALL=
    $ echo "[aA][bB][cC]" | sed 's/[A-Z]//g' && sed --version | grep sed
    [a][][]
    GNU sed версия 4.1.5
    $ emerge --info | grep glibc | grep gcc
    Portage 2.1.1 (default-linux/x86/2006.1/desktop, gcc-4.1.1, glibc-2.4-r3, 2.6.18.xsuid.bot i686)

Actions on ASCII character ranges should not depend on the locale.

From urxvt launched with LANG="C"

    wwolf@terrum ~ $ echo "[bB][aA][zZ]" | sed 's/[A-Z]/'
    [b][aA][zZ]
    wwolf@terrum ~ $ echo "[gG][iI][fF]" | sed 's[A-Z]//g'
    [g][i][f]

From urxvt launched with LANG="ru_RU.KOI8-R"

    wwolf@terrum ~ $ echo "[bB][aA][zZ]" | sed 's/[A-Z]//'
    [B][aA][zZ]
    wwolf@terrum ~ $ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
    [][][]

(In reply to comment #5)
> Don't you think that this behaviour of sed in Gentoo could lead to numerous
> mistakes in the scripts written with this syntax in mind?

Such scripts are broken and should be fixed -- and they are.

(In reply to comment #7)
> Actions on ASCII character ranges should not depend on the locale.

Yes, they should. This is briefly mentioned in the sed info page, as well as the behaviour required by POSIX.

sorry i missed some symbols in my previous post, with "C" all ok. But from urxvt launched with LANG="ru_RU.KOI8-R" get

    wwolf@terrum ~ $ echo "[bB][aA][zZ]" | sed 's/[A-Z]//g'
    [][a][]
    wwolf@terrum ~ $ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
    [][][]

Harald van Dijk is spot on with everything he has said
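For anyone fixing a script hit by this, the two options given in the thread (LC_ALL=C for ASCII-only ranges, [[:upper:]] for the locale's own uppercase letters) might be wrapped roughly as below. This is only a sketch: the function names `strip_upper_ascii` and `strip_upper_locale` are made up for illustration and do not come from sed or Gentoo.

```shell
#!/bin/sh

# Hypothetical helper: remove ASCII uppercase letters only.
# LC_ALL=C is set for this one sed call, so [A-Z] is the plain
# A..Z range whatever locale the caller is running in.
strip_upper_ascii() {
    LC_ALL=C sed 's/[A-Z]//g'
}

# Hypothetical helper: remove whatever the current locale classifies
# as an uppercase letter. A character class does not depend on the
# collation order the way a range such as [A-Z] does.
strip_upper_locale() {
    sed 's/[[:upper:]]//g'
}

printf '%s\n' '[aA][bB][zZ]' | strip_upper_ascii
printf '%s\n' '[gG][iI][fF]' | strip_upper_locale
```

With the ASCII test strings used throughout this bug, both calls should leave only the lowercase letters inside the brackets, e.g. `[a][b][z]` for the first one. Setting LC_ALL on the single sed invocation, rather than exporting it for the whole script, leaves message language and other locale-dependent behaviour of the rest of the script untouched.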