$ echo "[aA][bB][zZ]" | sed 's/[A-Z]//g'
[a][][]
The result should be: [a][b][z]

$ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[][][]
The result should be: [g][i][f]

sed gives the wrong output on both of these configurations:

System uname: 2.6.17-gentoo-r4 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz
Gentoo Base System version 1.12.5
Last Sync: Fri, 29 Sep 2006 12:30:04 +0000
app-admin/eselect-compiler: [Not Present]
dev-java/java-config: [Not Present]
dev-lang/python: 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: [Not Present]
dev-util/confcache: [Not Present]
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils: 2.17.50.0.3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.11-r5
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686 -fomit-frame-pointer -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -march=i686 -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="ftp://mirror.aiya.ru/pub/gentoo/ ftp://ftp.citkit.ru/pub/Linux/gentoo/"
LANG="en_US.UTF-8"
LC_ALL=""
LDFLAGS="-Wl,-O1,--hash-style=both"
LINGUAS=""
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local /usr/portage/local/layman/toolchain_overlay"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 apache2 berkdb bitmap-fonts cli crypt cups dlloader dri elibc_glibc fortran input_devices_evdev input_devices_keyboard input_devices_mouse isdnlog kernel_linux libg++ mailwrapper mysql ncurses nls nptl nptlonly pam pcre perl ppds pppd python readline reflection session snmp spl ssl truetype truetype-fonts type1-fonts udev unicode userland_GNU vhosts video_cards_apm video_cards_ark video_cards_ati video_cards_chips video_cards_cirrus video_cards_cyrix video_cards_dummy video_cards_fbdev video_cards_glint video_cards_i128 video_cards_i740 video_cards_i810 video_cards_imstt video_cards_mga video_cards_neomagic video_cards_nsc video_cards_nv video_cards_rendition video_cards_s3 video_cards_s3virge video_cards_savage video_cards_siliconmotion video_cards_sis video_cards_sisusb video_cards_tdfx video_cards_tga video_cards_trident video_cards_tseng video_cards_v4l video_cards_vesa video_cards_vga video_cards_via video_cards_vmware video_cards_voodoo xml xorg zlib"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, PORTAGE_RSYNC_EXTRA_OPTS

and

Portage 2.1.1 (default-linux/x86/2006.1, gcc-4.1.1, glibc-2.4-r3, 2.6.17-gentoo-r8-ww i686)
=================================================================
System uname: 2.6.17-gentoo-r8-ww i686 AMD Athlon(TM) XP 2700+
Gentoo Base System version 1.12.5
Last Sync: Fri, 29 Sep 2006 01:53:01 +0000
ccache version 2.3 [enabled]
app-admin/eselect-compiler: [Not Present]
dev-java/java-config: [Not Present]
dev-lang/python: 2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: 2.3
dev-util/confcache: [Not Present]
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.17-r1
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O3 -march=i686 -mtune=athlon-xp -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/splash /etc/terminfo"
CXXFLAGS="-O3 -march=i686 -mtune=athlon-xp -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distlocks metadata-transfer parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="http://mirror.aiya.ru/pub/gentoo http://gentoo.osuosl.org http://mirror.gentoo.no"
LANG="ru_RU.UTF-8"
LINGUAS="ru en"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 3dnow 3dnowext 7zip X aac acl alsa bash-completion berkdb bitmap-fonts bzip2 cdparanoia chardet cli crypt cups curl dbus directfb divx dlloader dri dvd dvdr dvdread elibc_glibc encode examples fbcon ffmpeg gdbm gif glitz gpm gtk gtk2 gzip hal hardened iconv imlib input_devices_evdev input_devices_keyboard input_devices_mouse isdnlog ithreads jpeg kernel_linux ldap libg++ linguas_en linguas_ru mad matroska md5sum mikmod mmx mmxext mng mozilla mp3 mpeg ncurses nls no-old-linux nptl nptlonly nsplugin nvidia ogg opengl pam pango pcre perl png ppds pppd pyste python quicktime rar readline reflection sdl session spell spl sqlite sse ssl startup-notification svg symlink sysfs tcltk tcpd theora threads thumbnail tk toolbar trayicon truetype truetype-fonts type1-fonts udev unicode usb userland_GNU userlocales v4l v4l2 video_cards_nv video_cards_nvidia video_cards_vesa vorbis win32codecs wma wmf x264 xfce xml xorg xpm xv xvid zip zlib"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS

Switching to gcc-3.4.6 on these configurations makes no difference.
I forgot to include the header of the first `emerge --info` output. Here it is:

Portage 2.1.1 (default-linux/x86/2006.1/server, gcc-4.1.1, glibc-2.4-r3, 2.6.17-gentoo-r4 i686)
=================================================================
$ echo "[aA][bB][zZ]" | sed 's/[A-Z]//g'
[a][][]
$ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[][][]
$ emerge --info | grep glibc
Portage 2.1.2_pre1-r4 (hardened/x86/2.6, gcc-3.4.6, glibc-2.3.6-r4, 2.6.17-gentoo-r8-amd64 i686)

I really don't see how this is a glibc-2.4 issue.
This gave me the wrong clue, Jacub:

$ echo "[aA][bB][zZ]" | sed 's/[A-Z]//g'
[a][b][z]
$ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[g][i][f]
$ emerge --info |grep glibc
Portage 2.1.1 (hardened/x86/2.6, gcc-3.3.6, glibc-2.3.6-r4, 2.6.11-hardened-r15 i686)
There's no bug here. If you want to match only the uppercase letters of the English alphabet, set LC_ALL=C. If you want to match the uppercase letters of the current locale, use [[:upper:]]. [A-Z] means "uppercase A, uppercase Z, or any of the characters that would be sorted between them in the current locale", and in en_US.UTF-8, that includes the lowercase b through z. You can see the collation order of your locale with:

echo {A..Z} {a..z} | fmt -w 1 | sort
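The two alternatives named above can be sketched as small shell helpers. This is only an illustration: the function names strip_upper_ascii and strip_upper_locale are made up for the example and are not part of sed or of any script discussed in this bug.

```shell
#!/bin/sh
# Hypothetical helper: remove ASCII uppercase letters only.
# LC_ALL=C is set for this one sed invocation, so [A-Z] is the ASCII range.
strip_upper_ascii() {
    LC_ALL=C sed 's/[A-Z]//g'
}

# Hypothetical helper: remove whatever the current locale classifies as
# uppercase, using the POSIX character class instead of a range.
strip_upper_locale() {
    sed 's/[[:upper:]]//g'
}

echo "[aA][bB][zZ]" | strip_upper_ascii
echo "[gG][iI][fF]" | strip_upper_locale
```

With either helper the lowercase letters survive, which is the output the original report expected.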
Ubuntu 6.06 LTS:

$ locale
LANG=ru_RU.UTF-8
LANGUAGE=ru_RU:ru:en_GB:en
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$ echo "[aA][bB][zZ]" | sed 's/[A-Z]//g'
[a][b][z]
$ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[g][i][f]
$ sed --version
GNU sed версия 4.1.4

Don't you think that this behaviour of sed in Gentoo could lead to numerous mistakes in the scripts written with this syntax in mind?
On Gentoo/Linux:

$ locale
LANG=ru_RU.KOI8-R
LC_CTYPE="ru_RU.KOI8-R"
LC_NUMERIC="ru_RU.KOI8-R"
LC_TIME="ru_RU.KOI8-R"
LC_COLLATE="ru_RU.KOI8-R"
LC_MONETARY="ru_RU.KOI8-R"
LC_MESSAGES="ru_RU.KOI8-R"
LC_PAPER="ru_RU.KOI8-R"
LC_NAME="ru_RU.KOI8-R"
LC_ADDRESS="ru_RU.KOI8-R"
LC_TELEPHONE="ru_RU.KOI8-R"
LC_MEASUREMENT="ru_RU.KOI8-R"
LC_IDENTIFICATION="ru_RU.KOI8-R"
LC_ALL=
$ echo "[aA][bB][cC]" | sed 's/[A-Z]//g' && sed --version | grep sed
[a][][]
GNU sed версия 4.1.5
$ emerge --info | grep glibc | grep gcc
Portage 2.1.1 (default-linux/x86/2006.1/desktop, gcc-4.1.1, glibc-2.4-r3, 2.6.18.xsuid.bot i686)
Actions on ASCII character ranges should not depend on the locale.
From urxvt launched with LANG="C":

wwolf@terrum ~ $ echo "[bB][aA][zZ]" | sed 's/[A-Z]/'
[b][aA][zZ]
wwolf@terrum ~ $ echo "[gG][iI][fF]" | sed 's[A-Z]//g'
[g][i][f]

From urxvt launched with LANG="ru_RU.KOI8-R":

wwolf@terrum ~ $ echo "[bB][aA][zZ]" | sed 's/[A-Z]//'
[B][aA][zZ]
wwolf@terrum ~ $ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[][][]
(In reply to comment #5)
> Don't you think that this behaviour of sed in Gentoo could lead to numerous
> mistakes in the scripts written with this syntax in mind?

Such scripts are broken and should be fixed -- and they are.

(In reply to comment #7)
> Actions on ASCII character ranges should not depend on the locale.

Yes, they should. This is briefly mentioned in the sed info page, and it is the behaviour required by POSIX.
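One possible way to fix such a script, shown here only as a minimal sketch, is to pin the locale once at the top so that every bracket range that follows is an ASCII range. The function name drop_ascii_upper is hypothetical. Forcing LC_ALL=C for the whole script is an assumption that may not suit scripts which also need localized messages or sorting.

```shell
#!/bin/sh
# Pin the locale for the whole script: every command started from here on
# sees LC_ALL=C, so ranges such as [A-Z] cover ASCII uppercase only.
LC_ALL=C
export LC_ALL

# Hypothetical helper standing in for a script's existing sed call.
drop_ascii_upper() {
    sed 's/[A-Z]//g'
}

printf '%s\n' "[bB][aA][zZ]" | drop_ascii_upper
```

Scripts that must keep the user's locale for other commands can instead set LC_ALL=C on the individual sed invocation.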
Sorry, I dropped some characters in my previous post. With LANG="C" everything is OK, but from urxvt launched with LANG="ru_RU.KOI8-R" I get:

wwolf@terrum ~ $ echo "[bB][aA][zZ]" | sed 's/[A-Z]//g'
[][a][]
wwolf@terrum ~ $ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[][][]
Harald van Dijk is spot on with everything he has said.