Bug 149526 - sed's misdoings
Summary: sed's misdoings
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High critical
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-09-29 08:24 UTC by Igor Golubev
Modified: 2006-09-29 12:25 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Description Igor Golubev 2006-09-29 08:24:02 UTC
$ echo "[aA][bB][zZ]" | sed 's/[A-Z]//g'
[a][][]

The result should be: [a][b][z]

$ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[][][]

The result should be: [g][i][f]

sed gives the wrong output on both of these configurations:

System uname: 2.6.17-gentoo-r4 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz
Gentoo Base System version 1.12.5
Last Sync: Fri, 29 Sep 2006 12:30:04 +0000
app-admin/eselect-compiler: [Not Present]
dev-java/java-config: [Not Present]
dev-lang/python:     2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.17.50.0.3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r5
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686 -fomit-frame-pointer -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -march=i686 -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="ftp://mirror.aiya.ru/pub/gentoo/ ftp://ftp.citkit.ru/pub/Linux/gentoo/"
LANG="en_US.UTF-8"
LC_ALL=""
LDFLAGS="-Wl,-O1,--hash-style=both"
LINGUAS=""
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local /usr/portage/local/layman/toolchain_overlay"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 apache2 berkdb bitmap-fonts cli crypt cups dlloader dri elibc_glibc fortran input_devices_evdev input_devices_keyboard input_devices_mouse isdnlog kernel_linux libg++ mailwrapper mysql ncurses nls nptl nptlonly pam pcre perl ppds pppd python readline reflection session snmp spl ssl truetype truetype-fonts type1-fonts udev unicode userland_GNU vhosts video_cards_apm video_cards_ark video_cards_ati video_cards_chips video_cards_cirrus video_cards_cyrix video_cards_dummy video_cards_fbdev video_cards_glint video_cards_i128 video_cards_i740 video_cards_i810 video_cards_imstt video_cards_mga video_cards_neomagic video_cards_nsc video_cards_nv video_cards_rendition video_cards_s3 video_cards_s3virge video_cards_savage video_cards_siliconmotion video_cards_sis video_cards_sisusb video_cards_tdfx video_cards_tga video_cards_trident video_cards_tseng video_cards_v4l video_cards_vesa video_cards_vga video_cards_via video_cards_vmware video_cards_voodoo xml xorg zlib"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, PORTAGE_RSYNC_EXTRA_OPTS

and

Portage 2.1.1 (default-linux/x86/2006.1, gcc-4.1.1, glibc-2.4-r3, 2.6.17-gentoo-r8-ww i686)
=================================================================
System uname: 2.6.17-gentoo-r8-ww i686 AMD Athlon(TM) XP 2700+
Gentoo Base System version 1.12.5
Last Sync: Fri, 29 Sep 2006 01:53:01 +0000
ccache version 2.3 [enabled]
app-admin/eselect-compiler: [Not Present]
dev-java/java-config: [Not Present]
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     2.3
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r1
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O3 -march=i686 -mtune=athlon-xp -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/splash /etc/terminfo"
CXXFLAGS="-O3 -march=i686 -mtune=athlon-xp -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distlocks metadata-transfer parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="http://mirror.aiya.ru/pub/gentoo http://gentoo.osuosl.org http://mirror.gentoo.no"
LANG="ru_RU.UTF-8"
LINGUAS="ru en"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 3dnow 3dnowext 7zip X aac acl alsa bash-completion berkdb bitmap-fonts bzip2 cdparanoia chardet cli crypt cups curl dbus directfb divx dlloader dri dvd dvdr dvdread elibc_glibc encode examples fbcon ffmpeg gdbm gif glitz gpm gtk gtk2 gzip hal hardened iconv imlib input_devices_evdev input_devices_keyboard input_devices_mouse isdnlog ithreads jpeg kernel_linux ldap libg++ linguas_en linguas_ru mad matroska md5sum mikmod mmx mmxext mng mozilla mp3 mpeg ncurses nls no-old-linux nptl nptlonly nsplugin nvidia ogg opengl pam pango pcre perl png ppds pppd pyste python quicktime rar readline reflection sdl session spell spl sqlite sse ssl startup-notification svg symlink sysfs tcltk tcpd theora threads thumbnail tk toolbar trayicon truetype truetype-fonts type1-fonts udev unicode usb userland_GNU userlocales v4l v4l2 video_cards_nv video_cards_nvidia video_cards_vesa vorbis win32codecs wma wmf x264 xfce xml xorg xpm xv xvid zip zlib"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS

Switching to gcc-3.4.6 on these configurations makes no difference.
Comment 1 Igor Golubev 2006-09-29 08:32:51 UTC
I forgot to include the header of the first `emerge --info` output. Here it is:

Portage 2.1.1 (default-linux/x86/2006.1/server, gcc-4.1.1, glibc-2.4-r3, 2.6.17-gentoo-r4 i686)
=================================================================
Comment 2 Jakub Moc (RETIRED) gentoo-dev 2006-09-29 08:33:22 UTC
$ echo "[aA][bB][zZ]" | sed 's/[A-Z]//g'
[a][][]

$ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[][][]

$ emerge --info | grep glibc
Portage 2.1.2_pre1-r4 (hardened/x86/2.6, gcc-3.4.6, glibc-2.3.6-r4, 2.6.17-gentoo-r8-amd64 i686)

I really don't see how this is a glibc-2.4 issue.
Comment 3 Igor Golubev 2006-09-29 08:38:56 UTC
This is what gave me the wrong clue, Jakub:

$ echo "[aA][bB][zZ]" | sed 's/[A-Z]//g'
[a][b][z]
$ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[g][i][f]
$ emerge --info |grep glibc
Portage 2.1.1 (hardened/x86/2.6, gcc-3.3.6, glibc-2.3.6-r4, 2.6.11-hardened-r15 i686)
Comment 4 Harald van Dijk (RETIRED) gentoo-dev 2006-09-29 09:04:12 UTC
There's no bug here. If you want to match only the uppercase letters of the English alphabet, set LC_ALL=C. If you want to match the uppercase letters of the current locale, use [[:upper:]]. [A-Z] means "uppercase A, uppercase Z, or any of the characters that would be sorted between them in the current locale", and in en_US.UTF-8, that includes the lowercase b through z.

echo {A..Z} {a..z} | fmt -w 1 | sort
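
A minimal sketch of the two fixes described above, using the inputs from the original report. The function names `strip_ascii_upper` and `strip_locale_upper` are hypothetical and exist only for illustration. Only `sed`, `LC_ALL=C`, and `[[:upper:]]` come from this thread.

```shell
#!/bin/sh
# Hypothetical helper names, chosen for illustration only.

# Remove the 26 uppercase letters of the English alphabet, whatever
# locale the caller runs in: LC_ALL=C forces byte-order ranges for
# this one sed invocation.
strip_ascii_upper() {
    LC_ALL=C sed 's/[A-Z]//g'
}

# Remove every character the current locale classifies as uppercase,
# without relying on how the locale sorts a range such as A-Z.
strip_locale_upper() {
    sed 's/[[:upper:]]//g'
}

out_c=$(echo "[aA][bB][zZ]" | strip_ascii_upper)
out_class=$(echo "[gG][iI][fF]" | strip_locale_upper)

echo "$out_c"      # prints [a][b][z]
echo "$out_class"  # prints [g][i][f]
```

Setting `LC_ALL=C` as a prefix to the command keeps the override local to that one `sed` call, so the rest of a script can still run in the user's locale.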
Comment 5 Igor Golubev 2006-09-29 09:49:22 UTC
Ubuntu 6.06LTS:

$ locale
LANG=ru_RU.UTF-8
LANGUAGE=ru_RU:ru:en_GB:en
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$ echo "[aA][bB][zZ]" | sed 's/[A-Z]//g'
[a][b][z]
$ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[g][i][f]
$ sed --version
GNU sed версия 4.1.4

Don't you think that this behaviour of sed in Gentoo could lead to numerous mistakes in the scripts written with this syntax in mind?
Comment 6 Sergey Dryabzhinsky 2006-09-29 10:04:23 UTC
On Gentoo Linux:

$ locale
LANG=ru_RU.KOI8-R
LC_CTYPE="ru_RU.KOI8-R"
LC_NUMERIC="ru_RU.KOI8-R"
LC_TIME="ru_RU.KOI8-R"
LC_COLLATE="ru_RU.KOI8-R"
LC_MONETARY="ru_RU.KOI8-R"
LC_MESSAGES="ru_RU.KOI8-R"
LC_PAPER="ru_RU.KOI8-R"
LC_NAME="ru_RU.KOI8-R"
LC_ADDRESS="ru_RU.KOI8-R"
LC_TELEPHONE="ru_RU.KOI8-R"
LC_MEASUREMENT="ru_RU.KOI8-R"
LC_IDENTIFICATION="ru_RU.KOI8-R"
LC_ALL=

$ echo "[aA][bB][cC]" | sed 's/[A-Z]//g' && sed --version | grep sed
[a][][]
GNU sed версия 4.1.5
$ emerge --info | grep glibc | grep gcc
Portage 2.1.1 (default-linux/x86/2006.1/desktop, gcc-4.1.1, glibc-2.4-r3, 2.6.18.xsuid.bot i686)

Comment 7 Sergey Dryabzhinsky 2006-09-29 10:12:17 UTC
Actions on ASCII character ranges should not depend on the locale.
Comment 8 Oleg S. Marin 2006-09-29 10:15:47 UTC
From urxvt launched with LANG="C"
wwolf@terrum ~ $ echo "[bB][aA][zZ]" | sed 's/[A-Z]/'
[b][aA][zZ]
wwolf@terrum ~ $ echo "[gG][iI][fF]" | sed 's[A-Z]//g'
[g][i][f]

From urxvt launched with LANG="ru_RU.KOI8-R"
wwolf@terrum ~ $ echo "[bB][aA][zZ]" | sed 's/[A-Z]//'
[B][aA][zZ]
wwolf@terrum ~ $ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[][][]
Comment 9 Harald van Dijk (RETIRED) gentoo-dev 2006-09-29 10:19:24 UTC
(In reply to comment #5)
> Don't you think that this behaviour of sed in Gentoo could lead to numerous
> mistakes in the scripts written with this syntax in mind?

Such scripts are broken and should be fixed -- and they are.

(In reply to comment #7)
> Actions on ASCII character ranges should not depend on the locale.

Yes, they should. This is briefly mentioned in the sed info page, and it is also the behaviour required by POSIX.
Comment 10 Oleg S. Marin 2006-09-29 10:40:31 UTC
Sorry, I missed some symbols in my previous post; with "C" everything is OK.
But from urxvt launched with LANG="ru_RU.KOI8-R" I get:

wwolf@terrum ~ $ echo "[bB][aA][zZ]" | sed 's/[A-Z]//g'
[][a][]
wwolf@terrum ~ $ echo "[gG][iI][fF]" | sed 's/[A-Z]//g'
[][][]
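
The tests in this thread set the locale in different ways (`LANG` for the terminal, `LC_ALL` in the suggested fix). As a rough sketch, assuming the usual POSIX precedence in which a non-empty `LC_ALL` overrides `LC_COLLATE`, which in turn overrides `LANG`, the hypothetical helper `effective_collate` below reports which locale name governs how a range such as `[A-Z]` is sorted. The `POSIX` fallback is an assumption: the default when nothing is set is implementation-defined.

```shell
#!/bin/sh
# Hypothetical helper: print the locale name that governs collation,
# following the precedence LC_ALL > LC_COLLATE > LANG.
# An empty variable counts the same as an unset one.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        echo "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"
    else
        # Assumed fallback; the real default is implementation-defined.
        echo "POSIX"
    fi
}

effective_collate
```

This is why launching a terminal with `LANG="C"` changes the result only when neither `LC_ALL` nor `LC_COLLATE` is set to something else.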
Comment 11 SpanKY gentoo-dev 2006-09-29 12:25:07 UTC
Harald van Dijk is spot on with everything he has said.