Bug 232413 - sed has problems with some upper limits in regex-ranges
Summary: sed has problems with some upper limits in regex-ranges
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High normal
Assignee: Gentoo Linux bug wranglers
Reported: 2008-07-20 07:23 UTC by Guenther Brunthaler
Modified: 2008-07-20 07:58 UTC
Description Guenther Brunthaler 2008-07-20 07:23:08 UTC
When a sed regular expression contains a character range whose upper limit is specified as a base-8 (octal) escape sequence, sed rejects the expression when running in a UTF-8 locale, even for 7-bit values.

Reproducible: Always

Steps to Reproduce:
1. Run the demonstration shell script below.

Actual Results:  
No problem (no octal escapes):


No problem (non-UTF-8 locale):

Problem: (UTF-8 locale)
sed: -e expression #1, char 10: Invalid range end


Expected Results:  
No problem (no octal escapes):


No problem (non-UTF-8 locale):

Problem: (UTF-8 locale)


Here are the contents of the demonstration script:

---cut here---
#! /bin/sh
echo 'No problem (no octal escapes):'
echo | LC_ALL=en_US sed -e '/[A-B]/b'
echo | LC_ALL=en_US.utf8 sed -e '/[A-B]/b'
echo 'No problem (non-UTF-8 locale):'
echo | LC_ALL=en_US sed -e '/[A-\102]/b'
echo 'Problem: (UTF-8 locale)'
echo | LC_ALL=en_US.utf8 sed -e '/[A-\102]/b'
---cut here---

Note that B and \102 should be the same character, and both should have the same internal representation in ASCII as well as in UTF-8: it is a 7-bit value (octal 102).

Workaround: Override the locale sed is run in. But that's clearly a kludge and not a definitive solution.
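
Another possible workaround, sketched here under the assumption that the shell's printf interprets octal escapes in its format string (POSIX requires this): let printf produce the literal character and splice it into the sed expression, so that sed never sees an escape at all. The helper name range_end is made up for this illustration, and the approach only suits characters that are safe to embed literally in a bracket expression.

```sh
#! /bin/sh
# Hypothetical helper: print the character for a given octal code.
# printf, unlike a sed BRE, is required by POSIX to interpret \ddd.
range_end() {
    printf "\\$1"
}

upper=$(range_end 102)          # yields a literal B
# The expression sed sees is simply /[A-B]/, with no escape left in it.
printf 'A\nB\nC\n' | LC_ALL=C sed -n -e "/[A-${upper}]/p"
```

The example pins LC_ALL=C only to keep the range ordering predictable; the point is that the octal code is resolved by printf before sed parses anything.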
Comment 1 Guenther Brunthaler 2008-07-20 07:36:24 UTC
Oops - it seems the problem is not limited to UTF-8 locales.

sed seems to have a problem with upper range limits also in non-UTF-8 locales if the upper limit is a space character:

$ echo | LC_ALL=C sed -e '/[\001- ]/b'
sed: -e expression #1, char 10: Invalid range end
Comment 2 Guenther Brunthaler 2008-07-20 07:40:52 UTC
I cannot believe that these bugs (if they are bugs) have not been found before.

It's not that "sed" is a rarely-used program.

BTW, I'm using sed-4.1.5-r1.

Portage 2.1.4.4 (default/linux/x86/2008.0/desktop, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24-gentoo-r8-xquad-20080710 i686)
=================================================================
System uname: 2.6.24-gentoo-r8-xquad-20080710 i686 AMD Phenom(tm) 9600 Quad-Core Processor
Timestamp of tree: Sat, 19 Jul 2008 05:45:01 +0000
ccache version 2.4 [enabled]
app-shells/bash:     3.2_p33
dev-java/java-config: 1.3.7, 2.1.6
dev-lang/python:     2.4.4-r13, 2.5.2-r5
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache:     2.4-r7
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r2
sys-devel/automake:  1.4_p6, 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -DNDEBUG -pipe -fno-stack-check"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/local/etc /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/host-variants/ /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-march=k8 -O2 -DNDEBUG -pipe -fno-stack-check"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--nospinner"
FEATURES="ccache distlocks metadata-transfer notitles parallel-fetch prelink sandbox sfperms strict unmerge-orphans userfetch userpriv usersandbox"
GENTOO_MIRRORS="/usr/local/portage/distfiles/precious /usr/local/portage/distfiles /usr/local/portage/distfiles/local /usr/local/portage/distfiles/mnt ftp://130.208.16.31/pub/gentoo/ http://140.127.177.17/pub/Linux/Gentoo ftp://140.127.177.17/pub/Linux/Gentoo http://gentoo.mirrors.easynews.com/linux/gentoo/ ftp://140.127.177.15/pub/Linux/Gentoo http://140.127.177.15/pub/Linux/Gentoo http://ftp.udc.es/gentoo/ http://mirrors.64hosting.com/pub/mirrors/gentoo/ http://gentoo.netnitco.net ftp://mirrors.64hosting.com/pub/mirrors/gentoo/"
LANG="en"
LC_ALL="C"
LDFLAGS="-Wl,-O1"
LINGUAS="de"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/layman/rox /usr/local/portage"
SYNC="rsync://rsync.de.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext X a52 aac aalib acpi alsa apache2 arts audiofile avahi bash-completion berkdb bluetooth branding bzip2 cairo caps cddb cdr cli cracklib crypt css cups curl custom-cflags dbus directfb dri dts dv dvd dvdr dvdread ecc emboss encode evo exif expat fbcon ffmpeg fftw firefox flac foomaticdb fortran freetype ftp fuse gd gdbm gif gimp glut gmp gphoto2 gpm gstreamer gtk gtk2 hal iconv idea ieee1394 imagemagick imlib isdnlog jack java6 javascript jbig jikes jp2 jpeg jpeg2k kde kdeenablefinal kdehiddenvisibility kdexdeltas kipi lcms ldap libcaca libclamav libnotify logrotate lzo mad matroska midi mikmod mmx mmxext mng mp3 mpeg mudflap mule musepack musicbrainz ncurses nls nptl nptlonly nsplugin oav odbc offensive ofx ogg openal opengl openmp pam pcre pdf perl pic png ppds pppd python qt qt3 qt3support qt4 quicktime readline reflection samba sasl screen sdl session sharedmem slang smartcard sndfile sox speex spell spl sqlite sse sse2 sse3 sse4a ssl startup-notification svg tcltk tetex theora threads tiff tk truetype unicode usb userlocales utf8 vcd vorbis wxwindows x264 x86 xft xml xorg xosd xpm xscreensaver xsl xv xvid xvmc zeroconf zlib" ALSA_CARDS="emu10k1" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="evdev keyboard mouse joystick" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de" USERLAND="GNU" VIDEO_CARDS="fbdev glint 
i810 mach64 mga neomagic nv r128 radeon savage sis tdfx trident vesa vga via vmware voodoo"
Unset:  CPPFLAGS, CTARGET, INSTALL_MASK, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 3 Guenther Brunthaler 2008-07-20 07:45:26 UTC
(In reply to comment #1)
> sed seems to have a problem with upper range limits also in non-UTF-8 locales
> if the upper limit is a space character:
> 
> $ echo | LC_ALL=C sed -e '/[\001- ]/b'
> sed: -e expression #1, char 10: Invalid range end

The same expression *works*, if " " is replaced by \040:

$ echo | LC_ALL=C sed -e '/[\001-\040]/b'
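
A word of caution about that result, as a sketch that assumes sed follows POSIX and takes the backslash literally inside a bracket expression: the list then consists of the characters \, 0, 0, the range from 1 to \, and 0, 4, 0. That range is in ascending order in the C locale, so no error is raised, but the expression may not match what was intended. A quick check:

```sh
#! /bin/sh
# Three test lines: a control character (octal 001), a single space, and the letter A.
# printf expands \001 here; sed receives the raw byte on the first input line.
# Under the literal reading, only the line "A" (inside the range 1-\) is
# expected to match; the control character and the space are not.
printf '\001\n \nA\n' | LC_ALL=C sed -n -e '/[\001-\040]/p'
```

If the octal escapes were honoured, the first two lines would be printed and the third would not.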
Comment 4 Guenther Brunthaler 2008-07-20 07:58:08 UTC
OK - I found the source of the problem myself: according to "IEEE Std 1003.1, 2003 Edition", BREs are not required to recognize octal escapes at all!

So, SORRY guys; it seems I've been using too much Perl... closing bug ;-)
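
For anyone landing here with the same symptom, the error messages fit that reading. POSIX says a backslash loses its special meaning inside a bracket expression, so '[\001- ]' would be the list \, 0, 0 followed by the range from 1 to space, whose end point (space, octal 040) sorts before its start point (1, octal 061) in the C locale. A small sketch to check this on a given system; the function name matching is hypothetical:

```sh
#! /bin/sh
# Hypothetical helper: print those of six sample lines that match
# the bracket expression passed as the first argument.
matching() {
    printf '%s\n' '\' 0 1 2 A B | LC_ALL=C sed -n -e "/$1/p"
}

# If \102 meant B, only the line "B" would be printed. Under the literal
# reading the members are \, 1, 0 and 2, so "B" should not appear.
matching '[\102]'

# The reversed range 1-<space> on its own is enough to make sed refuse
# the expression, without any backslash being involved.
if ! echo | LC_ALL=C sed -e '/[1- ]/b' 2>/dev/null; then
    echo 'reversed range rejected'
fi
```

Some sed implementations offer their own escape extensions, so results can differ from this sketch where such extensions apply inside bracket expressions.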