Bug 17051 - estonian locale breaks some sed expressions ([A-Za-z])
Summary: estonian locale breaks some sed expressions ([A-Za-z])
Status: RESOLVED DUPLICATE of bug 9901
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: x86 Linux
Importance: High major
Assignee: Gentoo Linux bug wranglers
URL:
Whiteboard:
Keywords:
Duplicates: 17150 18697 19686 19687 23371 24348 24461 26627 27131 27132 27232 28248 30935 38418 74822
Depends on:
Blocks:
 
Reported: 2003-03-07 18:31 UTC by Veiko Kukk
Modified: 2005-07-17 13:06 UTC
CC List: 12 users

See Also:
Package list:
Runtime testing required: ---


Attachments
dialog-0.9_beta20020519.ebuild (dialog-0.9_beta20020519.ebuild,785 bytes, application/octet-stream)
2003-03-07 19:46 UTC, Veiko Kukk
(portage-2.0.49-r15_locale-fix.diff) (portage-2.0.49-r15_locale-fix.diff,1.45 KB, patch)
2003-11-20 05:29 UTC, bartron

Description Veiko Kukk 2003-03-07 18:31:33 UTC
When I try to emerge dialog, I get the following error:

Calculating dependencies ...done!
>>> emerge (1 of 1) dev-util/dialog-0.9_beta20020519 to /
>>> md5 ;-) dialog_0.9b-20020519.orig.tar.gz
>>> Unpacking source...
>>> Unpacking dialog_0.9b-20020519.orig.tar.gz to
/var/tmp/portage/dialog-0.9_beta20020519/work
>>> Source unpacked.
configure: error: ncurses: invalid package name

!!! ERROR: dev-util/dialog-0.9_beta20020519 failed.
!!! Function econf, Line 262, Exitcode 1
!!! econf failed




Reproducible: Always
Steps to Reproduce:
1. emerge dialog
Actual Results:  
failed to build dialog

Expected Results:  
should build

Portage 2.0.47-r8 (default-x86-1.4, gcc-3.2.1, glibc-2.3.1-r2)
=================================================================
System uname: 2.4.19-gentoo-r10 i686 AMD Duron(tm) processor
GENTOO_MIRRORS="ftp://ftp.gentoo.linux.no/pub/gentoo
http://www.ibiblio.org/pub/Linux/distributions/gentoo"
CONFIG_PROTECT="/etc /var/qmail/control /usr/kde/2/share/config
/usr/kde/3/share/config /usr/X11R6/lib/X11/xkb /usr/kde/3.1/share/config
/usr/share/config"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
PORTDIR="/usr/portage"
DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR_OVERLAY=""
USE="x86 oss 3dnow apm avi crypt cups encode gif jpeg kde libg++ mikmod mmx mpeg
ncurses nls pdflib png quicktime truetype xml2 xmms xv zlib gdbm berkdb slang
readline guile X sdl gpm tcpd pam libwww ssl python imlib oggvorbis gtk qt motif
opengl gnome -alsa -arts -svga -spell apache2 cdr esd flash imap java lcms ldap
mbox mozilla mysql perl ruby samba sasl sse tcltk tiff"
COMPILER="gcc3"
CHOST="i686-pc-linux-gnu"
CFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -frerun-cse-after-loop
-frerun-loop-opt -fexpensive-optimizations"
CXXFLAGS="-march=athlon-xp -O3 -pipe -fomit-frame-pointer -frerun-cse-after-loop
-frerun-loop-opt -fexpensive-optimizations -Wno-deprecated"
ACCEPT_KEYWORDS="x86"
MAKEOPTS="-j2"
AUTOCLEAN="yes"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
FEATURES="sandbox ccache"
Comment 1 Martin Holzer (RETIRED) gentoo-dev 2003-03-07 18:37:59 UTC
Could you please attach the ebuild?
Comment 2 Veiko Kukk 2003-03-07 19:46:20 UTC
Created attachment 9118
dialog-0.9_beta20020519.ebuild

ebuild of dialog that fails to build
Comment 3 Brandon Low (RETIRED) gentoo-dev 2003-03-25 01:05:34 UTC
I just committed a new version from Debian; please try that. Otherwise, please remerge ncurses and try again.

Thanks.
Comment 4 Veiko Kukk 2003-04-02 10:16:55 UTC
As I found out, this error is caused by sed 4.x. Since using supersed I have had no problems building any ebuild. 
Comment 5 Brandon Low (RETIRED) gentoo-dev 2003-04-17 22:44:18 UTC
Strange, sed-4.x is based on the supersed code. Can you please retest with sed-4.0.6 or later? If it is still broken at all, reopen the bug.
Comment 6 Veiko Kukk 2003-04-18 18:54:37 UTC
Retested with sed-4.0.7:

emerge dialog -u
Calculating dependencies ...done!
>>> emerge (1 of 1) dev-util/dialog-0.9_beta20030308 to /
>>> md5 ;-) dialog_0.9b-20020814.orig.tar.gz
>>> Unpacking source...
>>> Unpacking dialog_0.9b-20020814.orig.tar.gz to /var/tmp/portage/dialog-0.9_beta20030308/work
>>> Source unpacked.
configure: error: ncurses: invalid package name

!!! ERROR: dev-util/dialog-0.9_beta20030308 failed.
!!! Function econf, Line 273, Exitcode 1
!!! econf failed
Comment 7 Brandon Low (RETIRED) gentoo-dev 2003-04-18 19:45:18 UTC
I just committed a fixed 20030308-r1 that actually installs the right version, sorry about that.

If that one doesn't work then we still have a sed problem or something.
Comment 8 Veiko Kukk 2003-04-20 14:06:37 UTC
with sed-4.0.7

emerge dialog
Calculating dependencies ...done!
>>> emerge (1 of 1) dev-util/dialog-0.9_beta20030308-r1 to /
>>> md5 ;-) dialog_0.9b-20030308.orig.tar.gz
>>> Unpacking source...
>>> Unpacking dialog_0.9b-20030308.orig.tar.gz to /var/tmp/portage/dialog-0.9_beta20030308-r1/work
>>> Source unpacked.
configure: error: ncurses: invalid package name

!!! ERROR: dev-util/dialog-0.9_beta20030308-r1 failed.
!!! Function econf, Line 273, Exitcode 1
!!! econf failed
Comment 9 Brandon Low (RETIRED) gentoo-dev 2003-04-20 18:58:11 UTC
Because it's still broken
Comment 10 Brandon Low (RETIRED) gentoo-dev 2003-04-20 18:58:20 UTC
*** Bug 19687 has been marked as a duplicate of this bug. ***
Comment 11 Brandon Low (RETIRED) gentoo-dev 2003-04-22 19:42:39 UTC
I'm kinda stumped about this, because this stuff works for me... :-\
Comment 12 Veiko Kukk 2003-06-19 02:35:35 UTC
Could it depend on language and locale settings?
Comment 13 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2003-06-23 19:41:23 UTC
Veiko: That is a possibility, could you please try
LC=C emerge mysql

(or any of the other packages that were failing for you)
Comment 14 Veiko Kukk 2003-06-24 03:24:33 UTC
>Veiko: That is a possibility, could you please try
>LC=C emerge mysql

I think this mysql problem is different, not sed related. "LC=C emerge mysql" gives the same error message as before.
Comment 15 Veiko Kukk 2003-06-24 10:53:08 UTC
I have the same problem with expect; I tried "LC=C emerge expect":

# LC=C emerge expect
Calculating dependencies ...done!
>>> emerge (1 of 1) dev-tcltk/expect-5.37.1-r1 to /
>>> md5 src_uri ;-) expect-5.37.1.tar.gz
>>> Unpacking source...
>>> Unpacking expect-5.37.1.tar.gz to /var/tmp/portage/expect-5.37.1-r1/work
>>> Source unpacked.
X
configure: error: tcl: invalid package name

!!! ERROR: dev-tcltk/expect-5.37.1-r1 failed.
!!! Function econf, Line 304, Exitcode 1
!!! econf failed
Comment 16 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2003-06-24 12:59:22 UTC
Veiko: could you please try to run this?
ac_option="--with-ncurses=/usr"
ac_package=`echo $ac_option|sed -e 's/-*with-//' -e 's/=.*//'`
echo $ac_package
echo $ac_package | sed 's/[-_a-zA-Z0-9]//g'

and show me all of the output?
Comment 17 Veiko Kukk 2003-06-25 02:09:00 UTC
variable portage # ac_option="--with-ncurses=/usr"
variable portage # ac_package=`echo $ac_option|sed -e 's/-*with-//' -e 's/=.*//'`
variable portage # echo $ac_package
ncurses
variable portage # echo $ac_package | sed 's/[-_a-zA-Z0-9]//g'
u
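The leftover `u` is what makes configure abort. Configure scripts generated by autoconf 2.13-era tools validate each `--with-*` option name with a check roughly like the sketch below (a paraphrase, not dialog's actual configure code; `check_package_name` is a hypothetical wrapper name). Whatever the range expression fails to delete is treated as an illegal character. If `a-z` skips `t` through `y`, as the et_EE results later in this bug show, the `u` in `ncurses` survives and the name is rejected.

```shell
#!/bin/sh
# Sketch of an autoconf-2.13-style "--with-PACKAGE" name check.
# check_package_name is a hypothetical helper, not part of autoconf.
check_package_name() {
    # "--with-ncurses=/usr" -> "ncurses"
    ac_package=$(printf '%s\n' "$1" | sed -e 's/-*with-//' -e 's/=.*//')
    # Delete every character the ranges accept; anything left over is
    # treated as illegal. a-z and A-Z follow LC_COLLATE, hence this bug.
    leftover=$(printf '%s\n' "$ac_package" | sed 's/[-_a-zA-Z0-9]//g')
    if [ -n "$leftover" ]; then
        echo "configure: error: $ac_package: invalid package name" >&2
        return 1
    fi
    printf '%s\n' "$ac_package"
}

# In the C locale this prints "ncurses"; with the et_EE collation
# described in this bug it fails with "ncurses: invalid package name".
(LC_ALL=C; export LC_ALL; check_package_name --with-ncurses=/usr)
```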
Comment 18 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2003-06-25 02:34:23 UTC
Veiko: that is definitely the incorrect output.
Here is what it should be:
=== CUT ===
bash-2.05b$  ac_option="--with-ncurses=/usr"
bash-2.05b$ ac_package=`echo $ac_option|sed -e 's/-*with-//' -e 's/=.*//'`
bash-2.05b$ echo $ac_package
ncurses
bash-2.05b$ echo $ac_package | sed 's/[-_a-zA-Z0-9]//g'

bash-2.05b$
=== CUT ===
(bash isn't my usual shell)

Could you please attach the output of 'set' ?
Comment 19 Veiko Kukk 2003-06-25 03:31:23 UTC
variable root # set
BASH=/bin/bash
BASH_VERSINFO=([0]="2" [1]="05b" [2]="0" [3]="1" [4]="release" [5]="i686-pc-linux-gnu")
BASH_VERSION='2.05b.0(1)-release'
CC=gcc
CLASSPATH=/opt/blackdown-jdk-1.4.1/jre/lib/rt.jar:.
COLUMNS=139
CONFIG_PROTECT='/usr/X11R6/lib/X11/xkb /usr/kde/3.1/share/config /usr/share/config'
CONFIG_PROTECT_MASK=/etc/gconf
CVS_RSH=ssh
CXX=g++
DIRSTACK=()
DISPLAY=ws002:0.0
EDITOR=/bin/nano
EUID=0
GDK_USE_XFT=1
GROUPS=()
G_BROKEN_FILENAMES=1
HISTFILE=/root/.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/root
HOSTNAME=variable.dyn.ee
HOSTTYPE=i686
IFS=$' \t\n'
INFODIR=/usr/share/info:/usr/X11R6/info
INFOPATH=/usr/share/info:/usr/share/gcc-data/i686-pc-linux-gnu/3.3/info
INPUTRC=/etc/inputrc
JAVAC=/opt/blackdown-jdk-1.4.1/bin/javac
JAVA_HOME=/opt/blackdown-jdk-1.4.1
JDK_HOME=/opt/blackdown-jdk-1.4.1
KDEDIR=/usr/kde/3.1
KDEDIRS=/usr
LANG=et_EE
LC_ALL=et_EE
LESS=-R
LESSOPEN='|lesspipe.sh %s'
LINES=51
LOGNAME=root
MACHTYPE=i686-pc-linux-gnu
MAIL=/root/Mailbox
MAILCHECK=60
MANPATH=/usr/share/man:/usr/local/share/man:/usr/share/gcc-data/i686-pc-linux-gnu/3.3/man:/usr/X11R6/man:/opt/blackdown-jdk-1.4.1/man
MOZILLA_FIVE_HOME=/usr/lib/mozilla
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
PAGER=/usr/bin/less
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.3:/opt/Acrobat5:/usr/X11R6/bin:/opt/blackdown-jdk-1.4.1/bin:/opt/blackdown-jdk-1.4.1/jre/bin:/usr/qt/3/bin:/usr/kde/3.1/sbin:/usr/kde/3.1/bin
PIPESTATUS=([0]="0")
PPID=22560
PS1='\[\033[01;31m\]\h \[\033[01;34m\]\W \$ \[\033[00m\]'
PS2='> '
PS4='+ '
PWD=/root
QMAKESPEC=linux-g++
QTDIR=/usr/qt/3
SHELL=/bin/bash
SHELLOPTS=braceexpand:emacs:hashall:histexpand:history:interactive-comments:monitor
SHLVL=1
TERM=xterm
UID=0
USER=root
XAUTHORITY=/root/.xauthVPNf82
XINITRC=/etc/X11/xinit/xinitrc
_=EDITOR
Comment 20 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2003-06-25 10:36:01 UTC
It is a locale bug!

Test script to reproduce it:
== CUT ==
#!/bin/bash
TESTLOCALES="et_EE C"

for TESTLOCALE in $TESTLOCALES; do
echo "Testing Locale ${TESTLOCALE}"
SEDEXPR="LC_ALL=${TESTLOCALE} sed 's/[-_a-zA-Z0-9]//g'"
echo -n "[0-9]:" ; echo -- "0123456789" | eval $SEDEXPR
echo -n "[a-z]:" ; echo -- "abcdefghijklmnopqrstuvwxyz" | eval $SEDEXPR
echo -n "[A-Z]:" ; echo -- "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | eval $SEDEXPR
echo -n "-:" ; echo -- "-" | eval $SEDEXPR
done;
== CUT ==

== Output ==
Testing Locale et_EE
[0-9]:
[a-z]: tuvwxy
[A-Z]: TUVWXY
-:
Testing Locale C
[0-9]:
[a-z]:
[A-Z]:
-:
== Output ==
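A side note on the test script: bash's builtin `echo` does not treat `--` as an end-of-options marker, so each sample actually reaches sed as `-- <characters>`. The dashes are deleted but the space survives, which is why `tuvwxy` appears with a leading space in the et_EE output. The variant below is a sketch that uses `printf` and calls sed directly instead of going through `eval`; `strip_ranges` is a hypothetical helper name. What it prints for et_EE depends on the locale data installed on the machine, so no et_EE output is claimed here.

```shell
#!/bin/bash
# strip_ranges (hypothetical helper): apply the same substitution as
# configure's name check under locale $1 and print what survives.
strip_ranges() {
    LC_ALL=$1 sed 's/[-_a-zA-Z0-9]//g'
}

for loc in et_EE C; do
    echo "Testing locale $loc"
    for sample in 0123456789 abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ -; do
        # printf prints its argument verbatim, unlike `echo --` in bash
        printf '%s -> ' "$sample"
        printf '%s\n' "$sample" | strip_ranges "$loc"
    done
done
```

In the C locale every line ends right after `->`, because every character is deleted.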
Comment 21 Veiko Kukk 2003-06-25 11:30:47 UTC
>It is a locale bug!

So does that mean sed-4.x is buggy?
Comment 22 Veiko Kukk 2003-06-25 11:42:47 UTC
Answering my own question: YES. Tested with supersed:

Testing Locale et_EE
[0-9]:
[a-z]:
[A-Z]:
-:
Testing Locale C
[0-9]:
[a-z]:
[A-Z]:
-:
Comment 23 Veiko Kukk 2003-06-30 06:48:34 UTC
Are you still alive? I think sed-4.x should be removed from portage because it's buggy.
Comment 24 Brandon Low (RETIRED) gentoo-dev 2003-06-30 07:29:28 UTC
uhm NO.  sed-4 is depended upon by dozens of packages in portage for the -i feature among others, this is only a bug with certain locales, and will hopefully be resolved by future revisions of sed, I am forwarding this bug to the sed developers to ensure that they are aware of it.
Comment 25 Brandon Low (RETIRED) gentoo-dev 2003-07-01 12:22:35 UTC
Well, it sorta looks like this is a Gentoo-specific internationalization issue:
> Hi, I'm the package maintainer for sed on Gentoo Linux, and am writing        
> to raise an issue that has come to light WRT sed-4.x's                        
> internationalization support.  Please refer to                                
> http://bugs.gentoo.org/show_bug.cgi?id=17051 for more information (we         
> didn't get around to realizing it was a locale bug until the bottom of        
> the bug page there, were hoping it was another issue in the Gentoo            
> package instead.)  Thanks,                                                    
>                                                                               
> Brandon Low                                                                   
> Gentoo Developer                                                              

Don't know if it means much to you, but I tried the test given at the
bottom of the web page you provided (above), and I cannot reproduce
those results with Cygwin and bash. I get the same results regardless
of whether LC_ALL is set to "et_EE" or to "C": all characters in the
string are deleted, as expected.

   Cygwin v1.3.22 running under Windows 2000 Pro
   bash shell
   GNU sed v4.0.5

---BEGIN CONSOLE OUTPUT---
epement@sw218-et04 ~
$ uname -a
CYGWIN_NT-5.0 sw218-et04 1.3.22(0.78/3/2) 2003-03-18 09:20 i686 unknown unknown Cygwin

epement@sw218-et04 ~
$ sed --version
GNU sed version 4.0.5
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.

---END CONSOLE OUTPUT-----

Likewise, setting LC or LC_ALL has no effect on the behavior of sed or
ssed under Windows 2000 Pro using 4NT.EXE or CMD.EXE as the shell.

--
Eric Pement - eric.pement@moody.edu
Comment 26 Veiko Kukk 2003-07-04 08:39:21 UTC
It's a really serious bug anyway. Is it marked stable in the portage tree?
Comment 27 Brandon Low (RETIRED) gentoo-dev 2003-07-06 17:11:16 UTC
Azarah:  You do the glibc ebuild, any ideas about this?

Yes, this is a serious bug, but it is important for us to have sed-4.x stable for functionality reasons. Additionally, this is only a serious bug on certain locales, and we didn't even know what caused it until now. For that matter, it is now likely that it is not a problem with our sed ebuild at all.
Comment 28 Veiko Kukk 2003-07-12 11:43:43 UTC
> it is now likely that it is not 
>a problem with our sed ebuild at all.

But what then? I understand that for some people it's really hard to understand that there are more languages and countries than English and the USA, but something must be done about this bug.
Comment 29 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2003-07-12 12:38:23 UTC
The bug appears to be a glitch in the locale file for et_EE.
/usr/share/i18n/locales/et_EE

This file belongs to glibc and _not_ Sed. The fact that supersed worked was because it seemed to ignore locales.

I really can't make much sense of how the format works.
It is somewhere in the LC_COLLATE block I think.

Mandrake had this problem last year:
http://mail.gnu.org/archive/html/autoconf/2002-07/msg00124.html

A few possible solutions:
1. Fix the locale - probably quite hard to solve
2. force LC_COLLATE=C when emerging
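Option 2 has a catch: LC_ALL overrides every LC_* category, and Veiko's environment in Comment 19 sets both LANG=et_EE and LC_ALL=et_EE, so exporting LC_COLLATE=C on its own would change nothing. The sketch below models the POSIX precedence for the collation category (LC_ALL, then LC_COLLATE, then LANG); `effective_collate` is a hypothetical helper. It also shows that a variable named `LC`, as tried in Comments 13 to 15, is not consulted at all.

```shell
#!/bin/sh
# effective_collate (hypothetical helper): report which value governs the
# LC_COLLATE category, using the POSIX precedence LC_ALL > LC_COLLATE > LANG.
effective_collate() {
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        # POSIX leaves this default implementation-defined; glibc uses C.
        printf '%s\n' "C"
    fi
}

# Veiko's settings from Comment 19, plus the attempted "LC=C":
(
    LANG=et_EE LC_ALL=et_EE LC_COLLATE=C LC=C
    effective_collate        # prints et_EE: LC_ALL wins, LC is ignored
    unset LC_ALL
    effective_collate        # prints C: now LC_COLLATE takes effect
)
```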
Comment 30 Veiko Kukk 2003-07-13 09:06:05 UTC
>Mandrake had this problem last year:
>http://mail.gnu.org/archive/html/autoconf/2002-07/msg00124.html

I had the exact same problem under Gentoo with 'file' and 'rp-pppoe', but I can't remember whether it was with sed 4.x or 3.x.
Comment 32 Veiko Kukk 2003-08-22 07:18:08 UTC
>echo -n "[a-z]:" ; echo -- "abcdefghijklmnopqrstuvwxyz" | eval $SEDEXPR
>echo -n "[A-Z]:" ; echo -- "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | eval $SEDEXPR

"ABCDEFGHIJKLMNOPQRSTUVWXYZ" is not Estonian alphabet. Estonian alphabet is 
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss  Zz  Tt Uu Vv Ww Õõ Ää Öö Üü Xx Yy

To see it correctly, use the ISO-8859-15 code page.
Comment 33 SpanKY gentoo-dev 2003-08-22 12:13:34 UTC
*** Bug 27132 has been marked as a duplicate of this bug. ***
Comment 34 SpanKY gentoo-dev 2003-08-22 12:14:41 UTC
*** Bug 17150 has been marked as a duplicate of this bug. ***
Comment 35 SpanKY gentoo-dev 2003-08-22 12:15:29 UTC
*** Bug 18697 has been marked as a duplicate of this bug. ***
Comment 36 SpanKY gentoo-dev 2003-08-22 12:15:57 UTC
*** Bug 19686 has been marked as a duplicate of this bug. ***
Comment 37 SpanKY gentoo-dev 2003-08-22 12:16:15 UTC
*** Bug 23371 has been marked as a duplicate of this bug. ***
Comment 38 SpanKY gentoo-dev 2003-08-22 12:16:34 UTC
*** Bug 24348 has been marked as a duplicate of this bug. ***
Comment 39 SpanKY gentoo-dev 2003-08-22 12:16:57 UTC
*** Bug 24461 has been marked as a duplicate of this bug. ***
Comment 40 SpanKY gentoo-dev 2003-08-22 12:19:51 UTC
*** Bug 27131 has been marked as a duplicate of this bug. ***
Comment 41 Martin Schlemmer (RETIRED) gentoo-dev 2003-08-22 13:33:21 UTC
Sorry, this got drowned with my mail/inet/too_many_bugs problems.

What about just building sed without locale support ?
Comment 43 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2003-08-22 17:25:42 UTC
Veiko: from your message:
> "ABCDEFGHIJKLMNOPQRSTUVWXYZ" is not Estonian alphabet. Estonian alphabet is
> Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss  Zz  Tt Uu Vv Ww Õõ
> Ää Öö Üü Xx Yy

Note that all of the letters A-Z I included _are_ in your alphabet. I suspect that either the locale data or the regex routines that expand [A-Za-z] are at fault.

If the order is as you note it, then expanding A-Z would show exactly that TUVWXY are excluded, because 'A-Z' expands to ABCDEFGHIJKLMNOPQRSZ.

I don't believe skipping locale support in sed is a suitable option (I use it myself sometimes). Either configure should be run with LC_ALL=C, or the locale data should be fixed.
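Robin's expansion of 'A-Z' can be checked mechanically. The sketch below treats a bracket range as "everything from the low endpoint through the high endpoint in collation order" and applies it to the Estonian uppercase order from Comment 32. `range_in_order` is a hypothetical helper, and the order string is an assumption limited to ASCII letters (Õ, Ä, Ö and Ü, which sit between W and X, are left out).

```shell
#!/bin/sh
# range_in_order ORDER LO HI (hypothetical helper): print the characters of
# ORDER from LO through HI, i.e. what [LO-HI] covers if ORDER is the
# collation sequence. Assumes LO and HI each occur once in ORDER.
range_in_order() {
    order=$1 lo=$2 hi=$3
    rest=${order#*"$lo"}        # drop everything up to and including LO
    rest=${rest%%"$hi"*}        # drop HI and everything after it
    printf '%s%s%s\n' "$lo" "$rest" "$hi"
}

# Estonian uppercase order per Comment 32 (ASCII only): Z sorts between S and T.
estonian_upper=ABCDEFGHIJKLMNOPQRSZTUVWXY

covered=$(range_in_order "$estonian_upper" A Z)
echo "$covered"                 # ABCDEFGHIJKLMNOPQRSZ
# Letters of the English alphabet that [A-Z] therefore misses:
printf '%s\n' ABCDEFGHIJKLMNOPQRSTUVWXYZ | tr -d "$covered"   # TUVWXY
```

With an interleaved order such as `AaBbCcDd`, the same helper gives `aBbCc` for the range a..c, the case discussed further down in this bug.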
Comment 44 SpanKY gentoo-dev 2003-08-24 14:16:27 UTC
*** Bug 27232 has been marked as a duplicate of this bug. ***
Comment 45 Veiko Kukk 2003-08-25 10:11:49 UTC
> If the order is as you note it

Yes, it is.

> then expanding A-Z would show exactly that TUVWXY
> are excluded, because 'A-Z' expands to ABCDEFGHIJKLMNOPQRSZ.

That was what I wanted to point out. This means that this test script doesn't prove that sed is buggy. The script itself is buggy, because it wasn't written with languages other than English in mind. I'm not a programmer, so I'm unable to show you the right solution, but I believe that it's possible to write sed scripts that work with all locales. I believe that neither sed nor the glibc locale file is buggy; the scripts that fail under the et_EE locale are.
Comment 46 SpanKY gentoo-dev 2003-09-05 13:37:46 UTC
*** Bug 26627 has been marked as a duplicate of this bug. ***
Comment 47 bartron 2003-09-06 18:00:14 UTC
  I'd rather say this is dialog's configure script's fault.  
`configure' scripts created by more recent versions of  
autoconf try very hard to get `LC_COLLATE' set to "C" if 
either `LC_COLLATE' or any of the higher-level variables  
are set to anything else just to avoid this kind of  
situation. 
 
  Even though this seems scary to me and doesn't really make
sense, character ranges depending on `LC_COLLATE' are actually
*expected* behavior. I didn't find it mentioned in the (GNU) sed and
regex docs, but the grep manpage [1] describes this exact  
behavior, as do the standards ([2],[3],[4],[5]): 
 
--- begin quote [3] --- 
 
    Range expressions must not be used in portable applications  
    because their behaviour is dependent on the collating sequence.  
    Ranges will be treated according to the current collating  
    sequence, and include such characters that fall within the  
    range based on that collating sequence, regardless of character  
    values. This, however, means that the interpretation will differ  
    depending on collating sequence. [...] 
 
--- end quote --- 
 
 
  So, if `LC_COLLATE' is set (either directly or implicitly  
via LC_ALL, etc), sed-4, when building character ranges,  
assumes characters in the same order as strcoll() would  
sort them under this locale. For example, while the "C" order
is "-0..9A..Z_a..z", "de_DE" and "en_US" turn that into
"_-0..9aAbB..zZ", so '[a-c]' really means '[aAbBc]'  
(non-ASCII chars omitted). 
 
  In Estonian, where sorting is "_-0..9AaBbCc...", the same
'[a-c]' is equivalent to '[aBbCc]'.
Also, as mentioned above, "et_EE" sorts 'z' between 's'  
and 't', which means '[A-Z]' does *not* include '[TUVWXY]'. 
 
  The cleanest fix I could think of would be to integrate 
the same `LC_*' logic that autoconf uses into portage. 
That would catch not only broken configure scripts  
(dialog, tiff, telnet-bsd, ...), but also builds that 
do not use autoconf. 
 
 
References: 
 
[1] grep-2.5 manpage, section [REGULAR EXPRESSIONS] 
 
[2] The Single UNIX[R] Specification, Version 2 
    Shell & Utilities -> Utilities -> Sed 
    <http://www.opengroup.org/onlinepubs/7908799/xcu/sed.html> 
    Section [ENVIRONMENT VARIABLES]
    [used to be for registered users only, but somehow slipped 
    into google] 
 
[3] The Single UNIX[R] Specification, Version 2 
    Base Definitions -> Regular Expressions 
    <http://www.opengroup.org/onlinepubs/7908799/xbd/re.html> 
    Section [RE Bracket Expressions] 
    [used to be for registered users only, but somehow made  
    its way into google as well] 
 
[4] IEEE 1003.1, 2003 Edition, <www.unix.org/> 
    Shell & Utilities -> Utilities -> Sed 
    Section [ENVIRONMENT VARIABLES] 
    [more recent version of [2], available online for free but  
    requires registration] 
 
[5] IEEE 1003.1, 2003 Edition, <www.unix.org/> 
    Base Definitions -> Regular Expressions 
    Section [RE Bracket Expression] 
    [more recent version of [3], available online for free but  
    requires registration] 
 
 
Comment 48 SpanKY gentoo-dev 2003-09-09 08:10:12 UTC
*** Bug 28248 has been marked as a duplicate of this bug. ***
Comment 49 SpanKY gentoo-dev 2003-10-25 05:00:52 UTC
*** Bug 30935 has been marked as a duplicate of this bug. ***
Comment 50 Veiko Kukk 2003-11-16 09:13:33 UTC
This bug, and the fact that nothing indicates the Gentoo developers' interest in
solving a problem that is in fact a bug, reminds me of: http://www.msxnet.org/humour/world-according-to-america.png
Comment 51 SpanKY gentoo-dev 2003-11-16 09:21:01 UTC
Well, I'm glad you live with the dragons, because it means we can ignore you
further.

At least until civilization comes to your country... remind us then, please.
Comment 52 bartron 2003-11-20 05:29:22 UTC
Created attachment 20986
(portage-2.0.49-r15_locale-fix.diff)

  suggested patch

  Verified to work correctly with portage-2.0.48-r5 and 2.0.49-r15 --
Comment 53 bartron 2003-11-20 05:30:21 UTC
  -- but only applies cleanly to 2.0.49-r15. For all other versions, 
just put this at the top of `/usr/lib/portage/bin/ebuild.sh'

---SNIP---
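# If LC_ALL is set, it overrides every other LC_* variable. Copy its value
# into the message, money and date categories (unless already set), then
# drop it so the C settings below can take effect.
if [ "${LC_ALL+set}" = set ]; then
    if [ -z "$LC_MESSAGES" ]; then LC_MESSAGES=$LC_ALL; fi
    if [ -z "$LC_MONETARY" ]; then LC_MONETARY=$LC_ALL; fi
    if [ -z "$LC_TIME" ]; then LC_TIME=$LC_ALL; fi
    unset LC_ALL
    export LC_MESSAGES LC_MONETARY LC_TIME
fi
# Force C rules for collation (regex ranges), character classification
# and number formatting during the build.
LC_COLLATE="C"
LC_CTYPE="C"
LC_NUMERIC="C"
export LC_COLLATE LC_CTYPE LC_NUMERIC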
---SNIP---
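To see what the snippet does to an environment like Veiko's without editing ebuild.sh, its statements can be wrapped in a function and run in a subshell. This is a sketch for inspection only; `apply_locale_fix` is a hypothetical wrapper name, and its body repeats the snippet's statements unchanged.

```shell
#!/bin/sh
# apply_locale_fix (hypothetical wrapper) repeats the statements of the
# ebuild.sh snippet so their effect can be inspected in isolation.
apply_locale_fix() {
    if [ "${LC_ALL+set}" = set ]; then
        if [ -z "$LC_MESSAGES" ]; then LC_MESSAGES=$LC_ALL; fi
        if [ -z "$LC_MONETARY" ]; then LC_MONETARY=$LC_ALL; fi
        if [ -z "$LC_TIME" ]; then LC_TIME=$LC_ALL; fi
        unset LC_ALL
        export LC_MESSAGES LC_MONETARY LC_TIME
    fi
    LC_COLLATE="C"
    LC_CTYPE="C"
    LC_NUMERIC="C"
    export LC_COLLATE LC_CTYPE LC_NUMERIC
}

# Simulate Veiko's settings (Comment 19) in a subshell and show the result.
(
    unset LC_MESSAGES LC_MONETARY LC_TIME
    LANG=et_EE LC_ALL=et_EE
    export LANG LC_ALL
    apply_locale_fix
    # prints: LC_ALL=<unset> LC_COLLATE=C LC_MESSAGES=et_EE
    echo "LC_ALL=${LC_ALL-<unset>} LC_COLLATE=$LC_COLLATE LC_MESSAGES=$LC_MESSAGES"
    # Ranges now follow C order, so configure's leftover check is empty.
    printf '%s\n' ncurses | sed 's/[-_a-zA-Z0-9]//g'
)
```

Messages and dates keep the user's Estonian setting, while collation, character classes and numbers fall back to C.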
Comment 54 Seemant Kulleen (RETIRED) gentoo-dev 2004-01-16 02:06:26 UTC
Mind if I take this one? I've been helping the Estonian devs fix these issues.
Comment 55 Seemant Kulleen (RETIRED) gentoo-dev 2004-01-16 03:44:11 UTC
Basically, as I stated in the bug about wxGTK:

in sed, egrep and tr expressions:

a-zA-Z0-9 -> [:alnum:]
a-zA-Z -> [:alpha:]
a-z -> [:lower:]
A-Z -> [:upper:]
0-9 -> [:digit:]

in the configure scripts.  I'll fix dialog, but I'd like to get a list of all packages which break so that patches can be made and sent upstream to implement proper regexes
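Applied to the configure check from this bug, the substitution could look like the sketch below: `-` and `_` stay as literal bracket members and `a-zA-Z0-9` becomes `[:alnum:]`. The second helper shows the other option raised in this bug, keeping the ranges but pinning the locale to C. Both helper names are hypothetical. Comment 60 reports that the character classes are problematic too; one plausible concern, not spelled out in this thread, is that in a non-C locale `[:alnum:]` can also match non-ASCII letters such as Õ or Ä, which are not valid in shell variable names.

```shell
#!/bin/sh
# name_leftover (hypothetical helper): print the characters of $1 that are
# not "-", "_" or alphanumeric. A POSIX class replaces the ranges, so the
# result does not depend on collation order.
name_leftover() {
    printf '%s\n' "$1" | sed 's/[-_[:alnum:]]//g'
}

# name_leftover_c (hypothetical helper): keep the original ranges but run
# sed in the C locale, where a-z and A-Z are plain ASCII ranges.
name_leftover_c() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[-_a-zA-Z0-9]//g'
}

name_leftover ncurses       # prints an empty line
name_leftover foo.bar       # prints "."
```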
Comment 56 Seemant Kulleen (RETIRED) gentoo-dev 2004-01-16 04:06:13 UTC
Veiko: please emerge sync and try 20031002 again (with the Estonian locale exported).
Comment 57 Marius Mauch (RETIRED) gentoo-dev 2004-01-16 12:38:18 UTC
*** Bug 38418 has been marked as a duplicate of this bug. ***
Comment 58 Mati K 2004-04-05 11:16:17 UTC
Reporting other builds that are broken as of 2004.0 and as of today:
vim-core
automake (installs to /--prefix/ root directory, nice)
usbutils (to same location as previous)
media-sound/oggtst (same)
mozilla-1.6-r1
Comment 59 Mr. Bones. (RETIRED) gentoo-dev 2004-05-24 16:27:26 UTC
Ok, the way I understand this bug is that people have coded up bad sed that
assumes that English is the language in use.  When the badly written sed
scripts are run under a different locale, bad things happen.  Is that pretty
accurate?  I'm leaning toward closing this bug and requesting that individual
bugs be filed for problematic packages.
Comment 60 Seemant Kulleen (RETIRED) gentoo-dev 2004-05-24 23:05:29 UTC
That may well be a better solution: fix all the individual sed scripts (though I'm not sure at the moment of the best way to do so, because [[:alpha:]] and [[:alnum:]] are problematic, so sayeth Thomas Dickey). The one sure way of doing this would be to enforce the relevant LC_something before portage enters the src_compile step.
Comment 61 Aron Griffis (RETIRED) gentoo-dev 2004-05-25 12:21:39 UTC
Comment 58 describes my favorite solution.  Personally I think that portage should always enforce the C locale so that we have predictable regexes, error output, etc.  Is there any (real) reason that portage shouldn't do this for all of its operations, instead of picking and choosing?
Comment 62 SpanKY gentoo-dev 2004-05-26 19:13:28 UTC
*** Bug 38418 has been marked as a duplicate of this bug. ***
Comment 63 Seemant Kulleen (RETIRED) gentoo-dev 2004-05-31 18:03:42 UTC

*** This bug has been marked as a duplicate of 9901 ***
Comment 64 SpanKY gentoo-dev 2004-12-22 06:20:40 UTC
*** Bug 74822 has been marked as a duplicate of this bug. ***