Bug 76192 - UTF-8 breaks grep
Summary: UTF-8 breaks grep
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High normal
Assignee: Gentoo's Team for Core System packages
URL: http://bugs.debian.org/cgi-bin/bugrep...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-12-30 18:13 UTC by Michael Mauch
Modified: 2005-01-05 21:37 UTC

See Also:
Package list:
Runtime testing required: ---


Description Michael Mauch 2004-12-30 18:13:49 UTC
Two bugs from Debian's BTS relating to grep and UTF-8, both also seen on Gentoo. The one mentioned in the URL above:

# echo utf breaks grep | LC_ALL=en_US.utf8 grep "[A-Z]"
utf breaks grep
#

And the one from
<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=249245&archive=yes>:

# echo Y | LC_ALL=en_US.utf8 egrep -i '[y]'
#



Reproducible: Always
Steps to Reproduce:
1. Set your locale to *.utf8.
2. Use grep to search for plain old 7-bit (ASCII) characters.


Actual Results:  
See Details.

Expected Results:  
The first expression should find nothing, because there are no uppercase
characters in the input.

The second expression should print the "Y", because it's an uppercase "y".

The expected results are printed with locales that are not *.utf8.
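To make the expected results unambiguous, here is a small reference oracle. It is only a sketch: it uses Python's `re` module as a stand-in for grep (an assumption, not grep's own matcher), and the helper name `matching_lines` is hypothetical. `re` treats `[A-Z]` as the code-point range A..Z regardless of locale, so it prints what the two grep commands are expected to print.

```python
import re

# Hypothetical reference oracle: Python's re module stands in for grep.
# "[A-Z]" here is always the code-point range U+0041..U+005A, independent of locale.

def matching_lines(pattern, text, ignore_case=False):
    """Return the lines of text that contain a match for pattern (like grep)."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [line for line in text.splitlines() if regex.search(line)]

# Case 1: the input has no uppercase letters, so no line should be printed.
case1 = matching_lines(r"[A-Z]", "utf breaks grep")

# Case 2: with case folding (like egrep -i), "[y]" should match the uppercase "Y".
case2 = matching_lines(r"[y]", "Y", ignore_case=True)

print(case1)  # []
print(case2)  # ['Y']
```

An empty list corresponds to grep printing nothing; `['Y']` corresponds to grep printing the single line "Y".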

grep-2.5.1a from the GNU mirrors does not fix these problems (I tested that).

<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=249245&archive=yes> apparently
was fixed by the Debian guys.
Comment 1 Michael Mauch 2004-12-31 01:10:27 UTC
Both of these problems are fixed in the CVS version:

# echo utf breaks grep | LC_ALL=en_US.utf8 ./grep "[A-Z]"
# echo Y | LC_ALL=en_US.utf8 ./egrep -i '[y]'
Y
#
Comment 2 Stian Skjelstad 2004-12-31 13:24:10 UTC
Is this related to the sed bugs I have seen referred to, where people get odd results when using Estonian (if I remember right) locales and [a-z] ranges?
Comment 3 Ciaran McCreesh 2004-12-31 14:35:52 UTC
Comment #2 -- no, not really.
Comment 4 SpanKY gentoo-dev 2005-01-05 21:37:17 UTC
Added the Debian patch to 2.5.1-r7.