76192 – UTF-8 breaks grep

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	All Linux

Importance:	High normal
Assignee:	Gentoo's Team for Core System packages

URL:	http://bugs.debian.org/cgi-bin/bugrep...

Reported:	2004-12-30 18:13 UTC by Michael Mauch
Modified:	2005-01-05 21:37 UTC
CC List:	1 user

Description Michael Mauch 2004-12-30 18:13:49 UTC

Two bugs from Debian's BTS relating to grep and UTF-8, both also seen on Gentoo. The one mentioned in the URL above:

# echo utf breaks grep | LC_ALL=en_US.utf8 grep "[A-Z]"
utf breaks grep
#

And the one from
<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=249245&archive=yes>:

# echo Y | LC_ALL=en_US.utf8 egrep -i '[y]'
#

Reproducible: Always
Steps to Reproduce:
1. Set your locale to *.utf8.
2. Use grep to search for plain old 7 bit characters.


Actual Results:  
See Details.

Expected Results:  
The first expression should find nothing, because there are no uppercase
characters in the input.

The second expression should print the "Y", because it's an uppercase "y".

The expected results are printed with locales that are not *.utf8.
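
The two checks above can be scripted so that the same commands run under a non-UTF-8 locale and under a UTF-8 one. This is only a minimal sketch: the helper names range_case and icase_case are made up for illustration and are not part of the report, and grep -E is used in place of egrep on the assumption that the two behave the same here. It also assumes the en_US.utf8 locale is installed; if it is not, grep will most likely fall back to the C locale and both lines will look correct.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the bug report).

# Case 1: a lowercase-only line must NOT match [A-Z].
# $1 is the locale to run grep under.
range_case() {
    if echo "utf breaks grep" | LC_ALL="$1" grep "[A-Z]" >/dev/null 2>&1; then
        echo "match"
    else
        echo "no match"
    fi
}

# Case 2: with -i, [y] must match an uppercase Y.
# $1 is the locale to run grep under.
icase_case() {
    if echo "Y" | LC_ALL="$1" grep -E -i "[y]" >/dev/null 2>&1; then
        echo "match"
    else
        echo "no match"
    fi
}

# Run both cases under a non-UTF-8 locale and a UTF-8 one.
for loc in C en_US.utf8; do
    echo "$loc: [A-Z] on lowercase input: $(range_case "$loc") (expected: no match)"
    echo "$loc: -i [y] on Y: $(icase_case "$loc") (expected: match)"
done
```

On an affected grep, the en_US.utf8 lines would report the opposite of the expected value, while the C lines would report the expected one.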

grep-2.5.1a from the GNU mirrors does not fix these problems (I tested that).

<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=249245&archive=yes> apparently
was fixed by the Debian guys.

Comment 1 Michael Mauch 2004-12-31 01:10:27 UTC

Both of these problems are fixed in the CVS version:

# echo utf breaks grep | LC_ALL=en_US.utf8 ./grep "[A-Z]"
# echo Y | LC_ALL=en_US.utf8 ./egrep -i '[y]'
Y
#

Comment 2 Stian Skjelstad 2004-12-31 13:24:10 UTC

Is this related to the sed bugs I have seen referred to, where people are seeing odd results when using Estonian (if I remember right) locales and [a-z] ranges?

Comment 3 Ciaran McCreesh 2004-12-31 14:35:52 UTC

Comment #2 -- no, not really.

Comment 4 SpanKY gentoo-dev 2005-01-05 21:37:17 UTC

Added the Debian patch to grep-2.5.1-r7.