Two bugs from Debian's BTS relating to grep and UTF-8, both also seen on Gentoo.

The one mentioned in the URL above:

# echo utf breaks grep | LC_ALL=en_US.utf8 grep "[A-Z]"
utf breaks grep
#

And the one from <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=249245&archive=yes>:

# echo Y | LC_ALL=en_US.utf8 egrep -i '[y]'
#

Reproducible: Always

Steps to Reproduce:
1. Set your locale to *.utf8.
2. Use grep to search for plain old 7-bit characters.

Actual Results:
See Details.

Expected Results:
The first expression should find nothing, because there are no uppercase characters in the input. The second expression should print the "Y", because it's an uppercase "y".

The expected results are printed with locales that are not *.utf8. grep-2.5.1a from the GNU mirrors does not fix these problems (I tested that). <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=249245&archive=yes> apparently was fixed by the Debian guys.
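For anyone who wants to re-check both cases in one go, here is a rough sketch of the steps above as a script. The function name check_grep_utf8 and the optional locale argument are made up for illustration, and it uses grep -E in place of egrep (the two are equivalent). It assumes the en_US.utf8 locale is actually generated on the system; if it is not, grep quietly falls back to the C locale and the check tells you nothing about this bug.

```shell
#!/bin/sh
# Hypothetical helper, not part of grep or any package.
# $1: locale to test under (assumed default: en_US.utf8, as in the report).
check_grep_utf8() {
    loc="${1:-en_US.utf8}"

    # Case 1: all-lowercase input should NOT match [A-Z].
    if echo "utf breaks grep" | LC_ALL="$loc" grep "[A-Z]" >/dev/null; then
        echo "case 1: BUG (lowercase input matched [A-Z])"
    else
        echo "case 1: ok"
    fi

    # Case 2: an uppercase Y should match [y] when -i is given.
    if echo Y | LC_ALL="$loc" grep -E -i '[y]' >/dev/null; then
        echo "case 2: ok"
    else
        echo "case 2: BUG (-i did not match Y against [y])"
    fi
}

check_grep_utf8 "$@"
```

Running it with C as the argument gives a baseline to compare the UTF-8 result against.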
Both of these problems are fixed in the CVS version:

# echo utf breaks grep | LC_ALL=en_US.utf8 ./grep "[A-Z]"
# echo Y | LC_ALL=en_US.utf8 ./egrep -i '[y]'
Y
#
Is this related to the sed bugs I have seen referred to, where people are seeing odd results when using Estonian (if I remember right) locales and [a-z] ranges?
Comment #2 -- no, not really.
Added the Debian patch to grep-2.5.1-r7.