Summary: | sys-apps/grep-2.21 has different behaviour on binary files with regular expression than previous versions | | |
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Cedric Godin <cedric.godin> |
Component: | [OLD] Core system | Assignee: | Gentoo's Team for Core System packages <base-system> |
Status: | RESOLVED UPSTREAM | | |
Severity: | normal | CC: | proteuss |
Priority: | Normal | | |
Version: | unspecified | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Package list: | Runtime testing required: | --- | |
Attachments: | Test file to grep; history file to illustrate bug | | |
Description
Cedric Godin
2014-12-07 21:41:15 UTC
Hello, any advice on this? Should I redirect the problem to the grub maintainer so that he can adapt the way grub uses grep? I'm asking because I think I've seen a GLSA about grep :-) #537046

please attach the file you're testing with here

I get similar problems with >=sys-apps/grep-2.21 when I grep my history, e.g.
$ history | grep spell
returns nothing, but with older versions of grep it returns all the entries that contain 'spell'.

(In reply to Andreas Proteus from comment #3)
again, we need exact inputs to reproduce here. things that are specific to your system are obviously not easy to recreate on others. simply write your history to a file:
$ history > history.log
if the grep still fails:
$ grep spell history.log
then attach that file here.

Created attachment 397904
Test file to grep
Here is a test file that shows the behaviour change
cedric@endymion ~ $ grep -V
grep (GNU grep) 2.20
...
cedric@endymion ~ $ grep ".grep" test-grep
Binary file test-grep matches
cedric@endymion ~ $ grep -V
grep (GNU grep) 2.21
...
cedric@endymion ~ $ grep ".grep" test-grep
cedric@endymion ~ $
Created attachment 397910
history file to illustrate bug
$ grep spell hist.log
Binary file (standard input) matches
$ grep -I spell hist.log
(returns nothing)
$ grep -a spell hist.log
(returns all lines containing 'spell')
With versions older than 2.21:
$ grep spell hist.log
returns the lines containing 'spell' as expected.
I think that this problem may be caused when the file contains strings in different encodings, i.e. strings from a locale other than the current one.
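The mixed-encoding hypothesis can be checked directly on a file. The following is a minimal Python sketch, not anything grep itself does: the helper name `non_utf8_lines` and the sample bytes are illustrative assumptions and are not taken from the attached history file. It lists the lines of a byte string that do not decode as UTF-8.

```python
from typing import List


def non_utf8_lines(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Hypothetical sample: line 2 is ISO-8859-1 encoded, line 3 is UTF-8 encoded.
sample = (
    b"aspell check\n"
    + "caf\u00e9".encode("latin-1") + b"\n"
    + "caf\u00e9".encode("utf-8") + b"\n"
)

print(non_utf8_lines(sample))
```

To inspect a real file, the same helper could be fed the file's raw bytes, for example the result of `open("hist.log", "rb").read()`.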
what does `locale` say for both of you guys ? and `emerge -pv grep` ?
i can't reproduce with Cedric's file directly, but running it through hexdump, it looks like binary data to me -- you've got an embedded NUL in there.
Andreas' file is arguably broken: you've got binary data (ISO-8859-1?) at line 690 and UTF-8 at lines 852-854.
my guess is that you're both using a UTF-8 locale which means the files are (rightly) detected as binary as neither is encodable as UTF-8. i would say grep is working correctly, but this would be something to take to upstream to see what they think.

Thank you for the reply. I also read the reply from bug-grep.
My history is not broken. My default locale is UTF-8 but I frequently deal with media containing ISO-named files and directories, so commands including ISO characters are saved in history.
So I presume the answer is either alias grep='grep -a' or to stick to an older version of grep.
P.S. The first instance I noticed of the new behaviour of grep was when I was clearing old kernels by grepping the output of qlist:
$ qlist -ICv gentoo-sources | grep -v "\."[78]
I had kernels installed which were no longer in portage, and grep returned nothing. This means that grep found "binary" data in the output of qlist and gave up.

What I meant to say in my P.S. above is that there may be many scripts broken as a result of this new behaviour of grep.

(In reply to Andreas Proteus from comment #8)
i don't see how your qlist example makes sense. it would only have output characters in the ASCII printable set. do you have an actual list that shows a problem there ?

(In reply to SpanKY from comment #10)
Unfortunately this weekend I cleaned up all my machines and I cannot reproduce this error to post more details. I will keep an eye out for it and if it occurs again I will post a separate bug report.

This change of behaviour affects binary files only, I think (like the BCD file grub is using to detect the Windows version). And if you treat the file as text, it will work.
> emerge -pv grep
These are the packages that would be merged, in order:
Calculating dependencies ... done!
[ebuild R ] sys-apps/grep-2.21-r1::gentoo USE="nls pcre -static" 0 KiB
Total: 1 package (1 reinstall), Size of downloads: 0 KiB
> locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

(In reply to Cedric Godin from comment #12)
ok, your case is slightly different, but still intentional. from the NEWS:
  When searching binary data, grep now may treat non-text bytes as line terminators. This can boost performance significantly.
grep has always considered these files as binary. the difference is that now NULs, rather than being matchable (e.g. by your "."), are used as terminators. looking at your example file:
  This is a test file to\0grep\n
grep-2.20 would treat that as one line and allow ".grep" to match the "\0grep". but grep-2.21 treats that as two lines like:
  This is a test file to\ngrep\n
so the "." doesn't get a chance to match the \0. you can see this by using "^grep" on the file -- it'll match.
use -aq to get correct behavior with both grep 2.20 and 2.21.
if you want to plead your case, feel free to e-mail upstream, but i'm closing this as the change you describe is covered in the NEWS file.
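The one-line versus two-line reading described in the closing comment can be modelled in a few lines of Python. This is only a sketch of the behaviour as explained in this thread, not grep's implementation: the function names are made up for illustration, Python's `re` module stands in for grep's matcher, and only NUL is treated as an extra terminator even though the NEWS entry speaks of non-text bytes in general.

```python
import re

# Contents of the test file as quoted in the thread.
DATA = b"This is a test file to\0grep\n"


def matches_nul_as_data(data, pattern):
    """Model of the 2.20 reading: only newline ends a line, so NUL is an
    ordinary byte that "." may match."""
    return [line for line in data.split(b"\n") if pattern.search(line)]


def matches_nul_as_terminator(data, pattern):
    """Model of the 2.21 reading on binary input: NUL also ends a line,
    so it can never be matched by ".". """
    return [line for line in re.split(rb"[\n\0]", data) if pattern.search(line)]


dot_grep = re.compile(rb".grep")
caret_grep = re.compile(rb"^grep")

print(matches_nul_as_data(DATA, dot_grep))          # one matching line
print(matches_nul_as_terminator(DATA, dot_grep))    # no matching line
print(matches_nul_as_terminator(DATA, caret_grep))  # "grep" starts its own line
```

In this model `.grep` only matches while the NUL is part of the line, and `^grep` only matches once the NUL splits the line, which mirrors the two observations reported for the test file.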