Bug 531946 - sys-apps/grep-2.21 has different behaviour on binary files with regular expression than previous versions
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
Reported: 2014-12-07 21:41 UTC by Cedric Godin
Modified: 2015-03-03 21:56 UTC

Attachments
Test file to grep (test-grep, 28 bytes, application/octet-stream)
2015-03-02 19:55 UTC, Cedric Godin
history file to illustrate bug (hist.log, 28.53 KB, text/plain)
2015-03-02 23:30 UTC, Andreas Proteus

Description Cedric Godin 2014-12-07 21:41:15 UTC
For some days now, grub-mkconfig has reported my Windows installation as Vista, whereas it previously detected it correctly as Windows 7.
Looking at os-prober, I see that it detects the OS type by grepping for the version string in the BCD file.
When I run the same grep manually with grep 2.21, it indeed finds nothing about Windows 7. If I emerge a previous version of grep (up to 2.20), the string is found.
If I pass -a to grep 2.21, it matches the string.
I'm wondering whether applications other than os-prober suffer from this behaviour.

Reproducible: Always

Steps to Reproduce:
endymion Boot # grep -V
grep (GNU grep) 2.16
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
endymion Boot # grep "W.i.n.d.o.w.s. .7" BCD
Binary file BCD matches



endymion Boot # grep -V
grep (GNU grep) 2.21
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
endymion Boot # grep "W.i.n.d.o.w.s. .7" BCD
endymion Boot # echo $?
1
endymion Boot # grep -qsa "W.i.n.d.o.w.s. .7" BCD
endymion Boot # echo $?
0
endymion Boot # 
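
The BCD file itself is not attached, so the following is only a minimal sketch of the check in the transcript above, run against a synthetic sample. It assumes the version string is stored with a NUL byte after each ASCII character, which is what the dotted pattern suggests. The helper names make_bcd_sample and detect_windows7 are hypothetical and do not come from os-prober.

```shell
# Hypothetical helper: write a small sample in which every ASCII character
# is followed by a NUL byte (\000), imitating the string inside a BCD file.
make_bcd_sample() {
    printf 'W\000i\000n\000d\000o\000w\000s\000 \0007\000\n' > "$1"
}

# Hypothetical helper: returns 0 when the dotted pattern is found.
# -a forces the data to be searched as text, -q and -s keep grep silent.
detect_windows7() {
    grep -aqs "W.i.n.d.o.w.s. .7" "$1"
}

sample=$(mktemp)
make_bcd_sample "$sample"
if detect_windows7 "$sample"; then
    echo "Windows 7 string found"
else
    echo "Windows 7 string not found"
fi
rm -f "$sample"
```

Each "." in the pattern has to match one NUL byte, which is why the result depends on how grep handles binary data.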



Portage 2.2.15 (python 2.7.8-final-0, default/linux/amd64/13.0/desktop, gcc-4.8.3, glibc-2.20, 3.17.4-gentoo x86_64)
=================================================================
System uname: Linux-3.17.4-gentoo-x86_64-Intel-R-_Core-TM-_i7-2600K_CPU_@_3.40GHz-with-gentoo-2.2
KiB Mem:     8138308 total,   3350176 free
KiB Swap:    4000180 total,   4000148 free
Timestamp of tree: Sun, 07 Dec 2014 17:45:01 +0000
sh bash 4.3_p30-r1
ld GNU ld (Gentoo 2.24 p1.4) 2.24
app-shells/bash:          4.3_p30-r1
dev-java/java-config:     2.2.0
dev-lang/perl:            5.20.1-r3
dev-lang/python:          2.7.8, 3.3.5-r1
dev-util/cmake:           3.0.2
dev-util/pkgconfig:       0.28-r2
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.13.6
sys-apps/sandbox:         2.6-r1
sys-devel/autoconf:       2.13, 2.69
sys-devel/automake:       1.11.6-r1, 1.14.1
sys-devel/binutils:       2.24-r3
sys-devel/gcc:            4.8.3
sys-devel/gcc-config:     1.8
sys-devel/libtool:        2.4.3-r2
sys-devel/make:           4.1-r1
sys-kernel/linux-headers: 3.17-r1 (virtual/os-headers)
sys-libs/glibc:           2.20
Repositories: gentoo steam-overlay gnustep
Installed sets: @system
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=corei7-avx -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=corei7-avx -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs collision-protect config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms splitdebug strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://ftp.snt.utwente.nl/pub/os/linux/gentoo http://gentoo.oregonstate.edu http://www.ibiblio.org/pub/Linux/distributions/gentoo"
LANG="fr_BE.UTF-8"
LC_ALL="fr_BE.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,-z,now"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/steam /var/lib/layman/gnustep"
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="64bit X a52 aac accessibility acl acpi akonadi alsa alsa-plugin amd64 apng archive asf asm audio audiofile avahi avcodec avformat avi bash-completion berkdb bitmap-fonts blender-game branding bzip2 cairo caps cdda cdinstall cdio cdparanoia cdr chm chroot clamav cle266 clucene crypt css cups curl dbus device-mapper dga diskio dts dv dvb dvbpsi dvd dvdnav dvdr dvdread e_modules_clock e_modules_comp e_modules_conf-applications e_modules_conf-dialogs e_modules_conf-display e_modules_conf-edgebindings e_modules_conf-interaction e_modules_conf-intl e_modules_conf-keybindings e_modules_conf-menus e_modules_conf-paths e_modules_conf-performance e_modules_conf-randr e_modules_conf-shelves e_modules_conf-theme e_modules_conf-wallpaper2 e_modules_conf-window-manipulation e_modules_conf-window-remembers e_modules_cpufreq e_modules_dropshadow e_modules_everything e_modules_fileman e_modules_fileman-opinfo e_modules_gadman e_modules_ibar e_modules_ibox e_modules_illume2 e_modules_mixer e_modules_msgbus e_modules_notification e_modules_pager e_modules_shot e_modules_start e_modules_syscon e_modules_systray e_modules_tasks e_modules_temperature e_modules_winlist e_modules_wizard eet egl eigen elf emerald encode equalizer erandom evas exceptions exif faac faad fam fame fbcon fbcondecor fdt ffmpeg fftw file-icons firefox flac font-server fontconfig foomaticdb fts3 fuse g3dvl gallium gcc-libffi gd geoip geoloc gif gimp git glew glib glibc-omitfp glitz gmp gnustep gphoto2 gpm graphviz gsf gsl gstreamer gtk gudev handbook hddtemp hpcups hwdb iceweasel iconv icq icu imap imlib innodb inotify introspection ipc iptables ipv6 irc jack jfs joystick jpeg jpeg2k json kcal kdcraw kde kde4 kdeenablefinal kdepim keymap kipi kmod konqueror kontact kpathsea kqemu largeterminal latex lcms libass libev libffi libkms libnotify libwww lightning live lm_sensors lzma lzo mad maildir marble mdnsresponder-compat memlimit mempool-chained menu-plugin mikmod mmx mmxext mng mod modperl mozdom mozilla 
moznocompose moznoirc moznomail moznopango mozsvg mp3 mp4 mpeg mpg123 mplayer mpm-worker msn mso mtp multilib musepack musicbrainz ncurses nepomuk net netifrc network new-hpcups nls no-old-linux no_wxgtk1 nowebdav nptl nptlonly nsplugin ntfs nvidia objc offensive ogg oggvorbis okular on-the-fly-crypt openal openctl openexr opengl openmp openssl opus osmesa pam pcf pci pcre pdf pdfkit pdflib perl pg_legacytimestamp phonon plasma png policykit postgres postproc ppds ps pulseaudio python qemu qmax qt qt3support qt4 quicktime raptor rdesktop readline real redland reports rtc sasl sblive scanner script sdl semantic-desktop session shared-glapi slp smbclient smp sndfile solver sound speex spell sql sqlite sse sse2 sse3 sse4 sse4_1 sse4_2 ssl ssse3 startup-notification subversion svg swscale system-sqlite systemd theora threads tidy tiff trash-plugin truetype truetype-fonts type1-fonts udev udisks unicode upower usb userlocales utempter v4l v4l2 vaapi vapigen vcd vdesktop vdpau vhosts videos virtuoso visualization vlc vlm vmmouse vnc vorbis webdav-neon webgl webkit webkit2 webm win32codecs wmf wv2 xa xanim xattr xcb xcomposite xfs xine xinerama xklavier xml xml2 xorg xpm xrandr xulrunner xv xvid xvmc yv12 zip zlib" ABI_X86="64 32" ALSA_CARDS="emu10k1 intel8x0" APACHE2_MODULES="actions alias auth_basic auth_digest authn_anon authn_dbd authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock dbd deflate dir disk_cache env expires ext_filter file_cache filter headers ident imagemap include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif so speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="braindump flow karbon kexi krita plan stage tables words sheets" CURL_SSL="gnutls" ELIBC="glibc" ENLIGHTENMENT_MODULES="access clock comp conf-applications conf-dialogs conf-display conf-edgebindings conf-interaction conf-intl conf-keybindings 
conf-menus conf-paths conf-performance conf-randr conf-shelves conf-theme conf-window-manipulation conf-window-remembers cpufreq dropshadow fileman fileman-opinfo gadman ibar ibox illume2 mixer msgbus notification pager quickaccess shot start syscon systray tasks temperature tiling winlist wizard xkbswitch" GRUB_PLATFORMS="pc multiboot" INPUT_DEVICES="evdev keyboard mouse" KERNEL="linux" LINGUAS="fr en en_US en_GB" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_3" QEMU_SOFTMMU_TARGETS="i386 sparc x86_64 arm ppc" QEMU_USER_TARGETS="i386 sparc sparc64 x86_64 arm ppc" RUBY_TARGETS="ruby20" SANE_BACKENDS="epson2" USERLAND="GNU" VIDEO_CARDS="nvidia" XFCE_PLUGINS="trash menu"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
Comment 1 Cedric Godin 2015-03-01 08:49:21 UTC
Hello, any advice on this? Should I redirect the problem to the grub maintainer so that he can adapt the way grep is called? I'm asking because I think I've seen a GLSA about grep :-) #537046
Comment 2 SpanKY gentoo-dev 2015-03-01 19:00:08 UTC
please attach the file you're testing with here
Comment 3 Andreas Proteus 2015-03-01 20:41:11 UTC
I get similar problems with >=sys-apps/grep-2.21 when I grep my history.

e.g. 
history | grep spell 
returns nothing, but with older versions of grep it returns all the entries that contain 'spell'.
Comment 4 SpanKY gentoo-dev 2015-03-02 00:45:35 UTC
(In reply to Andreas Proteus from comment #3)

again, we need exact inputs to reproduce here.  things that are specific to your system are obviously not easy to recreate on others.

simply write your history to a file:
  history > history.log

if the grep still fails:
  grep spell history.log

then attach that file here.
Comment 5 Cedric Godin 2015-03-02 19:55:01 UTC
Created attachment 397904 [details]
Test file to grep

Here is a test file that shows the behaviour change

cedric@endymion ~ $ grep -V
grep (GNU grep) 2.20
...
cedric@endymion ~ $ grep ".grep" test-grep
Binary file test-grep matches

cedric@endymion ~ $ grep -V
grep (GNU grep) 2.21
...
cedric@endymion ~ $ grep ".grep" test-grep
cedric@endymion ~ $
Comment 6 Andreas Proteus 2015-03-02 23:30:29 UTC
Created attachment 397910 [details]
history file to illustrate bug


$ grep spell hist.log 
Binary file (standard input) matches

$ grep -I spell hist.log
$ (returns nothing).

$ grep -a spell hist.log
$ returns all lines containing 'spell'

With versions older than 2.21
$ grep spell hist.log 
returns the lines containing 'spell' as expected.

I think this problem may occur when the file contains strings in different encodings, i.e. strings in an encoding other than that of the current locale.
Comment 7 SpanKY gentoo-dev 2015-03-03 01:39:47 UTC
what does `locale` say for both of you guys ?  and `emerge -pv grep` ?

i can't reproduce with Cedric's file directly, but running it through hexdump, it looks like binary data to me -- you've got an embedded NUL in there.

Andreas' file is arguably broken: you've got binary data (ISO-8859-1?) at line 690 and UTF-8 at lines 852-854.

my guess is that you're both using a UTF-8 locale which means the files are (rightly) detected as binary as neither are encodable as UTF-8.  i would say grep is working correctly, but this would be something to take to upstream to see what they think.
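
One way to test the encoding guess above is to ask iconv whether a file decodes cleanly as UTF-8. This is only a sketch: the helper name is_valid_utf8 is hypothetical, and the sample byte is invented for illustration rather than taken from either attachment.

```shell
# Hypothetical helper: succeeds only if the whole file decodes as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

sample=$(mktemp)
# \351 is the single byte 0xE9 ("e" with acute accent in ISO-8859-1).
# Followed by a newline, it is not a valid UTF-8 sequence.
printf 'caf\351\n' > "$sample"
if is_valid_utf8 "$sample"; then
    echo "valid UTF-8"
else
    echo "contains bytes that are not valid UTF-8"
fi
rm -f "$sample"
```

This check only covers encoding errors. A NUL byte is valid UTF-8, so a file with an embedded NUL passes here even though grep treats it as binary.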
Comment 8 Andreas Proteus 2015-03-03 04:02:44 UTC
Thank you for the reply.  
I also read the reply from bug-grep.
My history is not broken. My default locale is UTF-8 but I frequently deal with
media containing ISO named files and directories.  So commands including ISO
characters are saved in history.

So I presume the answer is either 
alias grep='grep -a' 
or stick to an older version of grep.

P.S. The first time I noticed the new behaviour of grep was 
when I was clearing old kernels by grepping the output of qlist.

qlist -ICv gentoo-sources | grep -v "\."[78]

I had kernels installed which were no longer in portage and grep returned
nothing.  This means that grep found "binary" data in the output of qlist 
and gave up.
Comment 9 Andreas Proteus 2015-03-03 04:15:19 UTC
What I meant to say in my P.S. above is that there may be many scripts broken as a result of this new behaviour of grep.
Comment 10 SpanKY gentoo-dev 2015-03-03 05:21:19 UTC
(In reply to Andreas Proteus from comment #8)

i don't see how your qlist example makes sense.  it would only have output characters in the ASCII printable set.  do you have an actual list that shows a problem there ?
Comment 11 Andreas Proteus 2015-03-03 07:14:09 UTC
(In reply to SpanKY from comment #10)
Unfortunately this weekend I cleaned up all my machines and I cannot reproduce this error to post more details.  I will keep an eye out for it and if it occurs again I will post a separate bug report.
Comment 12 Cedric Godin 2015-03-03 20:39:05 UTC
This change of behaviour is on binary files only I think (like the BCD file grub is using to detect the windows version). And if you treat the file as text, it will work.

> emerge -pv grep

These are the packages that would be merged, in order:

Calculating dependencies                   ... done!                  
[ebuild   R    ] sys-apps/grep-2.21-r1::gentoo  USE="nls pcre -static" 0 KiB

Total: 1 package (1 reinstall), Size of downloads: 0 KiB
> locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
Comment 13 SpanKY gentoo-dev 2015-03-03 21:56:49 UTC
(In reply to Cedric Godin from comment #12)

ok, your case is slightly different, but still intentional.  from the NEWS:
  When searching binary data, grep now may treat non-text bytes as
  line terminators.  This can boost performance significantly.

grep has always considered these files as binary.  the difference is that now NULs, rather than being matchable (e.g. by your "."), are used as terminators.  looking at your example file:
  This is a test file to\0grep\n

grep-2.20 would treat that as one line and allow ".grep" to match the "\0grep".  but grep-2.21 treats that as two lines like:
  This is a test file to\ngrep\n
so the "." doesn't get a chance to match the \0.  you can see this by using "^grep" on the file -- it'll match.

use -aq to get correct behavior with both grep 2.20 and 2.21.  if you want to plead your case, feel free to e-mail upstream, but i'm closing this as the change you describe is covered in the NEWS file.
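
The comparison described above can be tried with a sketch like the following. The sample is rebuilt from the contents quoted in this comment, not copied from the attachment, and the helper name report is hypothetical. The results of the two default invocations depend on the installed grep version, so no output is claimed for them here.

```shell
# Hypothetical helper: run a command and print whether it reported a match.
# $1 is a label, the remaining arguments are the command to run.
report() {
    label=$1
    shift
    if "$@"; then
        echo "$label: match"
    else
        echo "$label: no match"
    fi
}

sample=$(mktemp)
# One NUL byte (\000) sits between "to" and "grep", as quoted in the comment.
printf 'This is a test file to\000grep\n' > "$sample"

report "default, .grep" grep -q '.grep' "$sample"
report "default, ^grep" grep -q '^grep' "$sample"
report "with -a, .grep" grep -aq '.grep' "$sample"
rm -f "$sample"
```

The last invocation is the -aq form recommended above, where the NUL is searched as ordinary text and "." is allowed to match it.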