Looks like the unicode performance has regressed on grep again :-(.

# find /usr/portage >/dev/shm/portage.files
# wc -l /dev/shm/portage.files
177885 /dev/shm/portage.files
# time LANG=C grep -c Manifest /dev/shm/portage.files
13818

real 0m0.015s
user 0m0.007s
sys 0m0.010s
# time LANG=en_US.UTF-8 grep -c Manifest /dev/shm/portage.files
13818

real 0m8.778s
user 0m8.773s
sys 0m0.003s

that's because the speedup comes at the cost of correctness. every unicode speedup patch in the past introduced subtly incorrect behavior. i'm not going to include a patch (even if USE-optional) with these trade-offs. pretty sure this is already reported in upstream's savannah tracker (if it isn't, it's still a known issue), and there's nothing we're going to do about it ...

this is fixed with grep-2.7

$ grep --version
GNU grep 2.5.4
$ time LANG=C grep -c Manifest /dev/shm/portage.files
real 0m0.010s
$ time LANG=en_US.UTF8 grep -c Manifest /dev/shm/portage.files
real 0m7.241s
$ time LANG=en_US.UTF8 /var/tmp/portage/sys-apps/grep-2.7/image/bin/grep -c Manifest /dev/shm/portage.files
real 0m0.026s
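
For anyone who wants to check their own grep build without a portage tree, here is a rough sketch of the same comparison on synthetic data. The generated paths, the 20000 count, and the use of a temp file are assumptions made for illustration, not part of the original report. It also assumes the en_US.UTF-8 locale is installed (check with `locale -a`); if it is missing, grep falls back to the C locale and both runs will look equally fast.

```shell
#!/bin/sh
# Sketch only: synthetic data stands in for the real /usr/portage file list.
list=$(mktemp)

# Generate 40000 fake paths; exactly 20000 of them contain "Manifest".
awk 'BEGIN {
    for (i = 0; i < 20000; i++) {
        printf "/usr/portage/cat/pkg-%d/Manifest\n", i
        printf "/usr/portage/cat/pkg-%d/pkg-%d.ebuild\n", i, i
    }
}' > "$list"

# LC_ALL overrides LANG and every other LC_* variable, so the comparison
# is not skewed by whatever locale settings the calling shell has.
t0=$(date +%s)
count_c=$(LC_ALL=C grep -c Manifest "$list")
t1=$(date +%s)
count_utf8=$(LC_ALL=en_US.UTF-8 grep -c Manifest "$list")
t2=$(date +%s)

# Whole-second resolution is coarse, but enough to spot a multi-second gap.
echo "C locale:     $count_c matches in $((t1 - t0))s"
echo "UTF-8 locale: $count_utf8 matches in $((t2 - t1))s"

rm -f "$list"
```

Both runs should report the same match count; only the elapsed time should differ on an affected grep. To test a freshly built binary, replace `grep` with its full path.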