Box A is a desktop Linux with PHP 5.6.1 and libpcre 8.33.
Box B is a Hardened Linux with PHP 5.5.18 and libpcre 8.35.
This small test script segfaults on both:
$str = '/*'.str_repeat(' ',17000);
If I replace preg_replace with mb_ereg_replace (and remove the @ signs), the script does not segfault.
Steps to Reproduce:
Run the script above.
I don't think a ~17k-character string should be able to crash preg_replace().
I will provide a stack trace later.
Judging by the stack trace, libpcre recurses too deeply:
#0 0x00007ffff4f01c56 in ?? () from /lib64/libpcre.so.1
#1 0x00007ffff4f0f2c2 in ?? () from /lib64/libpcre.so.1
#2 0x00007ffff4f0f993 in ?? () from /lib64/libpcre.so.1
...
#10893 0x00007ffff4f0f2c2 in ?? () from /lib64/libpcre.so.1
#10894 0x00007ffff4f0f993 in ?? () from /lib64/libpcre.so.1
#10895 0x00007ffff4f03d5e in ?? () from /lib64/libpcre.so.1
#10896 0x00007ffff4f135b7 in pcre_exec () from /lib64/libpcre.so.1
#10897 0x00000000004cc30d in php_pcre_replace_impl ()
#10898 0x00000000004cd233 in ?? ()
#10899 0x00000000004cd7f9 in ?? ()
#10900 0x00000000007f6a22 in ?? ()
#10901 0x00000000007bd098 in execute_ex ()
#10902 0x0000000000755c09 in zend_execute_scripts ()
#10903 0x00000000006f32ff in php_execute_script ()
#10904 0x00000000007f9c59 in ?? ()
#10905 0x000000000048376f in main ()
Is this normal? Shouldn't a library like this one prefer iteration over recursion?
Oh, okay, to sum it up: recursing on the stack is libpcre's default behaviour because it's faster. Cool. So we have a fast implementation that crashes, and a slower one that wouldn't crash but isn't supported by the build system.
There seems to be a configure option for this:
It would be nice to be able to pass that option when building libpcre. But it's not essential, since mb_ereg_replace seems to work the safe way, so developers can choose between a fast solution and a safe one.
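A minimal PHP sketch of that fast-or-safe choice, assuming PHP 5.x as on the boxes above. It lowers the real `pcre.recursion_limit` ini setting so that deep backtracking makes preg_replace() return NULL (reported by preg_last_error() as PREG_RECURSION_LIMIT_ERROR) instead of overflowing the stack, and only then falls back to mb_ereg_replace(). The helper name `safe_replace()` and the limit of 5000 are hypothetical; the limit only prevents the segfault if it is reached before the C stack runs out, which depends on the stack size and the pattern. On later PHP versions with PCRE JIT enabled, a JIT stack limit (PREG_JIT_STACKLIMIT_ERROR) applies instead, so the fallback branch may not trigger there.

```php
<?php
// Hypothetical helper (not part of PHP): try the fast PCRE engine under a
// lowered recursion limit, and fall back to the mbstring (Oniguruma) engine
// only when PCRE gives up because that limit was hit.
function safe_replace($pcrePattern, $mbPattern, $replacement, $subject)
{
    // ASSUMPTION: 5000 is an illustrative value, not a measured safe bound.
    $old = ini_set('pcre.recursion_limit', '5000');
    $result = preg_replace($pcrePattern, $replacement, $subject);
    if ($old !== false) {
        ini_set('pcre.recursion_limit', $old); // restore the previous limit
    }

    // preg_replace() returns NULL on failure; preg_last_error() tells us why.
    if ($result === null && preg_last_error() === PREG_RECURSION_LIMIT_ERROR) {
        // mb_ereg_replace() takes the pattern without PCRE delimiters.
        $result = mb_ereg_replace($mbPattern, $replacement, $subject);
    }
    return $result;
}
```

The caller has to supply the pattern twice because the two engines use different syntax; in exchange, the common case keeps PCRE's speed and only pathological inputs pay for the slower engine.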
Upstream bug, for the curious:
I tried emerging libpcre with USE=jit (per bug #514454), but it didn't seem to help.
Well, to me this looks like a libpcre "bug" rather than a PHP bug. A disable-stack-for-recursion USE flag would be nice to have (perhaps enabled by default in the hardened profile). preg_match/preg_replace are often used to filter *user* input, and I don't think *any* user input should be able to affect the stack in any significant way (at least not linearly; logarithmic growth should be fine).
I compiled libpcre with --disable-stack-for-recursion. With the regular expression above, the engine is now 2.8 times slower. This is sad, because mb_ereg_replace handles the same input 4 times faster than the stack-based libpcre, and doesn't crash even with bigger inputs.
Created attachment 389224
Adds disable-stack-for-recursion USE flag
Adds the option to build libpcre without relying on the stack for recursion. This slows PCRE down severely.
Here's a simple test case that should print "OK":
$output = preg_replace("/<span>(((?!(<\/span>)).)*)<\/span>/",
                       "BEGIN \\1 END", $input);
(Run it with "php" on the command line.)
Compiling libpcre with --disable-stack-for-recursion fixes the segfault. Would something like that be appropriate behind USE=hardened? It does slow things down, but a lot of the time USE=hardened means "make this safe and slow."
*** Bug 481216 has been marked as a duplicate of this bug. ***