
# PHP PREG_JIT_STACKLIMIT_ERROR - inefficient regex

I am getting a PREG_JIT_STACKLIMIT_ERROR in the

preg_replace_callback()
function when working with longer strings. It stops working once the portion that matches the regex exceeds about 2000 characters (the matched portion, not the whole string).

I've already read that this is caused by an inefficient regex, but I can't make my regex any simpler. Here's my regex:

/\{@([a-z0-9_]+)-((%?[a-z0-9_]+(:[a-z0-9_]+)*)+)\|(((?R)|.)*)@\}/Us

It should match strings like these:

1)
{@if-statement|echo this|echo otherwise@}

2)
{@if-statement:sub|echo this|echo otherwise@}

3)
{@if-statement%statament2:sub|echo this@}

and also nested like this:

4)
{@if-statement|echo this|
{@if-statement2|echo this|echo otherwise@}
@}

I've tried to simplify it to:

/\{@([a-z0-9_]+)-([a-z0-9_]+)\|(((?R)|.)*)@\}/Us

But it looks like the error is caused by
(((?R)|.)*)

Code for testing:

$string = '{@if-is_not_logged_homepage| <header id="header_home"> <div class="in"> <div class="top"> <h1 class="logo"><a href="/"><img src="/img/logo-home.png" alt=""></a></h1> <div class="login_outer_wrapper"> <button id="login"><div class="a"><i class="stripe"><i></i></i>Log in</div></button> <div id="login_wrapper"> <form method="post" action="{^login^}" id="form_login_global"> <div class="form_field no_description"> <label>{!auth:login_email!}</label> <div class="input"><input type="text" name="form[login]"></div> </div> <div class="form_field no_description password"> <label>{!auth:password!}</label> <div class="input"><input type="password" name="form[password]"></div> </div> <div class="remember"> <input type="checkbox" name="remember" id="remember_me_check" checked> <label for="remember_me_check"><i class="fa fa-check" aria-hidden="true"></i>Remember</label> </div> <div class="submit_box"> <button class="btn btn_check">Log in</button> </div> </form> </div> </div> </div> <div class="content clr"> <div class="main_menu"> <a href=""> <i class="ico a"><i class="fa fa-lightbulb-o" aria-hidden="true"></i></i> <span>Idea</span> <div>&nbsp;</div> </a> <a href=""> <i class="ico b"><i class="fa fa-user" aria-hidden="true"></i></i> <span>FFa</span> </a> <a href=""> <i class="ico c"><i class="fa fa-briefcase" aria-hidden="true"></i></i> <span>Buss</span> </a> </div> <div class="text_wrapper"> <div> <div class="register_wrapper"> <a id="main_register" class="btn register">Załóż konto</a> <form method="post" action="{^login^}" id="form_register_home"> <div class="form_field no_description"> <label>{!auth:email!}</label> <div class="input"><input type="text" name="form2[email]"></div> </div> <div class="form_field no_description password"> <label>{!auth:password!}</label> <div class="input tooltip"><input type="password" name="form2[password]"><i class="fa fa-info-circle tooltip_open" aria-hidden="true" title="{!auth:password_format!}"></i></div> </div> <div 
class="form_field terms no_description"> <div class="input"> <input type="checkbox" name="form2[terms]" id="terms_check"> <label for="terms_check"><i class="fa fa-check" aria-hidden="true"></i>Agree</label> </div> </div> <div class="form_field no_description"> <div class="input captcha_wrapper"> <div class="g-recaptcha" data-sitekey="{%captcha_public_key%}"></div> </div> </div> <div class="submit_box"> <button class="btn btn_check">{!auth:register_btn!}</button> </div> </form> </div> </div> </div> </div> </div> </header> @}';
$if_counter = 0;

$parsed_view = preg_replace_callback(
    '/\{@([a-z0-9_]+)-((%?[a-z0-9_]+(:[a-z0-9_]+)*)+)\|(((?R)|.)*)@\}/Us',
    function ($match) use (&$if_counter) {
        return '<-{' . ($if_counter++) . '}->';
    },
    $string
);
var_dump($parsed_view); // NULL

What is PCRE JIT?

Just-in-time compiling is a heavyweight optimization that can greatly speed up pattern matching. However, it comes at the cost of extra processing before the match is performed. Therefore, it is of most benefit when the same pattern is going to be matched many times.

And how does it work, basically?

PCRE (and JIT) is a recursive, depth-first engine, so it needs a stack where the local data of the current node is pushed before checking its child nodes... When the compiled JIT code runs, it needs a block of memory to use as a stack. By default, it uses 32K on the machine stack. However, some large or complicated patterns need more than this. The error PCRE_ERROR_JIT_STACKLIMIT is given when there is not enough stack.

As the first quote shows, JIT is an optional feature, and it is on by default in PHP 7.x's PCRE. You can turn it off with pcre.jit = 0, although that is not recommended.

However, when the preg_* functions report error code 6 (PREG_JIT_STACKLIMIT_ERROR), it most likely means that JIT has hit its stack size limit.
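To see which error a preg_* call actually hit, a minimal sketch along these lines may help. The helper name replace_or_explain and its error-name table are illustrative assumptions, not part of the original code; preg_last_error() and the PREG_* constants are standard PHP.

```php
<?php
// Hypothetical helper: wraps preg_replace_callback() and turns a silent NULL
// result into an exception that names the PCRE error.
function replace_or_explain(string $pattern, callable $callback, string $subject): string
{
    $result = preg_replace_callback($pattern, $callback, $subject);
    if ($result !== null) {
        return $result;
    }

    // Map the integer from preg_last_error() back to its constant name.
    $names = [
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
        PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR', // code 6
    ];
    $code = preg_last_error();
    $name = isset($names[$code]) ? $names[$code] : 'unknown PCRE error';

    throw new RuntimeException("$name (code $code)");
}

// To compare behaviour without JIT for a single request, one option is:
// ini_set('pcre.jit', '0');
```

Calling replace_or_explain() with the pattern, callback and $string from the question should then raise an exception naming the error instead of returning NULL.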

Capturing groups consume more memory than non-capturing groups, and the type of quantifier applied to a group can raise the cost further:

1. Capturing group OP_CBRA (pcre_jit_compile.c:#1138) - (real memory is more than this):
case OP_CBRA:
case OP_SCBRA:
bracketlen = 1 + LINK_SIZE + IMM2_SIZE;
break;
2. Non-capturing group OP_BRA (pcre_jit_compile.c:#1134) - (real memory is more than this):
case OP_BRA:
break;

Therefore, changing the capturing groups in your regex to non-capturing groups makes it produce the proper output (I don't know exactly how much memory that saves).
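As a sketch of that change, the pattern below keeps only three capturing groups (name, condition, body) and turns the inner helper groups into (?:...). The function name parse_view and the short sample input are assumptions for illustration. Whether this alone is enough for a very long input depends on how much JIT stack the match still needs.

```php
<?php
// The question's pattern with the inner groups made non-capturing.
// Group 1: statement name, group 2: condition, group 3: body.
$pattern = '/\{@([a-z0-9_]+)-((?:%?[a-z0-9_]+(?::[a-z0-9_]+)*)+)\|((?:(?R)|.)*)@\}/Us';

// Hypothetical wrapper around the callback used in the question.
function parse_view(string $pattern, string $view)
{
    $if_counter = 0;
    return preg_replace_callback(
        $pattern,
        function ($match) use (&$if_counter) {
            return '<-{' . ($if_counter++) . '}->';
        },
        $view
    );
}

// A nested block is consumed by the outer match, so only one replacement happens.
var_dump(parse_view($pattern, 'x {@if-a|outer {@if-b|inner@} tail@} y'));
```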

But it seems you do need the capturing groups. In that case you should rewrite your regex for performance. Backtracking is the main thing to watch in almost any regex.

## Update #1

Solution:

(?(DEFINE)
(?<recurs>
(?! {@|@} ) [^|] [^{@|\\]* ( \\.[^{@|\\]* )* | (?R)
)
)
{@
(?<If> \w+)-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^{@|]*+ (?&recurs)* )
(?<False> [|] (?&recurs)* )?
\s*@}


PHP code (watch backslash escaping):

preg_match_all('/(?(DEFINE)
(?<recurs>
(?! {@|@} ) [^|] [^{@|\\\\]* ( \\\\.[^{@|\\\\]* )* | (?R)
)
)
{@
(?<If> \w+ )-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^{@|]*+ (?&recurs)* )
(?<False> [|] (?&recurs)* )?
\s*@}/x', $string,$matches);

This is your own regex, optimized to take as few backtracking steps as possible. Whatever your original was supposed to match is matched by this one too.
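A minimal sketch of reading the named groups follows. The pattern is stored in a nowdoc so the backslashes need no extra PHP escaping. The function name parse_blocks and the shape of the returned array are assumptions for illustration. Note that the True and False groups include their leading "|", which the sketch strips.

```php
<?php
// The answer's pattern, unchanged, in a nowdoc (no PHP backslash escaping needed).
$pattern = <<<'REGEX'
/(?(DEFINE)
  (?<recurs>
    (?! {@|@} ) [^|] [^{@|\\]* ( \\.[^{@|\\]* )* | (?R)
  )
)
{@
(?<If> \w+ )-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^{@|]*+ (?&recurs)* )
(?<False> [|] (?&recurs)* )?
\s*@}/x
REGEX;

// Hypothetical helper: returns one array per top-level block found in $view.
function parse_blocks(string $pattern, string $view): array
{
    preg_match_all($pattern, $view, $matches, PREG_SET_ORDER);

    $blocks = [];
    foreach ($matches as $m) {
        // The False group is optional, so it may be missing or empty.
        $hasFalse = isset($m['False']) && $m['False'] !== '';
        $blocks[] = [
            'if'        => $m['If'],
            'condition' => $m['Condition'],
            'true'      => substr($m['True'], 1),                     // drop leading "|"
            'false'     => $hasFalse ? substr($m['False'], 1) : null, // drop leading "|"
        ];
    }
    return $blocks;
}

var_dump(parse_blocks($pattern, '{@if-statement|echo this|echo otherwise@}'));
```

The same $pattern can be passed to preg_replace_callback() in place of the one from the question.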

Most of the quantifiers are written possessively (which avoids backtracking) by appending + to them.
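The difference between a greedy and a possessive quantifier can be seen in a small, self-contained sketch. The sample word is an arbitrary assumption chosen so that the two forms behave differently.

```php
<?php
// Greedy: \w+ first takes "banana", then gives back one character so the
// final "a" in the pattern can match.
$greedy = preg_match('/^\w+a$/', 'banana');

// Possessive: \w++ takes "banana" and never gives anything back, so the
// final "a" in the pattern has nothing left to match.
$possessive = preg_match('/^\w++a$/', 'banana');

var_dump($greedy, $possessive); // int(1), int(0)
```

In the pattern above, \w++ is always followed by a character that \w cannot match (such as ":" or "|"), so making it possessive does not change what is matched; it only removes backtracking attempts that could never succeed.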