Mulkave Mulkave - 5 months ago 14
PHP Question

How can I grab the contents of the body of a function using Regex?

Given a dummy function as such:

public function handle()
{
    if (isset($input['data'])) {
        switch($data) {
            ...
        }
    } else {
        switch($data) {
            ...
        }
    }
}


My intention is to get the contents of that function; the problem is matching nested patterns of curly braces {...}.

I've come across recursive patterns but couldn't get my head around a regex that would match the function's contents.

I've tried the following (no recursion):

$pattern = "/function\shandle\([a-zA-Z0-9_\$\s,]+\)?". // match "function handle(...)"
'[\n\s]?[\t\s]*'. // regardless of the indentation preceding the {
'{([^{}]*)}/'; // find everything within braces.

preg_match($pattern, $contents, $match);


That pattern doesn't match at all. I am sure it is the last bit, '{([^{}]*)}/', that is wrong, since that pattern works when there are no other braces within the body.

By replacing it with:

'{([^}]*)}/';


It matched up to the closing } of the switch inside the if statement and stopped there (including the } of the switch but excluding that of the if).

This pattern gives the same result:

'{(\K[^}]*(?=)})/m';

Answer

#Update

Following others' comments, here is the updated pattern:

^\s*[\w\s]+\(.*\)\s*\K({((?>"[^"]*+"|'[^']*+'|//.*$|/\*[\s\S]*?\*/|#.*$|<<<\s*["']?(\w+)["']?[^;]+\3;$|[^{}<'"/#]++|[^{}]++|(?1))*)})

Note: A short RegEx, i.e. {((?>[^{}]++|(?R))*)}, is enough if you know your input does not contain { or } outside of PHP syntax (for example inside strings or comments).
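As a minimal sketch of how the short pattern could be applied to the function from the question: the snippet below prefixes it with a match for "function handle(...)" and therefore recurses with (?1) instead of (?R), so that only the brace group is repeated. The sample input, the case labels inside the switches and the variable names are assumptions made for illustration.

```php
<?php
// Hypothetical input: the handle() method from the question, with the
// elided switch bodies filled in by placeholder cases.
$contents = <<<'PHP'
public function handle()
{
    if (isset($input['data'])) {
        switch ($data) {
            case 1: break;
        }
    } else {
        switch ($data) {
            case 2: break;
        }
    }
}
PHP;

// "function handle(...)" followed by a balanced brace group.
// Group 1 is the whole {...} block and is the recursion target (?1);
// group 2 is the text between the outermost braces.
$pattern = '/function\s+handle\s*\([^)]*\)\s*(\{((?>[^{}]++|(?1))*)\})/';

// $body holds the function body, or null if nothing matched.
$body = preg_match($pattern, $contents, $match) ? $match[2] : null;

echo $body, PHP_EOL;
```

This only holds while every brace in the input is real PHP syntax, which is exactly the restriction stated in the note.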

So in which evil cases is the long RegEx needed?

  1. You have [{}] in a string between quotation marks ["']
  2. You have [{}] in a comment block. //... or /*...*/ or #...
  3. You have [{}] in a heredoc or nowdoc <<<STR or <<<['"]STR['"]

Apart from those cases, every opening brace is expected to have a matching closing brace.
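To make the first two cases concrete, here is a small sketch that runs both patterns over an input with a brace inside a string and another inside a comment. The input is hypothetical; the long pattern is copied unchanged from above, wrapped in "~" delimiters (an arbitrary choice, since the pattern itself contains "/" and "#") with the m flag so that ^ and $ work per line.

```php
<?php
// Hypothetical input: braces appear inside a string and inside a comment.
$contents = <<<'PHP'
public function handle()
{
    $open = "{"; // an unbalanced } in a comment
    if ($open) {
        return '}';
    }
}
PHP;

// The long pattern; group 2 captures the function body.
$long = <<<'REGEX'
~^\s*[\w\s]+\(.*\)\s*\K({((?>"[^"]*+"|'[^']*+'|//.*$|/\*[\s\S]*?\*/|#.*$|<<<\s*["']?(\w+)["']?[^;]+\3;$|[^{}<'"/#]++|[^{}]++|(?1))*)})~m
REGEX;

// The short pattern, for comparison; group 1 captures the body.
$short = '~{((?>[^{}]++|(?R))*)}~';

$longBody  = preg_match($long, $contents, $m) ? $m[2] : null;
$shortBody = preg_match($short, $contents, $m) ? $m[1] : null;

echo "long:", $longBody, PHP_EOL;
echo "short:", $shortBody, PHP_EOL;
```

On this input the short pattern pairs the brace inside the string with the brace inside the comment, so its match ends at the closing brace of the if rather than at the end of the function; the long pattern skips the string and the comment and returns the whole body.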

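Do we have a case where it fails?

Not unless a Martian lives inside your code. One gap worth knowing about: the string alternatives do not handle escaped quotes, so a literal such as "\"{" can still throw off the brace count.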

 ^ \s* [\w\s]+ \( .* \) \s* \K      # how it matches a function definition
 (                             # (1 start)
      {                             # opening brace
      (                             # (2 start)
           (?>                      # atomic grouping (for its non-capturing purpose only)
                " [^"]*+ "          # double quoted strings
             |  ' [^']*+ '          # single quoted strings
             |  // .* $             # a comment block starting with //
             |  /\* [\s\S]*? \*/    # a multi line comment block /*...*/
             |  \# .* $             # a single line comment block starting with #...
             |  <<< \s* ["']?       # heredocs and nowdocs
                ( \w+ )             # (3) ^
                ["']? [^;]+ \3 ; $  # ^
             |  [^{}<'"/#]++        # force engine to backtrack if it encounters special characters [<'"/#] (possessive)
             |  [^{}]++             # default matching behaviour (possessive)
             |  (?1)                # recurse 1st capturing group
           )*                       # zero to many times of atomic group
      )                             # (2 end)
      }                             # closing brace
 )                             # (1 end)

Formatting is done by @sln's RegexFormatter software.

What did I provide in the live demo?

Laravel's Eloquent Model.php file (~3500 lines) is given as input. Check it out: Live demo
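The demo input is not reproduced here. As a rough stand-in, the sketch below applies the same long pattern with preg_match_all to a small hypothetical class, which is roughly how one might collect every method body from a file. The class, its method names and the "~" delimiter are assumptions for illustration only.

```php
<?php
// Hypothetical stand-in for a larger source file such as Model.php.
$source = <<<'PHP'
class Demo
{
    public function first()
    {
        return "a";
    }

    protected function second($x)
    {
        if ($x) {
            return 1;
        }
        return 2;
    }
}
PHP;

// The long pattern from the answer, unchanged; m flag for per-line ^ and $.
$pattern = <<<'REGEX'
~^\s*[\w\s]+\(.*\)\s*\K({((?>"[^"]*+"|'[^']*+'|//.*$|/\*[\s\S]*?\*/|#.*$|<<<\s*["']?(\w+)["']?[^;]+\3;$|[^{}<'"/#]++|[^{}]++|(?1))*)})~m
REGEX;

// One entry per match; index 2 of each entry is the body between the braces.
preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

foreach ($matches as $i => $match) {
    echo "Body ", $i + 1, ":", $match[2], PHP_EOL;
}
```

Note that the definition part of the pattern does not look for the function keyword, so it also matches any line-leading construct of the form name(...) followed by a brace, such as an if block that is not already inside a matched body.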
