Master DJon - 5 months ago
PHP Question

Regex matching nested beginning and ending tags

Here are strings from which I'd like to extract the content between the tags, meaning the first and last one (inner ones will be rechecked by the engine):

  • "before {{if^^p1^p2}} IN1; {{if^ ^p1}} {{iif}} IN3 {{/if}} IN1-1 {{/if}} after"

  • "before {{if^ ^p1}} IN1; {{if^ ^p1}} {{if^ ^p1}} IN3 {{/if}} {{/if}} IN1-1 {{/if}} after"

  • "before {{if^ ^p1}} IN1; {{if^ ^p1}} {{if^ ^p1}} IN3 {{/if}} {{/if}} IN1-1 {{if^ ^p1}} IN4 {{/if}} {{/if}} after"

The regex is :

EDIT 3: I removed the requirement to support tags without a closing one. I reformatted the question for future users; to understand some of the comments below, see the first version of the post.

Moreover, I need it to work for all three at the same time, giving me three matches, which does not work on the regex101 website. Line breaks have to be supported within the match. That said, I could accept that only the last two combined give two matches, because I could change the tag of the standalone one.

My other solution does not use regular expressions, but I would like to use them if possible.


You can use

~{{             # Opening tag start
  (\w+)         # (Group 1) Tag name
  \^            # Aux delimiter
  ([^^\{\}]?)   # (Group 2) Specific delimiter
  \^            # Aux delimiter
  ([^\{\}]+)    # (Group 3) Parameters
 }}             # Opening tag end
  (             # (Group 4) Content between the tags
   (?>          # Atomic group start
     (?R)       # Repeat the whole pattern
     |          # or match all that is not the opening/closing tag
     [^{]*(?:\{(?!{/?\1[^\{\}]*}})[^{]*)*
   )*           # Zero or more times
  )             # (Group 4) end
 {{/\1}}        # Closing tag
~ix

See the regex demo

In general, the expression is based on recursion and a tempered greedy token. The [^{]*(?:\{(?!{/?\1[^\{\}]*}})[^{]*)* part is an unrolled form of (?s:(?!{{/?\1[^{}]*}}).)*, a pattern that matches any character (.) that is not the starting point of a {{TAG...}} or {{/TAG}} character sequence.

You do not need a DOTALL modifier for this pattern as there is no . in the pattern.
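To make the unrolling concrete, here is a minimal sketch that runs both forms on one flat (non-nested) sample and compares what they capture. The sample string and variable names are illustrative, and the tag name is hard-coded as "if" where the real pattern uses the \1 backreference.

```php
<?php
// Illustrative sample; the tag name "if" is hard-coded where the answer uses \1.
$sample = "{{if^ ^p1}} a { b {{iif}} c\n d {{/if}}";

// Tempered greedy token: before each character, check that no tag starts here.
// The dot needs the s (DOTALL) flag to cross the line break.
$tempered = '~{{if\^ \^p1}}((?s:(?!{{/?if[^{}]*}}).)*){{/if}}~';

// Unrolled form: runs of non-"{" characters, plus single "{" characters
// that do not start a tag. There is no dot, so no DOTALL flag is needed.
$unrolled = '~{{if\^ \^p1}}([^{]*(?:\{(?!{/?if[^{}]*}})[^{]*)*){{/if}}~';

preg_match($tempered, $sample, $a);
preg_match($unrolled, $sample, $b);

var_dump($a[1] === $b[1]); // both capture the same inner text
```

The unrolled form only evaluates the lookahead at "{" characters instead of at every position, which is the usual reason for preferring it on long inputs.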

Here is a PHP demo:

$re = '~{{(\w+)\^([^^\{\}]?)\^([^\{\}]+)}}((?>(?R)|[^{]*(?:\{(?!{/?\1[^\{\}]*}})[^{]*)*)*){{/\1}}~i'; 
$str = "before {{if^^p1^p2}} IN1; {{if^ ^p1}} {{iif}} IN3 {{/if}} IN1-1 {{/if}} after\nbefore {{if^ ^p1}} IN1; {{if^ ^p1}} {{if^ ^p1}} IN3 {{/if}} {{/if}} IN1-1 {{/if}} after\nbefore {{if^ ^p1}} IN1; {{if^ ^p1}} {{if^ ^p1}} IN3 {{/if}} {{/if}} IN1-1 {{if^ ^p1}} IN4 {{/if}} {{/if}} after"; 
preg_match_all($re, $str, $matches);
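The demo stops at the preg_match_all call. One possible way to read the result is sketched below, assuming group 4 is what you want as the content between the outermost tags. The helper name outerTags and the array keys are hypothetical, not part of the original answer; PREG_SET_ORDER is a standard preg_match_all flag that groups the captures per match.

```php
<?php
// Hypothetical helper (not from the original answer): returns, for each
// outermost tag pair, its name, delimiter, parameters and inner content.
function outerTags(string $str): array
{
    $re = '~{{(\w+)\^([^^\{\}]?)\^([^\{\}]+)}}((?>(?R)|[^{]*(?:\{(?!{/?\1[^\{\}]*}})[^{]*)*)*){{/\1}}~i';
    preg_match_all($re, $str, $matches, PREG_SET_ORDER);

    $tags = [];
    foreach ($matches as $m) {
        $tags[] = [
            'tag'       => $m[1], // Group 1: tag name
            'delimiter' => $m[2], // Group 2: specific delimiter (may be empty)
            'params'    => $m[3], // Group 3: parameters
            'inner'     => $m[4], // Group 4: everything between the outer tags
        ];
    }
    return $tags;
}

$str = "before {{if^^p1^p2}} IN1; {{if^ ^p1}} {{iif}} IN3 {{/if}} IN1-1 {{/if}} after\n"
     . "before {{if^ ^p1}} IN1; {{if^ ^p1}} {{if^ ^p1}} IN3 {{/if}} {{/if}} IN1-1 {{/if}} after\n"
     . "before {{if^ ^p1}} IN1; {{if^ ^p1}} {{if^ ^p1}} IN3 {{/if}} {{/if}} IN1-1 {{if^ ^p1}} IN4 {{/if}} {{/if}} after";

$tags = outerTags($str);
foreach ($tags as $t) {
    printf("%s(%s): [%s]\n", $t['tag'], $t['params'], $t['inner']);
}
```

Nested tags are left untouched inside 'inner', so they can be handled by calling outerTags() again on that string.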