Blue Eyed Behemoth - 2 months ago
C# Question

Retrieving Inner Most If Condition in Text with Regex

I have a text file containing the following text (sample nested if, I know it doesn't make sense.):

<if string=%fld.plaintiffsSex eql=Male>
<set field=plaintiffPronoun1 value=[his]>
<set field=plaintiffPronoun2 value=[he]>
<set field=plaintiffPronoun3 value=[him]>
<else>
<if string=%fld.plaintiffsSex eql=Female>
<set field=plaintiffPronoun1 value=[her]>
<set field=plaintiffPronoun2 value=[she]>
<set field=plaintiffPronoun3 value=[her]>
</if>
</if>


Unfortunately, I have to use Regex to get the innermost if statement. I currently have the following Regex, but it's not working as I'd expect. Essentially, the Regex just has to match any if statement that doesn't contain another <if.

// first if that doesn't contain <if to </if>
[\s\S]*(<if[\s\S]*?(?!.*<if)[\s\S]*?<\/if>)


See it here http://regexr.com/3e8p7

What I want to capture is just:

<if string=%fld.plaintiffsSex eql=Female>
<set field=plaintiffPronoun1 value=[her]>
<set field=plaintiffPronoun2 value=[she]>
<set field=plaintiffPronoun3 value=[her]>
</if>


Currently, it gets what I want as Group[1], but I just want it to be the whole match.

Please don't answer with alternative methods/extensions for parsing XML or text.

EDIT:

I tried pasting the same text twice into the input, but it still produces just one match when there should be two.

EDIT 2:

I'm working in C#.

Answer

Regex:

<if[^<]*(?:<(?!if)[^<]*)*?<\/if>

The idea is to make sure that no opening <if tag appears inside the current if statement, so only an innermost if can match.

Explanation:

<if         # Match the opening `<if` tag
[^<]*       # Anything up to the next `<`
(?:         # Start of non-capturing group (a)
    <(?!if)     # A `<` that is not followed by `if` (so no nested `<if` inside the current `if`)
    [^<]*       # Anything up to the next `<`
)*?         # End of non-capturing group (a) - repeat zero or more times (lazy)
<\/if>      # The closing `</if>` tag
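
Since the question is about C#, here is a minimal sketch of how the pattern above could be used with Regex.Matches. The class name InnermostIf, the Find helper and the Sample field are made up for illustration; only the pattern itself comes from the answer. Because the pattern has no capturing group, each innermost block is available as the whole match (Match.Value), and pasting the sample twice should give two matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InnermostIf
{
    // Pattern from the answer, unchanged. The verbatim string (@"...") keeps
    // the backslash literal, so no double escaping is needed.
    private static readonly Regex Pattern =
        new Regex(@"<if[^<]*(?:<(?!if)[^<]*)*?<\/if>");

    // Sample input from the question (hypothetical field name, for illustration).
    public static readonly string Sample = string.Join("\n", new[]
    {
        "<if string=%fld.plaintiffsSex eql=Male>",
        "<set field=plaintiffPronoun1 value=[his]>",
        "<set field=plaintiffPronoun2 value=[he]>",
        "<set field=plaintiffPronoun3 value=[him]>",
        "<else>",
        "<if string=%fld.plaintiffsSex eql=Female>",
        "<set field=plaintiffPronoun1 value=[her]>",
        "<set field=plaintiffPronoun2 value=[she]>",
        "<set field=plaintiffPronoun3 value=[her]>",
        "</if>",
        "</if>"
    });

    // Hypothetical helper: returns every <if>...</if> block that contains
    // no nested <if, each one as a whole match.
    public static string[] Find(string text)
    {
        MatchCollection matches = Pattern.Matches(text);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value; // whole match, not Groups[1]
        }
        return result;
    }

    public static void Main()
    {
        // Paste the sample twice to check that each copy yields its own match.
        foreach (string block in Find(Sample + "\n" + Sample))
        {
            Console.WriteLine(block);
            Console.WriteLine("----");
        }
    }
}
```

The outer if is skipped because the engine fails as soon as it meets the nested <if, then retries from the next <if in the input. Note that this only finds the innermost level; it does not attempt to pair up outer if/else blocks.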