Sag1v Sag1v - 2 months ago 16
C# Question

Regex matches in C#: get groups of strings that do not contain a pattern

I'm trying to get a collection of substrings from a string,
in this example pairs of

<tags></tags>

Given the string:



<tag>abc</tag><tag>123</tag>


I want 2 groups:
<tag>abc</tag>
and
<tag>123</tag>


That's easy with this pattern:
<tag>.*?</tag>

Example

But I would like it to be more precise.

Given the string:

<tag>abc</tag><tag><tag>123</tag>


I would like it to omit the second
<tag>
in the middle (because I'm searching for pairs of opening and closing tags).

I want this result:

<tag>abc</tag>
<tag>123</tag>


I've tried using a lookahead or a lookbehind, but with no luck (I'm sure I'm using them wrong):

<tag>.*?(?<!<tag>)</tag>

Answer

I assume <tag> and </tag> are just examples of leading and trailing delimiters.

Note that lazy dot matching still matches from the first leading delimiter up to the first occurrence of the trailing delimiter, including any occurrences of the leading delimiter in between.
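
To see this behaviour concretely, here is a minimal C# sketch that runs the lazy pattern from the question against the second input. The class and method names (LazyDotDemo, FindLazy) are made up for illustration and are not part of the original question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class LazyDotDemo
{
    // Returns the text of every match of the lazy pattern from the question.
    public static string[] FindLazy(string input) =>
        Regex.Matches(input, @"<tag>.*?</tag>")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        foreach (var s in FindLazy("<tag>abc</tag><tag><tag>123</tag>"))
            Console.WriteLine(s);
        // prints:
        // <tag>abc</tag>
        // <tag><tag>123</tag>
    }
}
```

The second match starts at the stray <tag> and swallows the next opening delimiter, which is exactly the result the question wants to avoid.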

To work around it, use a tempered greedy token:

<tag>(?:(?!</?tag>).)*</tag>

See the regex demo
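
As a rough sketch of how the tempered greedy token could be used from C#, the snippet below applies it to the input from the question. The names TemperedTokenDemo and FindPairs are assumptions chosen for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class TemperedTokenDemo
{
    // The dot is consumed only where neither <tag> nor </tag> starts,
    // so a match can never contain a nested leading delimiter.
    private const string Pattern = @"<tag>(?:(?!</?tag>).)*</tag>";

    public static string[] FindPairs(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        foreach (var s in FindPairs("<tag>abc</tag><tag><tag>123</tag>"))
            Console.WriteLine(s);
        // prints:
        // <tag>abc</tag>
        // <tag>123</tag>
    }
}
```

By default the dot does not match a newline in .NET, so if the content between the delimiters can span several lines, RegexOptions.Singleline would have to be passed as well.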

Since the lookahead is executed at every position, this construct is rather resource-consuming. You can unroll it as

<tag>[^<]*(?:<(?!/?tag>)[^<]*)*</tag>

See another regex demo.
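
The unrolled pattern can be tried the same way. This is only a sketch: UnrolledDemo and FindPairs are hypothetical names, and the second input (a lone < that is not part of a delimiter) is an extra test case that does not appear in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class UnrolledDemo
{
    // [^<]* consumes runs of non-< characters in one step; the lookahead
    // runs only when a < is met, and rejects it if it starts <tag> or </tag>.
    private const string Pattern = @"<tag>[^<]*(?:<(?!/?tag>)[^<]*)*</tag>";

    public static string[] FindPairs(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        foreach (var s in FindPairs("<tag>abc</tag><tag><tag>123</tag>"))
            Console.WriteLine(s);
        // prints:
        // <tag>abc</tag>
        // <tag>123</tag>

        // A < that does not start a delimiter is still allowed inside a pair.
        foreach (var s in FindPairs("<tag>a<b</tag>"))
            Console.WriteLine(s);
        // prints:
        // <tag>a<b</tag>
    }
}
```

Unlike the dot-based version, the negated character class [^<] also matches newlines, so this pattern crosses line breaks without any extra options.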
