M.Emin Yıldız - 4 months ago
C# Question

C# regex for matching specific text inside nested parentheses

I use the following code to extract the operators that are outside parentheses:

string filtered = Regex.Replace(input, "\\(.*?\\)", string.Empty);
var result = filtered.Split(new[] { ' ' },
    StringSplitOptions.RemoveEmptyEntries)
    .Where(element => element == "OR" || element == "AND");
string temp = string.Join(" ", result);


These lines do not work for nested parentheses.

For example, it works for this input:

X1 OR ( X2 AND X3 AND X4 AND X5 ) OR X6


It gives me this result: OR OR

But when my input contains nested parentheses, it gives the wrong result.

For this input:

X1 OR ( X2 AND( X3 AND X4 ) AND X5 ) OR X6


I want the result to be OR OR, but it prints OR AND OR.

Although there are two ( characters in the string, the match ends at the first ) character.

How can I adjust my regex pattern?

Answer

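Your \(.*?\) regex has three parts: 1) \( matches a literal (, 2) .*? is a lazy dot pattern that matches zero or more characters other than a newline, as few as possible, up to the first ), and 3) \) matches a literal ). With nested parentheses, the match therefore ends at the first ) it meets, which closes the inner group, so the rest of the outer group stays in the string and its AND survives the filtering.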
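To make this concrete, here is a minimal sketch that runs the lazy pattern on the nested input from the question and prints what it matches and what is left behind. The class and method names (LazyMatchDemo, FirstMatch, Strip) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyMatchDemo
{
    // The pattern from the question: lazy match up to the FIRST closing parenthesis.
    private const string LazyPattern = @"\(.*?\)";

    // Returns the text of the first match (empty string if there is none).
    public static string FirstMatch(string input) =>
        Regex.Match(input, LazyPattern).Value;

    // Removes every match of the lazy pattern, as the question's code does.
    public static string Strip(string input) =>
        Regex.Replace(input, LazyPattern, string.Empty);

    public static void Main()
    {
        var input = "X1 OR ( X2 AND( X3 AND X4 ) AND X5 ) OR X6";
        Console.WriteLine(FirstMatch(input)); // ( X2 AND( X3 AND X4 )
        Console.WriteLine(Strip(input));      // X1 OR  AND X5 ) OR X6
    }
}
```

The leftover AND X5 ) is why the filtered operators come out as OR AND OR.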

Use a balancing group construct, provided your strings cannot contain escape sequences (such as escaped parentheses):

@"\((?>[^()]|(?<o>)\(|(?<-o>)\))*\)(?(o)(?!))"

Note that, unlike the pattern in What are regular expression Balancing Groups, this expression must not be enclosed in anchors, because it has to match parenthesized substrings inside a longer input.

Details:

  • \( - a literal (
  • (?> - start of an atomic group to prevent backtracking into it
    • [^()] - any char other than ( and )
    • | - or
    • (?<o>)\( - matches a literal ( and pushes an empty value into stack "o"
    • | - or
    • (?<-o>)\) - matches a literal ) and removes one value from stack "o"
  • )* - zero or more occurrences of the atomic group are matched
  • \) - a literal )
  • (?(o)(?!)) - a conditional construct failing the match if stack "o" contains values (is not empty).

See the regex demo.
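The stack bookkeeping described in the details above can also be mimicked by hand with a depth counter, which may make the push/pop logic easier to follow. This is not the regex itself, only a rough hand-written equivalent; the names DepthCounterSketch and StripGroups are hypothetical, and the sketch assumes the parentheses in the input are balanced.

```csharp
using System;
using System.Text;

public static class DepthCounterSketch
{
    // Hypothetical helper: drops every top-level "( ... )" group,
    // including anything nested inside it.
    // Assumption: parentheses in the input are balanced.
    public static string StripGroups(string input)
    {
        var kept = new StringBuilder();
        int depth = 0; // number of currently open parentheses

        foreach (char c in input)
        {
            if (c == '(')
            {
                depth++;        // a group opens (the regex tracks nested ones on stack "o")
            }
            else if (c == ')' && depth > 0)
            {
                depth--;        // a group closes (the regex pops from stack "o")
            }
            else if (depth == 0)
            {
                kept.Append(c); // outside all groups: keep the character
            }
        }

        return kept.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(StripGroups("X1 OR ( X2 AND( X3 AND X4 ) AND X5 ) OR X6"));
        // X1 OR  OR X6
    }
}
```

Characters are kept only while the counter is zero, which corresponds to the regex consuming everything between an opening parenthesis and its matching closing one.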

var input = "X1 OR ( X2 AND( X3 AND X4 ) AND X5 ) OR X6";
// Remove every balanced "( ... )" group, nested groups included.
var filtered = Regex.Replace(input, @"\((?>[^()]|(?<o>)\(|(?<-o>)\))*\)(?(o)(?!))", string.Empty);
// Keep only the operators that remain outside the parentheses.
var result = filtered.Split(new[] { ' ' },
    StringSplitOptions.RemoveEmptyEntries)
    .Where(element => element == "OR" || element == "AND");
var temp = string.Join(" ", result); // "OR OR"

See the C# demo
