John John - 1 month ago 8
C# Question

Regex: Matching all words EXCEPT those inside parentheses (C#)

So given:

COLUMN_1, COLUMN_2, COLUMN_3, ((COLUMN_1) AS SOME TEXT) AS COLUMN_4, COLUMN_5


How would I go about getting my matches as:

COLUMN_1
COLUMN_2
COLUMN_3
COLUMN_4
COLUMN_5


I've tried:

(?<!(\(.*?\)))(\w+)(,\s*\w+)*?


But I feel like I'm way off base :( I'm using regexstorm.net for testing.

Appreciate any help :)

Answer

You need a regex that keeps track of opening and closing parentheses and matches a word only if everything after it, up to the end of the string, either contains no parentheses at all or contains only balanced pairs:

// Requires: using System.Text.RegularExpressions;
Regex regexObj = new Regex(
    @"\w+                  # Match a word
    (?=                    # only if it's possible to match the following:
        (?>                # Atomic group (used to avoid catastrophic backtracking):
           [^()]+          # Match any characters except parens
        |                  # or
           \(  (?<DEPTH>)  # a (, increasing the depth counter
        |                  # or
           \)  (?<-DEPTH>) # a ), decreasing the depth counter
        )*                 # any number of times.
        (?(DEPTH)(?!))     # Then make sure the depth counter is zero again
        $                  # at the end of the string.
    )                      # (End of lookahead assertion)", 
    RegexOptions.IgnorePatternWhitespace);
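
To see what the pattern returns on the sample input from the question, here is a minimal sketch. The class name `OuterWordsDemo` and the helper `FindOuterWords` are hypothetical names chosen for illustration; the pattern itself is the one above with the comments stripped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OuterWordsDemo
{
    // Same pattern as in the answer, without the inline comments.
    private static readonly Regex OuterWord = new Regex(
        @"\w+
        (?=
            (?>
               [^()]+
            |
               \(  (?<DEPTH>)
            |
               \)  (?<-DEPTH>)
            )*
            (?(DEPTH)(?!))
            $
        )",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns every word that is not inside parentheses.
    public static List<string> FindOuterWords(string input)
    {
        var words = new List<string>();
        foreach (Match m in OuterWord.Matches(input))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        string input =
            "COLUMN_1, COLUMN_2, COLUMN_3, ((COLUMN_1) AS SOME TEXT) AS COLUMN_4, COLUMN_5";

        // Prints one word per line.
        Console.WriteLine(string.Join("\n", FindOuterWords(input)));
    }
}
```

Note that the `AS` in front of `COLUMN_4` is itself a word outside the parentheses, so it should show up between `COLUMN_3` and `COLUMN_4` in the results. If you only want the column names, you would have to filter that keyword out afterwards.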

I tried to provide a test link to regexstorm.net, but it was too long for StackOverflow. Apparently, SO also doesn't like URL shorteners, so I can't link this directly, but you should be able to recreate the link easily: http://bit[dot]ly/2cNZS0O
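
For comparison, the same depth-tracking idea can be written as a plain loop instead of a regex. This is only a sketch of a different technique, not the approach above: `DepthScanDemo` and `WordsAtDepthZero` are hypothetical names, the word test (letters, digits, underscore) is only an approximation of `\w`, and the two approaches are only expected to agree on input whose parentheses are balanced.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class DepthScanDemo
{
    // Hypothetical helper: collects words that appear at parenthesis depth zero.
    public static List<string> WordsAtDepthZero(string input)
    {
        var words = new List<string>();
        var current = new StringBuilder();
        int depth = 0;

        foreach (char c in input)
        {
            // Approximation of \w: letters, digits and underscore.
            bool isWordChar = char.IsLetterOrDigit(c) || c == '_';

            if (isWordChar && depth == 0)
            {
                current.Append(c);
                continue;
            }

            // Any other character ends the word in progress, if there is one.
            if (current.Length > 0)
            {
                words.Add(current.ToString());
                current.Clear();
            }

            if (c == '(') depth++;                    // entering a group
            else if (c == ')' && depth > 0) depth--;  // leaving a group
        }

        // Flush a word that runs up to the end of the input.
        if (current.Length > 0) words.Add(current.ToString());
        return words;
    }

    public static void Main()
    {
        string input =
            "COLUMN_1, COLUMN_2, COLUMN_3, ((COLUMN_1) AS SOME TEXT) AS COLUMN_4, COLUMN_5";
        Console.WriteLine(string.Join("\n", WordsAtDepthZero(input)));
    }
}
```

The loop makes the role of the `DEPTH` counter explicit: a word is kept only while the counter is zero. The regex has the practical advantage of fitting wherever a `Regex` object is expected, while the loop may be easier to step through in a debugger.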
