p.campbell - 11 days ago
C# Question

Remove text in-between delimiters in a string (using a regex?)

Consider the requirement to find a matched pair of delimiter characters and remove everything between them, along with the delimiters themselves.

Here are the sets of delimiters:

[] square brackets
() parentheses
"" double quotes
'' single quotes


Here are some examples of strings that should match:

Given:                          Results In:
-------------------------------------------
Hello "some" World              Hello World
Give [Me Some] Purple           Give Purple
Have Fifteen (Lunch Today)      Have Fifteen
Have 'a good'day                Have day


And some examples of strings that should not match:

Does Not Match:
------------------
Hello "world
Brown]co[w
Cheese'factory


If the given string doesn't contain a matching set of delimiters, it isn't modified. The input string may have many matching pairs of delimiters. If two sets of delimiters overlap (e.g. he[llo "worl]d"), that's an edge case we can ignore here.

The algorithm would look something like this:

string myInput = "Give [Me Some] Purple (And More) Elephants";
string pattern; //some pattern
string output = Regex.Replace(myInput, pattern, string.Empty);


Question: How would you achieve this with C#? I am leaning towards a regex.

Bonus: Is there an easy way to keep those start and end delimiters in constants or in a list of some kind? I am looking for a solution that makes it easy to change the delimiters in case the business analysts come up with new sets.

Answer

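A simple regex would be:

using System.Text.RegularExpressions;

string input = "Give [Me Some] Purple (And More) Elephants";
// Lazy quantifiers (.*?) stop at the nearest closing delimiter, so each pair is removed on its own.
string regex = "(\\[.*?\\])|(\".*?\")|('.*?')|(\\(.*?\\))";
string output = Regex.Replace(input, regex, "");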
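
One caveat when an input contains several pairs of the same delimiter: a greedy `.*` matches from the first opener to the last closer, whereas a lazy `.*?` stops at the nearest closer. The following is a minimal sketch comparing the two. The class and method names are illustrative only and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Greedy: .* extends to the LAST ']' in the input.
    public static string RemoveGreedy(string input) =>
        Regex.Replace(input, "\\[.*\\]", "");

    // Lazy: .*? stops at the NEAREST ']'.
    public static string RemoveLazy(string input) =>
        Regex.Replace(input, "\\[.*?\\]", "");

    public static void Main()
    {
        string input = "Give [Me] Some [More] Purple";

        // Greedy also swallows the text between the two bracketed groups.
        Console.WriteLine(RemoveGreedy(input)); // "Give  Purple" (two spaces)
        // Lazy removes each bracketed group separately.
        Console.WriteLine(RemoveLazy(input));   // "Give  Some  Purple"
    }
}
```

Either way, the whitespace around a removed group is left in place, so the result may contain doubled spaces.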

If you want to build the regex up from a configurable set of delimiters, you just need to build each part separately:

('.*?')  // example of the single-quote check (lazy, so it stops at the nearest closing quote)

Then concatenate the individual regex parts with an OR (the | in regex), as in my original example. Once the regex string is built, run it just once. The key is to get everything into a single pattern: running many separate regex matches on each item and then iterating through a lot of items will probably be significantly slower.

In my first example, the built-up regex would take the place of the pattern in the following lines:

string input = "Give [Me Some] Purple (And More) Elephants";
string regex = "Your built up regex here";
string sOutput = Regex.Replace(input, regex, "");

I am sure someone will post a cool LINQ expression that builds the regex from an array of delimiter objects, or something similar.
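
The LINQ idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the `DelimiterStripper` class, its `Delimiters` table, and the `Strip` method are hypothetical names, and the pattern uses lazy `.*?` so that each pair is matched separately. `Regex.Escape` is used so that characters such as `[` and `(` are treated literally.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterStripper
{
    // Hypothetical delimiter table: add or remove pairs here.
    private static readonly (string Open, string Close)[] Delimiters =
    {
        ("[", "]"),
        ("(", ")"),
        ("\"", "\""),
        ("'", "'"),
    };

    // One alternation, built once: escaped opener, lazy .*?, escaped closer per pair.
    private static readonly Regex Pattern = new Regex(
        string.Join("|", Delimiters.Select(d =>
            Regex.Escape(d.Open) + ".*?" + Regex.Escape(d.Close))));

    // Removes every matched delimiter pair and the text inside it.
    public static string Strip(string input) =>
        Pattern.Replace(input, string.Empty);

    public static void Main()
    {
        Console.WriteLine(Strip("Give [Me Some] Purple (And More) Elephants"));
        Console.WriteLine(Strip("Hello \"world")); // no closing quote, so unchanged
    }
}
```

Because the `Regex` instance is created once and reused, the same single pattern is applied to every input, which matches the advice above about keeping everything in one check.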
