HaBo HaBo - 5 days ago 6
C# Question

C# Regex to match all occurrences of a pattern and replace with empty string

I am trying to match a pattern

<two alpha chars>single space<two digits>single space<two digits>
and remove all occurrences of it from a string.

var myRegex = @"(?:^|[\s]|[, ]|[.]|[\n]|[\t])([A-Za-z]{2}\s[0-9]{2}\s[0-9]{2})($|[,]|[.]|[\s]|[\n]|[\t])";

string myString = "this 02 34, HU 23 76 , hh 76 745 1.HO 12 33. HO 34 56";
var matches = Regex.Matches(myString, myRegex);

foreach (Match match in matches)
{
    myString = myString.Replace(match.Value, "");
}


In the string above, "this 02 34" should not match, because the two letters before the digits are not preceded by a space, period, comma, new line, or tab. This is the expected behavior.

But "HO 34 56" is not matched, because it is not followed by a space, period, comma, new line, or tab. How can I include it in the match without also matching "hh 76 745"?

After executing the code above, I expect the myString variable to contain "this 02 34, , hh 76 745 1.. "

Answer

Use this regex with word boundaries:

\b[A-Za-z]{2}\s[0-9]{2}\s[0-9]{2}\b

See the regex demo

Details:

  • \b - a leading word boundary
  • [A-Za-z]{2} - 2 ASCII letters
  • \s - a whitespace character (any whitespace, not only a space)
  • [0-9]{2} - 2 digits
  • \s - a whitespace character
  • [0-9]{2} - 2 digits
  • \b - a trailing word boundary.
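
As a rough sketch of how this pattern could replace the Matches/Replace loop from the question, the snippet below calls Regex.Replace once. The class name WordBoundaryDemo and the helper RemoveCodes are made up for illustration; only the pattern and the sample string come from the post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Pattern from the answer: 2 letters, whitespace, 2 digits, whitespace, 2 digits,
    // wrapped in word boundaries.
    private const string Pattern = @"\b[A-Za-z]{2}\s[0-9]{2}\s[0-9]{2}\b";

    // Hypothetical helper: removes every match in a single pass.
    public static string RemoveCodes(string input)
    {
        return Regex.Replace(input, Pattern, "");
    }

    public static void Main()
    {
        string myString = "this 02 34, HU 23 76 , hh 76 745 1.HO 12 33. HO 34 56";
        // Brackets make the leading and trailing whitespace visible.
        Console.WriteLine("[" + RemoveCodes(myString) + "]");
        // prints "[this 02 34,  , hh 76 745 1.. ]"
    }
}
```

Because \b is zero-width, the delimiters around each match stay in the string. That is why the space before and the space after "HU 23 76" both survive, leaving two spaces between the commas.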

If you need to say "not preceded by a letter", replace the first \b with (?<![a-zA-Z]), and if you want to say "not followed by a digit", replace the last \b with (?!\d). That is, use lookarounds, which, like word boundaries, are zero-width assertions.
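
To see where the lookaround version and the \b version diverge, here is a small comparison sketch. The class and method names are hypothetical, and the input "id_HO 12 33" is an invented example: the underscore is a word character, so \b finds no boundary before "HO", while the lookbehind only rejects letters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Word-boundary version from the answer.
    private const string BoundaryPattern = @"\b[A-Za-z]{2}\s[0-9]{2}\s[0-9]{2}\b";

    // Lookaround version: not preceded by a letter, not followed by a digit.
    private const string LookaroundPattern = @"(?<![a-zA-Z])[A-Za-z]{2}\s[0-9]{2}\s[0-9]{2}(?!\d)";

    public static string RemoveWithWordBoundaries(string input)
    {
        return Regex.Replace(input, BoundaryPattern, "");
    }

    public static string RemoveWithLookarounds(string input)
    {
        return Regex.Replace(input, LookaroundPattern, "");
    }

    public static void Main()
    {
        string input = "id_HO 12 33"; // invented input, not from the question

        // '_' and 'H' are both word characters, so \b does not match between them.
        Console.WriteLine("[" + RemoveWithWordBoundaries(input) + "]"); // prints "[id_HO 12 33]"

        // '_' is not a letter, so the negative lookbehind succeeds.
        Console.WriteLine("[" + RemoveWithLookarounds(input) + "]");    // prints "[id_]"
    }
}
```

On the sample string from the question both versions should remove the same three chunks, so the choice only matters for inputs where letters, digits, and underscores touch the pattern directly.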

If you really want to match that chunk only when it is both preceded and followed by a space, period, comma, new line, tab, or the beginning or end of the string, use

(?<=^|[\s,.])[A-Za-z]{2}\s[0-9]{2}\s[0-9]{2}(?=$|[\s,.])

See this demo
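
A minimal sketch of the stricter pattern in C#, assuming the same sample string as in the question. DelimiterDemo and RemoveDelimited are illustrative names, and "(HO 12 33)" is an invented input showing a case that \b would accept but the explicit delimiter list rejects.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterDemo
{
    // Lookbehind: start of string, or whitespace/comma/period before the chunk.
    // Lookahead: end of string, or whitespace/comma/period after the chunk.
    private const string Pattern =
        @"(?<=^|[\s,.])[A-Za-z]{2}\s[0-9]{2}\s[0-9]{2}(?=$|[\s,.])";

    // Hypothetical helper: removes chunks that sit between allowed delimiters.
    public static string RemoveDelimited(string input)
    {
        return Regex.Replace(input, Pattern, "");
    }

    public static void Main()
    {
        string myString = "this 02 34, HU 23 76 , hh 76 745 1.HO 12 33. HO 34 56";
        Console.WriteLine("[" + RemoveDelimited(myString) + "]");
        // prints "[this 02 34,  , hh 76 745 1.. ]"

        // Parentheses are not in the delimiter list, so nothing is removed here.
        Console.WriteLine("[" + RemoveDelimited("(HO 12 33)") + "]");
        // prints "[(HO 12 33)]"
    }
}
```

Since the delimiters are only asserted by the lookarounds and never consumed, adjacent chunks that share a delimiter can each still match.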
