Michael Tot Korsgaard - 23 days ago
C# Question

Get index of second non-alphanumeric

So I'm trying to sanitize some text chunks. I thought that regex might be a nice solution rather than having a bunch of if statements. But alas, I'm not that good with regular expressions, so I hoped that some of you might be willing to help me.

The case
I have different text values which need to be formatted:

string one = "tbEmails";
string two = "dbo.tbEmails";
string three = "dbo.tbEmails,\n\t";
string four = "dbo.tbEmails.";


The result I'm looking for is

one = "tbEmails";
two = "dbo.tbEmails";
three = "dbo.tbEmails";
four = "dbo.tbEmails";


I know that I can get the index of the first non-alphanumeric value by using

int index = new Regex("[^a-zA-Z ]").Match("dbo.tbEmails,\n\t").Index;


But how can I ignore the first . in the regex and get the index of the second non-alphanumeric value? And as a bonus: is there a way to return the first non-alphanumeric value in case there's no . in the string?

Answer

Basically, to get the index of the Nth match, use Regex.Matches to find all the matches, check that the collection holds at least N items, and then read the details you need from the corresponding Match object:

var index = -1;  // stays -1 when there is no second match
var matches = Regex.Matches(str, @"[^a-zA-Z ]");
if (matches.Count > 1)  // at least 2 matches
{
    index = matches[1].Index;  // zero-based, so [1] is the second match
}
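The snippet above can be wrapped into a small helper, and the "bonus" case from the question (cut at the first non-alphanumeric char when it is not a dot) can be handled with one extra check. This is only a sketch: the class and method names are made up for illustration, and it assumes the [\W_] pattern rather than the [^a-zA-Z ] pattern from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class NthMatchSketch
{
    // Index of the n-th (1-based) match of pattern in input, or -1 if there are fewer than n matches.
    public static int IndexOfNthMatch(string input, string pattern, int n)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        return matches.Count >= n ? matches[n - 1].Index : -1;
    }

    // Cuts the input at the second non-alphanumeric char when the first one is a dot,
    // otherwise at the first non-alphanumeric char. Returns the input unchanged
    // when there is nothing to cut.
    public static string CutAtSeparator(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"[\W_]");
        if (matches.Count == 0)
        {
            return input;
        }

        int cut;
        if (matches[0].Value == ".")
        {
            // Skip the first dot and cut at the next non-alphanumeric char, if any.
            cut = matches.Count > 1 ? matches[1].Index : -1;
        }
        else
        {
            cut = matches[0].Index;
        }

        return cut < 0 ? input : input.Substring(0, cut);
    }
}
```

With the sample inputs from the question, CutAtSeparator("dbo.tbEmails,\n\t") gives "dbo.tbEmails", and an input without a dot such as "tbEmails,\n" is cut at the comma.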

BTW, a pattern that matches a non-alphanumeric char is [\W_], and one that matches an alphanumeric char is [^\W_] (or [\w-[_]], using .NET character class subtraction).
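To see how these character classes differ from the [^a-zA-Z ] pattern used in the question, here is a small comparison sketch. The class name and the sample string "tbEmails2,x" are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical comparison helper, for illustration only.
public static class CharClassSketch
{
    // Index of the first match of pattern in input, or -1 if there is none.
    public static int FirstIndex(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        const string sample = "tbEmails2,x";

        // [^a-zA-Z ] treats the digit '2' as a hit (and lets spaces through).
        Console.WriteLine(FirstIndex(sample, "[^a-zA-Z ]"));

        // [\W_] skips the digit and stops at the comma.
        Console.WriteLine(FirstIndex(sample, @"[\W_]"));

        // Both alphanumeric classes reject the underscore.
        Console.WriteLine(Regex.IsMatch("_", @"[^\W_]"));
        Console.WriteLine(Regex.IsMatch("_", @"[\w-[_]]"));
    }
}
```

Note that \w in .NET is Unicode-aware by default, so [^\W_] also accepts letters and digits outside the ASCII range.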

It also seems that you can get the results you need with a single regex replace operation:

Regex.Replace(str, @"(?s)^([^\W_]+(?:[\W_][^\W_]+)?).*", "$1");
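As a quick check, the replacement can be run against the four sample strings from the question. The class and method names below are assumptions made for this sketch; only the pattern and the "$1" replacement come from the line above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the Regex.Replace call, for illustration only.
public static class ReplaceSketch
{
    // (?s) lets the trailing .* also consume newlines, so everything after
    // the captured prefix is dropped.
    private static readonly Regex KeepPrefix =
        new Regex(@"(?s)^([^\W_]+(?:[\W_][^\W_]+)?).*");

    // Keeps only the captured prefix; input that does not start with an
    // alphanumeric char does not match and is returned unchanged.
    public static string Sanitize(string input)
    {
        return KeepPrefix.Replace(input, "$1");
    }

    public static void Main()
    {
        string[] samples = { "tbEmails", "dbo.tbEmails", "dbo.tbEmails,\n\t", "dbo.tbEmails." };
        foreach (string s in samples)
        {
            Console.WriteLine(Sanitize(s));
        }
    }
}
```

Keep in mind that the separator in this pattern is any single non-alphanumeric char, not only a dot, so an input like "dbo,tbEmails" would also be kept whole.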


Or a simpler matching regex:

var match = Regex.Match(str, @"^[^\W_]+(?:[\W_][^\W_]+)?");
if (match.Success) 
{
    Console.Write(match.Value);
}

Details:

  • ^ - start of string
  • [^\W_]+ - 1 or more alphanumeric chars
  • (?:[\W_][^\W_]+)? - 1 or 0 occurrences of:
    • [\W_] - 1 char other than an alphanumeric char
    • [^\W_]+ - 1 or more alphanumeric chars
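The breakdown above can also be written into the pattern itself with RegexOptions.IgnorePatternWhitespace, which lets the regex carry its own comments. This is a sketch only: the class and method names are hypothetical, and returning the input unchanged when nothing matches is an assumption, not something the question specifies.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper showing the matching regex in commented form.
public static class VerboseMatchSketch
{
    // Same pattern as above, spread over several lines with inline comments.
    private static readonly Regex Prefix = new Regex(@"
        ^              # start of string
        [^\W_]+        # 1 or more alphanumeric chars
        (?:            # 1 or 0 occurrences of:
            [\W_]      #   1 char other than an alphanumeric char
            [^\W_]+    #   1 or more alphanumeric chars
        )?",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the matched prefix, or the input unchanged when there is no match
    // (an assumption made for this sketch).
    public static string Extract(string input)
    {
        Match match = Prefix.Match(input);
        return match.Success ? match.Value : input;
    }
}
```

For example, Extract("dbo.tbEmails,\n\t") yields "dbo.tbEmails", and Extract("tbEmails") yields "tbEmails" because the optional group is simply skipped.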