Royi Namir - 3 months ago
C# Question

Regex string reducer in C#?

Say I have this string, whose format is not known in advance:

var t = "G9906QZN-SXK9-TUCE-10F5-CB2C1DA9D24A.hello";


I need to generate a regex for that string in a general way.

Please note: I do not want a regex for the exact string; otherwise I would have used the exact characters.

In other words, all three of these should produce the same regex:

G9906QZN-SXK9-TUCE-10F5-CB2C1DA9D24A.hello
G9906QZN-SXK9-TUCE-267F-F361D103A627.hello
G9906QZN-SXK9-TUCE-0360-370482E00155.hello


And all three of these should also share one regex:

G9906QZN^SXK9^TUCE^10F5^CB2C1DA9D24A.hello
G9906QZN^SXK9^TUCE^267F^F361D103A627.hello
G9906QZN^SXK9^TUCE^0360^370482E00155.hello


Also, there can be more than one kind of separator, because the file names are generated from a random pattern.

So all three of these should also share one regex:

G9906QZN^SXK9 TUCE[10F5-CB2C1DA9D24A.hello
G9906QZN^SXK9 TUCE[267F-F361D103A627.hello
G9906QZN^SXK9 TUCE[0360-370482E00155.hello


This is what I've done so far (ignore case sensitivity for now):

Code:

var t = "G9906QZN-SXK9-TUCE-10F5-CB2C1DA9D24A.hello";

List<string> lst = new List<string>(); // a StringBuilder can also be used

foreach (char element in t)
{
    if (char.IsLetterOrDigit(element))
        lst.Add(@"\w");           // any letter or digit becomes \w
    else
        lst.Add(@"\" + element);  // escape every other character
}

Console.WriteLine(string.Join("", lst));


Result:

\w\w\w\w\w\w\w\w\-\w\w\w\w\-\w\w\w\w\-\w\w\w\w\-\w\w\w\w\w\w\w\w\w\w\w\w\.\w\w\w\w\w


Question:

I want to "shrink" that regex into something like:

\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12}\.\w{5}


Before I start doing something very ugly, like tracking the first occurrence and last occurrence and resetting counters, is there a more elegant way of doing it?

Answer

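You can use a regex to generate it: escape the string first, then replace each run of word characters with \w{n}, where n is the length of the run.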

var t = "G9906QZN-SXK9-TUCE-10F5-CB2C1DA9D24A.hello";

Console.WriteLine(Regex.Replace(Regex.Escape(t), @"\w+", m => @"\w{" + m.Length + "}"));

Result:

\w{8}-\w{4}-\w{4}-\w{4}-\w{12}\.\w{5}
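To check that the generated pattern really covers the sibling file names from the question, it can be anchored and tested against them. The following is only a sketch: the class PatternSketch and the methods Generalize and HasSameShape are hypothetical names, not part of any library. It also assumes the sample contains no tabs or line breaks, since Regex.Escape turns those into sequences such as \t, whose letter would then be picked up by \w+.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternSketch
{
    // Hypothetical helper: collapses every run of word characters into \w{n}.
    public static string Generalize(string sample)
    {
        // Regex.Escape protects metacharacters such as '.', '^', '[' and ' '.
        string escaped = Regex.Escape(sample);
        return Regex.Replace(escaped, @"\w+", m => @"\w{" + m.Length + "}");
    }

    // Hypothetical helper: true when the whole candidate matches the pattern
    // generated from the sample (\A and \z anchor at the string boundaries).
    public static bool HasSameShape(string sample, string candidate)
    {
        string pattern = @"\A" + Generalize(sample) + @"\z";
        return Regex.IsMatch(candidate, pattern);
    }
}
```

With the first file name as the sample, HasSameShape should accept the other two names that use - as the separator and reject the ones that use ^, because the separators are kept as literals.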

If _ can also appear as a separator, \w is unsuitable because it matches _ as well, so use an explicit character class instead. (This example also uses an interpolated string.)

Console.WriteLine (Regex.Replace (Regex.Escape (t),
                                  "[a-zA-Z0-9]+", m => $"[a-zA-Z0-9]{{{m.Length}}}"));
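To make the difference visible, here is a small sketch that wraps the character-class version in a method and runs it on a name containing an underscore. ClassPatternSketch, AlphaNum and Generalize are hypothetical names chosen for illustration, and the sample name AB_CD.txt is made up. Note that [a-zA-Z0-9] is ASCII only, so non-ASCII letters would stay in the pattern as literals.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassPatternSketch
{
    // ASCII letters and digits only; '_' is deliberately left out.
    private const string AlphaNum = "[a-zA-Z0-9]";

    // Hypothetical helper: collapses every alphanumeric run into [a-zA-Z0-9]{n}.
    public static string Generalize(string sample)
    {
        return Regex.Replace(
            Regex.Escape(sample),
            AlphaNum + "+",
            // {{ and }} emit literal braces inside an interpolated string.
            m => $"{AlphaNum}{{{m.Length}}}");
    }
}
```

For AB_CD.txt this should give [a-zA-Z0-9]{2}_[a-zA-Z0-9]{2}\.[a-zA-Z0-9]{3}, whereas the \w+ version would fold the underscore into a single run and give \w{5}\.\w{3}.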