JTsoi JTsoi - 3 months ago 6
SQL Question

Order a set of regex patterns OR Get the biggest regex match

Given a list of numbers that are regex patterns, sort them by the last 4 numbers in that numeric value.

I have a list of regex (phone number) patterns and I am trying to sort them by the last 4 characters. Here's a sample list of phone numbers:

8062
\+13066598273
4083100
408320[0-3]
408320[4-6]
752[234569]
\+13066598305
8059


I would like to re-order these numbers by the last 4 numbers so that I would end up with a list like this:

4083100
408320[0-3]
408320[4-6]
752[234569]
8059
8062
\+13066598273
\+13066598305


Now if my patterns were nothing but numbers, I could sort them easily in either SQL or my MVC C# project. In SQL, I could use ORDER BY RIGHT(pattern, 4), or in C# MVC, I could sort my IQueryable list with patterns = patterns.OrderByDescending(s => s.Substring(...etc...)).

Patterns are a bit more difficult. The brackets count as characters, so sorting by the last 4 characters doesn't work as well.

Are there any built-in utilities in C#, MVC, or SQL that allow me to convert regex patterns to the largest possible match?


  • Given a regex pattern, return the largest possible matching regex that matches my condition. For example, if I had the pattern 4[12]00[1-3], I'd have 6 possible results: 41001, 41002, 41003, 42001, 42002, 42003. Then I can get the biggest number possible, and use that for ordering in my original regex list.


    • The regex pattern doesn't contain anything like * or +, special characters that may cause infinite combinations.


  • There may be a C# library that does exactly what I ask for, sorting regex pattern strings.



EDIT:

I've accepted Diego's answer, but it took me a bit of time to wrap my head around it. For other readers who want to know what it's doing, this is what I think Diego is doing:


  1. Make a range of ints, starting at 9999, all the way back to 0. [9999], [9998], [9997]...[0].

  2. Replace the regex-ish part of the strings with a single character. Example, "500[1-5]" would become "500X", or "20[1-9]00[89]" would become "20X00X", so on, so forth.

  3. Get the length of the "last" 4 characters + regex characters.

    var len = lastNChars + pattern.Length - Regex.Replace(pattern, @"\[[^\]]+\]", "X").Length;


    So for the pattern 20[1-9]00[89], the above formula translates to "len = 4 + 13 - 6", or 11.

  4. Using the len variable from above, get a substring that represents the "last" 4 numbers of the phone number, even with regex characters. The original string = "20[1-9]00[89]", while the new substring = "[1-9]00[89]" (the 20 is gone now)

  5. Enumerate through and compare array values to the substring regex pattern. [9999] doesn't match regex pattern, [9998] doesn't match...[9997] doesn't match...aha! 9009 matches! The first match I get is going to the biggest possible regex match, which is what I want.

  6. So each regex pattern has been converted to its largest possible matching pattern. Now we can use C#/LINQ/other in-built methods that can sort by those sub-regex matches for us!



Thank god I'm dealing with only numbers. If I were trying to do sort regex patterns that were actually words/had alpha characters, that would be way harder, and that array would be hella bigger (I think).

Answer

It is hard to find sample strings that match a regular expression without enumerating them all and testing them. I also don't think you will be able to find a C# library that sorts regexes. However, you can solve this problem, however, for the special case of patterns that do not contain quantifiers (such as [0-9]+ or [3-6]{4}) as follows:

const int lastNChars = 4;
var patterns = new string[]{@"8062", @"\+13066598273", @"4083100", 
        @"408320[0-3]", @"408320[4-6]", @"752[234569]", 
        @"\+13066598305", @"8059"};
var range = Enumerable.Range(0, (int) Math.Pow(10, lastNChars))
            .Reverse().ToArray();

var sortedPatterns = patterns.OrderBy(pattern=> {
    var len = lastNChars + pattern.Length 
            - Regex.Replace(pattern, @"\[[^\]]+\]", "X").Length;

    // Get the biggest number in range that matches this regex:
    var biggestNumberMatched = range.FirstOrDefault(x => 
        Regex.IsMatch(x.ToString(new String('0', lastNChars)), 
                    pattern.Substring(pattern.Length - len, len))
    );
    return biggestNumberMatched;
}).ToArray();

After which, sortedPatterns contains the desired output.

Comments