eocron06 eocron06 - 2 days ago 5
C# Question

How to search patterns in arbitrary sequences?

Regex is on string's only, but what if that functionality can be extended to not only character but objects or even further to functions?
Suppose our object's will be integers, they can be in any order:

1 2 3 4 5 6 7 8 9 10 11 12 13


And the task you want to solve is to find prime pairs (or similar pattern search task) like this:

{prime}{anyNumber}{prime}


So the answer is this:

(3,4,5) (5,6,7) (11,12,13)


Or a little more complex example for chain of primes:

{prime}({anyNumber}{prime})+


Answer:

(3,(4,5),(6,7)) (11,(12,13))


Pretty much like Regex work, right?

What happens is that you define some function named isPrime(x) and use it when you need to check if next input element is actualy prime (so it is some sort of equality to object or object space)

What I created so far

I created ObjectRegex class similar to Regex class in C#. It accepts patterns in above and execute predicate asociated with it to identify object.
It works perfectly fine, but the problem is for it to work any sequence of type TValue should be converted to string before it will be passed to Regex pattern and for that I should apply ALL predicates to entire sequence. O(n*m) is a bad idea afterall....

I decided to go around it the hard way and....try to inherit string, which is sealed and inheritance is forbidden. What is needed from this inherited class is override accessor

char this[int index] {get;}


for the benefit of deferred execution of predicates to moment when it actualy make sense.

So, any idea how to make it? I love .NET Regex and it's syntax, is there a way to go around this string curse and deceive engine? Reflection maybe or some hardcore I don't know?

Update 1

I found this article http://www.codeproject.com/Articles/463508/NET-CLR-Injection-Modify-IL-Code-during-Run-time
and think it can be done through replacement of this[int index] method by my code, but i think it will corrupt everything else, cause you just can't replace method for only one instance.

Answer

String inheritance

After some research, I found that idea to optimize existing Regex is impossible. This is because even if I know index in string, I still don't have access to possible states in Regex automaton, which I should look to filter unneccesary calculations.

ORegex

As to answer, I decided to implement my own engine similar to Microsoft Regex engine. Syntax is the same as Microsoft Regex syntax. You can find more information and examples at Nuget and github:

Currently, it supports basic Regex engine features and also some of popular features like lookahead and capturing.

Example

public static bool IsPrime(int number)
{
    int boundary = (int)Math.Floor(Math.Sqrt(number));
    if (number == 1) return false;
    if (number == 2) return true;
    for (int i = 2; i <= boundary; ++i)
    {
        if (number % i == 0) return false;
    }
    return true;
}

public void PrimeTest()
{
    var oregex = new ORegex<int>("{0}(.{0})*", IsPrime);
    var input = new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13};
    foreach (var match in oregex.Matches(input))
    {
        Trace.WriteLine(string.Join(",", match.Values));
    }
}

//OUTPUT:
//2
//3,4,5,6,7
//11,12,13
Comments