dgills dgills - 4 months ago 7
C# Question

Regex match text at Nth occurence between a set of characters/a given pattern

I'm looking to be able to take an MDX string like this:

[Event Product].[Event Category Filter].[Category Group].&[E].&[F].&[G].&[H]


And to use regex to individually parse out the first, second and third strings between the square brackets.

So, ideally three custom regex expressions that would return:

Event Product

Event Category Filter

Category Group


The fourth grouping and onward can be optional (although the first three will always be present). It would be great to have a fourth expression capable of returning every subsequent match after in a delimited one-liner, so something like:

E,F,G,H


I've captured the first group:

((?<=\[)(.*?)(?=\]))


However I'm struggling with the nth exclusions piece.

Would anyone be able to lend a hand here? :)

Edit:

Should've mentioned this upfront, we're using regex assembly script and doing this in vanilla T-SQL.

https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/

A C# solution is what I'd ultimately want to use, but a pure regex solution for the time being would work best. I work mostly with a non-dot net stack so we don't have a full tool kit at our disposal as far as our ETL goes, unfortunately.

Answer

If you create a C# class like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections;

namespace ExtractMdxParts
{
    public partial class UserDefinedFunctions
    {
        public class EventData
        {
            public SqlString Product;
            public SqlString CategoryFilter;
            public SqlString Group;
            public SqlString ExtraData;
        }

        [Microsoft.SqlServer.Server.SqlFunction(
        FillRowMethodName = "FillRow",
        TableDefinition = "Product nvarchar(128), CategoryFilter nvarchar(128), Group nvarchar(128), Extradata nvarchar(MAX)",
        IsDeterministic = true)]

        public static IEnumerable ExtractParts([SqlFacet(MaxSize = -1)] String MdxString)
        {
            string[] parts = MdxString.Split(".".ToCharArray(), 4, StringSplitOptions.None);
            if (parts.Length < 3)
            {
                return null;
            }

            List<EventData> x = new List<EventData> { };
            char[] trimChars = "[]".ToCharArray();
            EventData y = new EventData { Product = parts[0].Trim(trimChars), CategoryFilter = parts[1].Trim(trimChars), Group = parts[2].Trim(trimChars) };

            if (parts.Length == 4)
            {
                y.ExtraData = string.Join(",", parts[3].Split(".".ToCharArray()).Select(p => p.Substring(1).Trim(trimChars)));
            }

            x.Add(y);

            return x;

        }

        public static void FillRow(object eventData, out SqlString product, out SqlString categoryFilter, out SqlString group, out SqlString extraData)
        {
            //I'm using here the FileProperties class defined above
            EventData ed = (EventData)eventData;
            product = new SqlString(ed.Product.ToString());
            categoryFilter = new SqlString(ed.CategoryFilter.ToString());
            group = new SqlString(ed.Group.ToString());
            extraData = new SqlString(ed.ExtraData.ToString());
        }

    }

}

And add it into SQL Server with

use [testing]
go

CREATE ASSEMBLY ExtractMdxParts
   FROM 'C:\Your\Path\Here\ExtractMdxParts.dll'
   WITH PERMISSION_SET = EXTERNAL_ACCESS

GO

CREATE FUNCTION ExtractParts (@MdxString nvarchar(MAX))
   RETURNS TABLE (Product nvarchar(128),
                    CategoryFilter nvarchar(128),
                    [Group] nvarchar(128),
                    ExtraData nvarchar(MAX)
                    )
   AS EXTERNAL NAME [ExtractMdxParts].[ExtractMdxParts.UserDefinedFunctions].ExtractParts
GO

where 'testing' is the name of your database and you do all the fiddly bits to allow CLR procedures...

then you can do things like

SELECT * FROM dbo.ExtractParts(N'[a].[b].[c].&[e].&[f]')

and get a table like

Product CategoryFilter  Group   ExtraData
-----------------------------------------
a       b               c       e,f

Important parts I had to use to make it work:

  • using System.Collections; so that you can public static IEnumerable ExtractParts...
  • new SqlString(... may not be vital, but I was getting desperate with "because T-SQL and CLR types for ... do not match"
  • I didn't go the way of a signed assembly because VS wouldn't let me sign the assembly and didn't give a meaningful error message to me so I did the brute-force ALTER AUTHORIZATION ON DATABASE::testing TO [computername\accountname]. Use signing: I'm only writing this as an answer to the question; you will have more time to shout at VS without the neighbours complaining.
  • another possibly-not-required thing is [SqlFacet(MaxSize = -1)] which comes out of something written by Adam Machanic. But if he writes it, it is very probably a Good Idea.

Oh... and I used SQL Server 2014 in conjuction with .NET 4.5.2 in VS2015 Community Edition.