dgills dgills - 3 months ago 24
C# Question

CLR function to parse MDX string

I'm looking to be able to take an MDX string like this:

[Event Product].[Event Category Filter].[Category Group].&[E].&[F].&[G].&[H]


And to use regex to individually parse out the first, second and third strings between the square brackets.

So, ideally three custom regex expressions that would return:

Event Product

Event Category Filter

Category Group


The fourth grouping and onward can be optional (although the first three will always be present). It would be great to have a fourth expression capable of returning every subsequent match after in a delimited one-liner, so something like:

E,F,G,H


I've captured the first group:

((?<=\[)(.*?)(?=\]))


However I'm struggling with the nth exclusions piece.

Would anyone be able to lend a hand here? :)

Edit:

Should've mentioned this upfront, we're using regex assembly script and doing this in vanilla T-SQL.

https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/

A C# solution is what I'd ultimately want to use, but a pure regex solution for the time being would work best. I work mostly with a non-dot net stack so we don't have a full tool kit at our disposal as far as our ETL goes, unfortunately.

Answer

If you create a C# class like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections;

namespace ExtractMdxParts
{
    public partial class UserDefinedFunctions
    {
        public class EventData
        {
            public SqlString Product;
            public SqlString CategoryFilter;
            public SqlString Group;
            public SqlString ExtraData;
        }

        [Microsoft.SqlServer.Server.SqlFunction(
        FillRowMethodName = "FillRow",
        TableDefinition = "Product nvarchar(128), CategoryFilter nvarchar(128), Group nvarchar(128), Extradata nvarchar(MAX)",
        IsDeterministic = true)]

        public static IEnumerable ExtractParts([SqlFacet(MaxSize = -1)] String MdxString)
        {
            string[] parts = MdxString.Split(".".ToCharArray(), 4, StringSplitOptions.None);
            if (parts.Length < 3)
            {
                return null;
            }

            List<EventData> x = new List<EventData> { };
            char[] trimChars = "[]".ToCharArray();
            EventData y = new EventData { Product = parts[0].Trim(trimChars), CategoryFilter = parts[1].Trim(trimChars), Group = parts[2].Trim(trimChars) };

            if (parts.Length == 4)
            {
                y.ExtraData = string.Join(",", parts[3].Split(".".ToCharArray()).Select(p => p.Substring(1).Trim(trimChars)));
            }

            x.Add(y);

            return x;

        }

        public static void FillRow(object eventData, out SqlString product, out SqlString categoryFilter, out SqlString group, out SqlString extraData)
        {
            //I'm using here the EventData class defined above
            EventData ed = (EventData)eventData;
            product = new SqlString(ed.Product.ToString());
            categoryFilter = new SqlString(ed.CategoryFilter.ToString());
            group = new SqlString(ed.Group.ToString());
            extraData = new SqlString(ed.ExtraData.ToString());
        }

    }

}

And add it into SQL Server with

use [testing]
go

CREATE ASSEMBLY ExtractMdxParts
   FROM 'C:\Your\Path\Here\ExtractMdxParts.dll'
   WITH PERMISSION_SET = EXTERNAL_ACCESS

GO

CREATE FUNCTION ExtractParts (@MdxString nvarchar(MAX))
   RETURNS TABLE (Product nvarchar(128),
                    CategoryFilter nvarchar(128),
                    [Group] nvarchar(128),
                    ExtraData nvarchar(MAX)
                    )
   AS EXTERNAL NAME [ExtractMdxParts].[ExtractMdxParts.UserDefinedFunctions].ExtractParts
GO

where 'testing' is the name of your database and you do all the fiddly bits to allow CLR procedures...

then you can do things like

SELECT * FROM dbo.ExtractParts(N'[a].[b].[c].&[e].&[f]')

and get a table like

Product CategoryFilter  Group   ExtraData
-----------------------------------------
a       b               c       e,f

Important parts I had to use to make it work:

  • using System.Collections; so that you can public static IEnumerable ExtractParts...
  • new SqlString(... may not be vital, but I was getting desperate with "because T-SQL and CLR types for ... do not match"
  • I didn't go the way of a signed assembly because VS wouldn't let me sign the assembly and didn't give a meaningful error message to me so I did the brute-force ALTER AUTHORIZATION ON DATABASE::testing TO [computername\accountname]. Use signing: I'm only writing this as an answer to the question; you will have more time to shout at VS without the neighbours complaining.
  • another possibly-not-required thing is [SqlFacet(MaxSize = -1)] which comes out of something written by Adam Machanic. But if he writes it, it is very probably a Good Idea.

Oh... and I used SQL Server 2014 in conjuction with .NET 4.5.2 in VS2015 Community Edition.

I tested it as working with the following data:

[].[bhgshsfhsf].[chhhhhhhhhhhhhhhhhhhh].&[qwert].&[asd].&[tyu].&[].&vbncmzxvb] [a134141].[bhgshsfhsf].[chhhhhhhhhhhhhhhhhhhh].&[qwert].&[asd].&[tyu].&[].&vbncmzxvb] [a134141].[bhgshsfhsf].[chhhhhhhhhhhhhhhhhhhh] [a].[b].[c]

It returns NULL for the ExtraData column if there is no extra data.

It fails if NULL is passed to it, and gives an empty string for every column if an empty string is passed to it. This may or may not be a problem to you.

Reference: Introduction to SQL Server CLR table valued functions