Sam Thornton Sam Thornton - 1 month ago 8
SQL Question

Split SQL Column into Multiple Rows by Regex Match

I'm in the middle of converting an NTEXT column into multiple records. I'm looking to split the original column by new line or json object. It's a unique scenario, to be sure, but outside of a sql environment this regex correctly matches everything I need from the original column:

({(.*)(.*\r\n)*?})|(.+\r\n)
.

If I have a record with a column that has the value:

Foo bar baz
hello world
{
foo: 'bar',
bar: 'foo'
}
{
foo: 'foo',
bar: 'bar'
}


I want to break it into multiple records:

| ID | Text |
---------------------
| 1 | Foo bar baz |

| 2 | hello world |

| 3 | { |
| | foo: 'bar' |
| | bar: 'foo' |
| | } |

| 4 | { |
| | foo: 'foo' |
| | bar: 'bar' |
| | } |


Any easy way to accomplish this? It's a SQL Express server.

Answer

With the help of a split/parse function

Declare @String varchar(max)='Foo bar baz
hello world
{
  foo: ''bar'',
  bar: ''foo''
}
{
  foo: ''foo'',
  bar: ''bar''
}'

Select ID=Row_Number() over (Order By (Select NULL))
      ,Text = B.RetVal
 From (Select RetSeq,RetVal = IIF(CharIndex('}',RetVal)>0,'{'+RetVal,RetVal) from [dbo].[udf-Str-Parse](@String,'{')) A
 Cross Apply (
               Select * from [dbo].[udf-Str-Parse](A.RetVal,IIF(CharIndex('{',A.RetVal)>0,char(1),char(10)))
             ) B
 Where B.RetVal is Not Null

Returns

ID  Text
1   Foo bar baz
2   hello world
3   {
     foo: 'bar',
     bar: 'foo'
    }

4   {
     foo: 'foo',
     bar: 'bar'
    }

The UDF if needed

CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    Select RetSeq = Row_Number() over (Order By (Select null))
          ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
    From (Select x = Cast('<x>'+ Replace(@String,@Delimiter,'</x><x>')+'</x>' as xml).query('.')) as A 
    Cross Apply x.nodes('x') AS B(i)
);
--Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
--Performance On a 5,000 random sample -8K 77.8ms, -1M 79ms (+1.16), -- 91.66ms (+13.8)