user2908901 user2908901 - 6 months ago 15
SQL Question

SQL query regex

I have TableA with ColumnA. ColumnA is VARCHAR that contains values like:

#1002#15#
#1002#16#
#1003#17#
#1003#17#16#
#1004#18#
#1004#18#3#
#1004#18#3#1155#


I need to find repeating numbers in ColumnA, .e.g:

#1002#15#1002#
#1004#18#3#1004#
#1003#17#17#


I tried number of LIKE patterns as described here:
https://technet.microsoft.com/en-us/library/ms187489(v=sql.105).aspx

to no avail so far primarily because those repeating digits can be 2-5 chars long..

Answer

Since normalizing the database is impossible in your case, you need a different approach.
One option is to use CROSS APPLY to a string splitting function that will take the content of your column and split it to rows, thus enabling you to to query the content of the column.

For this demonstration I've chosen the SplitStrings_XML from Aaron's article:

CREATE FUNCTION dbo.SplitStrings_XML
(
   @List       NVARCHAR(MAX),
   @Delimiter  NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
   RETURN 
   (  
      SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
      FROM 
      ( 
        SELECT x = CONVERT(XML, '<i>' 
          + REPLACE(@List, @Delimiter, '</i><i>') 
          + '</i>').query('.')
      ) AS a CROSS APPLY x.nodes('i') AS y(i)
   );
GO

Now, create and populate the sample table:

DECLARE @TableA as table
(
    id int identity(1,1),
    ColumnA varchar(100)
)

INSERT INTO @TableA VALUES
('#1002#15#'),
('#1002#16#'),
('#1003#17#'),
('#1003#17#16#'),
('#1004#18#'),
('#1004#18#3#'),
('#1004#18#3#1155#'),
('#1002#15#1002#'),
('#1004#18#3#1004#'),
('#1003#17#17#')

That makes the query quite simple:

SELECT id, ColumnA
FROM @TableA 
CROSS APPLY dbo.SplitStrings_XML(ColumnA, '#')
WHERE Item IS NOT NULL
GROUP BY id, ColumnA
HAVING COUNT(Item) > COUNT(DISTINCT Item)

Results:

id          ColumnA
----------- -----------------
8           #1002#15#1002#
9           #1004#18#3#1004#
10          #1003#17#17#
Comments