gaffcz gaffcz - 4 months ago 8
SQL Question

REPLACE/REMOVE similar substrings from string

How to remove all the

\trxxx
substrings from following test string?

blabla\tr568\tr1136\tr1704\tr2272-\tr2840\tr3408\tr3976\tr4544\tr5112\tr5680blabla


To get
blabla-blabla
?

I've tried regex, but I failed:

select REPLACE('blabla\tr568\tr1136\tr1704\tr2272-\tr2840\tr3408\tr3976\tr4544\tr5112\tr5680blabla', '\tr[0-9]+', '')

Answer

SQL Server lacks Regex. You can use CLR integration (preferred if you have millions of records) or use recursive query. Example of recursive query with some test cases below:

WITH TestCases AS
(
    SELECT * FROM (VALUES
    ('blabla\tr568\tr1136\tr1704\tr2272-\tr2840\tr3408\tr3976\tr4544\tr5112\tr5680blabla'),
    ('\tr23SomeText\tr1'),
    ('bla99bla\tr568\tr1136\tr1704\tr2272-\tr2840\tr3408\tr3976\tr4544\tr5112\tr5680rock'),
    (''),
    (NULL)) T(Expr)
), Cte AS
(
    SELECT 1 R, ROW_NUMBER() OVER (ORDER BY Expr) Rec, CONVERT(varchar(MAX), Expr) Expr
    FROM TestCases
    UNION ALL
    SELECT R+1 R, Rec, CASE
        WHEN PATINDEX('%\tr[0-9][0-9][0-9][0-9]%', Expr)>0 THEN STUFF(Expr, PATINDEX('%\tr[0-9][0-9][0-9][0-9]%', Expr), 7, '')
        WHEN PATINDEX('%\tr[0-9][0-9][0-9]%', Expr)>0 THEN STUFF(Expr, PATINDEX('%\tr[0-9][0-9][0-9]%', Expr), 6, '')
        WHEN PATINDEX('%\tr[0-9][0-9]%', Expr)>0 THEN STUFF(Expr, PATINDEX('%\tr[0-9][0-9]%', Expr), 5, '')
        ELSE STUFF(Expr, PATINDEX('%\tr[0-9]%', Expr), 4, '') END
    FROM Cte
    WHERE PATINDEX('%\tr[0-9]%', Expr)>0
)
SELECT TOP 1 WITH TIES Expr FROM Cte
ORDER BY (ROW_NUMBER() OVER (PARTITION BY Rec ORDER BY R DESC))

It yields:

bla99bla-rock
blabla-blabla
NULL

SomeText