Drammy Drammy - 21 days ago 7
SQL Question

Most performant way in SQL Server to condense multiple data changes into before and after values

I have a SQL Server database with some audit records showing changes to a third party database (OpenEdge). I have no control over the structure of the audit data, nor the way the third party database audits data changes. So I'm left with, for example, the following data...

source dataset

If you follow the first five rows you can see they all belong to TransId 1532102 (represents a database transaction) where the TransSeq represents a database action within a single transaction.

In the columns prefix

New
the audit changes are visible. If the value is NULL then no change to that field took place.

Looking at the data you can see that where TransId = 1532102 the PrimaryIdentifier is changed from 2 to -2 (row 1), then from -2 to 3 (row 3), then from 3 to 4 (row 4) and finally from 4 to 5 (row 5). You might also notice that when the PrimaryIdentifier changes from 3 to 4 the SecondaryIdentifier changes from 'abcd' to 'efgh' (row 4).
So these multiple changes are actually only occurring on a single source record. So with this in mind rows 1, 3, 4 & 5 can all be condensed into a single row (see below)

enter image description here

Ultimately there are only two record changes in TransId 1532102..

enter image description here

I need to translate these changes into a single UPDATE statement on a target database. In order to do this I need to ensure I have a single record showing the before and after values.

So given the source data presented here I need to produce the following data set..

required dataset

What query structures could I use to achieve this? I was thinking recursive CTEs or perhaps using Hierarchical structures?
Ultimately I need this to perform as well as possible so I wanted to pose the question here in case I hadn't considered all possible approaches.

Thoughts welcome and here's a script for the sample data

DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT @TestTable
SELECT 128, 1532102, 0, 2, 'abcd', -2, NULL, NULL, 'test data'
UNION SELECT 128, 1532102, 1, 3, 'abcd', 2, NULL, NULL, NULL
UNION SELECT 128, 1532102, 2, -2, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532102, 3, 3, 'abcd', 4, 'efgh', NULL, NULL
UNION SELECT 128, 1532102, 4, 4, 'efgh', 5, NULL, 2, NULL
UNION SELECT 128, 1532106, 0, 3, 'abcd', -3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 1, 4, 'abcd', 3, NULL, NULL, NULL
UNION SELECT 128, 1532106, 2, -3, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 0, 4, 'abcd', -4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 1, 5, 'abcd', 4, NULL, NULL, NULL
UNION SELECT 128, 1532110, 2, -4, 'abcd', 5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 0, 5, 'abcd', -5, NULL, NULL, NULL
UNION SELECT 128, 1532114, 1, 4, 'abcd', 5, NULL, 1, NULL
UNION SELECT 128, 1532114, 2, -5, 'abcd', 4, NULL, NULL, 'some more test data'

SELECT *
FROM @TestTable


EDIT:
I've actually been unable to write any queries that successfully track the identifier changes. Can anyone help - I need a query that tracks the changes in PrimaryIdentifier values and ultimately provides a single record for each tracking with start values and end values.

EDIT 2:
There's been a deleted answer that suggests the update to the key identifiers is not possible when condensed and that I should step through the changes instead. I thought it would be valuable to add my comments for further info to the question..

I need to condense the dataset because of the volume of audit records being generated; most of which are unecessary because of the way the source DBMS makes its changes. I need to reduce the dataset and I need to track key identifier changes. The update should be possible without clashing on id change during the update statement - see this example.

Answer

I assume that
1) (PrimaryIdentifier, SecondaryIdentifier) is a PK of the target table,
2) Every transacton in the audit table leaves target table in a consistent state. So the update of the PK in a single statement for every transaction using case will run OK:

declare @t table (id int primary key, old int);
insert @t(id, old) values (4,4),(5,5);
update @t set id = case id 
     when 4 then 5 
     when 5 then 4 end;
select * from @t;

The plan is 1. Condense transactions 2. Generate update sql into temp table. Then you can run all or selected items from the temp table. Every item is of the form

UPDATE myTable SET 
         PrimaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 5 
                                  WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 2 END,  
        SecondaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'efgh' 
                                   WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 'abcd' END , 
        Level= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 2 
                    WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN  Level  END , 
        Value= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'test data' 
                    WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN  Value  END
WHERE 1=2 OR (PrimaryIdentifier=2 AND SecondaryIdentifier='abcd') 
          OR (PrimaryIdentifier=3 AND SecondaryIdentifier='abcd')

The query

DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT  @TestTable
        SELECT 128, 1532102, 0,  2, 'abcd', -2, NULL,   NULL,   'test data'
UNION   SELECT 128, 1532102, 1,  3, 'abcd',  2, NULL,   NULL,   NULL
UNION   SELECT 128, 1532102, 2, -2, 'abcd',  3, NULL,   NULL,   NULL
UNION   SELECT 128, 1532102, 3,  3, 'abcd',  4, 'efgh', NULL,   NULL
UNION   SELECT 128, 1532102, 4,  4, 'efgh',  5, NULL,   2,      NULL
UNION   SELECT 128, 1532106, 0,  3, 'abcd', -3, NULL,   NULL,   NULL
UNION   SELECT 128, 1532106, 1,  4, 'abcd',  3, NULL,   NULL,   NULL
UNION   SELECT 128, 1532106, 2, -3, 'abcd',  4, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 0,  4, 'abcd', -4, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 1,  5, 'abcd',  4, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 2, -4, 'abcd',  5, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 3,  5, 'abcd',  6, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 4,  6, 'abcd',  5, NULL,   NULL,   NULL
UNION   SELECT 128, 1532114, 0,  5, 'abcd', -5, NULL,   NULL,   NULL
UNION   SELECT 128, 1532114, 1,  4, 'abcd',  5, NULL,   1,      NULL
UNION   SELECT 128, 1532114, 2, -5, 'abcd',  4, NULL,   NULL,   'some more test data'
;
WITH root AS (
    -- Top parent updates within transactions
    SELECT SyncId, TransId, TransSeq, PrimaryIdentifier AS rPrimaryIdentifier, SecondaryIdentifier AS rSecondaryIdentifier, 
    NewPrimaryIdentifier, 
    coalesce(NewSecondaryIdentifier, SecondaryIdentifier) AS NewSecondaryIdentifier,
    newLevel, NewValue
    FROM  @TestTable t
    WHERE NOT EXISTS (SELECT 1 
                   FROM  @TestTable t2 
                   WHERE t2.SyncId=t.SyncId AND t2.TransId = t.TransId
                       AND t2.TransSeq < t.TransSeq 
                       AND t.PrimaryIdentifier = t2.NewPrimaryIdentifier
                       AND t.SecondaryIdentifier = coalesce(t2.NewSecondaryIdentifier, t2.SecondaryIdentifier) 
                   )
    -- recursion to track the chain of updates
    UNION ALL
    SELECT root.SyncId, root.TransId, t.TransSeq, rPrimaryIdentifier, rSecondaryIdentifier,
         t.NewPrimaryIdentifier,
         coalesce(t.NewSecondaryIdentifier, root.NewSecondaryIdentifier),
         coalesce(root.NewLevel, t.NewLevel), coalesce(root.NewValue, t.NewValue)
    FROM root 
    JOIN @TestTable t ON root.SyncId=t.SyncId AND root.TransId = t.TransId
                       AND root.TransSeq < t.TransSeq 
                       AND t.PrimaryIdentifier = root.NewPrimaryIdentifier
                       AND t.SecondaryIdentifier = root.NewSecondaryIdentifier

)
,condensed as (
    -- last update in the chain
    SELECT TOP(1) WITH TIES *  
    FROM root
    ORDER BY row_number() over (partition by SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier 
                                order by TransSeq desc)
)
-- generate sql
SELECT SyncId, TransId, sql = 'UPDATE myTable SET PrimaryIdentifier = CASE'

    + (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20)) 
             +' AND SecondaryIdentifier=''' + rSecondaryIdentifier 
             +''' THEN ' + CAST(NewPrimaryIdentifier as varchar(20))             
        FROM condensed c2 
        WHERE c1.SyncId = c2.SyncId AND  c1.TransId= c2.TransId
        FOR XML PATH('') ) 
    + ' END,  SecondaryIdentifier = CASE'
    + (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20)) 
             +' AND SecondaryIdentifier=''' + rSecondaryIdentifier
             +''' THEN ''' + NewSecondaryIdentifier + ''''
        FROM condensed c2 
        WHERE c1.SyncId = c2.SyncId AND  c1.TransId= c2.TransId
        FOR XML PATH('') )
    + ' END , Level= CASE'
    + (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20)) 
             +' AND SecondaryIdentifier=''' + rSecondaryIdentifier
             +''' THEN ' 
             + CASE WHEN NewLevel IS NULL THEN ' Level ' ELSE CAST(NewLevel  as varchar(20)) END 
        FROM condensed c2 
        WHERE c1.SyncId = c2.SyncId AND  c1.TransId= c2.TransId
        FOR XML PATH('') )
    + ' END , Value= CASE'
    + (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20)) 
             +' AND SecondaryIdentifier=''' + rSecondaryIdentifier
             +''' THEN ' 
             + CASE WHEN NewValue IS NULL THEN ' Value ' ELSE '''' + NewValue + '''' END 
        FROM condensed c2 
        WHERE c1.SyncId = c2.SyncId AND  c1.TransId= c2.TransId
        FOR XML PATH('') )
     + ' END'
     + ' WHERE 1=2'
     + (SELECT ' OR (PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20)) 
         +' AND SecondaryIdentifier=''' + rSecondaryIdentifier +''')'
    FROM condensed c2 
    WHERE c1.SyncId = c2.SyncId AND  c1.TransId= c2.TransId
    FOR XML PATH('') )
INTO #UpdSql    
FROM condensed c1 
GROUP BY SyncId, TransId


SELECT * 
FROM #UpdSql
ORDER BY SyncId, TransId