higy higy - 7 months ago 19
SQL Question

SQL Compare pairs of fields and append matches to another field

We have a large database, and have imported old, archived data from an old schema. We have created two queries from the new and old data (A and B) and want to 'flag' the fields that have a different values for review.

We have set up a third query, and have manually matched fields up like so:

CREATE VIEW vuAB_COMPARISON AS
SELECT
A.1
,B.1
,A.2
,B.2
,A.3
,B.3
...
...
...
,CASE
WHEN A.PK = B.PK
THEN 'RecordMatch'
WHEN A.PK <> B.PK
THEN 'RecordNoMatch'
END AS ComparisonCheck

FROM Table A
INNER JOIN Table B
ON A.PK = B.PK


What we'd like to do is append to the ComparisonCheck column all the individual fields that don't match. For example, if A.1<>B.1 then the ComparisonCheck column would be 'RecordMatch;Field1_NoMatch'

Similarly, if A.1 <> B.1 AND A.2 <> B.2 then the ComparisonCheck column would be 'RecordMatch;Field1_NoMatch;Field2_NoMatch' etc.

The only way I can think about approaching this is with a complicated nested case statement. However there are a few dozen fields to match so this would be tedius.

Is there a better solution? I've also thought about splitting this into two queries, but I'm not sure that get's me any closer to the solution.

Note I've simplified things a bit as in reality the join uses multiple fields, and not all records from queries A/B match, which is why the CASE statement is set up how it is.

Answer

UNPIVOT or CROSS APPLY can make this work easier.

Assume you have two tables below:

create table t1 (id int, c1 int, c2 int, c3 int);
create table t2 (id int, c1 int, c2 int, c3 int);

insert into t1 values(0,0,0,0);
insert into t1 values(1,1,1,1);
insert into t1 values(2,2,2,2);
insert into t1 values(3,3,3,3);
--insert into t1 values(4,null,3,3);
--insert into t1 values(5,null,3,3);
insert into t1 values(6,3,3,3); --t2's missing record

insert into t2 values(0,0,0,0); --match
insert into t2 values(1,1,2,1); --c2 mismatch
insert into t2 values(2,2,3,3); --c2,c3 mismatch
insert into t2 values(3,2,2,2); --all mismatch
--insert into t2 values(4,null,3,3);
--insert into t2 values(5,3,null,3);
insert into t2 values(7,2,2,2); --t1's missing record

Compare Steps:

  • Unpivot t1 and t2, transfer columns to rows by generating c_name column.
  • Full outer join from two vertical tables
    • Compare ids can figure out the missing Recoreds
    • Compare values can figure out the unmatched columns

with 
vt1 as 
  (select id, c_name , v1
   from t1
   unpivot (v1 for c_name in ([c1],[c2],[c3]) ) as t),
vt2 as 
  (select id, c_name , v2
   from t2
   unpivot (v2 for c_name in ([c1],[c2],[c3]) ) as t)
select distinct coalesce(vt1.id,vt2.id) as id
  ,case when vt1.id is null and vt2.id is not null
        then 'RecordT1Missing' 
        when vt2.id is null and not vt1.id is null
        then 'RecordT2Missing'
        else 'RecordMatch' end as RecordMatchType
  ,case when v1<>v2 then vt1.c_name + '_NoMatch'
        else '' end as ColumnMathType
from vt1
full outer join vt2
on vt1.id=vt2.id and vt1.c_name = vt2.c_name
;

SQLFiddle

Here is the output, not the format as described in the question, but there are many ways to pivot back to other readable format, that work could be left for the OP:

ID  RECORDMATCHTYPE COLUMNMATHTYPE
 0      RecordMatch 
 1      RecordMatch 
 1      RecordMatch    c2_NoMatch
 2      RecordMatch    
 2      RecordMatch    c2_NoMatch
 2      RecordMatch    c3_NoMatch
 3      RecordMatch    c1_NoMatch
 3      RecordMatch    c2_NoMatch
 3      RecordMatch    c3_NoMatch
 6  RecordT2Missing 
 7  RecordT1Missing 

NOTE:

  • All data values in my sample defined as integer, but in real scenario may have different data types, should be convert into varchar before unpivot and compare.
  • Unfortunately SqlServer unpivot can not keep null values, so I commented all the data witin null values. You could also add conditions to handle with those null values, but I think the better solution is transfer null values to some unused values, unless you have further logic to deal with the null values.

UPDATE:

With @ErikE's help, use CROSS APPLY can handle null values more easily.

with 
vt1 as 
  (select id, c_name , v1
   from t1
   CROSS APPLY (VALUES ('c1',c1),('c2',c2),('c3',c3))
            CrossApplied (c_name, v1) ),
vt2 as 
  (select id, c_name , v2
   from t2
   CROSS APPLY (VALUES ('c1',c1),('c2',c2),('c3',c3))
            CrossApplied (c_name, v2) )
select distinct coalesce(vt1.id,vt2.id) as id
  ,case when vt1.id is null and vt2.id is not null
        then 'RecordT1Missing' 
        when vt2.id is null and not vt1.id is null
        then 'RecordT2Missing'
        else 'RecordMatch' end as RecordMatchType
  ,case when v1<>v2 then vt1.c_name + '_NoMatch'
        when v1 is null and v2 is not null and vt1.id is not null then vt1.c_name + '_T1Missing'
        when v2 is null and v1 is not null and vt2.id is not null then vt1.c_name + '_T2Missing'
        else '' end as ColumnMathType
from vt1
full outer join vt2
on vt1.id=vt2.id and vt1.c_name = vt2.c_name
;

With those data have null values, will get the result below:

ID  RECORDMATCHTYPE COLUMNMATHTYPE
 0      RecordMatch     
 1      RecordMatch     
 1      RecordMatch       c2_NoMatch
 2      RecordMatch       
 2      RecordMatch       c2_NoMatch
 2      RecordMatch       c3_NoMatch
 3      RecordMatch       c1_NoMatch
 3      RecordMatch       c2_NoMatch
 3      RecordMatch       c3_NoMatch
 4      RecordMatch     
 5      RecordMatch     
 5      RecordMatch     c1_T1Missing
 5      RecordMatch     c2_T2Missing
 6  RecordT2Missing     
 7  RecordT1Missing     

It can figure out the missing data in which column and which table of ID 5.