Varoo Varoo - 8 days ago 6
SQL Question

Comparing two tables that doesn't have unique key

I need to compare two tables data and check which attributed are mismatching, tables have same table definition, but the problem is i dint have a unique key to compare. tried to use

CONCAT( CONCAT( CONCAT( table1.A,Table1.B))
=CONCAT( CONCAT( CONCAT( table2.A,Table2.B))


but still facing duplicate rows also tried NVL on few columns but didnt worked

SELECT
UT.cat,
PD.cat
FROM EM UT,EM_63 PD WHERE
NVL(UT.cat,1)=NVL(PD.cat,1) AND
NVL(UT.AT_NUMBER,1)=NVL(PD.AT_NUMBER,1)AND
NVL(UT.OFFSET,1)=NVL(PD.OFFSET,1)
and NVL(UT.PROD,1)=NVL(PD.PROD,1)
;


there are 34k records in one table 35k records in another table, but if i run the above query,the count of rows is 3 million.

Columns in table:
COUNTRY
CATEGORY
TYPE
DESCRIPTION



sample data :
table 1 :
COUNTRY CATEGORY TYPE DESCRIPTION
US C T1 In
IN A T2 OUT
B C T2 IN
Y C T1 INOUT

table 2:
COUNTRY CATEGORY TYPE DESCRIPTION
US C T2 In
IN B T2 Out
Q C T2 IN

expected output:
coulumn Matched unmatched
COUNTRY 2 1
CATEGORY 2 1
TYPE 2 1
DESCRIPTION 3 0

Answer

In the most general case (when you may have duplicate rows, and you want to see which rows exist in one table but not in the other, and ALSO which rows may exist in both tables, but the row exists 3 times in the first table but 5 times in the other):

This is a very common problem with a settled "best solution" which for some reason it seems most people are still not aware of, even though it was developed on AskTom many years ago and has been presented numerous times.

You do NOT need a join, you do not need a unique key of any kind, and you don't need to read either table more than once. The idea is to add two columns to show from which table each row comes, do a UNION ALL, then GROUP BY all the columns except the "source" columns and show the count for each table. Something like this:

select   count(t_1) as count_table_1, count(t_2) as count_table_2, col1, col2, ...
from     (
           select 'x' as t_1, null as t_2, col1, col2, ... 
             from table_1
           union all
           select null as t_1, 'x' as t_2, col1, col2, ...
             from table_2
         )
group by col1, col2, ...
having   count(t_1) != count(t_2)
;