divya.trehan573 divya.trehan573 - 1 month ago 10
SQL Question

Query in which Start_date of two tables do not match in sql

I have two tables

per_asg
and
xx_asg
, they have same no. of columns and should ideally have same no. of data.

per_asg:

person_id start_date end_date
-------------------------------------------
1 01-jan-2016 07-jan-2016
1 02-feb-2016 08-march-2016


xx_per_asg

person_id start_date end_date
-------------------------------------
1 01-jan-2016 07-jan-2016
1 02-feb-2016 08-march-2016
1 03-feb-2016 04-sep-2016


As seen on xx_per_asg there is one extra row with start_date
'03-feb-2016'
and end_date
'04-sep-2016'.
I wrote a query to fetch such rows but I am not getting the extra rows completly :

select start_date
from xx_per_asg xx_per_asg, per_asg pa
where xx_per_asg.person_id = pa.person_id
and xx_per_asg.start_date <> pa.start_date


But this will still the entire set of data

Answer

If each table may have rows that are not in the other table, and you need to find both kinds, something like this should work. Assuming neither tables has duplicate rows (for example, that would be true if they each have primary keys), then if you do a union all of the two tables, the rows that exist in both tables will be duplicates in the union. Those that are not in both tables will not be duplicated. So after the union all we can group by all columns and use a HAVING COUNT(*) = 1 condition to find the rows that are only in one table but not in the other.

A small tweak to this will also tell us which table has the "unpaired" row, for each row. In the solution, I use max(source) (because we shouldn't group by "source", so we need to use an aggregate function on it), but it's really a max() over a single value; it will just be that value.

This solution is efficient, because it does not use joins and it does not use an expensive MINUS operation (or, actually, two MINUS operations and also reading each table twice instead of once, if the assignment was to find unpaired rows from both tables, not just from one).

select max(source), person_id, start_date, end_date
from   ( select 'per_asg'    as source, person_id, start_date, end_date from per_asg
         union all
         select 'xx_per_asg' as source, person_id, start_date, end_date from xx_per_asg
       )
group by person_id, start_date, end_date
having count(*) = 1
order by person_id, start_date, end_date    --  ORDER BY is optional
;

One more thing - if you compare by start_date, you must be sure all dates are "pure dates" (with no time-of-day component); that is, the time component should be 00:00:00. Is that the case? If it isn't, all solutions will need to be adjusted (and will become less efficient, that is more time-consuming).