Reid McCamish Reid McCamish - 1 year ago 53
SQL Question

Why does this SQL query need DISTINCT?

I've written a query to filter a table based on criteria found in a master table, and then remove rows that match a third table. I'm executing the query in Access, so I can't use MINUS. It works, but I found that it returns duplicate rows for some, but not all, of the selected records. I fixed it with DISTINCT, but I don't know why it would return duplicates in the first place. It's a pretty simple query:

select distinct sq.*
(select List_to_Check.*, Master_List.SELECTION_VAR
from List_to_Check
left join Master_List
on List_to_Check.SUB_ID = Master_List.SUB_ID
where Master_List.SELECTION_VAR = 'criteria'
) as sq
left join List_to_Exclude
on sq.SUB_ID = List_to_Exclude.SUB_ID
where List_to_Exclude.SUB_ID is null

Edit: The relationships between all three tables are 1-to-1 on the SUB_ID var. Combined with using a LEFT JOIN, I would expect one line per ID.

Answer Source

I recommend breaking your query apart and checking for duplicates. My guess is that it's your data/ the sub_ID isn't very unique.

Start with you sub query since you're returning all of those columns. If you get duplicates there, your query is going to return duplicates regardless of what is in your exclusion table.

Once you have those duplicates cleared up, check the exclusion table for duplicate sub_Id.

To save time in trouble-shooting, if there are known culprits that are duplicates, you may want to limit the returned values, so you can focus on the peculiarities of those data.

I'm not sure this is a problem, but look into the logic on

on List_to_Check.SUB_ID = 
where Master_List.SELECTION_VAR = 'criteria'

Where clauses on data in the right side of a left outer join may not be returning the data you expect. Try this and see what happens:

on List_to_Check.SUB_ID = Master_List.SUB_ID
and Master_List.SELECTION_VAR = 'criteria'