Jan Stankiewicz Jan Stankiewicz - 1 year ago 69
SQL Question

Join with multiple tables where at least 1 condition is met

Let's say I've got a main table 'Z' with sales, that contains Item_ID, Organisation_ID, Salesman_ID, Area_ID

I need to list items that were sold by one of Salesmen from table A or by Organisation in table B, Or in Area from table C. Area, Organisation and Salesmen are independent from each other.

I was able to create a query using SUBSELECTs in WHERE clause:

Select Z.Item_ID From Z WHERE
Z.Salesman_ID IN (SELECT Salesman_ID From A) OR
Z.Organisation_ID IN (SELECT Organisation_ID From B) OR
Z.Area_ID IN (SELECT Area_ID From C)

However, the performace is an issue, and I was told Joins are faster than Subselects. I can't wrap my mind around combining the joins in a way that would get the results of "OR"...

Sidenote: before posting I've read visual explaination of sql joins, and attempted to find answer to my question via Google (it's tricky to find the right answer based on very common keywords, tho).


A | B | C
S04 | O16 | A07
S12 | O01 | A08
S14 | |
S15 | |

Z table:

I01 S09 O12 A14
I02 S17 O20 A17
I03 S18 O20 A12
I04 S07 O15 A01
I05 S14 O03 A08
I06 S16 O18 A20
I07 S05 O03 A12
I08 S11 O14 A05
I09 S15 O19 A10
I10 S13 O07 A18
I11 S15 O08 A03
I12 S15 O14 A13
I13 S08 O16 A18
I14 S16 O19 A14
I15 S13 O18 A10
I16 S03 O12 A08
I17 S11 O09 A16
I18 S14 O17 A13
I19 S02 O13 A11
I20 S18 O10 A06
I21 S10 O05 A15
I22 S19 O20 A06
I23 S15 O16 A20
I24 S02 O07 A04

Expected result:


Answer Source

As with most performance related questions, it depends a bit on your table schema, indices, data, etc. With that said, often times exists and outer joins can be more efficient:

The should be the equivalent join approach you are referring to:

select z.item_id
from z
   left join a on z.salesman_id = a.salesman_id
   left join b on z.organisation_id = b.organisation_id
   left join c on z.area_id = c.area_id
where a.salesman_id is not null or
   b.organisation_id is not null or
   c.area_id is not null 

Depending on your data, you may also need to use distinct with this.

Edit: With your sample data, if your Id fields are unique per entity, then this solution using exists and union all will probably be the most efficient with an index on (z.salesman_id, z.organisation_id, z.area_id):

select item_id
from z
where exists (
    select 1
    from (  
        select salesman_id id from a union all
        select organisation_id from b union all
        select area_id from c 
    ) t
    where t.id in (z.salesman_id, z.organisation_id, z.area_id)
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download