adam kim adam kim - 5 months ago 8
SQL Question

Join t1 on a, b, and c but only join on c if c is not null

I want some t-sql syntactically correct version of what is below (hopefully what is bellow is human readable)

select* from table1
join table2
on table1.a = table2.a
and table1.b = table2.b
and table1.c =
when table1.c IS NOT NULL THEN table2.c
when table1.c IS NULL THEN --dont join on column c, just use the other joins
END

I tried the syntax below but it doesn't seem to work (any explanation of why this is incorrect is also appreciated)

select* from table1
join table2
on table1.a = table2.a
and table1.b = table2.b
and table1.c =
when table1.c IS NOT NULL THEN table2.c
when table1.c IS NULL THEN null --this line is just guessing, but it doesnt work
end

Answer
SELECT
    T1.my_column,
    T2.my_other_column
FROM
    Table1 T1
INNER JOIN Table2 T2 ON
    T2.a = T1.a AND
    T2.b = T1.b AND
    (T2.c = T1.c OR T1.c IS NULL)

Alternatively, if T2.c can never be NULL:

SELECT
    T1.my_column,
    T2.my_other_column
FROM
    Table1 T1
INNER JOIN Table2 T2 ON
    T2.a = T1.a AND
    T2.b = T1.b AND
    T2.c = COALESCE(T1.c, T2.c)

I like the first in any case, due to readability.

EDIT: COALESCE returns the first non-null parameter pass into the function. So, if T1.c were NULL above, it would return T2.c. If T2.c is also NULL then it's going to try to evaluate NULL = NULL, which is going to be false.

You can get around this by doing COALESCE(T2.c, -1) = COALESCE(T1.c, T2.c, -1), but then SQL isn't going to be able to use indexes very well.

If you want to JOIN on them both being NULL then you can use the bit above, or just spell out each possible combination:

(
    (T2.c IS NULL AND T1.c IS NULL) OR
    T2.c = T1.c
)