pomber pomber - 3 years ago 57
SQL Question

Full join multiple tables on same columns

I have four tables:

╔═══╦════╗ ╔═══╦════╗ ╔═══╦════╗ ╔═══╦════╗
║ K ║ V1 ║ ║ K ║ V2 ║ ║ K ║ V3 ║ ║ K ║ V4 ║
╠═══╬════╣ ╠═══╬════╣ ╠═══╬════╣ ╠═══╬════╣
║ A ║ 1 ║ ║ B ║ 2 ║ ║ B ║ 3 ║ ║ C ║ 6 ║
║ C ║ 4 ║ ║ D ║ 7 ║ ║ C ║ 5 ║ ║ D ║ 8 ║
╚═══╩════╝ ╚═══╩════╝ ╚═══╩════╝ ╚═══╩════╝


I need to join them by the column K and select V1, V2, V3 and V4:

╔═══╦══════╦══════╦══════╦══════╗
║ K ║ V1 ║ V2 ║ V3 ║ V4 ║
╠═══╬══════╬══════╬══════╬══════╣
║ A ║ 1 ║ NULL ║ NULL ║ NULL ║
║ B ║ NULL ║ 2 ║ 3 ║ NULL ║
║ C ║ 4 ║ NULL ║ 5 ║ 6 ║
║ D ║ NULL ║ 7 ║ NULL ║ 8 ║
╚═══╩══════╩══════╩══════╩══════╝


One way to solve it is this:

DECLARE @T1 TABLE (K VARCHAR(1), V1 INT);
DECLARE @T2 TABLE (K VARCHAR(1), V2 INT);
DECLARE @T3 TABLE (K VARCHAR(1), V3 INT);
DECLARE @T4 TABLE (K VARCHAR(1), V4 INT);

INSERT INTO @T1 VALUES ('A', 1), ('C', 4);
INSERT INTO @T2 VALUES ('B', 2), ('D', 7);
INSERT INTO @T3 VALUES ('B', 3), ('C', 5);
INSERT INTO @T4 VALUES ('C', 6), ('D', 8);

SELECT COALESCE(t1.K, t2.K, t3.K, t4.K) [K], V1, V2, V3, V4
FROM @T1 t1
FULL JOIN @T2 t2 ON t2.K = t1.K
FULL JOIN @T3 t3 ON t3.K = t1.K OR t3.K = t2.K
FULL JOIN @T4 t4 ON t4.K = t1.K OR t4.K = t2.K OR t4.K = t3.K
ORDER BY 1;


But this works well only for this simple example, if I have more Ks (join conditions) and Vs, the join conditions end up being a lot more complex.

How can this query be improved?

Answer Source

There are several ways, but I think the following is the most popular:

select allk.k, t1.c1, t2.v2, t3.v3, t4.v4
from (select k from t1 union 
      select k from t2 union
      select k from t3 union
      select k from t4
     ) allk left outer join
     t1
     on allk.k = t1.k left outer join
     t2
     on allk.k = t2.k left outer join
     t3
     on allk.k = t3.k left outer join
     t4
     on allk.k = t4.k;

If you have a separate table of all the k values, you can use that instead of the allk subquery.

An easier way to write your full outer join query is to use coalesce():

SELECT COALESCE(t1.K, t2.K, t3.K, t4.K) [K], V1, V2, V3, V4
FROM @T1 t1 FULL JOIN
     @T2 t2
     ON t2.K = t1.K FULL JOIN
     @T3 t3
     ON t3.K = coalesce(t1.K, t2.K) FULL JOIN
     @T4 t4
     ON t4.K = coalesce(t1.K, t2.K, t3.k);

And the third method is the union all/group by method:

select k, max(v1) as v1, max(v2) as v2, max(v3) as v3, max(v4) as v4
from (select k, v1, NULL as v2, NULL as v3, NULL as v4
      from t1 union all
      select k, NULL as v1, v2, NULL as v3, NULL as v4
      from t2 union all
      select k, NULL as v1, NULL as v2, v3, NULL as v4
      from t3 union all
      select k, NULL as v1, NULL as v2, NULL as v3, v4
      from t4
     ) t
group by k;
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download