tember tember - 2 months ago 6
SQL Question

Group by with left outer join, exclude nulls

I have an order table and a trip table that includes payment information. It is a many to many relationship - order can split across many trips and one trip could have payment info for a couple of orders and sometimes an order. There is no "Zero" record in the Trips table - so a left join with that value as the key will return a NULL record. I am using SQL 2012

Order table
+----+----------+--------------+
| order_id | trip_nbr | veh_id |
+----+----------+--------------+
| 1 | 12 | 3 |
| 2 | 22 | 6 |
| 2 | 0 | 8 |
| 4 | 25 | 7 |
| 7 | 0 | 11 |
+----+----------+--------------+

Trips table
+----+------------+--------------+
| trip_nbr | payment | veh_id |
+----+------------+--------------+
| 12 | 20.00 | 3 |
| 22 | 123.00 | 6 |
| 22 | 12.50 | 6 |
| 25 | 133.33 | 7 |
+----+------------+--------------+


Here is my query:

select o.order_id,
t.trip_nbr,
sum(t.payment_amt)
from orders o
left outer join trips t
on o.trip_nbr = t.trip_nbr
group by o.order_id,
t.trip_nbr


Results:

+----+----------+--------------+
| order_id | trip_nbr | sum |
+----+----------+--------------+
| 1 | 12 | 20.00 |
| 2 | 22 | 135.50 |
| 2 | NULL | NULL |
| 4 | 25 | 133.33 |
| 7 | NULL | NULL |
+----+----------+--------------+


The problem is, I get lots of information from the orders table and just the payment info from the Trips table. So I don't want to exclude any order records (which would happen if I add the clause "WHERE t.trip_nbr is NOT NULL") - but I don't want to get 2 records in my grouping - one for t.trip_nbr is NULL and one where it finds a match.

Desired results:

+----+----------+--------------+
| order_id | trip_nbr | sum |
+----+----------+--------------+
| 1 | 12 | 20.00 |
| 2 | 22 | 135.50 |
| 4 | 25 | 133.33 |
| 7 | NULL | NULL |
+----+----------+--------------+


I want the unmatched record order_id = 2 to be "summarized away" - but keep the lone record for order_id = 7. The reason is that this table is later join with another table and the extra NULL records are creating duplicates.

Answer

This should work:

WITH orders2 AS
(
    SELECT  *,
            N = SUM(CASE WHEN trip_nbr <> 0 THEN 1 ELSE 0 END) OVER(PARTITION BY order_id)
    FROM orders
)
SELECT  o.order_id,  
        t.trip_nbr,
        SUM(t.payment_amt)
FROM orders2 o
LEFT OUTER JOIN trips t
    ON o.trip_nbr = t.trip_nbr
WHERE N = 0 OR (N > 1 AND o.trp_nbr <> 0)
GROUP BY o.order_id,
         t.trip_nbr;
Comments