user3186332 - 4 years ago
SQL Question

Count columns of joined table

I am writing a query to summarize the data in a Postgres database:

SELECT products.id,
       products.name,
       product_types.type_name AS product_type,
       delivery_types.delivery,
       products.required_selections,
       count(s.id) AS selections_count,
       sum(CASE WHEN ss.status = 'WARNING' THEN 1 ELSE 0 END) AS warning_count
FROM   products
JOIN   product_types  ON product_types.id = products.product_type_id
JOIN   delivery_types ON delivery_types.id = products.delivery_type_id
LEFT   JOIN selections_products sp ON products.id = sp.product_id
LEFT   JOIN selections s           ON s.id = sp.selection_id
LEFT   JOIN selection_statuses ss  ON ss.id = s.selection_status_id
LEFT   JOIN listings l             ON s.listing_id = l.id
       AND l.local_date_time BETWEEN to_timestamp('2014/12/01', 'YYYY/MM/DD')
                                 AND to_timestamp('2014/12/30', 'YYYY/MM/DD')
GROUP  BY products.id,
       product_types.type_name,
       delivery_types.delivery


Basically we have a product with selections, these selections have listings, and the listings have a local_date. I need a list of all products and how many listings they have between the two dates. No matter what I do, I get a count of all selections (a total). I feel like I'm overlooking something. The same goes for warning_count. Also, I don't really understand why Postgres requires me to add a GROUP BY here.

The schema looks like this (the parts you would care about anyway):

products
  name:string, product_type:fk, required_selections:integer, delivery_type:fk

selections_products
  product_id:fk, selection_id:fk

selections
  selection_status_id:fk, listing_id:fk

selection_statuses
  status:string

listings
  local_date:datetime

Answer Source

The way you have it, you LEFT JOIN to all selections regardless of listings.local_date_time.

There is room for interpretation; we would need to see actual table definitions with all constraints and data types to be sure. Going out on a limb, my educated guess is that you can fix your query with parentheses in the FROM clause to prioritize joins:

SELECT p.id
     , p.name
     , pt.type_name AS product_type
     , dt.delivery
     , p.required_selections
     , count(s.id) AS selections_count
     , sum(CASE WHEN ss.status = 'WARNING' THEN 1 ELSE 0 END) AS warning_count
FROM   products       p
JOIN   product_types  pt ON pt.id = p.product_type_id
JOIN   delivery_types dt ON dt.id = p.delivery_type_id
LEFT   JOIN (  -- LEFT JOIN!
          selections_products sp
   JOIN   selections s  ON s.id  = sp.selection_id  -- INNER JOIN!
   JOIN   listings   l  ON l.id  = s.listing_id     -- INNER JOIN!
                       AND l.local_date_time >= '2014-12-01'
                       AND l.local_date_time <  '2014-12-31'
   LEFT   JOIN selection_statuses ss ON ss.id = s.selection_status_id
   ) ON sp.product_id = p.id
GROUP  BY p.id, pt.type_name, dt.delivery;

This way, you first eliminate all selections outside the given time frame with [INNER] JOIN before you LEFT JOIN to products, thus keeping all products in the result, including those that aren't in any applicable selection.
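The same pitfall can be shown in a minimal, self-contained sketch (toy data invented for illustration, not the tables above): the date filter must live in the join condition (or in an inner join inside the parentheses), because the same conditions in a WHERE clause would silently turn the LEFT JOIN into an inner join.

```sql
-- Toy data: two products, one listing in range (hypothetical names).
WITH p(id) AS (VALUES (1), (2)),
     l(product_id, local_date_time) AS (VALUES (1, timestamp '2014-12-15'))
SELECT p.id, count(l.product_id) AS listings_count
FROM   p
LEFT   JOIN l ON l.product_id = p.id
             AND l.local_date_time >= '2014-12-01'  -- filter inside ON:
             AND l.local_date_time <  '2014-12-31'  -- product 2 survives with count 0
GROUP  BY p.id;
-- Moving the two date conditions into a WHERE clause would drop product 2
-- entirely, because its NULL-extended row fails the comparison.
```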

While selecting all or most products, this can be rewritten to be faster:

SELECT p.id
     , p.name
     , pt.type_name AS product_type
     , dt.delivery
     , p.required_selections
     , COALESCE(s.selections_count, 0) AS selections_count
     , COALESCE(s.warning_count, 0)    AS warning_count
FROM   products       p
JOIN   product_types  pt ON pt.id = p.product_type_id
JOIN   delivery_types dt ON dt.id = p.delivery_type_id
LEFT   JOIN (
   SELECT sp.product_id
        , count(*) AS selections_count  -- assuming ...
        , count(*) FILTER (WHERE ss.status = 'WARNING') AS warning_count
   FROM   selections_products sp
   JOIN   selections          s  ON s.id  = sp.selection_id
   JOIN   listings            l  ON l.id  = s.listing_id
   LEFT   JOIN selection_statuses ss ON ss.id = s.selection_status_id
   WHERE  l.local_date_time >= '2014-12-01'
   AND    l.local_date_time <  '2014-12-31'
   GROUP  BY 1
   ) s ON s.product_id = p.id;

It's cheaper to aggregate and count selections and warnings per product_id first and then join to products. (Unless you only retrieve a small selection of products; then it's cheaper to reduce the related rows first.)
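Note that the subquery uses the aggregate FILTER clause (Postgres 9.4+) where the original query used a CASE expression. Within each group the two forms count the same rows, sketched here side by side:

```sql
-- Postgres 9.4+:
count(*) FILTER (WHERE ss.status = 'WARNING') AS warning_count
-- Portable / pre-9.4 equivalent, as in the original query:
sum(CASE WHEN ss.status = 'WARNING' THEN 1 ELSE 0 END) AS warning_count
```

FILTER is typically a bit faster and states the intent more clearly.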

Also, I don't really understand why Postgres requires me to add a group by here.

Since Postgres 9.1, the PK column in GROUP BY covers all columns of the same table. That does not cover columns of other tables, even if they are functionally dependent. You need to list those explicitly in GROUP BY if you don't want to aggregate them.
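As a sketch against the query above (assuming id is the primary key of each of the three tables):

```sql
-- OK: p.id is the PK of products, so p.name and p.required_selections
-- are covered and need not be listed:
GROUP BY p.id, pt.type_name, dt.delivery;

-- Not OK: columns from the joined tables are not covered by products.id.
-- GROUP BY p.id;
-- ERROR (approximately): column "pt.type_name" must appear in the
-- GROUP BY clause or be used in an aggregate function
```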

My second query avoids this problem from the outset by aggregating before the join.
