J. Doe - 24 days ago
SQL Question

Postgres - Fast way to sum over rows from last day of month

I want to query a table and sum a column over all of the rows that fall on the last day of each month.

Let's use the following table as an example:

CREATE TABLE example (dt date, value int);


(The real table has many more columns and is relatively large, and the real query is more complicated)

I have the following query:

SELECT dt, SUM(value)
FROM example
WHERE dt IN (SELECT DISTINCT
               date_trunc('MONTH', generate_series('2012-01-01'::date,
                                                   '2016-12-01'::date,
                                                   interval '1 day') + INTERVAL '1 MONTH - 1 day')::date)
GROUP BY dt


It runs in about 2 seconds on my real table.

However, if I generate the full list of end-of-month days in my range and parameterise the query like so:

SELECT dt, SUM(value)
FROM example
WHERE dt IN ('2012-01-31', ...)
GROUP BY dt


It's much quicker: about 750 ms.

I would prefer not to generate the dates and pass them through to the query like that. Is there a way to do this entirely in SQL and make it as fast as the latter version?

Answer

The sub-select is needlessly complicated. It can be simplified to:

SELECT dt, SUM(value)
FROM example
WHERE dt IN (SELECT d::date
             FROM generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') dates (d))
GROUP BY dt;  --<< the group by is necessary

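That may speed up the query: the series now produces one row per month instead of one per day, so the DISTINCT is no longer needed.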
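One thing to watch: a one-month step starting at '2012-01-01' produces the first day of each month. If the last day is what you need, a possible sketch (assuming the same range as above) is to shift each generated date forward by one month and back by one day before casting:

```sql
-- Sketch: generate month starts, then shift each one to the end of its month.
-- 2012-01-01 + 1 month - 1 day = 2012-01-31, 2012-02-01 -> 2012-02-29, and so on.
SELECT dt, SUM(value)
FROM example
WHERE dt IN (SELECT (d + interval '1 month' - interval '1 day')::date
             FROM generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') AS dates (d))
GROUP BY dt;
```

The same expression can be dropped into the other variants in place of the plain cast to date.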

You can also try to put the date generation into a CTE:

with dates (d) as (
  SELECT t::date
  from generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') t
)
SELECT dt, SUM(value)
FROM example
WHERE dt IN ( select d from dates)
GROUP BY dt;
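Whether the CTE changes anything probably depends on the server version. As far as I know, before PostgreSQL 12 a CTE was always evaluated separately, while from version 12 on a CTE that has no side effects and is referenced only once may be folded into the outer query. A sketch, assuming PostgreSQL 12 or later, that asks for the separate evaluation explicitly:

```sql
-- Assumes PostgreSQL 12+; older versions do not accept the MATERIALIZED keyword.
WITH dates (d) AS MATERIALIZED (
  SELECT t::date
  FROM generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') t
)
SELECT dt, SUM(value)
FROM example
WHERE dt IN (SELECT d FROM dates)
GROUP BY dt;
```

Whether this is faster or slower than the inlined form is something you would have to measure on your own data.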

Sometimes doing a JOIN is also more efficient:

with dates (d) as (
  SELECT t::date
  from generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') t
)
SELECT dt, SUM(value)
FROM example
  JOIN dates on example.dt = dates.d
GROUP BY dt;
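To find out which of these variants actually helps on your table, you can compare the execution plans. A minimal sketch using the JOIN version (the same prefix works for the others):

```sql
-- EXPLAIN shows the plan the planner chose; ANALYZE also runs the query
-- and reports actual row counts and timings, BUFFERS adds buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
WITH dates (d) AS (
  SELECT t::date
  FROM generate_series('2012-01-01'::date, '2016-12-01'::date, interval '1 month') t
)
SELECT dt, SUM(value)
FROM example
  JOIN dates ON example.dt = dates.d
GROUP BY dt;
```

Comparing this output with the plan for the hard-coded IN list should show where the two queries differ.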