I need to query for each minute the total count of rows up to that minute.
The best I could achieve so far doesn't do the trick. It returns count per minute, not the total count up to each minute:
SELECT COUNT(id) AS count
, EXTRACT(hour from "when") AS hour
, EXTRACT(minute from "when") AS minute
FROM mytable
GROUP BY hour, minute
Won't get much simpler than this:
SELECT DISTINCT
date_trunc('minute', "when") AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY 1;
Use date_trunc(). It gives you exactly what you need. While operating with timestamptz, be aware that the start of a "day" is defined by the current time zone setting.
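To illustrate the time zone point, here is a small sketch. The sample instant and the zone names are assumptions picked for the demo, not anything from the question. Converting with AT TIME ZONE before truncating makes the result independent of the session setting:

```sql
-- Assumption: sample instant and zones chosen only for illustration.
SELECT date_trunc('day', ts AT TIME ZONE 'UTC')              AS day_utc
     , date_trunc('day', ts AT TIME ZONE 'America/New_York') AS day_new_york
FROM  (SELECT timestamptz '2024-01-01 03:00+00' AS ts) t;
-- day_utc      = 2024-01-01 00:00:00
-- day_new_york = 2023-12-31 00:00:00
```

The same instant falls on different calendar days depending on the zone used for truncation.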
Don't include id in the query, since you want to GROUP BY minute slices.
count() is mostly used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition: you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. I quote the manual:
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; it sets the frame to be all rows from the partition start up through the current row's last peer in the ORDER BY ordering.
And that happens to be exactly what you need.
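A tiny sketch of what "last peer" means in practice. The three timestamps are made-up sample data:

```sql
-- Assumption: inline sample values instead of a real table.
SELECT ts
     , count(*) OVER (ORDER BY ts) AS running_ct
FROM  (VALUES (timestamp '2024-01-01 10:00')
            , (timestamp '2024-01-01 10:00')
            , (timestamp '2024-01-01 10:01')) t(ts);
-- Both 10:00 rows are peers, so each gets running_ct = 2.
-- The 10:01 row gets running_ct = 3.
```

Rows with equal ORDER BY values share one frame end, which is why every row of a minute reports the same total.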
Use count(*) rather than count(id). It fits your question better ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, it has not been specified in the question, so count(id) is wrong, strictly speaking.
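The difference only shows up when id can be NULL. A minimal sketch with made-up values:

```sql
-- Assumption: inline sample values with one NULL id.
SELECT count(*)  AS all_rows
     , count(id) AS non_null_ids
FROM  (VALUES (1), (NULL), (3)) t(id);
-- all_rows = 3, non_null_ids = 2
```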
You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.
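To see the effect of that evaluation order, here is a sketch that combines GROUP BY with the window count at one query level. The CTE named mytable holds three made-up rows and merely stands in for the real table:

```sql
-- Assumption: the CTE replaces the real table with 3 sample rows.
WITH mytable("when") AS (
   VALUES (timestamp '2024-01-01 10:00:05')
        , (timestamp '2024-01-01 10:00:40')
        , (timestamp '2024-01-01 10:01:10')
)
SELECT date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS groups_so_far
FROM   mytable
GROUP  BY 1
ORDER  BY 1;
-- 10:00 -> 1, 10:01 -> 2
-- The window function counts grouped rows (minutes), not the 3 base rows.
```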
ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here. The 1 is a positional reference to the first expression in the SELECT list.
Use to_char() if you need to beautify the result. Like this:
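SELECT to_char(sub.minute, 'DD.MM.YYYY HH24:MI') AS minute
, sub.running_ct
FROM (
SELECT DISTINCT
date_trunc('minute', "when") AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
) sub
ORDER BY sub.minute; -- with DISTINCT, ORDER BY expressions must be in the select list, so format in the outer query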
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM tbl
GROUP BY 1
) sub
ORDER BY 1;
Much like the above, but:
I use a subquery to fold and count rows per minute. This way we get distinct rows per minute in the outer query and the DISTINCT step is not needed.
Use sum() as a window aggregate function now to add up the counts from the subquery.
I found this to be substantially faster with many rows per minute.
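As an aside, Postgres also accepts an aggregate nested inside a window function at the same query level, which folds the subquery into one SELECT. This is only a sketch of that variant with made-up sample rows in a CTE named tbl. I have not compared its speed to the subquery form:

```sql
-- Assumption: the CTE replaces the real table with 3 sample rows.
WITH tbl("when") AS (
   VALUES (timestamp '2024-01-01 10:00:05')
        , (timestamp '2024-01-01 10:00:40')
        , (timestamp '2024-01-01 10:01:10')
)
SELECT date_trunc('minute', "when") AS minute
     , sum(count(*)) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   tbl
GROUP  BY 1
ORDER  BY 1;
-- 10:00 -> 2, 10:01 -> 3
```

The inner count(*) is the per-minute aggregate and the outer sum() is the window function that adds those counts up.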
@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurs (no row in base table):
SELECT DISTINCT
minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT generate_series(date_trunc('minute', min("when"))
, max("when")
, '1 min')
FROM tbl
) m(minute)
LEFT JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER BY 1;
Generate a row for every minute in the time frame between the first and the last event with generate_series(). Combine generate_series() with aggregate functions in one subquery.
LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
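For reference, this is what generate_series() produces on its own. The two boundary timestamps are made-up literals standing in for min("when") and max("when"):

```sql
-- Assumption: literal bounds instead of min()/max() over a real table.
SELECT generate_series(timestamp '2024-01-01 10:00'
                     , timestamp '2024-01-01 10:03'
                     , interval '1 min') AS minute;
-- Returns four rows: 10:00, 10:01, 10:02, 10:03
```

Every minute gets a row here, so the LEFT JOIN has something to attach to even when the base table has no event for that minute.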
With CTE:
WITH cte AS (
SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
FROM tbl
GROUP BY 1
)
SELECT m.minute
, COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (SELECT generate_series(min(minute)
, max(minute), '1 min') AS minute FROM cte) m
LEFT JOIN cte c USING (minute)
ORDER BY 1;
Much like the above, but:
Again, fold and count rows per minute in the first step, which removes the need for a later DISTINCT.
Unlike count(), sum() can return NULL. So I wrapped it in COALESCE to get 0 instead.
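A quick sketch of that difference, using a single made-up NULL value as input:

```sql
-- Assumption: one NULL row as sample input.
SELECT count(x)              AS ct
     , sum(x)                AS total
     , COALESCE(sum(x), 0)   AS total_or_zero
FROM  (VALUES (NULL::int)) t(x);
-- ct = 0, total = NULL, total_or_zero = 0
```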
With many rows overall but few rows per minute, and with an index on "when", this version with a subquery should be even faster:
SELECT m.minute
, COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (SELECT generate_series(date_trunc('minute', min("when"))
, max("when"), '1 min') AS minute FROM tbl) m
LEFT JOIN (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM tbl
GROUP BY 1
) c USING (minute)
ORDER BY 1;