AR7 AR7 - 4 months ago 12
SQL Question

PostgreSQL getting daily, weekly, and monthly averages of the occurrences of an event in one query

Currently I have this rather large query that works by


  1. Aggregating the daily, weekly, monthly counts into intermediate tables by taking the
    count()
    of an event grouped by the event name and the date.

  2. Selecting the avg count over each intermediate table by doing
    avg()
    group by just event, doing a union of the results, and because I want to have a separate column for daily, weekly, monthly, putting a filler value of 0 into empty columns.

  3. I then sum over all the columns, and the 0s basically act as a no-op, which gives me just a single value for each event.






The query is pretty large though, and I feel like I'm doing a lot of repetitive work. Is there any way to do this query better or make it smaller? I haven't really done queries like this before so I'm not quite sure.

WITH monthly_counts as (
SELECT
event,
count(*) as count
FROM tracking_stuff
WHERE
event = 'thing'
OR event = 'thing2'
OR event = 'thing3'
GROUP BY event, date_trunc('month', created_at)
),
weekly_counts as (
SELECT
event,
count(*) as count
FROM tracking_stuff
WHERE
event = 'thing'
OR event = 'thing2'
OR event = 'thing3'
GROUP BY event, date_trunc('week', created_at)
),
daily_counts as (
SELECT
event,
count(*) as count
FROM tracking_stuff
WHERE
event = 'thing'
OR event = 'thing2'
OR event = 'thing3'
GROUP BY event, date_trunc('day', created_at)
),
query as (
SELECT
event,
0 as daily_avg,
0 as weekly_avg,
avg(count) as monthly_avg
FROM monthly_counts
GROUP BY event
UNION
SELECT
event,
0 as daily_avg,
avg(count) as weekly_avg,
0 as monthly_avg
FROM weekly_counts
GROUP BY event
UNION
SELECT
event,
avg(count) as daily_avg,
0 as weekly_avg,
0 as monthly_avg
FROM daily_counts
GROUP BY event
)
SELECT
event,
sum(daily_avg) as daily_avg,
sum(weekly_avg) as weekly_avg,
sum(monthly_avg) as monthly_avg
FROM query
GROUP BY event;

Answer

I'd write the query in a way like this:

select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(count) monthly_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('month', created_at)
    ) s
    group by 1
) monthly
join (
    select event, avg(count) weekly_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('week', created_at)
    ) s
    group by 1
) weekly using(event)
join (
    select event, avg(count) daily_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('day', created_at)
    ) s
    group by 1
) daily using(event)
order by 1;

If the where condition eliminates a significant portion of the data (say more than a half) the use of cte could slightly speed up the query execution:

with the_data as (
    select event, created_at
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
    )

select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(count) monthly_avg
    from (
        select event, count(*)
        from the_data
        group by event, date_trunc('month', created_at)
    ) s
    group by 1
) monthly
--  etc ... 
Comments