systemjack systemjack - 1 year ago 72
SQL Question

Aggregate sessions with timespan by minute or other duration

I have a table with records for user activity that cover a span indicated by a start and end time. What I'd like to get is a count of users active in the system per unit of time over the course of the previous day.

The maximum session length is an hour and they don't cross hour boundaries.

I've looked at many questions that are very close to this, but none I've found deals with this specific situation.

Here's a stripped down version of the query:

with minutes AS (
-- ignore generates a day's worth of timestamps for each minute
-- it's hairy but is what I'm stuck with on redshift
select (dateadd(minute, -row_number() over (order by true), sysdate::date)) as minute
from seed_table limit 1440
sessions as (
select sid, ts_start, ts_end
from user_sessions s
where ts_end >= sysdate::date-'1 day'::interval
and ts_start < sysdate::date
select m.minute, count(distinct(s.sid))
from minutes m
left join sessions s on s.ts_end >= m.minute and s.ts_start < m.minute+'1 min'::interval
group by 1

I'm trying to avoid that nasty left join:

-> XN Nested Loop Left Join DS_BCAST_INNER (cost=6913826151.95..4727012848741.55 rows=410434560 width=166)
Join Filter: (("inner".ts_start < ("outer"."minute" + '00:01:00'::interval)) AND ("inner".ts_end >= "outer"."minute"))

I'm open to any ideas. Maybe a different table structure to represent the span. I've wondered if a start with duration might be better. I've looked at window functions but didn't see a way they could help. An alternative way to do the bucketing?

Essentially the left join creates extra rows for each session entry, one per minute spanned which can then be counted. Other ways to fill out the span?

Maybe there's a way to limit how early and late the join has to search to find all the overlapping records so it would at least be a bit more efficient.


Answer Source

You can do this much faster by counting the number of starts and stops in each minute, and then taking the cumulative sum. The result is something like this:

select minute, sum(cnt) over (order by minute)
from ((select date_trunc('minute', ts_start) as minute, count(*) as cnt
       from sessions
       group by 1
      ) union all
      (select date_trunc('minute', ts_end), - count(*)
       from sessions
       group by 1
     ) s
group by minute
order by minute;