markw markw - 7 months ago 12
SQL Question

Conditionally Aggregate Rows

I have a table of user visits like such:

SELECT * FROM visits ORDER BY start_time;

user_id | start_time | end_time
---------+---------------------+---------------------
10 | 2016-06-01 05:45:00 | 2016-06-01 06:00:00
10 | 2016-06-01 06:05:00 | 2016-06-01 06:30:00
10 | 2016-06-01 06:10:00 | 2016-06-01 06:40:00
10 | 2016-06-02 10:00:00 | 2016-06-01 10:30:00
10 | 2016-06-03 13:00:00 | 2016-06-01 14:00:00


I want to "merge" rows where the visits overlap, or are within 10 minutes of each other, like so:

user_id | start_time | end_time
---------+---------------------+---------------------
10 | 2016-06-01 05:45:00 | 2016-06-01 06:40:00
10 | 2016-06-02 10:00:00 | 2016-06-01 10:30:00
10 | 2016-06-03 13:00:00 | 2016-06-01 14:00:00



  • The first 3 rows, are merged into 1 since the first two rows are within 5 minutes (less than 10 minutes) of each other and the third row overlaps the second

  • The last three rows are unchanged since no rows are within 10 minutes of them



I don't think this is possible using
GROUP BY
since the grouping function return value for each row would depend on multiple rows. I was thinking window functions could help, but have been struggling to build the query.

I am using AWS Redshift

Thanks for any help!

Answer

You can use window/analytic functions for this. I think this version works on your data:

select user_id, min(start_time) as start_time, max(end_time) as end_time
from (select t.*,
             sum(case when start_time > prev_end_time + interval '10' minute
                      then 1 else 0
                 end) over (partition by user_id order by start_time) as grp
      from (select t.*,
                   lag(end_time) over (partition by user_id order by start_time) as prev_end_time
            from t
           ) t
     ) t
group by grp, user_id;

This methodology does not work for all data. In particular, it assumes that there are no duplicate start times for a user. It can also get funky with complex multiple overlaps. But, for many situations, this logic does work.