Felix Felix - 3 months ago 10
SQL Question

SQL grouping by datetime with a maximum difference of x minutes

I have a problem with grouping my dataset in MS SQL Server.

My table looks like

# | CustomerID | SalesDate | Turnover
---| ---------- | ------------------- | ---------
1 | 1 | 2016-08-09 12:15:00 | 22.50
2 | 1 | 2016-08-09 12:17:00 | 10.00
3 | 1 | 2016-08-09 12:58:00 | 12.00
4 | 1 | 2016-08-09 13:01:00 | 55.00
5 | 1 | 2016-08-09 23:59:00 | 10.00
6 | 1 | 2016-08-10 00:02:00 | 5.00


Now I want to group the rows where the SalesDate difference to the next row is of a maximum of 5 minutes.
So that row 1 & 2, 3 & 4 and 5 & 6 are each one group.

My approach was getting the minutes with the DATEPART() function and divide the result by 5:

(DATEPART(MINUTE, SalesDate) / 5)


For row 1 and 2 the result would be 3 and grouping here would work perfectly.
But for the other rows where there is a change in the hour or even in the day part of the SalesDate, the result cannot be used for grouping.

So this is where I'm stuck. I would really appreciate, if someone could point me in the right direction.

Answer

You want to group adjacent transactions based on the timing between them. The idea is to assign some sort of grouping identifier, and then use that for aggregation.

Here is an approach:

  • Identify group starts using lag() and date arithmetic.
  • Do a cumulative sum of the group starts to identify each group.
  • Aggregate

The query looks like this:

select customerid, min(salesdate), max(saledate), sum(turnover)
from (select t.*,
             sum(case when salesdate > dateadd(minute, 5, prev_salesdate)
                      then 1 else 0
                 end) over (partition by customerid order by salesdate) as grp
      from (select t.*,
                   lag(salesdate) over (partition by customerid order by salesdate) as prev_salesdate
            from t
           ) t
     ) t
group by customerid, grp;
Comments