mr.bjerre mr.bjerre - 5 months ago 22
SQL Question

Efficient way to calculate average time between row dates grouped by ID

Suppose I have a table like this:

thedate ID
2014-10-20 14:13:42.063 1
2014-10-20 14:13:43.063 1
2014-10-20 14:13:47.063 1
2014-10-20 14:12:50.063 2
2014-10-20 14:13:49.063 2
2014-10-20 14:13:54.063 2
2014-10-20 14:20:24.063 2
2014-10-20 14:13:02.063 3

To replicate a similar toybox table as in this example you can use the following code:

declare @tmp as table(thedate datetime,ID int)
insert into @tmp (thedate, ID) values
(dateadd(s,0,getdate()),1), (dateadd(s,1,getdate()),1), (dateadd(s,5,getdate()),1),
(dateadd(s,-52,getdate()),2), (dateadd(s,7,getdate()),2), (dateadd(s,12,getdate()),2),(dateadd(s,402,getdate()),2),

For each ID I want the average time between the dates. Now the database is huge (lots of ID's and dates for each ID), so it has to be very efficient. I want a result like this:

ID AvgTime (seconds)
1 2,5
2 151,333333333333

The following code does what I want, but it is way too slow:

(select top 1 avg(cast(datediff(s,(select max(thedate)
from @tmp c where ID = b.ID
and thedate < b.thedate)
,thedate) as float)) over (partition by b.ID)
from @tmp b where ID = a.ID)
from @tmp a group by ID

Does anyone know how to do this efficiently?

Answer Source

The average is the maximum minus the minimum divided by one less than the count. You can use this to write a relatively simple query:

select id,
       cast(datediff(second, min(thedate), max(thedate)) as float) / (count(*) - 1)
from @tmp
group by id;

If some of the ids only have one row, then you'll want to check for potential divide by 0:

select id,
       (case when count(*) > 1
             then cast(datediff(second, min(thedate), max(thedate)) as float) / (count(*) - 1)
        end) as AvgDiff
from @tmp
group by id;