Seth Spearman Seth Spearman - 24 days ago 8
SQL Question

Is there a way to aggregate a variable range of dates in SQL using a SET operation

I have a table like this one....

CREATE TABLE AbsentStudents
(
Id int not null primary key identity(1,1),
StudentId int not null,
AbsentDate datetime not null
)


This is a very large table that has 1 row for each student for each day that they were absent.

I have been asked to write a stored procedure that gets student absences by date range. What makes this query tricky is that I have to filter/aggregate by "absence episodes". The number of days that constitutes an "absence episode" is a procedure parameter so it can vary.

So for example, I need to get a list of students who were absent between 1/1/2016 to 1/17/2016 but only if they were absent for more than @Days (2 or 3 or whatever the parameter dictates) days.

I think that alone I could figure out. However, within the date range a student can have more than one "absence episode". So a student might have been absent for 3 days at the beginning of the date range, 2 days in the middle of the date range, and 4 days at the end of the date range and each of those constitutes a different "absence episodes". Assuming that my @Days parameter is 2, that should return 3 rows for that student. And, each returned row should calculate how many days the student was absent for that "absence episode."

So I would like my procedure require 3 parameters (@StartDate datetime,@EndDate datetime, @Days int) and return something like this...

StudentId, InitialAbsentDate, ConsecutiveDaysMissed

And ideally it would do this using a SET operation and avoid cursors. (Although cursors are fine if that is the only option.)

UPDATE (by Shnugo)



A test scenario

DECLARE @AbsentStudents TABLE(
Id int not null primary key identity(1,1),
StudentId int not null,
AbsentDate datetime not null
);
INSERT INTO @AbsentStudents VALUES
--student 1
(1,{d'2016-10-01'}),(1,{d'2016-10-02'}),(1,{d'2016-10-03'}) --three days
,(1,{d'2016-10-05'}) --one day
,(1,{d'2016-10-07'}),(1,{d'2016-10-08'}) --two days
--student 2
,(2,{d'2016-10-01'}),(2,{d'2016-10-02'}),(2,{d'2016-10-03'}),(2,{d'2016-10-04'}) --four days
,(2,{d'2016-10-08'}),(2,{d'2016-10-09'}),(2,{d'2016-10-10'}) --three days
,(2,{d'2016-10-12'}); --one day

DECLARE @startDate DATETIME={d'2016-10-01'};
DECLARE @endDate DATETIME={d'2016-10-31'};
DECLARE @Days INT = 3;

Answer

If you just want periods of times when students are absent, you can do this with a difference of row numbers approach.

Now, the following assumes that days are sequential with no gaps and uses the difference of row numbers to get periods of absences:

select 
    student_id, 
    min(AbsentDate), 
    max(AbsentDate), 
    count(*) as number_of_days
from 
(
    select a.*,
    row_number() over (partition by student_id order by AbsentDate) as seqnum_sa
    from AbsentStudents a
) a
 group by student_id, 
    dateadd(day, - seqnum_sa, AbsentDate);

Notes:

  • You have additional requirements on minimum days and date ranges. These are easily handled with a where clause.
  • I suspect you have a hidden requirement on avoiding week ends an holidays. Neither this (nor other answers) cover this. Ask another question if this is an issue.
Comments