Jeff Martinez Jeff Martinez - 1 month ago 14
SQL Question

SQL 2012 Compare dates across several rows

I have an issue where I need to compare the dates of several rows. The requirement is the data needs to be grouped by 'Region/Area' combination with lowest 'StartDate' and highest 'EndDate' UNLESS there is a gap of more than 1 day between the previous 'EndDate' and the next 'StartDate'.

The 'StartDate' will always be the first of the month and the 'EndDate' will always the last day of the month.

Given a simplified table as such:

Region | Area | StartDate | EndDate
-------|------|---------------|-------------
A | 1 | 01/01/2016 | 03/31/2016
A | 1 | 04/01/2016 | 05/31/2016
A | 1 | 07/01/2016 | 09/30/2016
A | 1 | 10/01/2016 | 01/31/2017
A | 1 | 02/01/2017 | 12/31/2017
B | 2 | 01/01/2016 | 04/30/2016
B | 2 | 05/01/2016 | 09/30/2016
A | 4 | 01/01/2016 | 05/31/2016
A | 4 | 06/01/2016 | 12/31/2016


I need the results to look something like this:

Region | Area | StartDate | EndDate
-------|------|--------------|-----------
A | 1 | 01/01/2016 | 05/31/2016
A | 1 | 07/01/2016 | 12/31/2017
B | 2 | 01/01/2016 | 09/30/2016
A | 4 | 01/01/2016 | 12/31/2016


I have tried GROUP BY with MIN and MAX dates but I cannot seem to get the logic of it correct.

Any thoughts or suggestions would be greatly appreciated.

Answer

This seems like a data island problem. You can make use of the windowing functions introduced in SQL Server 2012. Using the LAG windowing function, you can determine if your last records end date time has a gap greater than a day with the current records start datetime. Next you can use the SUM OVER clause to generate a grouping id for each of your data islands.

DECLARE @SourceData TABLE
(
     Region         NVARCHAR(10)
    ,Area           INT
    ,StartDate      DATETIME
    ,EndDate        DATETIME
);

INSERT INTO @SourceData
VALUES
('A', 1, '01/01/2016', '03/31/2016'),
('A', 1, '04/01/2016', '05/31/2016'),
('A', 1, '07/01/2016', '09/30/2016'),
('A', 1, '10/01/2016', '01/31/2017'),
('A', 1, '02/01/2017', '12/31/2017'),
('B', 2, '01/01/2016', '04/30/2016'),
('B', 2, '05/01/2016', '09/30/2016'),
('A', 4, '01/01/2016', '05/31/2016'),
('A', 4, '06/01/2016', '12/31/2016');

;WITH CTE_DataIslands  -- First CTE determine the start of each new data island
AS
(
    SELECT           Region
                    ,Area
                    ,StartDate
                    ,EndDate
                    ,(
                        CASE
                            WHEN DATEADD(DAY, 1, LAG(EndDate, 1) OVER  (PARTITION BY Region, Area ORDER BY StartDate ASC)) < (StartDate) THEN 1 -- If prev record's end date + 1 day  is not equal to current record's start date then it is the start of a new data island.
                            ELSE 0
                        END
                     ) AS [IsNewDataIsland]
    FROM            @SourceData 
)
, CTE_GenerateGroupingID
AS
(
    SELECT  Region
            ,Area
            ,StartDate
            ,EndDate
            ,SUM([IsNewDataIsland]) OVER (PARTITION BY Region, Area ORDER BY StartDate ASC ROWS UNBOUNDED PRECEDING) AS GroupingID  -- Create a running total of the IsNewDataIsland column this will create a grouping id we can now group on
    FROM    CTE_DataIslands
)
SELECT      Region  
            ,Area
            ,MIN(StartDate) AS StartDate
            ,MAX(EndDate) AS StartDate
FROM        CTE_GenerateGroupingID
GROUP BY    Region, Area, GroupingID