007 007 - 3 months ago 11
SQL Question

Counting intersecting time intervals in T-SQL

CODE:

CREATE TABLE #Temp1 (CoachID INT, BusyST DATETIME, BusyET DATETIME)
CREATE TABLE #Temp2 (CoachID INT, AvailableST DATETIME, AvailableET DATETIME)

INSERT INTO #Temp1 (CoachID, BusyST, BusyET)
SELECT 1,'2016-08-17 09:12:00','2016-08-17 10:11:00'
UNION
SELECT 3,'2016-08-17 09:30:00','2016-08-17 10:00:00'
UNION
SELECT 4,'2016-08-17 12:07:00','2016-08-17 13:10:00'

INSERT INTO #Temp2 (CoachID, AvailableST, AvailableET)
SELECT 1,'2016-08-17 09:07:00','2016-08-17 11:09:00'
UNION
SELECT 2,'2016-08-17 09:11:00','2016-08-17 09:30:00'
UNION
SELECT 3,'2016-08-17 09:24:00','2016-08-17 13:08:00'
UNION
SELECT 1,'2016-08-17 11:34:00','2016-08-17 12:27:00'
UNION
SELECT 4,'2016-08-17 09:34:00','2016-08-17 13:00:00'
UNION
SELECT 5,'2016-08-17 09:10:00','2016-08-17 09:55:00'

--RESULT-SET QUERY GOES HERE

DROP TABLE #Temp1
DROP TABLE #Temp2


DESIRED OUTPUT:

CoachID CanCoachST CanCoachET NumOfCoaches
1 2016-08-17 09:12:00.000 2016-08-17 09:24:00.000 2 --(ID2 = 2,5)
1 2016-08-17 09:24:00.000 2016-08-17 09:30:00.000 3 --(ID2 = 2,3,5)
1 2016-08-17 09:30:00.000 2016-08-17 09:34:00.000 1 --(ID2 = 5)
1 2016-08-17 09:34:00.000 2016-08-17 09:55:00.000 2 --(ID2 = 4,5)
1 2016-08-17 09:55:00.000 2016-08-17 10:00:00.000 1 --(ID2 = 4)
1 2016-08-17 10:00:00.000 2016-08-17 10:11:00.000 2 --(ID2 = 3,4)
3 2016-08-17 09:30:00.000 2016-08-17 09:34:00.000 1 --(ID2 = 5)
3 2016-08-17 09:34:00.000 2016-08-17 09:55:00.000 2 --(ID2 = 4,5)
3 2016-08-17 09:55:00.000 2016-08-17 10:00:00.000 1 --(ID2 = 4)
4 2016-08-17 12:07:00.000 2016-08-17 12:27:00.000 2 --(ID2 = 1,3)
4 2016-08-17 12:27:00.000 2016-08-17 13:08:00.000 1 --(ID2 = 3)
4 2016-08-17 13:08:00.000 2016-08-17 13:10:00.000 0 --(No one is available)


GOAL:
Consider
#Temp1
as the table of team coaches (ID1) and their meeting times (ST1 = Meeting Start Time and ET1 = Meeting End Time).
Consider
#Temp2
as the table of team coaches (ID2) and their total available times (ST2 = Available Start Time and ET2 = Available End Time).

Now, the goal is to find all possible coaches from
#Temp2
who are available to coach during the meeting time of the coaches from
#Temp1
.

So for example, For the coach ID1 = 1, who is busy between 9:12 and 10:11 (data can span across multiple days, if that info matters), we have
coach ID2 = 2 and 5 that can coach between 9:12 and 9:24
, coach ID2 = 2,3, and 5 that can coach between 9:24 and 9:30
, coach ID2 = 5 that can coach between 9:30 and 9:34
, coach ID2 = 4 and 5 that can coach between 9:34 and 9:55
, coach ID2 = 4 that can coach between 9:55 and 10:00
, and coach ID2 = 3 and 4 that can coach between 10:00 and 10:11 (note how ID 3, although available in #Temp2 table between 9:24 and 13:08, it can't coach for ID1 = 1 between 9:24 and 10:00 because its also busy between 9:30 and 10:00.

My effort so far: Only dealing with breaking #Temp1's time bracket so far. Still need to figure out A) how to remove that non-busy time window from the output B) add a field/map it to right T1's CoachIDs.

;WITH ED
AS (SELECT BusyET, CoachID FROM #Temp1
UNION ALL
SELECT BusyST, CoachID FROM #Temp1
)
,Brackets
AS (SELECT MIN(BusyST) AS BusyST
,( SELECT MIN(BusyET)
FROM ED e
WHERE e.BusyET > MIN(BusyST)
) AS BusyET
FROM #Temp1 T
UNION ALL
SELECT B.BusyET
,e.BusyET
FROM Brackets B
INNER JOIN ED E ON B.BusyET < E.BusyET
WHERE NOT EXISTS (
SELECT *
FROM ED E2
WHERE E2.BusyET > B.BusyET
AND E2.BusyET < E.BusyET
)
)
SELECT *
FROM Brackets
ORDER BY BusyST;


I think I need to join on comparing ST/ET dates between two tables where IDs don't match each other. But I'm having trouble figuring out how to actually get only the meeting time window and unique count.

Updated with better schema/data-set. Also note, even though CoachID 4 is not "scheduled" to be available, he's still listed as busy for that last few minutes. And there can be scenario where no one else is available to work during that time, in which case, we can return 0 cnt record (or not return it if it's really complicated).

Again, the goal is to find count and combination of all available CoachIDs and their Available time window that can coach for the CoachIDs listed in the busy table.

Updated with more sample description matching sample data.

Answer

The query in this answer was inspired by the Packing Intervals by Itzik Ben-Gan.


At first I didn't understand the full complexity of the requirements and assumed that intervals in Table1 and Table2 don't overlap. I assumed that the same coach can't be busy and available at the same time.

It turns out that my assumption was wrong, so the first variant of the query that I'm leaving below has to be extended with preliminary step that subtracts all intervals stored in Table1 from intervals stored in Table2.

It uses the similar idea. Each start of the "available" interval is marked with +1 EventType and end of the "available" interval is marked with -1 EventType. For "busy" intervals the marks are reversed. "Busy" interval starts with -1 and ends with +1. This is done in C1_Subtract.

Then running total tells us where the "truly" available intervals are (C2_Subtract). Finally, CTE_Available leaves only "truly" available intervals.

Sample data

I added few rows to illustrate what happens if no coaches are available. I also added CoachID=9, which is not in the initial results of the first variant of the query.

CREATE TABLE #Temp1 (CoachID INT, BusyST DATETIME, BusyET DATETIME);
CREATE TABLE #Temp2 (CoachID INT, AvailableST DATETIME, AvailableET DATETIME);
-- Start time is inclusive
-- End time is exclusive

INSERT INTO #Temp1 (CoachID, BusyST, BusyET) VALUES
(1, '2016-08-17 09:12:00','2016-08-17 10:11:00'),
(3, '2016-08-17 09:30:00','2016-08-17 10:00:00'),
(4, '2016-08-17 12:07:00','2016-08-17 13:10:00'),

(6, '2016-08-17 15:00:00','2016-08-17 16:00:00'),
(9, '2016-08-17 15:00:00','2016-08-17 16:00:00');

INSERT INTO #Temp2 (CoachID, AvailableST, AvailableET) VALUES
(1,'2016-08-17 09:07:00','2016-08-17 11:09:00'),
(2,'2016-08-17 09:11:00','2016-08-17 09:30:00'),
(3,'2016-08-17 09:24:00','2016-08-17 13:08:00'),
(1,'2016-08-17 11:34:00','2016-08-17 12:27:00'),
(4,'2016-08-17 09:34:00','2016-08-17 13:00:00'),
(5,'2016-08-17 09:10:00','2016-08-17 09:55:00'),

(7,'2016-08-17 15:10:00','2016-08-17 15:20:00'),
(8,'2016-08-17 15:15:00','2016-08-17 15:25:00'),
(7,'2016-08-17 15:40:00','2016-08-17 15:55:00'),
(9,'2016-08-17 15:05:00','2016-08-17 15:07:00'),
(9,'2016-08-17 15:40:00','2016-08-17 16:55:00');

Intermediate results of CTE_Available

+---------+-------------------------+-------------------------+
| CoachID |       AvailableST       |       AvailableET       |
+---------+-------------------------+-------------------------+
|       1 | 2016-08-17 09:07:00.000 | 2016-08-17 09:12:00.000 |
|       1 | 2016-08-17 10:11:00.000 | 2016-08-17 11:09:00.000 |
|       1 | 2016-08-17 11:34:00.000 | 2016-08-17 12:27:00.000 |
|       2 | 2016-08-17 09:11:00.000 | 2016-08-17 09:30:00.000 |
|       3 | 2016-08-17 09:24:00.000 | 2016-08-17 09:30:00.000 |
|       3 | 2016-08-17 10:00:00.000 | 2016-08-17 13:08:00.000 |
|       4 | 2016-08-17 09:34:00.000 | 2016-08-17 12:07:00.000 |
|       5 | 2016-08-17 09:10:00.000 | 2016-08-17 09:55:00.000 |
|       7 | 2016-08-17 15:10:00.000 | 2016-08-17 15:20:00.000 |
|       7 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 |
|       8 | 2016-08-17 15:15:00.000 | 2016-08-17 15:25:00.000 |
|       9 | 2016-08-17 16:00:00.000 | 2016-08-17 16:55:00.000 |
+---------+-------------------------+-------------------------+

Now we can use these intermediate results of CTE_Available instead of #Temp2 in the first variant of the query. See detailed explanations below the first variant of the query.

Full query

WITH
C1_Subtract
AS
(
    SELECT
        CoachID
        ,AvailableST AS ts
        ,+1 AS EventType
    FROM #Temp2

    UNION ALL

    SELECT
        CoachID
        ,AvailableET AS ts
        ,-1 AS EventType
    FROM #Temp2

    UNION ALL

    SELECT
        CoachID
        ,BusyST AS ts
        ,-1 AS EventType
    FROM #Temp1

    UNION ALL

    SELECT
        CoachID
        ,BusyET AS ts
        ,+1 AS EventType
    FROM #Temp1
)
,C2_Subtract AS
(
    SELECT
        C1_Subtract.*
        ,SUM(EventType)
            OVER (
            PARTITION BY CoachID
            ORDER BY ts, EventType DESC
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        AS cnt
        ,LEAD(ts) 
            OVER (
            PARTITION BY CoachID
            ORDER BY ts, EventType DESC)
        AS NextTS
    FROM C1_Subtract
)
,CTE_Available
AS
(
    SELECT
        C2_Subtract.CoachID
        ,C2_Subtract.ts AS AvailableST
        ,C2_Subtract.NextTS AS AvailableET
    FROM C2_Subtract
    WHERE cnt > 0
)
,CTE_Intervals
AS
(
    SELECT
        TBusy.CoachID AS BusyCoachID
        ,TBusy.BusyST
        ,TBusy.BusyET
        ,CA.CoachID AS AvailableCoachID
        ,CA.AvailableST
        ,CA.AvailableET
        -- max of start time
        ,CASE WHEN CA.AvailableST < TBusy.BusyST
        THEN TBusy.BusyST
        ELSE CA.AvailableST 
        END AS ST
        -- min of end time
        ,CASE WHEN CA.AvailableET > TBusy.BusyET
        THEN TBusy.BusyET
        ELSE CA.AvailableET
        END AS ET
    FROM
        #Temp1 AS TBusy
        CROSS APPLY
        (
            SELECT
                TAvailable.*
            FROM
                CTE_Available AS TAvailable
            WHERE
                -- the same coach can't be available and busy
                TAvailable.CoachID <> TBusy.CoachID
                -- intervals intersect
                AND TAvailable.AvailableST < TBusy.BusyET
                AND TAvailable.AvailableET > TBusy.BusyST
        ) AS CA
)
,C1 AS
(
    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ST AS ts
        ,+1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ET AS ts
        ,-1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyST AS ts
        ,+1 AS EventType
    FROM #Temp1

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyET AS ts
        ,-1 AS EventType
    FROM #Temp1
)
,C2 AS
(
    SELECT
        C1.*
        ,SUM(EventType)
            OVER (
            PARTITION BY BusyCoachID
            ORDER BY ts, EventType DESC, AvailableCoachID
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        - 1 AS cnt
        ,LEAD(ts) 
            OVER (
            PARTITION BY BusyCoachID 
            ORDER BY ts, EventType DESC, AvailableCoachID) 
        AS NextTS
    FROM C1
)
SELECT
    BusyCoachID AS CoachID
    ,ts AS CanCoachST
    ,NextTS AS CanCoachET
    ,cnt AS NumOfCoaches
FROM C2
WHERE ts <> NextTS
ORDER BY BusyCoachID, CanCoachST
;

Final result

+---------+-------------------------+-------------------------+--------------+
| CoachID |       CanCoachST        |       CanCoachET        | NumOfCoaches |
+---------+-------------------------+-------------------------+--------------+
|       1 | 2016-08-17 09:12:00.000 | 2016-08-17 09:24:00.000 |            2 |
|       1 | 2016-08-17 09:24:00.000 | 2016-08-17 09:30:00.000 |            3 |
|       1 | 2016-08-17 09:30:00.000 | 2016-08-17 09:34:00.000 |            1 |
|       1 | 2016-08-17 09:34:00.000 | 2016-08-17 09:55:00.000 |            2 |
|       1 | 2016-08-17 09:55:00.000 | 2016-08-17 10:00:00.000 |            1 |
|       1 | 2016-08-17 10:00:00.000 | 2016-08-17 10:11:00.000 |            2 |
|       3 | 2016-08-17 09:30:00.000 | 2016-08-17 09:34:00.000 |            1 |
|       3 | 2016-08-17 09:34:00.000 | 2016-08-17 09:55:00.000 |            2 |
|       3 | 2016-08-17 09:55:00.000 | 2016-08-17 10:00:00.000 |            1 |
|       4 | 2016-08-17 12:07:00.000 | 2016-08-17 12:27:00.000 |            2 |
|       4 | 2016-08-17 12:27:00.000 | 2016-08-17 13:08:00.000 |            1 |
|       4 | 2016-08-17 13:08:00.000 | 2016-08-17 13:10:00.000 |            0 |
|       6 | 2016-08-17 15:00:00.000 | 2016-08-17 15:10:00.000 |            0 |
|       6 | 2016-08-17 15:10:00.000 | 2016-08-17 15:15:00.000 |            1 |
|       6 | 2016-08-17 15:15:00.000 | 2016-08-17 15:20:00.000 |            2 |
|       6 | 2016-08-17 15:20:00.000 | 2016-08-17 15:25:00.000 |            1 |
|       6 | 2016-08-17 15:25:00.000 | 2016-08-17 15:40:00.000 |            0 |
|       6 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 |            1 |
|       6 | 2016-08-17 15:55:00.000 | 2016-08-17 16:00:00.000 |            0 |
|       9 | 2016-08-17 15:00:00.000 | 2016-08-17 15:10:00.000 |            0 |
|       9 | 2016-08-17 15:10:00.000 | 2016-08-17 15:15:00.000 |            1 |
|       9 | 2016-08-17 15:15:00.000 | 2016-08-17 15:20:00.000 |            2 |
|       9 | 2016-08-17 15:20:00.000 | 2016-08-17 15:25:00.000 |            1 |
|       9 | 2016-08-17 15:25:00.000 | 2016-08-17 15:40:00.000 |            0 |
|       9 | 2016-08-17 15:40:00.000 | 2016-08-17 15:55:00.000 |            1 |
|       9 | 2016-08-17 15:55:00.000 | 2016-08-17 16:00:00.000 |            0 |
+---------+-------------------------+-------------------------+--------------+

First variant of the query

Run the query step-by-step, CTE-by-CTE and examine intermediate results to undestand how it works.

CTE_Intervals gives us a list of available intervals that intersect with busy intervals. C1 puts both start and end times in the same column with the corresponding EventType. This will help us track when an interval starts or ends. A running total of EventType gives the count of available coaches. C1 unions busy coaches into the mix to properly count cases when no coach is available.

WITH
CTE_Intervals
AS
(
    SELECT
        TBusy.CoachID AS BusyCoachID
        ,TBusy.BusyST
        ,TBusy.BusyET
        ,CA.CoachID AS AvailableCoachID
        ,CA.AvailableST
        ,CA.AvailableET
        -- max of start time
        ,CASE WHEN CA.AvailableST < TBusy.BusyST
        THEN TBusy.BusyST
        ELSE CA.AvailableST 
        END AS ST
        -- min of end time
        ,CASE WHEN CA.AvailableET > TBusy.BusyET
        THEN TBusy.BusyET
        ELSE CA.AvailableET
        END AS ET
    FROM
        #Temp1 AS TBusy
        CROSS APPLY
        (
            SELECT
                TAvailable.*
            FROM
                #Temp2 AS TAvailable
            WHERE
                -- the same coach can't be available and busy
                TAvailable.CoachID <> TBusy.CoachID
                -- intervals intersect
                AND TAvailable.AvailableST < TBusy.BusyET
                AND TAvailable.AvailableET > TBusy.BusyST
        ) AS CA
)
,C1 AS
(
    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ST AS ts
        ,+1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        BusyCoachID
        ,AvailableCoachID
        ,ET AS ts
        ,-1 AS EventType
    FROM CTE_Intervals

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyST AS ts
        ,+1 AS EventType
    FROM #Temp1

    UNION ALL

    SELECT
        CoachID AS BusyCoachID
        ,CoachID AS AvailableCoachID
        ,BusyET AS ts
        ,-1 AS EventType
    FROM #Temp1
)
,C2 AS
(
    SELECT
        C1.*
        ,SUM(EventType)
            OVER (
            PARTITION BY BusyCoachID
            ORDER BY ts, EventType DESC, AvailableCoachID
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        - 1 AS cnt
        ,LEAD(ts) 
            OVER (
            PARTITION BY BusyCoachID 
            ORDER BY ts, EventType DESC, AvailableCoachID) 
        AS NextTS
    FROM C1
)
SELECT
    BusyCoachID AS CoachID
    ,ts AS CanCoachST
    ,NextTS AS CanCoachET
    ,cnt AS NumOfCoaches
FROM C2
WHERE ts <> NextTS
ORDER BY BusyCoachID, CanCoachST
;

DROP TABLE #Temp1;
DROP TABLE #Temp2;

Result

I've added comments for each line with IDs of available coaches that were counted.

Now I understand why my initial result was not the same as your expected result.

+---------+---------------------+---------------------+--------------+
| CoachID |       CanCoachST    |       CanCoachET    | NumOfCoaches |
+---------+---------------------+---------------------+--------------+
|       1 | 2016-08-17 09:12:00 | 2016-08-17 09:24:00 |            2 |  2,5
|       1 | 2016-08-17 09:24:00 | 2016-08-17 09:30:00 |            3 |  2,3,5
|       1 | 2016-08-17 09:30:00 | 2016-08-17 09:34:00 |            2 |  3,5
|       1 | 2016-08-17 09:34:00 | 2016-08-17 09:55:00 |            3 |  3,4,5
|       1 | 2016-08-17 09:55:00 | 2016-08-17 10:11:00 |            2 |  3,4
|       3 | 2016-08-17 09:30:00 | 2016-08-17 09:34:00 |            2 |  1,5
|       3 | 2016-08-17 09:34:00 | 2016-08-17 09:55:00 |            3 |  1,4,5
|       3 | 2016-08-17 09:55:00 | 2016-08-17 10:00:00 |            2 |  1,4
|       4 | 2016-08-17 12:07:00 | 2016-08-17 12:27:00 |            2 |  3,1
|       4 | 2016-08-17 12:27:00 | 2016-08-17 13:08:00 |            1 |  3
|       4 | 2016-08-17 13:08:00 | 2016-08-17 13:10:00 |            0 |  none
|       6 | 2016-08-17 15:00:00 | 2016-08-17 15:10:00 |            0 |  none
|       6 | 2016-08-17 15:10:00 | 2016-08-17 15:15:00 |            1 |  7
|       6 | 2016-08-17 15:15:00 | 2016-08-17 15:20:00 |            2 |  7,8
|       6 | 2016-08-17 15:20:00 | 2016-08-17 15:25:00 |            1 |  8
|       6 | 2016-08-17 15:25:00 | 2016-08-17 15:40:00 |            0 |  none
|       6 | 2016-08-17 15:40:00 | 2016-08-17 15:55:00 |            1 |  7
|       6 | 2016-08-17 15:55:00 | 2016-08-17 16:00:00 |            0 |  none
+---------+---------------------+---------------------+--------------+