user3150002 user3150002 - 2 months ago 11
SQL Question

Find a Sequence Span Based on End Points and Inconsistent Data With SQL Server

I have a table of records that represent geolocation pings from a vehicle. Due to various problems, not all records are recorded. I am trying to get beginning and ending sequences when data is not consistent. Since we know that this vehicle would not be making frequent trips into this location, and that the average arrive/depart trip is between 1 to 3 days, and that there is not more than 1 trip in a week, any arrive/depart combination where the start and end points are within 72 hours can be considered a valid check-in record. Here is an example of what the data looks like:

ID | TIME | ACTION
1 | 2016-01-01 00:00 | NULL
2 | 2016-04-01 00:00 | NULL
3 | 2016-08-01 00:01 | DEPART
4 | 2016-12-01 00:00 | NULL
5 | 2016-06-03 02:02 | ARRIVE
6 | 2016-07-03 03:15 | NULL
7 | 2016-08-03 07:29 | NULL
8 | 2016-08-03 10:00 | NULL
9 | 2016-09-03 01:25 | DEPART
10 | 2016-14-03 00:00 | NULL
11 | 2016-16-01 00:00 | NULL
12 | 2016-14-04 00:00 | DEPART
13 | 2016-01-04 00:00 | NULL
14 | 2016-01-04 00:00 | NULL
15 | 2016-14-05 16:04 | ARRIVE
16 | 2016-14-05 23:10 | NULL
17 | 2016-15-05 09:05 | NULL
18 | 2016-16-05 17:56 | DEPART


What is the best way to approach this? Some ideas I had include creating a temp table of all the arrive points, and another one of all the depart points, and then joining them and trying to find the area where the points are between 1 start and the next start, but this seems really clunky. Suggestions appreciated. Right now I am reading up on Common Table Expressions. Environment is sql server 2015

EDIT: Here's an example of the results I am trying to get:

START ID | END ID | START TIME | END TIME
5 | 9 | 2016-06-03 02:02 | 2016-09-03 01:25
15 | 18 | 2016-14-05 16:04 | 2016-16-05 17:56

Answer
DECLARE @Table AS TABLE (ID INT, TIME DATETIME, ACTION VARCHAR(10))
INSERT INTO @Table VALUES
(1,'2016-01-01 00:00',NULL)
,(2,'2016-01-04 00:00',NULL)
,(3,'2016-01-08 00:01','DEPART')
,(4,'2016-01-12 00:00',NULL)
,(5,'2016-03-06 02:02','ARRIVE')
,(6,'2016-03-07 03:15',NULL)
,(7,'2016-03-08 07:29',NULL)
,(8,'2016-03-08 10:00',NULL)
,(9,'2016-03-09 15:25','DEPART')
,(10,'2016-03-14 00:00',NULL)
,(11,'2016-01-16 00:00',NULL)
,(12,'2016-04-14 00:00','DEPART')
,(13,'2016-04-01 00:00',NULL)
,(14,'2016-04-01 00:00',NULL)
,(15,'2016-05-14 16:04','ARRIVE')
,(16,'2016-05-01 23:10',NULL)
,(17,'2016-05-01 09:05',NULL)
,(18,'2016-05-16 17:56','DEPART')
,(19,'2016-07-14 19:04',NULL)
,(20,'2016-07-14 20:04','ARRIVE')
,(21,'2016-07-16 08:04','ARRIVE')

;WITH cte AS (
    SELECT
       *
       ,LAG(CASE WHEN Action = 'ARRIVE' THEN ID END) OVER (ORDER BY Time) as ArriveId
       ,LAG(CASE WHEN Action = 'ARRIVE' THEN Time END) OVER (ORDER BY Time) as ArriveTime
       ,DATEDIFF(hour,LAG(CASE WHEN Action = 'ARRIVE' THEN Time END) OVER (ORDER BY Time),Time) as HoursDifferent
    FROM
       @Table
    WHERE
       Action IS NOT NULL
)

SELECT
    ArriveId as [Start ID]
    ,Id as [End ID]
    ,ArriveTime as [Start Time]
    ,Time as [End Time]
FROM
    cte
WHERE
     HoursDifferent < 72

Results

Start ID    End ID  Start Time                  End Time
15          18      2016-05-14 16:04:00.000     2016-05-16 17:56:00.000
20          21      2016-07-14 20:04:00.000     2016-07-16 08:04:00.000

5 & 9 are actually 85 hours apart so they shouldn't be in your dataset due to the 72 hour restriction.

Note I added case 20 & 21 where 2 arrives are next to each other this query will treat that as a trip but you can ignore the first arrive. To continue that arrive till the next depart will change up methodology but it is possible too.