Charles Duffy Charles Duffy - 6 months ago 7
SQL Question

Synthesizing SQL rows within a range

Using SQL server, I have a table that looks something like the following:

id | time | measurement
---+---------------------+-------------
1 | 2014-01-01T05:00:00 | 1.0
1 | 2014-01-01T06:45:00 | 2.0
1 | 2014-01-01T09:30:00 | 3.0
1 | 2014-01-01T11:00:00 | NULL
1 | 2014-02-05T03:00:00 | 1.0
1 | 2014-02-05T05:00:00 | NULL


The measurements being stored are presumed to be accurate until a new value is provided for the same id; the last measurement for a given id is the end of the sequence.

I'm interested in creating a query or view that synthesizes new data points on each hour defined by these spans if they don't exist (and the previous point was neither 0 nor NULL), thus:

id | time | measurement
---+---------------------+-------------
1 | 2014-01-01T05:00:00 | 1.0
1 | 2014-01-01T06:00:00 | 1.0
1 | 2014-01-01T06:45:00 | 2.0
1 | 2014-01-01T07:00:00 | 2.0
1 | 2014-01-01T08:00:00 | 2.0
1 | 2014-01-01T09:00:00 | 2.0
1 | 2014-01-01T09:30:00 | 3.0
1 | 2014-01-01T10:00:00 | 3.0
1 | 2014-02-05T03:00:00 | 1.0
1 | 2014-02-05T04:00:00 | 1.0


Is this feasible?

Would it be more feasible if each input row had a "duration", specifying the amount of time for which its measurement is valid? (In this case, we would be effectively unpacking a run-length encoding in SQL). [My target is SQL Server 2012, which has LEAD and LAG functions, allowing such to be easily constructed].




To provide that data in a format consumable by SQL Server:

select id, cast(stime as datetime) as [time], measurement
from
(values
(1, '2014-01-01T05:00:00', 1.0),
(1, '2014-01-01T05:00:00', 1.0),
(1, '2014-01-01T06:45:00', 2.0),
(1, '2014-01-01T09:30:00', 3.0),
(1, '2014-01-01T11:00:00', NULL),
(1, '2014-02-05T03:00:00', 1.0),
(1, '2014-02-05T05:00:00', NULL)
) t(id, stime, measurement)

Answer

Its complex, but working (for dataset you provided)

;WITH cte AS (
SELECT *
FROM (VALUES
(1, '2014-01-01T05:00:00', '1.0'),(1, '2014-01-01T06:45:00', '2.0'),
(1, '2014-01-01T09:30:00', '3.0'),(1, '2014-01-01T11:00:00', NULL),
(1, '2014-02-05T03:00:00', '1.0'),(1, '2014-02-05T05:00:00', NULL)
) as t (id, [time], measurement)
)
--Get intervals for every date
, dates AS (
SELECT MIN([time]) [min], DATEADD(hour,-1,MAX([time])) [max]
FROM cte
GROUP BY CAST([time] as date)
)
--Create table with gaps datetimes
, add_dates AS (
SELECT CAST([min] as datetime) as date_
FROM dates
UNION ALL
SELECT DATEADD(hour,1,a.date_)
FROM add_dates a
INNER JOIN dates d 
    ON a.date_ between d.[min] and d.[max]
WHERE a.date_ < d.[max]
)
--Get intervals of datetimes with ids and measurements
, res AS (
SELECT  id,
        [time],
        LEAD([time],1,NULL) OVER (ORDER BY [time])as [time1],
        measurement
FROM cte
)
--Final select
SELECT DISTINCT *
FROM (
    SELECT  r.id,
            a.date_,
            r.measurement
    FROM add_dates a
    LEFT JOIN res r
        ON a.date_ between r.time and r.time1
    WHERE measurement IS NOT NULL
    UNION ALL
    SELECT * 
    FROM cte
    WHERE measurement IS NOT NULL
) as t
ORDER BY t.date_

Output:

id  date_                   measurement
1   2014-01-01 05:00:00.000 1.0
1   2014-01-01 06:00:00.000 1.0
1   2014-01-01 06:45:00.000 2.0
1   2014-01-01 07:00:00.000 2.0
1   2014-01-01 08:00:00.000 2.0
1   2014-01-01 09:00:00.000 2.0
1   2014-01-01 09:30:00.000 3.0
1   2014-01-01 10:00:00.000 3.0
1   2014-02-05 03:00:00.000 1.0
1   2014-02-05 04:00:00.000 1.0