Alexander Alexander - 23 days ago 5
SQL Question

Gaps and Islands with start en end date (ValidPeriod)

I have searched but can not find the solution to the following problem.

I have several price lists with a couple of million rows, and I have found many examples which can be aggregated into one row because the start and end dates of the group are consecutive (enddate:20151231 next startdate: 20160101)

But I also found many gaps, meaning the straight forward approach using min() and max() function is not applicable as possible gaps will be ignored.

The following contains a #Prices table with example records and a #Target table with the result I am shooting for:

Thanks.

My definition of a gap is when two consecutive records have more than 1 day between them.

if object_id('tempdb..#Prices', 'table') is not null
drop table #Prices
;

create table #Prices (
Product varchar(50) not null
, Value decimal(18,5) not null
, ValidFrom date not null
, ValidTo date null
)

insert into #Prices
(
Product
, Value
, ValidFrom
, ValidTo
)
select
Product = 'Island A'
, Value = 10.10
, ValidFrom = '20140101'
, ValidTo = '20140606'
union all
select
Product = 'Island A'
, Value = 10.10
, ValidFrom = '20140607'
, ValidTo = '20141010'
union all
select
Product = 'Island A'
, Value = 10.11
, ValidFrom = '20141011'
, ValidTo = '20141231'
union all
select
Product = 'Island A'
, Value = 11.10
, ValidFrom = '20150101'
, ValidTo = '20151231'
union all
select
Product = 'Island A'
, Value = 10.10
, ValidFrom = '20160101'
, ValidTo = null
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20140101'
, ValidTo = '20140606'
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20140607'
, ValidTo = '20141010'
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20150101'
, ValidTo = '20151231'
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20160101'
, ValidTo = null

select *
from #Prices as P
order by P.Product, P.ValidFrom
;


if object_id('tempdb..#Target', 'table') is not null
drop table #Target
;

create table #Target (
Product varchar(50) not null
, Value decimal(18,5) not null
, ValidFrom date not null
, ValidTo date null
)

insert into #Target
(
Product
, Value
, ValidFrom
, ValidTo
)
select
Product = 'Island A'
, Value = 10.10
, ValidFrom = '20140101'
, ValidTo = '20141010'
union all
select
Product = 'Island A'
, Value = 10.11
, ValidFrom = '20141011'
, ValidTo = '20141231'
union all
select
Product = 'Island A'
, Value = 11.10
, ValidFrom = '20150101'
, ValidTo = '20151231'
union all
select
Product = 'Island A'
, Value = 10.10
, ValidFrom = '20160101'
, ValidTo = null
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20140101'
, ValidTo = '20141010'
union all
select
Product = 'Gap B'
, Value = 20.10
, ValidFrom = '20150101'
, ValidTo = null

select *
from #Target as P
order by P.Product, P.ValidFrom
;


EDIT
I hope the edit is the answer to your question. The records which are consecutive (max 1 day between the records) can be aggregated by taking the min(ValidFrom) and the max(ValidTo). the problem is with the gaps, these will be ignored. The result for Product 'Gap B' will then be one record.
Any hit on this record with a Date will get the value 20.10 even when the date is in the period of the Gap.

Gap B | 20.10 | 20140101 | null


Therefore I need 2 records so all the joins on the table will result in the correct value and no value in the period of the Gap

Gap B | 20.10 | 20140101 | 20141010
Gap B | 20.10 | 20151231 | null

Answer

Here is a different solution that uses a recursive cte, which I think is a little easier to understand compared to Jon's. On this amount of data it is also a lot more efficient, though you will need to test the performance yourself for a larger dataset:

;with rownum
as
(
    select row_number() over (order by Product, ValidFrom) as rn
            ,Product
            ,Value
            ,ValidFrom
            ,ValidTo
    from #Prices
)
,cte
as
(
    select rn
            ,Product
            ,Value
            ,ValidFrom
            ,ValidFrom as ValidFrom2
            ,ValidTo
    from rownum
    where rn = 1

    union all

    select r.rn
            ,r.Product
            ,r.Value

            ,r.ValidFrom
            ,case when c.Product = r.Product
                    then case when dateadd(d,1,c.ValidTo) = r.ValidFrom
                            then c.ValidFrom
                            else r.ValidFrom
                            end
                    else r.ValidFrom
                    end as ValidFrom2

            ,isnull(r.ValidTo,'29990101') as ValidTo
    from rownum r
        inner join cte c
            on(r.rn = c.rn+1)
)
select Product
        ,Value
        ,ValidFrom2 as ValidFrom
        ,nullif(max(ValidTo),'29990101') as ValidTo
from cte
group by Product
        ,Value
        ,ValidFrom2
order by Product
        ,ValidFrom2;