Michael Karnerfors Michael Karnerfors - 1 month ago 14
SQL Question

Flattening date intervals in SQL

I have a database table where there are three columns that are essential to this question:


  • A group ID, that groups rows together

  • A start date

  • An end date



I want to make a view from this table so that overlapping date intervals, that have the same grouping ID, are flattened.

Date intervals that are not overlapping shall not be flattened.

Example:

Group ID Start End
1 2016-01-01 2017-12-31
1 2016-06-01 2020-01-01
1 2022-08-31 2030-12-31
2 2010-03-01 2017-01-01
2 2012-01-01 2013-12-31
3 2001-01-01 9999-13-31


...becomes...

Group ID Start End
1 2016-01-01 2020-01-01
1 2022-08-31 2030-12-31
2 2010-03-01 2017-01-01
3 2001-01-01 9999-12-31


Intervals that overlap may do so in any way, completely enclosed by other intervals, or they may be staggered, or they may even have the same start and/or end dates.

There are few similar ids. Commonly (> 95%) there is only one row with a particular group ID. There are about a thousand IDs that show up in two rows; a handful of IDs that exist in three rows; none that are in four rows or more.

But I need to be prepared that there may show up group IDs that exist in four or more rows.

How can I write an SQL statement that creates a view that shows the table flattened this way?

Do note that every row also has a unique ID. This does not need to be preserved in any way, but in case it helps when writing the SQL, I am letting you know.

Answer

First, find intervals that are not continuation of overlapping sequence:

select * 
from dateclap d1
where not exists(
    select * 
    from dateclap d2 
    where d2.group_id=d1.group_id and 
        d2.end_date >= d1.start_date and 
        (d2.start_date < d1.start_date or 
        (d1.start_date=d2.start_date and d2.r_id<d1.r_id)))

Last line distinguishes intervals starting at the same date/time, ordering them by unique record id (r_id).

Then for each such record we can get hierarchical selection of records with connect_by_root r_id distinguishing clamp groups. After that all we need is to get min/max for clamp group (connect_by_root r_id is id of parent record in group):

select group_id, min(start_date) as start_date, max(end_date) as end_date
from dateclap d1
start with not exists(
    select * 
    from dateclap d2 
    where d2.group_id=d1.group_id and 
        d2.end_date >= d1.start_date and 
        (d2.start_date < d1.start_date or 
        (d1.start_date=d2.start_date and d2.r_id<d1.r_id)))
connect by nocycle
    prior group_id=group_id and 
    start_date between prior start_date and prior end_date
group by group_id, connect_by_root r_id

Note nocycle here - it is a dirty trick to avoid exceptions because connection is weak and in fact tries to connect record to itself. You can refine condition after "connect by" similar to "exists" condition to avoid nocycle usage.

P.S. Table was created for tests like this:

CREATE TABLE "ANIKIN"."DATECLAP" 
(   
    "R_ID" NUMBER, 
    "GROUP_ID" NUMBER, 
    "START_DATE" DATE, 
    "END_DATE" DATE
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "ANIKIN" ;

Unique key (or probably primary key) for r_id and corresponding seuqence/triggers are not something specific to tests, just populate r_id with unique values.