Marko Mahl Marko Mahl - 2 months ago 11
SQL Question

Teradata partitioned query ... following rows dynamically

I have a table with the following columns and data. Data describes certain customer activity periods

cust_id s_date e_date
11111 01.03.2014 31.03.2014
11111 10.04.2014 30.04.2014
11111 01.05.2014 10.05.2014
11111 15.06.2014 31.07.2014
22222 01.04.2014 31.05.2014
22222 01.06.2014 30.06.2014
22222 01.07.2014 15.07.2014


And I want to write a query which gives this result:

cust_id s_date e_date
11111 01.03.2014 10.05.2014
11111 15.06.2014 31.07.2014
22222 01.04.2014 15.07.2014


The query result purpose is to "merge" rows into one row when customer IN-activity period is less than 15 days. I can handle with "1 row preceding" but if needed to merge 3 or more rows then it does not work. I run out of ideas how to write this query.

My "half" 1 row preceding query:

SELECT cust_id
, start_date as current_period_start_date
, end_date as current_period_end_date
, end_date+15 as current_period_expired_date
, coalesce(
min(current_period_expire_date)
over(partition by cust_id
order by start_date
rows between 1 preceding and 1 preceding)
, cast('1900-01-01' as date)) as previous_period_expire_date
, case
when current_period_start_date <= previous_period_expire_date
then min(current_period_start_date)
over(partition by cust_id
order by start_date
rows between 1 preceding and current row)
else current_period_start_date
end as new_current_period_start_date

FROM MY_DB.my_table
. . .


Also, is it possible to change preceding into dynamical way like this?

... over(partition by ... order by ... rows between X preceding and current row)

Answer

Gordon's answer can be modified as the basic LAG syntax is easily rewritten:

LAG(col, n) OVER (ORDER BY c) 

is the same as a

MIN(col) OVER (ORDER BY c ROWS BETWEEN n PRECEDING AND n PRECEDING)

The possible default value as a 3rd parameter can be done using a COALESCE(LAG...., default value), only the IGNORE NULLS option is a really tough one.

This results in:

SELECT cust_id, MIN(s_date) AS s_date, MAX(e_date) AS e_date
FROM (SELECT t.*, SUM(GroupStartFlag) OVER (PARTITION BY cust_id ORDER BY s_date ROWS UNBOUNDED PRECEDING) AS grpid
      FROM (SELECT cust_id, s_date, e_date,
                   (CASE WHEN s_date <= MIN(e_date) 
                                        OVER (PARTITION BY cust_id 
                                              ORDER BY s_date
                                              ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 15
                         THEN 0
                         ELSE 1
                    END) AS GroupStartFlag
            FROM  vt
           ) t
     ) t
GROUP BY cust_id, grpid;

If you don't need any additional columns (just the cust_id and the dates) you can also utilize a specific TD 13.10 table function for normalizing periods. To include the 15 day difference you might simply subtract/add 15 days:

WITH cte (cust_id, pd)
AS 
 ( SELECT cust_id, PERIOD(s_date-15, e_date) AS pd
   FROM vt
 )
SELECT cust_id,
   BEGIN(pd)+15,
   END(pd),
   cnt
FROM TABLE (TD_NORMALIZE_OVERLAP_MEET
            (NEW VARIANT_TYPE(cte.cust_id)
                ,cte.pd)
        RETURNS (cust_id INTEGER
                ,pd PERIOD(DATE)
                ,cnt INTEGER) --optional: number of rows normalized in one result row
        HASH BY cust_id
        LOCAL ORDER BY cust_id, pd
        ) AS t;

In TD 14.10 there's also a real nice syntax for normalizing periods:

SELECT cust_id, BEGIN (pd)+15, END(pd) 
FROM
 (
   SELECT NORMALIZE
      cust_id, PERIOD(s_date-15, e_date) AS pd
   FROM vt
 ) AS dt

Btw, periods are defined with an inclusive start but exclusive end (i.e. for a gapless period end of the previous period and start of the next got the same value), so you might have to change 15 to 16 to get the desired result.