beckham beckham - 10 months ago 34
SQL Question

How to adjust the dates and distinct records based on a column

REG_ID| EVENT_TYPE_CD | EVENT_DATE | PACKAGE_DESC |PRODUCT_TYPE|TERM_START_DATE|TERM_END_DATE
------|------------------|------------|-----------------|------------|---------------|----------

11156 | NEW SUBSCRIPTION | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 23-FEB-16
11156 | CANCELLATION | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 23-FEB-16
11156 | UPSELL | 23-FEB-16 | CONNECTED CARE |GOODWILL | 23-FEB-16 | 22-MAR-16
11156 | CANCELLATION | 11-MAR-16 | CONNECTED CARE |GOODWILL | 23-FEB-16 | 11-MAR-16
11156 | UPSELL | 14-MAR-16 | CONNECTED CARE |GOODWILL | 14-APR-16 | 13-APR-17
11156 | EXPIRATION | 14-APR-16 | CONNECTED CARE |GOODWILL | 14-MAR-16 | 13-APR-17
11163 | UPSELL | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 23-FEB-16
11163 | CANCELLATION | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 22-MAR-16
17215 | NEW SUBSCRIPTION | 18-JAN-16 | CONNECTED CARE |TRIAL | 18-JAN-16 | 17-JAN-17
17215 | NEW SUBSCRIPTION | 18-JAN-16 | GUIDANCE |TRIAL | 18-JAN-16 | 17-APR-16
17215 | CANCELLATION | 22-FEB-16 | GUIDANCE |TRIAL | 18-JAN-16 | 22-FEB-16
17215 | UPSELL | 25-FEB-16 | GUIDANCE |GOODWILL | 25-FEB-16 | 24-APR-16
17215 | EXPIRATION | 25-APR-16 | GUIDANCE |GOODWILL | 25-FEB-16 | 24-APR-16
17215 | NEW SUBSCRIPTION | 18-JAN-16 | REMOTE |TRIAL | 18-JAN-16 | 17-APR-16
17215 | UPSELL | 25-FEB-16 | REMOTE |GOODWILL | 25-FEB-16 | 24-APR-16
17215 | NEW SUBSCRIPTION | 18-JUN-16 | REMOTE |PAID | 18-JUN-16 | 17-JUL-16
17215 | UPSELL | 25-JUL-16 | REMOTE |GOODWILL | 25-JUL-16 | 24-AUG-16


The output I need is that everything needs to be sorted by EVENT_DATE (Series of Events)


  1. if a 'Goodwill' EVENT_DATE follows a 'Trial' product EVENT_DATE
    then treat this as 'Trial'. If a 'Goodwill' EVENT_DATE follows a
    'Paid' product EVENT_DATE then treat this as 'Paid' and adjust the
    TERM_END_DATE (Example for REMOTE Ideal scenario when there is no
    Cancellation or EXPIRATION event_type_cd for particular PACKAGE_DESC
    in a REG_ID)

  2. If there is a Event After the Cancellation then ignore Cancellation (11163 shows up: that's the reason since there is no new
    event after the cancellation)

  3. If there are multiple Goodwill's
    following the trail we need to take the maximum TERM_END_DATE in the
    cycle needs to assigned as TERM_END_DATE (REG_ID 17215 and GUIDANCE).

  4. EXPIRATION RECORD should always be there and its term_start_date
    needs to get adjusted to the Term_start_date of the Cycle's 1st
    record.



REG_ID| EVENT_TYPE_CD | EVENT_DATE | PACKAGE_DESC |PRODUCT_TYPE|TERM_START_DATE| TERM_END_DATE
------|------------------|------------|-----------------|------------|---------------|----------

11156 | NEW SUBSCRIPTION | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 13-APR-17
11156 | EXPIRATION | 14-APR-16 | CONNECTED CARE |PAID | 23-FEB-16 | 13-APR-17
11163 | UPSELL | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 23-FEB-16
11163 | CANCELLATION | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 22-MAR-16
17215 | NEW SUBSCRIPTION | 18-JAN-16 | CONNECTED CARE |TRIAL | 18-JAN-16 | 17-JAN-17
17215 | NEW SUBSCRIPTION | 18-JAN-16 | GUIDANCE |TRIAL | 18-JAN-16 | 24-APR-16
17215 | EXPIRATION | 25-APR-16 | GUIDANCE |TRAIL | 18-JAN-16 | 24-APR-16
17215 | NEW SUBSCRIPTION | 18-JAN-16 | REMOTE |TRIAL | 18-JAN-16 | 24-APR-16
17215 | NEW SUBSCRIPTION | 18-JUN-16 | REMOTE |PAID | 18-JUN-16 | 24-AUG-16

Answer

The rules are quite extensive, and your might get better results and performance with PL/SQL code, as that can use variables while iterating over a cursor.

Still, I think the following query may do what you need:

select  reg_id, 
        event_type_cd,
        event_date,
        package_desc,
        case product_type when 'GOODWILL' then coalesce(prev_product_type, 'TRIAL')
            else product_type
        end as product_type,
        case event_type_cd when 'EXPIRATION' then first_term_start_date
            else term_start_date
        end as term_start_date,
        case next_product_type when 'GOODWILL' then next_term_end_date
            else term_end_date
        end as term_end_date
from    (select reg_id, 
                event_type_cd,
                event_date,
                package_desc,
                product_type,
                term_start_date,
                term_end_date,
                first_value(term_start_date) over (
                    partition by reg_id, package_desc 
                    order by event_date, term_end_date, event_type_cd desc) as first_term_start_date,
                lead(term_end_date, 1) over (
                    partition by reg_id, package_desc 
                    order by event_date, term_end_date, event_type_cd desc) as next_term_end_date,
                lag(product_type, 1) over (
                    partition by reg_id, package_desc 
                    order by event_date, term_end_date, event_type_cd desc) as prev_product_type,
                lead(product_type, 1) over (
                    partition by reg_id, package_desc 
                    order by event_date, term_end_date, event_type_cd desc) as next_product_type
        from    (select reg_id, 
                        event_type_cd,
                        event_date,
                        package_desc,
                        product_type,
                        term_start_date,
                        term_end_date,
                        lead(product_type, 1, '-') over (
                            partition by reg_id, package_desc 
                            order by event_date, term_end_date, event_type_cd desc) as next_product_type
                from    mytable)
                where   not (event_type_cd = 'CANCELLATION' and next_product_type <> '-')
                and     not (product_type = 'GOODWILL' and next_product_type = 'GOODWILL')
                )
where    not (product_type = 'GOODWILL' and event_type_cd <> 'EXPIRATION' and prev_product_type is not null)
order by reg_id, package_desc, event_date, term_end_date, event_type_cd desc

The query has a two-level nested sub query.

The inner-most query only serves to get the product_type of the next record within the cycle (i.e. within the same partition of reg_id and package_desc).

The middle query uses that information to eliminate:

  • 'CANCELLATION' records, unless they are the very last record of their cycle;
  • Consecutive 'GOODWILL' records, leaving only the last in place of every sequence -- this is last one is temporary, but is at this stage still needed;

The middle query also re-fetches the product_type of the next record in the cycle, as it might now have changed due to the eliminated records. Additionally, it determines:

  • the term_start_date of the first record in the cycle;
  • the term_end_date of the next record in the cycle;
  • the product_type of the previous record in the cycle;

Finally, the outer query uses this information to:

  • set the product_type to that of the previous record (or 'TRIAL' if there is no previous) if it concerns a 'GOODWILL' record;
  • set the term_start_date to the that of the first record in the cycle, if it concerns an 'EXPIRATION' record;
  • set the term_end_date to that of the next record in the cycle, if the next record concerns a 'GOODWILL' record.

The 'GOODWILL' records (before the change in the first bullet above) are excluded from the result, except when they relate to a first record in their cycle, or correspond to an 'EXPIRATION' record.

The order by clause uses the order as you mentioned in comments, with an additional event_type_cd desc to make sure "The EVENT_TYPE_CD's for Cancellation or Expiration will always follow the New Subscription or UPSELL for a particular REG_ID, PACKAGE_DESC". This is because by fortune both 'NEW SUBSCRIPTION' and 'UPSELL' come both later in the alphabetical order than 'CANCELLATION' and 'EXPIRATION', so in descending order we get them sorted right.