beckham beckham - 4 months ago 8
SQL Question

How to adjust the dates and distinct records based on a column

REG_ID| EVENT_TYPE_CD | EVENT_DATE | PACKAGE_DESC |PRODUCT_TYPE|TERM_START_DATE|TERM_END_DATE
------|------------------|------------|-----------------|------------|---------------|----------

11156 | NEW SUBSCRIPTION | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 23-FEB-16
11156 | CANCELLATION | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 23-FEB-16
11156 | UPSELL | 23-FEB-16 | CONNECTED CARE |GOODWILL | 23-FEB-16 | 22-MAR-16
11156 | CANCELLATION | 11-MAR-16 | CONNECTED CARE |GOODWILL | 23-FEB-16 | 11-MAR-16
11156 | UPSELL | 14-MAR-16 | CONNECTED CARE |GOODWILL | 14-APR-16 | 13-APR-17
11156 | EXPIRATION | 14-APR-16 | CONNECTED CARE |GOODWILL | 14-MAR-16 | 13-APR-17
11163 | UPSELL | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 23-FEB-16
11163 | CANCELLATION | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 22-MAR-16
17215 | NEW SUBSCRIPTION | 18-JAN-16 | CONNECTED CARE |TRIAL | 18-JAN-16 | 17-JAN-17
17215 | NEW SUBSCRIPTION | 18-JAN-16 | GUIDANCE |TRIAL | 18-JAN-16 | 17-APR-16
17215 | CANCELLATION | 22-FEB-16 | GUIDANCE |TRIAL | 18-JAN-16 | 22-FEB-16
17215 | UPSELL | 25-FEB-16 | GUIDANCE |GOODWILL | 25-FEB-16 | 24-APR-16
17215 | EXPIRATION | 25-APR-16 | GUIDANCE |GOODWILL | 25-FEB-16 | 24-APR-16
17215 | NEW SUBSCRIPTION | 18-JAN-16 | REMOTE |TRIAL | 18-JAN-16 | 17-APR-16
17215 | UPSELL | 25-FEB-16 | REMOTE |GOODWILL | 25-FEB-16 | 24-APR-16
17215 | NEW SUBSCRIPTION | 18-JUN-16 | REMOTE |PAID | 18-JUN-16 | 17-JUL-16
17215 | UPSELL | 25-JUL-16 | REMOTE |GOODWILL | 25-JUL-16 | 24-AUG-16


The output I need is that everything needs to be sorted by EVENT_DATE (Series of Events)


  1. if a 'Goodwill' EVENT_DATE follows a 'Trial' product EVENT_DATE
    then treat this as 'Trial'. If a 'Goodwill' EVENT_DATE follows a
    'Paid' product EVENT_DATE then treat this as 'Paid' and adjust the
    TERM_END_DATE (Example for REMOTE Ideal scenario when there is no
    Cancellation or EXPIRATION event_type_cd for particular PACKAGE_DESC
    in a REG_ID)

  2. If there is a Event After the Cancellation then ignore Cancellation (11163 shows up: that's the reason since there is no new
    event after the cancellation)

  3. If there are multiple Goodwill's
    following the trail we need to take the maximum TERM_END_DATE in the
    cycle needs to assigned as TERM_END_DATE (REG_ID 17215 and GUIDANCE).

  4. EXPIRATION RECORD should always be there and its term_start_date
    needs to get adjusted to the Term_start_date of the Cycle's 1st
    record.



REG_ID| EVENT_TYPE_CD | EVENT_DATE | PACKAGE_DESC |PRODUCT_TYPE|TERM_START_DATE| TERM_END_DATE
------|------------------|------------|-----------------|------------|---------------|----------

11156 | NEW SUBSCRIPTION | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 13-APR-17
11156 | EXPIRATION | 14-APR-16 | CONNECTED CARE |PAID | 23-FEB-16 | 13-APR-17
11163 | UPSELL | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 23-FEB-16
11163 | CANCELLATION | 23-FEB-16 | CONNECTED CARE |PAID | 23-FEB-16 | 22-MAR-16
17215 | NEW SUBSCRIPTION | 18-JAN-16 | CONNECTED CARE |TRIAL | 18-JAN-16 | 17-JAN-17
17215 | NEW SUBSCRIPTION | 18-JAN-16 | GUIDANCE |TRIAL | 18-JAN-16 | 24-APR-16
17215 | EXPIRATION | 25-APR-16 | GUIDANCE |TRAIL | 18-JAN-16 | 24-APR-16
17215 | NEW SUBSCRIPTION | 18-JAN-16 | REMOTE |TRIAL | 18-JAN-16 | 24-APR-16
17215 | NEW SUBSCRIPTION | 18-JUN-16 | REMOTE |PAID | 18-JUN-16 | 24-AUG-16

Answer

The rules are quite extensive, and your might get better results and performance with PL/SQL code, as that can use variables while iterating over a cursor.

Still, I think the following query may do what you need:

select  reg_id, 
        event_type_cd,
        event_date,
        package_desc,
        case product_type when 'GOODWILL' then coalesce(prev_product_type, 'TRIAL')
            else product_type
        end as product_type,
        case event_type_cd when 'EXPIRATION' then first_term_start_date
            else term_start_date
        end as term_start_date,
        case next_product_type when 'GOODWILL' then next_term_end_date
            else term_end_date
        end as term_end_date
from    (select reg_id, 
                event_type_cd,
                event_date,
                package_desc,
                product_type,
                term_start_date,
                term_end_date,
                first_value(term_start_date) over (
                    partition by reg_id, package_desc 
                    order by event_date, term_end_date, event_type_cd desc) as first_term_start_date,
                lead(term_end_date, 1) over (
                    partition by reg_id, package_desc 
                    order by event_date, term_end_date, event_type_cd desc) as next_term_end_date,
                lag(product_type, 1) over (
                    partition by reg_id, package_desc 
                    order by event_date, term_end_date, event_type_cd desc) as prev_product_type,
                lead(product_type, 1) over (
                    partition by reg_id, package_desc 
                    order by event_date, term_end_date, event_type_cd desc) as next_product_type
        from    (select reg_id, 
                        event_type_cd,
                        event_date,
                        package_desc,
                        product_type,
                        term_start_date,
                        term_end_date,
                        lead(product_type, 1, '-') over (
                            partition by reg_id, package_desc 
                            order by event_date, term_end_date, event_type_cd desc) as next_product_type
                from    mytable)
                where   not (event_type_cd = 'CANCELLATION' and next_product_type <> '-')
                and     not (product_type = 'GOODWILL' and next_product_type = 'GOODWILL')
                )
where    not (product_type = 'GOODWILL' and event_type_cd <> 'EXPIRATION' and prev_product_type is not null)
order by reg_id, package_desc, event_date, term_end_date, event_type_cd desc

The query has a two-level nested sub query.

The inner-most query only serves to get the product_type of the next record within the cycle (i.e. within the same partition of reg_id and package_desc).

The middle query uses that information to eliminate:

  • 'CANCELLATION' records, unless they are the very last record of their cycle;
  • Consecutive 'GOODWILL' records, leaving only the last in place of every sequence -- this is last one is temporary, but is at this stage still needed;

The middle query also re-fetches the product_type of the next record in the cycle, as it might now have changed due to the eliminated records. Additionally, it determines:

  • the term_start_date of the first record in the cycle;
  • the term_end_date of the next record in the cycle;
  • the product_type of the previous record in the cycle;

Finally, the outer query uses this information to:

  • set the product_type to that of the previous record (or 'TRIAL' if there is no previous) if it concerns a 'GOODWILL' record;
  • set the term_start_date to the that of the first record in the cycle, if it concerns an 'EXPIRATION' record;
  • set the term_end_date to that of the next record in the cycle, if the next record concerns a 'GOODWILL' record.

The 'GOODWILL' records (before the change in the first bullet above) are excluded from the result, except when they relate to a first record in their cycle, or correspond to an 'EXPIRATION' record.

The order by clause uses the order as you mentioned in comments, with an additional event_type_cd desc to make sure "The EVENT_TYPE_CD's for Cancellation or Expiration will always follow the New Subscription or UPSELL for a particular REG_ID, PACKAGE_DESC". This is because by fortune both 'NEW SUBSCRIPTION' and 'UPSELL' come both later in the alphabetical order than 'CANCELLATION' and 'EXPIRATION', so in descending order we get them sorted right.

Comments