Ryan Barker Ryan Barker - 2 months ago 12
SQL Question

Oracle SQL Trending MTD Data

I am trying to solve a trending problem at work very similar to the below example. I think I have a method but don't know how to do it in SQL.

The input data is:

MTD LOC_ID RAINED
1-Apr-16 1 Y
1-Apr-16 2 N
1-May-16 1 N
1-May-16 2 N
1-Jun-16 1 N
1-Jun-16 2 N
1-Jul-16 1 Y
1-Jul-16 2 N
1-Aug-16 1 N
1-Aug-16 2 Y


The desired output is:

MTD LOC_ID RAINED TRENDS
1-Apr-16 1 Y New
1-May-16 1 N No Rain
1-Jun-16 1 N No Rain
1-Jul-16 1 Y Carryover
1-Aug-16 1 N No Rain
1-Apr-16 2 N No Rain
1-May-16 2 N No Rain
1-Jun-16 2 N No Rain
1-Jul-16 2 N No Rain
1-Aug-16 2 Y New


I'm trying to produce the output from the input by trending on MTD without depending on it. This way, when new months are added to the input, the output changes without editing the query.

The logic for TRENDS will occur on each unique LOC_ID. Trends will have three values: "New" in the first month RAINED is "Y", "Carryover" in any following months where RAINED is "Y", and "No Rain" in any months where RAINED is "N".

I'd like to automate this problem by introducing an intermediate step with a listagg. For example, for LOC_ID = "1":

MTD LOC_ID RAINED PREV_RAINED
1-Apr-16 1 Y (null) / 0 / (I don't care)
1-May-16 1 N Y
1-Jun-16 1 N Y;N
1-Jul-16 1 Y Y;N;N
1-Aug-16 1 N Y;N;N;Y


This way, to produce "TRENDS" in the output, I can say:

case when RAINED = 'Y' then
case when not regexp_like(PREV_RAINED, 'Y', 'i') then
'New'
else
'Carryover'
end
else
'No Rain'
end as TRENDS


My problem is that I'm not sure how to produce PREV_RAINED for each unique LOC_ID. I have a feeling it needs to combine LAG() statements and partition by LOC_ID order by MTD, but the number of lags I need to do depends on each month.

Is there an easy way to produce PREV_RAINED or a simpler way to solve my overall problem while preserving automation each month?

Thanks for reading all of this! :)

Answer

In the below SQL there are two parts.

(i) Calculating the ROWNUMBER value for rained attribute at loc_id,rained level.
(ii) Get the count at partition level loc_id,rained.

By computing the above two we can write the CASE WHEN logic to calculate the trends based on your requirement.

SELECT mtd,
       loc_id,
       rained,
       CASE WHEN rained = 'N' THEN 'No Rain'
            WHEN rained = 'Y' AND rn = 1 THEN 'New'
            ELSE 'Carry Over'    
        END AS Trends       
  FROM
        ( 
            SELECT mtd,
                   loc_id,
                   rained,                   
                   ROW_NUMBER() OVER ( PARTITION BY loc_id,rained ORDER BY mtd ) AS rn,
                   COUNT(*) OVER ( PARTITION BY loc_id,rained ) AS count_locid_rained               
              FROM INPUT
              ORDER BY loc_id,mtd,rained,rn
         ) X;
Comments