rffs rffs - 6 months ago 18
SQL Question

SQL Server - Insert lines with null values when month doesn't exist

I have a table like this one:

Yr | Mnth | W_ID | X_ID | Y_ID | Z_ID | Purchases | Sales | Returns |
2015 | 10 | 1 | 5210 | 1402 | 2 | 1000.00 | etc | etc |
2015 | 12 | 1 | 5210 | 1402 | 2 | 12000.00 | etc | etc |
2016 | 1 | 1 | 5210 | 1402 | 2 | 1000.00 | etc | etc |
2016 | 3 | 1 | 5210 | 1402 | 2 | etc | etc | etc |
2014 | 3 | 9 | 880 | 2 | 7 | etc | etc | etc |
2014 | 12 | 9 | 880 | 2 | 7 | etc | etc | etc |
2015 | 5 | 9 | 880 | 2 | 7 | etc | etc | etc |
2015 | 7 | 9 | 880 | 2 | 7 | etc | etc | etc |


For each combination of (W, X, Y, Z) I would like to insert the months that don't appear in the table and are between the first and last month.

In this example, for combination (W=1, X=5210, Y=1402, Z=2), I would like to have additional rows for 2015/11 and 2016/02, where Purchases, Sales and Returns are NULL. For combination (W=9, X=880, Y=2, Z=7) I would like to have additional rows for months between 2014/4 and 2014/11, 2015/01 and 2015/04, 2016/06.

I hope I have explained myself correctly.
Thank you in advance for any help you can provide.

Answer

The process is rather cumbersome in this case, but quite possible. One method uses a recursive CTE. Another uses a numbers table. I'm going to use the latter.

The idea is:

  • Find the minimum and maximum values for the year/month combination for each set of ids. For this, the values will be turned into months since time 0 using the formula year*12 + month.
  • Generate a bunch of numbers.
  • Generate all rows between the two values for each combination of ids.
  • For each generated row, use arithmetic to re-extract the year and month.
  • Use left join to bring in the original data.

The query looks like:

with n as (
      select row_number() over (order by (select null)) - 1 as n -- start at 0
      from master.spt_values
     ),
     minmax as (
      select w_id, x_id, y_id, z_id, min(yr*12 + mnth) as minyyyymm,
             max(yr*12 + mnth) as maxyyyymm
      from t
      group by w_id, x_id, y_id, z_id
     ),
     wxyz as (
      select minmax.*, minmax.minyyyymm + n.n,
             (minmax.minyyyymm + n.n) / 12 as yyyy,
             ((minmax.minyyyymm + n.n) % 12) + 1 as mm
      from minmax join
           n
           on minmax.minyyyymm + n.n <= minmax.maxyyyymm
     )
select wxyz.yyyy, wxyz.mm, wxyz.w_id, wxyz.x_id, wxyz.y_id, wxyz.z_id, 
       <columns from t here>
from wxyz left join
     t
     on wxyz.w_id = t.w_id and wxyz.x_id = t.x_id and wxyz.y_id = t.y_id and
        wxyz.z_id = t.z_id and wxyz.yyyy = t.yr and wxyz.mm = t.mnth;