nop77svk nop77svk - 6 months ago 10
SQL Question

fill in "missing" values using match_recognize

A question here, on SO, asked for resolving a rather common use case of filling in "missing" (based on defined criteria of being missing) values with non-"missing" values from "previous" (based on defined criteria of ordering) rows.

My usual approach to this problem is to


  • remap (
    decode
    /
    case
    ) the "missing" values to NULLs,

  • use the
    last_value()
    analytic function on the non-"missing"-value, with
    ignore nulls
    and the window of all preceding rows up to the current row in the defined ordering.



I.e., given a (taken from the original post) input set of rows
my_table
of ...

ORD COL1  COL2  COL3  COL4
--- ---- ---- ---- ----
1  A   0    1    5
2  B   0    4   0
3  C   2   0    0
4  D   0   0    0
5  E   3    5   0
6  F   0   3   0
7  G    0    3    1
8  A    0    1    5
9  E   3   5   0


..., an ordering of
order by ord asc
, a criterion for a value being "missing" being the value is negative or zero (
case when colX <= 0 then null else colX end
for
X
of {2,3,4}), the query of ...

select X.ord, X.col1,
nvl(last_value(case when col2 > 0 then col2 end) ignore nulls over (order by ord), col2) as col2,
nvl(last_value(case when col3 > 0 then col3 end) ignore nulls over (order by ord), col3) as col3,
nvl(last_value(case when col4 > 0 then col4 end) ignore nulls over (order by ord), col4) as col4
from my_table X
order by ord;


... would yield the desired result of ...

ORD COL1 COL2 COL3 COL4
--- ---- ---- ---- ----
1 A 0 1 5
2 B 0 4 5
3 C 2 4 5
4 D 2 4 5
5 E 3 5 5
6 F 3 3 5
7 G 3 3 1
8 A 3 1 5
9 E 3 5 5


Lukas Eder proposed a beautiful alternative solution with the
model
clause, working with the fact that the
model
operation in his case works inductively (based on my observation of the
SQL MODEL ORDERED
operation in his query's execution plan), from the first row to the last row in the desired ordering, resulting in the row
n
having the "missing" values already filled in when the row
n+1
is being processed.

One of Lukas's comments also mentioned the possibility of using the
match_recognize
clause of Oracle 12c. My attempts to get this (completely new to me) clause understood and working failed badly. So, ...

A bounty offer! :-)

I'm offering a small bounty to the most elegant
match_recognize
-based solution to the above problem. All preprocessings (views, inline views,
with
clauses, ...) of the data with
pivot
,
unpivot
, self-joins,
model
, analytics, aggregations, PL/SQL, whatever, are prohibited. Only standard scalar SQL functions are allowed. I'm interested in pure
match_recognize
working on the base
my_table
data source.

Answer

I don't think you can accomplish the result using pure MATCH_RECOGNIZE clause. Simply because the values col2, col3, col4 are independent on each other but the pattern is evaluated per entire row. So if multiple symbols can be matched (multiple columns have zero or non-zero values in your case) only the first one is matched as classifier and the others are ignored.

For single column there is simple solution:

SELECT
    ORD, COL1, COL2R COL2
FROM
    my_table
MATCH_RECOGNIZE (
     ORDER BY ORD
     MEASURES
        NVL(LAST(V2.COL2), 0) AS COL2R
     ALL ROWS PER MATCH
     PATTERN ((V20*V2+V20*)+)
     DEFINE
        V2  AS  V2.COL2 > 0,
        V20 AS V20.COL2 = 0
     )

Anyway, analytic functions are definitely the best option in your case. Both MODEL and MATCH_RECOGNIZE clauses are meant to solve different problems where analytic functions will not help and although there are well optimized both are more heavy weight than analytic functions.