hammerfest hammerfest - 2 months ago 10
SQL Question

Optimize self-join Oracle SQL query with analytic functions?

We have an Oracle SQL query to identify records where the value of a table column has changed from one record to another. Relevant columns are (ID, SOME_COLUMN, FROM_DATE, TO_DATE) where the ID is not unique, and FROM_DATE and TO_DATE determine the time interval for which the the particular row for that ID was effective, i.e.

(ID1, VAL1, 01/01/2016, 03/01/2016)
(ID1, VAL2, 04/01/2016, 09/01/2016)
(ID1, VAL3, 10/01/2016, 19/01/2016)


etc.

We could implement this using the following self-join

SELECT ID
O.SOME_COLUMN OLD_VALUE,
N.SOME_COLUMN NEW_VALUE
FROM OUR_TABLE N, OUR_TABLE O
WHERE N.ID = O.ID
AND N.FROM_DATE - 1 = O.TO_DATE
AND N.SOME_COLUMN <> O.SOME_COLUMN


however since the table contains 100 millions of records, it quite hits the performance. Is there a more effective way to do this? Someone hinted analytic functions (e.g. LAG), but we could not figure out a working solution so far. Any ideas would be appreciated

Answer

I think this is the LAG() approach you were talking about.

SELECT * 
  FROM (
    SELECT ID
           N.SOME_COLUMN NEW_VALUE,
           N.FROM_DATE,
           lag(N.SOME_COLUMN) over (partition by N.ID order by FROM_DATE) OLD_VALUE,
           lag(N.TO_DATE) over (partition by N.ID order by FROM_DATE) OLD_TO_DATE,
    FROM OUR_TABLE N
) T
WHERE FROM_DATE - 1 = OLD_TO_DATE
  AND NEW_VALUE<> OLD_VALUE;
Comments