hammerfest hammerfest - 2 months ago 8
SQL Question

Optimize self-join Oracle SQL query with LAG/LEAD analytic functions?

We have an Oracle SQL query to identify records where the value of a table column has changed from one record to another. Relevant columns are (ID, SOME_COLUMN, FROM_DATE, TO_DATE) where the ID is not unique, and FROM_DATE and TO_DATE determine the time interval for which the the particular row for that ID was effective, i.e.

(ID1, VAL1, 01/01/2016, 03/01/2016)
(ID1, VAL2, 04/01/2016, 09/01/2016)
(ID1, VAL3, 10/01/2016, 19/01/2016)


etc.

We could implement this using the following self-join

SELECT N.ID
O.SOME_COLUMN OLD_VALUE,
N.SOME_COLUMN NEW_VALUE
FROM OUR_TABLE N, OUR_TABLE O
WHERE N.ID = O.ID
AND N.FROM_DATE - 1 = O.TO_DATE
AND N.SOME_COLUMN <> O.SOME_COLUMN


however since the table contains 100 millions of records, it quite hits the performance. Is there a more effective way to do this? Someone hinted analytic functions (e.g. LAG), but we could not figure out a working solution so far. Any ideas would be appreciated

Answer

Yes, you can use LEAD() to fetch the last value :

SELECT t.id,
       t.some_column as OLD_VALUE,
       LEAD(t.some_column) OVER(PARTITION BY t.id ORDER BY t.from_date) as NEW_VALUE
FROM YourTable t

If you want only changes, wrap it with another select and filter OLD_VALUE <> NEW_VALUE