Lex Lex - 3 months ago 7
SQL Question

Oracle top-n query sort performance

I'm new to sql optimization and i'm trying to understand why having more than 1 item in the IN clause results in large performance hit and how can i prevent it if it is even possible. Below is more or less what i'm working with. The second query is currently what I have and I'm looking to improve the performance. In real life TABLE_1 has millions of rows and the sorting part of the plan has a cpu cost of 21M.

SELECT
TOPNWRAPPER.*,
TABLE_2.X,
TABLE_2.Y
FROM
TABLE_2,
(
SELECT
*
FROM
(
SELECT
/*+ index (TABLE_1 TABLE_1_B_E_F_ID) */
TABLE_1.ID,
TABLE_1.C,
TABLE_1.B,
TABLE_1.E,
TABLE_1.F
FROM
TABLE_1
WHERE
( TABLE_1.F IN ( ‘STATE1’ ) ) AND
( TABLE_1.B= 'SOMETEXT' ) AND
( TABLE_1.C=1 ) AND
( TABLE_1.E= 'IN' ) AND
( TABLE_1.D IS NULL )
ORDER BY
TABLE_1.ID
)
WHERE
( ROWNUM <= 150 )
) TOPNWRAPPER
WHERE
( TOPNWRAPPER.ID = TABLE_2.T1_ID_FK )
ORDER BY
TOPNWRAPPER.ID ASC


stats:

|--------------------------------------------------------------------------------------------------------------------------|
|| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers ||
|--------------------------------------------------------------------------------------------------------------------------|
|| 0 ||SELECT STATEMENT | | 1 | | 120 |00:00:00.01 | 965 ||
|| 1 |||NESTED LOOPS | | 1 | | 120 |00:00:00.01 | 965 ||
|| 2 ||||NESTED LOOPS | | 1 | 1 | 120 |00:00:00.01 | 845 ||
|| 3 |||||VIEW | | 1 | 1 | 120 |00:00:00.01 | 245 ||
||* 4 ||||||COUNT STOPKEY | | 1 | | 120 |00:00:00.01 | 245 ||
|| 5 |||||||VIEW | | 1 | 1 | 120 |00:00:00.01 | 245 ||
||* 6 ||||||||TABLE ACCESS BY INDEX ROWID| TABLE_1 | 1 | 1 | 120 |00:00:00.01 | 245 ||
||* 7 |||||||||INDEX RANGE SCAN | TABLE_1_B_E_F_ID | 1 | 25 | 120 |00:00:00.01 | 125 ||
||* 8 |||||INDEX RANGE SCAN | TABLE_2_T1_ID_FK | 120 | 1 | 120 |00:00:00.01 | 600 ||
|| 9 ||||TABLE ACCESS BY INDEX ROWID | TABLE_2 | 120 | 1 | 120 |00:00:00.01 | 120 ||
|--------------------------------------------------------------------------------------------------------------------------|
| |
|Predicate Information (identified by operation id): |
|--------------------------------------------------- |
| |
| 4 - filter(ROWNUM<=150) |
| 6 - filter((“TABLE_1”.”C”=1 AND “TABLE_1”.”D” IS NULL)) |
| 7 - access(“TABLE_1”.”B”='SOMETEXT' AND |
| “TABLE_1”.”E”=‘IN' AND “TABLE_1”.”F”=’STATE1’) |
| 8 - access(“TOPNWRAPPER”.”ID”=“TABLE_2”.”T1_ID_FK”) |
+--------------------------------------------------------------------------------------------------------------------------+


When i update the query to have 'STATE2' in the IN clause an additional sort step is added to the plan.

SELECT
TOPNWRAPPER.*,
TABLE_2.X,
TABLE_2.Y
FROM
TABLE_2,
(
SELECT
*
FROM
(
SELECT
/*+ index (TABLE_1 TABLE_1_B_E_F_ID) */
TABLE_1.ID,
TABLE_1.C,
TABLE_1.B,
TABLE_1.E,
TABLE_1.F
FROM
TABLE_1
WHERE
( TABLE_1.F IN ( 'STATE1', 'STATE2' ) ) AND
( TABLE_1.B= 'SOMETEXT' ) AND
( TABLE_1.C=1 ) AND
( TABLE_1.E= 'IN' ) AND
( TABLE_1.D IS NULL )
ORDER BY
TABLE_1.ID
)
WHERE
( ROWNUM <= 150 )
) TOPNWRAPPER
WHERE
( TOPNWRAPPER.ID = TABLE_2.T1_ID_FK )
ORDER BY
TOPNWRAPPER.ID ASC


stats:

|-------------------------------------------------------------------------------------------------------------------------------------------------------|
|| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem ||
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
|| 0 ||SELECT STATEMENT | | 1 | | 150 |00:00:00.01 | 1076 | | | ||
|| 1 |||NESTED LOOPS | | 1 | | 150 |00:00:00.01 | 1076 | | | ||
|| 2 ||||NESTED LOOPS | | 1 | 1 | 150 |00:00:00.01 | 926 | | | ||
|| 3 |||||VIEW | | 1 | 1 | 150 |00:00:00.01 | 176 | | | ||
||* 4 ||||||COUNT STOPKEY | | 1 | | 150 |00:00:00.01 | 176 | | | ||
|| 5 |||||||VIEW | | 1 | 1 | 150 |00:00:00.01 | 176 | | | ||
||* 6 ||||||||SORT ORDER BY STOPKEY | | 1 | 1 | 150 |00:00:00.01 | 176 | 15360 | 15360 |14336 (0)||
|| 7 |||||||||INLIST ITERATOR | | 1 | | 165 |00:00:00.01 | 176 | | | ||
||* 8 ||||||||||TABLE ACCESS BY INDEX ROWID| TABLE_1 | 2 | 1 | 165 |00:00:00.01 | 176 | | | ||
||* 9 |||||||||||INDEX RANGE SCAN | TABLE_1_B_E_F_ID | 2 | 50 | 165 |00:00:00.01 | 11 | | | ||
||* 10 |||||INDEX RANGE SCAN | TABLE_2_T1_ID_FK | 150 | 1 | 150 |00:00:00.01 | 750 | | | ||
|| 11 ||||TABLE ACCESS BY INDEX ROWID | TABLE_2 | 150 | 1 | 150 |00:00:00.01 | 150 | | | ||
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| |
|Predicate Information (identified by operation id): |
|--------------------------------------------------- |
| |
| 4 - filter(ROWNUM<=150) |
| 6 - filter(ROWNUM<=150) |
| 8 - filter((“TABLE_1”.”C”=1 AND “TABLE_1”.”D” IS NULL)) |
| 9 - access(“TABLE_1”.”B”='SOMETEXT' AND |
| “TABLE_1”.”E”='IN' AND ((“TABLE_1”.”F”='STATE1') OR (“TABLE_1”.”F”='STATE2')) |
| 10 - access(“TOPNWRAPPER”.”ID”=“TABLE_2”.”T1_ID_FK”) |
| |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+


I've been looking around for a few days now. One suggesting i tried was using the hint /*+ USE_CONCAT (OR_PREDICATES(1)) */ and that helps a bit by reducing the mem usage by half, but it did not completely eliminate the problem.

EDIT: Was looking around (http://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by#tip-ixord-full) and think this might be due to the order by. If i change the order by statements to be:
TABLE_1.F,TABLE_1.ID
and
TOPNWRAPPER.F,TOPNWRAPPER.ID ASC
then the sort operation goes away, unfortunately i need the top n rows based on ID. Alternatively i tried making a new index on (ID F) to test with and it did also remove the sort operation, but ID is unique per row making the table access operations less effective.

Answer

The performance difference probably doesn't matter. The execution plan difference is because multi-column index accesses are only implicitly sorted if the leading columns use equality conditions.

Performance Difference

Don't worry too much about the cost of an execution plan. Even though it's called the "Cost Based Optimizer", the cost is a weird number that only a few people in the world fully understand.

One reason why it is complicated to compare explain plan costs is that the total cost is sometimes less than one of the child operation costs. As I explain in my answer here, this can happen with a COUNT STOPKEY operation. This is Oracle's way of saying "this child operation would cost this huge amount, but the COUNT STOPKEY will probably cut it off before it gets that high". It's usually best to compare the top cost of the plan, but even that number can sometimes be misleading, as other examples in that answer demonstrate.

Which means that normally the run time is the only thing that matters. If the A-Time (actual time) is only 0.1 seconds for both queries then your job is probably done here.

Execution Plan Difference

The difference in the execution plans are caused by the ways multi-column indexes are stored and accessed. Sometimes when an index is scanned the results will be automatically stored and sometimes they will not. That's why one plan has COUNT STOPKEY and the other has the more expensive SORT ORDER BY STOPKEY.

To demonstrate this plan difference, create a simple table and index with just 2 columns and 4 rows:

create table test1 as 
select 1 a, 10 b from dual union all
select 1 a, 30 b from dual union all
select 2 a, 20 b from dual union all
select 2 a, 40 b from dual;

create index test1_idx on test1(a, b);

begin
    dbms_stats.gather_table_stats(user, 'TEST1');
end;
/

Below is a simplified idea of how the index is stored. The data is stored ordered by the leading column first, then ordered by the trailing column.

               +----+
        +------+Root+-------+
        |      +----+       |
        |                   |
      +-v-+               +-v-+
   +--+A=1+--+         +--+A=2+--+
   |  +---+  |         |  +---+  |
   |         |         |         |
 +-v--+   +--v-+     +-v--+   +--v-+
 |B=10|   |B=30|     |B=20|   |B=40|
 +----+   +----+     +----+   +----+

If the query only accesses one value in the leading column A, then it can read the values from column B in order without any extra effort. Oracle goes to one of the A blocks and then reads the B blocks in order without even trying.

Note how this query has an ORDER BY but there's no SORT in the execution plan.

explain plan for select * from test1 where a = 1 and b > 0 order by b;
select * from table(dbms_xplan.display(format => 'basic'));

Plan hash value: 598212486

--------------------------------------
| Id  | Operation        | Name      |
--------------------------------------
|   0 | SELECT STATEMENT |           |
|   1 |  INDEX RANGE SCAN| TEST1_IDX |
--------------------------------------

But if the query accesses more than one value in the leading column A, the results from B will not necessarily be retrieved in order. Oracle may read the A blocks in order, but the B block order is only true for one A value.

Now, an extra SORT ORDER BY operation shows up in the execution plan.

explain plan for select * from test1 where a in (1,2) and b > 0 order by b;
select * from table(dbms_xplan.display(format => 'basic'));

Plan hash value: 704117715

----------------------------------------
| Id  | Operation          | Name      |
----------------------------------------
|   0 | SELECT STATEMENT   |           |
|   1 |  SORT ORDER BY     |           |
|   2 |   INLIST ITERATOR  |           |
|   3 |    INDEX RANGE SCAN| TEST1_IDX |
----------------------------------------

This is why changing column1 = value1 to column1 in (value1, value2) may add an extra SORT operation.

Comments