PiC PiC - 26 days ago 13
SQL Question

UPDATE works OK (but really, really slow) despite ORA-904 in subquery

I have an UPDATE statement with a subquery in WHERE to find the duplicates. The subquery has errors that is revealed when running the subquery itself but when running that in the UPDATE statement no error is shown and the DML runs OK (but really slow).

See the table setup:

CREATE TABLE RAW_table
(
ERROR_LEVEL NUMBER(3),
RAW_DATA_ROW_ID INTEGER,
ATTRIBUTE_1 VARCHAR2(4000 BYTE)
)
;

INSERT INTO RAW_table VALUES (0, 2, '509NTQD9Q868');
INSERT INTO RAW_table VALUES (0, 2, '509NTQD9Q868');
INSERT INTO RAW_table VALUES (0, 2, '509NTQD9Q868');
INSERT INTO RAW_table VALUES (0, 3, '509NTVS9Q863');
INSERT INTO RAW_table VALUES (0, 3, '509NTVS9Q863');
INSERT INTO RAW_table VALUES (0, 3, '509NTVS9Q863');

COMMIT;


The query with error is:

SELECT UPPER(ATTRIBUTE_1), rid
FROM ( SELECT UPPER(ATTRIBUTE_1)
, ROWID AS rid
, ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
FROM RAW_table
)
WHERE RN > 1;


It gives
ORA-00904: "ATTRIBUTE_1": invalid identifier
when run.

However, the below DML that uses the above query (as of line 4) in the WHERE statement works OK:

set timing on

UPDATE RAW_table
SET ERROR_LEVEL = 4
WHERE (UPPER (ATTRIBUTE_1), ROWID)
IN (SELECT UPPER (ATTRIBUTE_1), rid
FROM (SELECT UPPER (ATTRIBUTE_1), ROWID AS rid
, ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
FROM RAW_table
)
WHERE RN > 1
)
;

4 rows updated.
Elapsed: 00:00:00.36


Why? Why? Why?

I expected the UPDATE to fail with
ORA-00904: "ATTRIBUTE_1": invalid identifier
too.
Why it does not fail?

The real problem, however, is not that the UPDATE actually works, but that it works really slow.

When I corrected the subquery not to trigger
ORA-00904: "ATTRIBUTE_1": invalid identifier
to something like this:

UPDATE RAW_table
SET ERROR_LEVEL = 4
WHERE (UPPER (ATTRIBUTE_1), ROWID)
IN (SELECT checked_column, rid
FROM (SELECT UPPER (ATTRIBUTE_1) AS checked_column, ROWID AS rid
, ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
FROM RAW_table
)
WHERE RN > 1
)
;


The query accelerated nearly 400 times on the test data set of 11.000 rows:

SELECT COUNT(*) FROM RAW_table;

COUNT(*)
----------
11004
1 row selected.


Corrected query:

1005 rows updated.
Elapsed: 00:00:00.28


Query with ORA-904:

1005 rows updated.
Elapsed: 00:01:48.40


I was not patient enough to wait till the end of 71.000 rows test:

SELECT COUNT(*) FROM RAW_table;
COUNT(*)
----------
71475
1 row selected.

Corrected query
11004 rows updated.
Elapsed: 00:00:00.60

Query with ORA-904


Cancelled after 30 min...

Explain plan for the query with ORA-904:

UPDATE STATEMENT ALL_ROWS Cost: **2 544 985 615** Bytes: 8 464 752 Cardinality: 4 176
7 UPDATE RAW_TABLE
6 FILTER
1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54 Bytes: 169 282 878 Cardinality: 83 514
5 VIEW Cost: 30 486 Bytes: 2 087 850 Cardinality: 83 514
4 WINDOW SORT Cost: 30 486 Bytes: 169 282 878 Cardinality: 83 514
3 FILTER
2 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54 Bytes: 169 282 878 Cardinality: 83 514


Explain plan for corrected query:

UPDATE STATEMENT ALL_ROWS Cost: **36 637** Bytes: 3 374 235 Cardinality: 835
7 UPDATE RAW_TABLE
6 HASH JOIN RIGHT SEMI Cost: 36 637 Bytes: 3 374 235 Cardinality: 835
4 VIEW VIEW SYS.VW_NSO_1 Cost: 30 486 Bytes: 168 197 196 Cardinality: 83 514
3 VIEW Cost: 30 486 Bytes: 169 282 878 Cardinality: 83 514
2 WINDOW SORT Cost: 30 486 Bytes: 169 282 878 Cardinality: 83 514
1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54 Bytes: 169 282 878 Cardinality: 83 514
5 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54 Bytes: 169 282 878 Cardinality: 83 514


After analyzing the table the costs are plans are the same.
Explain plan for the query with ORA-904:

UPDATE STATEMENT ALL_ROWS Cost: **29 381 690** Bytes: 38 Cardinality: 2
7 UPDATE RAW_TABLE
6 FILTER
1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54 Bytes: 1 358 025 Cardinality: 71 475
5 VIEW Cost: 427 Bytes: 1 786 875 Cardinality: 71 475
4 WINDOW SORT Cost: 427 Bytes: 1 358 025 Cardinality: 71 475
3 FILTER
2 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54 Bytes: 1 358 025 Cardinality: 71 475


Explain plan for corrected query:

UPDATE STATEMENT ALL_ROWS Cost: **3 123** Bytes: 1 453 595 Cardinality: 715
7 UPDATE RAW_TABLE
6 HASH JOIN SEMI Cost: 3 123 Bytes: 1 453 595 Cardinality: 715
5 VIEW VIEW SYS.VW_NSO_1 Cost: 427 Bytes: 143 950 650 Cardinality: 71 475
4 VIEW Cost: 427 Bytes: 144 879 825 Cardinality: 71 475
3 WINDOW SORT Cost: 427 Bytes: 1 358 025 Cardinality: 71 475
2 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54 Bytes: 1 358 025 Cardinality: 71 475
1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54 Bytes: 1 358 025 Cardinality: 71 475


Explain plan cost says it all, but why it is so much different?

I've just triggered a 71.000 rows test again, after computing statistics on the table, but it is already running for few minutes...

This all is on Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit.

Answer

This is why aliases are really, really useful.

In the query

UPDATE RAW_table
   SET ERROR_LEVEL   = 4
 WHERE (UPPER (ATTRIBUTE_1), ROWID) 
       IN (SELECT UPPER (ATTRIBUTE_1), rid
           FROM (SELECT UPPER (ATTRIBUTE_1), ROWID AS rid
                     , ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) 
                                               ORDER BY RAW_DATA_ROW_ID) AS RN
                  FROM RAW_table
                )
           WHERE RN > 1
          )

SELECT UPPER (ATTRIBUTE_1) is valid because it can be resolved as a reference to the table you are updating not to the table in the FROM. With aliases, that query is equivalent to

UPDATE RAW_table dest
   SET dest.ERROR_LEVEL   = 4
 WHERE (UPPER (dest.ATTRIBUTE_1), ROWID) 
       IN (SELECT UPPER (dest.ATTRIBUTE_1), src.rid
           FROM (SELECT UPPER (rt.ATTRIBUTE_1), rt.ROWID AS rid
                     , ROW_NUMBER() OVER ( PARTITION BY UPPER (rt.ATTRIBUTE_1) 
                                               ORDER BY rt.RAW_DATA_ROW_ID) AS RN
                  FROM RAW_table rt
                ) src
           WHERE src.rid > 1
          )

Of course, if you had written it this way, it would be immediately clear that you were referencing the dest.attribute_1 rather than the src.attribute_1. That (and many other reasons) is why it's a good idea to alias your columns-- it makes it clear which object you intended to reference and throws an error when the intended reference is invalid rather than potentially resolving it to something you didn't intend.

Comments