Ashish Ashish - 3 months ago 6
SQL Question

Oracle sapling of data in parallel

I have X million records in a table

TABLE_A
and want to process these records one by one.
How can I divide the population equally among 10 instances of same PL/SQL scripts to process in parallel?

See below query

SELECT CBR.CUSTOMER_ID, CBR.NAME, CBR.DEPT_NAME
FROM
(
SELECT CUSTOMER_ID, NAME, HOME_TELNO, DEPT_NAME, ROWNUM AS RNUM
FROM TABLE_A ORDER BY CUSTOMER_ID ASC
) CBR
WHERE CBR.RNUM < :sqli_end_rownum AND CBR.RNUM >= :sqli_start_rownum ;


Values will be incremented in each iteration of loop. In next iteration
sqli_start_rownum
will become
sqli_end_rownum
.

This query is taking much time. Does someone has better way to do it

Answer

You could look into DBMS_PARALLEL_EXECUTE: http://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_parallel_ex.htm#ARPLS67331

For example: https://oracle-base.com/articles/11g/dbms_parallel_execute_11gR2

The poor man's version of this is basically to run a query to generate ranges of rowids. You can then access the rows in the table within a given range.

Step1: create the number of "buckets" you want to divide the table into and get a range of rowids for each bucket. Here's an 8-bucket example:

select bucket_num, min(rid) as start_rowid, max(rid) as end_rowid, count(*)
  from (select rowid rid
             , ntile(8) over (order by rowid) as bucket_num
          from table_a
) 
group by bucket_num
order by bucket_num; 

You'd get an output that looks like this (I'm using 12c - rowids may look different in 11g):

BUCKET_NUM  START_ROWID         END_ROWID           COUNT(*)
1           AABetTAAIAAB8GCAAA  AABetTAAIAAB8u5AAl  82792
2           AABetTAAIAAB8u5AAm  AABetTAAIAAB9RrABi  82792
3           AABetTAAIAAB9RrABj  AABetTAAIAAB96vAAU  82792
4           AABetTAAIAAB96vAAV  AABetTAAIAAB+gKAAs  82792
5           AABetTAAIAAB+gKAAt  AABetTAAIAAB+/vABv  82792
6           AABetTAAIAAB+/vABw  AABetTAAIAAB/hbAB1  82791
7           AABetTAAIAAB/hbAB2  AABetTAAIAACARDABf  82791
8           AABetTAAIAACARDABg  AABetTAAIAACBGnABq  82791

(The sum of the counts will be the total number of rows in the table at the time of the query.)

Step2: can grab a set of rows from the table for a given range:

SELECT <whatever you need>
  FROM <table>
 WHERE rowid BETWEEN 'AABetTAAIAAB8GCAAA' and 'AABetTAAIAAB8u5AAl'
...

Step3: repeat step2 for the given ranges.

so instead of this:

SELECT CBR.CUSTOMER_ID, CBR.NAME, CBR.DEPT_NAME
FROM
(
    SELECT CUSTOMER_ID, NAME, HOME_TELNO, DEPT_NAME, ROWNUM AS RNUM
    FROM TABLE_A ORDER BY CUSTOMER_ID ASC
) CBR
WHERE CBR.RNUM < :sqli_end_rownum AND CBR.RNUM >= :sqli_start_rownum ;

you'll just have this:

SELECT CBR.CUSTOMER_ID, CBR.NAME, CBR.DEPT_NAME
  FROM table_a
 WHERE rowid BETWEEN :start_rowid and :end_rowid

You can use this to run the same job in parallel but you'll need a separate session for each run (e.g. multiple SQL Plus sessions. You can also use something like DBMS_JOBS/DBMS_SCHEDULER to launch background jobs.

(Note: always be aware if your table is being updated between the time the buckets are calculated and the time you access the tables as you can miss rows.)

Comments