Ashish Ashish - 1 year ago 73
SQL Question

Oracle sapling of data in parallel

I have X million records in a table

TABLE_A
and want to process these records one by one.
How can I divide the population equally among 10 instances of same PL/SQL scripts to process in parallel?

See below query

SELECT CBR.CUSTOMER_ID, CBR.NAME, CBR.DEPT_NAME
FROM
(
SELECT CUSTOMER_ID, NAME, HOME_TELNO, DEPT_NAME, ROWNUM AS RNUM
FROM TABLE_A ORDER BY CUSTOMER_ID ASC
) CBR
WHERE CBR.RNUM < :sqli_end_rownum AND CBR.RNUM >= :sqli_start_rownum ;


Values will be incremented in each iteration of loop. In next iteration
sqli_start_rownum
will become
sqli_end_rownum
.

This query is taking much time. Does someone has better way to do it

Answer Source

You could look into DBMS_PARALLEL_EXECUTE: http://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_parallel_ex.htm#ARPLS67331

For example: https://oracle-base.com/articles/11g/dbms_parallel_execute_11gR2

The poor man's version of this is basically to run a query to generate ranges of rowids. You can then access the rows in the table within a given range.

Step1: create the number of "buckets" you want to divide the table into and get a range of rowids for each bucket. Here's an 8-bucket example:

select bucket_num, min(rid) as start_rowid, max(rid) as end_rowid, count(*)
  from (select rowid rid
             , ntile(8) over (order by rowid) as bucket_num
          from table_a
) 
group by bucket_num
order by bucket_num; 

You'd get an output that looks like this (I'm using 12c - rowids may look different in 11g):

BUCKET_NUM  START_ROWID         END_ROWID           COUNT(*)
1           AABetTAAIAAB8GCAAA  AABetTAAIAAB8u5AAl  82792
2           AABetTAAIAAB8u5AAm  AABetTAAIAAB9RrABi  82792
3           AABetTAAIAAB9RrABj  AABetTAAIAAB96vAAU  82792
4           AABetTAAIAAB96vAAV  AABetTAAIAAB+gKAAs  82792
5           AABetTAAIAAB+gKAAt  AABetTAAIAAB+/vABv  82792
6           AABetTAAIAAB+/vABw  AABetTAAIAAB/hbAB1  82791
7           AABetTAAIAAB/hbAB2  AABetTAAIAACARDABf  82791
8           AABetTAAIAACARDABg  AABetTAAIAACBGnABq  82791

(The sum of the counts will be the total number of rows in the table at the time of the query.)

Step2: can grab a set of rows from the table for a given range:

SELECT <whatever you need>
  FROM <table>
 WHERE rowid BETWEEN 'AABetTAAIAAB8GCAAA' and 'AABetTAAIAAB8u5AAl'
...

Step3: repeat step2 for the given ranges.

so instead of this:

SELECT CBR.CUSTOMER_ID, CBR.NAME, CBR.DEPT_NAME
FROM
(
    SELECT CUSTOMER_ID, NAME, HOME_TELNO, DEPT_NAME, ROWNUM AS RNUM
    FROM TABLE_A ORDER BY CUSTOMER_ID ASC
) CBR
WHERE CBR.RNUM < :sqli_end_rownum AND CBR.RNUM >= :sqli_start_rownum ;

you'll just have this:

SELECT CBR.CUSTOMER_ID, CBR.NAME, CBR.DEPT_NAME
  FROM table_a
 WHERE rowid BETWEEN :start_rowid and :end_rowid

You can use this to run the same job in parallel but you'll need a separate session for each run (e.g. multiple SQL Plus sessions. You can also use something like DBMS_JOBS/DBMS_SCHEDULER to launch background jobs.

(Note: always be aware if your table is being updated between the time the buckets are calculated and the time you access the tables as you can miss rows.)

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download