Oracle sapling of data in parallel

Ashish Published at Dev

Ashish

I have X million records in a table TABLE_A and want to process these records one by one. How can I divide the population equally among 10 instances of same PL/SQL scripts to process in parallel?

See below query

SELECT CBR.CUSTOMER_ID, CBR.NAME, CBR.DEPT_NAME
FROM
(
    SELECT CUSTOMER_ID, NAME, HOME_TELNO, DEPT_NAME, ROWNUM AS RNUM
    FROM TABLE_A ORDER BY CUSTOMER_ID ASC
) CBR
WHERE CBR.RNUM < :sqli_end_rownum AND CBR.RNUM >= :sqli_start_rownum ;

Values will be incremented in each iteration of loop. In next iteration sqli_start_rownum will become sqli_end_rownum.

This query is taking much time. Does someone has better way to do it

Patrick Marchand

You could look into DBMS_PARALLEL_EXECUTE: http://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_parallel_ex.htm#ARPLS67331

For example: https://oracle-base.com/articles/11g/dbms_parallel_execute_11gR2

The poor man's version of this is basically to run a query to generate ranges of rowids. You can then access the rows in the table within a given range.

Step1: create the number of "buckets" you want to divide the table into and get a range of rowids for each bucket. Here's an 8-bucket example:

select bucket_num, min(rid) as start_rowid, max(rid) as end_rowid, count(*)
  from (select rowid rid
             , ntile(8) over (order by rowid) as bucket_num
          from table_a
) 
group by bucket_num
order by bucket_num;

You'd get an output that looks like this (I'm using 12c - rowids may look different in 11g):

BUCKET_NUM  START_ROWID         END_ROWID           COUNT(*)
1           AABetTAAIAAB8GCAAA  AABetTAAIAAB8u5AAl  82792
2           AABetTAAIAAB8u5AAm  AABetTAAIAAB9RrABi  82792
3           AABetTAAIAAB9RrABj  AABetTAAIAAB96vAAU  82792
4           AABetTAAIAAB96vAAV  AABetTAAIAAB+gKAAs  82792
5           AABetTAAIAAB+gKAAt  AABetTAAIAAB+/vABv  82792
6           AABetTAAIAAB+/vABw  AABetTAAIAAB/hbAB1  82791
7           AABetTAAIAAB/hbAB2  AABetTAAIAACARDABf  82791
8           AABetTAAIAACARDABg  AABetTAAIAACBGnABq  82791

(The sum of the counts will be the total number of rows in the table at the time of the query.)

Step2: can grab a set of rows from the table for a given range:

SELECT <whatever you need>
  FROM <table>
 WHERE rowid BETWEEN 'AABetTAAIAAB8GCAAA' and 'AABetTAAIAAB8u5AAl'
...

Step3: repeat step2 for the given ranges.

so instead of this:

SELECT CBR.CUSTOMER_ID, CBR.NAME, CBR.DEPT_NAME
FROM
(
    SELECT CUSTOMER_ID, NAME, HOME_TELNO, DEPT_NAME, ROWNUM AS RNUM
    FROM TABLE_A ORDER BY CUSTOMER_ID ASC
) CBR
WHERE CBR.RNUM < :sqli_end_rownum AND CBR.RNUM >= :sqli_start_rownum ;

you'll just have this:

SELECT CBR.CUSTOMER_ID, CBR.NAME, CBR.DEPT_NAME
  FROM table_a
 WHERE rowid BETWEEN :start_rowid and :end_rowid

You can use this to run the same job in parallel but you'll need a separate session for each run (e.g. multiple SQL Plus sessions. You can also use something like DBMS_JOBS/DBMS_SCHEDULER to launch background jobs.

(Note: always be aware if your table is being updated between the time the buckets are calculated and the time you access the tables as you can miss rows.)

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2021-04-10

Comments

0 comments

TOP Ranking

Article

Oracle sapling of data in parallel

Oracle sapling of data in parallel

Failed to listen on localhost:8000 (reason: Cannot assign requested address)

pump.io port in URL

How to import an asset in swift using Bundle.main.path() in a react-native native module

Loopback Error: connect ECONNREFUSED 127.0.0.1:3306 (MAMP)

Compiler error CS0246 (type or namespace not found) on using Ninject in ASP.NET vNext

BigQuery - concatenate ignoring NULL

Spring Boot JPA PostgreSQL Web App - Internal Authentication Error

ggplotly no applicable method for 'plotly_build' applied to an object of class "NULL" if statements

ngClass error (Can't bind ngClass since it isn't a known property of div) in Angular 11.0.3

How to remove the extra space from right in a webview?

Change dd-mm-yyyy date format of dataframe date column to yyyy-mm-dd

Jquery different data trapped from direct mousedown event and simulation via $(this).trigger('mousedown');

maven-jaxb2-plugin cannot generate classes due to two declarations cause a collision in ObjectFactory class

java.lang.NullPointerException: Cannot read the array length because "<local3>" is null

How to use merge windows unallocated space into Ubuntu using GParted?

flutter: dropdown item programmatically unselect problem

Pandas - check if dataframe has negative value in any column

Nuget add packages gives access denied errors

Can't pre-populate phone number and message body in SMS link on iPhones when SMS app is not running in the background

Generate random UUIDv4 with Elm

Client secret not provided in request error with Keycloak