With a huge enterprise level datasystems these days, we might have some requirements where we would need to get data from different databases residing on different servers (sometimes residing at different locations), Here we might need to get data joining two tables residing on different databases
For example there exists a table_a under schema_a in database_a
and table_b under schema_b in database_b with below columns:
Let's say we need to write a query to get top 9 employee details with there names and emails.
Here this might look simple, thinking that we can join two tables by refering tables with dblinks pointing to needed database just like below queries.
CREATE DATABASE LINK "ABDBLINK.com" CONNECT TO schema_a IDENTIFIED BY <PWD> USING database_b';
select * from table_a x,table_b@ABDBLINK.com y where x.id=y.id where rownum > 10;
The above query works perfect to get data needed, but if we learn how the
above query works, we will get to know how this would create problems
when those tables have thousands of records and when we are expecting
only a couple from those. When we run the query, actually oracle will
fetch all the records in the table_B from database_B into database_a and
then try to filter the matching records in both table_a and table_b So now we can imagine how much data traffic it would cause to move
thousands of records across databases just to fetch a couple from those.
So the other alternative we can think about is, we need to get only the required records from table_b in database_b to database_a and then try joining both to get the columns needed from both the tables but unfortunately oracle allows only primitive data types (Number, Varchar..etc..) to return using a function across databases but here we need to get a table type (user defined type) in order to be able to join both.
Using The Code
I came with a thought of how we can use an old technique/ predefined procedure in oracle which is used to execute dynamic SQL queries called DBMS_SQL. The main
advantage or reason to use this is DBMS_SQL is a procedure and as all know oracle supports RPC (Remote Procedure Calls) and this procedure can be executed across databases, Let us see how this works (Refer the link to know more on DBMS_SQL ).
- Define a function in database_b which would execute a query to return only the needed records from table_b with the help of its input parameter
which would be sent while calling this function from database_a
PROCEDURE DefineQueryCursor (c IN OUT NUMBER, p_emp_ids VARCHAR)
c := DBMS_SQL.open_cursor;
FROM table_b Where employee_id in (' || p_emp_ids || ')',
rc := DBMS_SQL.Execute (c);
We would be calling the above function from database_a to get only the required employee details. Observe the first parameter in function is a out param that sends out the cursor reference in database_b to database_a on which the query would get executed and the results would be saved from that point of cursor where as the second parameter is used to filter out the needed records from table_b. Here
I would send a comma separated employee ids from database_a that are needed for us.
- Declare a record type and table type variables in database_a with the columns needed from table_b to catch the records/result set we get from query declared in the above function.
TYPE rec_employee IS RECORD
EMPLOYEE_ID VARCHAR (10),
EMPLOYEE_MAIL VARCHAR2 (4000)
TYPE tab_employee IS TABLE OF rec_employee;
- Define a function in database_a to call the function in database_b and then wrap the data in a table type, we use pipelined functions here that helps in returning table types,
please go through the link for more information on
cmfs VARCHAR (4000);
indx NUMBER := 0;
fetch_count NUMBER := 10;
SELECT RTRIM (
XMLAGG (XMLELEMENT (e, employee_id || '|')).EXTRACT (
FROM (SELECT t1.employee_id
FROM table_a t1
WHERE rownum < 10);
SELECT COUNT (t1.employee_id) + 5
FROM table_a t1
WHERE rownum < 10;
DefineQueryCursor@ABDBLINK.com (c, emp_ids);
d := dbms_sql@ABDBLINK.com.fetch_rows (c);
dbms_sql@ABDBLINK.com.COLUMN_VALUE (c, 1, v_employee_id);
dbms_sql@ABDBLINK.com.COLUMN_VALUE (c, 2, v_employee_name);
EXIT WHEN d != fetch_count;
IF v_employee_id.COUNT > 0
FOR i IN v_employee_id.FIRST .. v_employee_id.LAST
v_result.EMPLOYEE_ID := v_employee_id (i);
v_result.EMPLOYEE_MAIL := v_employee_name (i);
PIPE ROW (v_result);
IF DBMS_SQL@ABDBLINK.com.IS_OPEN (c)
'func-getempdetails-dbms_sql: - ' || SQLERRM);
Please go through the comments for a detailed explanation at each step
- Now we can get only the needed records from table_b in database_b by querying the above function as below in.............................. of course database_a
select * from table(get_employee_info());
So now we can easily join the both the above query and table_a and that would give us the required/expected result.
SELECT t1.employee_id, t1.employee_name, t2.employee_email
FROM table_a t1, TABLE (get_employee_info ()) t2
WHERE t1.employee_id = t2.employee_id
Points of Interest
I know all these looks a bit cumbersome or heavy to maintain but if we can organize all these functions in a single package in two databases, that would give us a lot of performance to the application using these tables, Below are some sample timestamps which we in our team experienced (though table/database names are not real, all the times are facts which we experienced)
Time elapsed When used the query as mentioned in my introduction is 3Min 10 Sec, check the below explain plan
Time elapsed when used the query using DBMS_SQL is 900ms, check the below explain plan
we can observe in the above explain plans that though the cost is less in both the cases, Bytes transferred is very high in the first case which was greatly reduced with out approach.
- Version created on 4/1/2013.
- Added images to show the performance on 4/2/2013