Click here to Skip to main content
15,569,319 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Below is my query which takes around 5k mappers and 1k reducers and time taken is around 2.2 hours to finish. Any scope of optimization in here?

What I have tried:

sum(B.item_net_amount) net_amount,
sum(B.item_gross_amount) gross_amount,
sum(B.item_quantity) item_quantity,
(sum(nvl(B.item_gross_amount,0))-sum( nvl(B.total_discount_amount,0)))/100 dollar_sales
from (select item_dim_key from wh_postx_seven11_p1.eal_seven11_it_mstr where category_label in ('34.02 - Products')) A
left outer join
wh_postx_seven11_p2.postx_fact B
on A.item_dim_key = B.src_item_dim_key
WHERE data_supplier_name='SEVEN11'
and retailer_name='SEVEN11'
and tm_dim_key_day between 42598 and 42604 ;

output i got :
258546211.00000000    258550200.00000000    7475811    2585462.1100000

The query used in from statement returns around 1.3 k records. Based on this table the left join is used to main table.

select item_dim_key from wh_postx_seven11_p1.eal_seven11_it_mstr where category_label in ('34.02 - Products')
Updated 12-Dec-20 2:12am
Gerry Schmitz 12-Dec-20 12:49pm    
Does it sum before joining? When would that be redundant considering the joins? Is there an "execution plan"? Do you think one query statement is "better" than thoughtful sub-queries that one can time? How do you think someone else would answer those questions?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900