Click here to Skip to main content
Click here to Skip to main content
Go to top

SSIS: Load Data in Dimension Table by Merge Join

, 13 Jan 2013
Rate this:
Please Sign up or sign in to vote.
How to load data from an Excel file to load in a dimension table in an incremental way.

Introduction

In this tutorial I will show how to load data from an Excel file to load in a dimension table in an incremental way.

Background

Each time you want to populate your dimension table data you need to truncate the destination and reload from the source. But if the table size is huge then it is not the best solution. So instead of delete insert, we will populate data by insert new row and update changed row.

What you need

This script was tested in SQL Server 2008.

Create project

First create a database and table from script.

use master
go

if db_id ('bi_Stage') is not null
drop database bi_Stage;
go

create database bi_Stage 
on primary (name = 'bi_Stage'
, filename = 'E:\BI\DB\Stage\bi_Stage.mdf'
, size = 5 MB, filegrowth = 5 MB)
log on (name = 'bi_Meta_log'
, filename = 'E:\BI\DB\Stage\bi_Stage_log.ldf'
, size = 2 MB, filegrowth = 512 KB)
collate SQL_Latin1_General_CP1_CI_AS
go

alter database bi_Stage set recovery simple 
go
alter database bi_Stage set auto_shrink off
go
alter database bi_Stage set auto_create_statistics off
go
alter database bi_Stage set auto_update_statistics off
go


SET ANSI_NULLS ON
GO

use bi_Stage
go

if exists 
  (select * from sys.tables 
   where name = 'chart_of_account')
drop table chart_of_account
go

BEGIN
CREATE TABLE [dbo].[chart_of_account](
	[account_id] [int] NULL,
	[parent_account_id] [int] NULL,
	[account_code] [nvarchar](50) NULL,
	[account_name] [nvarchar](150) NULL,
	[account_type] [nvarchar](50) NULL
) ON [PRIMARY];
END
GO
SET ANSI_NULLS OFF
GO

Then open SQL Server Business Intelligence Development Studio.

Then go to File->New->Project and select Integration Service Project.

Select "Data Flow Task" from "Control Menu" and drag it on "Control Flow" tab. Then double click it.

Now select "Excel Source" from "Data Flow Sources" and drag it on "Data Flow" tab. Then double click it to assign file source.

Click OK and then again click OK.

Select "Data Conversion" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Excel Source". Then double click it. We need to convert data which is coming from source.

Rename "Output Alias" field, change "Data Type" value from default value to add prefix "src_" to distinguish, and change "Length". Click OK.

Select "OLE DB Source" from "Data Flow Transformation" and drag it on "Control Flow" tab. Then double click it. We need to assign data source.

Click New button for create New connection or select from already existing connection.

Click New button.

Assign "Server Name", select "Authentication", select "Database", and click "OK".

From left tab select "Column" and then click OK.

Select "Sort" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Data Conversion". Then double click it.

Select column which needs to be sorted.

Select another "Sort" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "OLE DB Source". Then double click it. Select column which needs to be sorted.

Select "Merge Join" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Sort".

Select Join Type as "Left outer join" and select all fields from left panel and all fields from right panel. Click OK. Connect "Merge Join" from "Sort Dest".

Select "Conditional Split" from "Data Flow Transformation" and drag it on "Control Flow" tab. Connect it from "Merge Join". Right-click the Conditional Split and click Edit to open the Conditional Split Editor.

Expand the NULL Functions folder in the upper right of the Conditional Split Transformation Editor. Expand the Columns folder in the upper left side of the Conditional Split Transformation Editor. Click in the "Output Name" column and enter "New" as the name of the first output. From the NULL Functions folder, drag and drop the "ISNULL( <<expression>> )" function to the Condition column of the New Rows condition:

Next, drag account_id from the columns folder and drop it onto the "<<expression>>" text in the Condition column. "New" should now be defined by the condition "ISNULL( [account_id] )". This defines the WHERE clause for new rows - setting it to "WHERE account_id Is NULL".

Type "Update" into a second Output Name column. Add the expression "((src_code != account_code) || (src_name != account_name) || (src_parent_id != parent_account_id) || (src_type != account_type))" to the Condition column for the Update Rows output. This defines our WHERE clause for detecting changed rows - setting it to "WHERE ((src_code != account_code) || (src_name != account_name) || (src_parent_id != parent_account_id) || (src_type != account_type))". Note "||" is used to convey "OR" in SSIS Expressions:

Change the "Default output name" from "Conditional Split Default Output" to "Unchanged".

Select "OLE DB Destination" from "Data Flow Destination" and Drag it on "Control Flow" tab.

Connect it from "Conditional Split" and select "New" and assign connection and map field.

Select "OLE DB Command" from "Data Flow Transformation" and Drag it on "Control Flow" tab.

Connect it from "Conditional Split" and select "Update" and assign "Connection Manager" and map field.

Click OK. Now we are ready to execute the SSIS.

If you execute the package with debugging (press F5), the package should succeed and appear as shown here:

Now Execute the command to test:

SELECT [account_id]
      ,[parent_account_id]
      ,[account_code]
      ,[account_name]
      ,[account_type]
  FROM [bi_Stage].[dbo].[chart_of_account]

Now I have changed some data in Excel file and again execute the package with debugging (press F5), the package should succeed and appear as shown here:

Conclusion

I have used one more "Sort Dest" for sorting destination table. I think it will be better if we use "ORDER BY" clause in "OLE DB Source" rather than use "Sort". Performance can be varied. I hope this might be helpful to you!

History

None so far.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

mparvez
Software Developer (Senior) icddr,b
Bangladesh Bangladesh
More than 8 years experience on Programming and Project implementation, I was primarily involved with projects for private organization,Govt.(Bangladesh Army,DG Health,RJSC), NGO (SEDF,WFP). Presently I am working at ICDDR,B and enhancing Hospital Management System developed by Microsoft Dynamic NAV and Windows Mobile Application 5.0
 
An active supporter of Open Source technology, my interested areas are ERP, IT Audit, Data warehouse, BI etc.
 
Playing Guitar for 15 years, my interested music style is Blues Rock,Neo Classical.
 
Certification
 
70-540:Microsoft® Windows Mobile® 5.0 - Application Development
MB7-514:Microsoft Dynamics™ NAV 5.0 C/SIDE Introduction
MB7-516:Microsoft Dynamics™ NAV 5.0 Solution Development
MB7-517:Microsoft Dynamics™ NAV 5.0 Installation and Configuration
MB7-515:Microsoft Dynamics™ NAV 5.0 Financials
70-432:Microsoft SQL Server 2008 - Implementation and Maintenance
70-450:PRO: Designing, Optimizing and Maintaining a Database Administrative Solution Using Microsoft SQL Server 2008
70-448:Microsoft SQL Server 2008, Business Intelligence Development and Maintenance
312-50:Certified Ethical Hacker
 
Web :http://masudparvezshabuz.appspot.com
Blog :http://masudparvezshabuz.wordpress.com
linkedin :http://www.linkedin.com/in/masudparvez
Follow on   Google+

Comments and Discussions

 
QuestionExcellent Excercise Pinmembersajida16-Jun-14 7:15 
QuestionSCD Type 2 Pinmemberkumaar198611-Sep-13 2:03 
AnswerRe: SCD Type 2 Pinmembermparvez11-Sep-13 2:16 
QuestionMerge Join is not working! PinmemberMember 1015246817-Jul-13 1:42 
AnswerRe: Merge Join is not working! Pinmembermparvez17-Jul-13 18:36 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web04 | 2.8.140926.1 | Last Updated 13 Jan 2013
Article Copyright 2013 by mparvez
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid