This is the second article of the series Learn Microsoft Business Intelligence step by step.
In this series we will learn all three components of MSBI platform that is SSIS, SSAS and SSRS in step by step.
- Day 1
- Day 2
- Day 3
What we have learned last time?
In Day 1 we have learned following things
- What is Data warehouse?
- How it is different From Traditional database?
- What is Business Intelligence?
- What is the purpose of SSIS, SSAS and SSRS?
- How create a simple ETL package using SSIS?
Agenda for Day 2
We already have a basic knowledge about Data Warehouse by now. It’s typically a database holding data in a denormalized manner. Data from multiple places (multiple data sources) are collected and stored into DW (Data warehouse). SSIS (SQL server integration services) is the Microsoft’s business intelligence platform for performing this task.
In order to design data warehouse you have to ask yourself two questions.
- What you want to calculate?
Answer may be – Total profit, Total spend etc.
- What will be your filters? OR What are the basis for your calculation?
Answer may be – Location, Time, Customer etc. because your target is to find out
- What is the total sale in particular location? Once you get the value you can take appropriate actions like, you can run some campaign or advertisements at one with least sale.
- In last 5 years which was the best year for business or in short in which year maximum sales happened?
- Which customer did purchased the maximum goods.
DW Basically contains two main types of tables. One is a fact table which contain measurement of a business process and other is a dimension table which will be used for filtering, grouping and sorting of our data. In the above example,
- Location, Time (one which will hold years) and customer will become your Dimensions
- Sale become your Facts.
Note: Facts and Dimensions are simply tables.
Designing Data Warehouse is all about,
- understanding the Dimensions and Facts in the business
- and creating appropriate tables.
In order to understand this, let’s have an example. Look at the database diagram of tradition Normalized database.
Now let see how to make DW from above one
As you can see,
- Traditional tables are converted into denormalized Dimensions.
- Fact table created contains only 2 things
- Numbers which can be used for calculation
- References for dimension table keys.
|Above technique where Fact table references Dimension tables and Dimension tables are completely denormalized leads to a database diagram which looks like star and hence called as Star schema design.
In Star schema design, one fact table will be surrounded by many dimension tables.
||There is one more way of designing data ware houses. It’s called Snowflake design. In this normalization of some dimension tables is considered to some extent.
- Star schema design we get more performance because dimensions are denormalized. In Snowflake performance will be comparatively less compared to star flake.
- In Star schema data management will be difficult compare to snowflake because data redundancy is more.
- In Star schema queries will be simpler because joins will be comparatively less.
Conclusion – If data integrity and data redundancy is big concern snowflake is best. If performance is the major concern star schema fits best.
Now it’s time for our labs. In the last lab we have created a simple SSIS package demonstrating the ETL process. In this article we will continue our lab and explore some more Control Flow tasks.
This task let us custom SQL Script via our package.
Step 1. Create Table
Create a table in your SQL database as follows
Note: EmployeeId is identity column.
Step 2. Create SSIS Project
Create a new Empty SSIS project just like one we created in Lab 1.
Step 3. Create Connection Manager
3.1 Right click connection manager and say new ADO.NET connection.
3.2 Click New
3.3 Enter Server Name, Enter Credential, and Select Database and click OK.
3.4 Click Ok again.
3.5 Rename connection manager to Lab2Connection.
Step 4. Add Execute SQL Task
In SSIS Designer by default control flow is selected. Take “Execute SQL Task” from the toolbox and add it in the designer.
Step 5. Configure “Execute SQL Task”
5.1 Double click the task, it will open up “Execute SQL Task Editor”
5.2 Make sure the left section General is selected. In the right section under SQL Statement group, change connection type to ADO.NET from default OLEDB.
5.3 Next set Connection property to one created in Step 3.
5.4 Next is SQLSourceType. It support three values,
5.4.1 File connection – Let us execute T-SQL Statement stored in a file.
5.4.2 Variable – Let us execute T-SQL statement stored in one of the variable.
5.4.3 Direct Input- let us provide T-SQL statement which we want to execute
Note: On selecting one of the above value, a dynamic option will be displayed in the property window. For “Direct Input” a property called “SQL Statement”, For Variable a property called “Source Variable” and for File Connection property called “File connection” will be dispayed.
For our select “Direct Input”.
5.5 Click “Sql Statement”. A button with three dots in it will appear. Click it. It will popup “Enter SQL Query” dialog box. Enter query into it and click ok.
5.6 Click Ok in the “Execute SQL Task Editor” window.
Step 6. Execute Package
Press F5 and execute the package.
Step 7. Test the execution
Open the sql server database and confirm record is instered.
Note: We are using hard coded query for demonstration. You can check the identity column and confirm whether actually values are inserted or not.
Step 8. Prepare for next lab
Grab some food, take some rest and be prepare for next lab.
Lab 3 – Backup database Task
Step 3. Add Backup Database Task
Take “Backup database task” from toolbox and add it to control flow designer window
Step 4. Configure Backup Database Task
4.1 Double click the task, it will pop up the configure window.
4.2 The first thing to configure in connection in the top. Select the connection manager to one created in step 3.
Note: A very strange thing will happen here. You will not find the connection manager (created in step 3) in the drop down. For that,
4.2.1 Close the window.
4.2.2 Right click your connection manager and go to properties.
4.2.3 Check the value of Qualifier property. It will be something like this, “System.Data.SqlClient.SqlConnection, System.Data, Version=184.108.40.206, Culture=neutral, PublicKeyToken=b77a5c561934e089”.
4.2.4 Change it to Simple “SQL”.
4.2.5 Double click the Backup task again. Now you will find the connection manager in dropdown box.
4.3 Just below connection you will find Database(s) option. Select one or more database from the list.
4.4 Let other setting as it is, only select the path where you want to keep database backups.
4.5 Click Ok
Step 5. Execute the Package
Press F5 and execute the package.
Step 6. Test the execution
Go to the chosen path and see if backup is created or not.
If backup file exist it’s time for next lab.
In this lab we will learn about two things.
- Script task – It let us perform something which is not possible using any of the existing SSIS tasks. It let us write code in C# and do custom operations.
- Variables – We will learn how to declare variables, how to assign values to them, and how to use them inside Script Task.
Note: Once the variable is created and assign some values we can use it as an input for many tasks. For instance, as an input for Execute SQL Task or may be used inside script task which we are going to do in this lab.
Step 1. Create SSIS Project
Create a new Empty SSIS project.
Step 2. Create Variable and Assign Value
2.1 Do either one of these,
It will open up variable window.
2.2 Click on Add Variable button
2.3 Define the variable and assign the value.
Step 3. Add Script task to control flow
Take Script task from the toolbox and add it control flow
Step 4. Configure script task.
4.1 Double click the script task. It will open Script task editor.
4.2 Next we have to configure what variable we want to pass to the script. We can pass variable as either read only variable or read write variable. Click the triple dot button in front of either ReadOnlyVariable or ReadWriteVariable.
Select the variables and click OK.
4.3 Next, click on edit script.
It will make a new instance of Visual studio open up with a “.cs” extension file.
4.4 In the main Method write the following code.
public void Main()
4.5 Press Ctrl+Save and save the file.
4.6 Close the Current Visual studio and go back to Sql server data tools where “script task editor” is open.
4.7 Click OK.
Step 5. Execute package.
Press F5 and execute the application.
Step 6. Check the output
Time to celebrate
In further coming articles we will speak more about SSIS.
- We will cover some more Control Flow and Data Flow tasks.
- Will talk about Error handling.
- Will talk about Variables and expressions.
- Many more things.
Hope you enjoyed reading this. Your comments, votes and suggestions motivates us for writing more stuffs like this.
For technical trainings on various topics like WCF, MVC, Business Intelligence, Design Patterns, WPF and UML and many more feel free to contact SukeshMarla@Gmail.com or visit www.sukesh-marla.com
For more stuff like this click here. Subscribe to article updates or follow at twitter @SukeshMarla
See 600+ FAQ questions and answers in .NET, C#, ASP.NET, SQL, WCF, WPF, WWF, SharePoint, Design patterns, UML etc.