Hello readers. After the great success of “Learn MVC project in 7 days”, I am back again with a new Concept – Learn MSBI Project in 7 days. It’s going to be a detailed and step by step series covering almost everything required to be a MSBI developer.
Learn it and develop your more in-depth knowledge with complete step by step MSBI tutorials on http://www.learnmsbitutorials.net/
Day 1 will be an introductory session. Those who already have good idea on MSBI, for them it’s a relaxing time. They can go and play some Video games, Football or can watch movies and catch us with next day or as a real learner they can revise their concepts by just continuing with the chapter.
These are the Day 1 topics.
What we will see in Day 1?
What is BI?
What is DW?
What is ETL?
What is MSBI?
What we need to get started?
Creating the solution
Lab 1 – Creating SSIS project and getting started
If you are looking for a course which will speak about individual options or tasks in MSBI world then you are in a wrong place. Here we will go for a complete project and talk about most of the concepts in MSBI from Project point of View.
BI or Business Intelligence is all about making profitable, valuable and logical decisions.
BI is a process where Data will be converted into information. Information is the knowledge to make proper decisions.
If you look around you will see a huge group of entrepreneurs, businessmans running a particular company and struggling with a big problem called “Decision making”. Many of them can’t take decisions properly.
When we say decision, it does not mean decisions like finalizing tour destination. Whether family tour should be planned for Dubai or to Singapore.☻
Here decision means, Decisions related to business. For example, whether we should hire some more salesmans or not, what should we produce more – Jeans or trousers, what will be the best location for the new branch etc.
What is the reason for such confusions?
They are running a business that means surely they are not stupid.
Then what? Can we consider lack of data is the reason? Now a day’s most of the business maintain all of the business related data somewhere in some format. So considering lack of data a reason won’t be correct.
Actual reason is lack of information.
Information and data are two different things. When I say data followings are the examples.
- Customer1 brought Product1 in India for 65000 Rs.
- Customer2 brought Product2 in US for $888.
These data might be stored in Tabular format inside Sql Server database or MySql database or may be stored in comma separated formatinside text file.It will be a pure technical thing. A decision maker won’t go and look into these individual data sources and then make his decision. Practically it won’t be possible either.
When it comes to decision making Decision Makers will be rather interested in following things.
- What product is sold most?
- Which country is best for a Product of a particular type?
This is information/Knowledge. Using this information Decision Makers in the company or a business can make valuable and profitable decisions.
Don’t you think that, this information is obtained from data only? Yes. The process of converting Data into information is called Business intelligence.
Business Intelligence involves,
- Collecting all the necessary data,
- Displaying all those data in such a way that, all the questions of Decision makers get answered automatically.
Point worth to remember
We understood a fact that decisions will be made based on information and knowledge but one thing worth to notice here is, that information is always dependent on a “number”.
Which country is better for which product? -> Here SalesAmount is that number. Total sales made in each country for each product will tell us whether that product is better for that country or not.
If you didn’t understood this point, just move further with the series. At the end you will understand what I mean.
In layman terms DW or Data warehouse is a Warehouse of data. Place where all the data will be kept.
When it comes to implementing Business intelligence, one of the biggest problem is scattered data.
In real time, it may be possible that a company or a business maintain their data via more than one system. Let’s takean example of an organization called ABC organization. It maintain Employee related information in Sql Server database, Client information in Excel, Sales and Purchase information is Oracle DB. Now information/knowledge generation involves three steps.
- Gathering data from all of these subsystems.
- Applying some computation on all of these data - Data retrieved from each data source will be analysed for information and may be clubbed together.
- Finally displaying them in an easy to understand format.
More the data is scattered more complex data retrieval becomes and more time will be taken to generation information/knowledge.
To solve this problem, industry identified a solution called Data warehouse. It’s going to be a simple database like Oracle, Sql or any other.
All the data maintained inside different sub systems in different format will be retrieved and converted into a common format and stored into this data warehouse. It became the first step in Business intelligence process. Data stored inside this Data Warehouse becomes source for the Information creation. Instead of analysing data in each data source, simply Data Warehouse data will be analysed.
ETL stands for Extract, Transformation and Load.
So far we have understood following things.
- Information will be generated from data using which business people take decisions.
- In real time scenario data will be scattered across multiple systems in different formats. All of those data will be dumped into Data warehouse which ultimately becomes the source for information creation.
Loading data into DW
ETL is a process for loading data into Data warehouse
|In the initial stage of Business Intelligence Data Warehouse will be designedbased on Business requirement. I always share one sentence during my corporate trainings with students. |
“Life will be easier if Data Warehouse is designed properly”.
|But we have to aware about real world reality as well. |
“Mostly it will never be deigned properly”.
|In this course, we will not only learn the correct way of Data Warehouse designing but we will also learn how to deal with bad design. |
Once DW is designed, data will loaded into it by following ETL process.
ETL process can be explained as follows
- Extract –read data from data source
- Transform – convert data to the one format which data warehouse is expecting. Example – data source may contain DateOfBirth but Data Warehouse expects Age. Converting DateOfBirth to Age is called Transformation. This is not a compulsory step. In some situation in may be possible that data is already in the desired format. In that case this step will skipped.
- Load – Finally the data will be dumped into Data Warehouse.
MSBI is an acronym for Microsoft Business Intelligence. It’s the Microsoft suite comprising of various tool for implementing Business intelligence solutions.
Using MSBI we will get an opportunity to create three king of projects.
- SSIS – Which will help us to perform ETL operation or in simple words. It will be used to fill Data warehouse.
- SSAS – Let us create Cube out of Data Warehouse. We will talk about cube in detail as we move further. For now, just take cube as one more storage space like Data Warehouse but here data will be stored in more performance efficient manner. Data retrieval will be faster here compared to normal Data Warehouse.
- SSRS – Let us create reports which display data in nice graphical way with the help of various charts, images etc. SSRS reports are final source of information to end user. By looking into end user will make decisions.
First we will be required Enterprise version of Sql Server 2014. You can download the trial version for learning purpose from http://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2014.
Make sure to check Analysis, Reporting and Integration services at the time of installation.
Once installed you will find “Sql Server Data tools for Visual studio 2013” inside Sql Server folder in the Start Menu.
If you are very new to MSBI, then we suggest go through following article which demonstrates detailed step by step installations of SQL Server and SQL Server Data Tools.
We will create one project now and use the same one throughout the series. So as a start let’s create a blank solution.
For those who are new to the word “solution”
In visual studio world we create projects. Each Project will be of a particular type. Type of the project decides what that project is capable of. Example of project types are “Console Application”, “Windows Forms Application” etc. In our case we will create 3 kinds of projects,
- Integration Services Project – SSIS
- Analysis Services Multidimensional and Data mining project – SSAS
- Reporting Services project – SSRS
Projects are logically grouped into solution. One solution may have more than one project of different type. Steps for creating a blank solution are as follows.
Step 1 – Open Sql Server Data tools
Open Sql server data tools from the location specified above.
Step 2- Create New Solution
Click on File >> New >> Project. “New Project” dialog box will open.
Select “Visual Studio Solutions” located inside “Other Project Types” group in left side section.
From right side select “Blank Solution”.
Name the Solution as “SalesBI” and Click Ok.
So let’s understand the very first requirement of the project. Goal of this demo is understanding SSIS project basics.
We have Customer.txt file as follows. (Make sure you create one and save it somewhere for demo purpose.)
Note: SalesDate is in “dd/mm/yyyy” format.
First requirement is,loading data from above text file into TblCustomer in the SalesDW database in Sql server management studio. (Create the database and table in your machine for demo purpose)
Just to remind you, I am going to talk about the proper way of Data warehouse designing in one of the upcoming chapter. In the initial stage we will just try to get the feel of MSBI.
Step 1 – Create SSIS project.
Right click the solution and select Add>>New Project.
It will launch “Add New Project” dialog box. Select “Business Intelligence” from left section and “Integration service project” from right side. Name project as “SalesETL” and click Ok.
Just for the revision – SSIS or Integration service project will be mainly used to perform ETL operation.
Step 2 – Create New Package
Right click the newly created project and select Add>>New Item. It will launch “Add New Item” dialog box. Select “New SSIS Package”. Name it as CustomerETL and click Add.
- In SSIS world Package is an executable file.
- Visual studio provides a UI interface called SSIS designer for designing packages.
- Internally package is an XML file which will be executed by special utility called “dtsexec”.This utility will installed as a part of MSBI installation. We will soon understand it in detail.
- Packages will be have an extension called DTSX which stands for “Data transformation services executable”. In earlier version of sql server to perform ETL we had a feature called “Data Transformation services”. DTSX is named after it.
Step 3 – Design Control Flow
Double click the new created package in solution explorer.
As you can see in SSIS designer we have several tabs – Control Flow, Data Flow, Parameters, Event Handlers, and Package Explorer. We will look into each one of these tabs one by one.
Right now we are interested in Control Flow tab.
- This tab will let us decide what need to be done.
- You will notice we have SSIS toolbox in left side. If it is not available in your demo then you can get it from View>>Other Windows>>SSIS toolbox.
Toolbox contains tasks like Data Flow Task, Execute Sql Task etc. Each task let us achieve some different behaviour. Example –
- Send Mail Task – It will be used to send emails using SSIS package
- Execute SQL Task – Let us execute SQL queries and work out on the result set.
- Script Task – Let us execute custom C# code
- Right now our requirement asks us to load data from a text file to sql server database and for that we will be required “DataFlow Task”. Simply drag the task from SSIS toolbox to SSIS designer.
Right click the newly created “Data Flow Task” and select rename. Name it as “CsvCustomer to TblCustomer”.
Step 3 – Add DestinationConnection Manager
Other than all these tabs, SSIS designer also provides something called Connection Managers section. It’s located in the bottom Corner of the designer.
Simply right click the area and select “New Ado.Net Connection…”
Enter Server Name, Enter Credential, and Select Database and click OK.
Click Ok again.
Rename connection manager to SalesDWConnection.
Step 4 – Add SourceConnection Manager
Now it’s time to add Source Connection Manager.
Right click the connection manager area but this time select “New Flat file Connection...”
It will launch “Flat file Connection manager editor”.
Put name as “CustomerCSVConnection” and select the source customer file.
In left side of the dialog couple of sections are defined like General, Columns, and Advanced etc.
Select Columns sections. No need to change any settings at this moment. Simply click Ok.
Step 5 – Configure Data Flow Task
After that double click Data Flow task. It will take you to Data flow tab.
Step 6 –Add Source
Data flow tab is the one which will actually decide ETL. Here we will define, from where to where data will flow and if there is any transformation required or not.
As soon as you move to the Data Flow tab, you will notice a change in the SSIS toolbox.
As you can see, tasks in the toolbox is segregated into three groups – Sources, Transforms and Destinations. “Common” is a special group which contain mostly used sources, transformations and destinations.
We are interested in “Flat file source”. You will find it in “Other Sources” section. Simply drag it to SSIS designer and rename it to “CustomerCsv”
Step 7 – Configure Source
Double click the CustomerCsv source. It will launch “Flat file source editor”. Select “CustomerCSVConnection” from the dropdown.
Move to the column section and make any changes if required and click ok.
Step 8 – Add Destination
Drag Ado.Net destination from “Other Destinations” section to SSIS designer and rename it to “TblCustomer”.
Step 9 – Configure Destination
Double click the “TblCustomer” destination. This time instead of showing configuration editor window, following error message will be displayed.
Destination task cannot configured unless and until it have a proper input.
Now click the “CustomerCsv” source. You will notice two arrows coming out of it. Blue one and red one. We will talk about red one later. Blue one is the one though which data will flow. Hence this arrow is called as “Data Flow Path”. Take that arrow and connect it to “TblCustomer” destination.
Step 10- Configure Destination (Continued)
Double click the “TblCustomer” destination again. In the “configuration editor window” select Connection manager to “SalesDWConnection” and table to “TblCustomer”
Click on “Mappings” section and confirm that all mappings are correct.
Step 11- Execute and Test the package
As I said sometime back, Package will be executed by a special utility called “DtsExec.exe”. Visual studio makes our life easy at the time of development. For testing simply press F5 ☻ everything else will be handled by Visual studio and package start executing.
Red Cross mark indicates that execution failed.
Step 11- Understand the error
Go to progress tab and scroll down and find the first statement with “Red Cross” icon in left. Unfortunately it won’t be possible to read the complete error in this screen. Hence simply right click it say “copy message text”.
Paste it in some text file. Error will be as follows.
“[TblCustomer ] Error: An exception has occurred during data insertion, the message returned from the provider is: The given value of type String from the data source cannot be converted to type datetime of the specified target column.”
Step 12- Add Data Conversion transformation
Error is very common, we have DateTime column in Database and in text file all columns will be of type string by default.
To solve the problem we will use a special transformation called “Data Conversion” transformation.
First stop the execution by pressing stop button in toolbox.
Now drag “Data Conversion” transformation from toolbox to designer window. Right click the “Data Flow Path” connecting Source and Destination and say Delete. Finally connect “Data Flow Path” coming from source to “Data Conversion” transformation.
Step 13-Configure Data Conversion
Double click Data Conversion transformation and perform following step one after the other.
- Check SalesDate in Available Columns. It will add one new entry into following Grid
- Change Alias name to SalesDate_DateTime
- Select datatype as “Database_timestamp”
Step 14 – Reconfigure the Destination
Connect “Data Flow Path” coming from “Data Conversion” transformation to “TblCustomer” destination and then double click it.
Change the mapping of SalesDate in destination to SalesDate_DateTime in available columns.
Step 15 – Execute and Test
Re-execute the package by pressing F5. This time Package will complete its execution.
Open database table and check the records.
Everything looks correct but in reality something is wrong. In source data third and fourth row had SalesDate as “12/1/2013”(12 Jan 2013) and “15/06/2015”(15 Jun 2015) respectively where as in destination third and fourth row have SalesDate as “1 Dec 2013” and “15 Jun 2015” respectively.
Let’s understand what exactly happened.
- “Data Conversion” transformation assumed that dates are in “mm/dd/yyyy” format hence updated first three records accordingly.
- In the last source record SalesDate is “15/06/2015” which is not a proper date if “mm/dd/yyyy” is considered.Hence “DataConversion”transformation assumed it is in “dd/mm/yyyy” format and updated destination accordingly.
It’s logically incorrect. To make it correct we have to make sure that, whenever DateTime value comes as a string value it should be in proper “mm/dd/yyyy” format.
Step 16 – Add “Derived Column” transformation
First truncate TblCustomer table in destination manually.
Stop the execution and remove Connection between “CustmerCsv” source and “Data Conversion” transformation.
Add “Derived Column” transformation from toolbox to designer window.
Connect “CustomerCsv” source to “Derived Column” transformation.
Step 17 – Configure “Derived Column” transformation.
Double click the transformation.
Create a new derived column called SalesDate_MMDDYYY with following expression.
|SUBSTRING(SalesDate,FINDSTRING(SalesDate,"/",1) + 1,FINDSTRING(SalesDate,"/",2) - |
|FINDSTRING(SalesDate,"/",1) - 1) + "/" + SUBSTRING(SalesDate,1,FINDSTRING(SalesDate,"/",1) - 1) |
|+ "/" + SUBSTRING(SalesDate,FINDSTRING(SalesDate,"/",2) + 1,LEN(SalesDate) - |
Above expression just converts current SalesDate which is in “dd/mm/yyyy” format to new SalesDate in “mm/dd/yyyy” format.
Step 18 – Reconfigure “Data Conversion” transformation
Connect “Derived Column” transformation to “Data Conversion” transformation via “Data Flow Path”.
Double click “Data Conversion” transformation and just like before, create a converted column for SalesDate_MMDDYY column called SalesDate_DateTime.
Step 19 – Reconfigure Destination.
Connect “Data Conversion” transformation to Destination. Double click the destination. Map SalesDate column in destination to SalesDate_DateTime column in input columns.
Step 20 - Execute and Test
One final time re-execute the package. Make sure to truncate the table before execution.
Hope you enjoyed reading Day 1. Stay tuned for Day 2. Lot of learning and challenges are there in the way.
For more stuff like this click here. Subscribe to article updates or follow at twitter @SukeshMarla
You can also refer the below 1 hour MSBI youtube video :- Learn MSBI in 4 days.
Click here for more MSBI Step by step Tutorials.