Click here to Skip to main content
Click here to Skip to main content

SSIS: Transform data using Regex

, 21 Jan 2013 BSD
Rate this:
Please Sign up or sign in to vote.
Transform data by using regex and Script Component.

Introduction

In this tutorial I will show how to transform data using a Script Component within a Data Flow Task and regex. My source data contains customer name, and I need to separate it to first, middle, and last names.

Background

Regular Expressions are a powerful way for you to search for patterns in strings of text. In SSIS, we can use Regular Expressions to assist us in cleansing or transforming data.

This is my partial "Customer" reference table.

What you need

Download script and execute in SSMS.

-- Create database
create database Test
go
use Test
go
 
USE [Test]
GO

-- Create customer table
if exists (select * from sys.tables where name = 'customer_destination')
drop table customer_destination
go

CREATE TABLE [dbo].[customer_destination](
	[customer_number] [varchar](10) NULL,
	[customer_type] [varchar](1) NULL,
	[name] [varchar](100) NULL,
	[gender] [varchar](1) NULL,
	[email_address] [varchar](200) NULL,
	[date_of_birth] [datetime] NULL,
	[occupation] [varchar](50) NULL,
	[status] [varchar](3) NULL,
	[fname] [varchar](50) NULL,
	[mname] [varchar](50) NULL,
	[lname] [varchar](50) NULL
)

This script was tested in SQL Server 2008.

Create Project

Open SQL Server Business Intelligence Development Studio.

Then go to File->New->Project and select Integration Service Project.

Select "Data Flow Task" from "Control Flow Items" and drag it on "Control Flow" tab. Then double click it.

Select "Excel Source" from "Data Flow Source" and drag it on "Data Flow" tab. Double click on “Excel Source” task to configure it.

Click New button for new OLE DB connection or select from existing connection.

Click Browse button to select the "Customer.xls" file from your local path.

Select the Excel sheet name.

Check the available column and click OK.

Select "Data Conversion" from "Data Flow Transformation" and drag it on the "Data Flow" tab and connect the extended green arrow from “Excel Source” to your "Data Conversion".

Convert data. The length should be the same, otherwise you will get a warning message. Add "convert" prefix in each value.

Select "Script Component" from "Data Flow Transformation" and drag it on the "Data Flow" tab. Connect the extended green arrow from “Data Conversion” to your "Script Component".

Click Input Column tab and select the "convert_name" column.

Click the Output Column tab and create three outputs: "fname", "mname", and "lname".

Click the "Script" tab and click "Edit Script". I have chosen my ScriptLanguage as "Microsoft Visual C# 2008".

In the Main.cs file we need to add a reference. We need this for Regular Expressions.

using System.Text.RegularExpressions;

In the Main.cs file it's time to write some code in the Input0_ProcessInputRow procedure.

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    /*
      Add your code here
    */
    string keyVal = Row.convertname.ToString();
    Match match = Regex.Match(keyVal, @"^(?<first>\w+) (?<last>\w+)(?: (?<middle>\w+))?$");
    if (match.Success)
    {
        string f = match.Groups["first"].Value;
        string l = match.Groups["last"].Value;
        string m = null;
        if (match.Groups["middle"].Success)
        {
            m = match.Groups["middle"].Value;
        }

        Row.fname = f;
        Row.lname = l;
        Row.mname = m;
    }
    else
        Row.fname = Row.convertname.ToString();
}</middle></last></first>

Select "OLE DB Destination" from "Data Flow Destination" and drag it on the "Data Flow" tab. Connect the extended green arrow from “Script Component” to your "OLE DB Destination".

Double click on the "OLE DB Destination" task to configure it.

Click New to create a new connection.

Select "Server Name", "Authentication", and "Database" which will be "Test" for this example. Click Test Connection for checking, then click OK and then again click OK.

Select the "customer_destination" table.

Click the mapping tab and map each field and click OK.

If you execute the package with debugging (press F5), the package should succeed and appear as shown here:

To check what is happening here:

SELECT [customer_number]
      ,[customer_type] as [type]
      ,[name]
      ,[gender]
      ,[email_address]
      ,[date_of_birth]
      ,[occupation]
      ,[status]
      ,[fname]
      ,[mname]
      ,[lname]
  FROM [Test].[dbo].[customer_destination]

Conclusion

You can see that the fname, mname, and lname fields have been populated. The .NET Framework includes full support for regular expressions (Regex), in the System.Text.RegularExpressions namespace. Regex provides an incredibly powerful way of defining and finding string patterns. You can use them for string pattern match, data cleaning, and data transformation. Email, postal code, and phone number verification are a few of the uses. I have included some Regex here, though all are not perfect, they are collected from Google.

Mail: [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

Phone: ^[0-9+\(\)#\.\s\/ext-]+$

First character capital:- ^[A-Z][a-zA-Z0-9]+$

I hope this might be helpful to you!

References

History

None so far.

License

This article, along with any associated source code and files, is licensed under The BSD License

Share

About the Author

mparvez
Software Developer (Senior) icddr,b
Bangladesh Bangladesh
More than 8 years experience on Programming and Project implementation, I was primarily involved with projects for private organization,Govt.(Bangladesh Army,DG Health,RJSC), NGO (SEDF,WFP). Presently I am working at ICDDR,B and enhancing Hospital Management System developed by Microsoft Dynamic NAV and Windows Mobile Application 5.0
 
An active supporter of Open Source technology, my interested areas are ERP, IT Audit, Data warehouse, BI etc.
 
Playing Guitar for 15 years, my interested music style is Blues Rock,Neo Classical.
 
Certification
 
70-540:Microsoft® Windows Mobile® 5.0 - Application Development
MB7-514:Microsoft Dynamics™ NAV 5.0 C/SIDE Introduction
MB7-516:Microsoft Dynamics™ NAV 5.0 Solution Development
MB7-517:Microsoft Dynamics™ NAV 5.0 Installation and Configuration
MB7-515:Microsoft Dynamics™ NAV 5.0 Financials
70-432:Microsoft SQL Server 2008 - Implementation and Maintenance
70-450:PRO: Designing, Optimizing and Maintaining a Database Administrative Solution Using Microsoft SQL Server 2008
70-448:Microsoft SQL Server 2008, Business Intelligence Development and Maintenance
312-50:Certified Ethical Hacker
 
Web :http://masudparvezshabuz.appspot.com
Blog :http://masudparvezshabuz.wordpress.com
linkedin :http://www.linkedin.com/in/masudparvez
Follow on   Google+

Comments and Discussions

 
-- There are no messages in this forum --
| Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.141223.1 | Last Updated 21 Jan 2013
Article Copyright 2013 by mparvez
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid