Click here to Skip to main content
15,886,110 members
Articles / Programming Languages / SQL

SSIS Do I Union All or Merge??

Rate me:
Please Sign up or sign in to vote.
4.43/5 (3 votes)
28 Mar 2014CPOL2 min read 36.8K   1   3  
SSIS - Do I Union All or Merge?

The Union All and Merge SSIS data flow transformations provide confusingly similar functionality by combining result sets. So what are the differences and when should you use one or the other? Despite the almost identical results, there are some clearly defined rules as when to use the Union All over the Merge and visa versa. The Union All should be used over Merge when:

  1. The transformation inputs are not sorted
  2. The combined output does not need to be sorted
  3. The transformation has more than two inputs

Probably the most notable limitation of a Merge transformation that the Union All overcomes is that a Merge can accept ONLY two inputs while the Union All supports multiple inputs. Another detractor to the Merge is that both inputs must be sorted. This can be done by using a Sort transformation task before the Merge or using an ORDER BY clause in the source and setting the IsSorted property and setting the SortKeyPosition of the columns in the sort order desired. Technet outlines setting the sort order here.

Both transformations require columns in the inputs have matching metadata, for example they must have compatible data types. This requirement is the same as any T-SQL set operator.

To demonstrate these two transformations, create an SSIS package first with an Execute SQL task that will be used to populate two different tables with an identical schema to be used for both transformations:

SQL
USE tempdb;
GO

IF EXISTS(SELECT * FROM sys.tables WHERE name = ‘employees’)BEGIN
DROP TABLE employeesEND

IF EXISTS(SELECT * FROM sys.tables WHERE name = ‘customers’)BEGIN
DROP TABLE customers
END

SELECT TOP 10 BusinessEntityID,
      FirstName,
      MiddleName,
      LastName
INTO employees
FROM AdventureWorks2012.Person.Person 
ORDER BY BusinessEntityID

SELECT BusinessEntityID,
      FirstName,
      MiddleName,
      LastName
INTO customers
FROM AdventureWorks2012.Person.Person _
WHERE BusinessEntityID IN(3, 5, 8, 277, 45, 14847, 3456, 76, 9874, 15937) 

Now, add a data flow task that will be used for the Union All transformations with two OLEDB data sources, one for the customers table and one for the employees. Then, add a Union All transformation and connect both sources:

image Add a flat file destination and connect the Union All transformation.

imageAdd another data flow task and again add two sources, one for customers and employees, each source should use a query with an ORDER BY clause using the BusinessEntityID:

SQL
SELECT BusinessEntityID,
      FirstName,
      MiddleName,
      LastName
FROM employees/customers
ORDER BY BusinessEntityID

Add a Merge transformation and connect both sources to the Merge transformation. An error now appears saying that the IsSorted property must be configured for the Merge transformation. Right click on both customer and employee sources and go to the advanced editor and select the Input and Output Properties tab and select the OLE DB Source Output. In the Common Properties pane, set the IsSorted property to True:

image Once the IsSorted property has been set, you must still set the column(s) that are sorted. Open the OLE DB Source Output and the Output Columns and select the BusinessEntityID and set the SortKeyPosition to 1:

image Again, add a flat file destination and connect the Merge transformation.

Running the package and reviewing the output files shows near identical results with the exception that the NamesMerged file has the same records, but they are ordered by the BusinessEntityID.

The sample package outlined can be downloaded here.

This article was originally posted at http://www.sqlsafety.com?p=751

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Database Developer
United States United States
David retired as a Sergeant with the Cape Coral Police Department after 22 years of service. His final 10 years of duty were as a database administrator and developer in the Administrative Services Division. He began his career with the police department in 1990 in the patrol division and worked various assignments until being promoted to Sergeant in 1998. Based on his education and experience David was assigned to Administrative Services in 2002 and was responsible for database administration, software integration, and development for public safety. David’s primary focus and expertise is with SQL Server, reporting services, integration services, and analysis services, and he was recognized for his work by SQL Server Magazine as “Innovator of the Year” runner up in 2007. David is an MCITP for SQL Server 2005 and 2008 in both database administration and business intelligence and is a Microsoft Certified Trainer. He regularly posts on the MSDN SQL Server forums where he also serves as a moderator, and is a contributor at SQLCLR.net. In addition to his knowledge of SQL Server David works as a software developer using VB.net and C# and has worked extensively in SharePoint development.

Comments and Discussions

 
-- There are no messages in this forum --