The Union All and Merge SSIS data flow transformations provide confusingly similar functionality by combining result sets. So what are the differences and when should you use one or the other? Despite the almost identical results, there are some clearly defined rules as when to use the Union All over the Merge and visa versa. The Union All should be used over Merge when:
- The transformation inputs are not sorted
- The combined output does not need to be sorted
- The transformation has more than two inputs
Probably the most notable limitation of a Merge transformation that the Union All overcomes is that a Merge can accept ONLY two inputs while the Union All supports multiple inputs. Another detractor to the Merge is that both inputs must be sorted. This can be done by using a Sort transformation task before the Merge or using an
ORDER BY clause in the source and setting the
IsSorted property and setting the
SortKeyPosition of the columns in the sort order desired. Technet outlines setting the sort order here.
Both transformations require columns in the inputs have matching metadata, for example they must have compatible data types. This requirement is the same as any T-SQL set operator.
To demonstrate these two transformations, create an SSIS package first with an Execute SQL task that will be used to populate two different tables with an identical schema to be used for both transformations:
IF EXISTS(SELECT * FROM sys.tables WHERE name = ‘employees’)BEGIN
DROP TABLE employeesEND
IF EXISTS(SELECT * FROM sys.tables WHERE name = ‘customers’)BEGIN
DROP TABLE customers
SELECT TOP 10 BusinessEntityID,
ORDER BY BusinessEntityID
FROM AdventureWorks2012.Person.Person _
WHERE BusinessEntityID IN(3, 5, 8, 277, 45, 14847, 3456, 76, 9874, 15937)
Now, add a data flow task that will be used for the
Union All transformations with two OLEDB data sources, one for the
customers table and one for the
employees. Then, add a
Union All transformation and connect both sources:
Add a flat file destination and connect the Union All transformation.
Add another data flow task and again add two sources, one for
employees, each source should use a query with an
ORDER BY clause using the
ORDER BY BusinessEntityID
Add a Merge transformation and connect both sources to the Merge transformation. An error now appears saying that the
IsSorted property must be configured for the Merge transformation. Right click on both
employee sources and go to the advanced editor and select the Input and Output Properties tab and select the OLE DB Source Output. In the Common Properties pane, set the
IsSorted property to
IsSorted property has been set, you must still set the column(s) that are sorted. Open the OLE DB Source Output and the Output Columns and select the
BusinessEntityID and set the
Again, add a flat file destination and connect the Merge transformation.
Running the package and reviewing the output files shows near identical results with the exception that the NamesMerged file has the same records, but they are ordered by the
The sample package outlined can be downloaded here.