Click here to Skip to main content
12,559,389 members (51,652 online)
Click here to Skip to main content
Add your own
alternative version


33 bookmarked

Dynamic Excel file loading with SSIS

, 23 Apr 2012 CPOL
Rate this:
Please Sign up or sign in to vote.
SSIS and Excel Automation


Loading Excel files with same schema but different file name or sheet name is a challenge for many SSIS developers. Microsoft does not have any Dynamic Excel loading component that search Excel files using a pattern and indentify the sheet contains data and load it.

The article helps you to challenge such scenarios. The package described here is able to search Excel files using a pattern and load it without any user intervention.  The Sheet contains data does not have start at first row or first column.


  •        Filename can be anything(ie it should match the pattern used for searching)
  •        Format of excel can be any format (.xls – Office 2003, .xlsx – Office 2007 or later)
  •        Data to be loaded can be any sheet. Zero dependency on sheet name.
  •        Data can start at any different column or row, ie it does not have to be at A1 or A2 or B10 etc.
  •        Report out if any additional fields are added to the excel data after the design.  
  •        Archive processed files to Processed folder.    

Building the package  

  1.     Create Folder structure(please feel free to create anywhere, for simplicity I use C:\ drive).

a.       Create folder SSISLoad in  C:\

b.      Create subfolder in Data on C:\SSISLoad

c.       Create subfolder in Country on C:\SSISLoad\Data

d.      Create subfolder in Test  on C:\SSISLoad\Data\Country

e.      Create subfolder in Processed  on C:\SSISLoad\Data\Country<o:p>    

2. Package Design Preparations

a.       Identify the Sheet that contains data. Rename it to Data

b.      Delete empty rows and columns such that data starts at cell A1

c.       Save it as Country Details - XX.xlsx & Copy to C:\SSISLoad\Data\Country\Test\

d.      Create a new SSIS Package Load Dynamic Excel.dtsx

e.      Create and Set package variable (string type) Input_Share_Root as C:\SSISLoad\Data\

f.        Create and Set package variable (string type) DataFile_Input_Path as Country\

g.       Create and Set package variable (string type) DataFile_Search_Pattern as Country;.xlsx

h.      Create package variable (string type) NewFieldsSummary

i.         Create package variable (string type) NewFields

j.        Create package variable (boolean type) FileFound

k.       Create package variable (string type) SourceFile 

l.         Create package variable (string type) Extraction_Statement

m.    Create package variable (string type) Connection_String

Save the package.  

3. Set  package variable Connection_String 

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\SSISLoad\Data\Country\Test\Country Details - XX.xlsx;Extended Properties="EXCEL 12.0 XML;HDR=YES;IMEX=1";

   4.  Preparing package variable Extraction_Statement

  • Save the package.  

a.       Add a DataFlow Component to package(Drag DataFlow from Tools), name it as Load Excel

b.      Add a Excel connection to package(Right Click Connection manager area >> New Connection >> Excel >> Browse and Point to C:\SSISLoad\Data\Country\Test\ Country Details - XX.xlsx Name the connection as Excel_Source

c.       Right click Excel_Source connection >>Properties >> Expression >> Click Ellipse button

                                                                 i.      Pick Connection String for Property from drop down , click Ellipse on Expression

                                                            ii.      Set expression as @[User::Connection_String] or Drag it from Variables on the expression      builder window.

d.      Go to Load Excel data flow and add Excel Source by dragging from Tool Box, name it as  Country Details

e.      Edit Country Details

f.        Set OLEDB Connection manager  as Excel_Source (from drop down)

g.       Set Data Access Mode as SQL Command (from drop down)

h.      Click Build Query

i.         Click Add Table symbol and Select Data$ and click Add, Click Close

j.        Manually check each and every column, do not check *(All Columns)

k.       Copy the SQL Generated from the query pane:

SELECT        ID, StateProvinceCode, CountryRegionCode, IsOnlyStateProvinceFlag, Name, TerritoryID

FROM            [Data$] 

l.         Cancel the Build Query process. Cancel Excel Source Editor window.

m.    Make the SQL in single line and set as value for Extraction_Statement

n.      Save the package.  

5Set Country Details  excel source component 

a.       Edit Country Details  and set Connection manager  as Excel_Source (from drop down)

b.      Set Data Access Mode as SQL Command from variable(from drop down)

c.       Set Variable name  as User:: Extraction_Statement (from drop down)

d.      Click Columns from Left pane.

e.      See all columns are populated (you may uncheck any column you do not want). Click OK 

6. Set Destination as you like and map columns.

a.       Create OLEDB Destination Connection manager OLEDB_Destination_Conn (sample used local server, Test as DB 

b.    Create table CountryDetails and map columns  


Write Automation Code  

1. Search excel file using the pattern specified in variable DataFile_Search_Pattern.

This pattern separated by extension using semicolon format  

2. Enumerate Excel sheets for the file obtained by Search
a.Use OLEDB driver to create connection to Excel and read schema.

3. Identify Sheet contains data and data start address.
a.Search each sheet for fields used in Extraction_Statement and determine sheetname and data start address.
b.Modify connection string as per the excel file. 

Using the Code 

How it works(Concept)

The package search for the file using the specification supplied, determines the sheet contains data and its start address(like A1 OR C10 etc), and finally modifies the connection string to the Excel file to point the file obtained and modifies SQL Query to read the excel file from the sheet identified. This level of dynamic loading is achieved with the help of user defined variables in the package, and thus the run-time setting of Connection Managers, SQL Command to read/pull data are updated with the help of variables.   

 Let's now analyze concept (code perspective) as below: 

The following method searches file in the path supplied. It uses matching pattern as parameter searchstring  and use parameter extension as filter 

public string SearchFile(string path, string extension, string searchstring)
     DirectoryInfo di = null;
     if (Directory.Exists(path))
         di = new DirectoryInfo(path);
         return "Directory Does not Exist";
     string newestFile;
     IEnumerable<System.IO.FileInfo> fileList = di.GetFiles("*" + searchstring.ToLower() + "*");
     //Create the query
     IEnumerable<System.IO.FileInfo> fileQuery =
         from file in fileList
         where (extension.ToLower().Contains(file.Extension.ToLower()))
         orderby file.LastWriteTime
         select file;


         var FileSearchedResult = (from file in fileQuery orderby file.LastWriteTime select new { file.FullName, file.Name, file.CreationTime }).Last();
         newestFile = FileSearchedResult.FullName;
         FileSearchedResult = null;
         fileList = null;
         di = null;
         return newestFile;
         fileList = null;
         di = null;
         return null;



The following method GetExcleSheetNames enumerates all available sheets in the excelFile supplied as parameter. It uses OLEDDB driver to connect the Excel file and reads the Schema using    GetOleDbSchemaTable. 

private List<String> GetExcelSheetNames(string excelFile)
            OleDbConnection objConn = null;
            System.Data.DataTable dt = null;

                // Connection String. Change the excel file to the file you
                // will search.
                String connString = this.GetConnectionString(excelFile);
                // Create connection object by using the preceding connection string.
                objConn = new OleDbConnection(connString);
                // Open connection with the database.
                // Get the data table containg the schema guid.
                dt = objConn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);

                if (dt == null)
                    return null;

                List<String> excelSheets = new List<string>();

                // Add the sheet name to the collection.
                foreach (DataRow row in dt.Rows)
                    string WorkSheet = row["TABLE_NAME"].ToString();
                    WorkSheet = WorkSheet.StartsWith("'") ? "[" + WorkSheet.Substring(1) : WorkSheet;
                    WorkSheet = WorkSheet.StartsWith("[") ? WorkSheet : "[" + WorkSheet;
                    WorkSheet = WorkSheet.EndsWith("'") ? WorkSheet.Substring(0, WorkSheet.Length - 1) + "]" : WorkSheet;

                    WorkSheet = WorkSheet.Substring(0, WorkSheet.Length - 1).EndsWith("$") ? WorkSheet : WorkSheet.Substring(0, WorkSheet.Length - 1) + "$]";

                    if (!excelSheets.Exists(delegate(string k) { return (k.ToLower() == WorkSheet.ToLower() || k.ToLower() + "$" == WorkSheet.ToLower()); }))
                return excelSheets;
            catch (Exception ex)
                return null;
                // Clean up.
                if (objConn != null)
                if (dt != null)

The following method ScanWorkSheet builds DataSet for first 1000 rows from WorkSheet(parameter)  

private SourceFileDetails ScanWorkSheet(string excelFile, string WorkSheet, string ExtractionStatement, string FindColumn, out bool Success)
          Success = false;
          System.Data.DataSet excelDataSet = new DataSet();
          string connectionString = this.GetConnectionString(excelFile);

          using (OleDbConnection objConn = new System.Data.OleDb.OleDbConnection(connectionString))
                  OleDbDataAdapter cmd = new OleDbDataAdapter("select top 1000 * from " + WorkSheet, objConn);
                  cmd.Fill(excelDataSet, WorkSheet);
              catch { }
                  if (objConn != null)


          SourceFileDetails sd = ScanRows(excelDataSet, ExtractionStatement, FindColumn);
          if (sd != null)
              Success = true;
              sd.FileNamePath = excelFile;
              sd.ConnectionString = this.GetConnectionString(excelFile, true);
              return sd;
          return (SourceFileDetails)null;

The following method ScanRows scans each row find our interested columns. This is the way it identifies sheet contains data, and data start address. They key step in this method is build the list of columns/fields we are interested and search for it. Here we assume those fields/columns occur within 1000 rows. 1000 rows are selected here for performance and nobody create an excel file contains data after leaving first 1000 rows as blank. 

         public SourceFileDetails ScanRows(DataSet excelDataSet, string ExtractionStatement, string FindColumn)
            if (excelDataSet.Tables.Count < 1)
                return (SourceFileDetails)null;

            string ExtractFields = this.ReplaceString(this.ReplaceString(ExtractionStatement, "SELECT", ""), "From [Data$]", "");
            List<string> FindStrings = ExtractFields.Split(',').Select(s => s.Trim().Replace("[", "").Replace("]", "").Replace("#", ".").Replace("(", "").Replace(")", "")).ToList();

            foreach (DataTable dt in excelDataSet.Tables)
                List<ItemValueHolder> FoundAddress = new List<ItemValueHolder>();
                int iDuplicates = 0;

                foreach (DataRow dr in dt.Rows)
                    int iItemColumnIndex = 0;
                    foreach (var fieldValue in dr.ItemArray)
                        object cellData = fieldValue;
                        string sCellData = cellData.ToString().Replace("#", ".").Replace("[", "").Replace("]", "").Replace("(", "").Replace(")", "");

                        if (cellData != null)

                            if (FindStrings.Exists(delegate(string k) { return k.ToLower() == sCellData.ToString().ToLower(); }))
                                if (!FoundAddress.Exists(delegate(ItemValueHolder t) { return t.Item.ToLower() == sCellData.ToLower(); }))
                                    FoundAddress.Add(new ItemValueHolder(sCellData, dt.Rows.IndexOf(dr), iItemColumnIndex));
                    int iTotalFields = FindStrings.Count - iDuplicates;

                    if (100 * FoundAddress.Count / (float)iTotalFields >= 90.00)
                        var query = FoundAddress.GroupBy(
                    item => item.ItemValue,
                    (itemvalue, items) => new
                        Key = (int)itemvalue,
                        Count = items.Count(),

                        int dataStartAddress = (from p in query
                                                where p.Count == (query.Max(it => it.Count))
                                                select p.Key).Max();
                        dataStartAddress += 1;
                        string SheetName = "[" + dt.TableName + "$A" + dataStartAddress.ToString() + ":IV]";
                        string _selectCommand = this.ReplaceString(ExtractionStatement, "[Data$]", SheetName);
                        string reportDate = "";
                        if (!string.IsNullOrEmpty(FindColumn))

                                int ColIndex = (from p in FoundAddress
                                                where p.Item.ToLower() == FindColumn.ToLower()
                                                select p.ColumnIndex).First();
                                DataRow drRowReport = dt.Rows[dataStartAddress];
                                reportDate = drRowReport[ColIndex].ToString();

                            catch (Exception e)
                        string NewFields = GetNewFieldsAtSource(FindStrings, dr);
                        return new SourceFileDetails("", "", "", _selectCommand, reportDate, NewFields);
            return (SourceFileDetails)null;

The following class is used to store Excel file details that we can use to modify connection strings dynamically . The class is nothing but a information bundle; no other important methods that we use inside this class.    

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ST_b050e37a03e84ca7b15b853949d29aa6.csproj
    public class SourceFileDetails
        private string _fileNamePath = "";
        public string FileNamePath
            get { return _fileNamePath; }
            set { _fileNamePath = value; }

        private string _fileExtension = "";
        public string FileExtension
            get { return _fileExtension; }
            set { _fileExtension = value; }

        private string _connectionString = "";
        public string ConnectionString
            get { return _connectionString; }
            set { _connectionString = value; }
        private string _extractQuery = "";
        public string ExtractQuery
            get { return _extractQuery; }
            set { _extractQuery = value; }
        private string _reportDate = "";
        public string ReportDate
            get { return _reportDate; }
            set { _reportDate = value; }

        private string _newFields = "";
        public string NewFields
            get { return _newFields; }
            set { _newFields = value; }
        public SourceFileDetails() { }

        public SourceFileDetails(string fileNamePath, string fileExtension, string connectionString, string extractQuery, string reportDate, string newFields)
            this.FileNamePath = fileNamePath;
            this.FileExtension = fileExtension;
            this.ConnectionString = connectionString;
            this.ExtractQuery = extractQuery;
            this.ReportDate = reportDate;
            this.NewFields = newFields;

The following class is again an intermediate information storage and no other functionality.  

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ST_b050e37a03e84ca7b15b853949d29aa6.csproj
    public class ItemValueHolder
        private string _item = "";
        public string Item
            get { return _item; }
            set { _item = value; }
        private object _itemValue = "";
        public object ItemValue
            get { return _itemValue; }
            set { _itemValue = value; }

        private int _columnIndex;
        public int ColumnIndex
            get { return _columnIndex; }
            set { _columnIndex = value; }

        public ItemValueHolder() { }
        public ItemValueHolder(string item, object itemValue, int columnIndex)
            this.Item = item;
            this.ItemValue = itemValue;
            this.ColumnIndex = columnIndex;

The following method builds the connection string for the particular excel file supplied. It can build connection string irrespective of data has Header row not. Variable HasHeader determines if we have Header row or not. 

public string GetConnectionString(string FileNamePath, bool HasHeader)
            string ConnectionString = "";
            string Extension = Path.GetExtension(FileNamePath).ToLower();
            string BinaryExcelProvider = "Microsoft.Jet.OLEDB.4.0";
            string XmlExcelProvider = "Microsoft.ACE.OLEDB.12.0";
            string BinaryExcelExtProperties = "Excel 8.0";
            string XmlExcelExtProperties = "Excel 12.0";
            string XmlMacroExcelExtProperties = "EXCEL 12.0 Macro";
            string Provider = "";
            string ExtendedProperties = "";

            switch (Extension)
                case ".xls":
                    Provider = BinaryExcelProvider;
                    ExtendedProperties = BinaryExcelExtProperties;

                case ".xlsx":
                    Provider = XmlExcelProvider;
                    ExtendedProperties = XmlExcelExtProperties;

                case ".xlsm":
                    Provider = XmlExcelProvider;
                    ExtendedProperties = XmlMacroExcelExtProperties;

            string Header = ";HDR=NO;IMEX=1";
            if (HasHeader)
                Header = ";HDR=YES;IMEX=1";
            string ConnectionStringFormat = "Provider={0};Data Source={1};Extended Properties=\"{2}{3}\";";

            ConnectionString = string.Format(ConnectionStringFormat, Provider, FileNamePath, ExtendedProperties, Header);
            return ConnectionString;

All codes, packages(SSIS Project Solution file), sql scripts to generate destination table and sample excel files that can used for testing are attached.

Points of Interest 

Multiple files loading with for each file enumerator instead using Search method is also uploaded. Please come back with questions or doubts, I will be glad to help you. 




This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Business Analyst
United States United States
No Biography provided

You may also be interested in...


Comments and Discussions

QuestionRunning the Code Pin
TatMys7-Mar-16 6:37
memberTatMys7-Mar-16 6:37 
AnswerRe: Running the Code Pin
TatMys8-Mar-16 2:47
memberTatMys8-Mar-16 2:47 
QuestionReferences & 64 bit operation Pin
Nique_m3-Jan-13 2:16
memberNique_m3-Jan-13 2:16 
Hi there,

Your solution appears to be able to do exactly what I would be after, however after downloading your sample package (and the associated test files) I ran into a few issues that I'm unable to resolve so far.

As Im sure you can guess, Im not a programmer, so Im hoping this will be straight forward for you to offer some advice on. Current system is VS 2008 BIDS on Server 2008R2

i) If I execute the package in 64 bit mode, it runs to the Excel Connection then reports
<The Excel Connection Manager is not supported in the 64-bit version of SSIS, as no OLE DB provider is available.>

Thats what I would expect.

ii) If I alter the package to run in 32 bit, initially the script has not got the references for "Excel, ICSharpCode or BSAFileLibrary" in the script.

I downloaded and manually added the ICSharpCode and added in the MS.Excel ref but could not find any listing anywhere to grab BSAFileLibrary

Now when I run it I get, which I assume is due to the missing reference

<SSIS package "Load_Dynamic_Multiple_Excel[1] (1).dtsx" starting.
Error: 0x1 at Intialize Connection Settings: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.NullReferenceException: Object reference not set to an instance of an object.
at ST_b050e37a03e84ca7b15b853949d29aa6.csproj.ScriptMain.GetSourceFileDetails(String ExcelFile, String ExtractionStatement, String findColumn)
at ST_b050e37a03e84ca7b15b853949d29aa6.csproj.ScriptMain.Main()
--- End of inner exception stack trace ---
at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
at System.Type.InvokeMember(String name, BindingFlags invokeAttr, Binder binder, Object target, Object[] args, CultureInfo culture)
at Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()
Task failed: Intialize Connection Settings
Warning: 0x80019002 at Foreach Loop Container: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED. The Execution method succeeded, but the number of errors raised (1) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
Warning: 0x80019002 at Load Dynamic Multiple Excel: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED. The Execution method succeeded, but the number of errors raised (1) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
SSIS package "Load_Dynamic_Multiple_Excel[1] (1).dtsx" finished: Failure.

I really hope you will be able to assist, even if it might be to say that with my setup, this cannot work.

Many thanks in advance
SuggestionRe: References & 64 bit operation Pin
Nique_m3-Jan-13 5:45
memberNique_m3-Jan-13 5:45 
GeneralRe: References & 64 bit operation Pin
Kuthuparakkal3-Jan-13 18:31
memberKuthuparakkal3-Jan-13 18:31 
GeneralRe: References & 64 bit operation Pin
teh.kok.leong25-Nov-14 22:26
memberteh.kok.leong25-Nov-14 22:26 
GeneralRe: References & 64 bit operation Pin
Kuthuparakkal26-Nov-14 9:32
memberKuthuparakkal26-Nov-14 9:32 
QuestionLoading all multiple worksheets from excel files Pin
Rio Khn28-Oct-12 23:41
memberRio Khn28-Oct-12 23:41 
AnswerRe: Loading all multiple worksheets from excel files Pin
Kuthuparakkal31-Oct-12 17:14
memberKuthuparakkal31-Oct-12 17:14 
GeneralRe: Loading all multiple worksheets from excel files Pin
Rio Khn2-Nov-12 2:36
memberRio Khn2-Nov-12 2:36 
GeneralMy vote of 5 Pin
Kanasz Robert24-Sep-12 5:48
mvpKanasz Robert24-Sep-12 5:48 
GeneralRe: My vote of 5 Pin
Kuthuparakkal24-Sep-12 16:11
memberKuthuparakkal24-Sep-12 16:11 
GeneralSome questions Pin
svenb7730-Jul-12 0:07
membersvenb7730-Jul-12 0:07 
AnswerRe: Some questions Pin
kuthuparakkal30-Jul-12 1:46
memberkuthuparakkal30-Jul-12 1:46 
GeneralRe: Some questions Pin
svenb7731-Jul-12 2:02
membersvenb7731-Jul-12 2:02 
AnswerRe: Some questions Pin
kuthuparakkal2-Aug-12 8:23
memberkuthuparakkal2-Aug-12 8:23 
QuestionRe: Some questions Pin
svenb772-Aug-12 19:31
membersvenb772-Aug-12 19:31 
AnswerRe: Some questions Pin
Kuthuparakkal3-Aug-12 21:08
memberKuthuparakkal3-Aug-12 21:08 
QuestionRe: Some questions Pin
svenb775-Aug-12 21:47
membersvenb775-Aug-12 21:47 
AnswerRe: Some questions Pin
Kuthuparakkal5-Aug-12 23:03
memberKuthuparakkal5-Aug-12 23:03 
QuestionRe: Some questions Pin
svenb776-Aug-12 1:04
membersvenb776-Aug-12 1:04 
AnswerRe: Some questions Pin
Kuthuparakkal6-Aug-12 1:22
memberKuthuparakkal6-Aug-12 1:22 
QuestionRe: Some questions Pin
svenb776-Aug-12 20:28
membersvenb776-Aug-12 20:28 
AnswerRe: Some questions Pin
Kuthuparakkal6-Aug-12 22:01
memberKuthuparakkal6-Aug-12 22:01 
QuestionRe: Some questions Pin
svenb777-Aug-12 2:08
membersvenb777-Aug-12 2:08 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.161026.1 | Last Updated 23 Apr 2012
Article Copyright 2012 by Kuthuparakkal
Everything else Copyright © CodeProject, 1999-2016
Layout: fixed | fluid