Using SQL Bulk Load to Upload Fixed Width Text Files

Alden John Snow

Rate me:

4.20/5 (3 votes)

3 Mar 2009CPOL3 min read

44.7K

401

This project demonstrates what is likely the fastest way to upload one of those fixed width reports from a mainframe to a SQL server table.

Download source - 34.54 KB

Introduction

The class needs a table on a SQL Server somewhere to import into, a connection to that database, a file to load, the names of the fields to load the data into and the possessions in the file to load. I set this up so that everything can come from application settings, and put a quick WPF file browser front end together to demonstrate how to use the class.

Background

Everywhere I have worked, it has been important to be able to load fixed width files from mainframe reports that come from data sources that can't or at-least aren't provided by web services, ODBC or other sources. Every time there is a new version of Visual Studio, this gets a little easier to write. This is my latest class to do the import.

Using the Code

The simplest way to use my BulkLoad class is to make setting strings for what needs to be loaded and pass them to the BulkLoadFromStrings function. In this example I have set up everything as app settings.

This is what the config file looks like:

XML

<configuration>
  <configsections>
    <sectiongroup name=""userSettings"" publickeytoken="b77a5c561934e089""
          culture="neutral," version="2.0.0.0,"
          type=""System.Configuration.UserSettingsGroup,">
      <section name=""Bulk_Import.Properties.Settings""
          publickeytoken="b77a5c561934e089"" culture="neutral," version="2.0.0.0,"
          type=""System.Configuration.ClientSettingsSection,"
          requirepermission=""false"" allowexedefinition=""MachineToLocalUser"">
    </sectiongroup>
  </configsections>
  <usersettings>
    <bulk_import.properties.settings>
      <setting name=""TableName"" serializeas=""String"">
        <value>drcAlden</value>
      </setting>
      <setting name=""FieldNamesCSV"" serializeas=""String"">
        <value>CLSNumber,YrPeriod,Vendor,Department,BoxCode,StoreNumber,StoreName,
         TypeScan,UPC   , Manufac,Description,Qty     ,Base   ,Credit  ,GSNumber</value>
      </setting>
      <setting name=""DefaultDir"" serializeas=""String"">
        <value>C:\</value>
      </setting>
      <setting name=""DefaultFile"" serializeas=""String"">
        <value>FixedWidthFile.txt</value>
      </setting>
      <setting name=""FormatCSV"" serializeas=""String"">
        <value>4-8     ,10-14    ,15-22  ,23-26    ,27-36   ,38-48      ,48-77   ,
           78-80    ,80-94  ,94-103 ,103-123   ,124-133 ,133-146 ,147-160,161-170</value>
      </setting>
    </bulk_import.properties.settings>
  </usersettings>
  <connectionstrings>
    <add name=""DamageReclaimEntities"" security="True;MultipleActiveResultSets=False"""
        catalog="Database;Integrated" source="Server;Initial"
        string=""Data" providername=""System.Data.EntityClient""
        connectionstring=
        ""metadata=res://*/Model1.csdl|res://*/Model1.ssdl|res://*/
        Model1.msl;provider=System.Data.SqlClient;provider">
    <add name=""Bulk_Import.Properties.Settings.DamageReclaimConnectionString""
        security="True"" catalog="Database;Integrated"
        source="DBServer;Initial" providername=""System.Data.SqlClient""
        connectionstring=""Data">
  </connectionstrings>
</configuration>

The app settings were just added by right clicking on the project file though, so, don't worry about having to retype a bunch of perfectly valid XML unless you are into all that.

This BulkLoadFromStrings function basically calls the class's parse routines on these strings and uses them to Load the file.

/// <summary>
/// Does bulk load based on settings for file and table to be loaded.
/// </summary>
/// <param name=""TableName"">Name of table to be loaded.</param>
/// <param name=""FieldNamesCSV"">Comma Separated list of field names </param>
/// <param name=""FieldPositionCSV"">Comma separated list of field positions
/// separated by dashes</param>
/// <param name=""FileNameAndPath"">Path to fixed width file with data to be
/// loaded.</param>
/// <param name=""SQLConnectionString"">Connection string for DB to load
/// data in.</param>
/// <returns></returns>
public int BulkLoadFromStrings (   string TableName_in
                                ,   string FieldNamesCSV_in
                                ,   string FieldPositionCSV_in
                                ,   string FileNameAndPath_in
                                ,   string SQLConnectionString_in
                                )
{
    int iRecordsLoadedOut = 0;
    BulkLoadFields blfldsToLoad;
    BulkLoadTable bltblToLoad;
    SqlConnection con = null;
    Stream strmToLoad = null;
    DataTable dtToLoad = null;
    int iRecordsPreLoaded = 0;
    con = MakeConnection(SQLConnectionString_in);
    blfldsToLoad = new BulkLoadFields(FieldNamesCSV_in, FieldPositionCSV_in);
    bltblToLoad = MakeBulkLoadTable(TableName_in, con, blfldsToLoad.FieldNames);
    strmToLoad = FileToStream(FileNameAndPath_in);
    dtToLoad = bltblToLoad.Table;
    iRecordsPreLoaded = LoadTable(ref dtToLoad, blfldsToLoad.FieldPositions,
        strmToLoad);
    strmToLoad.Close();
    strmToLoad = null;
    bltblToLoad.Table = dtToLoad;
    iRecordsLoadedOut = DoBulkLoad(bltblToLoad, con);
    AddExportDate(FileNameAndPath_in, con.Database.ToString(),
        con.DataSource.ToString());
    return iRecordsLoadedOut;
}

There are 2 structures that the class uses to pass things around. The BulkLoadFields structure is where the collection of field positions and names are stored, and the BulkLoadTable is simply a datatable with these SQLBulkInsert maps set up for each of its columns. These maps turned out to be extremely important to getting the Bulk insert to work.

Because the SQLBulkCopy object seemed to be so sensitive to datatype and data size differences, the best solution seemed to be to create a table to load based on an existing table. I saw other examples wherein data readers did something similar, and it seems like there should be a better way (or at least one less specific to SQL Server) to get a table definition, but selecting the description from SQL's system tables has worked well for me.

/// <summary>
/// Creates BulkLoad table from a table definition from SQL system
/// objects and the list of column names
/// </summary>
/// <param name=""TableName"">Name of table to be loaded or copied from</param>
/// <param name=""con"">Database to get table def from</param>
/// <param name=""aFilesColumns"">Columns to define.</param>
/// <returns>BulkLoad table from a table definition from SQL system
/// objects</returns>
public BulkLoadTable MakeBulkLoadTable  (   string TableName
                                    ,   System.Data.SqlClient.SqlConnection con
                                    ,   string[] aFilesColumns
                                    )
{
    BulkLoadTable bltOut;
    bltOut = new BulkLoadTable();
    DataTable dtOut = null;
    SqlBulkCopyColumnMappingCollection sbcMapsOut = null;
    System.Data.SqlClient.SqlBulkCopy sbc = null;
    SortedList slFields = null;
    string[] aFilesFormat = null;
    string strFieldDFormatCSV = string.Empty;
    string strColumnName = string.Empty;
    string strColumnType = string.Empty;
    string strFromFile = string.Empty;
    string strFile = string.Empty;
    SqlDataReader sdr = null;
    SqlDbType FieldType = SqlDbType.Variant;
    SqlCommand sqlCmd = null;
    int iCharacterMaximumLength = 0;
    StringBuilder sbInsertFields = null;
    StringBuilder sbSelectFields = null;
    //Get a data reader to peek at the column types that we will be
    //adding rows to.
    sqlCmd = new SqlCommand(" select Column_NAme, Data_Type, ISNULL(
        Character_Maximum_Length, 0) Max " +
                            " from information_schema.columns " +
                            " where table_name like '" + TableName + "'", con);
    sqlCmd.CommandType = CommandType.Text;
    if (sqlCmd != null)
    {
        aFilesFormat = strFieldDFormatCSV.Split(char.Parse(","));
        if (aFilesFormat != null)
        {
            if (con.State != ConnectionState.Open)
            {
                con.Open();
            }
            sdr = sqlCmd.ExecuteReader(CommandBehavior.CloseConnection);
            if (sdr != null)
            {
                sbSelectFields = new StringBuilder();
                foreach (string strColName in aFilesColumns)
                {
                    if (sbSelectFields.Length!=0)
                    {
                        sbSelectFields.Append(",");
                    }
                    sbSelectFields.Append("[");
                    sbSelectFields.Append(strColName.Trim());
                    sbSelectFields.Append("]");
                }
                SqlDataAdapter sdaForWrite = new SqlDataAdapter(
                    "Select " + sbSelectFields.ToString() + " from " +
                    TableName + " where 1= -1", con);
                sbc = new System.Data.SqlClient.SqlBulkCopy(con);
                sbcMapsOut = sbc.ColumnMappings;
                sbInsertFields = new StringBuilder("INSERT INTO ");
                sbInsertFields.Append(TableName);
                sbInsertFields.Append(" (");
                sbInsertFields.Append(sbSelectFields.ToString());
                sbInsertFields.Append(") VALUES (");
                sbInsertFields.Append(sbSelectFields.Replace(
                    "[", "@").Replace("]", ""));
                sbInsertFields.Append(")");
                sdaForWrite.InsertCommand = new SqlCommand();
                sdaForWrite.InsertCommand.CommandText = sbInsertFields.ToString();
                sbInsertFields = null;
                slFields = new SortedList();
                if (sdr.HasRows)
                {
                    if (sdr.Read())
                    {
                        do
                        {
                            strColumnName = sdr["Column_NAme"].ToString().Trim();
                            if (HasColumn(aFilesColumns, strColumnName))
                            {
                                strColumnType = sdr["Data_Type"].ToString();
                                if (sdr["Max"] != null)
                                {
                                    iCharacterMaximumLength = (int)sdr["Max"];
                                }
                                else
                                {
                                    iCharacterMaximumLength = 0;
                                }
                                 switch (strColumnType)
                                {
                                    case "bigint":
                                        FieldType = SqlDbType.BigInt;
                                        break;
                                    case "int":
                                        FieldType = SqlDbType.Int;
                                        break;
                                    case "smallint":
                                        FieldType = SqlDbType.SmallInt;
                                        break;
                                    case "tinyint":
                                        FieldType = SqlDbType.TinyInt;
                                        break;
                                    case "bit":
                                        FieldType = SqlDbType.Bit;
                                        break;
                                    case "decimal":
                                        FieldType = SqlDbType.Decimal;
                                        break;
                                    case "money":
                                        FieldType = SqlDbType.Money;
                                        break;
                                    case "smallmoney":
                                        FieldType = SqlDbType.SmallMoney;
                                        break;
                                    case "nchar":
                                        FieldType = SqlDbType.NChar;
                                        break;
                                    case "ntext":
                                        FieldType = SqlDbType.NText;
                                        break;
                                    case "nvarchar":
                                        FieldType = SqlDbType.NVarChar;
                                        break;
                                    case "datetime":
                                        FieldType = SqlDbType.DateTime;
                                        break;
                                    case "smalldatetime":
                                        FieldType = SqlDbType.SmallDateTime;
                                        break;
                                    case "char":
                                        FieldType = SqlDbType.Char;
                                        break;
                                    case "varchar":
                                        FieldType = SqlDbType.VarChar;
                                        break;
                                }
                                sdaForWrite.InsertCommand.Parameters.Add(
                                    "@" + strColumnName, FieldType,
                                    iCharacterMaximumLength, strColumnName);
                                sbcMapsOut.Add(new SqlBulkCopyColumnMapping(
                                    strColumnName, strColumnName));
                            }
                        }
                        while (sdr.Read());
                    }
                }
                //Done with the data reader now that we have the field types.
                sdr.Close();
                DataSet ds = new DataSet();
                sdaForWrite.Fill(ds, TableName);
                dtOut = ds.Tables[TableName];
                sdr = null;
                bltOut = new BulkLoadTable(dtOut, sbcMapsOut);
            }
        }
    }
    return bltOut;
}

With a table and fields defined, the next step is to load the data from the flat file. Having already loaded the file into a stream, I basically split it up by the end of line character and then loop through the resulting array of strings to parse out the fields from each line.

/// <summary>
/// Loads a datatable by parsing a fixed width file.
/// </summary>
/// <param name=""TableToLoad"">Data table to load</param>
/// <param name=""FieldPositions_in"">Field positions that the file will
/// be parsed with</param>
/// <param name=""FiletoLoad"">Stream to read data from.</param>
/// <returns></returns>
public int LoadTable    (   ref DataTable TableToLoad
                        ,   SortedList<string,> FieldPositions_in
                        ,   Stream FiletoLoad
                        )
{
    int iRecordsLoadedOut = 0;
    string strSQLCon = string.Empty;
    string strTableName = string.Empty;
    DataRow dr = null;
    string[] astrFile = null;
    string strFromFile = string.Empty;
    string strFld = string.Empty;
    string strFile = string.Empty;
    int iStart = 0;
    int iEnd = 0;
    int iFileLen = 0;
    StreamReader sr = null;
    iFileLen = (int)FiletoLoad.Length;
    sr = new StreamReader(FiletoLoad,System.Text.Encoding.UTF8,false,iFileLen);
    if (FiletoLoad != null)
    {
        if (sr != null)
        {
            //FiletoLoad.Seek(1, SeekOrigin.Begin);
            strFile = sr.ReadToEnd();
            sr.Close();
            sr = null;
            if (strFile.Contains("\r"))
            {
                strFile.Replace("\n", "");
                astrFile = strFile.Split(Char.Parse("\r"));
            }
            if (astrFile != null)
            {
                foreach (string strLine in astrFile)
                {
                    if (strLine != "\n")
                    {
                        dr = TableToLoad.NewRow();
                        foreach (DataColumn col in TableToLoad.Columns)
                        {
                            strFld = col.ColumnName;
                            dr[strFld.Trim()] = DBNull.Value;
                            iStart = FieldPositions_in[strFld].Start;
                            iEnd = FieldPositions_in[strFld].End;
                            if ((iStart <= strLine.Replace("\n", "").Length) &
                                (iStart + (iEnd - iStart) <=
                                strLine.Replace("\n", "").Length))
                            {
                                strFromFile = strLine.Replace("\n",
                                    "").Substring(iStart, iEnd - iStart).Trim();
                                if (strFromFile.Length > 0)
                                {
                                    if (TableToLoad.Columns[strFld.Trim()] != null)
                                    {
                                        if (strFromFile != string.Empty)
                                        {
                                            strFromFile = strFromFile.Replace(
                                                "'", "''");
                                            if (strFromFile != ".")
                                            {
                                                dr[strFld.Trim()] =
                                                    strFromFile.Trim().ToString();
                                            }
                                        }
                                    }
                                }
                            }
                         }
                        TableToLoad.Rows.Add(dr);
                    }
                }
            }
        }
    }
    //TableToLoad.AcceptChanges();
    return iRecordsLoadedOut;
}

The datatable that is loaded by LoadTable is then passed back into the BulkLoadTable structure so that its columns can stay married up to the corresponding SqlBulkCopyColumnMappingCollection, and after WriteToServer is called, the records appear in the new table.

/// <summary>
/// Bulk loads a BulkLoadTable with SQL Server's BCP components.
/// </summary>
/// <param name=""bct_in"">Table to load</param>
/// <param name=""Con_in"">Open Connection to use.</param>
/// <returns></returns>
public int DoBulkLoad(BulkLoadTable bct_in, SqlConnection Con_in)
{
    int iRecordsLoadeOut = 0;
    SqlBulkCopy sbc = null;
    SqlBulkCopyColumnMappingCollection sbcMaps = null;
    DataTable dtToLoad =null;
    string strTableName = null;
    dtToLoad = bct_in.Table;
    strTableName = dtToLoad.TableName;
    sbc = new SqlBulkCopy(Con_in);
    sbcMaps = bct_in.Maps;
    foreach (SqlBulkCopyColumnMapping map in sbcMaps)
    {
        sbc.ColumnMappings.Add(map);
    }
    sbc.BatchSize = dtToLoad.Rows.Count;
    sbc.DestinationTableName = strTableName;
    sbc.BulkCopyTimeout = 0;
    if (Con_in.State != ConnectionState.Open)
    {
        Con_in.Open();
    }
    sbc.WriteToServer(dtToLoad, DataRowState.Added);
    if (Con_in.State != ConnectionState.Closed)
    {
        Con_in.Close();
    }
    iRecordsLoadeOut = dtToLoad.Rows.Count;
    return iRecordsLoadeOut;
}

Points of Interest

I broke the load function out of the rest of this process so that if there are some special validations and/or transformations that need to happen on the individual fields, this can be done right after the data is loaded into the ADO data table, but, before it is sent to the server. Also, at some point, it might make sense to override the load or make table functions for special tables or other file formats.

Though this example walks through parsing all the settings from strings, there is no reason another app might want to load the field possessions structure differently, or use an FTP sockets stream to get a file instead of reading it from a disk.

Hopefully in a future ADO release, this gets easier and tables will get better at knowing their own schema information.

History

3^rd March, 2009: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Alden John Snow

Web Developer

United States

This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.