Click here to Skip to main content
Click here to Skip to main content
Go to top

Handling BULK Data insert from CSV to SQL Server

, 9 Nov 2012
Rate this:
Please Sign up or sign in to vote.
From this article, developers will get some useful guidelines on bulk data insertion in SQL Server.

Introduction

I am writing this article from the experiences of inserting huge data (around 5 million rows, 400 MB) from a CSV file to a SQL Server database.

Background 

There are several ways to insert bulk data from a CSV file to a database; our goal was to perform faster insertion and execute the insertion from a C# application. To reach the goal, we experimented with some of the common well known techniques to handle bulk data insertion. Following are the techniques we experimented:

  1. SQL BULK INSERT query
  2. BCP or SqlBulkCopy library to insert bulk data using C# or VB
  3. SQl Server Integration Service (SSIS)
  4. Normal SQL command library in C# or VB

From this article, developers will get some useful guidelines on bulk data insertion in SQL Server.

1. SQL BULK Insert

Using the BULK INSERT statement we can insert bulk data into the database directly from a CSV file. The simplest version of the BULK INSERT query looks like that:

BULK INSERT dbo.TableForBulkData
FROM 'C:\BulkDataFile.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)

To handle transaction and rollback, a try catch block can be used like that:

EGIN TRANSACTION
BEGIN TRY
BULK INSERT dbo.BulkDataTable
FROM 'C:\TestFiles\Bulk3.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    ROWS_PER_BATCH = 10000, 
    TABLOCK
)
COMMIT TRANSACTION
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION
END CATCH

The details of the BULK INSERT SQL are available here: http://msdn.microsoft.com/en-us/library/ms188365.aspx.

BULK INSERT is actually a rich T-SQL command which takes arguments to setup error, batch_size, trigger, and so on.

2. BCP or SqlBulkCopy Library

The .NET Framework contains a SqlBulkCopy class in the System.Data.SqlClient namespace to copy large amounts of data from .NET applications to a SQL Server database easily and efficiently.

The details of the SqlBulkCopy operation can be found here: http://msdn.microsoft.com/en-us/library/tchktcdk%28v=vs.80%29.aspx.

The SqlBulkCopy class copies a bulk of data from a data table to a database table. SqlBulkCopy takes the following types of parameters to copy data in the database: System.Data.DataRow[], System.Data.DataTable, System.Data.IDataReader.  

Now the challenge is to convert the large CSV file to any of these datatypes: DataRow[], DataTable, IDataReader. Some of the open source libraries are available to perform such conversion. For experimenting, we used CSVReader (http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader) which binds data from a CSV file via the System.Data.IDataReader interface.

The following code performs a SqlBulkCopy perfectly from a CSV to a database table.

StreamReader file = new StreamReader(bulk_data_filename);
CsvReader csv = new CsvReader(file, true,',');
SqlBulkCopy copy = new SqlBulkCopy(conn);
copy.DestinationTableName = tablename;
copy.WriteToServer(csv);

By default, a bulk copy operation runs in its own transaction. To commit or rollback, a transaction needs to be included. Following is the code for SqlBulkCopy with transaction. 

SqlConnection conn = new SqlConnection(connectionString);
conn.Open();
SqlTransaction transaction = conn.BeginTransaction();
try
{
    using (StreamReader file = new StreamReader(filename))
    {
        CsvReader csv = new CsvReader(file, true, '|');
        SqlBulkCopy copy = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity, transaction);
        copy.DestinationTableName = tablename;
        copy.WriteToServer(csv);
        transaction.Commit();
    }
}
catch (Exception ex)
{
    transaction.Rollback();
}
finally
{
    conn.Close();
}

It is better to add BulkCopyTimeout for very large CSV files: copy.BulkCopyTimeout = XXX;.

3. SQL Server Integration Service

SQL Server includes a powerful data integration and transformation application called SQL Server Integration Service (SSIS). One of the main functions of SSIS is to move data from almost any formatted external data source into SQL Server. Using Business Intelligent Development Studio (BIDS) we can easily import data from a CSV file to a database. Also, it is very simple to put a package file in as automatic reoccurring job.

Here are the basic steps to create a SSIS service package to import data from a CSV file to SQL Server.

  1. Open SQL Server Business Intelligence Studio.
  2. Create a new “Integration Service Project”. 
  3. In the “Control Flow” tab, drag a “Data Flow Task” from the toolbox.
  4. Go to “Data Flow” tab.
  5. In the “Data Flow” page, drag “Flat File Source” and “ADO.NET Destination” from the toolbox and set them up.
  6. Connect Flat File Source output path (green arrow) to the ADO.NET Destination.

Running the created SSIS package will duplicate data from the CSV file to the SQL database. 

4. Insert data using the conventional SQLCommand class

Data can be inserted to the database from a CSV file using the conventional SQLCommand class. But this is a very slow process. Compared to the other three ways I have already discussed, this process is at least 10 times slower. It is strongly recommended to not loop through the CSV file row by row and execute SqlCommand for every row to insert a bulk amount of date from the CSV file to the SQL Server database.

Comparative analysis

In our study, we found that BULK Insert SQL and SQLBulkCopy performed best. For around a 400MB CSV file it took an average three minutes to insert data. SQL Server Integration Service (SSIS) took around double the time than BULK Insert and SQLBulkCopy. Normal data insertion took a long long time.

Based on our results we can say using BULK Insert SQL or the SQLBulkCopy class performs best for huge data insertions. We can also use SSIS for bulk data insertions, but using normal SQLCommand to insert bulk data is not a reasonable solution. 

Caution: The result can vary from system to system.

Expert Opinions (References to make it better)

According to Adrian Hills: Bulk loading though ADO.NET can be extremely efficient if you use the SqlBulkCopy class. If your source data is in XML, you could try loading that into a DataSet - either by iterating through the XML document manually, or via the DataSet.ReadXml method. You can then highly optimise the bulk loading when using the SqlBulkCopy class by:

  • loading into a heap table (no indexes - create the indexes after the data has been loaded in)
  • Specifying the TableLock option for the bulk load, which will get a bulk upload lock on the table

Suggestion in forum: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/812b9a6a-541d-4d3a-b7a0-005b93012264.

Blog about it: http://www.adathedev.co.uk/2010/02/sqlbulkcopy-bulk-load-to-sql-server.html

Greg Robidoux has written tips for Minimally Logging Bulk Load Inserts into SQL Server in http://www.mssqltips.com/sqlservertip/1185/minimally-logging-bulk-load-inserts-into-sql-server/.

Optimizing Bulk Import Performance

Following are the suggestions to optimize bulk import performance:

  • Use minimal logging
  • Import data in parallel from multiple clients to a single table
  • Disable triggers 
  • Disable constraints
  • Order the data in a data file
  • Control the locking behavior
  • Import data in native format

Details of these optimization suggestions can be found here: http://msdn.microsoft.com/en-us/library/ms190421%28v=sql.105%29.aspx.

History

V01 - August 13, 2012 - First version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Ashrafur Rahaman
Software Developer (Senior)
Canada Canada
Software engineer with broad experience in enterprise application development, product deployment automation, software test & test automation, application & system management, and manage big projects and team using proven agile technologies.
 
Passionate on Microsoft technologies, developed solutions using C#, .net (1.1/2.0/3.5/4), SQL Server (2005/2008). Work on Powershell, SSRS, SSIS, WPF, Ajax, WCF, JQuery.
 
Develop innovative application with cutting edge technologies always boosting inside.

Comments and Discussions

 
GeneralMy vote of 5 Pinmemberdp.sistemas10-Apr-13 2:00 
QuestionAn item with the same key has already been added. PinmemberDBlocker11-Mar-13 3:51 
QuestionError with uniqueidentifier types. [modified] Pinmemberfranco.fral31-Jan-13 7:08 
AnswerRe: Error with uniqueidentifier types. PinmemberMember 110133988-Sep-14 7:37 
QuestionCan we get some more accurate numbers? PinmemberJasmine250112-Nov-12 10:49 
QuestionDatatype conversion errors for bit and datetime sql types. PinmemberRene Pilon10-Nov-12 16:16 
AnswerRe: Datatype conversion errors for bit and datetime sql types. PinmemberRene Pilon11-Nov-12 3:44 
AnswerRe: Datatype conversion errors for bit and datetime sql types. PinmemberRene Pilon11-Nov-12 4:20 
GeneralMy vote of 1 Pinmembercjb11013-Aug-12 20:54 
The point of this? No comparisons/timings, just a rehash of part of the help.
 
If you were completely new to this, i.e. no attempt to work anything out yourself, this article still wouldn't be useful as none of your examples have enough information about how they work.
 
Sorry but why bother writing this?
QuestionBulk insert not recognizing end of file Pinmembersearching for solution.....6-May-13 0:29 
AnswerRe: Bulk insert not recognizing end of file PinmemberAshrafur Rahaman6-May-13 5:16 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web04 | 2.8.140905.1 | Last Updated 9 Nov 2012
Article Copyright 2012 by Ashrafur Rahaman
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid