Click here to Skip to main content
15,861,125 members
Articles / Database Development / SQL Server

Handling BULK Data insert from CSV to SQL Server

Rate me:
Please Sign up or sign in to vote.
4.69/5 (17 votes)
9 Nov 2012CPOL5 min read 387.5K   45   13
From this article, developers will get some useful guidelines on bulk data insertion in SQL Server.

Introduction

I am writing this article from the experiences of inserting huge data (around 5 million rows, 400 MB) from a CSV file to a SQL Server database.

Background 

There are several ways to insert bulk data from a CSV file to a database; our goal was to perform faster insertion and execute the insertion from a C# application. To reach the goal, we experimented with some of the common well known techniques to handle bulk data insertion. Following are the techniques we experimented:

  1. SQL BULK INSERT query
  2. BCP or SqlBulkCopy library to insert bulk data using C# or VB
  3. SQl Server Integration Service (SSIS)
  4. Normal SQL command library in C# or VB

From this article, developers will get some useful guidelines on bulk data insertion in SQL Server.

1. SQL BULK Insert

Using the BULK INSERT statement we can insert bulk data into the database directly from a CSV file. The simplest version of the BULK INSERT query looks like that:

SQL
BULK INSERT dbo.TableForBulkData
FROM 'C:\BulkDataFile.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)

To handle transaction and rollback, a try catch block can be used like that:

SQL
EGIN TRANSACTION
BEGIN TRY
BULK INSERT dbo.BulkDataTable
FROM 'C:\TestFiles\Bulk3.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    ROWS_PER_BATCH = 10000, 
    TABLOCK
)
COMMIT TRANSACTION
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION
END CATCH

The details of the BULK INSERT SQL are available here: http://msdn.microsoft.com/en-us/library/ms188365.aspx.

BULK INSERT is actually a rich T-SQL command which takes arguments to setup error, batch_size, trigger, and so on.

2. BCP or SqlBulkCopy Library

The .NET Framework contains a SqlBulkCopy class in the System.Data.SqlClient namespace to copy large amounts of data from .NET applications to a SQL Server database easily and efficiently.

The details of the SqlBulkCopy operation can be found here: http://msdn.microsoft.com/en-us/library/tchktcdk%28v=vs.80%29.aspx.

The SqlBulkCopy class copies a bulk of data from a data table to a database table. SqlBulkCopy takes the following types of parameters to copy data in the database: System.Data.DataRow[], System.Data.DataTable, System.Data.IDataReader.  

Now the challenge is to convert the large CSV file to any of these datatypes: DataRow[], DataTable, IDataReader. Some of the open source libraries are available to perform such conversion. For experimenting, we used CSVReader (http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader) which binds data from a CSV file via the System.Data.IDataReader interface.

The following code performs a SqlBulkCopy perfectly from a CSV to a database table.

C#
StreamReader file = new StreamReader(bulk_data_filename);
CsvReader csv = new CsvReader(file, true,',');
SqlBulkCopy copy = new SqlBulkCopy(conn);
copy.DestinationTableName = tablename;
copy.WriteToServer(csv);

By default, a bulk copy operation runs in its own transaction. To commit or rollback, a transaction needs to be included. Following is the code for SqlBulkCopy with transaction. 

C#
SqlConnection conn = new SqlConnection(connectionString);
conn.Open();
SqlTransaction transaction = conn.BeginTransaction();
try
{
    using (StreamReader file = new StreamReader(filename))
    {
        CsvReader csv = new CsvReader(file, true, '|');
        SqlBulkCopy copy = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity, transaction);
        copy.DestinationTableName = tablename;
        copy.WriteToServer(csv);
        transaction.Commit();
    }
}
catch (Exception ex)
{
    transaction.Rollback();
}
finally
{
    conn.Close();
}

It is better to add BulkCopyTimeout for very large CSV files: copy.BulkCopyTimeout = XXX;.

3. SQL Server Integration Service

SQL Server includes a powerful data integration and transformation application called SQL Server Integration Service (SSIS). One of the main functions of SSIS is to move data from almost any formatted external data source into SQL Server. Using Business Intelligent Development Studio (BIDS) we can easily import data from a CSV file to a database. Also, it is very simple to put a package file in as automatic reoccurring job.

Here are the basic steps to create a SSIS service package to import data from a CSV file to SQL Server.

  1. Open SQL Server Business Intelligence Studio.
  2. Create a new “Integration Service Project”. 
  3. In the “Control Flow” tab, drag a “Data Flow Task” from the toolbox.
  4. Go to “Data Flow” tab.
  5. In the “Data Flow” page, drag “Flat File Source” and “ADO.NET Destination” from the toolbox and set them up.
  6. Connect Flat File Source output path (green arrow) to the ADO.NET Destination.

Running the created SSIS package will duplicate data from the CSV file to the SQL database. 

4. Insert data using the conventional SQLCommand class

Data can be inserted to the database from a CSV file using the conventional SQLCommand class. But this is a very slow process. Compared to the other three ways I have already discussed, this process is at least 10 times slower. It is strongly recommended to not loop through the CSV file row by row and execute SqlCommand for every row to insert a bulk amount of date from the CSV file to the SQL Server database.

Comparative analysis

In our study, we found that BULK Insert SQL and SQLBulkCopy performed best. For around a 400MB CSV file it took an average three minutes to insert data. SQL Server Integration Service (SSIS) took around double the time than BULK Insert and SQLBulkCopy. Normal data insertion took a long long time.

Based on our results we can say using BULK Insert SQL or the SQLBulkCopy class performs best for huge data insertions. We can also use SSIS for bulk data insertions, but using normal SQLCommand to insert bulk data is not a reasonable solution. 

Caution: The result can vary from system to system.

Expert Opinions (References to make it better)

According to Adrian Hills: Bulk loading though ADO.NET can be extremely efficient if you use the SqlBulkCopy class. If your source data is in XML, you could try loading that into a DataSet - either by iterating through the XML document manually, or via the DataSet.ReadXml method. You can then highly optimise the bulk loading when using the SqlBulkCopy class by:

  • loading into a heap table (no indexes - create the indexes after the data has been loaded in)
  • Specifying the TableLock option for the bulk load, which will get a bulk upload lock on the table

Suggestion in forum: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/812b9a6a-541d-4d3a-b7a0-005b93012264.

Blog about it: http://www.adathedev.co.uk/2010/02/sqlbulkcopy-bulk-load-to-sql-server.html

Greg Robidoux has written tips for Minimally Logging Bulk Load Inserts into SQL Server in http://www.mssqltips.com/sqlservertip/1185/minimally-logging-bulk-load-inserts-into-sql-server/.

Optimizing Bulk Import Performance

Following are the suggestions to optimize bulk import performance:

  • Use minimal logging
  • Import data in parallel from multiple clients to a single table
  • Disable triggers 
  • Disable constraints
  • Order the data in a data file
  • Control the locking behavior
  • Import data in native format

Details of these optimization suggestions can be found here: http://msdn.microsoft.com/en-us/library/ms190421%28v=sql.105%29.aspx.

History

V01 - August 13, 2012 - First version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Canada Canada
Software engineer with broad experience in enterprise application development, product deployment automation, software test & test automation, application & system management, and manage big projects and team using proven agile technologies.

Passionate on Microsoft technologies, developed solutions using C#, .net (1.1/2.0/3.5/4), SQL Server (2005/2008). Work on Powershell, SSRS, SSIS, WPF, Ajax, WCF, JQuery.

Develop innovative application with cutting edge technologies always boosting inside.

Comments and Discussions

 
GeneralMy vote of 4 Pin
Raghvendra singh124-Jul-19 18:52
Raghvendra singh124-Jul-19 18:52 
QuestionIDisposable not being considered - Using! Pin
TamusRoyce20-Apr-17 9:51
TamusRoyce20-Apr-17 9:51 
GeneralMy vote of 5 Pin
dp.sistemas10-Apr-13 2:00
dp.sistemas10-Apr-13 2:00 
QuestionAn item with the same key has already been added. Pin
DBlocker11-Mar-13 3:51
DBlocker11-Mar-13 3:51 
QuestionError with uniqueidentifier types. Pin
franco.fral31-Jan-13 7:08
franco.fral31-Jan-13 7:08 
AnswerRe: Error with uniqueidentifier types. Pin
Member 110133988-Sep-14 7:37
Member 110133988-Sep-14 7:37 
QuestionCan we get some more accurate numbers? Pin
Jasmine250112-Nov-12 10:49
Jasmine250112-Nov-12 10:49 
QuestionDatatype conversion errors for bit and datetime sql types. Pin
2374110-Nov-12 16:16
2374110-Nov-12 16:16 
Hi,

I'm using #2 above - using CSVReader. CSVReader is extremely quick.

The CSV was generated from MySQL where I am trying to import the CSV data to a SQL Server table.

Big problem - I keep getting errors on anything related to DATETIME or BIT data colums when writing to SQL Server.

It does not recognize a 1 or 0 as a "compatible" data type to write to a bit column - not to mention the insanity of trying to get the correct format for a DATETIME.

I am using column mappings and all.

Any ideas? (these issues are all over the net regarding data type conversions using SQLBulkCopy).
AnswerRe: Datatype conversion errors for bit and datetime sql types. Pin
2374111-Nov-12 3:44
2374111-Nov-12 3:44 
AnswerRe: Datatype conversion errors for bit and datetime sql types. Pin
2374111-Nov-12 4:20
2374111-Nov-12 4:20 
GeneralMy vote of 1 Pin
cjb11013-Aug-12 20:54
cjb11013-Aug-12 20:54 
QuestionBulk insert not recognizing end of file Pin
searching for solution.....6-May-13 0:29
searching for solution.....6-May-13 0:29 
AnswerRe: Bulk insert not recognizing end of file Pin
Ashrafur Rahaman6-May-13 5:16
Ashrafur Rahaman6-May-13 5:16 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.