Click here to Skip to main content
Click here to Skip to main content

How to store and fetch binary data into a file stream column

, 3 Jan 2009 CPOL
Rate this:
Please Sign up or sign in to vote.
Desribes how to store and fetch binary data to a filestream column compared to a varbinary column.

Introduction

File streams were introduced in SQL Server 2008. They offer the capability to store binary data to the database but outside the normal database files. Earlier, varbinary used to be stored inside database files, which had many side-effects. Because SQL Server stores data in blocks which are arranged as extents, the data in earlier varbinary columns had to conform to the block structure (although a bit different from normal data blocks).

In SQL Server 2008, if a varbinary column is defined to be stored as a file stream, the binary data is stored in a special folder structure which is managed by the SQL Server engine. The only thing that remains inside the database is a pointer to the file along with a mandatory GUID column in order to use the file stream from the Win32 client.

File stream data can be used from the .NET Framework using the traditional SqlParameter, but there is also a specialized class called SqlFileStream which can be used with .NET Framework 3.5 SP1 or later. This class provides mechanisms, for example, for seeking a specific position from the data.

Advantages of storing binary data to the database

A typical question is: Why store binary data in a database? A very common solution is to store the actual data into the file system and only define a path or URL to the database pointing to the actual file. However, there are a few issues that should be considered.

  • Backups: When data is stored apart from the database, it's not backed up by SQL Server. If these files need to be backed up, a separate mechanism must be created. This also means that these two backups are 'never' in-sync. For example, a file may be deleted when a SQL Server backup is made but the actual file is not backed up yet. When the data is stored to the database, the backup is consistent.
  • Transactionality: When the file is stored outside the database, the file creation, modification, and deletion isn't part of the transaction which occurs against the database. This means that neither commit nor rollback actually guarantees that the result is consistent. When the data is stored inside the database, it's part of the transaction. So, for example, a rollback includes all traditional database operations along with binary data operations. This usually makes the client solution more robust with less code.

Setting up the database

This section describes how to create a test database capable of handling a file stream. The database contains two tables for comparing traditional varbinary columns against file streams. These scripts are provided in the sample project in the file named DatabaseCreationScript.txt. Note: Before using the script, modify the data file paths to suityour environment.

Database creation

--------------------------------------------------------------------------
-- Create the database
--------------------------------------------------------------------------
CREATE DATABASE [SqlFileStream] ON  PRIMARY 
( NAME = N'SqlFileStream', 
  FILENAME = N'C:\DATA\SqlFileStream.mdf', 
  SIZE = 500MB , 
  MAXSIZE = UNLIMITED, 
  FILEGROWTH = 100MB ), 
  FILEGROUP [FileStreamData] CONTAINS FILESTREAM  DEFAULT 
( NAME = N'SqlFileStream_Data', 
  FILENAME = N'C:\DATA\FileStreamData\SqlFileStream_Data' )
LOG ON 
( NAME = N'SqlFileStream_log', 
  FILENAME = N'C:\DATA\SqlFileStream_log.ldf', 
  SIZE = 500MB , 
  MAXSIZE = 2048GB , 
  FILEGROWTH = 100MB)
GO

The previous script creates a new database called SqlFileStream. When using file stream storage, you must specify the folder where the binary data will be placed. This folder is represented to SQL Server as a special file group defined to contain a file stream. The primary file and the log files are defined normally.

--------------------------------------------------------------------------
-- Create the tables
--------------------------------------------------------------------------
USE [SqlFileStream]
GO

CREATE TABLE [FileStreamTest] (
[Id]   uniqueidentifier ROWGUIDCOL NOT NULL DEFAULT (NEWID()) PRIMARY KEY,
[Name] nvarchar(100)    NOT NULL,
[Data] varbinary(max)   FILESTREAM NOT NULL
)
GO

CREATE TABLE [VarbinaryTest] (
[Id]   uniqueidentifier ROWGUIDCOL NOT NULL DEFAULT (NEWID()) PRIMARY KEY,
[Name] nvarchar(100)    NOT NULL,
[Data] varbinary(max)   NOT NULL
)
GO

The script creates two tables. The FileStreamTest table stores the binary data to the file stream. In order to use the data from the Win32 client, a GUID column must be present in the table. This GUID is actually used to identify the stored files in the file stream folder. The second table stores the binary data inside the database files (in this case, SqlFileStream.mdf). This table is used from the client program to test storage times against file stream data. A GUID column isn't mandatory in this case, although it's used by the client to identify a row.

Client program

Tab1Picture.jpg

The client program is a simple WPF application. The first tab is used to upload the data with three different variations:

  • Using a SqlParameter, load the data to a traditional varbinary column
  • Using a SqlParameter, load the data to a varbinary column stored as a file stream
  • Using a SqlFileStream, load the data to a varbinary column stored as a file stream

During the upload, a single operation is measured, and if the upload is repeated using a repeat count, an average is calculated based on individual upload times. The second tab is used for downloading and saving the data back to a file with the same variations as when uploading.

The program consists of the main window and a static DbOperations class which handles all of the communication to the database.

Storing the data

The DbOperations class implements a method called StoreFileUsingSqlParameter to store the data to the database using SqlParameter. The code is the same whether a normal varbinary or a file stream is used, so this decision actually doesn't have any effect on the client. The code simply creates a command and sets the parameters. After that, the command is executed in a loop for a defined number of times.

command.CommandText = "INSERT INTO " 
     + (tableType == TableType.Traditional ? "VarbinaryTest" : "FileStreamTest") 
     + " ([Name], [Data]) VALUES (@Name, @Data)";
command.CommandType = System.Data.CommandType.Text;

parameter = new System.Data.SqlClient.SqlParameter("@Name", 
                System.Data.SqlDbType.NVarChar, 100);
parameter.Value = file.Substring(file.LastIndexOf('\\') + 1);
command.Parameters.Add(parameter);

parameter = new System.Data.SqlClient.SqlParameter("@Data", 
                System.Data.SqlDbType.VarBinary);
parameter.Value = System.IO.File.ReadAllBytes(file);
command.Parameters.Add(parameter);

The StoreFileUsingSqlFileStream method does the same thing, but this time, using a SqlFileStream. There are a few gotcha's when using the SqlFileStream.

When inserting a new row, the file storing the data for SQL Server should be created at the same time. However, if the column containing the file stream data is omitted in the INSERT statement, it's interpreted as NULL. In that case, the file is not created, and it would be harder to use the row afterwards. For that reason, the INSERT statement adds (0x) (empty data) to the varbinary column.

insertCommand.CommandText = 
 "INSERT INTO FileStreamTest ([Id], [Name], [Data]) VALUES (@Id, @Name, (0x))";
insertCommand.CommandType = System.Data.CommandType.Text;

Because the row is inserted first and the binary data is updated to the row afterwards, the program must start a transaction and fetch the transaction context that will later be used when an instance of SqlFileStream is created.

insertCommand.Transaction = connection.BeginTransaction();
helperCommand.Transaction = insertCommand.Transaction;

helperCommand.CommandText = "SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()";
transactionContext = helperCommand.ExecuteScalar();

The row is first inserted as a normal row, but after that, a path to the file is fetched from SQL Server. This path, along with the transaction context, is used to initialize the SqlFileStream. Once initialized, the data is written to the SqlFileStream as a byte array. Note, the code is in different order in the program for efficiency.

helperCommand.CommandText = "SELECT Data.PathName() FROM FileStreamTest WHERE [Id] = @Id";
parameter = new System.Data.SqlClient.SqlParameter("@Id", 
                System.Data.SqlDbType.UniqueIdentifier);
helperCommand.Parameters.Add(parameter);

helperCommand.Parameters["@Id"].Value = insertCommand.Parameters["@Id"].Value;
filePathInServer = (string)helperCommand.ExecuteScalar();

sqlFileStream = new System.Data.SqlTypes.SqlFileStream(filePathInServer, 
                                                       (byte[])transactionContext, 
                                                       System.IO.FileAccess.Write);
sqlFileStream.Write(fileData, 0, fileData.Length);
sqlFileStream.Close();

Using the code

In order to use the code, you need to install a SQL Server 2008 instance and create the database and tables into it. After that, the SQL Server instance name and database name are configured via app.config. It would look something like:

...
<applicationSettings>
    <TableValuedParameters.Properties.Settings>
        <setting name="DataSource" serializeAs="String">
            <value>SqlServerMachine\SqlServerInstanceName</value>
        </setting>
        <setting name="DatabaseName" serializeAs="String">
            <value>SqlFileStream</value>
        </setting>
    </TableValuedParameters.Properties.Settings>
</applicationSettings>
...

Measurements

With a small amount of binary data, it's not efficient to use a file stream. This is because it needs extra overhead like file creation and handling. These operations are not needed when predefined database files are used. However, with larger files, file streams are quite efficient. The following charts show the elapsed times for upload in milliseconds using different techniques, on my development box. The key specifications for the computer used were:

  • SQL Server and client application on the same machine
  • Processor: Intel Core2 Duo, 1.8 MHz
  • 4 GB physical memory
  • Database files on drive C:
  • Files uploaded from drive E:
  • Drives C: and E: on separate physical SATA disk drives

The charts show average upload times for:

  • 100 KB file repeated 3 times for each measurement
  • 1 MB file repeated 3 times for each measurement
  • 10 MB file repeated 3 times for each measurement

100kB.jpg

In this measurement, you can clearly see the overhead caused by executing several different statements using SqlFileStream. Also, the file handling overhead is seen in both file stream based solutions compared to a varbinary inside the database.

1MB.jpg

With 1 MB of data, the traditional varbinary and the file stream are acting quite similarly. Using SqlFileStream is still significantly slower.

10MB.jpg

When 10 MB files are used, storing the data inside a traditional database file is much slower.

Based on these measurements, it's more efficient to use a file stream when the typical file size is about 1 MB or more. If files are small (clearly under 1 MB), a traditional varbinary performs better. In all of the cases, using SqlFileStream wasn't giving any performance advantage, so its advantages seem to be only in the data handling capabilities over a byte array.

When measuring the times, one difference I found was that the deletion of the file stream based rows is much, much faster than when stored inside the table. So, when considering the overall performance, you should also consider how much deletions occur and how they impact the total throughput of the system.

Downloading

The download tab is used to fetch binary data from a single, user selected row. It uses the same techniques as the upload. The download tab is included for two reasons: to make sure that the data wasn't modified in any way, and to investigate fetching using SqlFileStream.

So, that's about it. Download the program, and use it freely to investigate the efficiency of these technologies using your setup and to explore the usage of a file stream.

History

  • January 3, 2009: Created.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Mika Wendelius
Architect
Finland Finland
I've been a programmer since mid 80's using languages like assembler, C/C++, PL/I (mainframe environment), pascal, VB (I know, I know, no comments please) and C# and utilizing different techniques and tools.
 
However I'm specialized in databases and database modeling. Mostly I have used products like Oracle (from version 6), SQL Server (from version 4.2), DB2 and Solid Server (nowadays an IBM product).
 
For the past 10+ years my main concerns have been dealing with different business processes and how to create software to implement and improve them. At my spare time (what ever that actually means) I'm also teaching and consulting on different areas of database management, development and database oriented software design.

Comments and Discussions

 
QuestionCan Use Sql File Stream in Client / Server App? Pinmemberrahim_ttl23-Jun-13 22:57 
GeneralMy vote of 5 PinmvpKanasz Robert24-Sep-12 6:01 
GeneralRe: My vote of 5 PinmvpMika Wendelius25-Sep-12 8:09 
GeneralMy vote of 5 Pinmembermsswcon4-May-12 8:25 
GeneralRe: My vote of 5 PinmvpMika Wendelius4-May-12 8:46 
QuestionHow do you open the filestream file in its associated app? PinmemberElliot Harlowe27-Sep-11 11:02 
AnswerRe: How do you open the filestream file in its associated app? PinmvpMika Wendelius4-May-12 8:47 
GeneralMy vote of 5 PinmemberMember 802179015-Jul-11 3:30 
GeneralRe: My vote of 5 PinmemberMika Wendelius25-Jul-11 10:54 
GeneralMy vote 5 PinmemberKanasz Robert11-Oct-10 2:52 
GeneralMy vote of 5 PinmemberGianluca Maria Marcilli15-Sep-10 20:44 
QuestionSource code ? Pinmemberaaditya200015-Apr-09 9:34 
GeneralAnother benefit for using the SqlFileStream approach [modified] PinmemberAssaf S.7-Jan-09 4:13 
GeneralRe: Another benefit for using the SqlFileStream approach PinmvpMika Wendelius7-Jan-09 4:22 
Generalthanks and notes [modified] PinmemberSergey Arhipenko6-Jan-09 3:33 
GeneralRe: thanks and notes PinmvpMika Wendelius6-Jan-09 3:51 
GeneralRe: thanks and notes PinmemberSergey Arhipenko6-Jan-09 4:44 
GeneralRe: thanks and notes PinmvpMika Wendelius6-Jan-09 5:21 
GeneralRe: thanks and notes PinmemberSergey Arhipenko6-Jan-09 6:02 
GeneralRe: thanks and notes PinmvpMika Wendelius6-Jan-09 6:09 
GeneralRe: thanks and notes PinmemberSergey Arhipenko7-Jan-09 0:05 
GeneralGood introduction PinmemberN a v a n e e t h3-Jan-09 23:23 
GeneralRe: Good introduction PinmemberMika Wendelius3-Jan-09 23:34 
GeneralRe: Good introduction PinmemberN a v a n e e t h4-Jan-09 4:20 
GeneralRe: Good introduction PinmemberMika Wendelius4-Jan-09 5:06 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web04 | 2.8.141022.1 | Last Updated 3 Jan 2009
Article Copyright 2009 by Mika Wendelius
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid