Introduction
Over a year ago, I read an article by Jarno Peschier in .NET Magazine. The article, written in Dutch, is called "Blob + Stream = BlobStream" and can be found here. It discusses the use of streams in conjunction with blobs in an SQL Server database. Blob is short for "Binary Large OBject" and generally refers to stored pictures, PDFs and other binary files. Jarno's idea is to use System.IO.Stream
to read a blob from or write a blob to a database.
The more widespread techniques to get blobs into and out of a database generally use byte arrays that contain the entire blob. For most scenarios and/or with relatively small blobs, this shouldn't be a problem. However, the transfer of a whole blob in one piece might not be a good idea in a web scenario with potentially hundreds of users requesting the same large file from the database at the same time. The big advantage of using streams is that the blob is transported in small chunks, thus keeping the memory footprint on the application server as small as possible.
Having used Jarno's technique in some of my recent projects, I decided to take it a step further and provide an extensible framework to manipulate blobs from other databases than SQL Server. As this is a rather large subject, I've split it up in parts. For the remainder of these articles, I shall no longer be talking about blobs, but about lobs instead. Of course, lob stands for "Large OBject." The reason is that lob streams can be used not only with binary data, but with text data as well.
Part 2 of this article is also available.
The philosophical discussion
This article does not take a stand in the big debate about whether or not it is a good idea to store pictures and other binary files in a database instead of using the file system. A rule of thumb that Dino Esposito gave in his book "Programming Microsoft ASP.Net 2.0 Applications – Advanced Topics" (Chapter 9) is:
- Use the file system if you are manipulating images larger than 1 Mb
- Use the file system if you have several hundred thousand images
- Use of the database is OK if the two previous conditions are not met
In another article Dino added, "If you need to frequently edit the images, I suggest that you store the images as separate files. If your files are essentially read-only and relatively static, and if you measure the size in Kb, then using a database is generally fine." Personally, I agree with Dino. In the projects where I stored pictures in the database, they were smaller than 1 Mb most of the time and served merely as an illustration to a text. Thus, they were static by nature.
The framework
The purpose of the framework is to provide a few classes and types that aid in the use of lob streams. The heart of the framework is an abstract class called LobStream. LobStream serves as the general base class for all subsequent implementations of lob streams that are database-specific. LobStream derives from System.IO.Stream
, a class that exposes abstract methods that must be overridden by child classes. In addition, LobStream implements ILobStream which is an interface that groups common methods and properties when working with database lobs. This interface ensures that client code will not be broken as long as new classes managing lob streams for yet-unsupported databases implement it.
The abstract LobStream class has two abstract subclasses: SqlLobStream and OleDbLobStream. As the names reveal, these classes refer to the two main .NET Data Providers for accessing databases. In the future, the framework can be extended with other .NET Data Provider specific classes like, for example, OdbcLobStream or OracleLobStream. See the interesting article on working with Oracle lobs for more information. The .NET Data Provider specific abstract subclasses serve as the basis for the actual implementation of a database-specific lob stream, as we will see further. Furthermore, the framework exposes a LobDataType enumeration that indicates the type of database and data type of the lob inside the database.
Finally, the framework implements a LobStreamFactory class named CreateLobStream that exposes three overloaded static methods. The CreateLobStream methods accept some parameters and return an appropriate lob stream object. The framework was created in Visual Studio 2005 as a class library project. It outputs to LobStream.dll. As long as the client code knows how to deal with an instance of LobStream that implements the ILobStream interface, the underlying code in the DLL can be changed and extended over time. Several design goals are achieved in this way:
- Reusability: just reference the framework classes in your project and you are set to go.
- Extensibility: you can easily add new lob stream classes that can talk to new/other databases or simply change the existing classes. As long as they derive from LobStream, no existing client code will be broken; just reference the updated framework in your projects.
The full picture of the framework is displayed here:
ILobStream
The implementation of ILobStream is pretty straightforward:
public interface ILobStream
{
IDbConnection Connection { get; set;}
IDbCommand Validate { get; set;}
bool IsValid { get; set;}
string TableName { get; set;}
string KeyColumn { get; set;}
string LobColumn { get; set;}
string LobID { get; set;}
void WriteTo(Stream destination, bool closeWhenFinished, long bufferSize);
void WriteTo(Stream destination);
void WriteTo([In, Out] byte[] bytes);
void ReadFrom(Stream source, bool closeWhenFinished, long bufferSize);
void ReadFrom(Stream source);
byte[] ToByteArray();
}
The interface makes sure that, in order to be able to communicate with a database, at least the following is available in the implementing class:
Connection
: the connection to the database. The Connection
property is of type IDbCommand
, to be compatible with connections exposed by any .NET Data Providers TableName
: the name of the table in which the lob is or has to be stored KeyColumn
: the name of the column in the aforementioned table that holds the unique value by which the destination record can be identified LobColumn
: the name of the column that holds the lob LobID
: the unique value in the KeyColumn
that indicates the database record to hold the lob
All of the above ultimately has to be supplied by the client code to the implementing class. In addition, ILobStream contains a few useful methods to enable the LobStream to communicate with other streams:
WriteTo
: the LobStream pours the lob into a destination stream or into a byte array ReadFrom
: the LobStream retrieves the lob from a source stream
Finally, the ToByteArray
function returns the lob as a byte array.
Inheriting from System.IO.Stream
Any class that derives from System.IO.Stream
has to override its abstract properties and methods, which are:
Properties:
CanSeek
CanRead
CanWrite
Position
Length
Methods:
Flush
Seek
Read
Write
SetLength
Since the LobStream is deriving from System.IO.Stream
, let's zoom in on the methods and properties to override.
LobStream
We reach the point that we must provide an actual implementation of the abstract methods of Stream in order to produce a base class for specific lob streams. In addition, we make LobStream an abstract class as well so that it can not be instantiated directly. The LobStream class serves only as a base class for database-specific implementations in a subclass.
Implementing the abstract properties of System.IO.Stream
CanSeek, CanRead, CanWrite
A stream should support seeking if the position in the dataflow matters and if the read/written chunks of data are meaningless unless they are ordered properly. Since the LobStream is all about the transfer of smaller binary data chunks to and from a database, it is obvious that these chunks must be kept in the proper order. Therefore, CanSeek
should return true. CanSeek
must be set to true
in order to be able to use Length
, SetLength
and Position
. Reading and writing lobs from and to a database is the reason for creating a lob stream. It is obvious that CanRead
and CanWrite
should also return true
.
#region CanSeek, CanRead, CanWrite
public override bool CanSeek
{
get { return true; }
}
public override bool CanRead
{
get { return true; }
}
public override bool CanWrite
{
get { return true; }
}
#endregion
Position
Position
gets or sets the current position in the stream, which indicates the progress of the subsequent reading operations. The implementation of Position
is straightforward:
private long _position = 0;
public override long Position
{
get
{
return _position;
}
set
{
_position = value;
}
}
Length
Length
indicates the length in number of bytes of the stream. In the case of LobStream, Length
must therefore be equal to the length of the binary data that is manipulated. The Length
value is important when reading from the database. In contrast, when writing the lob data to the database, we just append the chunks one after another without having to bother about length. However, measuring the length of the underlying lob is database-specific and must be implemented in the deriving class. If not, an exception is thrown.
public override long Length
{
get
{
throw new Exception("The property is not implemented.");
}
}
Implementing the abstract methods of System.IO.Stream
The Flush method
Some stream implementations perform local buffering of the underlying data to improve performance. For such streams, the Flush
method can be used to clear any internal buffers and ensure that all data has been written to the underlying data source or repository. However, no internal buffers are used in the LobStream. We need to override the Flush
method since it is defined as an abstract method in the base class, but without providing any additional functionality.
public override void Flush()
{
}
The Seek method
The Seek
method is an abstract method in System.IO.Stream
. It must be overridden in derived classes to the set position in the stream relative to the SeekOrigin
value, which can be Begin
, Current
or End
. As it turns out, the implementation of the Seek
method is common to all subclasses of LobStream and very straightforward.
public override long Seek(long offset, SeekOrigin origin)
{
long start = 0;
switch (origin)
{
case SeekOrigin.Begin:
start = 0;
break;
case SeekOrigin.Current:
start = this.Position;
break;
case SeekOrigin.End:
start = this.Length;
break;
}
this.Position = start + offset;
return this.Position;
}
Read and Write Methods
Although the Read
and Write
methods are database specific, we still need to provide for an actual implementation, since these are abstract methods in System.IO.Stream
. The best approach is to throw an exception if these methods are not properly overridden in the subclass.
public override int Read(byte[] buffer, int offset, int count)
{
throw new Exception("The Read method is not implemented.");
}
public override void Write(byte[] buffer, int offset, int count)
{
throw new Exception("The Write method is not implemented.");
}
The SetLength method
With this method, the length of the stream is set to a desired length either by truncating or expanding the stream. However, the LobStream needs to manipulate the binary data exactly as it is, without truncating or expanding. SetLength
will therefore be overridden in LobStream without providing any functionality. An exception is thrown if this method is called.
public override void SetLength(long value)
{
throw new Exception("The SetLength method is not supported.");
}
Providing more base functionality in LobStream
Constructor
The LobStream class provides two constructors, one with and one without parameters. The constructor with the parameters is useful to accept at once all necessary user-supplied information.
public LobStream(IDbConnection connection, string tableName,
string keyColumn, string lobColumn, string lobID)
{
this.Connection = connection;
this.TableName = tableName;
this.KeyColumn = keyColumn;
this.LobColumn = lobColumn;
this.LobID = lobID;
}
It is important to note that lob stream classes expect to receive an open connection from the calling code. On the other hand, when the lob stream class is disposed of, the connection is closed automatically. This is explained in the "Cleaning up" section.
Cleaning up
The LobStream uses a database connection for transport of the lob. After the work is done, this connection should be closed and released. The question is where. The most intuitive solution is to override the Close
method of the Stream. Microsoft documentation recommends otherwise: in derived classes, the Close
method should not be overridden. All of the Stream cleanup logic should reside in the Dispose(bool)
method, which is called in the actual implementation of the Close
method of System.IO.Stream
:
public virtual void Close()
{
this.Dispose(true);
GC.SuppressFinalize(this);
}
The implementation of the Dispose
methods in System.IO.Stream
is as follows:
public void Dispose()
{
this.Close();
}
protected virtual void Dispose(bool disposing)
{
if(disposing && (this._asyncActiveEvent != null))
{
this._CloseAsyncActiveEvent(
Interlocked.Decrement(ref this._asyncActiveCount));
}
}
In LobStream, we override the Dispose(bool)
method as follows:
protected override void Dispose(bool disposing)
{
if(!this.disposed)
{
try
{
if(disposing) this.Connection.Close();
this.disposed = true;
}
finally
{
base.Dispose(disposing);
}
}
}
In summary, calling Close
in turn calls Dispose(true)
, where the connection will be closed. As a reminder, the nice thing about having System.IO.Stream
implementing the IDisposable interface is that we can employ the using statement like:
using (LobStream.LobStream s = LobStreamFactory.CreateLobStream(
LobDataType.AccessBinaryFromImage, connectionString,
"Blob", "blobID", "blobImage", "1"))
{
}
The using statement defines a scope outside that the LobStream will be disposed to automatically.
A validation mechanism
Following the "check your input" rule , I included a validation mechanism by which the LobStream can verify that the indicated lob is indeed present in the database. LobStream has an abstract property, Validate
, of type IDbCommand
. This Validate
command is database-specific and has to be provided by the actual implementation of the subclass. For the validation mechanism to work, the Validate
command must return the number of records in the database matching the user-supplied information, i.e. KeyColumn
, LobID
, LobColumn
and TableName
. The Validate
command is called inside the get section of the IsValid
boolean property. If the Validate
command returns any other number than 1
, the LobStream object is not valid.
private bool _isValid = false;
public virtual bool IsValid
{
get
{
int result = Convert.ToInt32(this.Validate.ExecuteScalar());
this._isValid = (result == 1) ? true : false;
return this._isValid;
}
set
{
_isValid = value;
}
}
public abstract IDbCommand Validate
{
get;
set;
}
Communicating with the outside world
Introduction
At some point, the lob stream interacts with client code, either to get the lob content from the database or to write lob content to the database. This interaction can come in two flavors:
- Use the helper classes BinaryWriter and BinaryReader in conjunction with descendants of LobStream
- Use the
ReadFrom
and WriteTo
methods that are already included in LobStream, but in most cases overridden in the .NET Data Provider specific subclasses. The difference between the two approaches is not trivial.
BinaryReader subscribes to the orthodox stream logic and is heavily dependent on the Read
method, as well as on the Length
property. WriteTo
, on the other hand, only uses Read
and Length
in the LobStream implementation. In the subclasses, WriteTo
does not use the Read
method or the Length
property that Read
depends on. This is particularly important for databases from which it is not possible to retrieve the length of a stored lob. For example, in SQL Server we can use the DataLength
function to fetch the size of a lob. In MS Access, however, no counterpart for DataLength
exists. Getting the length for an Access lob requires some ugly workaround.
A little side-note here: do not be confused that BinaryReader and WriteTo
are compared side-by-side. Both functionalities actually read the lob stream. The WriteTo
method first reads the lob content and then writes it to a destination stream or byte array.
WriteTo/ReadFrom in LobStream
Once instantiated, two important methods are responsible for accepting outside data and writing it into the LobStream or supplying data to an outside stream or byte array. These overloaded methods are ReadFrom
and WriteTo
. The implementations are pretty straightforward and are generally sufficient for the subclasses, as well:
public virtual void ReadFrom(Stream source,
bool closeWhenFinished, long bufferSize)
{
try
{
int bytesRead;
byte[] buffer = new byte[bufferSize];
bytesRead = source.Read(buffer, 0, buffer.Length);
while (bytesRead > 0)
{
this.Write(buffer, 0, bytesRead);
bytesRead = source.Read(buffer, 0, buffer.Length);
}
this.Flush();
}
catch (Exception ex)
{
throw new Exception("Read from stream operation failed.", ex);
}
finally
{
if (closeWhenFinished) source.Close();
}
}
public virtual void ReadFrom(Stream source)
{
this.ReadFrom(source, true, this.BufferSize);
}
public virtual void WriteTo(Stream destination,
bool closeWhenFinished, long bufferSize)
{
try
{
int bytesRead;
byte[] buffer = new byte[bufferSize];
bytesRead = this.Read(buffer, 0, buffer.Length);
while (bytesRead > 0)
{
destination.Write(buffer, 0, bytesRead);
bytesRead = this.Read(buffer, 0, buffer.Length);
}
destination.Flush();
}
catch (Exception ex)
{
throw new Exception("Write to stream operation failed.", ex);
}
finally
{
if (closeWhenFinished)this.Close();
}
}
public virtual void WriteTo(Stream destination)
{
this.WriteTo(destination, true, this.BufferSize);
}
As the code shows, the lob is read one buffer at a time in the while (bytesRead > 0)
loop. The Read
method implementation is mandatory in LobStream, since it is an abstract method in System.IO.Stream
. However, the Read
method must, in turn, be overridden by further descendants. Therefore the Read
method in LobStream throws an error. It is likely that the WriteTo
base implementation, as included in LobStream, will seldom or never be used. The reason is that the .NET Data Providers offer data readers capable of accessing database lobs sequentially. These data readers do not require knowledge of the lob's length beforehand, as does the orthodox Read
method. We will come back to this in the actual .NET Data Provider specific classes.
The LobStreamFactory
LobStreamFactory is a factory class that returns the most appropriate LobStream class based upon the data type and database of the lob. The data type of the lob is provided through the LobDataType
enumeration:
public enum LobDataType : int
{
Unknown = 0,
SqlServerBinaryFromImage = 100,
SqlServerBinaryFromVarBinary = 101,
AccessBinaryFromImage = 200,
SqlServerTextFromImage = 500,
SqlServerTextFromVarBinary = 501,
AccessTextFromImage = 600
}
The LobStreamFactory provides three overloaded static methods called CreateLobStream
, which will return the best suited LobStream subclass. The simplest of the three is shown hereunder.
public static LobStream CreateLobStream(LobDataType lobDataType)
{
LobStream result = null;
switch (lobDataType)
{
case LobDataType.SqlServerBinaryFromImage:
case LobDataType.SqlServerTextFromImage:
result = (LobStream)new SqlLobStreamFromImage();
break;
case LobDataType.SqlServerBinaryFromVarBinary:
case LobDataType.SqlServerTextFromVarBinary:
result = (LobStream)new SqlLobStreamFromVarBinary();
break;
case LobDataType.AccessBinaryFromImage:
case LobDataType.AccessTextFromImage:
result = (LobStream)new AccessLobStream();
break;
}
return result;
}
The fact that CreateLobStream
s are static methods permits immediate use against the LobStreamFactory in the client code, without bothering about instantiation. Furthermore, the LobStreamFactory class prevents direct instantiation by making the constructor private.
We outlined above that the LobStream always expects an open connection. In Windows Forms applications, this would seldom cause performance problems. However, in a web scenario we should use connections judiciously and keep them open as briefly as possible. Therefore the second overload of CreateLobStream
accepts a connection string instead of an open connection. See the source code of this article for more detail. It is the lob stream framework that is now responsible for opening and closing the connection, thus ensuring that it will be used as economically as possible.
Conclusion
In this first part, we have cut through the more technical details of setting up a framework for streams that can work with binary or text data stored in a database. The framework is generic enough to provide a solid foundation for subclasses of LobStream, which will manage the flow of data to and from all kinds of databases. In the next part, we'll go into the further details of implementing a lob stream for a specific database.
History
- 6 July, 2007 -- Original version posted
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.