Click here to Skip to main content
Click here to Skip to main content

The Lob Stream Framework, Part 1

, 5 Jul 2007 CPOL
Rate this:
Please Sign up or sign in to vote.
Providing a few classes and types that aid in the use of lob streams

Introduction

Over a year ago, I read an article by Jarno Peschier in .NET Magazine. The article, written in Dutch, is called "Blob + Stream = BlobStream" and can be found here. It discusses the use of streams in conjunction with blobs in an SQL Server database. Blob is short for "Binary Large OBject" and generally refers to stored pictures, PDFs and other binary files. Jarno's idea is to use System.IO.Stream to read a blob from or write a blob to a database.

The more widespread techniques to get blobs into and out of a database generally use byte arrays that contain the entire blob. For most scenarios and/or with relatively small blobs, this shouldn't be a problem. However, the transfer of a whole blob in one piece might not be a good idea in a web scenario with potentially hundreds of users requesting the same large file from the database at the same time. The big advantage of using streams is that the blob is transported in small chunks, thus keeping the memory footprint on the application server as small as possible.

Having used Jarno's technique in some of my recent projects, I decided to take it a step further and provide an extensible framework to manipulate blobs from other databases than SQL Server. As this is a rather large subject, I've split it up in parts. For the remainder of these articles, I shall no longer be talking about blobs, but about lobs instead. Of course, lob stands for "Large OBject." The reason is that lob streams can be used not only with binary data, but with text data as well.

Part 2 of this article is also available.

The philosophical discussion

This article does not take a stand in the big debate about whether or not it is a good idea to store pictures and other binary files in a database instead of using the file system. A rule of thumb that Dino Esposito gave in his book "Programming Microsoft ASP.Net 2.0 Applications – Advanced Topics" (Chapter 9) is:

  • Use the file system if you are manipulating images larger than 1 Mb
  • Use the file system if you have several hundred thousand images
  • Use of the database is OK if the two previous conditions are not met

In another article Dino added, "If you need to frequently edit the images, I suggest that you store the images as separate files. If your files are essentially read-only and relatively static, and if you measure the size in Kb, then using a database is generally fine." Personally, I agree with Dino. In the projects where I stored pictures in the database, they were smaller than 1 Mb most of the time and served merely as an illustration to a text. Thus, they were static by nature.

The framework

The purpose of the framework is to provide a few classes and types that aid in the use of lob streams. The heart of the framework is an abstract class called LobStream. LobStream serves as the general base class for all subsequent implementations of lob streams that are database-specific. LobStream derives from System.IO.Stream, a class that exposes abstract methods that must be overridden by child classes. In addition, LobStream implements ILobStream which is an interface that groups common methods and properties when working with database lobs. This interface ensures that client code will not be broken as long as new classes managing lob streams for yet-unsupported databases implement it.

The abstract LobStream class has two abstract subclasses: SqlLobStream and OleDbLobStream. As the names reveal, these classes refer to the two main .NET Data Providers for accessing databases. In the future, the framework can be extended with other .NET Data Provider specific classes like, for example, OdbcLobStream or OracleLobStream. See the interesting article on working with Oracle lobs for more information. The .NET Data Provider specific abstract subclasses serve as the basis for the actual implementation of a database-specific lob stream, as we will see further. Furthermore, the framework exposes a LobDataType enumeration that indicates the type of database and data type of the lob inside the database.

Finally, the framework implements a LobStreamFactory class named CreateLobStream that exposes three overloaded static methods. The CreateLobStream methods accept some parameters and return an appropriate lob stream object. The framework was created in Visual Studio 2005 as a class library project. It outputs to LobStream.dll. As long as the client code knows how to deal with an instance of LobStream that implements the ILobStream interface, the underlying code in the DLL can be changed and extended over time. Several design goals are achieved in this way:

  • Reusability: just reference the framework classes in your project and you are set to go.
  • Extensibility: you can easily add new lob stream classes that can talk to new/other databases or simply change the existing classes. As long as they derive from LobStream, no existing client code will be broken; just reference the updated framework in your projects.

The full picture of the framework is displayed here:

Screenshot - img1.jpg

ILobStream

The implementation of ILobStream is pretty straightforward:

public interface ILobStream
{
    IDbConnection Connection { get; set;}
    IDbCommand Validate { get; set;}
    bool IsValid { get; set;}
    string TableName { get; set;}
    string KeyColumn { get; set;}
    string LobColumn { get; set;}
    string LobID { get; set;}
    void WriteTo(Stream destination, bool closeWhenFinished, long bufferSize);
    void WriteTo(Stream destination);
    void WriteTo([In, Out] byte[] bytes);
    void ReadFrom(Stream source, bool closeWhenFinished, long bufferSize);
    void ReadFrom(Stream source);
    byte[] ToByteArray();
}

The interface makes sure that, in order to be able to communicate with a database, at least the following is available in the implementing class:

  • Connection: the connection to the database. The Connection property is of type IDbCommand, to be compatible with connections exposed by any .NET Data Providers
  • TableName: the name of the table in which the lob is or has to be stored
  • KeyColumn: the name of the column in the aforementioned table that holds the unique value by which the destination record can be identified
  • LobColumn: the name of the column that holds the lob
  • LobID: the unique value in the KeyColumn that indicates the database record to hold the lob

All of the above ultimately has to be supplied by the client code to the implementing class. In addition, ILobStream contains a few useful methods to enable the LobStream to communicate with other streams:

  • WriteTo: the LobStream pours the lob into a destination stream or into a byte array
  • ReadFrom: the LobStream retrieves the lob from a source stream

Finally, the ToByteArray function returns the lob as a byte array.

Inheriting from System.IO.Stream

Any class that derives from System.IO.Stream has to override its abstract properties and methods, which are:

Properties:

  • CanSeek
  • CanRead
  • CanWrite
  • Position
  • Length

Methods:

  • Flush
  • Seek
  • Read
  • Write
  • SetLength

Since the LobStream is deriving from System.IO.Stream, let's zoom in on the methods and properties to override.

LobStream

We reach the point that we must provide an actual implementation of the abstract methods of Stream in order to produce a base class for specific lob streams. In addition, we make LobStream an abstract class as well so that it can not be instantiated directly. The LobStream class serves only as a base class for database-specific implementations in a subclass.

Implementing the abstract properties of System.IO.Stream

CanSeek, CanRead, CanWrite

A stream should support seeking if the position in the dataflow matters and if the read/written chunks of data are meaningless unless they are ordered properly. Since the LobStream is all about the transfer of smaller binary data chunks to and from a database, it is obvious that these chunks must be kept in the proper order. Therefore, CanSeek should return true. CanSeek must be set to true in order to be able to use Length, SetLength and Position. Reading and writing lobs from and to a database is the reason for creating a lob stream. It is obvious that CanRead and CanWrite should also return true.

#region CanSeek, CanRead, CanWrite
public override bool CanSeek
{
    get { return true; }
}

public override bool CanRead
{
    get { return true; }
}

public override bool CanWrite
{
    get { return true; }
}
#endregion

Position

Position gets or sets the current position in the stream, which indicates the progress of the subsequent reading operations. The implementation of Position is straightforward:

private long _position = 0;
public override long Position
{
    get
    {
        return _position;
    }
    set
    {
        _position = value;
    }
}

Length

Length indicates the length in number of bytes of the stream. In the case of LobStream, Length must therefore be equal to the length of the binary data that is manipulated. The Length value is important when reading from the database. In contrast, when writing the lob data to the database, we just append the chunks one after another without having to bother about length. However, measuring the length of the underlying lob is database-specific and must be implemented in the deriving class. If not, an exception is thrown.

public override long Length
{
    get 
    { 
        throw new Exception("The property is not implemented."); 
    }
}

Implementing the abstract methods of System.IO.Stream

The Flush method

Some stream implementations perform local buffering of the underlying data to improve performance. For such streams, the Flush method can be used to clear any internal buffers and ensure that all data has been written to the underlying data source or repository. However, no internal buffers are used in the LobStream. We need to override the Flush method since it is defined as an abstract method in the base class, but without providing any additional functionality.

public override void Flush()
{
    //No internal buffers are used. So, nothing to flush...
}

The Seek method

The Seek method is an abstract method in System.IO.Stream. It must be overridden in derived classes to the set position in the stream relative to the SeekOrigin value, which can be Begin, Current or End. As it turns out, the implementation of the Seek method is common to all subclasses of LobStream and very straightforward.

public override long Seek(long offset, SeekOrigin origin)
{
    long start = 0;
    switch (origin)
    {
        case SeekOrigin.Begin:
            start = 0;
            break;
        case SeekOrigin.Current:
            start = this.Position;
            break;
        case SeekOrigin.End:
            start = this.Length;
            break;
    }

    this.Position = start + offset;
    return this.Position;
}

Read and Write Methods

Although the Read and Write methods are database specific, we still need to provide for an actual implementation, since these are abstract methods in System.IO.Stream. The best approach is to throw an exception if these methods are not properly overridden in the subclass.

public override int Read(byte[] buffer, int offset, int count)
{
    throw new Exception("The Read method is not implemented.");
}

public override void Write(byte[] buffer, int offset, int count)
{
    throw new Exception("The Write method is not implemented.");
}

The SetLength method

With this method, the length of the stream is set to a desired length either by truncating or expanding the stream. However, the LobStream needs to manipulate the binary data exactly as it is, without truncating or expanding. SetLength will therefore be overridden in LobStream without providing any functionality. An exception is thrown if this method is called.

public override void SetLength(long value)
{
    throw new Exception("The SetLength method is not supported.");
}

Providing more base functionality in LobStream

Constructor

The LobStream class provides two constructors, one with and one without parameters. The constructor with the parameters is useful to accept at once all necessary user-supplied information.

public LobStream(IDbConnection connection, string tableName, 
    string keyColumn, string lobColumn, string lobID)
{
    this.Connection = connection;
    this.TableName = tableName;
    this.KeyColumn = keyColumn;
    this.LobColumn = lobColumn;
    this.LobID = lobID;
}

It is important to note that lob stream classes expect to receive an open connection from the calling code. On the other hand, when the lob stream class is disposed of, the connection is closed automatically. This is explained in the "Cleaning up" section.

Cleaning up

The LobStream uses a database connection for transport of the lob. After the work is done, this connection should be closed and released. The question is where. The most intuitive solution is to override the Close method of the Stream. Microsoft documentation recommends otherwise: in derived classes, the Close method should not be overridden. All of the Stream cleanup logic should reside in the Dispose(bool) method, which is called in the actual implementation of the Close method of System.IO.Stream:

public virtual void Close()
{
    this.Dispose(true);
    GC.SuppressFinalize(this);
}

The implementation of the Dispose methods in System.IO.Stream is as follows:

// Implements IDisposable. 
// Do not make this method virtual. 
// A derived class should not be able to override this method.
// This method disposes the stream, by writing any changes to 
// the backing store and closing the stream to release resources.

public void Dispose()
{
    this.Close();
}

// Dispose(bool disposing) executes in two distinct scenarios. 
// If disposing equals true, the method has been called directly 
// or indirectly by a user's code. Managed and unmanaged resources 
// can be disposed. 
// If disposing equals false, the method has been called by the 
// runtime from inside the finalizer and you should not reference 
// other objects. Only unmanaged resources can be disposed.

protected virtual void Dispose(bool disposing)
{
    if(disposing && (this._asyncActiveEvent != null))
    {
        this._CloseAsyncActiveEvent(
            Interlocked.Decrement(ref this._asyncActiveCount));
    }
}

In LobStream, we override the Dispose(bool) method as follows:

protected override void Dispose(bool disposing)
{
    if(!this.disposed)
    {
        try
        {
            if(disposing) this.Connection.Close();
            this.disposed = true;
        }
        finally
        {
            // Call Dispose on the base class.
            base.Dispose(disposing);
        }
    }
}

In summary, calling Close in turn calls Dispose(true), where the connection will be closed. As a reminder, the nice thing about having System.IO.Stream implementing the IDisposable interface is that we can employ the using statement like:

using (LobStream.LobStream s = LobStreamFactory.CreateLobStream(
    LobDataType.AccessBinaryFromImage, connectionString,
    "Blob", "blobID", "blobImage", "1"))
{ 
    //Do some stuff with it
}

The using statement defines a scope outside that the LobStream will be disposed to automatically.

A validation mechanism

Following the "check your input" rule , I included a validation mechanism by which the LobStream can verify that the indicated lob is indeed present in the database. LobStream has an abstract property, Validate, of type IDbCommand. This Validate command is database-specific and has to be provided by the actual implementation of the subclass. For the validation mechanism to work, the Validate command must return the number of records in the database matching the user-supplied information, i.e. KeyColumn, LobID, LobColumn and TableName. The Validate command is called inside the get section of the IsValid boolean property. If the Validate command returns any other number than 1, the LobStream object is not valid.

private bool _isValid = false;

public virtual bool IsValid
{
    get
    {
        int result = Convert.ToInt32(this.Validate.ExecuteScalar());
        this._isValid = (result == 1) ? true : false;
        return this._isValid;
    }

    set
    {
        _isValid = value;
    }
}

public abstract IDbCommand Validate 
{
    get; 
    set;
}

Communicating with the outside world

Introduction

At some point, the lob stream interacts with client code, either to get the lob content from the database or to write lob content to the database. This interaction can come in two flavors:

  • Use the helper classes BinaryWriter and BinaryReader in conjunction with descendants of LobStream
  • Use the ReadFrom and WriteTo methods that are already included in LobStream, but in most cases overridden in the .NET Data Provider specific subclasses. The difference between the two approaches is not trivial.

BinaryReader subscribes to the orthodox stream logic and is heavily dependent on the Read method, as well as on the Length property. WriteTo, on the other hand, only uses Read and Length in the LobStream implementation. In the subclasses, WriteTo does not use the Read method or the Length property that Read depends on. This is particularly important for databases from which it is not possible to retrieve the length of a stored lob. For example, in SQL Server we can use the DataLength function to fetch the size of a lob. In MS Access, however, no counterpart for DataLength exists. Getting the length for an Access lob requires some ugly workaround.

A little side-note here: do not be confused that BinaryReader and WriteTo are compared side-by-side. Both functionalities actually read the lob stream. The WriteTo method first reads the lob content and then writes it to a destination stream or byte array.

WriteTo/ReadFrom in LobStream

Once instantiated, two important methods are responsible for accepting outside data and writing it into the LobStream or supplying data to an outside stream or byte array. These overloaded methods are ReadFrom and WriteTo. The implementations are pretty straightforward and are generally sufficient for the subclasses, as well:

public virtual void ReadFrom(Stream source, 
    bool closeWhenFinished, long bufferSize)
{
    try
    {
        int bytesRead;
        byte[] buffer = new byte[bufferSize];
        bytesRead = source.Read(buffer, 0, buffer.Length);
        while (bytesRead > 0)
        {
            this.Write(buffer, 0, bytesRead);
            bytesRead = source.Read(buffer, 0, buffer.Length);
        }
        this.Flush();
    }
    catch (Exception ex)
    {
    throw new Exception("Read from stream operation failed.", ex);
    }
    finally
    {
    if (closeWhenFinished) source.Close();
    }
}

public virtual void ReadFrom(Stream source)
{
    this.ReadFrom(source, true, this.BufferSize);
}

public virtual void WriteTo(Stream destination, 
    bool closeWhenFinished, long bufferSize)
{
    try
    {
        int bytesRead;
        byte[] buffer = new byte[bufferSize];
        bytesRead = this.Read(buffer, 0, buffer.Length);
        while (bytesRead > 0)
        {
            destination.Write(buffer, 0, bytesRead);
            bytesRead = this.Read(buffer, 0, buffer.Length);
        }
        destination.Flush();
    }
    catch (Exception ex)
    {
        throw new Exception("Write to stream operation failed.", ex);
    }
    finally
    {
        if (closeWhenFinished)this.Close();
    }
}

public virtual void WriteTo(Stream destination)
{
    this.WriteTo(destination, true, this.BufferSize);
}

As the code shows, the lob is read one buffer at a time in the while (bytesRead > 0) loop. The Read method implementation is mandatory in LobStream, since it is an abstract method in System.IO.Stream. However, the Read method must, in turn, be overridden by further descendants. Therefore the Read method in LobStream throws an error. It is likely that the WriteTo base implementation, as included in LobStream, will seldom or never be used. The reason is that the .NET Data Providers offer data readers capable of accessing database lobs sequentially. These data readers do not require knowledge of the lob's length beforehand, as does the orthodox Read method. We will come back to this in the actual .NET Data Provider specific classes.

The LobStreamFactory

LobStreamFactory is a factory class that returns the most appropriate LobStream class based upon the data type and database of the lob. The data type of the lob is provided through the LobDataType enumeration:

public enum LobDataType : int
{
    Unknown = 0,
        SqlServerBinaryFromImage = 100,
        SqlServerBinaryFromVarBinary = 101,
        AccessBinaryFromImage = 200,
        SqlServerTextFromImage = 500,
        SqlServerTextFromVarBinary = 501,
        AccessTextFromImage = 600
}

The LobStreamFactory provides three overloaded static methods called CreateLobStream, which will return the best suited LobStream subclass. The simplest of the three is shown hereunder.

public static LobStream CreateLobStream(LobDataType lobDataType)
{
    LobStream result = null;

    switch (lobDataType)
    {
        case LobDataType.SqlServerBinaryFromImage:
            case LobDataType.SqlServerTextFromImage:
                result = (LobStream)new SqlLobStreamFromImage();
                break;
            case LobDataType.SqlServerBinaryFromVarBinary:
                case LobDataType.SqlServerTextFromVarBinary:
                    result = (LobStream)new SqlLobStreamFromVarBinary();
                    break;
                case LobDataType.AccessBinaryFromImage:
                    case LobDataType.AccessTextFromImage:
                        result = (LobStream)new AccessLobStream();
                        break;
    }

    return result;
}

The fact that CreateLobStreams are static methods permits immediate use against the LobStreamFactory in the client code, without bothering about instantiation. Furthermore, the LobStreamFactory class prevents direct instantiation by making the constructor private.

We outlined above that the LobStream always expects an open connection. In Windows Forms applications, this would seldom cause performance problems. However, in a web scenario we should use connections judiciously and keep them open as briefly as possible. Therefore the second overload of CreateLobStream accepts a connection string instead of an open connection. See the source code of this article for more detail. It is the lob stream framework that is now responsible for opening and closing the connection, thus ensuring that it will be used as economically as possible.

Conclusion

In this first part, we have cut through the more technical details of setting up a framework for streams that can work with binary or text data stored in a database. The framework is generic enough to provide a solid foundation for subclasses of LobStream, which will manage the flow of data to and from all kinds of databases. In the next part, we'll go into the further details of implementing a lob stream for a specific database.

History

  • 6 July, 2007 -- Original version posted

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Peter De Jonghe

United States United States
No Biography provided

Comments and Discussions

 
Generallinks Pinmembergezanal27-Nov-08 2:00 
AnswerRe: links Pinmemberpdejonghe27-Nov-08 6:31 
Generaloverride with Exception PinmemberSteve Hansen6-Jul-07 5:05 
GeneralRe: override with Exception Pinmemberpdejonghe9-Jul-07 23:35 
GeneralRe: override with Exception PinmemberSteve Hansen9-Jul-07 23:37 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web03 | 2.8.141022.2 | Last Updated 5 Jul 2007
Article Copyright 2007 by Peter De Jonghe
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid