Click here to Skip to main content
Click here to Skip to main content
Go to top

Using CLR integration to compress BLOBs/CLOBs in SQL Server 2005

, 29 Dec 2006
Rate this:
Please Sign up or sign in to vote.
This article shows how to use CLR integration to compress data in SQL Server 2005

Introduction

Manipulating Binary Large Objects (BLOBs) and Character Large Objects (CLOBs) has always been difficult in SQL Server. The new SQL Server 2005 provides new data types (NVARCHAR(MAX), VARCHAR(MAX) and VARBINARY(MAX)) for large object storage (up to 2 GB) allowing better manipulation as well as the ability to process these data types using CLR procedures and functions. This article shows how to create CLR functions to seamlessly compress and decompress large data objects with minimum performance impact using compression algorithms included in.NET Framework 2.0. Code samples included in this article can be used in any database implementation but do not cover all possible scenarios. For large implementations or mission critical applications consider using third party products like SQLCompres.NET (it is free).

Using the code

The new data types

SQL Server 2005 provides three new data types to store and manipulate large objects. VARCHAR(MAX) can store CLOBs, NVARCHAR(MAX) does the same but allows Unicode characters and VARBINARY(MAX) can store BLOBs. Books Online states that “max indicates that the maximum storage size is 2^31-1 bytes”. These new data types can be used with a wide range of T-SQL commands and behave much like traditional VARBINARY(n), VARCHAR(n) and NVARCHAR(N). The new data types should replace the TEXT and IMAGE types from previous versions. As per SQL Server Books Online TEXT and IMAGE columns should not be used in new development and legacy applications should be changed to use the new types. These new types, as opposed to TEXT and IMAGE, can be used as variables and function parameters and can be returned by CLR (or T-SQL) scalar-value functions. These new characteristics make them great candidates for compression. Previous attempts to add CLOB and BLOB compression to SQL Server involved using extended procedures, a difficult and risky business. Using the CLR integration capabilities introduced with SQL Server 2005 makes such implementation more secure and stable (and sometimes faster than their extended procedure counterparts). CLR’s procedures and functions parameters can receive and process these data types as SQLChars and SQLBytes. SQLChars can be used to pass VARCHAR(MAX) and NVARCHAR(MAX) parameter while SQLBytes is used to pass VARBINARY(MAX) types. CLR uses only Unicode characters and passing VARCHAR(MAX) as a parameter implies converting it to Unicode before parameters are passed.

Compressing a BLOB

SQL Server 2005 allows CLR code to be registered as functions and stored procedure. Developers can now extend SQL Server’s functionality using a broad array of programming languages from managed C++ to Visual Basic. How the CLR is hosted inside SQL Server goes beyond the scope of this article. For those who worry about enabling CLR integration, suffice to say that Microsoft has made a conscious effort to keep this integration as safe and secure as possible. Let’s use VARBINARY(MAX) for simplicity sake, since data can be converted between BLOB and CLOB types this article’s code can be extended using T-SQL functions. Listing 1 contains a function to compress BLOBs in SQL server, the function receives a VARBINARY(MAX) or IMAGE as a SQLBytes and compresses it using the DeflateStream class provided in .NET Framework 2.0. SQLBytes represents a mutable type that wraps either an array or a stream. We are going to assume it wraps an array to avoid complicated code, and get the data from the Buffer property. Using this property might fail with larger BLOBs causing the CLR to throw an out of memory exception (but don’t worry, unlike extended procedures errors, CLR exceptions should not crash your SQL Server).

Listing 1: Compression function

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

using System.IO;
using System.IO.Compression;

public partial class UserDefinedFunctions
{
    // Setting function characteristics
    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic=true,
                                            DataAccess=DataAccessKind.None)]
    public static SqlBytes fn_compress(SqlBytes blob)
    {
        if (blob.IsNull)
            return blob;

        // Retrieving BLOB data
        byte[] blobData = blob.Buffer;

        // Preparing for compression
        MemoryStream compressedData = new MemoryStream();
        DeflateStream compressor = new DeflateStream(compressedData,
                                           CompressionMode.Compress, true);

        // Writting uncompressed data using a DeflateStream compressor
        compressor.Write(blobData, 0, blobData.Length);

        // Clossing compressor to allow ALL compressed bytes to be written
        compressor.Flush();
        compressor.Close();
        compressor = null;

        // Returning compressed blob
        return new SqlBytes(compressedData);
    }
};
Compressing a BLOB in SQL Server 2005 is as easy as passing a SQLBytes parameter, reading its content and writing it to a compression stream. The compression stream writes to a MemoryStream that is later used to create a new SQLBytes object that can be returned to SQL Server applications or directly to the client. There is only one caveat: Microsoft’s implementation of DeflateStream requires the stream to be closed before it writes the last compressed bytes, flushing is not enough. Listing 2 loads an assembly in SQL Server (a process called cataloging where assemblies are verified for security and reliability) and creates the compression function. Listing 3 shows how to use the function in a T-SQL update. This usage makes compressing columns a seamless process that would require only server side adjustments.

Listing 2: Loading assembly and creating CLR function

CREATE ASSEMBLY [BlobCompression]
                  FROM 'D:\Development\BlobCompression.Dll'
                  WITH PERMISSION_SET = SAFE

CREATE FUNCTION [fn_decompress]           (
                 @compressedBlob varbinary(MAX))
            RETURNS varbinary(MAX)
            AS    EXTERNAL NAME [BlobCompression].[UserDefinedFunctions]
                                .[fn_decompress];
Listing 3: Compressing data
create table #temp (
      blob_col    varbinary(max));

insert into #temp
values(convert(varbinary(max), 'To run your project, please edit the Test.sql file in
                               ' your project. This file is located in the Test 
                               ' Scripts folder in the Solution Explorer.'));
drop table #temp;

Decompressing a BLOB

Listing 4 contains a function to decompress a BLOB. This function follows the same principles used in compression but now reads from a stream returning a decompressed block that can be used to create and return a decompressed SQLBytes object.

Listing 4: Decompression function

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

using System.IO;
using System.IO.Compression;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, 
                                            DataAccess = DataAccessKind.None)]
    public static SqlBytes fn_decompress(SqlBytes compressedBlob)
    {
        if (compressedBlob.IsNull)
            return compressedBlob;

        // Preparing to read data from compressed stream
        DeflateStream decompressor = new DeflateStream(compressedBlob.Stream,
                                           CompressionMode.Decompress, true);

        // Initializing variables
        int bytesRead = 1;
        int chunkSize = 10000;
        byte[] chunk = new byte[chunkSize];

        // Preparing destination stream to hold decompressed data
        MemoryStream decompressedData = new MemoryStream();

        try
        {
            // Reading from compressed stream 
            while ((bytesRead = decompressor.Read(chunk, 0, chunkSize)) > 0)
            {
                // Writting decompressed data
                decompressedData.Write(chunk, 0, bytesRead); 
            }
        }
        catch (Exception)
        {
            throw; // Nothing to do...
        }
        finally
        {
            // Cleaning up
            decompressor.Close();
            decompressor = null;
        }

        // Returning a decompressed BLOB
        return new SqlBytes(decompressedData);
    }
};
Listing 5 loads an assembly and creates a compression and decompression CLR function. Compression and decompression can be tested using listing 6, it creates a table and add some values to it, a compression update is run followed by a select statement that returns uncompressed data.

Listing 5: Loading assembly and creating CLR functions

CREATE ASSEMBLY [BlobCompression]
                  FROM 'D:\Development\BlobCompression.Dll'
                  WITH PERMISSION_SET = SAFE

CREATE FUNCTION [fn_decompress]           (
                  @compressedBlob varbinary(MAX))
            RETURNS varbinary(MAX)
            AS    EXTERNAL NAME [BlobCompression].[UserDefinedFunctions]
                                .[fn_decompress];

CREATE FUNCTION [fn_compress](
                  @blob varbinary(MAX))
            RETURNS varbinary(MAX)
            AS    EXTERNAL NAME [BlobCompression].[UserDefinedFunctions]
                                .[fn_compress];
Listing 6: Testing functionality
create table #temp (
      blob_col    varbinary(max));

insert into #temp
values(convert(varbinary(max), 'To run your project, please edit the Test.sql file in
                               'your project. This file is located in the Test 
                               'Scripts folder in the Solution Explorer.'));

update #temp
set blob_col = master.dbo.fn_compress(blob_col);

select convert(varchar(1000), master.dbo.fn_decompress(blob_col))
from #temp;

drop table #temp;

Limitations

The code included in this article allows column level compression in SQL Server 2005 but it lacks functions for consistency check and will not work very well with large objects (5 MB or more depending on configuration). It is intended to show how to use CLR integration in SQL Server to extend the engine’s functionality and provides an overview on what can be done with the new BLOB/CLOB data types.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

Share

About the Author

yolousa72
Database Developer
United States United States
Yoel Martinez has a Master Degree in Information Technology. He has been working for over seven years with large distributed environments and specializes in replication and SQL Server development. Yoel can be reached at yoelm@bellsouth.net.

Comments and Discussions

 
QuestionIs it Possible to do the same with Oracle Database also? Pinmemberrajinca12-Jun-09 2:03 
GeneralMinor issue with compression Pinmemberjustsomeguy766-May-09 6:21 
Generaljust what i need Pinmembersheijin18-Jun-07 22:40 
QuestionQuestion PinmemberBitterstamps3-May-07 10:10 
AnswerRe: Question Pinmemberyolousa723-May-07 23:17 
GeneralGreat Article PinmemberMember #345147721-Jan-07 21:17 
GeneralRe: Great Article Pinmemberyolousa7222-Jan-07 4:09 
GeneralGreat Article PinmemberMember #345147721-Jan-07 21:17 
GeneralVery Nice PineditorPaul Conrad30-Dec-06 7:41 
GeneralRe: Very Nice Pinmemberyolousa725-Jan-07 7:43 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web04 | 2.8.140916.1 | Last Updated 29 Dec 2006
Article Copyright 2006 by yolousa72
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid