Click here to Skip to main content
6,305,776 members and growing! (15,233 online)
Email Password   helpLost your password?
Database » Database » ADO.NET     Intermediate

Using CLR integration to compress BLOBs/CLOBs in SQL Server 2005

By yolousa72

This article shows how to use CLR integration to compress data in SQL Server 2005
C#, Windows, .NET, Visual Studio, DBA, Dev
Posted:29 Dec 2006
Views:49,133
Bookmarked:39 times
Announcements
Loading...
 
Search    
Advanced Search
printPrint   Broken Article?Report       add Share
  Discuss Discuss   Recommend Article Email
14 votes for this article.
Popularity: 5.30 Rating: 4.63 out of 5

1

2

3
4 votes, 28.6%
4
10 votes, 71.4%
5

Introduction

Manipulating Binary Large Objects (BLOBs) and Character Large Objects (CLOBs) has always been difficult in SQL Server. The new SQL Server 2005 provides new data types (NVARCHAR(MAX), VARCHAR(MAX) and VARBINARY(MAX)) for large object storage (up to 2 GB) allowing better manipulation as well as the ability to process these data types using CLR procedures and functions. This article shows how to create CLR functions to seamlessly compress and decompress large data objects with minimum performance impact using compression algorithms included in.NET Framework 2.0. Code samples included in this article can be used in any database implementation but do not cover all possible scenarios. For large implementations or mission critical applications consider using third party products like SQLCompres.NET (it is free).

Using the code

The new data types

SQL Server 2005 provides three new data types to store and manipulate large objects. VARCHAR(MAX) can store CLOBs, NVARCHAR(MAX) does the same but allows Unicode characters and VARBINARY(MAX) can store BLOBs. Books Online states that �max indicates that the maximum storage size is 2^31-1 bytes�. These new data types can be used with a wide range of T-SQL commands and behave much like traditional VARBINARY(n), VARCHAR(n) and NVARCHAR(N). The new data types should replace the TEXT and IMAGE types from previous versions. As per SQL Server Books Online TEXT and IMAGE columns should not be used in new development and legacy applications should be changed to use the new types. These new types, as opposed to TEXT and IMAGE, can be used as variables and function parameters and can be returned by CLR (or T-SQL) scalar-value functions. These new characteristics make them great candidates for compression. Previous attempts to add CLOB and BLOB compression to SQL Server involved using extended procedures, a difficult and risky business. Using the CLR integration capabilities introduced with SQL Server 2005 makes such implementation more secure and stable (and sometimes faster than their extended procedure counterparts). CLR�s procedures and functions parameters can receive and process these data types as SQLChars and SQLBytes. SQLChars can be used to pass VARCHAR(MAX) and NVARCHAR(MAX) parameter while SQLBytes is used to pass VARBINARY(MAX) types. CLR uses only Unicode characters and passing VARCHAR(MAX) as a parameter implies converting it to Unicode before parameters are passed.

Compressing a BLOB

SQL Server 2005 allows CLR code to be registered as functions and stored procedure. Developers can now extend SQL Server�s functionality using a broad array of programming languages from managed C++ to Visual Basic. How the CLR is hosted inside SQL Server goes beyond the scope of this article. For those who worry about enabling CLR integration, suffice to say that Microsoft has made a conscious effort to keep this integration as safe and secure as possible. Let�s use VARBINARY(MAX) for simplicity sake, since data can be converted between BLOB and CLOB types this article�s code can be extended using T-SQL functions. Listing 1 contains a function to compress BLOBs in SQL server, the function receives a VARBINARY(MAX) or IMAGE as a SQLBytes and compresses it using the DeflateStream class provided in .NET Framework 2.0. SQLBytes represents a mutable type that wraps either an array or a stream. We are going to assume it wraps an array to avoid complicated code, and get the data from the Buffer property. Using this property might fail with larger BLOBs causing the CLR to throw an out of memory exception (but don�t worry, unlike extended procedures errors, CLR exceptions should not crash your SQL Server).

Listing 1: Compression function

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

using System.IO;
using System.IO.Compression;

public partial class UserDefinedFunctions
{
    // Setting function characteristics

    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic=true,
                                            DataAccess=DataAccessKind.None)]
    public static SqlBytes fn_compress(SqlBytes blob)
    {
        if (blob.IsNull)
            return blob;

        // Retrieving BLOB data

        byte[] blobData = blob.Buffer;

        // Preparing for compression

        MemoryStream compressedData = new MemoryStream();
        DeflateStream compressor = new DeflateStream(compressedData,
                                           CompressionMode.Compress, true);

        // Writting uncompressed data using a DeflateStream compressor

        compressor.Write(blobData, 0, blobData.Length);

        // Clossing compressor to allow ALL compressed bytes to be written

        compressor.Flush();
        compressor.Close();
        compressor = null;

        // Returning compressed blob

        return new SqlBytes(compressedData);
    }
};
Compressing a BLOB in SQL Server 2005 is as easy as passing a SQLBytes parameter, reading its content and writing it to a compression stream. The compression stream writes to a MemoryStream that is later used to create a new SQLBytes object that can be returned to SQL Server applications or directly to the client. There is only one caveat: Microsoft�s implementation of DeflateStream requires the stream to be closed before it writes the last compressed bytes, flushing is not enough. Listing 2 loads an assembly in SQL Server (a process called cataloging where assemblies are verified for security and reliability) and creates the compression function. Listing 3 shows how to use the function in a T-SQL update. This usage makes compressing columns a seamless process that would require only server side adjustments.

Listing 2: Loading assembly and creating CLR function

CREATE ASSEMBLY 
                  FROM 'D:\Development\BlobCompression.Dll'
                  WITH PERMISSION_SET = SAFE

CREATE FUNCTION [fn_decompress]           (
                 @compressedBlob varbinary(MAX))
            RETURNS varbinary(MAX)
            AS    EXTERNAL NAME .[UserDefinedFunctions]
                                .[fn_decompress];
Listing 3: Compressing data
create table #temp (
      blob_col    varbinary(max));

insert into #temp
values(convert(varbinary(max), 'To run your project, please edit the Test.sql file in
                               ' your project. This file is located in the Test 
                               ' Scripts folder in the Solution Explorer.'));
drop table #temp;

Decompressing a BLOB

Listing 4 contains a function to decompress a BLOB. This function follows the same principles used in compression but now reads from a stream returning a decompressed block that can be used to create and return a decompressed SQLBytes object.

Listing 4: Decompression function

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

using System.IO;
using System.IO.Compression;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, 
                                            DataAccess = DataAccessKind.None)]
    public static SqlBytes fn_decompress(SqlBytes compressedBlob)
    {
        if (compressedBlob.IsNull)
            return compressedBlob;

        // Preparing to read data from compressed stream

        DeflateStream decompressor = new DeflateStream(compressedBlob.Stream,
                                           CompressionMode.Decompress, true);

        // Initializing variables

        int bytesRead = 1;
        int chunkSize = 10000;
        byte[] chunk = new byte[chunkSize];

        // Preparing destination stream to hold decompressed data

        MemoryStream decompressedData = new MemoryStream();

        try
        {
            // Reading from compressed stream 

            while ((bytesRead = decompressor.Read(chunk, 0, chunkSize)) > 0)
            {
                // Writting decompressed data

                decompressedData.Write(chunk, 0, bytesRead); 
            }
        }
        catch (Exception)
        {
            throw; // Nothing to do...

        }
        finally
        {
            // Cleaning up

            decompressor.Close();
            decompressor = null;
        }

        // Returning a decompressed BLOB

        return new SqlBytes(decompressedData);
    }
};
Listing 5 loads an assembly and creates a compression and decompression CLR function. Compression and decompression can be tested using listing 6, it creates a table and add some values to it, a compression update is run followed by a select statement that returns uncompressed data.

Listing 5: Loading assembly and creating CLR functions

CREATE ASSEMBLY 
                  FROM 'D:\Development\BlobCompression.Dll'
                  WITH PERMISSION_SET = SAFE

CREATE FUNCTION [fn_decompress]           (
                  @compressedBlob varbinary(MAX))
            RETURNS varbinary(MAX)
            AS    EXTERNAL NAME .[UserDefinedFunctions]
                                .[fn_decompress];

CREATE FUNCTION [fn_compress](
                  @blob varbinary(MAX))
            RETURNS varbinary(MAX)
            AS    EXTERNAL NAME .[UserDefinedFunctions]
                                .[fn_compress];
Listing 6: Testing functionality
create table #temp (
      blob_col    varbinary(max));

insert into #temp
values(convert(varbinary(max), 'To run your project, please edit the Test.sql file in
                               'your project. This file is located in the Test 
                               'Scripts folder in the Solution Explorer.'));

update #temp
set blob_col = master.dbo.fn_compress(blob_col);

select convert(varchar(1000), master.dbo.fn_decompress(blob_col))
from #temp;

drop table #temp;

Limitations

The code included in this article allows column level compression in SQL Server 2005 but it lacks functions for consistency check and will not work very well with large objects (5 MB or more depending on configuration). It is intended to show how to use CLR integration in SQL Server to extend the engine�s functionality and provides an overview on what can be done with the new BLOB/CLOB data types.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

yolousa72


Member
Yoel Martinez has a Master Degree in Information Technology. He has been working for over seven years with large distributed environments and specializes in replication and SQL Server development. Yoel can be reached at yoelm@bellsouth.net.
Occupation: Database Developer
Location: United States United States

Other popular Database articles:

Article Top
You must Sign In to use this message board.
FAQ FAQ 
 
Noise Tolerance  Layout  Per page   
 Msgs 1 to 11 of 11 (Total in Forum: 11) (Refresh)FirstPrevNext
GeneralIs it Possible to do the same with Oracle Database also? Pinmemberrajinca3:03 12 Jun '09  
GeneralMinor issue with compression Pinmemberjustsomeguy767:21 6 May '09  
Generaljust what i need Pinmembersheijin23:40 18 Jun '07  
QuestionQuestion PinmemberBitterstamps11:10 3 May '07  
AnswerRe: Question Pinmemberyolousa720:17 4 May '07  
GeneralGreat Article Pinmember22:17 21 Jan '07  
GeneralRe: Great Article Pinmemberyolousa725:09 22 Jan '07  
GeneralGreat Article Pinmember22:17 21 Jan '07  
GeneralVery Nice PineditorPaul Conrad8:41 30 Dec '06  
GeneralRe: Very Nice Pinmemberyolousa728:43 5 Jan '07  
GeneralRe: Very Nice PineditorPaul Conrad9:10 5 Jan '07  

General General    News News    Question Question    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

PermaLink | Privacy | Terms of Use
Last Updated: 29 Dec 2006
Editor: Paul Conrad
Copyright 2006 by yolousa72
Everything else Copyright © CodeProject, 1999-2009
Web15 | Advertise on the Code Project