Click here to Skip to main content
15,860,859 members
Articles / Programming Languages / Visual Basic
Article

Better Than Zip Algorithm For Compressing In-Memory Data

Rate me:
Please Sign up or sign in to vote.
4.07/5 (41 votes)
23 May 2006CPOL6 min read 175.9K   5.2K   75   67
An article on the in-memory data compression engine for .NET
Image 1

Introduction

Sometimes you may need to compress data in memory without creating disk files. For example, it is possible to compress binary data in a CDATA section of an XML file. Also, you may want to compress data transmitted between the client and server machines over a low-bandwidth connection.

The proposed classes implement the original compression algorithm. This method is based on the "DEFLATE Compressed Data Format Specification" (RFC 1951) with some enhancements. It works slightly better than the generally used zip compression. The algorithm is especially effective on Centrino and other platforms with 2 MB L2 cache. The above picture shows results of the mscorlib.xml file compression/decompression on a machine with Pentium M processor (1.8 GHz). Running the same test on a machine with AMD Athlon 2100+ processor (with 256 KB L2 cache) gives 721 ms as the compression time and 40 ms as the decompression time. It is interesting that Intel and AMD platforms have such an evident difference in performance for various operations (here 110 ms on the Pentium M is quite imperfect data, however, because of the big inaccuracy).

You can compare this compression engine with the standard DeflateStream compressor added in .NET 2.0. To do so, please open the source code for the demo project (Form1.cs) and comment/uncomment several code blocks. They are marked in the source code. Then, try the demo project with various data files and compare results with the MemCompressor engine. You will see, the standard DeflateStream engine is rather poor.

Using MemCompressor

The MemCompressor assembly includes two main classes: AcedDeflator and AcedInflator. The AcedDeflator class has the Compress method. This method accepts a byte array and returns new byte array with the same data in the compressed form. For example:

C#
private int _originalSize;
private byte[] _originalData;

private int _originalChecksum;
private byte[] _compData;
private int _compSize;

private void CompressData()
{
    _originalChecksum = AcedUtils.Adler32(_originalData, 0, _originalSize);
    _compData = AcedDeflator.Instance.Compress(_originalData, 0, _originalSize,
        AcedCompressionLevel.Normal, 0, 0);
    _compSize = _compData.Length;
}

The AcedInflator class works quite the contrary. It has the Decompress method which accepts a binary array with compressed data. This method restores the original data in the output array.

C#
private byte[] _decompData;
private int _decompSize;
private int _decompChecksum;

private void DecompressData()
{
    _decompData = AcedInflator.Instance.Decompress(_compData, 0, 0, 0);
    _decompSize = _decompData.Length;
    _decompChecksum = AcedUtils.Adler32(_decompData, 0, _decompSize);
    Debug.Assert(_decompChecksum == _originalChecksum);
}

The AcedDeflator.Compress method accepts, among other, the compressionLevel parameter. This parameter chooses balance between the speed and the compression ratio. Here you may pass one of the following values:

  • Store (simple copying data to the output array without compression)
  • Fastest (poor compression ratio but the best speed)
  • Fast (optimal for less-redundant data)
  • Normal (better choice for higher-redundant data, such as texts)
  • Maximum (slow but has the best compression ratio)

AcedDeflator Class

C#
public class AcedDeflator

Provides methods for compressing binary data with a combination of the LZ77 algorithm and Huffman coding similar to the method described in RFC 1951. You should synchronize calls to the methods of this class in a multithreaded application.

C#
public AcedDeflator()

Creates a new instance of the AcedDeflator class. It is not recommended to call this constructor directly. When possible, please use a single cached instance of the AcedDeflator class which is returned by the Instance static property.

C#
public static AcedDeflator Instance { get; }

Returns a cached instance of the AcedDeflator class. Such an instance occupies a lot of memory (a little more than 2 MB). So, a single instance is created and cached in an internal static field of this class. This instance can be used each time when you want to compress some data. The only exception may be a multithreaded application where you compress data in several threads simultaneously. If so, you should create each instance with the regular constructor.

C#
public byte[] Compress(byte[] sourceBytes, int sourceIndex, int sourceLength,
    AcedCompressionLevel compressionLevel, int beforeGap, int afterGap)

Compresses binary data passed in the sourceBytes array starting from sourceIndex with sourceLength bytes in length. The compression ratio depends on the compressionLevel parameter. This method creates a new byte array which is enough to store the compressed data. You may also reserve beforeGap bytes before the compressed data block and afterGap bytes after the compressed data block. This may be useful if you want to supply a custom header and/or tail to the compressed data and you don't want to reallocate memory unnecessarily. The result byte array consists of (beforeGap + Compressed_Data_Length + afterGap) bytes.

C#
public int Compress(byte[] sourceBytes, int sourceIndex, int sourceLength,
    AcedCompressionLevel compressionLevel, byte[] destinationBytes,
    int destinationIndex)

This method is similar to the previous one but it doesn't allocate the memory for the output array. The output array is passed as the destinationBytes argument. The compressed data will be stored in that array starting from the destinationIndex. The maximum length of the compressed data is (sourceLength + 4) bytes. The output array must have enough space. This method returns the number of bytes stored in the output array. If you pass null in the destinationBytes argument, the method performs "dummy" compression. In such a way, you can find out the length of the output data. However, the "dummy" compression takes almost the same time as the regular compression.

C#
public static void Release()

Call this method to release an internal static reference to the cached instance of the AcedDeflator class. After that, you may call GC.Collect() to free memory which was occupied by that cached instance.

AcedInflator Class

C#
public class AcedInflator

Provides methods for decompressing binary data which was compressed using the AcedDeflator class. You should synchronize calls to the methods of this class in a multithreaded application.

C#
public AcedInflator()

Creates a new instance of the AcedInflator class. It is not recommended to call this constructor directly. When possible, to avoid unnecessary memory reallocation, please use a single cached instance of the AcedInflator class which is returned by the Instance static property.

C#
public static AcedInflator Instance { get; }

Returns a cached instance of the AcedInflator class. Such an instance is created when you access this property for the first time. Then, the instance is cached in an internal static field of this class. You can use it to decompress any data in a single-threaded application. In a multithreaded application, please create an AcedInflator for each the thread with the regular constructor.

C#
public static int GetDecompressedLength(byte[] sourceBytes, int sourceIndex)

Returns the minimum length, in bytes, of the binary array where you can store the decompressed data for the specified compressed binary array (sourceBytes). The sourceIndex argument specifies the start index in the sourceBytes array from which the compressed data begins.

C#
public byte[] Decompress(byte[] sourceBytes, int sourceIndex,
    int beforeGap, int afterGap)

Decompresses binary data passed in the sourceBytes array starting from sourceIndex. This method creates a new byte array which is enough to store the decompressed data. You may also reserve beforeGap bytes before the decompressed data block and afterGap bytes after the decompressed data block. This may be useful if you want to supply a custom header and/or tail to decompressed data and you don't want to reallocate memory unnecessarily. The result array will consist of (beforeGap + Decompressed_Data_Length + afterGap) bytes.

C#
public int Decompress(byte[] sourceBytes, int sourceIndex,
    byte[] destinationBytes, int destinationIndex)

The same as the previous method but the memory for the output array must be allocated before calling this method. You should pass the destination array in the destinationBytes argument. The destinationIndex specifies offset in that array from which the decompressed data will be stored. The output array must have enough space to store all the data. To obtain the decompressed data size, please call the GetDecompressedLength method of this class. The Decompress method returns the number of bytes stored in the output (destinationBytes) array. If you pass null in the destinationBytes argument, this method does nothing, just returns the decompressed data length, in bytes.

C#
public static void Release()

Call this method to release an internal static reference to the cached instance of the AcedInflator class. After that, you may call GC.Collect() to free memory which was occupied by that cached instance.

AcedUtils Class

C#
public class AcedUtils

This class has one public static method to compute the Adler-32 checksum.

C#
public static int Adler32(byte[] bytes, int offset, int length)

Computes the Adler-32 checksum in accordance with RFC 1950 for length bytes of the bytes array starting from the offset index.

Conclusion

Just have these simple classes in mind when you will consider adding some compression capabilities to your application.

History

  • 23rd May, 2006: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Russian Federation Russian Federation
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
GeneralRe: Is the code safe to use in a production environment ? Pin
Andrey Dryazgov14-Nov-10 11:59
professionalAndrey Dryazgov14-Nov-10 11:59 
Generalbig problem Pin
maorosh27-Oct-10 13:00
maorosh27-Oct-10 13:00 
GeneralRe: big problem Pin
Andrey Dryazgov27-Oct-10 13:22
professionalAndrey Dryazgov27-Oct-10 13:22 
Generaldecompress at silverlight Pin
reymond10-Aug-10 23:27
reymond10-Aug-10 23:27 
GeneralRe: decompress at silverlight Pin
Andrey Dryazgov11-Aug-10 1:35
professionalAndrey Dryazgov11-Aug-10 1:35 
Generalexception when decompressing large in memory data 4096 bytes Pin
Shalamander19-Jun-10 11:29
Shalamander19-Jun-10 11:29 
GeneralRe: exception when decompressing large in memory data 4096 bytes Pin
Andrey Dryazgov19-Jun-10 16:50
professionalAndrey Dryazgov19-Jun-10 16:50 
GeneralRe: exception when decompressing large in memory data 4096 bytes Pin
Shalamander20-Jun-10 16:12
Shalamander20-Jun-10 16:12 
this is the situation:

i call your code with this code:
RetVal = mci.Decompress(DataToDecompress, 0, 0, 0);
mci = memory compressor instance

now in your code it does this:

public unsafe byte[] Decompress(byte[] sourceBytes, int sourceIndex, int beforeGap, int afterGap)
{
if (sourceBytes == null)
MemCmprssrMCException.ThrowArgumentNullException("sourceBytes");
int byteCount;
fixed (byte* pSrcBytes = &sourceBytes[sourceIndex])
byteCount = *((int*)pSrcBytes);
if (byteCount < 0)
byteCount = -byteCount;
byte[] result = new byte[byteCount + beforeGap + afterGap];
if (byteCount != 0)
Decompress(sourceBytes, sourceIndex, result, beforeGap);
return result;
}


you call Decompress and you give it the result as the destinationBytes

now in the Decompress method we get this error:

at this
if (byteCount <= 0)
{
byteCount = -byteCount;
if (destinationBytes.Length - destinationIndex < byteCount)
MemCmprssrMCException.ThrowNoPlaceToStoreDecompressedDataException();
if (byteCount > 0)
Buffer.BlockCopy(sourceBytes, sourceIndex + 4, destinationBytes, destinationIndex, byteCount);
return byteCount;
}

in this line in the above block we get error:

Buffer.BlockCopy(sourceBytes, sourceIndex + 4, destinationBytes, destinationIndex, byteCount);
System.ArgumentException occurred
Message=Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
Source=mscorlib
ParamName=""
StackTrace:
at System.Buffer.BlockCopy(Array src, Int32 srcOffset, Array dst, Int32 dstOffset, Int32 count)
at *********Inflator.mci.Decompress(Byte[] sourceBytes, Int32 sourceIndex, Byte[] destinationBytes, Int32 destinationIndex) in Inflator.cs:line 572

the arguments are as follows:
sourceBytes = [1280] in length
sourceIndex = 0
destinationBytes = [1215565931]
destinationIndex = 0
bytesCount = 1215565931

can you fix it please?
GeneralRe: exception when decompressing large in memory data 4096 bytes Pin
Shalamander20-Jun-10 16:14
Shalamander20-Jun-10 16:14 
GeneralRe: exception when decompressing large in memory data 4096 bytes Pin
Shalamander20-Jun-10 18:45
Shalamander20-Jun-10 18:45 
GeneralRe: exception when decompressing large in memory data 4096 bytes Pin
Andrey Dryazgov20-Jun-10 18:42
professionalAndrey Dryazgov20-Jun-10 18:42 
GeneralRe: exception when decompressing large in memory data 4096 bytes Pin
Shalamander20-Jun-10 18:49
Shalamander20-Jun-10 18:49 
GeneralRe: exception when decompressing large in memory data 4096 bytes Pin
Andrey Dryazgov20-Jun-10 19:58
professionalAndrey Dryazgov20-Jun-10 19:58 
Generalwould be interesting to make this a drop-in replacement to DeflateStream. Pin
Cheeso22-Mar-09 10:37
Cheeso22-Mar-09 10:37 
GeneralRe: would be interesting to make this a drop-in replacement to DeflateStream. Pin
Tiago Freitas Leal9-Dec-09 15:40
professionalTiago Freitas Leal9-Dec-09 15:40 
GeneralLicense Pin
Caetano88820-Nov-07 14:28
Caetano88820-Nov-07 14:28 
GeneralRe: License Pin
Andrey Dryazgov20-Nov-07 23:02
professionalAndrey Dryazgov20-Nov-07 23:02 
GeneralRe: License Pin
Caetano88821-Nov-07 3:44
Caetano88821-Nov-07 3:44 
GeneralRe: License Pin
Jaime Olivares15-Jul-08 4:53
Jaime Olivares15-Jul-08 4:53 
GeneralRe: License Pin
Andrey Dryazgov16-Jul-08 4:31
professionalAndrey Dryazgov16-Jul-08 4:31 
QuestionSystem.OutOfMemoryException Pin
hemkumarsym5-Jun-07 3:41
hemkumarsym5-Jun-07 3:41 
AnswerRe: System.OutOfMemoryException Pin
Andrey Dryazgov5-Jun-07 3:59
professionalAndrey Dryazgov5-Jun-07 3:59 
GeneralRe: System.OutOfMemoryException Pin
hemkumarsym5-Jun-07 4:52
hemkumarsym5-Jun-07 4:52 
GeneralRe: System.OutOfMemoryException Pin
Andrey Dryazgov5-Jun-07 5:11
professionalAndrey Dryazgov5-Jun-07 5:11 
GeneralRe: System.OutOfMemoryException Pin
hemkumarsym5-Jun-07 19:55
hemkumarsym5-Jun-07 19:55 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.