MemoryStream Compression

20 Jun 2004
MemoryStream-based compression using SharpZipLib.

Introduction

Hello, this is my first article on CodeProject. I have been a long-time reader, and CodeProject has been an endless supply of answers to many questions. After searching CodeProject, I found that the .NET section lacked any articles on compression, so I thought I would write this one.

SharpZipLib from ICSharpCode

First of all, this article depends on SharpZipLib, which is free to use in any sort of project. Details on the license and download links are available here.

Purpose

A friend asked me to teach him C#, and as a teaching project I decided to start writing a revision control system with both a server and a client, since we have both had our share of pitfalls with CVS. One of the features he wanted involved compression, so I sought out this library. Its documentation is sketchy unless you use it purely as an API reference, and it only shows examples of file-based compression. In our project, however, we wanted to work in memory (with custom diff-type patches). I originally found this library through a forum post that said this wasn't possible, but after digging into the library documentation I found some Stream-oriented classes that looked promising. After an hour or so of playing around, this short and simple code was the result. Since the code is relatively short, I have not included any source or demo files to download. I hope someone finds this useful!

Compression

For convenience, we import the System.IO, System.Text, and SharpZipLib BZip2 namespaces:

C#
using System;
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.BZip2;

First of all, we'll start with compression. Since we're using MemoryStreams, let's create a new one:

C#
MemoryStream msCompressed = new MemoryStream();

Simple enough, right? For this example, I will use BZip2. You can use Zip or Tar instead, but they require creating a dummy entry (a ZipEntry or TarEntry) for the data, which is extra overhead that is not needed here. I chose BZip2 over GZip because, in my experience, it compresses larger data to a smaller size, at the cost of a slightly larger header (discussed below).
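To make the dummy-entry overhead concrete, here is a minimal sketch of what the Zip route might look like in memory. The class name `ZipInMemory` and the entry name "data" are placeholders invented for this illustration; only `ZipOutputStream` and `ZipEntry` come from SharpZipLib.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

public static class ZipInMemory
{
    // Hypothetical helper: wraps a byte array in a single-entry Zip archive in memory.
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (ZipOutputStream zip = new ZipOutputStream(ms))
            {
                // Zip is an archive format, so the data needs an entry even though
                // no file exists; "data" is an arbitrary placeholder name.
                zip.PutNextEntry(new ZipEntry("data"));
                zip.Write(data, 0, data.Length);
                zip.CloseEntry();
            }
            // ToArray() still works after the MemoryStream has been closed.
            return ms.ToArray();
        }
    }
}
```

The entry name and the archive directory are stored alongside the data, which is why a plain stream format such as BZip2 or GZip is a better fit for small in-memory buffers.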

Next, we create a BZip2 output stream, passing in our MemoryStream.

C#
BZip2OutputStream zosCompressed = new BZip2OutputStream(msCompressed);

Pretty easy... Now, however, is a good time to address the header overhead I mentioned above. In my practical tests, compressing a 1-byte string with GZip produced 28 bytes of overhead from the headers alone, plus the one byte that could not be compressed any further. The same test with BZip2 produced 36 bytes of overhead. In practice, using BZip2, a 12892-byte source file from a test project compressed to 2563 bytes, a reduction of about 80%. Another test compressed 730 bytes to 429 bytes (about 41%), and a final test compressed 174 bytes to 161 bytes (about 7%).

As with any compression, the more data there is, the more repeated patterns the algorithm can find, and the less the fixed header cost matters.
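If you want to repeat this kind of measurement yourself, a rough sketch along these lines may help. `OverheadProbe`, `MakeSample`, and `Report` are names made up for this example, and the synthetic repeating-alphabet input is an assumption, so the numbers it prints will not match the figures quoted above, which came from real source files.

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.BZip2;
using ICSharpCode.SharpZipLib.GZip;

public static class OverheadProbe
{
    // Returns the size of the data after BZip2 compression.
    public static int BZip2Size(byte[] data)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (BZip2OutputStream zos = new BZip2OutputStream(ms))
            {
                zos.Write(data, 0, data.Length);
            }
            return ms.ToArray().Length;
        }
    }

    // Returns the size of the data after GZip compression.
    public static int GZipSize(byte[] data)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (GZipOutputStream zos = new GZipOutputStream(ms))
            {
                zos.Write(data, 0, data.Length);
            }
            return ms.ToArray().Length;
        }
    }

    // Assumption: synthetic input built from a repeating alphabet.
    public static byte[] MakeSample(int size)
    {
        byte[] sample = new byte[size];
        for (int i = 0; i < size; i++)
        {
            sample[i] = (byte)('a' + (i % 26));
        }
        return sample;
    }

    // Prints the compressed size for each input size used in the article.
    public static void Report()
    {
        foreach (int size in new int[] { 1, 174, 730, 12892 })
        {
            byte[] sample = MakeSample(size);
            Console.WriteLine("{0} bytes -> BZip2 {1}, GZip {2}",
                size, BZip2Size(sample), GZipSize(sample));
        }
    }
}
```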

So with that little bit of theory out of the way, back to the code... From here, we start writing data to the BZip2OutputStream:

C#
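string sBuffer = "This represents some data being compressed.";

// Streams work on bytes, so encode the string first.
byte[] bytesBuffer = Encoding.ASCII.GetBytes(sBuffer);
zosCompressed.Write(bytesBuffer, 0, bytesBuffer.Length);
// Close() writes the final BZip2 block and trailer, then closes msCompressed too.
// (Finalize() is a protected destructor and cannot be called from C# code.)
zosCompressed.Close();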

Pretty easy. As with most IO and stream methods, byte arrays are used instead of strings. So we encode our output as a byte array, then write it to the compression stream, which in turn compresses the data and writes it to the inner stream, which is our MemoryStream.

C#
bytesBuffer = msCompressed.ToArray();
string sCompressed = Convert.ToBase64String(bytesBuffer);

The MemoryStream now contains the compressed data, so we pull it out as a byte array. Note that these bytes are binary data, NOT text: decoding them with Encoding.ASCII replaces every byte above 127 with '?', which corrupts the data, and putting the result into a textbox renders strange characters. If you need the data as a string, convert it to Base64 as shown, at the cost of a size increase of about one third; if anyone has a more compact suggestion, feel free to comment. Running this specific code turns the 43 bytes of uncompressed data into 74 bytes of compressed data, and when encoded as a Base64 string, the final result is 100 characters, as follows:

QlpoOTFBWSZTWZxkIpsAAAMTgEABBAA+49wAIAAxTTIxMTEImJhNNDIbvQaWyYEHiwN49LdoKNqKN2C9ZUG5+LuSKcKEhOMhFNg=
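Compressed output is arbitrary binary, so it only survives a trip through a string if the encoding preserves every byte value. The sketch below contrasts Base64 with ASCII on a few bytes; `TextSafeBytes` and its method names are invented for this illustration, while the `Convert` and `Encoding` calls are standard .NET.

```csharp
using System;
using System.Text;

public static class TextSafeBytes
{
    // Lossless: every byte value survives the trip through a Base64 string.
    public static byte[] ViaBase64(byte[] data)
    {
        return Convert.FromBase64String(Convert.ToBase64String(data));
    }

    // Lossy: ASCII only covers 0-127, so higher byte values come back as '?'.
    public static byte[] ViaAscii(byte[] data)
    {
        return Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(data));
    }

    // Compares two byte arrays element by element.
    public static bool SameBytes(byte[] a, byte[] b)
    {
        if (a.Length != b.Length)
        {
            return false;
        }
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

For example, the bytes `0x42, 0x5A, 0x68, 0x9C, 0xFF` come back unchanged from `ViaBase64`, while `ViaAscii` turns the last two into `0x3F` ('?').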

Obviously, growing 43 bytes of input into 74 bytes of output is not a desirable result. However, the library compresses short strings quickly enough that this code could be extended into a method that tries compression and returns either the compressed or the uncompressed data, along with a flag indicating which one was smaller.
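A minimal sketch of that idea, assuming a small result type to carry the flag, might look like this. `PackedData` and `SmartPacker` are hypothetical names introduced here, not part of SharpZipLib.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.BZip2;

// Hypothetical result type: the payload plus a flag saying whether it is compressed.
public class PackedData
{
    public readonly bool IsCompressed;
    public readonly byte[] Bytes;

    public PackedData(bool isCompressed, byte[] bytes)
    {
        IsCompressed = isCompressed;
        Bytes = bytes;
    }
}

public static class SmartPacker
{
    // Compresses the data and keeps the result only if it is actually smaller.
    public static PackedData Pack(byte[] data)
    {
        byte[] compressed;
        using (MemoryStream ms = new MemoryStream())
        {
            using (BZip2OutputStream zos = new BZip2OutputStream(ms))
            {
                zos.Write(data, 0, data.Length);
            }
            compressed = ms.ToArray();
        }

        if (compressed.Length < data.Length)
        {
            return new PackedData(true, compressed);
        }
        // Header overhead outweighed the savings, so keep the original bytes.
        return new PackedData(false, data);
    }
}
```

The receiver has to store or transmit the flag next to the payload so it knows whether to run the bytes through a BZip2InputStream.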

Uncompression

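Now, of course, to test the code above, we need some uncompression code. I will put all the code together, since it is much the same, using a BZip2InputStream instead of a BZip2OutputStream and Read instead of Write. Two things to watch for: feed the stream the raw compressed bytes rather than a string that has been through a text encoding, and read in a loop, because the uncompressed size is not known in advance and a single Read call may return only part of the data:

C#
// Use the raw compressed bytes; ToArray() still works on a closed MemoryStream.
MemoryStream msUncompressed = new MemoryStream(msCompressed.ToArray());
BZip2InputStream zisUncompressed = new BZip2InputStream(msUncompressed);
// Collect the output in a growable stream, since its final size is unknown.
MemoryStream msResult = new MemoryStream();
byte[] chunk = new byte[4096];
int count;
while ((count = zisUncompressed.Read(chunk, 0, chunk.Length)) > 0)
{
    msResult.Write(chunk, 0, count);
}
zisUncompressed.Close();
msUncompressed.Close();
string sUncompressed = Encoding.ASCII.GetString(msResult.ToArray());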

A quick check of sUncompressed should now show the original string intact, with no files involved. If you want to compress the contents of a file, there are a few ways to load it, and I leave that to your imagination.
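For reference, the whole round trip can be wrapped in a pair of helpers. This is only a sketch: `BZip2Memory` and its method names are invented for this example, and `CompressFile` shows just one of the possible ways to bring a file into memory, using the standard `File.ReadAllBytes`.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.BZip2;

public static class BZip2Memory
{
    // Compresses a byte array entirely in memory.
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            // Disposing the BZip2 stream writes the final block and trailer.
            using (BZip2OutputStream zos = new BZip2OutputStream(ms))
            {
                zos.Write(data, 0, data.Length);
            }
            return ms.ToArray();
        }
    }

    // Decompresses a BZip2 byte array entirely in memory.
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (BZip2InputStream zis = new BZip2InputStream(input))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] chunk = new byte[4096];
            int count;
            // Read until the stream reports that no more data is available.
            while ((count = zis.Read(chunk, 0, chunk.Length)) > 0)
            {
                output.Write(chunk, 0, count);
            }
            return output.ToArray();
        }
    }

    // One possible way to compress a file: load it fully, then compress in memory.
    public static byte[] CompressFile(string path)
    {
        return Compress(File.ReadAllBytes(path));
    }
}
```

Loading the whole file first keeps the code short but holds both the original and the compressed copy in memory, so it is best suited to small files.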

Closing

Special thanks to the developers at ICSharpCode.Net for providing this awesome library free to the public which makes this article possible. I have no affiliation with ICSharpCode.Net, so I hope I have not breached anything in posting this article.

I hope you all find this as useful as I have!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Web Developer
Canada
Short and simple: I'm a self-contracted programmer, and my strongest programming skills are in C/C++ and C#/.NET. I have a knack for porting C algorithms to C#.

Comments and Discussions

 
Re: Security Exception
Eric Coll, 10-Jun-04 9:58
The way the security works in the IE hosted controls is a little bit odd.

As soon as a function which calls another assembly is called, the framework loads the assembly and performs a security check for the entire assembly, regardless of what class is being used.

Notice that this check happens before entering the function which calls the assembly, and not before the call to a function in the assembly. So, not even the first line of the outer function is called. Something similar happens when trying to use remoting.

In this particular case, some portion of SharpZipLib uses file access functions, so the entire assembly fails to load in the IE-hosted control.

The exception thrown is a plain, uninformative security exception without any message and without an inner exception.

Using the source code to build an assembly with just the compression and uncompression routines works fine. But I don't know if that is legal.

Thank you for your help and fast response,

Eric.

