
MemoryStream Compression

20 Jun 2004
MemoryStream-based compression using SharpZipLib.


Hello, this is my first article on CodeProject. I have been a long-time reader, and CodeProject has been an endless supply of answers to many questions. After searching CodeProject, I found that the .NET section lacked any articles on compression, so I thought I would write this article.

SharpZipLib from ICSharpCode

First of all, this article depends on SharpZipLib, which is 100% free to use in any sort of project. Details on the license and download links are available on the ICSharpCode website.


A friend asked me to teach him C#.NET, and as a teaching project I decided to start writing a revision control system with both a server and a client; we've both had our share of pitfalls with CVS. One of the features he wanted was compression, so I sought out this library. Its documentation is sketchy unless you only need an API reference, and it only shows examples of file-based compression. In our project, however, we wanted to work in memory (with custom diff-type patches). I originally found this library through a forum post that said in-memory compression wasn't possible with it, but after digging into the library documentation, I found some Stream-oriented classes that looked promising. An hour or so of playing around, and this simple, short code was the result. Since the code is short, I have not included any source or demo files to download. I hope someone finds this useful!


For convenience's sake, we import the System.IO, System.Text, and SharpZipLib BZip2 namespaces:

using System;
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.BZip2;

First of all, we'll start with compression. Since we're using MemoryStreams, let's create a new one:

MemoryStream msCompressed = new MemoryStream();

Simple enough, right? For this example, I will use BZip2. You could use Zip or Tar; however, they require creating a dummy entry (a ZipEntry or TarEntry) for the data, which is extra overhead that is not needed here. I chose BZip2 over GZip because, in my experience, it compresses larger data to a smaller size, at the cost of a slightly larger header (discussed below).
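As an aside, the Zip entry overhead can be measured. The sketch below uses the built-in System.IO.Compression classes (GZipStream and ZipArchive, available in .NET Framework 4.5+ and .NET Core) rather than SharpZipLib, so its byte counts will not match the figures in this article; the class name ContainerOverhead and the entry name "data.bin" are hypothetical. It only illustrates that wrapping the same data in a Zip entry adds header bytes that a plain GZip stream does not need.

```csharp
using System.IO;
using System.IO.Compression;

// Sketch using the built-in .NET compression classes, not SharpZipLib, to show
// the extra bytes a Zip container spends on per-entry metadata.
public static class ContainerOverhead
{
    // GZip: one compressed stream with a small fixed header and trailer.
    public static long GZipSize(byte[] data)
    {
        var ms = new MemoryStream();
        using (var gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
        {
            gz.Write(data, 0, data.Length);
        }
        return ms.Length;
    }

    // Zip: the same bytes must be wrapped in a named entry. "data.bin" is a
    // hypothetical dummy name; the entry adds local and central directory headers.
    public static long ZipSize(byte[] data)
    {
        var ms = new MemoryStream();
        using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.CreateEntry("data.bin");
            using (Stream entryStream = entry.Open())
            {
                entryStream.Write(data, 0, data.Length);
            }
        }
        return ms.Length;
    }
}
```

Both formats use Deflate here, so the difference in size comes mostly from the Zip entry and directory records.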

Next, we create a BZip2 output stream, passing in our MemoryStream.

BZip2OutputStream zosCompressed = new BZip2OutputStream(msCompressed);

Pretty easy. Now, however, is a good time to address the header overhead I mentioned above. In my practical tests, compressing a 1-byte string with GZip produced 28 bytes of header overhead, plus the 1 byte of data, which could not be compressed any further. The same test with BZip2 produced 36 bytes of header overhead. In practice, using BZip2, a 12892-byte source file from a test project was compressed to 2563 bytes, a reduction of about 80%. Similarly, another test compressed 730 bytes to 429 bytes, and a final test compressed 174 bytes to 161 bytes.
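The savings figures above follow from savings = (1 - compressed / original) × 100. A minimal helper for checking them (the class and method names are illustrative, not part of SharpZipLib):

```csharp
using System;

public static class CompressionMath
{
    // Percentage of the original size saved by compression:
    // (1 - compressed / original) * 100
    public static double SavingsPercent(long originalBytes, long compressedBytes)
    {
        if (originalBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(originalBytes));
        return (1.0 - (double)compressedBytes / originalBytes) * 100.0;
    }
}
```

SavingsPercent(12892, 2563) gives about 80.1, SavingsPercent(730, 429) about 41.2, and SavingsPercent(174, 161) about 7.5.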

Obviously, as with any compression, the more data is available, the more repeated patterns the algorithm can find and compress.
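To see the effect of input size, the following sketch compresses one sentence repeated 1 time and 100 times and compares the compressed size to the input size. It assumes the built-in GZipStream as a stand-in, since BZip2 is not part of the .NET base class library; the names RatioDemo, GZipLength, and Ratio are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class RatioDemo
{
    // Length of the GZip-compressed form of data.
    public static long GZipLength(byte[] data)
    {
        var ms = new MemoryStream();
        using (var gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
        {
            gz.Write(data, 0, data.Length);
        }
        return ms.Length;
    }

    // Compressed size as a fraction of the input size (lower is better).
    public static double Ratio(string text, int repeats)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < repeats; i++)
            sb.Append(text);
        byte[] data = Encoding.ASCII.GetBytes(sb.ToString());
        return (double)GZipLength(data) / data.Length;
    }
}
```

With a single copy of the sentence, the ratio is above 1 (the output is larger than the input); with 100 copies, it drops far below 1, because the repeated text is encoded as back-references.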

So with that little bit of theory out of the way, back to the code... From here, we start writing data to the BZip2OutputStream:

string sBuffer = "This represents some data being compressed.";

byte[] bytesBuffer = Encoding.ASCII.GetBytes(sBuffer);
zosCompressed.Write(bytesBuffer, 0, bytesBuffer.Length);

Pretty easy. As with most IO and stream methods, byte arrays are used instead of strings. So we encode our text as a byte array, then write it to the compression stream, which compresses the data and writes it to the inner stream, our MemoryStream. BZip2 works on blocks, so the compressed output is buffered and is only complete once the compression stream is closed.
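Compression streams buffer their input, so the inner MemoryStream does not hold a complete compressed stream until the compressor is closed. The sketch below checks this with the built-in GZipStream as a stand-in for BZip2OutputStream (the class FlushDemo is hypothetical); SharpZipLib's output streams likewise write their final block on Close().

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class FlushDemo
{
    // Returns the inner stream's length just after Write and after closing.
    public static (long afterWrite, long afterClose) Measure(string text)
    {
        byte[] data = Encoding.ASCII.GetBytes(text);
        var ms = new MemoryStream();
        var gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true);
        gz.Write(data, 0, data.Length);
        long afterWrite = ms.Length;   // the gzip trailer has not been written yet
        gz.Dispose();                  // finishes the stream and writes the trailer
        long afterClose = ms.Length;
        return (afterWrite, afterClose);
    }
}
```

Reading msCompressed.ToArray() before closing the compressor gives you the afterWrite state: data that cannot be decompressed.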

zosCompressed.Close(); // writes the final block; ToArray() still works after Close()
bytesBuffer = msCompressed.ToArray();
string sCompressed = Convert.ToBase64String(bytesBuffer); // binary-safe text form

First, we close the BZip2OutputStream so that it writes its final block; closing it also closes the MemoryStream, but MemoryStream.ToArray() still works on a closed stream. The MemoryStream now contains the complete compressed data, so we pull it out as a byte array. Note that this data is binary, NOT text: converting it to a string with Encoding.ASCII replaces every byte above 127 with a question mark, and the corrupted data can no longer be decompressed. Keep the data as a byte array, or, if you need a string (to view the data in a textbox, for example), encode it as Base64 as shown above. Base64 increases the size by about a third, since every 3 bytes become 4 characters; if anyone has suggestions for a more compact readable form, feel free to comment. When I ran this code, the 43 bytes of uncompressed data became 74 bytes of compressed data, which is a 100-character Base64 string.

Obviously, these are not desirable results. However, I believe that, because the library compresses short strings of data quickly, this code could be extended into a method that returns either the compressed or the uncompressed data, along with a flag indicating which was more efficient.

Now, of course, to test the code above, we need some decompression code. It is much the same, just using a BZip2InputStream instead of a BZip2OutputStream, and Read instead of Write:

MemoryStream msUncompressed =
    new MemoryStream(Convert.FromBase64String(sCompressed));
BZip2InputStream zisUncompressed = new BZip2InputStream(msUncompressed);
// Length is not the uncompressed size, so read until Read() returns 0.
MemoryStream msOutput = new MemoryStream();
byte[] chunk = new byte[4096];
int read;
while ((read = zisUncompressed.Read(chunk, 0, chunk.Length)) > 0)
    msOutput.Write(chunk, 0, read);
zisUncompressed.Close();
string sUncompressed = Encoding.ASCII.GetString(msOutput.ToArray());

Now, a quick check on sUncompressed should reveal the original string intact. No files are involved; if you want to work with a file, there are a few ways to load one, and I leave that to your imagination.
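The idea of returning either the compressed or the uncompressed data with a flag, mentioned earlier, could look something like this. This is a hypothetical sketch: the MaybeCompress class and its one-byte flag format are assumptions, and it uses the built-in GZipStream in place of SharpZipLib's BZip2 streams.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical format: first byte is a flag (1 = compressed, 0 = stored as-is),
// followed by the payload.
public static class MaybeCompress
{
    public static byte[] Pack(byte[] data)
    {
        var ms = new MemoryStream();
        using (var gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
        {
            gz.Write(data, 0, data.Length);
        }
        byte[] compressed = ms.ToArray();

        // Keep whichever form is smaller.
        bool useCompressed = compressed.Length < data.Length;
        byte[] payload = useCompressed ? compressed : data;
        byte[] packed = new byte[payload.Length + 1];
        packed[0] = useCompressed ? (byte)1 : (byte)0;
        Buffer.BlockCopy(payload, 0, packed, 1, payload.Length);
        return packed;
    }

    public static byte[] Unpack(byte[] packed)
    {
        if (packed.Length == 0)
            throw new ArgumentException("Missing flag byte.", nameof(packed));
        if (packed[0] == 0)
        {
            // Stored as-is: just strip the flag byte.
            byte[] raw = new byte[packed.Length - 1];
            Buffer.BlockCopy(packed, 1, raw, 0, raw.Length);
            return raw;
        }
        // Compressed: decompress everything after the flag byte.
        var output = new MemoryStream();
        using (var input = new MemoryStream(packed, 1, packed.Length - 1))
        using (var gz = new GZipStream(input, CompressionMode.Decompress))
        {
            gz.CopyTo(output);
        }
        return output.ToArray();
    }
}
```

Short inputs such as the 43-byte sample sentence are stored as-is, since their compressed form is larger; long or repetitive inputs are stored compressed, at the cost of a single flag byte.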


Special thanks to the developers at ICSharpCode.Net for providing this awesome library free to the public which makes this article possible. I have no affiliation with ICSharpCode.Net, so I hope I have not breached anything in posting this article.

I hope you all find this as useful as I have!


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



About the Author

Web Developer
Canada
Short and simple: I'm a self-employed contract programmer. My strongest programming skills are in C/C++ and C#/.NET, and I have a knack for porting C algorithms to C#.

Comments and Discussions

Re: Security Exception (Eric Coll, 10-Jun-04 10:58)
Re: Security Exception (Astaelan, 10-Jun-04 11:26)
Something is wrong (dorutzu, 14-May-04 4:42)
Re: Something is wrong (Astaelan, 14-May-04 10:35)
Re: Something is wrong (dorutzu, 14-May-04 12:47)
Re: Something is wrong (OferBB, 2-Jun-04 1:24)
Re: Something is wrong (dorutzu, 2-Jun-04 8:55)
Re: Something is wrong (Astaelan, 10-Jun-04 11:44)
I'm sorry I have not had a chance to respond sooner. I also do not have Visual Studio .NET available to me at the moment, so I can't try your code.

However, someone else had a similar problem, and what they found they had to do was reset the MemoryStream back to the beginning (set its Position to 0), possibly because they were reusing the same MemoryStream. Have you tried catching any exceptions? It may be that when it hangs for you, it's actually throwing an exception that nothing is catching. In both cases, though, it sounds like SharpZipLib has gone into limbo trying to decompress a MemoryStream from an invalid offset.

Here is what I think is occurring:

1) You compress successfully to the MemoryStream.
2a) You do not pull the data out to a byte array.
2b) You do not reset the MemoryStream.
3) You attempt to decompress, and one of the following occurs:
4a) You attempt to decompress the old MemoryStream directly, not the byte array.
4b) You didn't reset the MemoryStream, so the uncompressed data ends up at the tail of the MemoryStream.
4c) You reset the MemoryStream, and attempted to decompress into the same MemoryStream.
5) In any of these cases, the CPU goes into limbo, because streams work based on data availability and block synchronously until data is available. My guess is that SharpZipLib has been unable to decompress the data properly, and that it's related to the MemoryStream being used incorrectly.
6) If all else fails, trap everything, don't reuse MemoryStreams, create new objects for everything, expand your code to the fullest, and step through it, making sure the data is intact at every point. A number of people have confirmed that this code works if you follow it closely; once it works, expand it to your needs.
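The MemoryStream pitfalls in the list above come down to two rules: reset Position to 0 before reading back what you just wrote, and decompress into a fresh stream. A minimal sketch of both, using the built-in GZipStream as a stand-in for SharpZipLib (ReuseDemo is a hypothetical name):

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ReuseDemo
{
    public static string RoundTrip(string text)
    {
        var ms = new MemoryStream();
        byte[] data = Encoding.UTF8.GetBytes(text);

        // leaveOpen: true keeps ms usable after the compressor is closed.
        using (var compressor = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
        {
            compressor.Write(data, 0, data.Length);
        }

        // After writing, Position sits at the end of the compressed data;
        // without this reset the decompressor would find nothing to read.
        ms.Position = 0;

        // Decompress into a separate MemoryStream instead of reusing ms.
        var output = new MemoryStream();
        using (var decompressor = new GZipStream(ms, CompressionMode.Decompress))
        {
            decompressor.CopyTo(output);
        }
        return Encoding.UTF8.GetString(output.ToArray());
    }
}
```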

Be aware that another poster has informed me that a generic security exception is thrown when attempting to load SharpZipLib into an IE-hosted control, because the library attempts to open files upon linking, before any calls to actual methods of the DLL. He has found a workaround that involves stripping the library down to minimal functionality; we can only hope he's kind enough to re-release the routines tailored for in-memory compression and decompression. Check the other posts here if you want to contact him.

Re: Something is wrong (OferBB, 15-Jun-04 7:33)
Re: Something is wrong (OferBB, 23-Jun-04 4:31)
Re: Something is wrong - Solution (TLangFromCodeProject, 24-Jun-04 12:33)
Re: Something is wrong - Solution (Astaelan, 25-Jun-04 3:07)
Re: Something is wrong (RandyY, 10-Nov-04 20:01)


Last Updated 21 Jun 2004
Article Copyright 2004 by Astaelan