
MemoryStream Compression

By Astaelan, 20 Jun 2004
 

Introduction

Hello, this is my first article on CodeProject. I have been a long time reader, and the CodeProject resource has been an endless supply of answers to many questions. After searching CodeProject, I found that the .NET section lacked any articles on compression, so I thought I would write this article.

SharpZipLib from ICSharpCode

First of all, this article depends on SharpZipLib, which is 100% free to use in any sort of project. Details on the license and download links are available here.

Purpose

A friend asked me to teach him C#, and as a teaching project I decided to start writing a revision control system with both a server and a client, since we've both had our share of pitfalls with CVS. One of the features he wanted involved compression, so I sought out this library, but its documentation is sketchy unless you only need an API reference. The documentation also only shows examples of file-based compression, whereas our project needed to work in memory (with custom diff-type patches). I originally found this library through a forum post that said this wasn't possible, but after digging into the library documentation, I found some Stream-oriented classes that looked promising. An hour or so of playing around, and this simple and short code was the result. Since the code is relatively short, I have not included any source or demo files to download. I hope someone finds this useful!

Compression

For convenience, we import the System, System.IO, System.Text, and SharpZipLib BZip2 namespaces:

using System;
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.BZip2;

First of all, we'll start with compression. Since we're using MemoryStreams, let's create a new one:

MemoryStream msCompressed = new MemoryStream();

Simple enough, right? For this example, I will use BZip2. You can use Zip or Tar instead, but they require creating a dummy entry for the data (a ZipEntry or TarEntry), which is extra overhead that is not needed here. I chose BZip2 over GZip because, in my experience, it compresses larger data to a smaller size, at the cost of a slightly larger header (discussed below).

Next, we create a BZip2 output stream, passing in our MemoryStream.

BZip2OutputStream zosCompressed = new BZip2OutputStream(msCompressed);

Pretty easy... Now is a good time to address the header overhead I mentioned above. In my practical tests, compressing a 1-byte string with GZip produced 28 bytes of header overhead, plus the one byte that could not be compressed any further. The same test with BZip2 produced 36 bytes of overhead. In practice, using BZip2, a 12892-byte source file from a test project compressed to 2563 bytes, a reduction of about 80%. Another test compressed 730 bytes to 429 bytes (about 41%), and a final test compressed 174 bytes to 161 bytes (about 7%).

Obviously, with any compression, the more data is available, the better the algorithm can compress patterns.
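The percentages quoted above are easy to double-check. The following is a minimal sketch of that arithmetic; the CompressionStats class and SpaceSaving method are hypothetical names used only for illustration and are not part of SharpZipLib.

```csharp
using System;

public static class CompressionStats
{
    // Percentage of the original size that compression saved.
    // Hypothetical helper for illustration only.
    public static double SpaceSaving(long originalBytes, long compressedBytes)
    {
        if (originalBytes <= 0)
        {
            throw new ArgumentOutOfRangeException("originalBytes");
        }
        return 100.0 * (1.0 - (double)compressedBytes / originalBytes);
    }
}
```

Calling CompressionStats.SpaceSaving(12892, 2563) gives roughly 80, while the 730-byte and 174-byte tests come out at roughly 41 and 7, which shows how quickly the benefit shrinks as the input gets smaller.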

So with that little bit of theory out of the way, back to the code... From here, we start writing data to the BZip2OutputStream:

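string sBuffer = "This represents some data being compressed.";

// Streams work with bytes, so encode the text first.
byte[] bytesBuffer = Encoding.ASCII.GetBytes(sBuffer);
zosCompressed.Write(bytesBuffer, 0, bytesBuffer.Length);
// Close() writes the final BZip2 block and closes the inner MemoryStream,
// so no separate Finalize() call is needed.
zosCompressed.Close();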

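Pretty easy. As with most IO and stream methods, byte arrays are used instead of strings. So we encode our text as a byte array, then write it to the compression stream, which in turn compresses the data and writes it to the inner stream, which is our MemoryStream. Closing the compression stream writes out whatever is still buffered, so the MemoryStream only holds the complete result after Close() returns.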
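The string-to-bytes step is worth a small sketch of its own. Encoding.ASCII is fine for the sample sentence, but it cannot represent characters outside the 7-bit range, so the helpers below assume UTF-8 instead. That choice, and the TextBytes, ToBytes and FromBytes names, are assumptions for illustration rather than anything the article's code uses.

```csharp
using System.Text;

public static class TextBytes
{
    // Encodes text for writing to a stream. UTF-8 is an assumption here;
    // the article's own code uses Encoding.ASCII.
    public static byte[] ToBytes(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Decodes bytes read back from a stream, using the same encoding.
    public static string FromBytes(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Whichever encoding you pick, use the same one on both sides of the round trip.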

// Compressed output is binary, so keep it as bytes; Base64 is only for display.
bytesBuffer = msCompressed.ToArray();
string sCompressed = Convert.ToBase64String(bytesBuffer);

The MemoryStream now contains the compressed data, so we pull it out as a byte array. Note that this data is binary, NOT text: decoding it with Encoding.ASCII alters every byte above 127, and the result can no longer be uncompressed. If you want to view the data or carry it around as a string, convert it to a Base64 string, as above, at the cost of roughly a third more size; if anyone has a better suggestion, you are welcome to comment. Running this specific code turns the 43 bytes of uncompressed data into 74 bytes of compressed data, which encode to 100 Base64 characters as follows:

QlpoOTFBWSZTWZxkIpsAAAMTgEABBAA+49wAIAAxTTIxMTEImJhNNDIbvQaWyYEHiwN49LdoKNqKN2C9ZUG5+LuSKcKEhOMhFNg=
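One reason to use Base64 for the string form is that it round-trips binary data exactly, whereas a text encoding such as ASCII does not. The sketch below contrasts the two using only the base class library; the BinaryText class and its method names are hypothetical.

```csharp
using System;
using System.Text;

public static class BinaryText
{
    // Lossy: ASCII is a 7-bit encoding, so bytes above 0x7F do not survive.
    public static byte[] RoundTripAscii(byte[] data)
    {
        return Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(data));
    }

    // Lossless: Base64 maps every 3 bytes to 4 printable characters.
    public static byte[] RoundTripBase64(byte[] data)
    {
        return Convert.FromBase64String(Convert.ToBase64String(data));
    }

    // Length of the Base64 text for a given byte count, including '=' padding.
    public static int Base64Length(int byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }
}
```

BinaryText.Base64Length(74) returns 100, which matches the figure quoted above.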

Obviously, these are not desirable results for such a short input. However, the library compresses short strings quickly, so I believe this could be extended into a method that returns either the compressed or the uncompressed data, along with a flag indicating which one was smaller.
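The idea of falling back to the uncompressed form can be sketched as follows. This is only one possible design, under stated assumptions: it works on byte arrays rather than strings, stores the flag as a single leading byte, and takes the compressor as a delegate so that any byte[]-to-byte[] routine (for example, a wrapper around the BZip2 streams) can be plugged in. The SmallPayload class and its members are hypothetical names.

```csharp
using System;

public static class SmallPayload
{
    public const byte Stored = 0;      // payload follows uncompressed
    public const byte Compressed = 1;  // payload follows compressed

    // Packs data as [flag][payload], keeping whichever form is smaller.
    // 'compress' is any byte[] -> byte[] compressor supplied by the caller.
    public static byte[] Pack(byte[] data, Func<byte[], byte[]> compress)
    {
        byte[] candidate = compress(data);
        bool useCompressed = candidate.Length < data.Length;
        byte[] payload = useCompressed ? candidate : data;

        byte[] packed = new byte[payload.Length + 1];
        packed[0] = useCompressed ? Compressed : Stored;
        Buffer.BlockCopy(payload, 0, packed, 1, payload.Length);
        return packed;
    }

    // Reverses Pack, calling 'decompress' only when the flag says so.
    public static byte[] Unpack(byte[] packed, Func<byte[], byte[]> decompress)
    {
        if (packed == null || packed.Length == 0)
        {
            throw new ArgumentException("Missing flag byte.", "packed");
        }
        byte[] payload = new byte[packed.Length - 1];
        Buffer.BlockCopy(packed, 1, payload, 0, payload.Length);
        return packed[0] == Compressed ? decompress(payload) : payload;
    }
}
```

With this layout, the 43-byte sample would be stored as 44 bytes (flag plus original data) instead of 74 bytes of BZip2 output. The price is that every input is compressed once just to measure it.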

Uncompression

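Now of course, to test our code above, we need some uncompression code. I will put all the code together, since it's pretty much the same, just using a BZip2InputStream instead of a BZip2OutputStream, and Read instead of Write. The uncompressed size is not known in advance, so we read in chunks until Read returns 0:

// Feed the input stream the compressed bytes themselves, not a decoded string.
MemoryStream msUncompressed = new MemoryStream(msCompressed.ToArray());
BZip2InputStream zisUncompressed = new BZip2InputStream(msUncompressed);
// Length does not report the uncompressed size, so collect the output
// in a second MemoryStream, one chunk at a time.
MemoryStream msResult = new MemoryStream();
byte[] bytesChunk = new byte[4096];
int count;
while ((count = zisUncompressed.Read(bytesChunk, 0, bytesChunk.Length)) > 0)
{
    msResult.Write(bytesChunk, 0, count);
}
zisUncompressed.Close();
msUncompressed.Close();
string sUncompressed = Encoding.ASCII.GetString(msResult.ToArray());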

Now, a quick check on sUncompressed should reveal the original string intact, with no files involved. If you want to load the data from a file instead, there are a few ways you can do it, and I leave those to your imagination.

Closing

Special thanks to the developers at ICSharpCode.Net for providing this awesome library free to the public which makes this article possible. I have no affiliation with ICSharpCode.Net, so I hope I have not breached anything in posting this article.

I hope you all find this as useful as I have!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Astaelan
Web Developer
Canada
Member
Short and simple: I'm a self-contracted programmer, and my strongest programming skills are in C/C++ and C#/.NET. I have a knack for porting C algorithms to C#.


Comments and Discussions

 
Answer | Re: How to Zip more than one file | changcn | 21 Jun '04 - 14:46
Download SharpZipLib and see the samples.
Answer | Re: How to Zip more than one file | MunkieFish | 26 Dec '06 - 7:39
I know right? Why can't they just make it real simple?
Why should I have to deal with adding CRC checksums to the zip files and all that crap?
 
This is all I want:
 
ZipFile zipFile = new ZipFile();
zipFile.AddFile(memoryStreamToCompress, "ExampleFileName.xml");
zipFile.AddFile(byteArrayToCompress, "ExampleLogFile.txt");
 
MemoryStream compressedMemoryStream = zipFile.Save(CompressionLevel.Maximum);
 
I know the functionality is there... but come on, guys, make a better API. Think about making some common scenarios dead simple to do... and use enums!
 
If anyone has a wrapper out there that can do what I listed above, I would love to have it. Otherwise, one of these days I'm going to have to sit down and do it myself.
 
Thanks!
-Jules
General | Security Exception | Eric Coll | 8 Jun '04 - 9:20
When trying to decompress in an IE hosted control running on the intranet security zone, a security exception is thrown.
 
It's not possible to modify the client local security policy.
 
Why would a 100% managed code decompression routine throw a security exception?
 
Any thoughts?

General | Re: Security Exception | Astaelan | 10 Jun '04 - 9:36
I can only guess one of two things.
1) I assume you are using MemoryStream, but if you are using a FileStream instead, then the security policy may well (by default) deny certain code domains access to the filesystem. This is a known "feature" of managed code, to prevent unauthorized access to files. The fix should be to digitally sign your code and add the certificates to the client machines so that they allow the program the access it requires. This assumes a final build, of course, since I believe the digital signature changes with each recompilation.
2) Since the code runs, and it's only an exception, dig for the inner exception and see what is causing it. I'm sure you've stepped through the code, but try again. Have you tried running the code outside an IE hosted control to make sure you have it working properly? I've heard a few people report they couldn't get it working, but I'm fairly certain those are end-user errors, because it works fine for me and quite a few others. The only reason I can suggest for an IE hosted control throwing a security exception is an attempt to access the filesystem without the proper security policy. I can't help much more without detailed exception information and perhaps a chunk of the code that causes the issue.
 
Best of luck,
-Shane
General | Re: Security Exception | Eric Coll | 10 Jun '04 - 9:58
The way the security works in the IE hosted controls is a little bit odd.
 
As soon as a function which calls another assembly is called, the framework loads the assembly and performs a security check for the entire assembly, regardless of what class is being used.
 
Notice that this check happens before entering the function which calls the assembly, and not before the call to a function in the assembly. So, not even the first line of the outer function is called. Something similar happens when trying to use remoting.
 
In this particular case, some portion of SharpZipLib uses file access functions, so the entire assembly fails to load in the IE hosted control.
 
The exception thrown is a plain, uninformative security exception without any message and without an inner exception.
 
Using the source code and making an assembly just with the compression and uncompression routines, works fine. But I don't know if that is legal.
 
Thank you for your help and fast response,
 
Eric.
 

General | Re: Security Exception | Astaelan | 10 Jun '04 - 10:26
Aha, nice catch, Eric. It never even occurred to me that at a global level, they may reserve files, or open them strictly for the purpose of throwing the exception.
In my opinion, this is kind of strange: since the library works fine using MemoryStreams, there should not be a dependency on any files.
Upon searching for a little more information now, I see that there is more information on this subject popping up. See here for something by someone else:
 
http://weblogs.asp.net/cfranklin/archive/2003/12/13/43355.aspx
 
Again, his code relies on the SharpZipLib library.

As far as the terms of the SharpZipLib license go, here is something from their website:
 
"Linking this library statically or dynamically with other modules is making a combined work based on this library. Thus, the terms and conditions of the GNU General Public License cover the whole combination."
 
"As a special exception, the copyright holders of this library give you permission to link this library with independent modules to produce an executable, regardless of the license terms of these independent modules, and to copy and distribute the resulting executable under terms of your choice, provided that you also meet, for each linked independent module, the terms and conditions of the license of that module. An independent module is a module which is not derived from or based on this library. If you modify this library, you may extend this exception to your version of the library, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version."
 

In simplest terms, this means you can link the library as-is in commercial projects. If you choose to strip out functionality from the library to support what you require, then your project must be licensed under the same terms as SharpZipLib itself, which is the GNU GPL.
 
[edit]
After reading the terms more closely, I think you can modify their code, strip out what you need, and still use it in commercial projects. To be exact: "If you modify this library, you may extend this exception to your version of the library, but you are not obligated to do so."
In that case, you're well within your rights to strip out the troublesome file IO, cut back to only the compression/decompression routines you require, and probably gain a little speed by dropping the excess IO. You said it works, so great job; it sounds like a worthwhile effort to strip out and profile those routines and redistribute the library under the same license.
[edit]
 
If your work is a commercial project, in this specific case, I would contact the authors of SharpZipLib, explain the situation, and offer them a slimmed-down, memory-only version that they may be willing to release as a DLL which can be linked in commercial projects. If it were me, I'd happily do it, because they have crippled their own library without realizing it, probably because they don't use remote IE hosted controls.
 
Kudos on finding the problem in their library. And thanks for also confirming that the code otherwise works for you. A lot of people have experienced some strange problems that I haven't been able to replicate.
 
I have been thinking of writing a completely opensource public domain compression library. If you are interested in getting involved, I would appreciate someone with your experience on the matter. Feel free to email me.

General | Something is wrong | dorutzu | 14 May '04 - 3:42
Hi!
 
I tried your code (exactly), but something happens when I try to uncompress. It just fails at
 
BZip2InputStream zisUncompressed = new BZip2InputStream(msUncompressed);
 
Any idea what is going on? Maybe you could, after all, add some sample code in a compilable form, to show that it really works.
 
Thanks!

General | Re: Something is wrong | Astaelan | 14 May '04 - 9:35
Hello, I hope you got it working, but in case you haven't, I'd like to say that this code works as-is, if you copy/paste it. The code here is copied directly from a test program I was working on that still works right now. Of course, apply it to your own needs, but the code works fine. If you are having a problem with allocating new memory, I suggest encasing the problematic code into a try/catch block, and see what exception is being thrown.
If you still can't figure it out, I will write another test program and zip up the code for an example, and upload it. I have tested this on 2 machines, running the 1.1 framework and no problems... Catch your exception and post it, that would be much more helpful in tracking down the issue.
 
Some additional details would be helpful... Does the program stall before it fails on this line? Any warnings at all during the build? How big is the compressed/uncompressed data in question? Does compressing work for you? If you used another program to compress the data, what was it? (may need to contact the ziplib people if there is a bug in their library)
General | Re: Something is wrong | dorutzu | 14 May '04 - 11:47
Yes, I did finally get it working. The problem was, I think, that the MemoryStream's position wasn't set at the beginning.
msUncompressed.Seek(0, SeekOrigin.Begin);

I have changed it a lot since then, because I need it to work a little differently, but I think this is the first change I made to get it working ;-)
 
Thanks for your quick answer.
 
Doru K.

General | Re: Something is wrong | OferBB | 2 Jun '04 - 0:24
The problem appears in my code too!
I attached a simple program whose main button compresses a string, then decompresses it and displays it.
The program fails at the same line:
 
BZip2InputStream zisUncompressed = new BZip2InputStream(msUncompressed);
 
If it's running under a WinForms app, from the IDE, the app "hangs" and only Debug -> Stop Debugging helps, even Ctrl+Break doesn't.
If it's running under ASP.NET, the aspnet_wp process utilizes the CPU to its maximum.
 
Because I can't attach things in this forum, I uploaded my program to: http://www.pixiesoft.com/compression.zip so you can see it.
2 notes:
1) It's in VB.NET, but it's a good conversion from C#.
2) You need to update the reference to the SharpZipLib.

Last Updated 21 Jun 2004
Article Copyright 2004 by Astaelan
Everything else Copyright © CodeProject, 1999-2013