Introduction
This article presents string compression with optional, reasonably strong encryption in pure VB.NET code; no external tools are required.
The class can easily be integrated into existing projects. Because the code is kept simple, it is suitable for beginners and can easily be converted to C#.
Background
Needing a routine to quickly and safely deflate and inflate big strings, I searched the net for a solution. I found nothing that covered everything I needed, so I decided to write this class module, which encapsulates all the functionality needed to complete the task.
Using the Code
Although strings of any length can be processed, compressing short strings (e.g. 'Hello World!') is counterproductive because the compressed result is even bigger than the input. The CompressionRatio property of the class tells you how effective the compression was. You can then decide whether to use the compressed string; if so, a prefix and suffix can be applied to it automatically so that compressed and uncompressed content can be told apart afterwards.
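The decision described in the paragraph above can be sketched as follows. This is only an illustration, not the class's own code: the marker strings "[gz]" and "[/gz]", all helper names, and the ratio definition (compressed length divided by original length) are assumptions, and the actual CompressionRatio property may be defined differently.

```vbnet
Imports System

Module MarkerSketch
    ' Hypothetical markers; choose strings that cannot occur at the edges of plain content.
    Private Const Prefix As String = "[gz]"
    Private Const Suffix As String = "[/gz]"

    ' Assumed definition: values below 1.0 mean the compressed form is shorter.
    Function Ratio(ByVal plain As String, ByVal shrunk As String) As Double
        If plain.Length = 0 Then Return 1.0
        Return shrunk.Length / plain.Length
    End Function

    ' Keep the compressed form only if it is shorter even with the markers attached.
    Function ChooseStored(ByVal plain As String, ByVal shrunk As String) As String
        Dim candidate As String = Prefix & shrunk & Suffix
        If candidate.Length < plain.Length Then Return candidate
        Return plain
    End Function

    ' True when the stored value carries both markers.
    Function IsMarked(ByVal stored As String) As Boolean
        Return stored.Length >= Prefix.Length + Suffix.Length AndAlso _
               stored.StartsWith(Prefix, StringComparison.Ordinal) AndAlso _
               stored.EndsWith(Suffix, StringComparison.Ordinal)
    End Function

    ' Strip the markers again before decompression.
    Function Unmark(ByVal stored As String) As String
        Return stored.Substring(Prefix.Length, stored.Length - Prefix.Length - Suffix.Length)
    End Function
End Module
```

A reader of the stored value would call IsMarked first and only decompress the result of Unmark when it returns True; unmarked values are used as they are.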
Process overview:
Plain text -> to byte array -> gzip compression -> encryption -> to base64 string = shrunk text
Shrunk text -> to byte array -> decryption -> gzip decompression -> to string = plain text
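The two pipelines above can be sketched with the framework's GZipStream and Base64 helpers. This is a minimal sketch that leaves out the optional encryption step; the module and function names (PipelineSketch, Shrink, Expand) are made up for illustration and are not taken from the class.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module PipelineSketch
    ' Plain text -> byte array -> gzip compression -> Base64 string
    Function Shrink(ByVal plain As String, ByVal enc As Encoding) As String
        Dim raw As Byte() = enc.GetBytes(plain)
        Using target As New MemoryStream()
            ' leaveOpen = True; closing the GZipStream flushes the gzip trailer into target.
            Using zip As New GZipStream(target, CompressionMode.Compress, True)
                zip.Write(raw, 0, raw.Length)
            End Using
            Return Convert.ToBase64String(target.ToArray())
        End Using
    End Function

    ' Base64 string -> byte array -> gzip decompression -> plain text
    Function Expand(ByVal shrunk As String, ByVal enc As Encoding) As String
        Dim packed As Byte() = Convert.FromBase64String(shrunk)
        Using source As New MemoryStream(packed)
            Using zip As New GZipStream(source, CompressionMode.Decompress)
                Using result As New MemoryStream()
                    Dim chunk(4095) As Byte
                    ' Read returns 0 only at the end of the stream.
                    Dim count As Integer = zip.Read(chunk, 0, chunk.Length)
                    Do While count > 0
                        result.Write(chunk, 0, count)
                        count = zip.Read(chunk, 0, chunk.Length)
                    Loop
                    Return enc.GetString(result.ToArray())
                End Using
            End Using
        End Using
    End Function

    Sub Main()
        Dim original As String = New String("a"c, 1000)
        Dim packed As String = Shrink(original, Encoding.UTF8)
        Console.WriteLine("{0} characters in, {1} characters out", original.Length, packed.Length)
        Console.WriteLine(Expand(packed, Encoding.UTF8) = original)
    End Sub
End Module
```

The same Encoding instance has to be used in both directions, which is why it is passed in explicitly.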
The code is simple to use. Here's the quick way to compress a string:
Dim CompStr As New clsCompressedString(System.Text.Encoding.UTF8)
CompStr.UnCompressed = "some large text content..."
MsgBox("The compressed string is: " & CompStr.Compressed)
... and the way back:
Dim CompStr As New clsCompressedString(System.Text.Encoding.UTF8)
CompStr.Compressed = "..."
MsgBox("The uncompressed string is: " & CompStr.UnCompressed)
Error handling is kept to a minimum. The class returns an empty string when it is fed corrupt data or given the wrong passphrase.
Optional encryption uses the .NET built-in RijndaelManaged class at its maximum key length (256 bits) with simplified usage: you just provide a single passphrase for encryption and decryption. The encryption key and IV are derived from the passphrase using SHA256 and MD5 hash values.
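One plausible reading of this scheme is sketched below. The mapping is an assumption based on the hash sizes: SHA256 yields 32 bytes, which fits a 256-bit key, and MD5 yields 16 bytes, which fits Rijndael's default 128-bit block size for the IV. The module and function names are hypothetical, and the class itself may differ in detail, for example in the text encoding applied to the passphrase.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module PassphraseCryptoSketch
    ' Assumed mapping: SHA256(passphrase) -> 256-bit key, MD5(passphrase) -> 128-bit IV.
    Private Function CreateCipher(ByVal passphrase As String) As RijndaelManaged
        Dim seed As Byte() = Encoding.UTF8.GetBytes(passphrase)
        Dim cipher As New RijndaelManaged()
        Using sha As SHA256 = SHA256.Create()
            cipher.Key = sha.ComputeHash(seed)
        End Using
        Using md As MD5 = MD5.Create()
            cipher.IV = md.ComputeHash(seed)
        End Using
        Return cipher
    End Function

    Function Encrypt(ByVal data As Byte(), ByVal passphrase As String) As Byte()
        Using cipher As RijndaelManaged = CreateCipher(passphrase)
            Using transform As ICryptoTransform = cipher.CreateEncryptor()
                Return transform.TransformFinalBlock(data, 0, data.Length)
            End Using
        End Using
    End Function

    ' Returns Nothing when decryption fails. A wrong passphrase or corrupt data
    ' usually surfaces as a CryptographicException (invalid padding).
    Function Decrypt(ByVal data As Byte(), ByVal passphrase As String) As Byte()
        Try
            Using cipher As RijndaelManaged = CreateCipher(passphrase)
                Using transform As ICryptoTransform = cipher.CreateDecryptor()
                    Return transform.TransformFinalBlock(data, 0, data.Length)
                End Using
            End Using
        Catch ex As CryptographicException
            Return Nothing
        End Try
    End Function
End Module
```

Deriving the IV from the passphrase keeps usage simple, but the same input always produces the same ciphertext. A hardened variant could derive the key with Rfc2898DeriveBytes and a random salt, at the cost of having to store the salt alongside the data.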
The demo project shows all features available.
Points of Interest
Because string conversions are involved, text encoding has to be handled properly. Otherwise some or all characters can get corrupted during compression/decompression, depending on the content you try to compress/decompress.
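A small sketch of what goes wrong when the encoding cannot represent the content. The sample text and the module name are made up for illustration; only standard System.Text calls are used.

```vbnet
Imports System
Imports System.Text

Module EncodingSketch
    ' Converts text to bytes and back with the given encoding.
    Function RoundTrip(ByVal value As String, ByVal enc As Encoding) As String
        Return enc.GetString(enc.GetBytes(value))
    End Function

    Sub Main()
        Dim original As String = "Größe: 5 €"

        ' UTF-8 can represent every character, so the text survives unchanged.
        Console.WriteLine(RoundTrip(original, Encoding.UTF8) = original)

        ' ASCII replaces every character it cannot represent with "?".
        Console.WriteLine(RoundTrip(original, Encoding.ASCII))

        ' Decoding with a different encoding than the one used for encoding also garbles the text.
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(original)
        Console.WriteLine(Encoding.ASCII.GetString(utf8Bytes) = original)
    End Sub
End Module
```

This is why the class constructor takes the encoding as a parameter: both directions must agree on it, and it must be able to represent the content.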
Why Not Use ICSharpCode.SharpZipLib?
Well, you can easily alter the compression routines in the class to use ZipLib. I experimented with that, and it turned out that ZipLib (0.85.4.369) is at most 7% more efficient than the built-in GZip. To get this slightly better compression, you have to set ZipLib to the highest compression level (9). But that comes at a price: at the highest level ZipLib is very slow compared to GZip and takes several times longer to compress a huge string. So I prefer GZip for this task: it is fast and reliable, it doesn't require linking to additional DLLs, and I avoid the licensing and security issues that can come with comprehensive third-party code.
Preferences could change when it comes to binary file compression. Maybe ZipLib outperforms GZip there, but binary file compression was not the task in this case.
History
- 1st July, 2008: This is the first version. Participate and help to optimize and extend the code.
Comments
First of all, thank you very much for this code example! It helped me dramatically reduce the size of xml data sent over the web!!
I think I have found a small bug though:
In Private Sub Decompress():
Dim sizeBytes(3) As Byte
objMemStream.Position = objMemStream.Length - 5
objMemStream.Read(sizeBytes, 0, 4)
should read: objMemStream.Length - 4
because the last 4 bytes of the compressed string hold the length of the uncompressed input string.
Usually the code will work nevertheless for small strings by using the "wrong" bytes because it is likely that a much bigger size of the output byte array will be calculated since the 5th-last byte is unlikely to be 0.
Regards,
Florian
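The trailer layout described in this comment can be checked with a short sketch. It relies on the gzip format storing the uncompressed length (modulo 2^32) as a little-endian value in the last 4 bytes of the stream. The module and function names are hypothetical and are not part of the class.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Compression

Module GZipTrailerSketch
    ' Compresses a byte array into a complete gzip stream.
    Function Pack(ByVal raw As Byte()) As Byte()
        Using target As New MemoryStream()
            Using zip As New GZipStream(target, CompressionMode.Compress, True)
                zip.Write(raw, 0, raw.Length)
            End Using
            Return target.ToArray()
        End Using
    End Function

    ' Reads the length field from the last 4 bytes (offset Length - 4), little-endian.
    Function ReadUncompressedSize(ByVal gzipData As Byte()) As Integer
        Dim last As Integer = gzipData.Length - 1
        Return CInt(gzipData(last - 3)) Or _
               (CInt(gzipData(last - 2)) << 8) Or _
               (CInt(gzipData(last - 1)) << 16) Or _
               (CInt(gzipData(last)) << 24)
    End Function

    Sub Main()
        Dim packed As Byte() = Pack(New Byte(999) {})     ' 1000 zero bytes
        Console.WriteLine(ReadUncompressedSize(packed))   ' prints 1000
    End Sub
End Module
```

Starting the read at Length - 5 instead shifts the window by one byte, so the decoded size no longer matches the original length.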
Fridolin, This is true. Otherwise your decompressed string will be much larger than the original because it will be padded at the end with null bytes ('\0'). I'm not sure how it affects this program because I noticed this in a modified version of the decompress function.
Regardless, thanks to BTDex for providing this instructive code.
Thanks! (Member 1951610, 21:37 2 Jun '09)
I had my compression and decompression methods written already. I thought they were working flawlessly then I had an issue with the encoding. I found your example very helpful, and I like the idea of the encryption option as well. The key that I was missing was the System.Convert.ToBase64String. That was the trick to make my code work in the environment I was trying to use it.
Thanks 
Paradox (Mr.PoorEnglish, 0:29 18 Jul '08)
AFAIK String is a bad type to hold compressed data. Each character takes 2 bytes, and a Base64 string only knows 64 different characters, which blows up the data size once more by a factor of about 1.33 (4 characters per 3 bytes). Better keep the compressed data as an unread stream; you can read from it when necessary (for large data, using a buffer is recommended). What can you do with the "compressed" Base64 string? You can convert it to Byte() and then decompress. Hmm, why not hold the bytes in the first place?
The subject of this article is to shrink a large sequence of characters into a smaller one without losing the information content of the original source. What you state is correct and I agree, but you're missing the point in this case.
If you're working with xml then string is your primary input and output data type. Within your application data can be held in any format. But when it comes to data exchange via xml you'll have to convert to string in the end.
In practice I use this type of conversion to place xml serialized .net datasets as elements into xml documents. That way I secure and preserve the serialized datasets during their journey through different processes and data channels outside my influence. The prefix/suffix feature is a heritage of that.
BTDex wrote: But when it comes to data exchange via xml you'll have to convert to string in the end.
Yes. But I think I would do the data exchange with the compressed streams directly, not convert them to a Base64 string. That means the sender writes its data into an encryption stream, the encryption stream into a compression stream, and the compression stream into a NetworkStream. The receiver loads its XmlDocument from a decryption stream, which reads from a decompression stream, which reads from a NetworkStream.
Here is a sample of how one can chain streams together to get such transformer behaviour:
' Requires Imports System.IO, System.IO.Compression and System.Runtime.CompilerServices.
' The two click handlers belong in the form class; the WriteTo extension must live in a Module.
Private Sub btZip_Click(ByVal sender As Object, ByVal e As EventArgs) Handles btZip.Click
    Using ReadStream As New FileStream("Test.doc", FileMode.Open), _
          WriteStream As New FileStream("Test.Zip", FileMode.Create), _
          Zipper As New GZipStream(WriteStream, CompressionMode.Compress)
        ReadStream.WriteTo(Zipper)
    End Using
End Sub

Private Sub btUnZip_Click(ByVal sender As Object, ByVal e As EventArgs) Handles btUnZip.Click
    Using ReadStream As New FileStream("Test.Zip", FileMode.Open), _
          UnZipper As New GZipStream(ReadStream, CompressionMode.Decompress), _
          WriteStream As New FileStream("Test2.doc", FileMode.Create)
        UnZipper.WriteTo(WriteStream)
    End Using
End Sub

<Extension()> _
Public Sub WriteTo( _
        ByVal ReadStream As Stream, _
        ByVal WriteStream As Stream, _
        Optional ByVal BytesToRead As Long = -1, _
        Optional ByVal Bufsize As Integer = Byte.MaxValue)
    Dim Buf(Bufsize - 1) As Byte
    If BytesToRead < 0 Then
        If ReadStream.CanSeek Then
            BytesToRead = ReadStream.Length
        Else
            ' ReadStream.Length property not available: copy until Read returns 0.
            ' A short read alone does not mean the end of the stream has been reached.
            Do
                Dim Count As Integer = ReadStream.Read(Buf, 0, Buf.Length)
                If Count = 0 Then Exit Do
                WriteStream.Write(Buf, 0, Count)
            Loop
            Return
        End If
    End If
    Do While BytesToRead > 0
        Dim Portion As Integer = ReadStream.Read(Buf, 0, CInt(Math.Min(BytesToRead, Bufsize)))
        If Portion = 0 Then Exit Do ' stream ended before BytesToRead bytes were copied
        WriteStream.Write(Buf, 0, Portion)
        BytesToRead -= Portion
    Loop
End Sub
I think one could also apply that principle to encryption streams and NetworkStreams (which may be a little more complicated).
I cannot see how you want to reduce the compressed size of specific xml elements within xml files by combining streams the way you suggest. If the precondition is to produce or extend xml files without violating xml standards, then all the (binary) streaming must result in text somehow, as xml is a text-only data format.
Compressing the xml file in total and sending it over a network stream is of no use to me, as my xml files are exchanged between different applications written in different languages passing various filtering and transforming procedures (on- and offline). Sticking to the rules that apply to xml therefore is a must to ensure interoperability between the platforms.
But let's get back to this article:
The task here was long string in and short string out while preserving all the information of the original long string, with safe encryption included as a secondary task.
So how can we use your code/idea to improve the efficiency of the solution of the specific task in this article?
Sorry, sometimes (or some more times) I'm slow in understanding. Yeah, if the receiver may be an application that is not based on the framework, I agree with your approach.
Regards
Thx BT for this post and your help.
Have a nice one!
"Nothing is lost, Nothing is created, Everything is transformed" Lavoisier
http://wlwilliamsiv.com