Click here to Skip to main content
15,888,816 members
Articles / Operating Systems / Windows
Article

QuickLZ Pure C# Port

Rate me:
Please Sign up or sign in to vote.
4.73/5 (16 votes)
23 Dec 20064 min read 129.5K   1.2K   33   49
This article describes a C# port of the new QuickLZ compression algorithm

Introduction

It seems there is no end to the porting of good C/C++ algorithms to .NET, and so here comes another one.  This algorithm comes courtesy of Lasse Mikkel Reinhold, and is called QuickLZ.

You can read more about QuickLZ at the website, however I will outline some of the aspects of QuickLZ as I go.

What IS QuickLZ?

It comes as no surprise if you haven't heard of QuickLZ, it is a new compression algorithm particularily well suited for psuedo streaming (packet compression).  It was originally released early November 2006, and updated to 1.10 later that month.  The porting is based on 1.10, which as of the writing of this article boasts an incredible 20-30% improvement over other well known algorithms such as LZO1X and LZF.  Complete benchmark information on the C implementation can be found at the above linked website, however here is an overall assessment:

QuickLZ 1.10LZO 1X 2.02LZP-1LZF 1.6ZLIB-1 1.2.3
Compressed Size61.0%60.2%59.9%60.7%46.4%
Compression MB/Sec14881.860.460.97.45
Decompression MB/Sec38030789.3198120


If you are reading this article, there is a chance you also read my article including a port of MiniLZO.  Based on the numbers here, it's easy to see why LZO was my previous choice.  However, QuickLZ is now 19.2% faster on average than LZO on decompression, and an astonishing 44.7% faster on compression.  And approximately 0.8% less efficient on final compression.  This large improvement by QuickLZ made it a prime candidate for porting to .NET and while it doesn't totally replace my minilzo port, it does give a better alternative for those who don't need the LZO format.

Things To Remember

While I was porting the code, which was incredibly clear and easy to port, I noticed a number of areas for improvement.  This specific iteration of the port is a first revision and is based on the optimizations for better ratio working with data > 100kb in size.  A more refined and "Minimalistic" version of the algorithm is possible, and may come as a complimentary article specifically regarding packet compression.  Meanwhile, this class serves as a general use class for both files and packets that should outperform other managed options.  I have not profiled this port to C# whatsoever, and I know that for simplicity of the public interface there is some additional overhead.  If you must make the Unsafe versions public and use them, make sure you understand the code before you cry to me about it not working :)

A comment made during the port of MiniLZO was how the original uncompressed size was not easy to attain from the compressed data, so prepending the size was requested but would ultimately break compatibility with the original LZO1X implementation.  Based on the standard implementation of QuickLZ, I have provided a public method to find the original uncompressed size quickly and painlessly.

Without Further Ado...

So let's look at how to use the safe(?) interface to the implementation.  All source examples are psuedo, and assume knowledge of basic C# operations.

byte[] data = File.ReadAllBytes("...");
byte[] compressed = QuickLZ.Compress(data, 0, (UInt32)data.Length);

This little snippet shows the basic idea of obtaining some data, in this case loading it from a file, and passing it into the compression method, which takes a Byte[] for the buffer to read from, a starting offset within the buffer to start at, and the length from the starting point to include in compression.  This is useful if you need to work with a packet, where packet headers may not be included in the compression/decompression routines.  The return value is an exact size Byte[] stripped of the extra scratch space allocated during compression.  It should be noted that QuickLZ imposes a requirement that during compression the destination buffer is required to be the length of the original data plus an additional 36000 bytes for temporary workspace.  This implementation copies the temporary destination buffer into an exact size buffer for final return.

UInt32 size = QuickLZ.GetDecompressedSize(data, 0);

This little snippet here shows how you can manually obtain the size of the compressed data out of the Byte[] directly.  The second argument, again, reflects where the actual compressed data headers begin.  It should be noted that this method is never really needed, but is provided for completion.

byte[] decompressed = QuickLZ.Decompress(compressed, 0);

This last snippet is fairly self explanitory by this point. It takes a Byte[] for the compressed data, and a starting point once again.  It returns an exact size buffer like Compress, however does not have the added overhead of copying from a temporary destination, as the exact uncompressed size is known ahead of time, which is why the GetDecompressedSize method is not required.

Conclusion

QuickLZ as a new alternative to the compression scene and seems to currently be the fastest within this category of compression.  I hope you will find this port useful, and as always if you find any bugs or bottlenecks that degrade performance please let me know.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


Written By
Web Developer
Canada Canada
Short and simple, I'm a self contracted programmer, my strongest programming skills are in C/C++ and C#/.NET. I have a nack for porting C algorithms to C#.

Comments and Discussions

 
GeneralCompression Pin
waheedahmedb31-Mar-07 4:45
waheedahmedb31-Mar-07 4:45 
GeneralRe: Compression Pin
Astaelan31-Mar-07 8:30
Astaelan31-Mar-07 8:30 
GeneralWell done Pin
ronnotel16-Jan-07 3:47
ronnotel16-Jan-07 3:47 
GeneralThis should be a stream implementation to follow .NET semantics. Pin
Dale Thompson8-Jan-07 5:23
Dale Thompson8-Jan-07 5:23 
GeneralRe: This should be a stream implementation to follow .NET semantics. Pin
Ri Qen-Sin13-Jan-07 18:16
Ri Qen-Sin13-Jan-07 18:16 
Generalversion without unsafe code Pin
Colby Dillion27-Dec-06 20:42
professionalColby Dillion27-Dec-06 20:42 
GeneralRe: version without unsafe code Pin
Astaelan28-Dec-06 8:38
Astaelan28-Dec-06 8:38 
GeneralRe: version without unsafe code Pin
Colby Dillion28-Dec-06 10:48
professionalColby Dillion28-Dec-06 10:48 
Guessing about optimization is one of those 7 deadly sins isn't it? It will always come back to bite you. Wink | ;)

Before you read any further, please note that I am not comparing code, algorithms, or "real world" situation tests. I am only attempting (very poorly) to make a point that array access is very rarely significantly slower than using unsafe pointers. As I interpret it, an unsafe pointer is accessing managed memory using a C style pointer without the bounds and possibly type checking of array access.

Don't be nearly so quick to write off the capabilities of the JIT optimizer, it has surprised me (and made me feel dumb) on more than one occasion. I have found that it handles array access very well and only does checks as necessary. I've ported lots of C code to C# and there have only been one or two situations that using unsafe methods proved to be faster than the array access that I was able to come up with. Just to fight boredom over my lunch break I ran a few test using Stanislav Panasik's test app, modified to fixed a bug where the reading FileStream was not being closed and disposed of in a timely manner. I renamed my class to QuickLZ2 and uncommented the following lines 1 at a time in Tester.cs.

Byte[] ResultBytes = QuickLZ.Compress(SourceBytes, 0, (uint)tr.lSourceBytes);
Byte[] ResultBytes = QuickLZ2.Compress(SourceBytes);
Byte[] ResultBytes = QuickLZ.Decompress(SourceBytes, 0);
Byte[] ResultBytes = QuickLZ2.Decompress(SourceBytes);


I ran 3 series of tests against a couple of binaries I had on my machine. Code changes were limited to changing a single line of code for each test. All were done using a Release mode build and I restarted my machine before each test. There was no other software running at the time, background or foreground, other than the normal Windows processes. All that said, this is a pretty rudimentary test and in no way conclusive or at all scientific.

Before each series I ran the Compress/Decompress code 3 times. To get my measurements I ran the test 14 times, dropped the 2 highest and 2 lowest scores and averaged the remaining 10. I checked to ensure that the compression ratios were the same for both versions of code. This involved modifying the sequence length variable in my code to match yours. Sorry for the difficulty formatting the data.

			Array Access			Unsafe Pointers
		Ratio	Compression	Decompression	Compression	Decompression
  412Kb		70.08%	48.45Mbps	68.45Mbps	32.51Mbps	70.23Mbps
 7.27Mb		65.87%	48.33Mbps	63.32Mbps	33.74Mbps	67.39Mbps
18.64Mb		58.95%	45.35Mbps	59.19Mbps	37.43Mbps	65.59Mbps


As you can see, at least on these 3 binary application files, compression was always faster using the "slow" array access code. For some reason, which I still point the finger at the memcpy_up method, decompression was always slower. After looking at your code, the final array copy into the destination buffer of the Compress method could be smudging the results slightly. Still, I am doing an Array.Resize at the end of my Compress method, so this should not be a deciding factor for the results that I gathered.

Really the test needs to be run again using packet sized data just to see the results of that test. Unfortunately it is hard to find data of that size without generating it, which would reflect a real world situation even less than these tests. I see a few possible hot spots glaring in your code, but I think I made my point already that array access is not always slower than pointer access, at least by any significant margin, and in many cases it can be faster.

As a side note, I ran the last series of tests again using Debug mode. The array access code was faster by a very significant margin for compression and decompression both. Makes you wonder what kind of magic actually happens behind the scenes. I have been wanting to finish an article that I started some time ago comparing array access to unsafe pointers, maybe now would be a good time seeing as how I have two comparable code bases to work with.

I must say that reading your response to my earlier message was a true pleasure and that I look forward to corresponding further.

Colby
GeneralRe: version without unsafe code Pin
Astaelan28-Dec-06 12:13
Astaelan28-Dec-06 12:13 
GeneralRe: version without unsafe code Pin
Colby Dillion28-Dec-06 15:53
professionalColby Dillion28-Dec-06 15:53 
GeneralRe: version without unsafe code Pin
Astaelan1-Jan-07 1:04
Astaelan1-Jan-07 1:04 
GeneralRe: version without unsafe code Pin
Colby Dillion1-Jan-07 9:05
professionalColby Dillion1-Jan-07 9:05 
GeneralRe: version without unsafe code Pin
Astaelan1-Jan-07 9:23
Astaelan1-Jan-07 9:23 
QuestionIs there a way to detect Network Printers' IP addresses? Pin
Jan Palmer25-Dec-06 21:03
Jan Palmer25-Dec-06 21:03 
NewsCompression ratio for small data sources Pin
rlasse25-Dec-06 8:48
rlasse25-Dec-06 8:48 
GeneralRe: Compression ratio for small data sources Pin
Astaelan25-Dec-06 17:15
Astaelan25-Dec-06 17:15 
GeneralInteresting Pin
Stanislav Panasik24-Dec-06 22:21
Stanislav Panasik24-Dec-06 22:21 
GeneralRe: Interesting Pin
Septimus Hedgehog24-Dec-06 23:43
Septimus Hedgehog24-Dec-06 23:43 
GeneralRe: Interesting Pin
Stanislav Panasik25-Dec-06 5:32
Stanislav Panasik25-Dec-06 5:32 
GeneralRe: Interesting Pin
Astaelan25-Dec-06 6:42
Astaelan25-Dec-06 6:42 
AnswerRe: Interesting Pin
Stanislav Panasik25-Dec-06 7:06
Stanislav Panasik25-Dec-06 7:06 
GeneralRe: Interesting Pin
Astaelan25-Dec-06 7:19
Astaelan25-Dec-06 7:19 
GeneralRe: Interesting Pin
Stanislav Panasik25-Dec-06 23:22
Stanislav Panasik25-Dec-06 23:22 
GeneralRe: Interesting Pin
Ri Qen-Sin13-Jan-07 18:20
Ri Qen-Sin13-Jan-07 18:20 
GeneralRe: Interesting Pin
rlasse25-Dec-06 9:36
rlasse25-Dec-06 9:36 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.