|
As I mentioned, I have modified the stock example and it produces better results.
Why do you keep posting results that use the original algorithm?
That is clearly slower and less effective than your algorithm.
Let me give you a hint: rewrite your algorithm in C++/CLI, because it can be much faster.
Then, use it from C#. You'll probably get better results too.
Regards
NT
|
|
|
|
|
You just wrote "So I can't back your claim that QLZ outcompetes LZF, as it has a worse compression ratio.".
Assuming that the official LZF C# version and your CLI version produce the same output, let's agree that that statement was clearly wrong. Every compression algorithm has anomalies like your .doc file, so stop comparing compression ratios based on a single file.
As for speed, I won't dispute that your CLI version performs better than the official LZF C# version (although it's still slower than QLZ C#). I even have an old assembler version of 1.00 that outperforms the C version of 1.40 in speed...
|
|
|
|
|
Yes, but a C++/CLI assembly is as easy to consume from C# as a C# DLL, whereas assembler generally isn't.
You're right that I compared only one file, but unlike you, I don't have time to run as many benchmarks.
I'm merely providing a pointer that others may want to follow if they read this article.
Anyway, great job on QuickLZ and thanks for sharing it.
Regards
NT
|
|
|
|
|
It all depends on what you're compressing. For example, a video file compresses poorly because it's already compressed. If you're only compressing .doc files, then sure, why not go with LZF? But if you're compressing text files (or anything that is text encoded, such as .html and .xml files), which I think was what the first test was using, then you'd want to go with QuickLZ.
|
|
|
|
|
I have just ported level 1 of version 1.40 and a beta is available on www.quicklz.com.
It's clean C# without unsafe code, and its compression speed is 20-30% faster than this port's.
|
|
|
|
|
How can I compress a huge file (over 2 GB) using this library?
It seems that there is only one method, which takes the entire file as a parameter and adds the header.
Is there something that could be done to allow multiple reads/writes, like the framework's built-in compression streams for zip?
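For what it's worth, one common workaround (this is not a feature of the port itself) is to compress the file in fixed-size blocks and length-prefix each block, so it can later be read back and decompressed one block at a time. A rough sketch, using the Compress overload shown in the tester code quoted further down this thread:
// needs using System.IO;
static void CompressFileInBlocks(string inputPath, string outputPath)
{
    const int blockSize = 1 << 20;   // 1 MB per block (arbitrary example size)
    using (FileStream input = File.OpenRead(inputPath))
    using (BinaryWriter output = new BinaryWriter(File.Create(outputPath)))
    {
        byte[] buffer = new byte[blockSize];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            byte[] block = QuickLZ.Compress(buffer, 0, (uint)read);
            output.Write(block.Length);   // 4-byte length prefix so the reader knows where each block ends
            output.Write(block);
        }
    }
}
The reader would then loop the other way: read a 4-byte length, read that many bytes, and pass them to Decompress.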
|
|
|
|
|
What can I do to make it work with a Pocket PC?
|
|
|
|
|
|
This port assumes that unaligned memory access is allowable if the processor is little endian.
For little endian processors that disallow unaligned access (such as the Hitachi SH4), simply invert all occurrences of BitConverter.IsLittleEndian, that is, replace it with !BitConverter.IsLittleEndian.
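Purely as an illustration of where that change goes (the method name is borrowed from elsewhere in this thread; the real port's layout may differ):
// needs compiling with /unsafe
static unsafe uint FastRead4(byte* p)
{
    // Original: the single (possibly unaligned) 32-bit load is taken when the CPU is little endian.
    // SH4 fix: invert the test so the alignment-safe byte-by-byte path is used instead.
    if (!BitConverter.IsLittleEndian)          // was: if (BitConverter.IsLittleEndian)
        return *(uint*)p;
    return (uint)(p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24));
}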
|
|
|
|
|
I tried this class to compress a stream of random bytes. I started with 3 KB and tested random sizes all the way up to 10 MB, and surprisingly, not on a single occasion did the size decrease.
It seems to work better with larger sizes: from 3 KB to 40 KB it was almost doubling the size after "compressing", but somewhere close to 10 MB it would only add a hundred bytes. In all cases it never compressed.
one
|
|
|
|
|
In all fairness, high entropy random bytes generally cannot be compressed. However, if you feel your data SHOULD be compressible, then please refer to www.quicklz.com and see if the original C code does any better. If you achieve the same final results, then it's safe to assume that QuickLZ is not the algorithm for you. It should be noted that the algorithm QuickLZ uses can bloat high entropy data beyond its normal size because it has to keep dictionary and header information.
If you have suggestions, please talk to the original author, who may be able to assist in improving the algorithm for your needs. Generally speaking the algorithm works quite well and quickly, but as with most compression routines (ZIP included), high entropy data (random, previously compressed, etc.) will always have a very low compression ratio, if not an outright increase in size, due to overhead in the compression algorithm itself. On a side note, you might try my minilzo C# port to see if it suits your needs any better.
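For what it's worth, the usual workaround is just to keep the raw bytes whenever the "compressed" output comes out larger. A minimal sketch, assuming the Compress overload shown in the tester code later in this thread:
static byte[] CompressOrStore(byte[] input, out bool wasCompressed)
{
    // Try to compress; if the output isn't smaller (typical for high-entropy input),
    // keep the original bytes and record that in a flag of your own framing.
    byte[] compressed = QuickLZ.Compress(input, 0, (uint)input.Length);
    wasCompressed = compressed.Length < input.Length;
    return wasCompressed ? compressed : input;
}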
I'd also like to note that I'm not really supporting this project anymore, as someone else seems to have been working on one as well and was achieving similar results to my unsafe code with his fully managed solution.
Best of luck.
|
|
|
|
|
Thanks for the tight routine - it turned out to be very useful in relieving a transmission bottleneck in my application. Very much appreciated!
|
|
|
|
|
Just a suggestion: any .NET compression implementation should follow the same semantics as the framework. It should be modeled on a stream, like the GZipStream and DeflateStream classes.
Dale Thompson
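To illustrate the idea, a bare-bones wrapper might look something like the sketch below. QuickLZStream is hypothetical: it is write-only, buffers everything, and compresses in one shot on Dispose, so it is nowhere near a real streaming implementation - just the shape of the API being suggested.
// needs using System; using System.IO;
class QuickLZStream : Stream
{
    private readonly Stream output;
    private readonly MemoryStream buffer = new MemoryStream();

    public QuickLZStream(Stream output) { this.output = output; }

    public override void Write(byte[] data, int offset, int count)
    {
        buffer.Write(data, offset, count);   // accumulate until disposed
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            byte[] raw = buffer.ToArray();
            byte[] packed = QuickLZ.Compress(raw, 0, (uint)raw.Length);
            output.Write(packed, 0, packed.Length);
            output.Flush();
        }
        base.Dispose(disposing);
    }

    public override bool CanRead  { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override bool CanSeek  { get { return false; } }
    public override long Length   { get { throw new NotSupportedException(); } }
    public override long Position { get { throw new NotSupportedException(); } set { throw new NotSupportedException(); } }
    public override void Flush() { }
    public override int Read(byte[] b, int o, int c) { throw new NotSupportedException(); }
    public override long Seek(long o, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}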
|
|
|
|
|
Hmm… This could be plain simple, or a challenge. I'll give it a shot.
ROFLOLMFAO
|
|
|
|
|
I happened to run across the QuickLZ algorithm about a week ago. I have a bad habit of rewriting somewhat obscure code in C# (good practice?), and so I spent a few hours writing up a static class. Today a friend pointed me to your article, so I thought that I would share my code. My version is a straightforward rewrite of quick.c, but it does not use unsafe methods or pointers and could be used in a trusted environment.
The main thing holding back the decompression method is the memcpy_up method. I tried using Buffer.BlockCopy and Array.Copy, but they really bork things up. The actual method call to one of those may be more costly than a short byte-copy loop anyway. Most of the copies seemed to be 3 or 4 bytes long, so I wrote unrolled versions for those cases, which helped tremendously. I did the same for the FastRead and FastWrite methods, as the JIT wasn't really optimizing away the switch statements; that turned out to be a negligible performance gain.
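For anyone wondering why the BCL copies misbehave there: the LZ match copy is an overlapping copy where bytes just written must be readable again by later iterations (that's how an RLE-style match replicates data), whereas Array.Copy and Buffer.BlockCopy handle overlap as if the source had first been saved to a temporary buffer. A minimal sketch of the byte loop being discussed (names are illustrative, not necessarily those of either port):
static void MemcpyUp(byte[] buffer, int dst, int src, int length)
{
    // Copy forward one byte at a time: when dst - src < length, later iterations
    // deliberately re-read bytes written by earlier ones.
    for (int i = 0; i < length; i++)
        buffer[dst + i] = buffer[src + i];
}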
It is almost sad to think that I wasted 3 more hours of my life on code that I will never use. It is even more sad that I would never be able to use it because of the GPL license. Sad sad.
Anyway, take a look. I would love to hear your thoughts and suggestions. I haven't taken the time to do any benchmarks (just basic profiling), but I would love to see how it stacks up against the unsafe code.
http://www.metadojo.com/code/evilgpl/quicklz.zip
Colby
|
|
|
|
|
Greetings Colby,
Thanks for your interest. No matter what you do with C#/.NET, or GPL code in general, it is rarely a waste of time, especially when we're talking hours and not days.
"Safe" or managed .NET code for a situation like this is almost unbearable. That is to say, without the benefit of pointer arithmetic, you are adding a lot more overhead than an unsafe solution does.
Let me be clear about something (and I only found this out somewhat recently myself): using unsafe code does not mean that code which references the unsafe port I wrote must itself be marked unsafe. To elaborate, .NET allows you to write unsafe DLLs, and once compiled into an assembly, they are wrapped in a construct that makes them appear "safe" to any other software that references your unsafe DLL.
While I have not had a chance to look at your code yet (being Christmas and all, I'm kinda busy), I will state that a 100% safe and managed solution will end up considerably slower as a result of boundary checking and other overhead that does not apply to unsafe and fixed pointers. For example, garbage collection occurs a lot more frequently in managed code than in the unsafe version, because "fixed" pins the buffer in memory. Since it pins it during the entire compress and decompress process, little garbage collection or memory defragmenting occurs.
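As a small illustration of the pinning being described (general C# behaviour, not the port's actual method body):
static unsafe void CopyWithPointers(byte[] source, byte[] destination, int count)
{
    // While inside the fixed statement both arrays are pinned: the GC cannot move them,
    // so the raw pointers stay valid for the duration of the block.
    fixed (byte* src = source)
    fixed (byte* dst = destination)
    {
        for (int i = 0; i < count; i++)
            dst[i] = src[i];
    }
}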
I wouldn't care to guess how the managed solution holds up, not just under profiling but in actual use. The reason I ported QuickLZ was the high performance, and even my unsafe port has only reached approximately a third of the C code's speed. But I would guess that fully managed code would degrade that performance considerably further.
I have, since the writing of this article and with the help of the original author, fine-tuned and rewritten parts of the algorithm to be better suited to packet compression: smaller headers, optimization for smaller data subsets, and removal of some of the code that the original author mentioned was "added overhead" for smaller packets.
Anyway, I ported it out of boredom, so I don't see it as a waste of time even if I never personally use the code... I know someone will, and perhaps your pure managed solution will be the right one for someone who doesn't want to mark their code unsafe and doesn't want to use my port as an external DLL. A pure managed solution, while without a doubt slower, may fill a small void, so don't be too discouraged. Thanks for giving us all a link, and don't give up porting that archaic C code - I'm in the same boat.
|
|
|
|
|
Guessing about optimization is one of the seven deadly sins, isn't it? It will always come back to bite you.
Before you read any further, please note that I am not comparing code, algorithms, or "real world" situation tests. I am only attempting (very poorly) to make the point that array access is very rarely significantly slower than using unsafe pointers. As I interpret it, an unsafe pointer accesses managed memory using a C-style pointer, without the bounds checking (and possibly type checking) of array access.
Don't be so quick to write off the capabilities of the JIT optimizer; it has surprised me (and made me feel dumb) on more than one occasion. I have found that it handles array access very well and only does checks as necessary. I've ported lots of C code to C#, and there have only been one or two situations where using unsafe methods proved faster than the array access I was able to come up with. Just to fight boredom over my lunch break, I ran a few tests using Stanislav Panasik's test app, modified to fix a bug where the reading FileStream was not being closed and disposed of in a timely manner. I renamed my class to QuickLZ2 and uncommented the following lines one at a time in Tester.cs.
Byte[] ResultBytes = QuickLZ.Compress(SourceBytes, 0, (uint)tr.lSourceBytes);
Byte[] ResultBytes = QuickLZ2.Compress(SourceBytes);
Byte[] ResultBytes = QuickLZ.Decompress(SourceBytes, 0);
Byte[] ResultBytes = QuickLZ2.Decompress(SourceBytes);
I ran 3 series of tests against a couple of binaries I had on my machine. Code changes were limited to changing a single line of code for each test. All were done using a Release mode build and I restarted my machine before each test. There was no other software running at the time, background or foreground, other than the normal Windows processes. All that said, this is a pretty rudimentary test and in no way conclusive or at all scientific.
Before each series I ran the Compress/Decompress code 3 times. To get my measurements I ran the test 14 times, dropped the 2 highest and 2 lowest scores and averaged the remaining 10. I checked to ensure that the compression ratios were the same for both versions of code. This involved modifying the sequence length variable in my code to match yours. Sorry for the difficulty formatting the data.
Size       Ratio    Array Access                Unsafe Pointers
                    Compression  Decompression  Compression  Decompression
412 Kb     70.08%   48.45 Mbps   68.45 Mbps     32.51 Mbps   70.23 Mbps
7.27 Mb    65.87%   48.33 Mbps   63.32 Mbps     33.74 Mbps   67.39 Mbps
18.64 Mb   58.95%   45.35 Mbps   59.19 Mbps     37.43 Mbps   65.59 Mbps
As you can see, at least on these 3 binary application files, compression was always faster using the "slow" array access code. For some reason - and I still point the finger at the memcpy_up method - decompression was always slower. After looking at your code, the final array copy into the destination buffer of the Compress method could be smudging the results slightly. Still, I am doing an Array.Resize at the end of my Compress method, so this should not be a deciding factor for the results that I gathered.
Really the test needs to be run again using packet-sized data just to see those results. Unfortunately it is hard to find data of that size without generating it, which would reflect a real-world situation even less than these tests. I see a few possible hot spots glaring in your code, but I think I have already made my point that array access is not always slower than pointer access, at least not by any significant margin, and in many cases it can be faster.
As a side note, I ran the last series of tests again using a Debug build. The array access code was faster by a very significant margin for both compression and decompression. Makes you wonder what kind of magic actually happens behind the scenes. I have been wanting to finish an article that I started some time ago comparing array access to unsafe pointers; maybe now would be a good time, seeing as how I have two comparable code bases to work with.
I must say that reading your response to my earlier message was a true pleasure and that I look forward to corresponding further.
Colby
|
|
|
|
|
Well done on the research end of things.
As for my code, it is far from optimized; I found a few rather ugly spots in it as well that could be better optimized. It was more of a direct, first-shot 1:1 port. The extra cost you see is definitely from the difference between resizing and reallocating. If you wanted to do a better test, make the 2 unsafe methods public and call them with preallocated memory. There is still quite a bit of code to clean up in the decompression; however, it is a more accurate example, as it's a one-time allocation for both code bases.
However, I digress. The difference we see here is the type of difference we'd care about with C and machine code, not with .NET, so using a fully managed solution may be viable.
The memcpy_up concept is really a rather difficult one to solve in .NET in general, and as you seem to have made a somewhat cleaner solution for it through managed code, it may supersede my attempt, since I don't intend to continue maintaining mine. Instead, I'm going to start pushing towards what I've done all this for in the first place: a robust, scalable, secure TCP network engine. However, I do believe I will keep using unsafe code for one reason when it comes to the engine: once it's inlined, the compression and decompression need every bit of speed they can get so as not to be a bottleneck as the engine scales up to the goal of (hopefully) 1000 concurrent connections.
This is starting to get a bit off topic, so I will add some additional details and get your feedback in a private message.
|
|
|
|
|
I'm going to reply to your private message soon but I thought I would share this for everyone else.
I modified the Compress method of your code to use the same Array.Resize approach as mine. Compressing a 7.5 MB buffer in a loop 100 times, it shaved about 0.8 seconds off a 23-second run, so I don't think that is the major hold-up. I think the main thing is that the JIT is able to compile the array access code down to something remarkably similar to the unsafe pointer code. Also, I believe that because it goes through the language's array access mechanisms, the JIT is able to perform better optimizations, whereas your pointer code is mostly limited to what you can do with your algorithm and pointer-math smarts.
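A small illustration of the loop shape the JIT recognizes (general CLR behaviour, not code from either port):
static int Sum(byte[] data)
{
    int total = 0;
    // Because the loop bound is data.Length itself, the JIT can prove the index is
    // always in range and drop the per-element bounds check.
    for (int i = 0; i < data.Length; i++)
        total += data[i];
    return total;
}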
The end result is that I believe our code turns into something similar, which is the reason there isn't a huge speed gap. Aside from the memcpy_up method, I don't believe there are any optimizations you could make in your code that I couldn't make in mine.
I work in the medical imaging field and have been able to write C# code that performs real-time operations on large images (2000x2000 pixels) without using pointer math. Much of our software is deployed using ClickOnce and runs in an IE app domain, so we don't use unsafe assemblies. The .NET runtime provides many features to help you write very efficient code in a short amount of time, if you know where to find them. Also, because C# is compiled to IL, it isn't always as intuitive as you would think to find hot spots in your code, so a good profiler such as Red Gate's or dotTrace is invaluable.
It comes down to asking yourself if you want to fight the machine or use it to your advantage. Arrays are much nicer to work with.
Colby
|
|
|
|
|
I think I found a bug in your code. I attempted to compress an empty byte[] of size 256; up to about 270 or so, it was crashing. I tested my code and it worked fine, though I think I may have spotted other errors in my own code to be fixed (it has to do with the placement of the second QCLZ string; I haven't looked closely yet).
Anyway, hopefully you'll maintain this code and look into the problem. I attempted to track it down, but your code is considerably different from the original, and I found it difficult to work out where it's going wrong. All I could figure out is that the loop appears to be running further than it should on the source, and FastRead4 is attempting to read past the end of the source. This wouldn't happen in typical use, but with a large empty buffer it's trying to match a really long RLE sequence, and when it gets to the end and tries to match another 4 bytes, only 1 or 2 are available. Adding a boundary check wasn't enough, as the loop continued to run - I believe because the main loop uses a subtraction of two uints, which wraps over to a really high value. That's as far as I got before I tested my code again and confirmed that it compressed an empty buffer, although the output grew larger until the input reached around 300 bytes, at which point it shrank.
|
|
|
|
|
This bug has to do with the main while() loop and GuaranteeUncompressed and SequenceLength both being uints in an unchecked region of code. SequenceLength is larger than GuaranteeUncompressed for such a small buffer length and so there is an underflow. There is a fix on my site if you want to download the latest zip. Hopefully I will have time to get the rest of the site back operational soon.
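For anyone following along, the underflow itself is easy to reproduce; the variable names below come from the code being discussed, but the values are just illustrative:
static void UnderflowDemo()
{
    unchecked
    {
        uint guaranteeUncompressed = 256;    // small input buffer
        uint sequenceLength = 2050;          // larger than the buffer
        uint limit = guaranteeUncompressed - sequenceLength;
        // limit is now 4294965502 rather than a negative number, so a loop bounded
        // by it runs far past the end of the source.
        Console.WriteLine(limit);
    }
}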
As for maintaining this code: if you can talk the original author into allowing a license other than the GPL, then I would be glad to address bugs and even work to improve the code and the compression algorithm. Otherwise this code is not of much use to me, and so it will eventually bit-rot and disappear.
Your private message was not forgotten, it still has a star beside it. With the holidays I haven't had much time to reply but I will get to it.
|
|
|
|
|
Ultimately, I ended up just plopping my code back in. I need to optimize the protocol for data <64 KB: remove the unnecessary headers and try to reduce them to one 8-bit header and one 16-bit header; the rest is unnecessary.
I understand your reluctance to continue maintaining the code; my original port is probably going to find its way to nirvana eventually as well. As nice as the GPL is, I admit that anything short of the LGPL is going to cause this code to go to waste.
No worries on the private reply, it wasn't time critical. I have already written another iteration of my network engine and I think I'm happy with this one. Just need to optimize the compression/decompression for packet type data.
At this point now I'm just bored, and will probably be looking for something new to port, or try to think of a hobby project to consume a little time.
|
|
|
|
|
I have been trying to work out how to detect IP addresses based on the printer drivers installed around my network, but I am getting more frustrated each day... heheh. Win32_Printer, Win32_NetworkAdapter and Win32_TCPIPPrinter failed to get the information that I want.
If you're kind enough to share your knowledge, please email it to me at janverge@gmail.com.
I would prefer your solution to be in C#.
thanks again..
Merry Christmas
nice one..
|
|
|
|
|
In the current port, up to SEQLEN = 2050 bytes of the source are stored uncompressed. To increase the compression ratio for small data sources, find the line
const UInt32 SEQLEN = 2 + (1 << 11);
and replace it with
const UInt32 SEQLEN = 64;
The constant can be fine-tuned experimentally, but setting it too low can decrease compression ratio and speed because it also determines the maximum RLE length and LZ length allowed. It can also be set at runtime according to the 'size' argument - the only reason it isn't is for optimization reasons in the ANSI C version.
Maximum setting is 2050, never exceed that.
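(Purely as an illustration of what a runtime choice could look like - the 64 and 2050 bounds are the ones given above, but the size/4 mapping is made up for the example and is not QuickLZ's actual rule:)
static uint ChooseSeqLen(uint size)
{
    const uint Max = 2 + (1 << 11);                  // 2050, never exceed this
    const uint Min = 64;                             // the small-packet value suggested above
    return Math.Max(Min, Math.Min(Max, size / 4));   // size / 4 is an arbitrary example heuristic
}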
- Lasse Reinhold (author of original QuickLZ)
|
|
|
|
|
Thank you for gracing us with a little wisdom on the specifics. Today I got a little adventurous and murdered the algorithm into a ComplexBuffer class I've been working on, and got some nice results integrating it. I did fine-tune it down to 64-byte sequence matching, and I didn't notice much change in the ratio, as we discussed. However, I was able to compress a 390-byte file to 328 bytes, so that was good.
In my butchering for my buffer, I cut the header costs down to 5 bytes on compressed data and 1 byte on uncompressible data (no easy task, let me tell you...), and now it seems suitable for packet compression of smaller input sources.
Glad to see this is stimulating a little discussion
|
|
|
|
|