Hello Oskar,
Thank you for your suggestion. I'm sorry if the code was unclear; clearly I should have added another object to wrap conventional uses and make them a bit easier.
To be honest, I was in a bit of a rush to get it out and find out if any odd bugs popped up. So far nothing major has come up, so I'll look into adding an example and explaining it a bit more.
|
... My LZF.NET implementation is based upon Marc Lehmann's LibLZF. It's also quite fast...
http://www.goof.com/pcg/marc/liblzf.html
|
Great, thanks for the link. I hadn't come across this in my searches; however, the link you gave doesn't point directly to the .NET implementation. Perhaps you'd care to share a more direct link to it, and give .NET people another option.
Thanks again!
|
LZO is THE fastest. The next speed king is direct memory copy. =)
ROFLOLMFAO
|
Any numbers to back that up? Are we talking about C# here? C++? What results did you get that make you claim that?
|
Neither. The website clearly says that it is written in C. Oberhumer's LZO is also written in C. Check it out:
http://www.oberhumer.com/opensource/lzo/
=)
As for numbers, I'm more worried about its speed in C#. I'll investigate. LZO has been around for a decade, so it's likely that it's more mature.
ROFLOLMFAO
|
Both have C versions and C# ports. Both have been optimized. LZF's site has numbers, as does LZO's site. However, they were measured on different machines, so the only way to compare is by normalizing by memcpy's speed measured on these machines, which plants LZF as the winner (which is why I decided to port it to C#).
|
I don't mean to insult or demean you, but you do realize that "normalizing a memcpy" across two different machines is virtually impossible, right?
You have everything from CPU architecture, cache, clock speed, bus speeds, memory speeds, and the actual CPU instruction set affecting the result. It is entirely unfair to say "LZF is the winner because memcpy is faster on a different computer". Keep in mind that LZO was designed and profiled on VERY low-end hardware, making it even faster on high-end hardware.
To this end, and to put the discussion to rest, I found some actual numbers showing LZO is leaps and bounds faster than LZF.
http://www.quicklz.com/
Let's look at the more important numbers involving 68MB of data.
LZO Compression = 81.8MB/sec
LZO Decompression = 307MB/sec
LZF Compression = 60.9MB/sec
LZF Decompression = 198MB/sec
And just for reference, someone who recommended zlib based stuff that's part of .NET now, here's a reference for you:
ZLIB Compression = 7.45MB/sec
ZLIB Decompression = 120MB/sec
Note that if you check QuickLZ on that page, it seems to be the dominant one in this case, but also note that version 1.0 was released in November and it's at 1.10 now, making it really an alpha product by comparison. There may still be buffer overflow issues and other security risks. However, should the product mature nicely and prove its stability, it may be worth my time to port a .NET QuickLZ as well in a month or two.
What I do want to point out here is that, all around, LZO seems about 33% faster than LZF, and has a slightly better compression ratio too. So let this be factual information, and not some guess made by normalizing memcpys on different hardware.
Now, this doesn't accurately reflect our C# ports, but this does show the basic potential of the algorithm and tells me you made an uneducated guess about which is "the winner" algorithm.
So ultimately, I stand by my decision to port minilzo. It's faster. Period.
|
The numbers indeed indicate that LZO is faster on the whole. However:
1. As you indicated, QuickLZ is faster than both, by a lot. Indeed it's new, but has potential.
2. In some of the compression cases, LZF is faster than LZO (EXE file, BMP, DivX), and since the averaging is done based on the files' sizes, I could have chosen a bigger DivX file, and hence "LZF would win". What I'm trying to say is that the advantage depends on what you're trying to compress.
3. Normalizing by memcpy is not quite as invalid as you try to make it look, though I admit I did it out of laziness, not out of a true belief that it's 100% accurate. What I compressed was an AVI file (almost incompressible), and my results resemble the ones in QuickLZ's tables, so the accuracy is better than I expected.
Anyway, about QuickLZ: I looked at the code, and it looks a lot more complicated than LZO/LZF, so I for one will not be porting it. Also, the speed there, while better, isn't enough to make me want to put in the effort.
|
"In some of the compression cases LZF is faster than LZO"
In some cases, perhaps. All EXEs? I doubt it. I think when you're specifically testing against one file, any algorithm can become biased. I could write an algorithm that compresses winword even faster and better, but it would only compress winword well, and maybe a few select other EXEs.
The point I'm making is that you can't assess your "better algorithm" based on one test, unless you intend your LibLZF to be used for only one specific purpose.
The design of minilzo, and more importantly my port of it, was to provide fast streaming compression of basically ALL data and still get a good 40% reduction on average. Do I care that I can't compress an already-compressed file? Not really, because if I was using minilzo or LZF, I wouldn't be compressing that data in either case; it's a waste of a lot of CPU cycles on data that, as shown by LZF, gains less than a 1% saving for what is probably a significant amount of time spent compressing.
Anyway, beyond the point that LZO makes a better overall compression solution for streaming, the true reason I felt the need to reply again was your final comment about QuickLZ not being "worth the effort". I don't know about you, but in every case it's almost twice as fast as the second-fastest alternative for compression, and it comes out leaps and bounds ahead of LZO or LZF in ALL cases. The compression ratios tend to be a little worse, but I'd say that given a month or two to refine the original code, this could replace everything we use now for streaming compression.
Anyway, I didn't mean to be rude. LZF obviously has its uses, which I would place somewhere between LZO and ZLIB in that it's a good large-binary-file compressor, but I couldn't stand by and have you say LZF is better, because for general-purpose uses (streaming packet data that isn't compressed), LZO will come out ahead more often than not.
Anyway, I think I'm going to look more into QuickLZ; it has my curiosity, and it may put both our LZO and LZF products to shame.
|
Where is the C# port of liblzf?
|
Go to http://www.goof.com/pcg/marc/liblzf.html
There's a tar.gz file for download; inside it is a "cs" subfolder...
|
If you add the original size info to the header of the buffer, end users won't need to remember that number.
Regards,
unruledboy@hotmail.com
|
Technically speaking, adding the size would be duplicating work: if you know the LZO format, the size is already there. However, it's not assumed that all uses of this would WANT the 4 bytes for the size added again as part of the buffer.
For the layman, yes, a simple size prepend will work; however, that was left out because it is specific to the implementation and how the compression is being used. This is a relatively advanced topic, and I'd assume such specifics of design and implementation don't need to be covered here.
|
How should the original size be calculated from just the destination byte array (I don't know the LZO format)? I've done the size prepend for now, but it would be nice to know.
Also, the M3_MAX_LEN and M1_MAX_OFFSET constants aren't used - is this just because this is a subset?
Thanks for the great code BTW!
Another change I made was to allow a maximum length to be specified as an optional parameter for the Compress routine. This is helpful if you have a MemoryStream where the internal buffer is available (via GetBuffer()): now I can pass MemoryStream.Length and extract a compressed byte array directly.
Cheers
Simon
|
Using a prepend is probably the easiest way; otherwise you'd have to run some rather arcane processing over the data. So a prepend is the most effective way in terms of transmitting the data.
In my next update, I am considering the possibility of adding the code to also pass an "out" buffer, such that it can be calculated and assigned, but this may not always be the desired use.
M3_MAX_LEN and M1_MAX_OFFSET aren't used in this implementation; they are just leftovers from copying over all the related constants.
I considered the idea of simply allowing a MemoryStream to be passed, but again that is implementation-specific, so your approach may work. You may also just change the arguments to pass your memory stream, and take care of the GetBuffer/Length calls inside the desired function.
Generally speaking, it was built to mimic the original C code as closely as possible while simplifying the interface. I'm thinking, however, that passing offset/length for the byte[]s may have been better.
In any case, thanks for your feedback; glad to hear it's getting used.
|
Since .NET Framework 2.0, there are two options for compressing streams:
System.IO.Compression.GZipStream
System.IO.Compression.DeflateStream
I did not test which is faster.
My second computer is your linux box.
|
Thanks for pointing this out; however, it was not the intention of this project to use a slow compression algorithm such as gzip. LZO is incredibly fast, and perhaps one of the best, if not the best, option for streaming compression... Please don't confuse this with file-based compression, which gzip is far more suited for. The methods you mention relate better to my other article about SharpZipLib, at which time I was using the 1.1 framework, and yes, they added something that was missing.
LZO gives you the option to trade off compression speed for size, without losing any of the decompression speed. This project wasn't meant to supersede anything around; it simply fills a void that nothing else has yet filled for .NET, and it still manages around a 43% saving with the default settings.
I don't much care about the rating; I didn't do this for popularity, though I am a bit surprised someone felt this deserved such a low rating. Anyway, profile the code: you'll find minilzo is significantly faster (unless I screwed up my code somewhere) and as such is useful for purposes other than the gzip streams.
|
Yes, I agree. Deflate and GZip aren't going to give you realtime compression for demanding applications. LZO, however, is up to the task.
To paraphrase the previous post, LZO is about speed. Zip and GZip may win in a data crushing contest, but not a speed race.
ROFLOLMFAO