|
Hi.
Can you provide an example of using this library to read from and write to a memory stream?
The objective is to see whether it's possible to read and write database data without loading the entire file into memory, instead working in smaller chunks, compressing each chunk in memory, and sending it to the database stream:
1-grab a chunk of the file (maybe 25MB)
2-compress
3-write to db
4-repeat
Thanks,
Duarte
|
|
|
|
|
While it is possible, I'm afraid I don't have the time to put into expanding this project beyond its original port, I'm sorry.
I can see some issues with the idea, though: since LZO is based on look-ahead parsing, it may not compress as well chunked as it would unchunked. However, if you standardize your calls and read blocks in definite sizes (like 1MB segments), you should be able to read 1MB, compress, write the result, and continue until you're out of data. Reversing the process should be the same; however, as I have not tried segmented/streaming compression, I'm not sure how LZO reacts to it.
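The block-based approach described above can be sketched roughly as follows. This is only a sketch, not part of the library: DeflateStream stands in for the MiniLZO compress call (swap in whatever codec you use), and the 1MB chunk size, stream names, and length-prefix framing are my assumptions, not anything the library dictates.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class ChunkedCompressor
{
    const int ChunkSize = 1024 * 1024; // 1 MB segments, as suggested above

    // Read a fixed-size block, compress it in memory, write the compressed
    // block to the destination (e.g. a database stream), repeat until EOF.
    public static void CompressInChunks(Stream source, Stream destination)
    {
        byte[] buffer = new byte[ChunkSize];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            using (var chunk = new MemoryStream())
            {
                // DeflateStream is a placeholder for the MiniLZO call.
                using (var codec = new DeflateStream(chunk, CompressionMode.Compress, true))
                    codec.Write(buffer, 0, read);

                byte[] compressed = chunk.ToArray();
                // Prefix each block with its compressed length so the reader
                // knows how many bytes to pull back out of the database.
                destination.Write(BitConverter.GetBytes(compressed.Length), 0, 4);
                destination.Write(compressed, 0, compressed.Length);
            }
        }
    }
}
```

Decompression reverses the loop: read the 4-byte length, read that many bytes, decompress the block, and append it to the output.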
Best of luck to you.
|
|
|
|
|
Hello Fellows,
Does anyone have the original code?
It would be nice to have it.
If anyone could send me the code, it would be great.
Best Regards
Tit Copernicus, MCAD
|
|
|
|
|
I'm afraid I'm not sure what you are asking for. If you want the source code, it is linked at the top of the article. If you are seeking the original minilzo source, Google will return what you need with little effort.
Good luck.
|
|
|
|
|
In order to make this work on Pocket PC / Windows Mobile (and, I suspect, on 64-bit architectures, though I am not sure) you need to make a small change. Because ushorts are always word-aligned on those platforms, you cannot simply cast a byte* to a ushort*. As a result, the following changes need to be made:
Add the following method to the MiniLZO class:
private unsafe static ushort BytePtrToUshort(byte* input)
{
    byte* bp = input;
    byte b1 = *bp;
    bp++;
    byte b2 = *bp;
    ushort result = (ushort)b1;
    ushort pres = (ushort)b2;
    pres <<= 8;
    result += pres;
    return result;
}
Change the occurrence of
    (ushort*)pos
to
    BytePtrToUshort(pos)
(there is one change)
and all occurrences of
    (ushort*)ip
to
    BytePtrToUshort(ip)
(there should be three changes).
It will now work for WM5 etc.
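As a quick sanity check of the helper's arithmetic: the byte-by-byte logic above is just assembling a little-endian ushort (low byte first, second byte shifted up by 8). The same decoding can be sketched against a managed array, with no /unsafe needed; this array-based variant is my illustration, not part of the fix itself.

```csharp
static class ByteDecode
{
    // Same arithmetic as BytePtrToUshort above, but on a managed array:
    // the first byte is the low-order byte, the second is shifted up by 8,
    // matching little-endian layout.
    public static ushort BytesToUshort(byte[] input, int offset)
    {
        ushort result = input[offset];
        ushort high = (ushort)(input[offset + 1] << 8);
        return (ushort)(result + high);
    }
}
```

For example, the bytes { 0x34, 0x12 } decode to 0x1234, which is what reading a ushort* would yield on a little-endian machine.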
Can't wait to see your pure managed version!!!
RKM
|
|
|
|
|
First of all, thanks for converting LZO to C#. Your library works perfectly on 32-bit, but the same code compiled for 64-bit gives the following error:
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
This happens inside the compress method, at the following line: pos = ip - (ip - dict[index])
I have extracted the part of the code that includes the line giving the error:
for (; ; )
{
    offset = 0;
    index = D_INDEX1(ip);
    pos = ip - (ip - dict[index]);
    if (pos < input || (offset = (uint)(ip - pos)) <= 0 || offset > M4_MAX_OFFSET)
        literal = true;
    else if (offset <= M2_MAX_OFFSET || pos[3] == ip[3])
    {
    }
    else
    {
        index = D_INDEX2(index);
        pos = ip - (ip - dict[index]);
        if (pos < input || (offset = (uint)(ip - pos)) <= 0 || offset > M4_MAX_OFFSET)
            literal = true;
It would be really great if you can make it 64 bit compatible as well.
Thanks.
Bimal
|
|
|
|
|
Unfortunately I do not have a copy of Vista, or any other 64-bit OS, in which I could do the testing and determine the exact problem. My guess is that because x86/x64 is little endian, perhaps some code is relying on 32-bit overflow wrapping, or something similar. I don't know the specifics of the implementation; porting didn't require such knowledge, and the initial port was the end of my support, as there is no commercial viability. It was more out of boredom than anything.
As an alternative, if you don't need LZO specifically, check out the QuickLZ port I did after this LZO port. If you're looking for speed, I think it's faster, and I took more time to understand the algorithm, so the datatypes SHOULD be defined correctly to work on 64-bit, though they are neither tested nor optimized for it.
Best of luck.
|
|
|
|
|
The fix is to double the size of the dictionary on 64-bit systems. The "dict" local variable is a pointer array, so on 64-bit systems each pointer will occupy 8 bytes instead of 4 bytes. So, I modified the code as follows (and it works fine now on 64-bit systems):
private static readonly uint DICT_SIZE;

static MiniLZO() {
    if (IntPtr.Size == 8) {
        DICT_SIZE = (65536 + 3) * 2;
    } else {
        DICT_SIZE = (65536 + 3);
    }
}
|
|
|
|
|
Thanks, really great.
Now I am 64bit!
|
|
|
|
|
|
I was interested in LZO for a while. The website said that a .NET port would be arriving soon. Too bad Oberhumer's definition of "soon" was many, many years (or maybe never). I'm glad your technical skills in C finally brought LZO to the .NET community. ^_^
ROFLOLMFAO
|
|
|
|
|
Thank you, I'm glad someone had positive feedback about the project. I was worried I was the only one who saw the potential need for something like this. I read your other comments standing up for LZO being the fastest algorithm. I'd be interested to see some profiled results of both my minilzo port and the other author's LZF port.
Keep in mind that while minilzo may be more mature, my port most certainly is not. I'm sure he's tested and profiled his own code a few times to ensure it runs smoothly. My minilzo port's testing, on the other hand, was, well, uploading it here to see if the community found it useful and found bugs. So while he's motivated to improve his, I've decided to keep minilzo simple, with the smallest memory footprint and overhead possible, as was the original intention of MiniLZO.
|
|
|
|
|
I think there may be a bug when the source array size is less than the M2_MAX_LEN + 5 threshold.
I haven't looked at the C source but this appears to fix it:
if (src.Length <= M2_MAX_LEN + 5)
{
    tmp = (uint)src.Length;
    dstlen = 0; // <-- add this line
}
Cheers
Simon
|
|
|
|
|
Much appreciated, Simon; sorry it took so long. I just now got around to practical use of the code myself and remembered this comment. I looked over the point and you're absolutely correct. I apologize for the oversight, and thank you for the bug report.
I'll post a new version with the fix.
|
|
|
|
|
[Low importance]
As a piece of code, I'm sure this is great - frankly it's a bit beyond me.
As an article, it's less great - some more extensive examples, sample code (beyond the MiniLZO class itself), etc. would be helpful in furthering the understanding of the code and its usefulness.
|
|
|
|
|
Hello Oskar,
Thank you for your suggestion. I'm sorry if the code was unclear; clearly I should have added another object to wrap conventional uses and make them a little easier.
To be honest, I was in a bit of a rush to get it out and find out if any odd bugs popped up. So far nothing major has come up, so I'll look into adding an example and explaining it a bit more.
|
|
|
|
|
My LZF.NET implementation is based upon Marc Lehmann's LibLZF; it's also quite fast:
http://www.goof.com/pcg/marc/liblzf.html
|
|
|
|
|
Great, thanks for the link. I hadn't come across this in my searches; however, the link you gave doesn't point directly to the .NET implementation. Perhaps you'd care to share a more direct link to it, and give .NET people another option.
Thanks again!
|
|
|
|
|
LZO is THE fastest. The next speed king is direct memory copy. =)
ROFLOLMFAO
|
|
|
|
|
Any numbers to back that up? Are we talking about C# here? C++? What result did you get that makes you claim that?
|
|
|
|
|
Neither. The website clearly says that it is written in C. Oberhumer's LZO is also written in C. Check it out:
http://www.oberhumer.com/opensource/lzo/
=)
As for numbers, I'm more worried about its speed in C#. I'll investigate. LZO has been around for a decade, so it's likely the more mature of the two.
ROFLOLMFAO
|
|
|
|
|
Both have C versions and C# ports. Both have been optimized. LZF's site has numbers, as does LZO's site. However, they were measured on different machines, so the only way to compare is by normalizing against memcpy's speed measured on those machines, which puts LZF as the winner (which is why I decided to port it to C#).
|
|
|
|
|
I don't mean to insult or demean you, but you do realize that "normalizing by memcpy" on two different machines is virtually impossible, right?
Everything from CPU architecture, cache, clock speed, bus speeds, and memory speeds to the actual CPU instruction sets affects the result. It is entirely unfair to say "LZF is the winner because memcpy is faster on a different computer." Keep in mind that LZO was designed and profiled on VERY low-end hardware, making it even faster on high-end hardware.
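(As an aside, same-machine comparisons are easy to gather, which sidesteps the normalization problem entirely. Here's a minimal Stopwatch-based throughput sketch of my own devising; DeflateStream is only a placeholder codec, and you'd pass in MiniLZO and LZF wrappers to compare them on identical input and identical hardware.)

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

static class Throughput
{
    // Returns MB/sec for running `codec` over `data` `iterations` times.
    // Run this once per codec on the same machine and the same data to
    // get a fair head-to-head comparison.
    public static double MeasureMBps(byte[] data, int iterations, Func<byte[], byte[]> codec)
    {
        codec(data); // warm-up pass (JIT compilation, caches)
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            codec(data);
        sw.Stop();
        double megabytes = (double)data.Length * iterations / (1024 * 1024);
        return megabytes / sw.Elapsed.TotalSeconds;
    }

    // Placeholder codec; swap in a MiniLZO or LZF compress wrapper here.
    public static byte[] Deflate(byte[] input)
    {
        using (var ms = new MemoryStream())
        {
            using (var ds = new DeflateStream(ms, CompressionMode.Compress))
                ds.Write(input, 0, input.Length);
            return ms.ToArray();
        }
    }
}
```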
To this end, and to put the discussion to rest, I found some actual numbers showing LZO is leaps and bounds faster than LZF.
http://www.quicklz.com/
Let's look at the more important numbers involving 68MB of data.
LZO Compression = 81.8MB/sec
LZO Decompression = 307MB/sec
LZF Compression = 60.9MB/sec
LZF Decompression = 198MB/sec
And just for reference, someone who recommended zlib based stuff that's part of .NET now, here's a reference for you:
ZLIB Compression = 7.45MB/sec
ZLIB Decompression = 120MB/sec
Note that if you check QuickLZ on that page, it seems to be dominating in this case, but also note that version 1.0 was released in November and it's at 1.10 now, making it really an alpha product by comparison. There may still be buffer overflow issues and other security risks. However, should the product mature nicely and prove its stability, it may be worth my time to port a .NET QuickLZ as well in a month or two.
What I do want to point out here is that all around LZO seems about 33% faster than LZF, and has a slightly better compression ratio too. So let this be factual information, and not some guess made by normalizing memcpy's on different hardware.
Now, this doesn't accurately reflect our C# ports, but it does show the basic potential of the algorithm and tells me you made an uneducated guess about which algorithm is "the winner".
So ultimately, I stand my ground behind why I ported minilzo. It's faster. Period.
|
|
|
|
|
The numbers indeed indicate that LZO is faster on the whole. However:
1. As you indicated, QuickLZ is faster than both, by a lot. Indeed it's new, but it has potential.
2. In some of the compression cases, LZF is faster than LZO (EXE file, BMP, DivX), and as the averaging is done based on the files' sizes, I could have chosen a bigger DivX file, and hence "LZF would win". What I'm trying to say is that the advantage depends on what you're trying to compress.
3. Normalizing by memcpy is not quite as invalid as you make it look, though I admit I did it out of laziness, not out of true belief that it's 100% accurate. What I compressed was an AVI file (almost incompressible), and my results resemble the ones in QuickLZ's tables, so the accuracy is better than I expected.
Anyway, about QuickLZ: I looked at the code, and it looks a lot more complicated than LZO/LZF, so I for one will not be porting it. Also, the speed there, while faster, isn't enough to make me want to put in the effort.
|
|
|
|
|
"In some of the compression cases LZF is faster than LZO"
In some cases, perhaps. All EXEs? I doubt it. When you're testing against one specific file, any algorithm can look biased. I could write an algorithm that compresses winword even faster and better, but it would only compress winword well, and maybe a few select other EXEs.
The point I'm making is that you can't assess your "better algorithm" based on one test, unless you intend your LibLZF to be used for only one specific purpose.
The design of minilzo, and more importantly my port of it, was to provide fast streaming compression of basically ALL data and still get a good 40% reduction on average. Do I care that I can't compress an already-compressed file? Not really, because if I were using minilzo or LZF, I wouldn't be compressing that data in either case; it's a waste of a lot of CPU cycles on data that, as shown by LZF, gains less than a 1% saving for what is probably a significant amount of time spent compressing.
Anyway, beyond the point that LZO makes a better overall compression solution for streaming, the true reason I felt the need to reply again was your final comment about QuickLZ not being "worth the effort". I don't know about you, but in every case it's almost twice as fast as the second-fastest alternative for compression, and it comes out leaps and bounds ahead of LZO or LZF in ALL cases. The compression ratios tend to be a little worse, but I'd say that given a month or two to refine the original code, this could replace everything we use now for streaming compression.
Anyway, I didn't mean to be rude. LZF obviously has its uses, which I would place somewhere between LZO and ZLIB, in that it's a good large-binary-file compressor. But I couldn't stand by and have you say LZF is better, because for general-purpose use (streaming packet data that isn't compressed), LZO will come out ahead more often than not.
Anyway, I think I'm going to look more into QuickLZ; it has my curiosity, and it may put both our LZO and LZF products to shame.
|
|
|
|
|