Introduction
I've looked around the internet for an implementation of dynamic LZW compression in MFC. Unfortunately, since I couldn't find one, I've decided to write one myself. This article will not explain the basics of the LZW algorithm. To find that all you need to do is to look for "LZW algorithm" in google, and you'll find a lot information.
But I will explain what is the dynamic and static compression in the LZW implementations, and some problems that someone might face during implementing LZW algorithm.
The files LZWCompression.h, LZWCompression.cpp, Dictionary.h, Dictionary.cpp are all you need to use, to insert the LZW into your project.
Background
what is static compression in LZW?
The static compression is when we use a static number of bits to make the compression. The total number of bits we have to our disposal is 32, which gives us 4,294,967,295 possible entries in our dictionary. In most cases we don't get that far, actually, most of the times we won't get passed the 16 bit barrier (which gives us 65,535 possible entries in the dictionary). So in order to speed up the processing time, we set the number of bits used for compression to a limit (most applications put it on 14). That way every information written to the destination buffer, will be in the same bit size.
So if someone wants to represent the letter 'A' in the 14 bit format it will look like this: 00000001000001. If the dictionary have the value of 245 in its dictionary it will look like so: 00000011110101. As you can see, we have some leading 0 in the number, which means that the file can be compressed even more.
what is dynamic compression in LZW?
The dynamic compression changes the number of bits used to compress the data. It starts with 9 bits for each new value, and goes up until it reaches 32 or until the file ends.
The number of bits to use in the dynamic compression is set by the size of the dictionary you have. Since the dictionary begins with all the 256 (0-255) ASCII codes. Once the dictionary reached the point were the next entry will have to have another bit to represent her (256, 512, ...) we change the bit size of the compression from that point. This also explains why we start with 9 bit compression.
Since the dynamic compression have only one leading 0 in front of each number, you get a more compressed file then the static method. But you pay in extra calculation time. for each entry you put in the dictionary, you have to calculate the number of bits to use.
So if someone wants to represent the letter 'A' it will look like this: 001000001.
If the dictionary have the value of 245 in its dictionary it will look like so: 011110101.
As you can see, we have one leading 0 in the number, and the number of bits to use is still 9.
If the dictionary have the value of 276 in its dictionary it will look like so: 0100010100. And the limit will now be 512, which means we are now dealing with 10 bits.
Compressing the file
Compressing the file is simply saving all the bits in a buffer, and then writing them into the destination file in bytes. To accomplish that we can't just write the data into the file (writing the value 245 into the file will cause the number to look like this: 0000000011110101, which we don't want).
So we have to push the buffer data to the left, in order to make room for the new data to be inserted.
The number of bits we push the data is changing according to the number of bits we need for the data. After pushing the data in the buffer we add to it the data we want it to have. After doing so, we write some information into the destination file, and continue with the application. The following function receive the destination file to write in, and the new data to combine:
void CLZWCompression::CompressData(CFile &dest, long toSave)
{
DWORD writeData = 0;
m_SavedData |= (DWORD) toSave << (32 - m_MaxBits - m_TotalBits);
m_TotalBits += m_MaxBits;
while (m_TotalBits >= 8)
{
writeData = m_SavedData;
writeData >>= 24;
dest.Write(&writeData, 1);
m_SavedData <<= 8;
m_TotalBits -= 8;
}
}
m_MaxBits holds the number of bits that represent the number held by toSave
m_SavedData is the buffer (DWORD) that holds the data to be written to the file, and m_TotalBits is the remaining bits in the buffer that represent a number.
Since we have to use bytes when we write in to files, and since the bit size we are using changes, we remain with bits in the buffer of a previous data. This is the main reason for the use of m_TotalBits to prevent from us to write information on the previous number information, and to make sure that all the data will reach the file correctly.
Decompressing the file
Decompressing the file is to take all the data we had from the previous compression, and translate it into sequences of information that we can understand.
In the decompression we take the code we had in the file, and using the dictionary (which we build again while decompressing), we get the sequence of letters that construct this code.
The decode buffer is filled in the opposite way, meaning from the end byte that needs to be written, to the beginning byte of the sequence.
void CDictionary::GetBytesFromCode(CByteArray *Buffer, DWORD code)
{
dicElement *tmpEl = NULL;
while (code > 255)
{
tmpEl = (dicElement*)GetAt(code - 256);
Buffer->Add(tmpEl->m_Letter);
code = tmpEl->m_Prefix;
}
Buffer->Add((BYTE)code);
}
The buffer then is written into the file when the letters are written from the last to end.
while (counter > 0)
{
writeData = (BYTE)decodeString.GetAt(--counter);
destination.Write(&writeData, 1);
}
decodeString.RemoveAll();
Unlike in the compression, we can't stop the decompression process when the file is over, since we still have information inside the buffer that needs to be decoded. So to end the file, in the compression we entered at the end of the file the maximum number possible by the current bit size.
void CLZWCompression::CloseCompressedFile(CFile &source)
{
CompressData(source, m_MaxCode[m_MaxBits]);
CompressData(source, 0);
}
We flash the end after the m_MaxCode[m_MaxBits] with '0' to make sure that the file will contain all the code. Since the '0' after it will never be used. This value is compared with the currently read data, and if exist means that the end of file arrived, and that we need to end the decompression process.
while ((data = DecompressData(source)) != m_MaxCode[m_MaxBits])
Using the code
All you need to use this code, is just to insert the code into your project. The class creates its own dictionary, and clears it after the use of the compression or decompression process, so using one class to manage all your LZW compression in the project is easy.
Unfortunately, when using this class in Thread, you must make a new instance of the class for every thread you are running, since the class can have only one dictionary at a time.
Code History
- 25 Apr 2004 - The code was added with
convertASCIIToText function for showing the code actual character in decompression, as suggested by WREY. The previous log string was commented and a new one is in use.
| You must Sign In to use this message board. |
|
|
 |
|
|
 |
|
 |
Can I use LZW to compress a string of 50 characters into a shorter string of length less than the original string length 50? Thanks you!
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
I want to implement LZW compression nad decompression in verilog HDL can any one send me the code for it.
|
| Sign In·View Thread·PermaLink | 1.00/5 |
|
|
|
 |
|
 |
Hi please can you help me. I need to understand how to use your code, I mean how to run it. it's the 1st time that I search for code by Internet. I know it is a stupid question, but what can I do? please give me the steps so I can put your code & run it, the detailed steps. sama from Baghdad, Iraq
|
| Sign In·View Thread·PermaLink | 2.00/5 |
|
|
|
 |
|
|
 |
|
 |
how to show all of the CStringArray members to editbox that have a cstring variable. i have some difficulties coz only last member i can show in my editbox ?.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Hi...i have a problem.... Has anyone here know how to compress and uncompress file using LZW???? please help me.......
|
| Sign In·View Thread·PermaLink | 2.00/5 |
|
|
|
 |
|
|
 |
|
 |
I have been using this algorithm for a while for sending binary messages and have noticed decompression issues from time to time. I have finally tracked it down and it is caused when the number of dictionary elements is close to a "number of bits" boundary.
For example, if the number of dictionary elements is exactly 511, then the last prefix gets compressed at 9 bits and not 10 that the decompression is expecting. Also if the number of dictionary elements is 510 then the terminating code is set to 511 but the decompression will actually look for 1023. These both cause decompression failures and will happen at all boundaries, i.e. 1023 and 1022, etc.
A solution to these issues is to change the last few lines of the compression to the following:
CalculateBitSize(resAdd+1); CompressData(destination, prefix); CalculateBitSize(resAdd+2); CloseCompressedFile(destination);
// Remove the existing dictionary ClearDictionary(); return TRUE; }
This looks to have solved all my problems..
|
| Sign In·View Thread·PermaLink | 2.00/5 |
|
|
|
 |
|
 |
in my first tests the algorithm does not work with pdf-files. I try to extract byte-streams between "stream" and "endstream" in pdf-files and decode the text (I remove linefeeds from the beginning of the stream before decoding). Does someone have experience with decoding pdf-streams? Do i need other algorithm or should this work and i parse it wrong?
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
In case it helps anyone....
I don't fully understand what the code is doing as I've not had time to look into it but I have found a condition where the decompression never terminates when m_TotalBits goes negative. I replaced m_TotalBits == 0 with <= which gets it through. This is not detrimental in my code as I only use a known portion of the decompressed data but I guess it may emit more bytes than required in other usage.
In DecompressData...
DWORD returnValue; BYTE readByte = 0; if (SourceIndex < SourceLength) { while (m_TotalBits <= 24) { readByte = SourceBuffer[SourceIndex++]; m_SavedData |= (DWORD) readByte << (24 - m_TotalBits); m_TotalBits += 8; } } else { if (m_SavedData == 0 && m_TotalBits == 0) // Race? return m_MaxCode[m_MaxBits]; }
returnValue = m_SavedData >> (32 - m_MaxBits); m_SavedData <<= m_MaxBits; m_TotalBits -= m_MaxBits;
return returnValue;
|
| Sign In·View Thread·PermaLink | 3.00/5 |
|
|
|
 |
|
 |
iam involved in developing an application where i need to give the input the folder name.i have to zip that particular folder in the path. please help me to proceed further.
regards shiva
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Well, if you want only to compress the folder, then you can either develop something yourself, or get something ready. since you wish to "zip" the folder, I can only suggest you will take a prepered code from the internet.
I suggest the open source known as 7-zip, it's cute fast, and very good.
here is the URL for it: http://www.7-zip.org/
If you want to develop a compression algorithm for yourself, then I'm all for helping with that.
Good luck.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
HI
thanks for ur kind response. actualy i need to zip the folder from my application. i will give the input as one source folder,i have to zip the whole content including if any files there the folder.
please help me to proceed further
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
hi.. i'm developing an application also now.. and i need to zip the folder ... so how is ur progress to zip a folder? if u oredi make it, can u tell how u do that?
|
| Sign In·View Thread·PermaLink | 2.00/5 |
|
|
|
 |
|
 |
Hi
Congratulations by your article. I've tried to compress some binary files (regedit.exe by example) and the process took more than minutes to complete the task. One problem that I could see is the dictionary method that you have implemented.
For a binary file, a great dictionary was created (more than 100000 elements) and for every byte found in the source data, the code perform a linear search in the dictionary.
Maybe using a hash table (CMap) instead a CPtrArray could improve the performance.
|
| Sign In·View Thread·PermaLink | 2.00/5 |
|
|
|
 |
|
 |
I think it was a nice explanation. But my problem is that I have to compress files that usually are greater than 10GB, even 20GB! My question is: will the method that you describe work for me? What considerations do you think I'll have to take care of?
Regards,
Fernando.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
hi i just got 2 questions that i would like to know their answers if it's ok with u 1- the prefix variable was assigned 4 bytes what if its size increased over that size 2- why didn't u use strings instead of using DWORD thx for time
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Hi
1 - A prefix of 4 bytes give you 4G-1 of prefix avialible (since you need the last FFFF as EOF code), so in most cases that will prove to be enough. In case you need more, then you can always set more the 4 bytes, but you have to define the size of the dictionary in advance, since this affect the size of information you can manipulate, and the size of variable that hold the information for you. if you for instance make a system of 5 bytes then you will have to use some sort of pre-configured structure or a double (8 bytes) to make it. and change the types of the variables to that structure, and also define a new limits for writing in the data variable (instead of 24)
2 - String have several problems: a - The are not easy to use when you want to shift them (you have to memmove them or use some other (maybe more string related) function. b - The are not easy when you want to enter 9 bits at the end of the string. c - In relation to the previous note, they are hard when you want to keep a number of bits not divided by 8. d - Strings in MFC (CString class and its derived classes) have problem keeping 0 Byte information (it doesnt keep it), and you sometimes need it in your data with all the shifting.
The Prince of Chaos
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
 |
|
|
 |
|
|
 |
|
 |
do this technique LZW really helps me in compression ..... i m trying to compress MPEG file up to MP3 can u gv any sugesstions.....i need help and available source code for help in MFC
|
| Sign In·View Thread·PermaLink | 2.00/5 |
|
|
|
 |
|
 |
Hi.
I havn't wrote sourcecode for mp3 yet (mainly due to lack of time), but I can direct you to some sources on the net that might be usful.
First I recomment you'll go over the http://www.codeproject.com/audio/ since there are many examples and sourcecode there on how to read mp3 files. You may be able to construct some information that will help you get passed the difficulties you experiance.
mp3 file structure: (not the very best but helpful) http://www.multiweb.cz/twoinches/MP3inside.htm
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
After decompression is done, all the user sees, are lines stating, "Adding byte 47 to file," or "Adding byte 115 to file," or "Adding byte 10 to file," (etc.).
It would be nice if there was a way the user can see what byte 47 is, or what byte 115 is, (etc.). I realize this might require some 'bit' padding be done in order to reclaim readability, but if such a routine were to be written, it would become a matter of simply using it for whichever byte were to get passed through it. IOW, providing readability would be verification of the algorithm's workability and correctness.
William
Fortes in fide et opere!
|
| Sign In·View Thread·PermaLink | 2.00/5 |
|
|
|
 |
|
|