Click here to Skip to main content
Click here to Skip to main content

LZW Compression

By , 23 Apr 2004
 
<!-- Article image -->

Sample Image - LZW.gif

Introduction

I've looked around the internet for an implementation of dynamic LZW compression in MFC. Unfortunately, since I couldn't find one, I've decided to write one myself. This article will not explain the basics of the LZW algorithm. To find that all you need to do is to look for "LZW algorithm" in google, and you'll find a lot information.

But I will explain what is the dynamic and static compression in the LZW implementations, and some problems that someone might face during implementing LZW algorithm.

The files LZWCompression.h, LZWCompression.cpp, Dictionary.h, Dictionary.cpp are all you need to use, to insert the LZW into your project.

Background

what is static compression in LZW?

The static compression is when we use a static number of bits to make the compression. The total number of bits we have to our disposal is 32, which gives us 4,294,967,295 possible entries in our dictionary. In most cases we don't get that far, actually, most of the times we won't get passed the 16 bit barrier (which gives us 65,535 possible entries in the dictionary). So in order to speed up the processing time, we set the number of bits used for compression to a limit (most applications put it on 14). That way every information written to the destination buffer, will be in the same bit size.

So if someone wants to represent the letter 'A' in the 14 bit format it will look like this: 00000001000001. If the dictionary have the value of 245 in its dictionary it will look like so: 00000011110101. As you can see, we have some leading 0 in the number, which means that the file can be compressed even more.

what is dynamic compression in LZW?

The dynamic compression changes the number of bits used to compress the data. It starts with 9 bits for each new value, and goes up until it reaches 32 or until the file ends.

The number of bits to use in the dynamic compression is set by the size of the dictionary you have. Since the dictionary begins with all the 256 (0-255) ASCII codes. Once the dictionary reached the point were the next entry will have to have another bit to represent her (256, 512, ...) we change the bit size of the compression from that point. This also explains why we start with 9 bit compression.

Since the dynamic compression have only one leading 0 in front of each number, you get a more compressed file then the static method. But you pay in extra calculation time. for each entry you put in the dictionary, you have to calculate the number of bits to use.

So if someone wants to represent the letter 'A' it will look like this: 001000001.

If the dictionary have the value of 245 in its dictionary it will look like so: 011110101.
As you can see, we have one leading 0 in the number, and the number of bits to use is still 9.

If the dictionary have the value of 276 in its dictionary it will look like so: 0100010100. And the limit will now be 512, which means we are now dealing with 10 bits.

Compressing the file

Compressing the file is simply saving all the bits in a buffer, and then writing them into the destination file in bytes. To accomplish that we can't just write the data into the file (writing the value 245 into the file will cause the number to look like this: 0000000011110101, which we don't want).

So we have to push the buffer data to the left, in order to make room for the new data to be inserted.

The number of bits we push the data is changing according to the number of bits we need for the data. After pushing the data in the buffer we add to it the data we want it to have. After doing so, we write some information into the destination file, and continue with the application. The following function receive the destination file to write in, and the new data to combine:

void CLZWCompression::CompressData(CFile &dest, long toSave)
{
	DWORD writeData = 0;

	m_SavedData |= (DWORD) toSave << (32 - m_MaxBits - m_TotalBits);

	m_TotalBits += m_MaxBits;

	while (m_TotalBits >= 8)
	{
		//	Get the byte we want to write
		writeData = m_SavedData;
		writeData >>= 24;
		dest.Write(&writeData, 1);

		//	remove the byte from the buffer
		m_SavedData <<= 8;
		//	Remove the byte from the counter
		m_TotalBits -= 8;
	}
}

m_MaxBits holds the number of bits that represent the number held by toSave
m_SavedData is the buffer (DWORD) that holds the data to be written to the file, and m_TotalBits is the remaining bits in the buffer that represent a number.

Since we have to use bytes when we write in to files, and since the bit size we are using changes, we remain with bits in the buffer of a previous data. This is the main reason for the use of m_TotalBits to prevent from us to write information on the previous number information, and to make sure that all the data will reach the file correctly.

Decompressing the file

Decompressing the file is to take all the data we had from the previous compression, and translate it into sequences of information that we can understand.

In the decompression we take the code we had in the file, and using the dictionary (which we build again while decompressing), we get the sequence of letters that construct this code.

The decode buffer is filled in the opposite way, meaning from the end byte that needs to be written, to the beginning byte of the sequence.

void CDictionary::GetBytesFromCode(CByteArray *Buffer, DWORD code)
{
	dicElement *tmpEl = NULL;

	while (code > 255)
	{
		tmpEl = (dicElement*)GetAt(code - 256);
		Buffer->Add(tmpEl->m_Letter);
		code = tmpEl->m_Prefix;
	}
	Buffer->Add((BYTE)code);
}

The buffer then is written into the file when the letters are written from the last to end.

while (counter > 0)
{
	writeData = (BYTE)decodeString.GetAt(--counter);
	destination.Write(&writeData, 1);
}

decodeString.RemoveAll();

Unlike in the compression, we can't stop the decompression process when the file is over, since we still have information inside the buffer that needs to be decoded. So to end the file, in the compression we entered at the end of the file the maximum number possible by the current bit size.

void CLZWCompression::CloseCompressedFile(CFile &source)
{
	CompressData(source, m_MaxCode[m_MaxBits]);
	CompressData(source, 0);
}

We flash the end after the m_MaxCode[m_MaxBits] with '0' to make sure that the file will contain all the code. Since the '0' after it will never be used. This value is compared with the currently read data, and if exist means that the end of file arrived, and that we need to end the decompression process.

while ((data = DecompressData(source)) != m_MaxCode[m_MaxBits])

Using the code

All you need to use this code, is just to insert the code into your project. The class creates its own dictionary, and clears it after the use of the compression or decompression process, so using one class to manage all your LZW compression in the project is easy.

Unfortunately, when using this class in Thread, you must make a new instance of the class for every thread you are running, since the class can have only one dictionary at a time.

Code History

  • 25 Apr 2004 - The code was added with convertASCIIToText function for showing the code actual character in decompression, as suggested by WREY. The previous log string was commented and a new one is in use.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

The Prince of Chaos
Web Developer
United States United States
No Biography provided

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
QuestionLZW Compress and DecompressmemberNDL220423-Feb-12 15:40 
Hi.
Can you tell me, now I want to contact the author write the program in the illustration above, because I have a problem relating to compress and decompress files such as. Doc file. Xls , file. pdf and some other file. So who is the author of the program on the application for contacts, and help me solve my problem. Please thank the authors and thank everyone.
I would send the pictures to show you, this is the error in image decompression process the data file.
http://www.mediafire.com/i/?123ec0dcrwqt2hl
http://www.mediafire.com/i/?1b3v05i9tdy3751
http://www.mediafire.com/i/?yjw2uhiuyqmj3xu
 
Goodbye and see you. Hope your answer.
QuestionWhat type of outputmembermosselios4-Feb-12 20:43 
I want to ask about the output...
what type of output you use? i use this text as input :
 
"Lorem Ipsum Dolor Sit Amet. Lorem Ipsum Dolor Sit Amet. Lorem Ipsum Dolor Sit Amet. Lorem Ipsum Dolor Sit Amet. Lorem Ipsum Dolor Sit Amet. Lorem Ipsum Dolor Sit Amet. Lorem Ipsum Dolor Sit Amet. Lorem Ipsum Dolor Sit Amet. Lorem Ipsum Dolor Sit Amet. Lorem Ipsum Dolor Sit Amet. Lorem Ipsum Dolor Sit Amet. Lorem Ipsum Dolor Sit Amet."
 
in "text.txt" files and compress it, then it create "text.lzw" self that contain this output (if i opened it with microsoft wordpad):
 
txt&ÎFSh€’p9`„Cy² )šN‚     ´Êt ($
†C¢(¤Z1Á`ð‘.r‡ÄbqX¼f(ŽÊ岌’i'ŽJ£ò鄎g&›P#ÒɾE2’ÍcršTê›<£Tfô]=£Ô§)Ý¡?©Îi”J|ú‘h±Õ¬¶Û
n«k¯ÿÀ
 
i try to make lzw but the output still in character level (8 bit /char), with the same input my program generate this output :
 
25:E:11:4:C:3E:22:F:12:14:63:1D:E:B:60:3E:2C:8:13:3E:1A:C:4:13:4C:3E:5F:61:63:65:67:69:6B:6D:6F:71:73:75:77:79:62:64:66:68:3E:6A:6C:11:6E:70:72:74:76:78:60:87:7C:8A:8C:80:90:83:93:86:7B:89:7E:8D:8F:82:92:85:95:9F:7D:8B:7F:8E:81:91:84:94:7A:88:A9:99:AC:9B:A5:B0:96:A0:AA:A2:AD:9C:A6:B1:97:A1:9A:A4:AF:9E:B2:98:AB:A3:AE:9D:A7:C7:C2:B5:C4:CC:C0:BA:B4:CA:BE
 
The idea is every hex code in this output is hex representation from decimal number of dictionary indeks, i'm stuck in this problem, can you give me some advice, what type of output that should i use?
 
then i will search, browse, and learn how to change my lzw output to type of output from your advice...
 
thanks
 
Best regards
mosselios evan
Institut Teknologi Bandung, Indonesia
QuestionLZW compression and decompression in MATLAB.memberalbinbenny15-Sep-09 3:54 
I want to implement LZW compression and decompression in MATLAB. can any one send me the code for it.
QuestionCan it compress a short string into a shorter string?memberchocm27-Dec-08 1:55 
Can I use LZW to compress a string of 50 characters into a shorter string of length less than the original string length 50? Thanks you!
AnswerRe: Can it compress a short string into a shorter string?memberapex_tyf21-Apr-11 4:38 
want to see the answer
QuestionLZW compression and decompressionmembermanekantan9-Dec-08 21:26 
I want to implement LZW compression nad decompression in verilog HDL
can any one send me the code for it.
Questionhelp me pleasemembersam s. samaan21-Oct-08 18:38 
Hi
please can you help me. I need to understand how to use your code, I mean how to run it. it's the 1st time that I search for code by Internet. I know it is a stupid question, but what can I do?
please give me the steps so I can put your code & run it, the detailed steps.
sama from Baghdad, Iraq
AnswerRe: help me pleasememberbill_wang15-Mar-09 15:48 
I think you need to get a vc++ 6.0 or vs2003 to compile the source code.It's very easy, just have a try.
 

 
codeuu,source code
QuestionCStringArray variable to CString variablememberciiiek9-Jan-08 17:00 
how to show all of the CStringArray members to editbox that have a cstring variable.
i have some difficulties coz only last member i can show in my editbox ?.Confused | :confused: Confused | :confused:
GeneralLZW compress filemembercaeBAIK20-Mar-06 3:19 
Hi...i have a problem....
Has anyone here know how to compress and uncompress file using LZW????
please help me.......
GeneralRe: LZW compress filememberHatem Mostafa5-Jan-07 10:27 
yes,
check this http://www.codeproject.com/cpp/LZW.asp[^]
 
Thanks
GeneralSmall BugmemberDanoraky7-Mar-06 8:02 

I have been using this algorithm for a while for sending binary messages and have noticed decompression issues from time to time. I have finally tracked it down and it is caused when the number of dictionary elements is close to a "number of bits" boundary.
 
For example, if the number of dictionary elements is exactly 511, then the last prefix gets compressed at 9 bits and not 10 that the decompression is expecting. Also if the number of dictionary elements is 510 then the terminating code is set to 511 but the decompression will actually look for 1023. These both cause decompression failures and will happen at all boundaries, i.e. 1023 and 1022, etc.
 
A solution to these issues is to change the last few lines of the compression to the following:
 
// Compress the remaining information in the refix into the destination file
CalculateBitSize(resAdd+1);
CompressData(destination, prefix);
// Close the destination file
CalculateBitSize(resAdd+2);
CloseCompressedFile(destination);

 
// Remove the existing dictionary
ClearDictionary();
return TRUE;
}
 
This looks to have solved all my problems..
Generaldo it with lzw-compressed pdf streamsmemberprivet25-Nov-05 8:49 
in my first tests the algorithm does not work with pdf-files.
I try to extract byte-streams between "stream" and "endstream" in pdf-files and decode the text (I remove linefeeds from the beginning of the stream before decoding).
Does someone have experience with decoding pdf-streams?
Do i need other algorithm or should this work and i parse it wrong?
QuestionRace Condition?memberrocksoft16-May-05 5:28 
In case it helps anyone....
 
I don't fully understand what the code is doing as I've not had time to look into it but I have found a condition where the decompression never terminates when m_TotalBits goes negative. I replaced m_TotalBits == 0 with <= which gets it through. This is not detrimental in my code as I only use a known portion of the decompressed data but I guess it may emit more bytes than required in other usage.
 
In DecompressData...
 
DWORD returnValue;
BYTE readByte = 0;

if (SourceIndex < SourceLength)
{
while (m_TotalBits <= 24)
{
readByte = SourceBuffer[SourceIndex++];
m_SavedData |= (DWORD) readByte << (24 - m_TotalBits);
m_TotalBits += 8;
}
}
else
{
if (m_SavedData == 0 && m_TotalBits == 0) // Race?
return m_MaxCode[m_MaxBits];
}
 
returnValue = m_SavedData >> (32 - m_MaxBits);
m_SavedData <<= m_MaxBits;
m_TotalBits -= m_MaxBits;
 
return returnValue;
Questionhow to zip a foldermembernkshiv17-Mar-05 22:14 
iam involved in developing an application where i need to give the input the folder name.i have to zip that particular folder in the path.
please help me to proceed further.
 
regards
shiva
AnswerRe: how to zip a foldermemberThe Prince of Chaos17-Mar-05 22:23 
Well, if you want only to compress the folder, then you can either develop something yourself, or get something ready.
since you wish to "zip" the folder, I can only suggest you will take a prepered code from the internet.
 
I suggest the open source known as 7-zip, it's cute fast, and very good.
 
here is the URL for it: http://www.7-zip.org/
 
If you want to develop a compression algorithm for yourself, then I'm all for helping with that.
 
Good luck.
GeneralRe: how to zip a foldersussAnonymous18-Mar-05 19:12 
HI
 
thanks for ur kind response.
actualy i need to zip the folder from my application.
i will give the input as one source folder,i have to zip the whole content
including if any files there the folder.
 
please help me to proceed further
 


GeneralRe: how to zip a foldersussnicole_y21-Jul-05 1:08 
hi.. i'm developing an application also now.. and i need to zip the folder ... so how is ur progress to zip a folder? if u oredi make it, can u tell how u do that? Big Grin | :-D
GeneralGod, but very slowmemberleandrobecker28-Dec-04 11:41 
Hi
 
Congratulations by your article. I've tried to compress some binary files (regedit.exe by example) and the process took more than minutes to complete the task. One problem that I could see is the dictionary method that you have implemented.
 
For a binary file, a great dictionary was created (more than 100000 elements) and for every byte found in the source data, the code perform a linear search in the dictionary.
 
Maybe using a hash table (CMap) instead a CPtrArray could improve the performance.
Questionwhat would you recomend...sussKith_Kahnan19-Nov-04 12:13 
I think it was a nice explanation. But my problem is that I have to compress files that usually are greater than 10GB, even 20GB! My question is: will the method that you describe work for me? What considerations do you think I'll have to take care of?
 
Regards,
 
Fernando.
Generalgot some questionsmemberraars29-Sep-04 22:19 
hi
i just got 2 questions that i would like to know their answers if it's ok with u
1- the prefix variable was assigned 4 bytes what if its size increased over that size
2- why didn't u use strings instead of using DWORD
thx for time
GeneralRe: got some questionsmemberThe Prince of Chaos2-Oct-04 8:56 
Hi
 
1 - A prefix of 4 bytes give you 4G-1 of prefix avialible (since you need the last FFFF as EOF code), so in most cases that will prove to be enough.
In case you need more, then you can always set more the 4 bytes, but you have to define the size of the dictionary in advance, since this affect the size of information you can manipulate, and the size of variable that hold the information for you.
if you for instance make a system of 5 bytes then you will have to use some sort of pre-configured structure or a double (8 bytes) to make it. and change the types of the
variables to that structure, and also define a new limits for writing in the data variable (instead of 24)
 
2 - String have several problems:
a - The are not easy to use when you want to shift them (you have to memmove them or use some other (maybe more string related) function.
b - The are not easy when you want to enter 9 bits at the end of the string.
c - In relation to the previous note, they are hard when you want to keep a number of bits not divided by 8.
d - Strings in MFC (CString class and its derived classes) have problem keeping 0 Byte information (it doesnt keep it), and you sometimes need it in your data with all the shifting.
 

 
The Prince of Chaos
QuestionHow to Compress Image using LZW Compression.susspubba20-Jul-04 0:12 
hi,
I have a bmp file.now I want to do LZW Compression.Can I do that?.
please help me...

AnswerRe: How to Compress Image using LZW Compression.memberoseman23-May-05 8:19 
how can i compress pictureSigh | :sigh:
GeneralRe: How to Compress Image using LZW Compression.memberpubududilena24-May-05 17:49 
hi,
You have to use Image Encorders in GDI+.
try with MSDN Examples.
 
regards,
pubudu.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web01 | 2.6.130617.1 | Last Updated 24 Apr 2004
Article Copyright 2004 by The Prince of Chaos
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid