Click here to Skip to main content
Click here to Skip to main content

LZW Compression

, 23 Apr 2004
Rate this:
Please Sign up or sign in to vote.
This will show the simple and useful way to implement a compression algorithm in MFC
<!-- Article image -->

Sample Image - LZW.gif


I've looked around the internet for an implementation of dynamic LZW compression in MFC. Unfortunately, since I couldn't find one, I've decided to write one myself. This article will not explain the basics of the LZW algorithm. To find that all you need to do is to look for "LZW algorithm" in google, and you'll find a lot information.

But I will explain what is the dynamic and static compression in the LZW implementations, and some problems that someone might face during implementing LZW algorithm.

The files LZWCompression.h, LZWCompression.cpp, Dictionary.h, Dictionary.cpp are all you need to use, to insert the LZW into your project.


what is static compression in LZW?

The static compression is when we use a static number of bits to make the compression. The total number of bits we have to our disposal is 32, which gives us 4,294,967,295 possible entries in our dictionary. In most cases we don't get that far, actually, most of the times we won't get passed the 16 bit barrier (which gives us 65,535 possible entries in the dictionary). So in order to speed up the processing time, we set the number of bits used for compression to a limit (most applications put it on 14). That way every information written to the destination buffer, will be in the same bit size.

So if someone wants to represent the letter 'A' in the 14 bit format it will look like this: 00000001000001. If the dictionary have the value of 245 in its dictionary it will look like so: 00000011110101. As you can see, we have some leading 0 in the number, which means that the file can be compressed even more.

what is dynamic compression in LZW?

The dynamic compression changes the number of bits used to compress the data. It starts with 9 bits for each new value, and goes up until it reaches 32 or until the file ends.

The number of bits to use in the dynamic compression is set by the size of the dictionary you have. Since the dictionary begins with all the 256 (0-255) ASCII codes. Once the dictionary reached the point were the next entry will have to have another bit to represent her (256, 512, ...) we change the bit size of the compression from that point. This also explains why we start with 9 bit compression.

Since the dynamic compression have only one leading 0 in front of each number, you get a more compressed file then the static method. But you pay in extra calculation time. for each entry you put in the dictionary, you have to calculate the number of bits to use.

So if someone wants to represent the letter 'A' it will look like this: 001000001.

If the dictionary have the value of 245 in its dictionary it will look like so: 011110101.
As you can see, we have one leading 0 in the number, and the number of bits to use is still 9.

If the dictionary have the value of 276 in its dictionary it will look like so: 0100010100. And the limit will now be 512, which means we are now dealing with 10 bits.

Compressing the file

Compressing the file is simply saving all the bits in a buffer, and then writing them into the destination file in bytes. To accomplish that we can't just write the data into the file (writing the value 245 into the file will cause the number to look like this: 0000000011110101, which we don't want).

So we have to push the buffer data to the left, in order to make room for the new data to be inserted.

The number of bits we push the data is changing according to the number of bits we need for the data. After pushing the data in the buffer we add to it the data we want it to have. After doing so, we write some information into the destination file, and continue with the application. The following function receive the destination file to write in, and the new data to combine:

void CLZWCompression::CompressData(CFile &dest, long toSave)
	DWORD writeData = 0;

	m_SavedData |= (DWORD) toSave << (32 - m_MaxBits - m_TotalBits);

	m_TotalBits += m_MaxBits;

	while (m_TotalBits >= 8)
		//	Get the byte we want to write
		writeData = m_SavedData;
		writeData >>= 24;
		dest.Write(&writeData, 1);

		//	remove the byte from the buffer
		m_SavedData <<= 8;
		//	Remove the byte from the counter
		m_TotalBits -= 8;

m_MaxBits holds the number of bits that represent the number held by toSave
m_SavedData is the buffer (DWORD) that holds the data to be written to the file, and m_TotalBits is the remaining bits in the buffer that represent a number.

Since we have to use bytes when we write in to files, and since the bit size we are using changes, we remain with bits in the buffer of a previous data. This is the main reason for the use of m_TotalBits to prevent from us to write information on the previous number information, and to make sure that all the data will reach the file correctly.

Decompressing the file

Decompressing the file is to take all the data we had from the previous compression, and translate it into sequences of information that we can understand.

In the decompression we take the code we had in the file, and using the dictionary (which we build again while decompressing), we get the sequence of letters that construct this code.

The decode buffer is filled in the opposite way, meaning from the end byte that needs to be written, to the beginning byte of the sequence.

void CDictionary::GetBytesFromCode(CByteArray *Buffer, DWORD code)
	dicElement *tmpEl = NULL;

	while (code > 255)
		tmpEl = (dicElement*)GetAt(code - 256);
		code = tmpEl->m_Prefix;

The buffer then is written into the file when the letters are written from the last to end.

while (counter > 0)
	writeData = (BYTE)decodeString.GetAt(--counter);
	destination.Write(&writeData, 1);


Unlike in the compression, we can't stop the decompression process when the file is over, since we still have information inside the buffer that needs to be decoded. So to end the file, in the compression we entered at the end of the file the maximum number possible by the current bit size.

void CLZWCompression::CloseCompressedFile(CFile &source)
	CompressData(source, m_MaxCode[m_MaxBits]);
	CompressData(source, 0);

We flash the end after the m_MaxCode[m_MaxBits] with '0' to make sure that the file will contain all the code. Since the '0' after it will never be used. This value is compared with the currently read data, and if exist means that the end of file arrived, and that we need to end the decompression process.

while ((data = DecompressData(source)) != m_MaxCode[m_MaxBits])

Using the code

All you need to use this code, is just to insert the code into your project. The class creates its own dictionary, and clears it after the use of the compression or decompression process, so using one class to manage all your LZW compression in the project is easy.

Unfortunately, when using this class in Thread, you must make a new instance of the class for every thread you are running, since the class can have only one dictionary at a time.

Code History

  • 25 Apr 2004 - The code was added with convertASCIIToText function for showing the code actual character in decompression, as suggested by WREY. The previous log string was commented and a new one is in use.


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


About the Author

The Prince of Chaos
Web Developer
United States United States
No Biography provided

Comments and Discussions

QuestionCStringArray variable to CString variable Pinmemberciiiek9-Jan-08 18:00 
GeneralLZW compress file PinmembercaeBAIK20-Mar-06 4:19 
GeneralRe: LZW compress file PinmemberHatem Mostafa5-Jan-07 11:27 
GeneralSmall Bug PinmemberDanoraky7-Mar-06 9:02 

I have been using this algorithm for a while for sending binary messages and have noticed decompression issues from time to time. I have finally tracked it down and it is caused when the number of dictionary elements is close to a "number of bits" boundary.
For example, if the number of dictionary elements is exactly 511, then the last prefix gets compressed at 9 bits and not 10 that the decompression is expecting. Also if the number of dictionary elements is 510 then the terminating code is set to 511 but the decompression will actually look for 1023. These both cause decompression failures and will happen at all boundaries, i.e. 1023 and 1022, etc.
A solution to these issues is to change the last few lines of the compression to the following:
// Compress the remaining information in the refix into the destination file
CompressData(destination, prefix);
// Close the destination file

// Remove the existing dictionary
return TRUE;
This looks to have solved all my problems..
Generaldo it with lzw-compressed pdf streams Pinmemberprivet25-Nov-05 9:49 
QuestionRace Condition? Pinmemberrocksoft16-May-05 6:28 
Questionhow to zip a folder Pinmembernkshiv17-Mar-05 23:14 
AnswerRe: how to zip a folder PinmemberThe Prince of Chaos17-Mar-05 23:23 
GeneralRe: how to zip a folder PinsussAnonymous18-Mar-05 20:12 
GeneralRe: how to zip a folder Pinsussnicole_y21-Jul-05 2:08 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web03 | 2.8.150327.1 | Last Updated 24 Apr 2004
Article Copyright 2004 by The Prince of Chaos
Everything else Copyright © CodeProject, 1999-2015
Layout: fixed | fluid