Click here to Skip to main content
12,397,805 members (60,283 online)
Click here to Skip to main content
Add your own
alternative version

Stats

147.7K views
7.3K downloads
80 bookmarked
Posted

Simple and fast Huffman coding

, 3 Jan 2005 CPOL
Rate this:
Please Sign up or sign in to vote.
An article on fast Huffman coding technique.

Sample Image

Introduction

This article describes the simplest and fastest Huffman code you can find in the net, not using any external library like STL or components, just using simple C functions like: memset, memmove, qsort, malloc, realloc, and memcpy.

So, any one will find it easy to understand the code or even to modify it.

Background

Huffman compression is a lossless compression algorithm that is ideal for compressing text or program files. Huffman compression belongs into a family of algorithms with a variable codeword length. That means that individual symbols (characters in a text file, for instance) are replaced by bit sequences that have a distinct length. So, symbols that occur a lot in a file are given a short sequence while others that are used seldom get a longer bit sequence.

Using the code

I wrote the code as simple C functions to make it easy to use anywhere. You can put them in a class or just use the function. And I made it in a simple format, just in and out buffer, no input or output files like in many articles.

bool CompressHuffman(BYTE *pSrc, int nSrcLen, BYTE *&pDes, int &nDesLen);
bool DecompressHuffman(BYTE *pSrc, int nSrcLen, BYTE *&pDes, int &nDesLen);

Points of Interest

Speed

I spent a long time to make it (huffman.cpp) so fast and at the same time I don't use any libraries like STL or MFC. It compresses 16 MB in less than 1 sec (PIII processor, 1G speed).

Compression

The compression code is so simple, it starts by initializing 511 of Huffman nodes by its ASCII values:

CHuffmanNode nodes[511];
for(int nCount = 0; nCount < 256; nCount++)
    nodes[nCount].byAscii = nCount;

Then, it calculates each ASCII frequency in the input buffer:

for(nCount = 0; nCount < nSrcLen; nCount++)
    nodes[pSrc[nCount]].nFrequency++;

Then, it sorts ASCII characters depending on frequency:

qsort(nodes, 256, sizeof(CHuffmanNode), frequencyCompare);

Now, it constructs Huffman tree, to get each ASCII code bit that will be replaced in the output buffer:

int nNodeCount = GetHuffmanTree(nodes);

Constructing Huffman tree is so simple; it is just putting all nodes in a queue, and replacing the two lowest frequency nodes with one node that has the sum of their frequencies so that this new node will be the parent of these two nodes. And do this step till the queue just contains one node (tree root).

// parent node
pNode = &nodes[nParentNode++];
// pop first child
pNode->pLeft = PopNode(pNodes, nBackNode--, false);
// pop second child
pNode->pRight = PopNode(pNodes, nBackNode--, true);
// adjust parent of the two poped nodes
pNode->pLeft->pParent = pNode->pRight->pParent = pNode;
// adjust parent frequency
pNode->nFrequency = pNode->pLeft->nFrequency + pNode->pRight->nFrequency;

I made a good trick here to avoid using any queue module. I knew before that ASCII is 256 characters, but I allocated 511 (CHuffmanNode nodes[511];), and made use of the 255 nodes after the first 256 to be used as the parents in the Huffman tree. And just use an array of points (CHuffmanNode* pNodes[256]) to point to these nodes at the time of constructing the tree. Also, use two variables to handle queue indexing (int nParentNode = nNodeCount, nBackNode = nNodeCount-1;).

Then, the final step in the compression is to write each ASCII code in the output buffer:

int nDesIndex = 0;
// loop to write codes
for(nCount = 0; nCount < nSrcLen; nCount++)
{
    *(DWORD*)(pDesPtr+(nDesIndex>>3)) |=
              nodes[pSrc[nCount]].dwCode << (nDesIndex&7);
    nDesIndex += nodes[pSrc[nCount]].nCodeLength;
}
  • (nDesIndex>>3): >>3 to divide by 8 to reach the right byte to start with.
  • (nDesIndex&7): &7 to get the remainder of dividing by 8, to get the start bit.

Note: at the compressed buffer, we should save Huffman tree nodes with its frequencies so we can construct Huffman tree again at the time of decompression (just the ASCIIs that have a frequency).

Decompression

The decompression is more easier as we just need to construct Huffman tree, then loop in the input buffer to replace each code with its ASCII. Just remember that the input buffer, in this case, is a stream of bits that contain the codes of each ASCII. So, to replace the code with the ASCII, we need to iterate Huffman tree with the bit stream till we find a leaf. Then, we can append its ASCII at the output buffer:

int nDesIndex = 0;
DWORD nCode;
while(nDesIndex < nDesLen)
{
    nCode = (*(DWORD*)(pSrc+(nSrcIndex>>3)))>>(nSrcIndex&7);
    pNode = pRoot;
    while(pNode->pLeft)
    {
        pNode = (nCode&1) ? pNode->pRight : pNode->pLeft;
        nCode >>= 1;
        nSrcIndex++;
    }
    pDes[nDesIndex++] = pNode->byAscii;
}
  • (nSrcIndex>>3): >>3 to divide by 8 to reach the right byte to start with.
  • (nSrcIndex&7): &7 to get the remainder of dividing by 8, to get the start bit.

Source code files

  • huffman.cpp,
  • huffman.h.

Updates

  • 1-1-2005:
    • Increasing compression speed by replacing:
      dwCode = nodes[pSrc[nCount]].dwCode;
      nCodeLength = nodes[pSrc[nCount]].nCodeLength;
      while(nCodeLength--)
      {
          if(dwCode&1)
              SetBit(pDesPtr, nDesIndex);
          dwCode >>= 1, nDesIndex++;
      }
      

      with

      *(DWORD*)(pDesPtr+(nDesIndex>>3)) |=
            nodes[pSrc[nCount]].dwCode << (nDesIndex&7);
      

      at the compression function, so I avoided the use of SetBit macro which was slow to be done in a loop.

    • Increasing decompression speed by replacing:
      while(pNode->pLeft)    // if node has pLeft then it must has pRight
      {
          // node not leaf
          pNode = GetBit(pSrc, nSrcIndex) ? pNode->pRight : pNode->pLeft;
          nSrcIndex++;
      }
      

      with

      nCode = (*(DWORD*)(pSrc+(nSrcIndex>>3)))>>(nSrcIndex&7);
      while(pNode->pLeft)    // if node has pLeft then it must has pRight
      {
          // node not leaf
          pNode = (nCode&1) ? pNode->pRight : pNode->pLeft;
          nCode >>= 1;
          nSrcIndex++;
      }
      

      at the decompression function, so I avoided the use of GetBit macro which was slow to be done in a loop.

Thanks to...

I owe a lot to my colleagues for helping me in implementing and testing this code. (JAK)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author


You may also be interested in...

Comments and Discussions

 
QuestionGood article Pin
Wayne Wang8-Jul-15 3:59
memberWayne Wang8-Jul-15 3:59 
GeneralMy vote of 5 Pin
Member 102868165-Dec-14 2:07
memberMember 102868165-Dec-14 2:07 
GeneralGood Job Pin
ehab_nour9-May-14 7:45
memberehab_nour9-May-14 7:45 
QuestionFix buffer overrun crash during realloc on any uncompressible input (like random data or 0-length input) Pin
cniculescu19-Apr-12 10:58
membercniculescu19-Apr-12 10:58 
AnswerRe: Fix buffer overrun crash during realloc on any uncompressible input (like random data or 0-length input) Pin
Member 417463925-Mar-14 9:00
memberMember 417463925-Mar-14 9:00 
Question7x decompression speedup! Pin
IvanovaIsGod23-Sep-11 8:02
memberIvanovaIsGod23-Sep-11 8:02 
AnswerRe: 7x decompression speedup! Pin
cniculescu19-Apr-12 11:14
membercniculescu19-Apr-12 11:14 
GeneralMy vote of 5 Pin
stefan popa25-Jul-11 5:06
memberstefan popa25-Jul-11 5:06 
GeneralDecompress on a 64bit app Pin
razvar1-Feb-11 22:10
memberrazvar1-Feb-11 22:10 
GeneralRe: Decompress on a 64bit app Pin
cnjoin21-Aug-11 19:59
membercnjoin21-Aug-11 19:59 
Generalsorting!!!!! Pin
janmojzis25-Aug-10 13:33
memberjanmojzis25-Aug-10 13:33 
Questionhuffman(verilog) Pin
elec_engineer14-Feb-09 4:01
memberelec_engineer14-Feb-09 4:01 
AnswerRe: huffman(verilog) Pin
Hatem Mostafa8-Feb-11 23:09
memberHatem Mostafa8-Feb-11 23:09 
Generaldownloads not available Pin
Aamir Mustafa25-Dec-07 6:08
memberAamir Mustafa25-Dec-07 6:08 
GeneralHelp me !! Pin
fxnmri12-Jan-07 5:09
memberfxnmri12-Jan-07 5:09 
QuestionHow to compress many files in folder? Pin
Newcc11-Oct-06 18:14
memberNewcc11-Oct-06 18:14 
Questionhow to get the data entropy Pin
saproll3-Jun-06 4:08
membersaproll3-Jun-06 4:08 
GeneralRunning on PDA's Pin
mgmoss27-Jan-06 14:47
membermgmoss27-Jan-06 14:47 
GeneralRe: Running on PDA's Pin
HatemMostafa28-Jan-06 21:24
memberHatemMostafa28-Jan-06 21:24 
GeneralRe: Running on PDA's Pin
mgmoss30-Jan-06 15:53
membermgmoss30-Jan-06 15:53 
GeneralRe: Running on PDA's Pin
mgmoss30-Jan-06 17:27
membermgmoss30-Jan-06 17:27 
GeneralRe: Running on PDA's Pin
HatemMostafa30-Jan-06 21:26
memberHatemMostafa30-Jan-06 21:26 
GeneralIt is fast Pin
Yves25-Dec-05 18:55
memberYves25-Dec-05 18:55 
GeneralFFT data compresion Pin
Georgi Petrov4-Jun-05 5:47
memberGeorgi Petrov4-Jun-05 5:47 
GeneralHuffman in BC++V5.0 Pin
4020274526-Apr-05 4:55
suss4020274526-Apr-05 4:55 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.160721.1 | Last Updated 4 Jan 2005
Article Copyright 2004 by Hatem Mostafa
Everything else Copyright © CodeProject, 1999-2016
Layout: fixed | fluid