Click here to Skip to main content
Click here to Skip to main content
Go to top

Simple and fast Huffman coding

, 3 Jan 2005
Rate this:
Please Sign up or sign in to vote.
An article on fast Huffman coding technique.

Sample Image

Introduction

This article describes the simplest and fastest Huffman code you can find in the net, not using any external library like STL or components, just using simple C functions like: memset, memmove, qsort, malloc, realloc, and memcpy.

So, any one will find it easy to understand the code or even to modify it.

Background

Huffman compression is a lossless compression algorithm that is ideal for compressing text or program files. Huffman compression belongs into a family of algorithms with a variable codeword length. That means that individual symbols (characters in a text file, for instance) are replaced by bit sequences that have a distinct length. So, symbols that occur a lot in a file are given a short sequence while others that are used seldom get a longer bit sequence.

Using the code

I wrote the code as simple C functions to make it easy to use anywhere. You can put them in a class or just use the function. And I made it in a simple format, just in and out buffer, no input or output files like in many articles.

    bool CompressHuffman(BYTE *pSrc, int nSrcLen, BYTE *&pDes, int &nDesLen);
    bool DecompressHuffman(BYTE *pSrc, int nSrcLen, BYTE *&pDes, int &nDesLen);

Points of Interest

Speed

I spent a long time to make it (huffman.cpp) so fast and at the same time I don't use any libraries like STL or MFC. It compresses 16 MB in less than 1 sec (PIII processor, 1G speed).

Compression

The compression code is so simple, it starts by initializing 511 of Huffman nodes by its ASCII values:

    CHuffmanNode nodes[511];
    for(int nCount = 0; nCount < 256; nCount++)
        nodes[nCount].byAscii = nCount;

Then, it calculates each ASCII frequency in the input buffer:

    for(nCount = 0; nCount < nSrcLen; nCount++)
        nodes[pSrc[nCount]].nFrequency++;

Then, it sorts ASCII characters depending on frequency:

    qsort(nodes, 256, sizeof(CHuffmanNode), frequencyCompare);

Now, it constructs Huffman tree, to get each ASCII code bit that will be replaced in the output buffer:

    int nNodeCount = GetHuffmanTree(nodes);

Constructing Huffman tree is so simple; it is just putting all nodes in a queue, and replacing the two lowest frequency nodes with one node that has the sum of their frequencies so that this new node will be the parent of these two nodes. And do this step till the queue just contains one node (tree root).

    // parent node
    pNode = &nodes[nParentNode++];
    // pop first child
    pNode->pLeft = PopNode(pNodes, nBackNode--, false);
    // pop second child
    pNode->pRight = PopNode(pNodes, nBackNode--, true);
    // adjust parent of the two poped nodes
    pNode->pLeft->pParent = pNode->pRight->pParent = pNode;
    // adjust parent frequency
    pNode->nFrequency = pNode->pLeft->nFrequency + pNode->pRight->nFrequency;

I made a good trick here to avoid using any queue module. I knew before that ASCII is 256 characters, but I allocated 511 (CHuffmanNode nodes[511];), and made use of the 255 nodes after the first 256 to be used as the parents in the Huffman tree. And just use an array of points (CHuffmanNode* pNodes[256]) to point to these nodes at the time of constructing the tree. Also, use two variables to handle queue indexing (int nParentNode = nNodeCount, nBackNode = nNodeCount-1;).

Then, the final step in the compression is to write each ASCII code in the output buffer:

    int nDesIndex = 0;
    // loop to write codes
    for(nCount = 0; nCount < nSrcLen; nCount++)
    {
        *(DWORD*)(pDesPtr+(nDesIndex>>3)) |= 
                  nodes[pSrc[nCount]].dwCode << (nDesIndex&7);
        nDesIndex += nodes[pSrc[nCount]].nCodeLength;
    }
  • (nDesIndex>>3): >>3 to divide by 8 to reach the right byte to start with.
  • (nDesIndex&7): &7 to get the remainder of dividing by 8, to get the start bit.

Note: at the compressed buffer, we should save Huffman tree nodes with its frequencies so we can construct Huffman tree again at the time of decompression (just the ASCIIs that have a frequency).

Decompression

The decompression is more easier as we just need to construct Huffman tree, then loop in the input buffer to replace each code with its ASCII. Just remember that the input buffer, in this case, is a stream of bits that contain the codes of each ASCII. So, to replace the code with the ASCII, we need to iterate Huffman tree with the bit stream till we find a leaf. Then, we can append its ASCII at the output buffer:

    int nDesIndex = 0;
    DWORD nCode;
    while(nDesIndex < nDesLen)
    {
        nCode = (*(DWORD*)(pSrc+(nSrcIndex>>3)))>>(nSrcIndex&7);
        pNode = pRoot;
        while(pNode->pLeft)
        {
            pNode = (nCode&1) ? pNode->pRight : pNode->pLeft;
            nCode >>= 1;
            nSrcIndex++;
        }
        pDes[nDesIndex++] = pNode->byAscii;
    }
  • (nSrcIndex>>3): >>3 to divide by 8 to reach the right byte to start with.
  • (nSrcIndex&7): &7 to get the remainder of dividing by 8, to get the start bit.

Source code files

  • huffman.cpp,
  • huffman.h.

Updates

  • 1-1-2005:
    • Increasing compression speed by replacing:
          dwCode = nodes[pSrc[nCount]].dwCode;
          nCodeLength = nodes[pSrc[nCount]].nCodeLength;
          while(nCodeLength--)
          {
              if(dwCode&1)
                  SetBit(pDesPtr, nDesIndex);
              dwCode >>= 1, nDesIndex++;
          }

      with

          *(DWORD*)(pDesPtr+(nDesIndex>>3)) |= 
                nodes[pSrc[nCount]].dwCode << (nDesIndex&7);

      at the compression function, so I avoided the use of SetBit macro which was slow to be done in a loop.

    • Increasing decompression speed by replacing:
          while(pNode->pLeft)    // if node has pLeft then it must has pRight
          {
              // node not leaf
              pNode = GetBit(pSrc, nSrcIndex) ? pNode->pRight : pNode->pLeft;
              nSrcIndex++;
          }

      with

          nCode = (*(DWORD*)(pSrc+(nSrcIndex>>3)))>>(nSrcIndex&7);
          while(pNode->pLeft)    // if node has pLeft then it must has pRight
          {
              // node not leaf
              pNode = (nCode&1) ? pNode->pRight : pNode->pLeft;
              nCode >>= 1;
              nSrcIndex++;
          }

      at the decompression function, so I avoided the use of GetBit macro which was slow to be done in a loop.

Thanks to...

I owe a lot to my colleagues for helping me in implementing and testing this code. (JAK)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author


Comments and Discussions

 
GeneralGood Job Pinmemberehab_nour9-May-14 7:45 
QuestionFix buffer overrun crash during realloc on any uncompressible input (like random data or 0-length input) Pinmembercniculescu19-Apr-12 10:58 
AnswerRe: Fix buffer overrun crash during realloc on any uncompressible input (like random data or 0-length input) PinmemberMember 417463925-Mar-14 9:00 
Question7x decompression speedup! PinmemberIvanovaIsGod23-Sep-11 8:02 
AnswerRe: 7x decompression speedup! [modified] Pinmembercniculescu19-Apr-12 11:14 
GeneralMy vote of 5 Pinmemberstefan popa25-Jul-11 5:06 
GeneralDecompress on a 64bit app Pinmemberrazvar1-Feb-11 22:10 
GeneralRe: Decompress on a 64bit app Pinmembercnjoin21-Aug-11 19:59 
Generalsorting!!!!! Pinmemberjanmojzis25-Aug-10 13:33 
Questionhuffman(verilog) Pinmemberelec_engineer14-Feb-09 4:01 
AnswerRe: huffman(verilog) PinmemberHatem Mostafa8-Feb-11 23:09 
Generaldownloads not available PinmemberAamir Mustafa25-Dec-07 6:08 
GeneralHelp me !! Pinmemberfxnmri12-Jan-07 5:09 
QuestionHow to compress many files in folder? PinmemberNewcc11-Oct-06 18:14 
Questionhow to get the data entropy Pinmembersaproll3-Jun-06 4:08 
GeneralRunning on PDA's Pinmembermgmoss27-Jan-06 14:47 
GeneralRe: Running on PDA's PinmemberHatemMostafa28-Jan-06 21:24 
GeneralRe: Running on PDA's Pinmembermgmoss30-Jan-06 15:53 
GeneralRe: Running on PDA's Pinmembermgmoss30-Jan-06 17:27 
GeneralRe: Running on PDA's PinmemberHatemMostafa30-Jan-06 21:26 
GeneralIt is fast PinmemberYves25-Dec-05 18:55 
GeneralFFT data compresion PinmemberGeorgi Petrov4-Jun-05 5:47 
GeneralHuffman in BC++V5.0 Pinsuss4020274526-Apr-05 4:55 
GeneralRe: Huffman in BC++V5.0 PinmemberHatemMostafa26-Apr-05 20:15 
GeneralCould not link. PinmemberWREY11-Dec-04 12:16 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web01 | 2.8.140921.1 | Last Updated 4 Jan 2005
Article Copyright 2004 by Hatem Mostafa
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid