Click here to Skip to main content
Licence GPL3
First Posted 9 Apr 2008
Views 44,759
Downloads 2,973
Bookmarked 37 times

Huffman compression class in C++

By | 9 Apr 2008 | Article
The article provides a dynamic Huffman compression and decompression class and a console application written in C++.

Introduction

I've written, some many years ago, a dynamic Huffman algorithm to compress and decompress data. It is mainly targeted to data with some symbols occurring more often than the rest (e.g., having some data file consisting of three different symbols, and their total number of occurrences in that file is s1(1000), s2(200), s3(30), so the total length of the file is 1000+200+30=1230 bytes; it is encoded, assigning one bit to s1 and two bits to s2, s3, so the encoded length is 1*1000+2*(200+30)=1460 bits=182 bytes). In the best case, a file consisting of just one symbol will be encoded with a compression ratio of 1:8. Huffman coding is used in image compression; however, in JPEG2000, an arithmetic codec is employed.

Background

The article intends to provide the code only, and is not a Huffman tutorial. You may dig for online tutorials on the subject.

Using the code

The console is straightforward to use to encode a source file to a Huffman compressed one:

>>huffman.exe e sourcefile destinationfile

or to decode a Huffman compressed source file back to the original file:

>>huffman.exe d sourcefile destinationfile

The Huffman class provides three functions to use:

  • void encode(unsigned char *dest, int &csize, unsigned char *sour, int usize) - to encode
  • void decode(unsigned char *dest, int &usize, unsigned char *sour) - to decode
  • int get_uncompressed_size(unsigned char *sour) - to get the uncompressed file size from Huffman encoded data file

dest and sour are the destination and source buffers, usize and csize are the uncompressed size of the original data and the compressed size of the Huffman encoded data. To encode, just provide your data in the sour buffer and specify its size in usize. The csize will be given the size of the compressed Huffman array you need to provide in the dest pointer. Make sure it is allocated at least the size of your original data. To decode, just provide in sour your Huffman encoded data, and query with get_uncompressed_size(sour) the size of the dest buffer you need to allocate. Upon decompression completion, the usize will be overwritten with the uncompressed data size.

Points of interest

I know storing data symbols and their frequencies into Huffman encoded files is space consuming, and that more compact representations are present.

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)

About the Author

Chesnokov Yuriy

Engineer

Russian Federation Russian Federation

Member

Former Cambridge University postdoc (http://www-ucc-old.ch.cam.ac.uk/research/yc274-research.html), Department of Chemistry, Unilever Centre for Molecular Informatics, where I worked on the problem of complexity analysis of cardiac data.
 
As a subsidiary result we achieved 1st place in the annual PhysioNet/Computers in Cardiology Challenge 2006: QT Interval Measurement (http://physionet.org/challenge/2006/)
 
My research intrests are: digital signal processing in medicine, image and video processing, pattern recognition, AI, computer vision.
 
My recent publications are:
 
Complexity and spectral analysis of the heart rate variability dynamics for distant prediction of paroxysmal atrial fibrillation with artificial intelligence methods. Artificial Intelligence in Medicine. 2008. V43/2. PP. 151-165 (http://dx.doi.org/10.1016/j.artmed.2008.03.009)
 
Face Detection C++ Library with Skin and Motion Analysis. Biometrics AIA 2007 TTS. 22 November 2007, Moscow, Russia. (http://www.dancom.ru/rus/AIA/2007TTS/ProgramAIA2007TTS.html)
 
Screening Patients with Paroxysmal Atrial Fibrillation (PAF) from Non-PAF Heart Rhythm Using HRV Data Analysis. Computers in Cardiology 2007. V. 34. PP. 459–463 (http://www.cinc.org/archives/2007/pdf/0459.pdf)
 
Distant Prediction of Paroxysmal Atrial Fibrillation Using HRV Data Analysis. Computers in Cardiology 2007. V. 34. PP. 455-459 (http://www.cinc.org/archives/2007/pdf/0455.pdf)
 
Individually Adaptable Automatic QT Detector. Computers in Cardiology 2006. V. 33. PP. 337-341 http://www.cinc.org/archives/2006/pdf/0337.pdf)

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board. (secure sign-in)
 
Search this forum  
 FAQ
    Noise  Layout  Per page   
  Refresh
GeneralExcellent code ! Pinmembertnl19721:03 12 May '10  
GeneralBorlandC ?! [modified] PinmemberBakaBug4:01 13 Apr '09  
AnswerRe: BorlandC ?! PinmemberChesnokov Yuriy4:53 14 Apr '09  
GeneralRe: BorlandC ?! PinmemberBakaBug8:40 14 Apr '09  
GeneralDelete it !!! PinmemberHatem Mostafa21:28 14 Apr '08  
AnswerRe: Delete it !!! PinmvpChesnokov Yuriy17:57 15 Apr '08  
GeneralRe: Delete it !!! PinmemberKarstenK3:41 22 May '08  
AnswerRe: Delete it !!! PinmvpChesnokov Yuriy21:01 26 May '08  
http://www.codenet.ru/progr/alg/huffcode.php[^]
 
chesnokov

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Mobile
Web01 | 2.5.120604.1 | Last Updated 9 Apr 2008
Article Copyright 2008 by Chesnokov Yuriy
Everything else Copyright © CodeProject, 1999-2012
Terms of Use
Layout: fixed | fluid