# How to compress and decompress a file using Huffman Algorithm

By Hesham Yassin, 14 Jun 2007

## Introduction

Data compression is an important issue whenever data is handled digitally: the goal is to represent the same data in the smallest possible size. However, there is no magical method that fits all cases, so compression can be done in many ways, depending on our requirements and on how we store, transmit and check the data.

In my project I suggest one method that takes advantage of the fact that there are groups of files (I call them a family) that are similar to each other in content.

## Family of files

In many cases we handle many files that come from the same source; for example, two or more files with small differences, such as forms, templates, or successive frames in a video stream.

Instead of storing all the files independently, we can find the shared part, isolate it from all the members of the family, and then link them back to it. In other words, we store one shared copy and, for every member, just its difference from that copy.

## But where is the catch?

Nothing new! I simply tried to show how to implement that concept using some techniques we learnt in class. Here is the flow:

1. Given a file `f` and a family `F = (f<sub>1</sub>, ..., f<sub>n</sub>)` stored in our database.
2. Calculate the entropy of `f`: `H(f)`.
3. For each pair `(f, f<sub>i</sub>)`, calculate the entropy of `f XOR f<sub>i</sub>`: `H(f, f<sub>i</sub>)`.
4. If `H(f) < H(f, f<sub>i</sub>)` for all `i = 1, ..., n`, then compress `f` and finish.
Otherwise, compress `f XOR f<sub>k</sub>`, where `k` minimizes `H(f, f<sub>i</sub>)` over `i = 1, ..., n`, and link the result to `f<sub>k</sub>`.

(*) Files are compressed using Huffman coding.
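The flow above can be sketched in a few lines. This is a minimal Python sketch (the article's code is C#), and the names `entropy`, `xor_bytes` and `choose_encoding` are mine, not the article's:

```python
from collections import Counter
from math import log2

def entropy(data):
    """Shannon entropy (bits per symbol) of a byte sequence."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length sequences."""
    return bytes(x ^ y for x, y in zip(a, b))

def choose_encoding(f, family):
    """Compress f directly, or the XOR with the family member of lowest entropy."""
    h_f = entropy(f)
    # Step 3: entropy of f XOR f_i for every family member.
    k = min(range(len(family)), key=lambda i: entropy(xor_bytes(f, family[i])))
    h_min = entropy(xor_bytes(f, family[k]))
    if h_f < h_min:
        return ("direct", f)                    # step 4: compress f and finish
    return ("xor", k, xor_bytes(f, family[k]))  # compress f XOR f_k, link to f_k
```

A file that differs from a family member in only a few bytes will XOR to mostly zeros, which has far lower entropy than the file itself, so the `"xor"` branch is chosen.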

## Examples and Results

Given the following images (binary .bmp files)

*(Figure: the original file and the family file, the twin.)*

We got the following sizes (in bytes):

| File | Size (bytes) |
| --- | --- |
| Original file | 67014 |
| Compression of the original file | 12290 |
| Compression of the XOR of the above files | 8377 |

That means we got a file just 12.5% of the original size after compression using the twin file, versus 18.3% when using compression alone. In other cases we got even better results. (See the download at the top of this article for further examples.)

## Using the code

The code is written in C# under Visual Studio 2005. It contains the following main parts:

### class Decoding

The input is an array of bytes of the original file. The operations are:

• Calculating the entropy
• Calculating the entropy given another array of bytes (I called it the twin, but it can be any file from the family)
• `Decoding`: decode the data using the Huffman method

### class Encoding

The input is `DecodedInfo` (the data after compression). The operations are:

• `Encoding` the data by reconstructing the original file
• Saving the result
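Reconstruction walks the compressed bit stream against the inverted code map. A minimal Python sketch of the idea (the helper name `huffman_decode` is mine, not a method from the article's C# classes):

```python
def huffman_decode(bits, code_map):
    """Reconstruct bytes from a bit string using the inverted Huffman code map.

    bits     -- string of '0'/'1' characters (the compressed stream)
    code_map -- dict mapping byte value -> its variable-length code
    """
    inverse = {code: sym for sym, code in code_map.items()}
    out, cur = bytearray(), ""
    for b in bits:
        cur += b
        if cur in inverse:       # prefix-free codes: first match is the symbol
            out.append(inverse[cur])
            cur = ""
    return bytes(out)
```

Because Huffman codes are prefix-free, the decoder never has to backtrack: the first code that matches is always the right one.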

### class DecodedInfo

This is the structure of the compressed file. It contains the following fields:

• `decodedData`: array of bytes containing the result of compression
• `codeMap`: the variable-length code, calculated using the Huffman tree
• `totalBits`: size of the result
• `twinFileName`: the name of the file used to calculate the XOR result (the family file that gives the lowest entropy)
• `originalFileName`: the name of the original file; I use it when saving the result
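The structure above maps naturally onto a plain record type. A Python sketch of the same fields (the article's `DecodedInfo` is a C# class; the field types here are my assumptions):

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class DecodedInfo:
    decoded_data: bytes            # result of the compression
    code_map: Dict[int, str]       # byte value -> variable-length Huffman code
    total_bits: int                # size of the compressed bit stream
    twin_file_name: Optional[str]  # family file used for the XOR, or None
    original_file_name: str        # used when saving the result
```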

### class Hamming

The input is a histogram of the bytes in the original file. Using this information, we build the tree that allows us to calculate the variable-length code of the compressed data. This class is simply an implementation of the algorithm we learnt in class.
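Building the code map from the histogram is the classic Huffman construction: repeatedly merge the two least frequent subtrees. A minimal Python sketch (the function name `huffman_code_map` is mine; the article's class is C#):

```python
import heapq
from collections import Counter

def huffman_code_map(data):
    """Variable-length prefix codes from the byte histogram (Huffman's algorithm)."""
    freq = Counter(data)
    if len(freq) == 1:                       # degenerate case: only one symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, symbols in this subtree).
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    codes = {s: "" for s in freq}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, syms2 = heapq.heappop(heap)
        for s in syms1:
            codes[s] = "0" + codes[s]        # left branch of the merged node
        for s in syms2:
            codes[s] = "1" + codes[s]        # right branch of the merged node
        heapq.heappush(heap, (f1 + f2, tie, syms1 + syms2))
        tie += 1
    return codes
```

Frequent bytes end up near the root and get short codes, which is what makes a low-entropy XOR result (mostly zero bytes) compress so well.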


Hesham Yassin, Instructor/Trainer, Israel

Hesham Yassin is a C#, Assembler, C and VB programmer. He started programming in 2004; the first language he learned was C. Between 2006 and 2007, he studied electronics and built a wireless control device that activates a media player.
