
Implementing the Huffman algorithm as a C# library



Sep 21, 2005


A flexible HuffmanAlgorithm object that works on data streams.

[Sample image: Huffman_algorithm.jpg]

Introduction

The Huffman algorithm is quite simple (in theory at least). The idea rests on the fact that in
most files, some bytes (characters, if you will) appear more often than others.

Main steps:

- Scan the data source from beginning to end and record in a table each byte that appears and
  how many times it appears (the count is that byte's value in the table).

- Now we need to build a binary tree. Start with a list that holds one node per byte in the
  table. Take the two nodes with the lowest counts, create a parent node for both of them,
  remove them from the list and add the parent node to the list instead (the parent's value
  is the sum of its two children's values). Repeat this until only one node is left in the
  list. That last parent is the root node of the tree.

- Give each byte that appeared in the file a new, distinct code. The sequence of left and
  right turns taken when walking from the root of the tree to a leaf is the code for that
  leaf (byte), where a left turn = 0 and a right turn = 1 (or vice versa), so the number of
  turns is the number of bits in the code. All that is left to do is to replace each byte
  with its code. Frequent bytes sit close to the root and get short codes, so in most cases
  the result takes less space.

- Extracting is easier, provided the original table was saved at the start of the archive,
  before the encoded data. Read the table, rebuild the tree from the table, then read the
  bits one at a time and take left and right turns down the tree, starting from the root.
  On reaching a leaf, write out its original byte and start over from the root with the
  next bit.
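
The steps above can be sketched in C# roughly as follows. This is a minimal illustration and not the library's actual code: the names Node and HuffmanSketch are hypothetical, the bits are kept as a string of '0' and '1' characters for readability (a real archiver would pack them into bytes), and the list is simply re-sorted on every pass instead of using a priority queue.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Illustrative sketch only; these names are not taken from the article's library.
public sealed class Node
{
    public byte? Symbol;      // set only on leaves
    public long Count;        // occurrences (leaf) or sum of the children (parent)
    public Node Left, Right;
    public bool IsLeaf => Left == null && Right == null;
}

public static class HuffmanSketch
{
    // Step 1: count how often each byte value occurs.
    public static long[] CountBytes(byte[] data)
    {
        var counts = new long[256];
        foreach (byte b in data) counts[b]++;
        return counts;
    }

    // Step 2: merge the two least frequent nodes until a single root remains.
    public static Node BuildTree(long[] counts)
    {
        var nodes = new List<Node>();
        for (int i = 0; i < 256; i++)
            if (counts[i] > 0) nodes.Add(new Node { Symbol = (byte)i, Count = counts[i] });
        if (nodes.Count == 0) return null;   // empty input: no tree
        while (nodes.Count > 1)
        {
            // Re-sorting keeps the sketch short; a priority queue would be faster.
            nodes = nodes.OrderBy(n => n.Count).ToList();
            var parent = new Node
            {
                Left = nodes[0], Right = nodes[1],
                Count = nodes[0].Count + nodes[1].Count
            };
            nodes.RemoveRange(0, 2);
            nodes.Add(parent);
        }
        return nodes[0];
    }

    // Step 3: the path from the root to a leaf is that byte's code (left = 0, right = 1).
    public static Dictionary<byte, string> BuildCodes(Node root)
    {
        var codes = new Dictionary<byte, string>();
        if (root != null) Walk(root, "", codes);
        return codes;
    }

    private static void Walk(Node node, string path, Dictionary<byte, string> codes)
    {
        if (node.IsLeaf)
        {
            // If the input holds only one distinct byte, the root is a leaf; use a 1-bit code.
            codes[node.Symbol.Value] = path.Length == 0 ? "0" : path;
            return;
        }
        Walk(node.Left, path + "0", codes);
        Walk(node.Right, path + "1", codes);
    }

    // Replace every byte with its code.
    public static string Encode(byte[] data, Dictionary<byte, string> codes)
    {
        var sb = new StringBuilder();
        foreach (byte b in data) sb.Append(codes[b]);
        return sb.ToString();
    }

    // Step 4: follow the bits down the tree; emit a byte at each leaf and restart at the root.
    public static byte[] Decode(string bits, Node root)
    {
        var output = new List<byte>();
        if (root == null) return output.ToArray();
        if (root.IsLeaf)   // single-symbol tree: every bit stands for that one byte
        {
            for (int i = 0; i < bits.Length; i++) output.Add(root.Symbol.Value);
            return output.ToArray();
        }
        Node current = root;
        foreach (char bit in bits)
        {
            current = bit == '0' ? current.Left : current.Right;
            if (current.IsLeaf) { output.Add(current.Symbol.Value); current = root; }
        }
        return output.ToArray();
    }
}
```

For the 11-byte input "abracadabra", this sketch gives the most frequent byte, a, a 1-bit code and each of the other four bytes a 3-bit code, so the encoded form is 23 bits against 88 bits of raw data, not counting the table that must be stored alongside it.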

HuffmanAlgorithm object

- Uses the Huffman algorithm to archive and extract any type of data stream.
  The archived data also contains information about the original data size, version,
  password and more.

- Each archiving and extracting function also has a version that raises an event each time
  another one percent of the process is complete.
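
As a rough idea of how such a per-percent notification could be wired up, here is a small sketch. It is an assumption about the general pattern, not the library's API: ProgressCopier, PercentEventArgs and PercentCompleted are hypothetical names, and a plain stream copy stands in for the real archiving or extracting work.

```csharp
using System;
using System.IO;

// Hypothetical names throughout; the article does not show the library's actual API.
public sealed class PercentEventArgs : EventArgs
{
    public int Percent { get; }
    public PercentEventArgs(int percent) { Percent = percent; }
}

public sealed class ProgressCopier
{
    // Raised once for every whole percent of the input that has been processed.
    public event EventHandler<PercentEventArgs> PercentCompleted;

    // Copies input to output (a stand-in for archive/extract work) and reports progress.
    public void Process(Stream input, Stream output)
    {
        long total = input.Length;   // assumes a seekable stream with a known length
        long done = 0;
        int lastPercent = 0;
        var buffer = new byte[4096];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
            done += read;
            int percent = (int)(done * 100 / total);
            // One chunk may cross several percent marks; report each of them in order.
            while (lastPercent < percent)
            {
                lastPercent++;
                PercentCompleted?.Invoke(this, new PercentEventArgs(lastPercent));
            }
        }
    }
}
```

A caller would subscribe before starting, for example `copier.PercentCompleted += (s, e) => progressBar.Value = e.Percent;`, where `progressBar` is whatever UI element shows the progress.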