The Huffman algorithm is quite simple (in theory at least). The idea is based on the fact that in
most files, some bytes (characters, if you will) appear more often than others.
- Scan the data source from beginning to end and list in a table each byte that appears and
how many times it appears (that count is its value in the table).
- Now we need to build a binary tree (it will make sense shortly). Take the 2 entries in the list
that appeared the fewest times in the data source, create a parent node for both of them,
remove them from the list and add the parent node to the list instead (the parent's value is
the sum of its two children's values). Continue this process until only one node is left in
the list. That last parent is the root node of the tree.
- We give each byte that was in the file a new, different value: the sequence of left and
right turns taken when walking from the root of the tree to that byte's leaf, where
left turn = 0 and right turn = 1 (or vice versa). The number of turns is the number of bits
in the new value, so frequent bytes end up with short codes. All that is left to do is to
replace each byte with its several bits (left and right turns); in most cases this should
take less space.
- Extracting is easier (save the original table at the start of the archive, before the
encoded data). Read the table, rebuild the tree from it, then read the bits and take
left and right turns down from the tree root. When you reach a leaf, write out its original
byte and start over from the tree root with the next bit...
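
The steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions, not the code of the library described here: the function names (build_tree, build_codes, encode, decode) are made up for the example, the input is assumed to be non-empty, and the bits are kept as a string of "0"/"1" characters for readability, where a real archiver would pack them into bytes.

```python
import heapq
from collections import Counter
from itertools import count


def build_tree(data: bytes):
    """Return the root of a Huffman tree for non-empty `data`.

    A leaf is an int (the byte value); a parent is a (left, right) tuple.
    """
    freq = Counter(data)  # the table: byte value -> number of occurrences
    tie = count()  # unique tie-breaker so the heap never compares nodes
    heap = [(n, next(tie), byte) for byte, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two least frequent entries and join them under a parent.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tie), (left, right)))
    return heap[0][2]  # the single node left is the root


def build_codes(node, prefix="", codes=None):
    """Walk the tree: left turn = "0", right turn = "1"."""
    codes = {} if codes is None else codes
    if isinstance(node, tuple):
        build_codes(node[0], prefix + "0", codes)
        build_codes(node[1], prefix + "1", codes)
    else:
        # A file with one distinct byte has a lone leaf; give it one bit.
        codes[node] = prefix or "0"
    return codes


def encode(data: bytes, codes) -> str:
    """Replace each byte with its bit string."""
    return "".join(codes[b] for b in data)


def decode(bits: str, root, length: int) -> bytes:
    """Follow the turns from the root; emit a byte at every leaf."""
    if not isinstance(root, tuple):  # only one distinct byte in the data
        return bytes([root]) * length
    out = bytearray()
    node = root
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if not isinstance(node, tuple):
            out.append(node)
            node = root  # start over from the root for the next byte
    return bytes(out)


data = b"abracadabra"
root = build_tree(data)
codes = build_codes(root)
bits = encode(data, codes)
assert decode(bits, root, len(data)) == data
assert len(bits) < 8 * len(data)
```

In this sketch the decoder is handed the tree directly. An archive would instead store the frequency table (and the original size) ahead of the bits, and rebuild the same tree from the table before decoding.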
- Uses the Huffman algorithm to extract/archive any type of data stream.
Archived data contains info about the original data size, version, password and more.
- Each extracting/archiving function has a version that fires an event handler each time one
percent of the process is completed.
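
To give a rough idea of what a per-percent progress notification could look like, here is a small Python sketch. It is an assumption about the general pattern, not the library's actual API: encode_with_progress and on_progress are hypothetical names, and the codes table is passed in as a plain dict from byte value to bit string.

```python
def encode_with_progress(data: bytes, codes, on_progress) -> str:
    """Encode `data`, calling on_progress(percent) when the percentage grows.

    For inputs shorter than 100 bytes the percentage jumps by more than one,
    so the callback fires at most once per byte, never twice for one value.
    """
    total = len(data)
    parts = []
    last = 0  # last percentage that was reported
    for i, byte in enumerate(data, start=1):
        parts.append(codes[byte])
        percent = i * 100 // total
        if percent > last:
            on_progress(percent)
            last = percent
    return "".join(parts)


# Hypothetical usage: collect the reported percentages in a list.
seen = []
bits = encode_with_progress(b"ab" * 100, {97: "0", 98: "1"}, seen.append)
```

Reporting only when the integer percentage changes keeps the number of callbacks at 100 or fewer no matter how large the input is.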
Born and raised in Israel, I caught the programming virus at the age of 15.
Since then I can't stop coding.