Click here to Skip to main content
Click here to Skip to main content

Your Digital Fountain

By , 23 Aug 2012
 

Download source 



Introduction 

Implementation of Luby transform Code in C#, with Demo application in WPF and WCF.

Problem 

The traditional file transmission mechanism is as follows: 

  1.         A file is divided into equal length packets. 
  2.         Transmitter sends packets one by one.
  3.         Receiver acknowledges each received packet as well as lost ones.
  4.         Transmitter re-sends packets lost.

       
Now consider a case where a server is to distribute data to multiple receivers, the content can be operating system updates or media files, anything.

Using the traditional method, not only will the server have to keep a duplex communication with the clients, it will also have to maintain a list of which client got which packets, in order to transmit lost ones again.
       


Imagine you are a part of a NASA expedition on a mission to Mars, despite of the very high computing power, the connection to earth is very lossy, many of the packets sent to you are lost, and you cannot present any feedback of which packets to be re-transmitted.


Solution

In a digital fountain, it doesn't matter which packets are received and which are lost, it also doesn't matter the order of received packets.

The server transformers the data into unlimited number of encoded chunks, then the client can reassemble the data given enough number of chunks, regardless of which ones were get and which ones were missed.
        
That's why it was called a fountain, you don't care which drops you get from the fountain, as long as you get enough drops to fill your bottle.

How it works

Encoding

Decoding

 

Example  

Data

Split into k = 6 parts

 

One drop is generated by XORing parts 1, 2 and 3

 

Another drop is generated by XORing part 1, 5 and 6

Drop generation can be repeated over and over again, until necessary

Decode


This is the same as solving an equation system with k unknowns (data parts). Every drop is an equation of the system.

very well, so, if we know the file is divided into K blocks, then we need K drops (equations), right?

wrong! with only k drops, in the most favorable case, decoding is possible only 30% of times.

To increase the probability to decode, it is necessary to collect more than k drops, The number of excess drops is the overhead; the bigger the overhead, the higher the decoding probability.

Note that the overhead is independent of k: 20 excess drops to get 10-6 failure probability for any k (then better if K is large). 

Code Architecture  



The folder Algorithm is where the implementation logic is, MathHelper is used to solve the linear equations using Gauss–Jordan elimination, Server is a self-hosted WCF service exposing the algorithm, Client is the WPF application that receives the drops into a goblet then decodes it into original file.

Code Explanation

LubyTransform.Encode (The Server part)

At first there's the initialization, we split the data file into blocks.
Notic the variable degree, it determines the maximum number of blocks to be combined.
Here degree is equal to number of generated blocks divided by 2.

   
 public class Encode : IEncode
    {
        #region Member Variables

        readonly IList<byte[]> blocks;
        readonly int degree;
        readonly Random rand;
        readonly int fileSize;
        const int chunkSize = 2;

        #endregion

        #region Constructor

        public Encode(byte[] file)
        {
            rand = new Random();
            fileSize = file.Length;
            blocks = CreateBlocks(file);
            degree = blocks.Count() / 2;
            degree += 2;
        }

        #endregion


Then we have the Encode() method itself
Drop IEncode.Encode()
        {
            int[] selectedParts = GetSelectedParts();
            byte[] data;

            if (selectedParts.Count() > 1)
            {
                data = CreateDropData(selectedParts, blocks, chunkSize);
            }
            else
            {
                data = blocks[selectedParts[0]];
            }

            return new Drop { SelectedParts = selectedParts, Data = data };
        }

simple enough, we choose some blocks randomly (degree is our max number of blocks to be chosen) and combine them, generating the drop data. 

         private byte[] CreateDropData(IList<int> selectedParts, IList<byte[]> blocks, int chunkSize)
        {
            var data = new byte[chunkSize];

            for (int i = 0; i < chunkSize; i++)
            {
                data[i] = XOROperation(i, selectedParts, blocks);
            }

            return data;
        }

        private byte XOROperation(int idx, IList<int> selectedParts, IList<byte[]> blocks)
        {
            var selectedBlock = blocks[selectedParts[0]];
            byte result = selectedBlock[idx];

            for (int i = 1; i < selectedParts.Count; i++)
            {
                result ^= blocks[selectedParts[i]][idx];
            }

            return result;
        } 

The combination is done by XORing the parts together, Note that the number of blocks chosen is different each time a Drop is requested. Each Drop contains the data generated from combination, along with a list of which blocks were combined. 

    public class Drop
    {
        public IList<int> SelectedParts { get; set; }
        public byte[] Data { get; set; }
    }

LubyTransform.Decode (The Client part, putting it all together)

The first thing we do is collect drops, we need K+overhead drops
private string ReceiveMessage()
        {
            var blocksCount = encodeServiceClient.GetNumberOfBlocks();
            var fileSize = encodeServiceClient.GetFileSize();
            var chunkSize = encodeServiceClient.GetChunkSize();
            IList<Drop> goblet = new List<Drop>();

            for (int i = 0; i < blocksCount + overHead; i++)
            {
                var drop = encodeServiceClient.Encode();
                goblet.Add(drop);
            }

            var fileData = _decoder.Decode(goblet, blocksCount, chunkSize, fileSize);
            return Encoding.ASCII.GetString(fileData);
        }
 

From these drops we construct an Augmented Matrix, to be solved by Gauss–Jordan elimination

        byte[] IDecode.Decode(IList<Drop> goblet, int blocksCount, int chunkSize, int fileSize)
        {
            var matrix = BuildMatrix(goblet, blocksCount, chunkSize);
            matrixSolver.Solve(matrix);
            int columnsCount = matrix.GetLength(1);
            byte[] result = new byte[fileSize];

            for (int i = 0; i < result.Length; i++)
            {
                result[i] = (byte)matrix[i, columnsCount - 1];
            }

            return result;
        } 

and that's it, the file is reconstructed.

How to Use the application

  1. Start visual studio under administrator privilege.
  2. Set Server.Host as the startup project, make sure its up and running.
  3. Start a new instance of the Client.Receiver project (right click project-->Debug-->start new instance).
  4. Press the Start Receiving button, and you should get the message that was stored on the server.

Points of Interest

  • The XORing of blocks is a really expensive process, note the CreateDropData() function, each time a drop is requested, we are looping ChunkSize times. In this particular demo I've set the ChunkSize to 2, but in a real life application it will be no less than 4 Kb, maybe 16 Kb even, that will take an imaginable time! 
  • The Gauss-Jordan algorithm executes in O(mnm), that's roughly O(n3). 

Conclusion

As always, I hope I've delivered a clear explanation. If you have a comment\question about the implementation\Code structure please feel free to contact me ;]

Charts were created using http://zwibbler.com/

References  

History

  • 24 Aug 2012 Followed Paulo Zemek's advice of using IList instead of IEnumerable.
  • 21 Jul 2012  Initial Release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Omar Gameel Salem
Software Developer
Egypt Egypt
Member
Enthusiastic programmer/researcher, passionate to learn new technologies, interested in problem solving,data structures, algorithms and automation.
 
If you have a question\suggestion about one of my articles, or you want an algorithm implemented in C#, feel free to contact me.
 
Résumé
 
vWorker Account

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralMy vote of 5memberChristoph Keller10 Jan '13 - 2:08 
Thanks for sharing this! Very interesting!
 
Greetings,
Chris
GeneralMy vote of 5memberChristian Amado24 Aug '12 - 16:48 
5
GeneralMy vote of 5memberNicolas Gordillo24 Aug '12 - 4:51 
Very clear. Thanks!
GeneralMy vote of 5memberYusuf Uzun16 Aug '12 - 1:06 
Hi, first of all very nice article, congratulations. I have a question about your encoding diagram. What is the idea under the "careful" pseudo random selection section. I have some thoughts about the algorithm's efficiency.
 
And again congratulations...
GeneralMy vote of 5memberMihai MOGA8 Aug '12 - 5:26 
This is a great inspiring article. I am pretty much pleased with your good work. You put really very helpful information. Keep it up once again.
QuestionDecoding and failure ratiomemberGalatei24 Jul '12 - 10:32 
Hi, you wrote that with 'k' number of packets, it's only like 30% of success rate when decoding. It's not true for systematic algorithms like Raptor/RaptorQ where possibility of failure is 1/mil with only k+2, and 1/10k with k+1, and 1% of failure with k number of packets received (Reference here - section 5.8[^]).
 
Are you saying that your algorithm is 30% effective with k number of packets ?
 
Good job
AnswerRe: Decoding and failure ratiomemberOmar Gamil24 Jul '12 - 23:18 
Galatei wrote:
It's not true for systematic algorithms like Raptor/RaptorQ

 
That's true, but this is the implementation of Luby transform code[^]
 
Other algorithms, (Raptor, Tornado) have different failure ratios.
 
Thank you ;]
QuestionAre these references for this processesmemberMicroImaging23 Jul '12 - 18:49 
I went looking for more theoretical references for this algorithms and came across these.

[^]
 
http://www.cs.bu.edu/fac/byers/courses/559/F07/SCRIBE/scribe-notes-9-17.pdf[^]
 
http://ftp.cs.ucla.edu/tech-report/2004-reports/040005.pdf[^]
 
Are these references good ?
Do you have better references ?
Thank yo so much for sharing your work.
AnswerRe: Are these references for this processesmemberOmar Gamil23 Jul '12 - 22:12 
I don't have better references, but of course you can Google fountain code you'll get more results.
 
Thank you for reading.
GeneralPerhaps a comparison would be in order...memberAndrew Rissing23 Jul '12 - 8:25 
The algorithm you presented definitely shines in a error prone communication environment, but perhaps it might be nice to better characterize the strengths of this sort of algorithm.
 
Perhaps, perform a comparison of a more straight forward approach with the fountain approach. Such as, just resending each data packet N number of times.
 
Using your image as an example...If you had a transmission failure rate of 50% and wanted less than a 90% success rate on your image being received, you would have to resend each section at least 4 times. With 6 sections, that would mean at least 24 packets. Can you work through an example using the prior with the fountain algorithm to show its strengths in this sort of scenario? Doing so, might make your article a bit stronger in explaining why this algorithm is useful to be aware of.
 
As an aside, how do you ensure that you have provided enough coverage of each data packet, since the generation of each packet is random. It would seem like you'd try to ensure at least each packet is used X number of times for creating a packet. Otherwise, you might send the first packet un-encoded once and never XOR it to another packet. If that one packet is lost, the overall transmission fails. It would seem like you'd want to ensure each packet is chosen so many times and then you can add the randomness to mix up which packets it gets chosen to be merged with.
 
Thoughts?
GeneralRe: Perhaps a comparison would be in order...memberOmar Gamil23 Jul '12 - 12:14 
you raised a few interesting points, let me take them down one by one
 

Andrew Rissing wrote:
Such as, just resending each data packet N number of times.

 
That was implied in the problems of the standard transfer model, in the distributor\consumer example, the server will have to send every packet N number of times to K clients, whilst with this technique every client is self aware of his own packets and requests on a need only basis.
 

Andrew Rissing wrote:
Using your image as an example...If you had a transmission failure rate of 50% and wanted less than a 90% success rate on your image being received, you would have to resend each section at least 4 times. With 6 sections, that would mean at least 24 packets. Can you work through an example using the prior with the fountain algorithm to show its strengths in this sort of scenario? Doing so, might make your article a bit stronger in explaining why this algorithm is useful to be aware of.

 
There's redundancy in two methods, no doubt, but consider this, in traditional approach (sending each packet N times),somehow we still didn't get the packet..
now what? how do i tell the server to resend? (remember that's a one way communication).
 
fountain code overcomes this, it can theoretically generate infinite droplets.
 

Andrew Rissing wrote:
As an aside, how do you ensure that you have provided enough coverage of each data packet, since the generation of each packet is random. It would seem like you'd try to ensure at least each packet is used X number of times for creating a packet. Otherwise, you might send the first packet un-encoded once and never XOR it to another packet. If that one packet is lost, the overall transmission fails. It would seem like you'd want to ensure each packet is chosen so many times and then you can add the randomness to mix up which packets it gets chosen to be merged with.

 
I can't ensure, that's the point, the algorithm never sends a specific package, but through careful random selection, and enough iterations, I should get all the pieces of the puzzle..theory of averages.
GeneralRe: Perhaps a comparison would be in order...memberAndrew Rissing23 Jul '12 - 15:31 
I think you didn't understand my initial counter algorithm. I was stating that you should contrast it with another algorithm. The algorithm I gave would be relatively close to the behavior of the fountain algorithm. You don't expect a response back from those listening. You just keep resending the packets over and over sequentially, until, based on probability, you can ensure that you have a high likelihood of success. The fountain algorithm doesn't ensure success, but just predicts it to some degree based on how many packets you send.
 
Furthermore, the point about not necessarily being sure of sending enough of any single packet still hasn't been properly answered. Are you talking about producing more than N*N number of packets (where N is the number of discrete data packets), such that you be able to pretty much guarantee each packet gets used at least 1-2 times? If so, the algorithm I spoke about earlier starts to look more appealing. It is by far simpler to implement and understand, so what makes the fountain algorithm stand out?
 
If you can answer that, then I think you've accomplish something more than just describing an algorithm but making people aware of how and when to use it.
GeneralRe: Perhaps a comparison would be in order...memberOmar Gamil23 Jul '12 - 22:10 
okay here's your comparison
 
lets say we have 1000 packets we wish to send
 
using your method:
Andrew Rissing wrote:
keep resending the packets over and over sequentially

lets say we will only send 2 times.
Thats 2000 packets sent.
 
using fountain code:
we will send the packets+overhead
that 1000+20=1020
GeneralRe: Perhaps a comparison would be in order...memberAndrew Rissing24 Jul '12 - 3:51 
Gotcha, now could elaborate on that and put it into the article. Big Grin | :-D
GeneralMy vote of 4memberSouthmountain21 Jul '12 - 10:38 
thanks for sharing
GeneralXOROperation... 2 problemsmvpPaulo Zemek21 Jul '12 - 5:07 
In XOROperation I see two problems:
1) You start from 1 and you use less than count. Shouldn't you use 0 as the start?
2) You are using ElementAt over an IEnumerable. That is, to get the item 99 it will go from the 0 (first) until 99. Then, to get the 100 it will go from the 0 again up to 100 (ok, there are some optimizations for array, I guess).
But, you can do better if:
2.a) You use an ICollection and use the list[index] getter or;
2.b) You simple set the index to 0 and do a foreach on the enumerable and, at each iteration, increment the index by one. That way you will only iterate the collection once.
GeneralRe: XOROperation... 2 problemsmemberOmar Gamil21 Jul '12 - 5:29 
First issue: I set the return result to very first block.
var selectedBlock = blocks.ElementAt(selectedParts[0]);
 
Then I loop the rest of blocks, no need to start from zero, we already have it in selectedBlock variable
 
Second issue: You are absolutely right, using ICollection or IList is better.
 
Thank you Paulo for reading and feedback Wink | ;)
GeneralRe: XOROperation... 2 problemsmvpPaulo Zemek21 Jul '12 - 5:48 
You are right about the one... you start with a normal item, then you XOR the others. Sorry, my bad for the 1.
And I am glad you liked my solution for the 2. Smile | :)
SuggestionCode blocksmvpMeshack Musundi20 Jul '12 - 22:46 
Hi Omar,
 
Please enclose your code blocks in <pre> tags eg.
<pre lang="cs">
  public class Drop
  {
    public IEnumerable<int> SelectedParts {get; set;}
    public byte[] Data {get; set;}
  }
</pre>
"As beings of finite lifespan, our contributions to the sum of human knowledge is one of the greatest endeavors we can undertake and one of the defining characteristics of humanity itself"

GeneralMy vote of 5memberHexxellor20 Jul '12 - 20:13 
Thank you for taking the time to write this up and introduce me to this intriguing algorithm!
GeneralMy vote of 5memberShaunak De20 Jul '12 - 19:28 
Thank you for this article, its well written, well illustrated, clear and concise.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web01 | 2.6.130523.1 | Last Updated 24 Aug 2012
Article Copyright 2012 by Omar Gameel Salem
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid