Click here to Skip to main content
6,932,810 members and growing! (15,678 online)
Email Password   helpLost your password?
 
Development Lifecycle » Design and Architecture » Frameworks     Intermediate License: The Code Project Open License (CPOL)

Distributed Programming Framework

By Yang Yu

Build distributed programs without worry about distribution implementation.
C# (C#1.0, C#2.0, C#3.0), .NET (.NET2.0, .NET3.0, .NET3.5, .NET4.0), Architect, DBA, Dev, Design
Revision:8 (See All)
Posted:9 Sep 2009
Updated:1 Mar 2010
Views:6,997
Bookmarked:57 times
Unedited contribution
printPrint Friendly   add Share
      Discuss Discuss   Broken Article?Report  
14 votes for this article.
Popularity: 5.24 Rating: 4.57 out of 5

1

2
2 votes, 14.3%
3
1 vote, 7.1%
4
11 votes, 78.6%
5

Distributed Programming Framework - Part 1 (Abstract)

Introduction  

The robustness and scalability of Software depends heavily on the separation of procedural, logical, functional, and physical components, and also the seperation of tiers within each component.

This article will introduce a method for abstracting sequential code by identifying and managing functional seperations and constructing a framework for writing distributed computer programs without worry about its distribution and execution implementation.

Part 1 of the article will focus on the theories behind this framework, while Part 2 will focus on the implementation of the pattern. 

Background

It is probably a good idea to read about:

  • n-tiered architecture

  • distributed computing 

  • Event Driven Architecture

  • Service Oriented Architecture

  • Threading 

  • Procedural programming language (C#)

  • Distributed Shared Memory (DSM)

Why distribute Code Sequence? 

    Distribution of tasks requires a clear understanding from its micro level and also its macro level. The process of abstraction to form higher order tasks demands this understanding. By abstracting tasks, one can not only effectively manage the performance with introducing division of labour, but also introduce parallelism through identifying its inner associativities and dependencies.   

How? 

In the simplest form, we could use a simple Data Structure such as LinkedList<Action<I>> that stores a sequence of code blocks with the only relationship being their order of execution.  We could then execute this list of actions by iterating through that LinkedList from the first Action<I> to the last. Assuming there are no dependencies, this is the same as an ThreadPool. If this is procedural actions, then then it is exactly what procedural programming is. Trivial and not interesting.

A slightly more interesting case is to pipeline functions with a LinkedList<Func<I,O>> where the output of one particular function is pipelined to the next functions input. The precondition for this scenario is that all functions have single input parameter (Sort of like ML or F#).  Again, this case fails to leverage any kind of gains over the traditional approach.    

What do we really want? 

What we really want is to leverage a distributed architecture through the usage of Asynchronize execution, multithreaded execution, and parallel execution on multiple processors. As we all know, coding in these areas can be extremely difficult in all aspects. And Furthermore, being able to not worry about these architectures, and just write good code.

Abstracts  

To implement this concept, our Data Structure can no longer be a LinkedList, which implied the order of execution, but rather we need to explicitly define dependencies on each Function to gain more flexibility and management.  

Functional Dependency 

Dependencies among functions can be understood as this rule: Given functions A and B, A is related to B if and only if either A depends on B, or B depends on A.

We can write this relationship using sets as {A, B, C | dp(B,A) ^ dp(B,C)}. This means given a set of functions {A, B}, B depends on A and C.

 

Examples:

  1. The sequence of functions <A, B, C> can be viewed as a set of functions {A, B, C} such that B depends on A, and C depends on B. The fact that C depends on A, in this case, is implied. Also, Since A does not depend on any other function, it can be executed. 
  2. Given a set of functions {A, B, C} such that B depends on A, and C depends on A or B. In this case, C can be executed once A or B is executed.
  3.  Given a set of functions {A, B}.
    There are no dependencies which means that A and B can happen in parallel. The Precondition assumes that there are no side-effects.
  4. {A, B | dp(A,B) ^ dp(B,A)}.This program would never execute as a result.
  5. {Stand, Walk, Look, Stop | dp(Walk,Stand) ^ (dp(Look,Stand) v dp(Look, Walk)) ^ dp(Stop, Walk)}.
    The program will run as follows:
    Stand -> Walk -> Stop
    -> Look 
    (Note: Stop will run after Walk or Look depending on which finishes first.) 
  6. {Stand, Walk, Look, Stop | dp(Walk,Stand) ^ (dp(Look,Stand) v dp(Look, Walk)) ^ (dp(Stop, Walk) ^ dp(Stop, Look))}.
    The program will run as follows:
    Stand -> Walk
              -> Look
                                    -> Stop

The concept of dependencies between distinct functions is especially useful because it enables us to partition the functions to individual groups.

(Note: Keep in mind, that we only care about the procedural distribution, rather than the functional distribution. In our example 5 above, "Walk" could be a recursive functional routin that happens for 10 minutes.)

Procedural Groups

The definition of such a group is defined as: Procedural group G1 is a sequence of functions <x0, x1,...xn> such that x0 depends on yi where yi does not exist in G1 and xn is depends on xn-1.

Example:
  1. {Stand, Walk, Look, Stop | dp(Walk,Stand) ^ (dp(Look,Stand) v dp(Look, Walk)) ^ dp(Stop, Walk)}.
    G0 = <Stand, Walk, Stop>
    G1 = <Look>
    Stand -> Walk -> Stop
              -> Look
  2. {Stand, Walk, Look, Stop | dp(Walk,Stand) ^ (dp(Look,Stand) v dp(Look, Walk)) ^ (dp(Stop, Walk) ^ dp(Stop, Look))}.

    G0 = <Stand, Walk>
    G1 = <Look>
    G2 = <Stop>

    Stand -> Walk
              -> Look
                         -> Stop

By definition, Procedural Groups functions independently. Thus, allowing us to create seperate threads/processors/computers per group. This seperation translates into a distributed n-tiered architecture, executable simontainously to increase over all performance dramatically. Given we utilize the proper communication techniques.

Type Inferencing and Parameters

Communication and the transfering of data is a key element of any distributed system. To ensure the flow of information, we need to establish strict type checking rules. Similar to ML, the Type of function A is denoted as A:a'->b' where a' is the input parameter type and b' is the output type of the function A. In this case, since we do not know the inner works of A, just by looking at A, we cannot say anything about its input and output types. So {A, B | dp(B,A)} => A:a'->b' and B:b'-c'. In this case, the return type of A must be the same type as the input type of B.

Examples:
  1. {A, B, C | dp(C,A) ^ dp(C,B)} => A:a'->b' and B:c'->d' and C:(b',d')->e'
  2. {A, B, C | dp(C,A) v dp(C,B)} => A:a'->b' and B:c'->b' and C:b'->d'
    In this case, both A and B must have the same return type.
  3. {A, B, C | dp(A,B) ^ (dp(C, B) v dp(C,A)) } => B:a'->b' and A:b'->b' and C:b'->c'
    because C depends on B or A and A depends on B, A must have the same input and output type.

By ensuring types uniformity of dependent functions, we can now shift our focus on the communication and transport of information.

Communication

Many methods for transport exists today. COM+, MSQ, Remoting, webservices etc. all deal with Service Oriented Distributed Architecture, and Event Driven Architecture. However, all this is great but we still need to configure and implement it. Products such as BizTalk, Sql Server Enterprise enables distributed computing with load balencing, clustering, and pararel processing without a line of code. If only this type of technology was available for normal compilers, which can generate distributable code clusters that not only execute functional code, but can communicate and synchronize with one another depending on the number of servers, processors etc. We could tuilize COM+ and MSQ to takecare of the transport layer for us if we really wanted to but the point is to investigate it ourselves!

Here is the Architecture
(Event Driven Syncronize Functional Processing)
 

DistributedNetwork.png

  • Each distributed unit (processor, computer) will act as an Agent.
  • Agents are all connected to a Service Bus.
  • There is a Program Coordinator connected to the Service Bus
  • The Coordinator and Agents do not know about eachother, the only interface is with the Service Bus.   

Events  

Since we are using a event driven architecture, we will need to define some events for both the Coordinator, and the Agents.
Coordinator
  • Distribute_Group
    Message:  Group
    Description: The coordinators issues out the next group in the stack.

Agents

  • Request_Group
    Message: null
    Description: The Agent is not working on any groups so it sends out a group request.
  • Processed_FunctionMessage: FunctionDescription: An Agent finished processing of a Group. 

Execution Preperation 

Coordinator
  1. Each Function is Assigned a Unique ID.
    IDictionary<int,Function> GenerateID(IEnumerable<Function>)
  2. The Coordinator figures out the Procedural Groups of the Program.
    IEnumerable<Group> DistributeFunctions(IDictionary<int,Functions>)
  3. Each Group is Assigned a Unique ID.
    IDictionary<int,Group> GenerateID(IEnumerable<Group>)

Agents

No preperation required.

Execution  

CoordinatorCoordinatorStates.png
  1. Wait for Request_Group event.
  2. Select a Group from the Dictionary.
  3. Send out Distribute_Group event with the Group as its message.

Agents

AgentStates.png
 

  1. If Agent is Free
    1. Send out Request_Group event
    2. Wait for Distribute_Group event
    3. Intercept event
    4. Agent is now Processing Group Functions
  2. If Agent is Processing Group Functions
    1. For each Function f(x) in Group sequence
      1. For each dependency d(x) of f(x) 
        1. Wait For Processed_Function event of the d(x)
      2. Process f(x) with event message 
      3. Send out a Processed_Function event with the output of f(x) as the message.
    2. Agent is now free 

 

Deployment

The main assembly consisting of the set of functional and relationships is deployed on the Coordinator. Once deployed, the coordinator will takecare of the rest.

As more and more agents join the service bus, the faster the overall performance will become. 

Function Compilation

Inorder for the Agents to execute the Functions recieved from the Coordinator, each Group needs to be its own assembly. This way the Agents themselves not need to contain any code at the start, but rather recieve different code at runtime. We will explorer this in more depth in Part 2 of this article. 

Summary

Once we have a clear seperation of functions, its dependencies, procedural groups, communication, we can put together a program that actually works, distributively!

Part 1 of this article covers the abstract of the framework. The next part will cover its implementation.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Yang Yu


Member
He has extensive business experience in numerous industries including Automotive, Education, Insurance, Financial Services, and Telecommunication. He is the co-founder and CIO of Prognex Corp (www.prognex.com).

He has over 9 years of experience in IT involved in Business Analysis, Architecture, Implementation of SOA, OOAD Enterprise Systems using .NET 1/2/3, SSIS, BEA Web logic, BlackBerry MDS, SQL Server 2000/2005/2008, Oracle 8/9/10g, VB 6, etc. to construct Enterprise Level Web Applications, Windows Services, COM+ Services, Smart Client Windows Applications, Pocket PC, Black Berry Applications, and Data Warehouses that are Scalable, Reliable, Robust, and Secure.

His emphasis is towards Business Objectives, Security and Integrity of Assets, and Clients.

LinkedIn: http://www.linkedin.com/in/mryuyang
Twitter: http://twitter.com/mryangyu
Occupation: Architect
Company: Prognex Corp.
Location: Canada Canada

Other popular Design and Architecture articles:

Article Top
You must Sign In to use this message board.
FAQ FAQ 
 
Noise Tolerance  Layout  Per page   
 Msgs 1 to 9 of 9 (Total in Forum: 9) (Refresh)FirstPrevNext
GeneralMy Vote of 4 - Good work PinmemberRajkumar-Kannan18:34 7 Feb '10  
GeneralRe: My Vote of 4 - Good work PinmemberYang Yu10:25 1 Mar '10  
GeneralRobustness ? PinmemberSpatlabor21:33 28 Sep '09  
GeneralRe: Robustness ? PinmemberYang Yu18:33 29 Sep '09  
GeneralRe: Robustness ? PinmemberSpatlabor2:22 30 Sep '09  
GeneralVery interesting Pinmemberstevenlauwers228:10 15 Sep '09  
GeneralRe: Very interesting PinmemberYang Yu10:16 15 Sep '09  
GeneralNice Article, looking forward to the implementation Pinmembertheperm2:19 15 Sep '09  
GeneralRe: Nice Article, looking forward to the implementation PinmemberYang Yu10:16 15 Sep '09  

General General    News News    Question Question    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+PgUp/PgDown to switch pages.

PermaLink | Privacy | Terms of Use
Last Updated: 1 Mar 2010
Editor:
Copyright 2009 by Yang Yu
Everything else Copyright © CodeProject, 1999-2010
Web18 | Advertise on the Code Project