Distributed Programming Framework - Part 1 (Abstract)

Yang Yu

Rate me:

4.93/5 (17 votes)

1 Mar 2010CPOL8 min read

49.9K

Build distributed programs without worrying about distribution implementation

Introduction

The robustness and scalability of Software depends heavily on the separation of procedural, logical, functional, and physical components, and also the separation of tiers within each component.

This article will introduce a method for abstracting sequential code by identifying and managing functional separations and constructing a framework for writing distributed computer programs without worrying about its distribution and execution implementation.

Part 1 of the article will focus on the theories behind this framework, while Part 2 will focus on the implementation of the pattern.

Background

It is probably a good idea to read about:

n-tiered architecture
Distributed computing
Event Driven Architecture
Service Oriented Architecture
Threading
Procedural programming language (C#)
Distributed Shared Memory (DSM)

Why Distribute Code Sequence?

Distribution of tasks requires a clear understanding from its micro level and also its macro level. The process of abstraction to form higher order tasks demands this understanding. By abstracting tasks, one can not only effectively manage the performance with introducing division of labour, but also introduce parallelism through identifying its inner associativities and dependencies.

http://upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Distributed-parallel.svg/260px-Distributed-parallel.svg.png

How?

In the simplest form, we could use a simple Data Structure such as LinkedList<Action<I>> that stores a sequence of code blocks with the only relationship being their order of execution. We could then execute this list of actions by iterating through that LinkedList from the first Action<I> to the last. Assuming there are no dependencies, this is the same as an ThreadPool. If this is procedural action, then it is exactly what procedural programming is. Trivial and not interesting.

A slightly more interesting case is to pipeline functions with a LinkedList<Func<I,O>> where the output of one particular function is pipelined to the next functions input. The precondition for this scenario is that all functions have single input parameter (sort of like ML or F#). Again, this case fails to leverage any kind of gains over the traditional approach.

What Do We Really Want?

What we really want is to leverage a distributed architecture through the usage of Asynchronize execution, multithreaded execution, and parallel execution on multiple processors. As we all know, coding in these areas can be extremely difficult in all aspects. And furthermore, being able to not worry about these architectures, and just write good code.

Abstracts

To implement this concept, our Data Structure can no longer be a LinkedList, which implied the order of execution, but rather we need to explicitly define dependencies on each Function to gain more flexibility and management.

Functional Dependency

Dependencies among functions can be understood as this rule: Given functions A and B, A is related to B if and only if either A depends on B, or B depends on A.

We can write this relationship using sets as {A, B, C | dp(B,A) ^ dp(B,C)}. This means given a set of functions {A, B}, B depends on A and C.

Examples

The sequence of functions <A, B, C> can be viewed as a set of functions {A, B, C} such that B depends on A, and C depends on B. The fact that C depends on A, in this case, is implied. Also, Since A does not depend on any other function, it can be executed.
Given a set of functions {A, B, C} such that B depends on A, and C depends on A or B. In this case, C can be executed once A or B is executed.
Given a set of functions {A, B}.
There are no dependencies which means that A and B can happen in parallel. The Precondition assumes that there are no side-effects.
{A, B | dp(A,B) ^ dp(B,A)}. This program would never execute as a result.
{Stand, Walk, Look, Stop | dp(Walk,Stand) ^ (dp(Look,Stand) v dp(Look, Walk)) ^ dp(Stop, Walk)}.
The program will run as follows:

Stand -> Walk -> Stop
-> Look
(Note: Stop will run after Walk or Look depending on which finishes first.)
{Stand, Walk, Look, Stop | dp(Walk,Stand) ^ (dp(Look,Stand) v dp(Look, Walk)) ^ (dp(Stop, Walk) ^ dp(Stop, Look))}.
The program will run as follows:

Stand -> Walk
-> Look
-> Stop

The concept of dependencies between distinct functions is especially useful because it enables us to partition the functions to individual groups.

(Note: Keep in mind that we only care about the procedural distribution, rather than the functional distribution. In our example 5 above, "Walk" could be a recursive functional routine that happens for 10 minutes.)

Procedural Groups

The definition of such a group is defined as: Procedural group G₁ is a sequence of functions <x₀, x₁,...x_n> such that x₀ depends on y_i where y_i does not exist in G₁ and x_n is depends on x_n-1.

Example:

{Stand, Walk, Look, Stop | dp(Walk,Stand) ^ (dp(Look,Stand) v dp(Look, Walk)) ^ dp(Stop, Walk)}.
G₀ = <Stand, Walk, Stop>
G₁ = <Look>

Stand -> Walk -> Stop
-> Look
{Stand, Walk, Look, Stop | dp(Walk,Stand) ^ (dp(Look,Stand) v dp(Look, Walk)) ^ (dp(Stop, Walk) ^ dp(Stop, Look))}.
G₀ = <Stand, Walk>
G₁= <Look>
G₂ = <Stop>

Stand -> Walk
-> Look
-> Stop

By definition, Procedural Groups function independently thus allowing us to create separate threads/processors/computers per group. This separation translates into a distributed n-tiered architecture, executable simultaneously to increase overall performance dramatically. Given we utilize the proper communication techniques.

Type Inferencing and Parameters

Communication and the transferring of data is a key element of any distributed system. To ensure the flow of information, we need to establish strict type checking rules. Similar to ML, the Type of function A is denoted as A:a'->b' where a' is the input parameter type and b' is the output type of the function A. In this case, since we do not know the inner works of A, just by looking at A, we cannot say anything about its input and output types. So {A, B | dp(B,A)} => A:a'->b' and B:b'-c'. In this case, the return type of A must be the same type as the input type of B.

Examples:

{A, B, C | dp(C,A) ^ dp(C,B)} => A:a'->b' and B:c'->d' and C:(b',d')->e'
{A, B, C | dp(C,A) v dp(C,B)} => A:a'->b' and B:c'->b' and C:b'->d'
In this case, both A and B must have the same return type.
{A, B, C | dp(A,B) ^ (dp(C, B) v dp(C,A)) } => B:a'->b' and A:b'->b' and C:b'->c'
because C depends on B or A and A depends on B, A must have the same input and output type.

By ensuring types uniformity of dependent functions, we can now shift our focus on the communication and transport of information.

Communication

Many methods for transport exists today. COM+, MSQ, Remoting, webservices, etc. all deal with Service Oriented Distributed Architecture, and Event Driven Architecture. However, all this is great but we still need to configure and implement it. Products such as BizTalk, SQL Server Enterprise enable distributed computing with load balancing, clustering, and pararel processing without a line of code. If only this type of technology was available for normal compilers, which can generate distributable code clusters that not only execute functional code, but can communicate and synchronize with one another depending on the number of servers, processors, etc. We could utilize COM+ and MSQ to take care of the transport layer for us if we really wanted to but the point is to investigate it ourselves!

Here is the Architecture
(Event Driven Synchronize Functional Processing)

Each distributed unit (processor, computer) will act as an Agent.
Agents are all connected to a Service Bus.
There is a Program Coordinator connected to the Service Bus.
The Coordinator and Agents do not know about each other, the only interface is with the Service Bus.

Events

Since we are using an event driven architecture, we will need to define some events for both the Coordinator, and the Agents.

Coordinator

Distribute_Group

Message: Group
Description: The coordinators issues out the next group in the stack.

Agents

Request_Group

Message: null
Description: The Agent is not working on any groups so it sends out a group request.

Processed_Function

Message: Function
Description: An Agent finished processing of a Group.

Execution Preparation

Coordinator

Each Function is assigned a Unique ID.

IDictionary<int,Function> GenerateID(IEnumerable<Function>)

The Coordinator figures out the Procedural Groups of the Program.
C#
```
IEnumerable<Group> DistributeFunctions(IDictionary<int,Functions>)
```

Each Group is Assigned a Unique ID.

IDictionary<int,Group> GenerateID(IEnumerable<Group>)

Agents

No preparation required.

Execution

Coordinator

Wait for Request_Group event.
Select a Group from the Dictionary.
Send out Distribute_Group event with the Group as its message.

Agents

If Agent is Free
1. Send out Request_Group event
2. Wait for Distribute_Group event
3. Intercept event
4. Agent is now Processing Group Functions
If Agent is Processing Group Functions
1. For each Function f(x) in Group sequence
  1. For each dependency d(x) of f(x)
    1. Wait For Processed_Function event of the d(x)
  2. Process f(x) with event message
  3. Send out a Processed_Function event with the output of f(x) as the message.
2. Agent is now free

Deployment

The main assembly consisting of the set of functional relationships is deployed on the Coordinator. Once deployed, the coordinator will take care of the rest.

As more and more agents join the service bus, the faster the overall performance will become.

Function Compilation

In order for the Agents to execute the Functions received from the coordinator, each Group needs to be its own assembly. This way, the Agents themselves do not need to contain any code at the start, but rather receive different code at runtime. We will explore this in more depth in Part 2 of this article.

Summary

Once we have a clear separation of functions, its dependencies, procedural groups, communication, we can put together a program that actually works, distributively!

Part 1 of this article covers the abstract of the framework. The next part will cover its implementation.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Yang Yu

Architect

Canada

Engineer, maker, food lover

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Distributed Programming Framework - Part 1 (Abstract)

Introduction

Background

Why Distribute Code Sequence?

How?

What Do We Really Want?

Abstracts

Functional Dependency

Examples

Procedural Groups

Type Inferencing and Parameters

Communication

Here is the Architecture(Event Driven Synchronize Functional Processing)

Events

Coordinator

Agents

Execution Preparation

Coordinator

Agents

Execution

Coordinator

Agents

Deployment

Function Compilation

Summary

License

Comments and Discussions

Here is the Architecture
(Event Driven Synchronize Functional Processing)