Posted 13 Mar 2003



The Application Automation Layer - The Data Hub Implementation

The third installment of this series discusses the design and implementation of the Data Hub.

Previous Installments:

Noteworthy Changes From Previous Articles:

For performance reasons, the Component Manager was modified to use delegates for dynamic method calls instead of reflection.
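The difference can be illustrated with a small sketch (the names here are hypothetical, not the actual Component Manager API): a reflection-based dispatcher resolves and invokes the method on every call, whereas a delegate binds the method once and invokes it directly thereafter.

```csharp
using System;
using System.Reflection;

// Hypothetical component method (not the actual AAL code).
public static class SampleComponent
{
    public static object Echo(string msg) { return "echo: " + msg; }
}

public static class Dispatch
{
    public delegate object Handler(string arg);

    // Reflection: the method is looked up and invoked dynamically on every call.
    public static object CallViaReflection(Type t, string method, string arg)
    {
        MethodInfo mi = t.GetMethod(method);
        return mi.Invoke(null, new object[] { arg });
    }

    // Delegate: the lookup happens once; subsequent calls go straight to the method.
    public static Handler BindOnce(Type t, string method)
    {
        return (Handler)Delegate.CreateDelegate(typeof(Handler), t, method);
    }
}
```

Binding once and calling through the delegate avoids the per-call overhead of `MethodInfo.Invoke`, which is the motivation for the change.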

Table Of Contents

Defining The Problem
Transitive Data
A Case Study
Push vs. Pull
Using A Data Hub
Workflow Domains
Datum And Data Sets
Workflow Domain Definitions
The AAL In Action--A Walkthrough Of The Current Code
Create Workflow Domain
Load Initial Load Set
The Debug Output


The Data Hub is an integral component of the Application Automation Layer. It is one of the architectural foundation stones providing the ability to unify, organize and automate the data transactions between different components within an application. In this article, I will be discussing the design of the Data Hub. The design is based on an existing implementation in C++/MFC.

Defining The Problem

In my years of programming, I have come to the conclusion that there are advantages to viewing data from the point of view of a workflow as opposed to a window or dialog centric view. The document-view architecture in MFC abstracts the data from its representation, but at the same time it does not address issues related to workflow. Furthermore, as more technologies are utilized in an application, a uniform data representation and data conversions for specific technologies is required. Consideration must be made for:

  1. data lifetime
  2. data exchange
  3. formatting and conversion between technologies
  4. optimization
  5. instrumentation
  6. verification
  7. data driven processing
  8. data organization by workflow requirements


In typical applications, however, these considerations give way to the following problems:

  1. Data is often component-centric. This means that data is formatted for a particular component’s needs. Updating or changing components often necessitates formatting the data differently and thus making extensive changes to the application;
  2. Data is handled in a point-to-point fashion. For example, an application translates a pair of (x, y) coordinates represented in edit boxes on the user interface into a line class comprising points, to pass on to the drawing component, and then translates it again into a different format for storage to a database component. This kind of point-to-point data transfer is repeated without any consideration to abstraction or generalization and causes significant side effects when the structures are changed;
  3. Data transactions are almost impossible to instrument. It becomes very difficult to recreate a bug because the original data set is unknown;
  4. Programs are often exclusively GUI event driven. Rarely are processes invoked by the existence or change of data. This underutilizes an aspect of automation.
  5. Data is encapsulated by object oriented programming techniques. This results in complex class interdependencies in order to distribute vital data among all the necessary objects;
  6. Data verification is either not done at all or done on single elements of the data. Rarely is verification done with regards to the entire data set, and even more rarely is data verification abstracted so that it is applied continually, consistently and separately from its associated processes.


Object oriented programming introduced the concept of encapsulation and everyone glommed onto the idea as a solution to the problem of global data.

Image 1


Image 2

While data encapsulation has its benefits, it is a poor tool to use as a technique for managing data that must cross component or system boundaries. This now happens frequently as workflows incorporate multiple technologies, whereas in the early days of object oriented programming, applications tended to be less interconnected. I will go even farther and state that domain specific data encapsulation is inappropriate for anything but localized usage—usually within the object itself or with a very limited set of interrelated objects. Instead, most data needs to be encapsulated and managed in a generic manner.

It has been my experience that many software development teams do not recognize that data encapsulation is an inappropriate implementation for the now common situation in which data is often used in workflow processes that bridge technologies. It has also been my experience that a lot of what I end up doing is:

  • managing data
  • changing its presentation
  • operating on it
  • consuming it
  • creating it
  • instrumenting it
  • verifying it

In almost all cases, I can extract clearly defined workflows and I can determine the complete set of data on which the workflow operates. However, when I look at the code, it’s a big mess. It is difficult to determine:

  1. what the processes are
  2. the data that the processes use
  3. rules that the workflow utilizes

Transitive Data

Data lifetime is frequently different from process lifetime. When object oriented design came along, it resulted in the lifetime of data being closely coupled with the lifetime of the processes (methods) that operate on the data. This association is not the intent of OOD, but rather an unintentional effect due to insufficient understanding and consideration of transitive data issues.

I have classified data into three types based on lifetime:

  • Global data—data that persists for the lifetime of the application.
  • Transitive data—data that appears and disappears as events occur.
  • Local data—data that is tightly coupled with the persistence of a process.

While the Data Hub can manage global data, its main purpose in life is to manage transitive data. This kind of data usually:

  1. is associated with events;
  2. has a limited lifetime;
  3. is utilized by multiple processes;
  4. is by itself fairly meaningless, but when applied in conjunction with other transitive data, the entire data collection (or data set) acquires meaning on which a workflow can be based.

A Case Study

Here are the issues that must typically be considered in a point of sale system for one of my clients:

  1. the cost of the item
  2. the retail value of the item
  3. the quantity purchased
  4. the state tax
  5. whether the item is tax exempt (clothing < $50, consumables, etc.)
  6. whether the customer is tax exempt
  7. the customer’s discount
  8. the customer’s location (out of state doesn’t get taxed, except states where you have a presence)
  9. the purchasing method (Internet, phone call, walk-in, etc.), again for purposes of taxation (Internet taxation is inevitable)
  10. any current credits
  11. does the customer have an account or are payments made immediately
  12. the payment method (affects how returns are handled)
  13. the shipping method (relates to freight charges)
  14. hazardous material?

A typical implementation might consist of the following objects, each encapsulating some data and containing some processes:

Image 3

That looks reasonable, now doesn’t it? The data is nicely encapsulated by a related object. For example, the Part object maintains the purchase cost and retail cost of the part, whether it’s taxable, the weight and whether it is considered a hazardous material. The Customer object references an Account object that contains information about the customer’s account—things like whether the customer’s account is tax exempt, its current balance, etc. There’s a Discount object that contains different kinds of discounts, like “cost+10%” or “50% retail discount” and the rules to calculate the new price. There are several lookup tables to help us determine taxes, freight charges, etc. All in all, it looks like a reasonable model. Looking at just the objects one is working with, the whole process should be rather simple, right?

Now let’s look at the same diagram, this time with consideration to process and data flow:

Image 4

This looks very confusing. Data is being pulled from some objects using queries and pushed into other objects as parameters to functions. Intermediate values exist for which there is no encapsulation, and there are two functions GetTax and GetTotalCharge that are hanging out in space that don’t have an object associated with them. Thus design meets the reality of implementation.
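A sketch of what that tangle tends to look like in code follows. All names and values here are hypothetical, invented for illustration; the point is that GetTax and GetTotalCharge have no natural owning object, so they pull state from several objects and push intermediate values around as bare parameters.

```csharp
using System;

// Hypothetical stand-ins for the objects in the diagram
// (names and values assumed for illustration only).
class Part
{
    public decimal Retail = 25m;
    public bool Taxable = true;
}

class Customer
{
    public bool TaxExempt = false;
}

static class SaleOrderFunctions
{
    // GetTax "hangs out in space": it pulls state from Part and
    // Customer and combines it with a looked-up tax rate.
    public static decimal GetTax(Part p, Customer c, decimal taxRate)
    {
        if (c.TaxExempt || !p.Taxable)
            return 0m;
        return p.Retail * taxRate;
    }

    // GetTotalCharge pushes the intermediate tax and freight values
    // along as parameters, because nothing encapsulates them.
    public static decimal GetTotalCharge(Part p, decimal tax, decimal freight)
    {
        return p.Retail + tax + freight;
    }
}
```

The intermediate `tax` value belongs to no object at all, which is exactly the kind of unmanaged transitive data the Data Hub is meant to take over.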

Aside: If the first diagram looks a lot like a database schema, well, it’s a good candidate. One of the problems in OO implementation is that programmers often mirror the database schema in their OO model. This leads to duplication of effort, maintenance problems, etc. A “view” is a better way of accessing data, but it does nothing for the storage of data. The latest trend involves moving the storage and retrieval of data to stored procedures. This provides a much needed compartmentalization between the application and the database, and has other benefits as well. However, the primary drawback is the hard coding of parameters passed to the stored procedures and the overhead in maintaining and coordinating between the application and the database procedures.

Push vs. Pull

This example has a mixture of pulling data from resources and pushing data onto processes. This results in a confusing architecture with undocumented object dependencies. Even the programmer who wrote the code can get easily confused as to which objects are data sources, which are data sinks, and which are both! A simple change to this system can have a ripple effect that doesn’t surface until several bus stops later, requiring the programmer to trace the path of the data from source to destination to determine where the problem originates, much like a hardware engineer must trace an electronic signal.

Using A Data Hub

Image 5

An alternate approach is to use the Data Hub and related components to manage transitive data, organize the data into data sets and trigger workflows on completed data sets. In such a system, processes can also be set up as worker threads. The above diagram is a general description of the system. The following diagram illustrates how the Data Hub is used in conjunction with Data Sets and the Workflow Manager.

Image 6

Let’s look at what we’ve accomplished here:

  1. clearly identified data dependencies
  2. clearly identified the data sets necessary to compute other information
  3. the seven processes are now suitable for worker threads
  4. additional computations can be easily defined and added
  5. intermediate values are managed

Workflow Domains

Image 7

The Data Hub incorporates a concept called a “workflow domain”. A workflow domain is created to manage all the data, data sets and events associated with a workflow. For example, all of the data in the example above could be created in a domain called “sales order”. Any worker threads spawned by the workflow reference the workflow’s data domain, but I won’t get into the details of that here because that is under the purview of the Workflow Manager.

The benefit of a domain is that it:

  1. Eliminates naming collisions;
  2. Creates a mechanism in which data can be completely removed from the Data Hub once the workflow is no longer needed;
  3. Creates a mechanism to distinguish between true global data and transitive data.
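A minimal sketch of the idea (this is not the actual Data Hub class, just an illustration): keying each datum by a "Domain.DatumName" string, the naming convention the walkthrough later uses for "SingleItemSaleOrder.SalePrice", eliminates collisions between workflows and lets an entire domain be dropped in one pass when its workflow ends.

```csharp
using System;
using System.Collections;

// Illustrative only: datum keyed by "Domain.DatumName" so two
// workflows can each have a "Qty" without colliding.
class MiniHub
{
    private Hashtable data = new Hashtable();

    public void SetDatum(string domain, string name, object val)
    {
        data[domain + "." + name] = val;
    }

    public object GetDatum(string domain, string name)
    {
        return data[domain + "." + name];
    }

    // Remove every datum belonging to a domain once its workflow ends.
    public void RemoveDomain(string domain)
    {
        ArrayList doomed = new ArrayList();
        foreach (string key in data.Keys)
            if (key.StartsWith(domain + "."))
                doomed.Add(key);
        foreach (string key in doomed)
            data.Remove(key);
    }

    public int Count { get { return data.Count; } }
}
```

Anything not prefixed by a workflow domain can then be treated as true global data, which is the third benefit above.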

However, it creates its own set of issues:

  1. It must be clearly defined what data (if any) is transferred from the application-wide domain into the workflow domain when the workflow begins, and vice-versa when the workflow ends;
  2. Under certain circumstances, it might be necessary to share data between two domains;
  3. For the time being, the implementation does not support hierarchical domains. This should not be an issue because workflow processes are not hierarchical either;
  4. Multiple instances of the same workflow are not yet supported. This is a major issue which I will be addressing in the next article, as part of the Workflow Manager.

Datum And Data Sets

Image 8

A Data Set is a collection of datum that has some meaning when it is collected into a set. Processes operate on datum and/or data sets. Borrowing from the concepts of SQL triggers, datum can have associated “insert, update, select, and delete” events and a data set can have an associated “ready” event. Used in conjunction with the Workflow Manager, this architecture allows applications to take full advantage of the multithreading capability of modern operating systems.

The data set is really nothing more than an abstract concept for collecting datum and firing an event when the entire set exists or a value in a completed set is changed. Changing data in the data domain can cause a data set to fire its associated “data set ready” event again. For the time being, I don’t see any benefit in any other events associated with the data set.
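The completion logic can be sketched as follows (a simplified model, not the actual DataSet class): the set tracks which of its member datum have been assigned at least once, fires its ready callback when the count of assigned datum matches the set's membership, and fires it again when a value in the completed set changes.

```csharp
using System;
using System.Collections;

// Simplified model of "fire when the whole set exists".
class MiniDataSet
{
    private ArrayList members;                    // datum names in the set
    private Hashtable assigned = new Hashtable(); // tracks unique datum set so far
    private Hashtable values = new Hashtable();
    public int ReadyFired = 0;                    // stand-in for the DSR event

    public MiniDataSet(params string[] names)
    {
        members = new ArrayList(names);
    }

    public void Set(string name, object val)
    {
        values[name] = val;
        assigned[name] = true;   // Hashtable keeps each datum counted once
        // Once every member has been assigned, the set is "ready";
        // a later change to a completed set fires the event again.
        if (assigned.Count == members.Count)
            ReadyFired++;
    }
}
```

The re-fire on change is the behavior described above: a completed set stays completed, so any subsequent update triggers the ready event again.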

Workflow Domain Definitions

Workflow domain definitions exist in an XML file and its corresponding XSD file. In two previous articles, I documented an XSD editor and generic XML data editor, both of which have been used for creating the schema and sample definition used in the example provided in this article.

The schema also defines a structure for specifying initial load values for a workflow domain. This is useful for testing the application throughout its development.
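Per the schema's InitialLoadSetType structure, an initial load set entry in dataDomain.xml might look like the fragment below. The values shown are hypothetical; the actual ones are in the project download.

```xml
<InitialLoadSet>
  <SetName>RetailPrice</SetName>
  <InitialLoad>
    <DatumName>Cost</DatumName>
    <Value>10.00</Value>
  </InitialLoad>
  <InitialLoad>
    <DatumName>Retail</DatumName>
    <Value>25.00</Value>
  </InitialLoad>
  <InitialLoad>
    <DatumName>Qty</DatumName>
    <Value>2</Value>
  </InitialLoad>
</InitialLoadSet>
```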

The AAL In Action--A Walkthrough Of The Current Code

Image 9

Instead of presenting all the code for the Data Hub, et al, I think it would be more interesting to see the specific code in a sample walkthrough of a workflow domain creation and data set load. The example illustrates how the Data Hub and related components calculate the sale price of an item, given the cost, retail price, quantity and discount information.

In the previous article I described the mechanisms for loading and initializing AAL components, so I’ll skip that part and proceed directly to the point where the sample workflow domain is loaded.

This happens in the bootstrap loader for now (having no better place to put it at this point):

idh.LoadInitialLoadSet("SingleItemSaleOrder", "RetailPrice");

The first step in this process is to load the domain definition file. In the project download, there are two files, dataDomain.xsd which specifies the file format, and dataDomain.xml, for the data domain definitions. I have included the XSD file here so that it can be used as reference when we look at some XPath queries later on in the walkthrough.


<?xml version="1.0" encoding="utf-16"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:documentation>Data Domain Schema Definition</xs:documentation>
  </xs:annotation>
  <xs:element name="WorkflowDomain" type="WorkflowDomainType" />
  <xs:complexType name="WorkflowDomainType">
    <xs:sequence>
      <xs:element name="WorkflowDomainName" type="xs:string" />
      <xs:element name="Description" type="xs:string" />
      <xs:element name="DataSet" type="DataSetType" />
      <xs:element name="SingletonDatum" type="DatumType" />
      <xs:element name="InitialLoadSet" type="InitialLoadSetType" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="DataSetType">
    <xs:sequence>
      <xs:element name="DataSetName" type="xs:string" />
      <xs:element name="Description" type="xs:string" />
      <xs:element name="DSREvent" type="xs:string" />
      <xs:element name="Datum" type="DatumType" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="DatumType">
    <xs:sequence>
      <xs:element name="DatumName" type="xs:string" />
      <xs:element name="Description" type="xs:string" />
      <xs:element name="InsertEvent" type="xs:string" />
      <xs:element name="UpdateEvent" type="xs:string" />
      <xs:element name="SelectEvent" type="xs:string" />
      <xs:element name="DeleteEvent" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="InitialLoadType">
    <xs:sequence>
      <xs:element name="DatumName" type="xs:string" />
      <xs:element name="Value" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="InitialLoadSetType">
    <xs:sequence>
      <xs:element name="SetName" type="xs:string" />
      <xs:element name="InitialLoad" type="InitialLoadType" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>

As indicated, the first step performed by the bootstrap loader is to tell the Data Hub to load the domain specification. This is a straightforward implementation:

private XmlDataDocument doc;

public void LoadDomainSpecifications(string xmlFileName)
{
  XmlTextReader tr = new XmlTextReader(xmlFileName);
  Dbg.Assert(tr != null, new DbgKey("MissingXMLFile"));
  doc = new XmlDataDocument();
  doc.Load(tr);
}

Create Workflow Domain

Next, the desired workflow domain is created. While multiple domains can of course be created, the demonstration illustrates loading only a single domain defined in the XML file, called "SingleItemSaleOrder".

public void CreateWorkflowDomain(string workflowName)
{
  WorkflowDomain wd = CreateDomain(workflowName);
  Dbg.WriteLine("Creating Data Set Data");

  // Queries both DataSetName and DSREvent for each data set in the domain.
  // (The XPath union was truncated in the original excerpt; it is
  // reconstructed here from the schema and the description below.)
  DataTable dt = AAL.Lib.Xml.GetTable(doc,
    "//WorkflowDomain[WorkflowDomainName=\"" + workflowName + "\"]//DataSet/DataSetName | " +
    "//WorkflowDomain[WorkflowDomainName=\"" + workflowName + "\"]//DataSet/DSREvent");

The first segment of this code instantiates a WorkflowDomain object and loads a DataTable based on an XPath query. The code for AAL.Lib.Xml.GetTable is very similar to what is being used in my article A Dynamically Generated XML Data Editor, so I won't go into the detail of it here. However, careful observation of the above XPath statement reveals that two separate pieces of information, DataSetName and DSREvent, are queried. In a standard XPath query, these are returned as two separate "rows" per data set entry. However, the GetTable method has some smarts in it so that instead, a single row with two columns per data set entry is returned. I have found this mechanism to be a useful enhancement to convert a hierarchical view to a relational view.
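The merging behavior can be sketched like this (a simplified model, not the actual AAL.Lib.Xml code): the union query yields named nodes in document order, and each node carrying the leading column name starts a new row, while any other node becomes an additional column of the current row.

```csharp
using System;
using System.Collections;

static class RowMerge
{
    // nodes: (name, value) pairs in document order, e.g. the result
    // stream of an XPath union like ".../DataSetName | .../DSREvent".
    public static ArrayList Merge(string[,] nodes, string firstColumn)
    {
        ArrayList rows = new ArrayList();
        Hashtable current = null;
        for (int i = 0; i < nodes.GetLength(0); i++)
        {
            string name = nodes[i, 0];
            string value = nodes[i, 1];
            // a node with the leading column name starts a new row
            if (name == firstColumn || current == null)
            {
                current = new Hashtable();
                rows.Add(current);
            }
            current[name] = value;
        }
        return rows;
    }
}
```

This is why a data set without a DSREvent simply yields a row with that column missing, which the walkthrough code guards against with `dt.Columns.Contains("DSREvent")`.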


// For each data set in the workflow domain...
foreach (DataRow row in dt.Rows)
{
  // ... create the data set
  string dataSetName = row["DataSetName"] as string;
  Dbg.Assert(dataSetName != null, new DbgKey("NullValueError"));
  string dsrEvent =
    dt.Columns.Contains("DSREvent") ? (string)row["DSREvent"] : null;
  DataSet ds = wd.CreateDataSet(dataSetName, dsrEvent);

  // This query extracts the datum associated with a specific data set.
  // (The XPath string itself is elided in this excerpt.)
  DataTable dt2 = AAL.Lib.Xml.GetTable(doc, datumQueryForDataSet);

  // For each datum in the data set...
  foreach (DataRow row2 in dt2.Rows)
  {
    string datumName = row2["DatumName"] as string;
    Datum datum = new Datum(wd, datumName,
      dt2.Columns.Contains("InsertEvent") ? (string)row2["InsertEvent"] : null,
      dt2.Columns.Contains("UpdateEvent") ? (string)row2["UpdateEvent"] : null,
      dt2.Columns.Contains("SelectEvent") ? (string)row2["SelectEvent"] : null,
      dt2.Columns.Contains("DeleteEvent") ? (string)row2["DeleteEvent"] : null);
    wd.CreateDatum(dataSetName, datum);
  }
}

The above code uses an XPath query to extract all the datum for a specific data set.

Lastly, this code loads any singleton data associated with the workflow domain that is not associated with any particular data set.

  // For each singleton datum defined in the workflow domain...
  // (dt is re-queried here for the SingletonDatum entries; the
  // query string is elided in this excerpt.)
  Dbg.WriteLine("Creating Singleton Data");
  foreach (DataRow row in dt.Rows)
  {
    // ... create the datum. Note that it is not
    // associated with any data set.
    string datumName = row["DatumName"] as string;
    Datum datum = new Datum(wd, datumName,
      dt.Columns.Contains("InsertEvent") ? (string)row["InsertEvent"] : null,
      dt.Columns.Contains("UpdateEvent") ? (string)row["UpdateEvent"] : null,
      dt.Columns.Contains("SelectEvent") ? (string)row["SelectEvent"] : null,
      dt.Columns.Contains("DeleteEvent") ? (string)row["DeleteEvent"] : null);
    wd.CreateDatum(null, datum);   // no data set association (call reconstructed)
  }

Load Initial Load Set

A lot happens here because once the data sets are filled, various events are automatically triggered. Taking it one step at a time, the function itself is very straightforward:

public void LoadInitialLoadSet(string workflowName, string loadSetName)
{
  Dbg.WriteLine("Loading initial data for " +
            workflowName + " set: " + loadSetName);
  // Selects the DatumName/Value pairs of the named InitialLoadSet.
  // (The XPath string itself is elided in this excerpt.)
  DataTable dt = AAL.Lib.Xml.GetTable(doc, initialLoadQuery);
  string guid = (string)publicDomainNames[workflowName];
  WorkflowDomain wd = (WorkflowDomain)domains[guid];
  foreach (DataRow row in dt.Rows)
  {
    string datumName = row["DatumName"] as string;
    string val = row["Value"] as string;
    wd.SetDatum(datumName, val);
  }
}

Things get interesting during the wd.SetDatum... calls. The first step on the path invokes the WorkflowDomain::SetDatum method:

public void SetDatum(string name, object val)
{
  Dbg.Assert(data.ContainsKey(name), new DbgKey("BadKey"));
  ((Datum)data[name]).Val = val;   // assignment reconstructed from the truncated excerpt
}

The Datum object provides a setter and getter for the Val attribute:

public object Val
{
  get
  {
    // fires the datum's "select" (query) event here (call elided in this excerpt)
    return val;
  }
  set
  {
    val = value;
    // fires the datum's "update" event here (call elided in this excerpt)
    // update all associated data sets
    for (int i = 0; i < dataSetAssociations.Count; i++)
      ((DataSet)dataSetAssociations[i]).Set(this);
  }
}

As illustrated in the above code, an event is fired for both the get (query) and the set (update) functions. Furthermore, in the setter, each Data Set to which this datum is associated is notified that the value has been set (which also implies modified). In the example, the Data Set has an associated "Data Set Ready" event. Observing this code (part of the DataSet class):

internal void Set(Datum datum)
{
  datumAssignments[datum] = true;   // Hashtable tracks unique, initialized datum
  if (dsrEvent != null)
  {
    if (datumAssignments.Count == datumAssociations.Count)
    {
      // compile the completed set, bypassing each datum's "query" event
      // (member names partially reconstructed from the truncated excerpt)
      Hashtable datumSet = new Hashtable();
      foreach (Datum dt in datumAssociations)
        datumSet[dt.Name] = dt.NonEventValue();
      DataHub.FireEvent(dsrEvent, new EventData(wd.Name, name, datumSet));
    }
  }
}

a Hashtable is used to track unique datum. When the count of datum that has been initialized equals the count of datum associated with the data set, the "Data Set Ready" event is fired through the Data Hub. First, all the associated datum are compiled into a single Hashtable, using the NonEventValue method which specifically avoids invoking the "query" event. Next, the event is triggered. This code:

internal static void FireEvent(string eventName, EventData eventData)
{
  iwm.FireEvent(eventName, eventData);
}

is merely a pass-through to the Workflow Manager. This may be eliminated in the future but serves as a useful placeholder for right now.

Similarly, the Workflow Manager provides a simple pass-through to the Component Manager to perform the actual invocation. This part of the Workflow Manager will be developed further in the next article, because here is where the worker thread would be initialized prior to the invoking of the event.

Skipping the Component Manager (discussed in the previous articles), the test component code is invoked:

public static object Calc(EventData eventData)
{
  Dbg.WriteLine("Data Set = " + eventData.containerName);
  Hashtable ht = eventData.data as Hashtable;   // member name reconstructed from the truncated excerpt
  AAL.Lib.Debug.Dump(ht);
  decimal rMinus = Convert.ToDecimal(ht["RetailMinusPercent"]);
  decimal cPlus = Convert.ToDecimal(ht["CostPlusPercent"]);
  decimal cost = Convert.ToDecimal(ht["Cost"]);
  decimal retail = Convert.ToDecimal(ht["Retail"]);
  decimal qty = Convert.ToDecimal(ht["Qty"]);
  decimal salePrice = qty * ((cost * (1 + cPlus / 100)) + (retail * (1 - rMinus / 100)));
  idh.SetDatum(eventData.workflowDomainName + ".SalePrice", salePrice);
  return null;
}

This function is straightforward enough, saving the calculated sale price back to the workflow domain. The "SalePrice" datum has an "update" event associated with it which specifies the TestComponent::SetSalePrice method. As soon as the value is set, this event is fired which calls into the test code:

public static object SetSalePrice(EventData eventData)
{
  Dbg.WriteLine("Sale Price = " + eventData.data);   // member name reconstructed from the truncated excerpt
  return null;
}

which does nothing more than emit a debug statement.
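The pricing formula from Calc can be checked in isolation with hypothetical inputs (the actual initial load values are defined in dataDomain.xml and are not reproduced here):

```csharp
using System;

static class PriceCheck
{
    // The same formula used in Calc above:
    // qty * ((cost * (1 + cPlus/100)) + (retail * (1 - rMinus/100)))
    public static decimal SalePrice(decimal cost, decimal retail, decimal qty,
                                    decimal costPlusPercent, decimal retailMinusPercent)
    {
        return qty * ((cost * (1 + costPlusPercent / 100m))
                    + (retail * (1 - retailMinusPercent / 100m)));
    }
}
```

For example, with hypothetical inputs a cost of 10.00 marked up 10% gives 11.00, a retail of 25.00 discounted 60% gives 10.00, so two units total 42.00.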

The Debug Output

The resulting debug output appears as follows:

AAL: Loading component: DataHub.dll/AAL.DataHub
  Loaded 'c:\csaal\bin\debug\datahub.dll', Symbols loaded.
AAL: Loading component: WorkflowManager.dll/AAL.WorkflowManager
  Loaded 'c:\csaal\bin\debug\workflowmanager.dll', Symbols loaded.
AAL: Loading component: TestComponent.dll/Testbed.TestComponent
  Loaded 'c:\csaal\bin\debug\testcomponent.dll', Symbols loaded.
'BootstrapLoader.exe': Loaded
  No symbols loaded.
Workflow Domain Created: App
AAL: Phase 1 - WorkflowManager
AAL: Phase 1 - TestComponent
AAL: Registering TestComponent.Test
AAL: Registering TestComponent.CalculateRetailPrice
AAL: Registering TestComponent.SetSalePrice
AAL: Phase 2 - WorkflowManager
AAL: Phase 2 - TestComponent
AAL: Invoking TestComponent.Test
Workflow Domain Created: SingleItemSaleOrder
Creating Data Set Data
Data Set Created: CalcRetailPrice
Datum Created: Cost
Datum Created: Retail
Datum Created: Qty
Datum Created: CostPlusPercent
Datum Created: RetailMinusPercent
Data Set Created: CalcItemTax
Datum Created: TaxExempt
Datum Created: Taxable
Datum Created: TaxRate
Data Set Created: CalcFreightCharge
Datum Created: ShippingMethodID
Datum Created: Hazmat
Datum Created: Weight
Datum Created: LocaleID
Data Set Created: CalcTotal
Datum Created: RetailPrice
Datum Created: FreightCharge
Datum Created: Tax
Datum Created: AdditionalFees
Creating Singleton Data
Datum Created: DiscountID
Datum Created: PurchaseMethodID
Datum Created: SalePrice
Loading initial data for SingleItemSaleOrder set: RetailPrice
Event: TestComponent.CalculateRetailPrice
AAL: Invoking TestComponent.CalculateRetailPrice
Data Set = CalcRetailPrice
Event: TestComponent.SetSalePrice
AAL: Invoking TestComponent.SetSalePrice
Sale Price = 47


The next step is to further develop the Workflow Manager so that events can be invoked in their own thread space. As this happens, there will undoubtedly be some review of the existing code to determine, in particular, whether the association of datum to workflow domains via the thread is both useful and appropriate. Additional issues involve thread pooling, revisiting performance and adding performance benchmarking.


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



About the Author

Marc Clifton
Architect Interacx
United States United States

All my life I have been passionate about architecture / software design, as this is the cornerstone to a maintainable and extensible application. As such, I have enjoyed exploring some crazy ideas and discovering that they are not so crazy after all. I also love writing about my ideas and seeing the community response. As a consultant, I've enjoyed working in a wide range of industries such as aerospace, boatyard management, remote sensing, emergency services / data management, and casino operations. I've done a variety of pro-bono work for non-profit organizations related to nature conservancy, drug recovery and women's health.
