Organic Programming Environment (OPEN)
By Marc Clifton
OPEN is a prototype development exploring a different paradigm for data management. Instead of applications being process-centric, in which processes drive data transfer, the Organic Programming environment uses a data-centric approach. In this paradigm, data initiates processes.
VC6, Win2K, WinXP, Visual Studio, MFC, Dev
GUI applications are event driven: they respond to user and hardware events. These events perform one or more of the following processes: collecting data, transforming data, and dispersing data.
Collecting data includes such activities as:
Data transformation consists of any process that manipulates the data, combining, splitting, formatting, etc. Data dispersion is the opposite of data collection, during which data is written to a database, a GUI object is updated, or a hardware register is written.
In this paradigm, data is usually very localized, contained within a single function or managed by a single object. Interface methods of varying complexity exist to share data between objects. Ultimately, a large application must manage a large amount of diverse data and perform complicated tasks on it. This leads to complexities in managing various aspects of data:
These tasks are left to the programmer and, in my experience, are mostly ignored.
In the Organic Programming Environment (OPEN), a data-centric approach is taken. A Data Pool (D-Pool) manages all data. Processing functions register themselves with the D-Pool Manager (DPM), indicating the data on which they operate, and the data that they produce. When data is placed into the D-Pool, the DPM automatically initiates all interested parties as threads. When all threads have completed, the data is automatically removed from the D-Pool. The DPM can also be instructed regarding data lifetime, version/format information, and read/write access.
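The registration-and-trigger cycle described above can be sketched in a few lines of portable C++. This is a minimal, single-threaded illustration and not the OPEN implementation: `MiniPool`, its `Register` and `Add` methods, and the use of plain strings for values are all assumptions made for the example, and processes run inline rather than as threads.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the D-Pool and its manager. Processes register
// interest in a datum name; placing a datum into the pool runs each of them.
class MiniPool {
public:
    using Process = std::function<void(const std::string& value, MiniPool& pool)>;

    void Register(const std::string& datumName, Process proc) {
        assert(proc != nullptr);  // an empty std::function cannot be triggered
        interested_[datumName].push_back(std::move(proc));
    }

    // Data initiates processing. The datum is not retained once every
    // interested process has run.
    void Add(const std::string& datumName, const std::string& value) {
        auto it = interested_.find(datumName);
        if (it == interested_.end()) return;  // nobody is interested
        // Work on a copy so a running process may register further processes.
        std::vector<Process> handlers = it->second;
        for (auto& process : handlers) process(value, *this);
    }

private:
    std::map<std::string, std::vector<Process>> interested_;
};
```

A process may itself call `Add`, which is how one datum (say "itemCost") can lead to another (say "totalCost") without the two processes knowing about each other.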
I came up with this name because I wanted to describe a process similar to how proteins are transported out of the nucleus of a cell and manipulated into more interesting molecules, which are then either used by the cell itself or transported out of the cell, to be used by some other cell or organ. This process seems to me to be inherently data-centric, with the DNA/RNA being the data, and various processes (enzymes, etc.) attaching themselves to the data as it appears in the cellular fluid. Please note that this architecture is based on neither genetic programming (http://www.genetic-programming.org/) nor the Organic Programming Language Gaea (http://www.carc.aist.go.jp/gaea/).
There are several benefits to this architecture:
There are also several drawbacks to this architecture (I'm sure I haven't thought of them all!)
There are four design elements of OPEN:
This architecture will quickly create hundreds, if not thousands, of small processes that receive some data,
manipulate it, and place the result back into the D-Pool. Registering all these processes manually becomes a
very cumbersome task, so it behooves us to make this as simple for the programmer as possible.
Each process must implement three things:
Each process can operate on more than one datum. If, for example, a process uses two data items, the DPM would have to check every combination of data currently in the pool to determine whether the process should be triggered. This becomes prohibitively expensive as the number of data items in the D-Pool grows. To avoid this, a specialized container collects the data for each process, and the process is triggered when its collection is complete. The association between data and collections is implemented as an STL multimap. A datum can be associated with one or more collections, and each collection takes the name of its process, which ties the collection to that process. When a datum is placed into the D-Pool, the DPM iterates through the multimap entries for that datum, adding the datum to each associated collection. Every process whose collection is now complete is then triggered.
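The collection mechanism can be sketched as follows. This is an illustration under stated assumptions rather than the OPEN source: `CollectionIndex`, `ReadyProcess`, and the string-valued data are hypothetical names and simplifications. The point it shows is that routing a datum costs one multimap lookup instead of a scan over every combination of data in the pool.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: a process whose collection has just completed.
struct ReadyProcess {
    std::string name;
    std::map<std::string, std::string> inputs;  // datum name -> value
};

class CollectionIndex {
public:
    // Declare that `process` needs every datum listed in `datumNames`.
    void Register(const std::string& process,
                  const std::vector<std::string>& datumNames) {
        assert(!datumNames.empty());
        for (const auto& datum : datumNames) {
            datumToProcess_.insert({datum, process});
            required_[process].insert(datum);
        }
    }

    // Route one datum to every collection that wants it and return the
    // processes whose collections are now complete.
    std::vector<ReadyProcess> Add(const std::string& datum,
                                  const std::string& value) {
        std::vector<ReadyProcess> ready;
        auto range = datumToProcess_.equal_range(datum);
        for (auto it = range.first; it != range.second; ++it) {
            const std::string& process = it->second;
            auto& collected = collected_[process];
            collected[datum] = value;
            if (collected.size() == required_[process].size()) {
                ready.push_back({process, collected});
                collected.clear();  // start a fresh collection for the next run
            }
        }
        return ready;
    }

private:
    std::multimap<std::string, std::string> datumToProcess_;
    std::map<std::string, std::set<std::string>> required_;
    std::map<std::string, std::map<std::string, std::string>> collected_;
};
```

Registering one process on "itemCost" alone and another on "itemCost" plus a second datum means the first is returned as soon as "itemCost" arrives, while the second waits until both values have been collected.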
This design has undesirable side effects:
For this prototype, these side effects are ignored.
Finally, the D-Pool does not know the data type of each datum in the pool. Using templates is not feasible because templates require the type to be known at compile time, which would impose very annoying requirements on the programmer. Instead, the D-Pool maintains a collection of generic data containers. The data container is smart enough to convert from and to various built-in types and includes a virtual base class for custom derivations. Since the data container is not really the topic of this paper, its design and implementation are not covered here. Feel free to browse through the source.
To implement this design, all processes must be declared as classes and derived from
the OPEN_Process class.
class OPEN_Process
{
public:
    virtual ~OPEN_Process() {}
    void RegisterDNames(void);
    bool SetData(const CString& dataName, const DataContainer& dc)
    {
        dataNameList[dataName]=dc;
        // this is quite the kludge to see if all datum for
        // the process has been set
        return nameList.size() == dataNameList.size();
    }
    virtual void Run(void)=0;

protected:
    OPEN_Process(const CString& s1, const CString& s2) : pName(s1), dName(s2) {}
    OPEN_Process(void) {};
    OPEN_Process(const OPEN_Process& p) : pName(p.pName), dName(p.dName),
        nameList(p.nameList), dataNameList(p.dataNameList) {}

protected:
    CString pName;
    CString dName;
    std::vector<CString> nameList;
    std::map<CString, DataContainer> dataNameList;
};
This class is implemented as an abstract base class. Other than the constructor (which must be invoked
from a derived class), the only method of real interest to the user is Run(void), which must
be implemented in the derived class. Note also that this class encapsulates the process name (pName), the original datum name
list (dName), the datum name list separated out into a vector (nameList), and an STL map (dataNameList) associating each datum
name with the container that holds the actual value. This information can be used to generate debugging information
or a data diagram (for example, in Visio) of the data flow. Describing output data is currently not a requirement
but could easily be added.
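The body of RegisterDNames is not shown in this article. As a rough idea of what "separating the datum name list out into a vector" might involve, here is a sketch in portable C++. The function name `SplitDatumNames` and the comma delimiter are assumptions for illustration only; the actual OPEN source may use a different separator and works with CString rather than std::string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split a delimited datum name list such as
// "itemCost,taxRate" into its individual names. Empty fields are skipped.
std::vector<std::string> SplitDatumNames(const std::string& names,
                                         char delimiter = ',') {
    assert(delimiter != '\0');
    std::vector<std::string> result;
    std::string::size_type start = 0;
    while (start <= names.size()) {
        std::string::size_type end = names.find(delimiter, start);
        if (end == std::string::npos) end = names.size();
        if (end > start) result.push_back(names.substr(start, end - start));
        start = end + 1;
    }
    return result;
}
```

The size of the resulting vector is what SetData compares against the number of values stored so far to decide whether the process has everything it needs.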
Each process is managed by a process pool. The OPEN_ProcessPool essentially encapsulates an STL
map that associates the process name with a pointer to the process. Other objects in OPEN also use
this class to interface to a specific process object.
class OPEN_ProcessPool
{
public:
    OPEN_ProcessPool(void) {}
    virtual ~OPEN_ProcessPool() {}

    void Register(const CString& processName, class OPEN_Process* proc)
    {
        processList[processName]=proc;
    }

    bool SetData(const CString& processName, const CString& dataName,
                 const DataContainer& dc)
    {
        ASSERT(processList.find(processName) != processList.end());
        bool trigger=processList[processName]->SetData(dataName, dc);
        return trigger;
    }

    void Trigger(const CString& processName)
    {
        ASSERT(processList.find(processName) != processList.end());
        AfxBeginThread(OPEN_ProcessPool::StartProcess, processList[processName]);
    }

protected:
    static UINT StartProcess(void*);

public:
    static OPEN_ProcessPool pool;

protected:
    std::map<CString, class OPEN_Process*> processList;
};
There is no particular reason for the programmer to interface to this class directly.
The OPEN_DataCollection implements the one-to-many association between a datum name and the processes
that are interested in that datum. This is implemented as an STL multimap, and this class is
essentially a wrapper for the multimap, providing registration and iteration methods.
class OPEN_DataCollection
{
public:
    OPEN_DataCollection(void) {}
    virtual ~OPEN_DataCollection() {}

    void Register(const CString& datumName, const CString& collName)
    {
        collectionList.insert(std::pair<const CString, CString>(datumName, collName));
    }

    bool FindFirst(const CString& dataName, CString& collName)
    {
        iter=collectionList.find(dataName);
        ASSERT(iter != collectionList.end());
        // ASSERT compiles away in release builds, so guard the
        // dereference explicitly when no process wants this datum
        if (iter == collectionList.end())
        {
            return false;
        }
        collName=(*iter).second;
        return true;
    }

    bool FindNext(const CString& dataName, CString& collName)
    {
        ++iter;
        if (iter==collectionList.end())
        {
            return false;
        }
        collName=(*iter).second;
        // entries are sorted by key, so a different key ends the run
        return (*iter).first == dataName;
    }

public:
    static OPEN_DataCollection coll;

protected:
    std::multimap<const CString, CString> collectionList;
    std::multimap<const CString, CString>::iterator iter;
};
The OPEN_DataPool implements a wrapper around yet another STL map. This map associates
the datum name with the actual value. It is this object with which the application interfaces to place data into
the D-Pool. This class also implements a semaphore that unblocks the DPM, which then works through the data in
the D-Pool, removing it and placing it into the corresponding process container. This class also implements a
CRITICAL_SECTION to ensure that the DPM (running as a thread) can read and write the dataPoolList
without colliding with other threads that may be writing to or erasing from this map.
class OPEN_DataPool
{
public:
    OPEN_DataPool(void)
    {
        InitializeCriticalSection(&cs);
        dpSem=CreateSemaphore(NULL, 0, 0x7FFF, "OPEN_DP_SEM");
        ASSERT(dpSem);
    }

    virtual ~OPEN_DataPool()
    {
        DeleteCriticalSection(&cs);
        CloseHandle(dpSem);
    }

    void Add(const CString& dataName, const DataContainer& dc)
    {
        EnterCriticalSection(&cs);
        dataPoolList[dataName]=dc;
        LeaveCriticalSection(&cs);
        ReleaseSemaphore(dpSem, 1, NULL);
    }

    void RemoveDatum(CString& s, DataContainer& d)
    {
        EnterCriticalSection(&cs);
        std::map<CString, DataContainer>::iterator iter = dataPoolList.begin();
        ASSERT(iter != dataPoolList.end());
        s=(*iter).first;
        d=(*iter).second;
        dataPoolList.erase(iter);
        LeaveCriticalSection(&cs);
    }

public:
    static OPEN_DataPool pool;

protected:
    std::map<CString, DataContainer> dataPoolList;
    CRITICAL_SECTION cs;
    HANDLE dpSem;
};
The OPEN_Mgr implements the DPM. This class is very simple:
class OPEN_Mgr
{
public:
    OPEN_Mgr(void) {}
    virtual ~OPEN_Mgr() {}
    void Run(void);

public:
    static OPEN_Mgr mgr;

protected:
    HANDLE dpSem;
};
What is of more interest is the implementation of the Run method:
void OPEN_Mgr::Run(void)
{
    dpSem=OpenSemaphore(SYNCHRONIZE, FALSE, "OPEN_DP_SEM");
    ASSERT(dpSem);
    while (1)
    {
        // block until OPEN_DataPool::Add releases the semaphore
        DWORD ret=WaitForSingleObject(dpSem, INFINITE);
        if (ret==WAIT_OBJECT_0)
        {
            CString dataName;
            CString processName;
            DataContainer data;
            OPEN_DataPool::pool.RemoveDatum(dataName, data);
            // hand the datum to every process registered for it
            bool found=OPEN_DataCollection::coll.FindFirst(dataName, processName);
            while (found)
            {
                bool trigger=OPEN_ProcessPool::pool.SetData(processName, dataName, data);
                if (trigger)
                {
                    OPEN_ProcessPool::pool.Trigger(processName);
                }
                found=OPEN_DataCollection::coll.FindNext(dataName, processName);
            }
        }
        else
        {
            break;
        }
    }
}
Implemented as a thread, this function waits for a datum to be placed into the data pool, at which point the thread is released. Each time it is released, it removes one datum and its value from the pool and stores it in each process that is interested in that datum. The DPM then "triggers" a process as a thread once all the data that the process requires has been set.
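For readers who are not on Win32, the same wait-and-remove handshake can be approximated with the standard library. The sketch below is not the OPEN code: `BlockingPool`, its `Close` method, and the string-valued data are assumptions for illustration. It also deliberately swaps the map for a queue, so that two values added under the same name before the manager wakes up are both delivered, and it uses a std::condition_variable in place of the named semaphore.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical portable analogue of the data pool's Add / RemoveDatum pair.
class BlockingPool {
public:
    using Datum = std::pair<std::string, std::string>;  // name, value

    void Add(const std::string& name, const std::string& value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);  // adding after Close() is a usage error
            queue_.emplace_back(name, value);
        }
        ready_.notify_one();  // plays the role of ReleaseSemaphore
    }

    // Lets a manager loop exit cleanly instead of waiting forever.
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a datum is available. Returns false once the pool has
    // been closed and every remaining datum has been handed out.
    bool Remove(Datum& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Datum> queue_;
    bool closed_ = false;
};
```

A manager thread would call `Remove` in a loop and dispatch each datum to the interested processes, leaving the loop when `Remove` returns false.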
To support a somewhat more readable implementation of processes, several macros are defined:
#define DECLARE_OPEN(x, y) \
class x : public OPEN_Process \
{ \
public: \
    x(void) : OPEN_Process(#x, y) {} \
    virtual void Register(void) \
    { \
        RegisterDNames(); \
        OPEN_ProcessPool::pool.Register(#x, this); \
    } \
    virtual void Run(void); \
    static x _##x; \
}; \
x x::_##x;

#define IMPLEMENT_OPEN(x) \
void x::Run(void) {

#define FINISH_OPEN \
dataNameList.erase(dataNameList.begin(), dataNameList.end()); }

#define REGISTER_OPEN(x) \
x::_##x.Register()
Thus, the implementation of a process would look something like this:
DECLARE_OPEN(AddCost, "itemCost");

IMPLEMENT_OPEN(AddCost)
{
    double cost;
    dataNameList["itemCost"].Get(cost);
    double total=atof(dlg->total)+cost;
    OPEN_DataPool::pool.Add("totalCost", DataContainer().Set(AutoString(total)));
}
FINISH_OPEN
And somewhere in the initialization section of the application, the process must be instantiated:
REGISTER_OPEN(AddToList);
The issue of process instantiation is an annoying one. You will notice that almost all of the OPEN classes automatically instantiate a singleton, implemented as a public static member of each class. This is also done with the processes. However, the actual registration of the datum names cannot be done at program startup because other necessary initialization (for STL, for example) may not have occurred yet. To my knowledge, there is no way of pre-determining the initialization order of global or static data defined in different source files.
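One common way to sidestep the ordering problem is the "construct on first use" idiom, in which the singleton lives in a function-local static and is therefore created the first time anyone asks for it. The sketch below shows the idea in modern C++. It is not part of OPEN, and `Registry`, `Instance`, and `AutoRegister` are hypothetical names. Note also that the language only guarantees thread-safe initialization of local statics from C++11 onward, so this would need extra care on a compiler as old as VC6.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry that can be used safely from static initializers.
class Registry {
public:
    // The instance is constructed on the first call, so it always exists
    // before it is used, regardless of static initialization order.
    static Registry& Instance() {
        static Registry instance;
        return instance;
    }

    void Register(const std::string& processName, const std::string& datumNames) {
        assert(!processName.empty());
        entries_[processName] = datumNames;
    }

    bool Has(const std::string& processName) const {
        return entries_.count(processName) != 0;
    }

private:
    Registry() = default;
    std::map<std::string, std::string> entries_;
};

// A static object of this type registers a process during program startup.
struct AutoRegister {
    AutoRegister(const char* processName, const char* datumNames) {
        Registry::Instance().Register(processName, datumNames);
    }
};

// Registration happens before main() runs, with no explicit call needed.
static AutoRegister registerAddCost("AddCost", "itemCost");
```

With this arrangement, a macro in the style of DECLARE_OPEN could perform the registration itself, and a separate REGISTER_OPEN step in the application's initialization code would no longer be required.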
The demonstration program is a simple and rather limited example of how this paradigm works. The "Add Part", "Remove Part",
and "Clear Part" events each place data into the D-Pool, where it is picked up by the DPM. The most interesting thing
about this example is that the datum "itemCost" triggers two processes: one to update the list box and the other to
update the running total. In actuality, this example is a poor one because the process handlers are so tightly
coupled to the dialog object, which is definitely not a desirable thing to have happen in real life. Also note that
I overrode the AssertValid method so that I could update the dialog from a worker thread while still
running in debug mode. Good ol' MFC.
For this model to really be effective, the developer needs to completely rethink how applications are designed and implemented.
For example, a function might do something like this:
In the spirit of this architecture, the function should be recoded into several processes:
DB query Q1
DB query Q2
Operation A
DB query Q3
DB query Q4
Operation B
DB update QQ
DB update RR
As you can see from the above architecture, the program now automatically performs the database queries and updates simultaneously in separate threads. This can dramatically improve program performance and it is achieved by simply using a different paradigm for data management.
I consider this paradigm a significant enhancement to the existing process-centric programming styles. It results in:
There are complexities in this model that are not fully understood. My challenge to the reader is to identify these complexities and design solutions for them, ultimately making this paradigm robust and easy to use. For example, in this prototype implementation, the DPM, D-Pool, and other objects are implemented as global singletons. It seems more reasonable that an application would have several data pools at different scales. This would extend the concept of organic programming to (pardon the analogy) program organs.
Please credit the author, Marc Clifton, for the core of OPEN in any application that you build using it (I'm probably dreaming, right?). The author (me) also requests that he be provided with the source code and a list of all enhancements that you make to the architecture of OPEN, so that they may be included in future versions for the benefit of all.
The Organic Programming Environment is not a replacement for process-centric modeling. However, OPEN is a significant addition to the programmer's toolset because there are many cases where a data-centric model is superior to a process-centric one.
Last Updated: 26 May 2002. Editor: Nishant Sivakumar
Copyright 2002 by Marc Clifton. Everything else Copyright © CodeProject, 1999-2009