![]() |
General Programming »
Threads, Processes & IPC »
Third Party libraries
Intermediate
License: The Code Project Open License (CPOL)
Distributed computing in small and middle officesBy hax_Introduction to open-source hxGrid library for distributed computing. Main benefits of the library: cluster is using only idle time of Windows 2000/XP/Vista workstation (no dedicated workstations required); easy to use; free. |
C++ (VC6, VC7, VC7.1, VC8.0), C, Windows (Win2K, WinXP, Win2003, Vista), Win32, Architect, Dev
|
||||||||
|
Advanced Search |
|
|
|
||||||||||||||||






libGlass [8]. Despite easy programming model, I have put aside this candidate, because there were no new versions of this library for a long time.

Alchemy[9]. The most promising candidate. Provides easy to use programming model, works on .NET framework, which, buy the way, in theory can allow even Windows Mobile smartphone, like Motorola MPX 200, to became a node in the cluster :). Contains wonderful examples and continues to evolve.
Unfortunately, during my testing, some examples crashed and left orphan processes on nodes.
After unsuccessful search, I made a decision to develop library myself.
During development, I was taking into consideration the following additional requirements:
To minimize development time, I have chosen to implement library on Delphi. Library is using fake COM interfaces [11]. Library is using Jedy Visual Code Library[15] and ZLIB[27].
Binary package should be downloaded to install cluster[10].
The cluster software consists of three components:
1. Coordinator. Coordinator should be installed on workstation, which is always online. Coordinator is responsible for maintaining a list of computational resources, and distribution of this list to grid users. Coordinator should be installed on only one workstation in local network.
Agents and Users can find Coordinator by sending broadcast packets in local network. Thus no configuration is required if Coordinator is moved to PC with other IP address.
Users connect to Coordinator only to get list of agents. Later, users connect to agents directly, unlike other grid solutions, where coordinator distributes tasks among agents. . This allows to establish fastest communication between user and agents.
2. Agents should be installed on every workstation in local network. Agent allows utilizing idle CPU time of workstation.
3. Grid user – some application, written with hxGrid library.
hxGrid allows to run tasks (execute procedures) on cluster workstations. Grid application places tasks into queue and waits for completion.
Input data for tasks are written into stream (IGenericStream inStream). After successful execution of task, output data is received also in stream (IGenericStream outStream).
If necessary, an agent, executing a task, can request additional data from grid application.
Every task is function with the following signature:
C++
typedef bool (__cdecl TTaskProc)(IAgent* agent, DWORD sessionId, IGenericStream* inStream, IGenericStream* outStream);
Delphi:
type TTaskProc = function(agent: IAgent; sessionId: DWORD;
inStream: IGenericStream; outStream: IGenericStream): boolean; cdecl; where sessionId – unique session id. This Id is required to request additional data from grid application (described later). Function should return false, if interrupted (described later).
Such procedures should be placed into DLL. hxGrid delivers the code by transferring DLL to other workstation. TTaskProc function should be thread-safe.
After successful execution of task on agent, library sends output stream back to user, and calls FinalizeTask()callback:
C++:
typedef void (__cdecl TFinalizeProc)(IGenericStream* outStream);
Delphi:
type TFinalizeProc = procedure(outStream: IGenericStream); cdecl;
This function should be thread-safe.
If single threaded application is processing large array of independent elements, the obvious way to increase performance is to divide array into small ranges and process them on cluster nodes.
When started, each agent looks for coordinator, sending broadcast packets over LAN. When connection with coordinator has been established, agent starts sending self status periodically (percentage of free CPU time, amount of free physical memory, number of queued tasks).
To begin cluster usage, grid application initializes hxGrid library. Library connects to coordinator (sending broadcast packets, if required), requests list of agents and connects to them directly. From this moment, library is ready to execute remote tasks. At the same time, library continues to request list of agents, so new agents are able to join running session.
Grid application adds tasks to queue. Library continuously sends tasks to agents in background thread. If connection with some agent is lost, library is able to reassign it’s tasks to other agent.
When library is deinitialized, it disconnects from the agents.
To increase efficiency, the following technical decisions were applied:

PlayStation 2 cluster[16].
Library is distributed in form of DLLs: hxGridUserDLL.dll, zlib.dll.
To start session, grid application should create IGridUser object (see examples\picalculator\C++ и examples\picalculator\Deplhi):
C++:
#include "hxGridInterface.h" void main() { IGridUser* user = CreateGridUserObject(IGridUser::VERSION); ...
Delphi:
Uses T_GenericStream, I_GridUser; var user: IGridUser; begin IGridUser_Create(user); ...
From this moment, it is possible to execute tasks on cluster nodes:
C++:
IGenericStream* stream = CreateGenericStream(); DWORD d=1+i*9; stream->Write(&d,4); user->RunTask("picalculator_task.dll","RunTask",stream,Finalize,&d,true);
Delphi:
//write input data to stream stream := TGenericStream.Create(); d := 1+i*9; stream.write(d,4); //add task to queue user.RunTask('picalculator_task.dll','RunTask',stream,Finalize,d,true); //now library owns stream, release our ref pointer(stream):=nil;
RunTask() method has the following parameters:
If task can not be added to queue immediately (due to limitations of queue length or queue input streams size), and blocking flag is set, method will not return until task is added to queue. Otherwise method will return S_FALSE, and application can wait for good moment with User->WaitForCompletionEvent()).
Please note, that some dependent DLLs, for example, VC++ runtime libraries, can be missing on remote workstation. It is advised to build with static libraries, and always check dependencies with dumpbin utility. To send additional DLLs to workstation, their names should be specified in RunTask() method, separated with comma:
user.RunTask('GridGMP_task.dll,GMPPort.dll','RunTask',stream,Finalize,d,true);
Task function should call the following agent method periodically, to allow agent handle abortion and suspension events:
Delphi:
if (agent.TestConnection(sessionId)<>S_OK) then begin result := false; exit; end;
This is very fast method, and can be called from inner cycle even with ~10000Hz frequency.
User->WaitForCompletion() is used to wait until all queued tasks are done.
To close session, application should destroy IGridUser object:
C++:
user->Release();
Delphi:
user:=nil;
This small chapter has shown everything needed to use hxGrid.

XBOX cluster[17].
In some circumstances, tasks, running on agents, should access some global data, common to all tasks in session.
If global data size is high, it is inefficient to pass global data in task input stream. Agent provides the following method to request global data from user:
C++:
virtual HRESULT __stdcall GetData(DWORD sessionId, const char* dataDesc, IGenericStream** stream) = 0;
Delphi:
function GetData(sessionId: DWORD; dataDesc: pchar;
var stream:IGenericStream): HRESULT; stdcall;
dataDesc – symbolic identifier of data, f.e. ‘geometry’;
stream – address of variable to receive address of stream with data (stream is created by agent, and should be freed by task).
To handle requests, hxgrid application should register the following callback:
C++:
typedef void (__cdecl TGetDataProc)(const char* dataDesc, IGenericStream** outStream); user->BindGetDataCallback(GetDataCallback);
Delphi:
type TGetDataProc = procedure(dataDesc: pchar; var stream: IGenericStream); cdecl;
User->BindGetDataCallback(callback: TGetDataProc); stdcall;
TGetDataproc should create stream and fill it with data. Ownership of stream is passed to library.
Usually, task is trying to access some global cache, and, if data is not in there, requests data from hxgrid application. Cache should be indexed with sessionId and data symbolic description.
This methods are used in examples\normalmapper-3-2-2 to request hi-poly mesh octree.

PlayStation 2 cluster[16].
C++:
virtual void __stdcall GetSettings(TGridUserSettings settings); virtual void __stdcall SetSettings(TGridUserSettings settings);
Delphi:
User.GetSettings(var settings: TGridUserSettings); stdcall; User.SetSettings(var settings: TGridUserSettings); stdcall;
Methods allow to change settings programmatically. For example, application can disable stream compression, if it fills streams with compressed data.
C++:
virtual HRESULT __stdcall CompressStream(IGenericStream* stream);
Delphi:
User.CompressStream(stream: IGenericStream): HRESULT; stdcall;
This method allows to compress stream. For example, it is good practice to prepare and compress stream with global data ahead, so library will not have to compress stream each time GetDataCallback() is called. Library is able to find out if stream is compressed, by examining signature. Compressed stream should not be unpacked by task – agent does it automatically. See examples\normalmapeer-3-2-2.
C++:
virtual HRESULT __stdcall FreeCachedData(DWORD sessionId, const char* dataDesc);
Delphi:
HRESULT __stdcall IAgent::FreeCachedData(DWORD sessionId, const char* dataDesc) = 0;
Agent is caching global data, requested during session. This method allows to free cache and minimize memory usage, if it is known, that data will not be requested again during session.
For example: task is requesting global data, builds some structures from it, and stores them in some custom global cache. Other tasks will use structures from this cache and will not call agent.GetData() during session anymore.
C++:
typedef void (__cdecl EndSession)(IAgent* agent, DWORD sessionId);
Delphi:
type TEndSessionProc = procedure(agent: IAgent; sessionId: DWORD); cdecl;
Agent calls EndSession() callback at session end. Usually this callback is used to free global session data cache.
Callback should be exported by name from DLL, which was specified to IGrudUser->RunTask(). This callback is optional.
There are two patterns to use library:
1) tasks are added to queue with IGridUser->RunTask(blocking = true). Then, application waits for completion with IGridUser->WaitForCompletion().
2) Tasks are added to queue with IGridUser->RunTask(blocking = false). If method returns S_FALSE, because of queue is full, application waits for queue progress with IGridUser->WaitForCompletionEvent(). When all tasks are submitted, application waits for tasks completion, calling IGridUser->IsComplete() periodically.
Delphi:
for s:=1 to 200 do begin ... while (user.RunTask('GridGMP_task.dll,GMPPort.dll', 'RunTask',stream,Finalize,d,false)<>S_OK) do begin Application.ProcessMessages(); user.WaitForCompletionEvent(100); if state=ST_CANCELING then begin break; end; end; if state=ST_CANCELING then break; ... end; ... bl:=true; while (bl) do begin Application.ProcessMessages(); Sleep(100); if (state=ST_CANCELING) then begin user.CancelTasks(); break; end; user.IsComplete(bl); end;
Application can cancel all tasks at any moment by calling IGridUser->CancelTasks().
Despite bigger complexity, second pattern has the following advantages: application can perform some tasks while waiting; there are ability to cancel tasks and ability to show progress in UI.
For details of second pattern, see examples\GridGMP\.
The most simple example. This is Pi value calculation with specified accuracy. Each task is calculating [n…n+8] number.
Mandelbrot fractal explorer. There are three versions of application:
SingleCPUExtended – this example is using FPU native 10-byte floating point type and runs on single CPU. Luck of accuracy does not allow high zoom levels.
SingleGMP – this example is using 256-byte «big numbers» on single CPU. A Delphi port of GMP[13] library is used.
GridGMP – the example is using 256-byte «big numbers» and hxGrid. Each agent is assigned to calculate one vertical line of the image. This application also shows how badly partitioned calculations can decrease overall performance: vertical lines can require very different amount of processing power, up to 50:1. A better solution would be to assign each agent to calculate every N’s pixels of image’s horizontal scanning.

ATI NormalMapper 3.2.2[14], ported to hxgrid.
Normalmapper is ideal application for distributed computing. Each texel of normal map can be calculated independently from others. Total time to calculate normal map with occlusion term and bent normals enabled, can take up to 12 hours.
The modified version forms tasks to calculate 100 pixels of normal map. Each agent requests serialized hi-poly mesh octree by calling IAgent->GetData(). Octree size can be up to 300Mb, so sending it in task input stream is inefficient.
In addition, modified version contains mutex, which allows to run only one instance of normalmapper at a time. It is possible to run several instances with different models, and they will run in queue automatically.
It should be mentioned, that library does not warranty the order of tasks completion. Since it is important to calculate texels in strict order (because of intersecting mapping and border texels), applications performs special procedures to update normal map texels in required order.
System requirements:
A couple of PCs running Windows 2000/XP SP1/SP2/Vista, connected via local network.
1. Download hxGrid Coordinator[10] and install on any workstation in the local network. Coordinator should be installed only on one workstation in the local network. This PC should always be available to coordinate hxGrid.
2. Download hxGrid Agent[10] and install on all workstations in the local network.The more PCs are running the agent, the more faster will hxGrid apps work.
3. Download ATI Normalmapper for hxGrid[10] and install on any PC in the local network.
No additional configuration is required.
Hint: If everything above are installed on single workstation, and it does not have network adapter with IP address, "Microsoft loopback adapter" can be installed and assigned some IP address.
Workstations configurations:
Measurements were made for calculation of car_low.nmf+car_high.nmf meshes, 4096x4096, Bent normals, Ambient occlusion.
Normalmapper.exe –on carlow.nmf carhigh.nmf 4096 4096 test.tga
For experiments, original version of ATI Normalmapper has been recompiled with VS 2005 with aggressive optimizations (no PGO).
Experiments show, that hxGrid can be successfully used to take advantage of multicore CPUs even on single workstation (agent, coordinator and grid application is running on single workstation).

Normalmap calculation, hours
In other worlds, calculation of normalmap on cluster took 8 minutes, comparable to almost 4 hours on single PC (performance improvement – 27,4 times).
HT CPU showed expected 12% speed improvement, but I can’t explain 35% speed up of Core 2 Duo (I expected ~80%). Probably, the bottleneck here is shared L2 cache, since Pentium D Presler showed 73% speedup.

State of cluster (end of working day)
Because of high energy consumption, cluster room should be equipped with ventilation and cooling. How much does energy consumption of office increase while agents are active?

100% processor load increases energy consumption approximately by 30 Watt. Accuracy of my measurements can be questionable, but even on 50 workstations office overall energy consumption increases only for 1.5kWt, which is less then good teapot (1.8kWt).
The following problems have not been solved yet:
Currently, xNormal[29] is using hxGrid to accelerate normalmap generation.
1. General-Purpose Computation Using Graphics Hardware
http://www.gpgpu.org/
2. Ambient occlusion - From Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Ambient_occlusion
3. NVIDIA CUDA Homepage
http://developer.nvidia.com/object/cuda.html
4. Incredibuild by Xoreax software
http://www.xoreax.com/main.htm
5. Globus Toolkit Homepage
http://www.globus.org/toolkit/
6. Кластерная система Condor
http://www.osp.ru/os/2000/07-08/178077/
7. MPICH2
http://www-unix.mcs.anl.gov/mpi/mpich/
8. libGlass – distributed computing library
http://libglass.sourceforge.net/download.php
9. Alchemy – distributed computing library
http://www.alchemi.net/index.html
10. hxGrid binaries
http://sourceforge.net/project/showfiles.php?group_id=236427
11. Программирование с использованием COM-подобных интерфейсов.
http://www.dtf.ru/articles/read.php?id=44995
12. То, что вам никто не говорил о многозадачности в Windows
http://www.dtf.ru/articles/read.php?id=39888
13. The GNU MP Bignum Library
http://gmplib.org/
14. ATI Normalmapper
http://ati.amd.com/developer/tools.html
15. Jedy Visual Code Library
http://homepages.borland.com/jedi/jvcl/
16. PlayStation 2: Computational Cluster
http://arrakis.ncsa.uiuc.edu/ps2/cluster.php
17. Unmodified Xbox Cluster
http://www2.cs.uh.edu/~bguillot/xbox/home.html
18. Информационно-аналитический центр parallel.ru
http://www.parallel.ru/
19. Распределенные вычисления: поиск лекарства от рака
http://www.3dnews.ru/reviews/software/cure-for-cancer/
20. РАСПРЕДЕЛЕННЫЕ МОЗГИ
http://www.fuga.ru/articles/2003/01/distributed.htm
21. Взлом NTV+ с помощью распределенных вычислений
http://www.xakep.ru/post/20600/default.asp
22. Знаете ли вы, что большинство времени ресурсы компьютера используются менее чем на 5%?
http://distributed.ru/?what-is
23. Распределенные вычисления с минимальными затратами
http://msk.nestor.minsk.by/kg/2002/07/kg20708.html
24. Распределенные вычисления на FreePascal под Windows.
http://freepascal.ru/article//raznoe/20051207110629/
25. Распределенные вычисления - паразитные вычисления
http://center.fio.ru/method/resources/judina/05-03/news/parazit.htm
26. Sony рассматривает возможность продавать процессорное время PS3
http://www.gamasutra.com/php-bin/news_index.php?story=13476
27. ZLIB
http://www.zlib.org
28. NVIDIA Texture Tools 2 Alpha
http://developer.nvidia.com/object/texture_tools.html
29. hxGrid source code
http://sourceforge.net/projects/hxgrid/
30. xNormal
http://www.xnormal.net/
31. Author’s page
http:/www.deep-shadows.com/hax/
Version 1.09d
------------------------
- fixed critical bug: unable to complete last few tasks,
if task takes more then 20 sec;
Version 1.09c
------------------------
hxGridUser and agent are changed in this release.
- new option: 'allowDiscardCoordinatorIP' - seel hxgrid.ini for description;
- fixed bug: hxGridUser is unable to run tasks, if single task size is larger then
allowed memory usage for hxGridUser.
Version 1.09b
------------------------
- added method "IGridUser->GetConnectionStatus()";
Version 1.09a
------------------------
- added Windows 2000 support;
Version 1.09
------------------------
This is very stable release with Windows Vista support.
- fixed critical bug: memory corruption in agent;
- fixed critical bug in scktcomp.pas delphi component (thread-safety issue);
- fixed critical bug: agent is unable to load task dll in Windows Vista;
- fixed thread-safety bug in agent and griduser (AddRef()/Release() should use InterlockedXXX);
- fixed memory leak in agent (IdUDPServer not freed);
- fixed IGridUser->GetSettings() and IGridUser->SetSettings() .h headers;
- IGridUser->isComplete(var complete:boolean) now returns TRUE
when no tasks is running (was: FALSE);
- hxgriduseddll.dll updated in PICalculator\C++ example;
- fixed AutoUpgrade example: unable to upgrade
due to relevance to current directory;
- fixed memory leak in GridGMP example;
- fixed DebugWrite bug: exeption on invalid strings causes agent to lost connection;
- fixed possible thread-safety issue in agent.filecache;
- new method: IAgent->GetSessionCacheDirectory();
- new method: IAgent->GetGlobalCriticalSection();
- new IGridUser setting: failed_agent_suspend_timeout;
- updated TAgentSettings in IAgent.h;
- IAgent.h and I_Agent.pas documented in English;
- Normalmapper_hxGrid_Setup installs 3ds MAX and Maya NMF export plugins;
Version 1.08a
------------------------
- article translated to English;
- headers are documented in English;
- cancellation support;
- fixed bug in IAgent->GetData();
- fixed bug: freezing on 99.99%;
- fixed bug: lost tasks if IAgent->GetData() failed;
- fixed bug: hxgrid application is unable to work,
if workstation has not been rebuted more than 25 days;
- settings tuned for large IAgent->GetData() size;
- fixed GridGMP sample (all drawing is done in main thread);
- network and memory overload tracking: at the start of the session,
too many agents request data from user. Library tracks network and memory load;
- GridGMP example shows cancellation support and progress display;
- ATI Normalmapper installer installs NMF export plugins
to plugin directories of 3DS MAX and Maya;
- more accurate progress display in ATI Normalmapper;
- NMFExport plugins for Maya 7.0, 8.0 and 8.5 included;
Note: agent and coordinator are modified, but have the same interace versions.
Upgrade recommended (see Examples\Autoupgrade\);
Version 1.08 beta
------------------------
- initial public release;
General
News
Question
Answer
Joke
Rant
Admin
|
PermaLink |
Privacy |
Terms of Use
Last Updated: 23 Aug 2008 Editor: |
Copyright 2008 by hax_ Everything else Copyright © CodeProject, 1999-2009 Web09 | Advertise on the Code Project |