The Code Project brings a wealth of resources to the community, and many of us, myself included, benefit from it. Since the site has been very beneficial to me, I thought it was about time I contributed something in return to the community by posting an article of my own. I have wanted to post for a while but never found (or should I say, took) the time to do so. Well, here goes my first post; please feel free to provide your feedback and questions, which I will try to answer to the best of my knowledge.
The .NET Framework has greatly improved support for multithreaded applications, making it easier for programmers to create them. Even though the .NET Framework provides an enormous number of tools and resources to accomplish a multitude of tasks, many programmers still use older libraries such as WTL/ATL/MFC, or even the good old Win32 API, to accomplish their tasks.
For some tasks I'm part of the latter group of programmers, and I sometimes find switching away from the .NET Framework difficult as well as time- and energy-consuming. For this reason, and until C++0x brings multithreading support to a compiler near you, I thought I would re-create in C++ one of the features I use most from the .NET Framework: the Asynchronous Design Patterns, in particular the ability to invoke a function asynchronously (e.g. on a thread pool) using a delegate and retrieve its result later.
The .NET Framework Asynchronous Design Patterns are also characterized by an interface for synchronizing a function call onto a specific thread (ISynchronizeInvoke), which I also included in this implementation.
The main goal of this implementation is to bring a very basic implementation of some of the .NET Asynchronous Design Patterns to the C++ language. Since I mostly use WTL/ATL when I develop native Win32 applications, I made this implementation blend well with their style. By no means does this implementation provide all the answers; it should be considered only a starting point and a learning project for future ideas and enhancements.
Keeping in mind that dependencies are sometimes painful to manage, and considering the size of this project, I made the implementation fit inside a single header file, making it very easy to include in an existing project. If I weren't worried about extra dependencies, I could have used the great Boost C++ libraries to benefit from their extended (and portable) support for functors, tuples and threads; however, I decided to keep this tiny project lean in dependencies. Besides, Boost wouldn't have blended well with the WTL/ATL style because of its naming convention and its heavy use of templates following the STL guidelines. The only library dependency of this implementation is ATL (specifically CThreadPool), which can easily be removed by re-implementing the CDelegate class to use a custom thread pool (or any other means of executing the delegates).
You should also note that the .NET Framework benefits from CLR features, which forces my implementation to simulate some behaviors using different techniques and mechanisms. Additionally, some features have simply not been implemented (e.g. support for multicast delegates, variadic arguments, etc.).
More specifically, here is what this implementation provides:
- Documented source code to an example implementation of a portion of the Asynchronous Design Patterns
- Capability to asynchronously call a global or class function and retrieve its result at a later time
- Support for callback upon completion of a delegate asynchronous call
- A means to detect if an asynchronous call is required and capability to synchronize that call on a specific thread
- Automatic memory management for delegates and asynchronous resources
And doesn't provide:
- Thread safety
- Delegate argument type safety
- Variadic arguments (this implementation has a pre-defined delegate signature)
- Multicast delegates
- Exception support
- Error handling
To better understand this article, you should be familiar with asynchronous design patterns. There are many great articles about them right here on The Code Project; here are some that caught my attention:
In addition, the .NET Asynchronous Design Patterns are fully documented on the MSDN website:
Asynchronous Programming Design Patterns
Since this implementation uses Windows I/O Completion Ports, you should also be familiar with how they work. You will find the relevant documentation on MSDN:
I/O Completion Ports (Windows)
This project also uses the ATL Worker Archetype to execute the delegates asynchronously through the CThreadPool class, also documented on MSDN:
ATL Server Library Reference - Worker Archetype
The core of this implementation relies on IAsyncResult, ISynchronizeInvoke and IDelegate. IDelegate doesn't exist in .NET, but I needed an interface to define a delegate, which doesn't exist in C++ (function pointers are much lower level than delegates). I was guided by the .NET Framework when implementing IAsyncResult (which encapsulates all the resources of an asynchronous call), and I created two classes, CAsyncResult and CThreadMethodEntry, to be used by CDelegate and CSynchronizeInvoke respectively.
Let's continue with a class diagram (a picture is worth a thousand words):

NOTE: CMainDlg is part of the demo application, not the implementation.
The diagram should be self-explanatory, and the design follows some of the semantics of similar .NET classes, but here are some more details:
- While …, IDelegate and IClassDelegate could find use in an application (IDelegate more than the others), they were designed to be used internally by this implementation. IAsyncResult is used to keep a reference to an asynchronous call. It is important to keep this reference and call EndInvoke with it; otherwise a memory leak results, unless the call was made as a "fire & forget" call (more on that topic below).
- It is important to mention that IDelegate and ISynchronizeInvoke are closely related and both have BeginInvoke and EndInvoke methods; however, their meanings are very different. IDelegate::BeginInvoke begins an asynchronous call of the function wrapped by the delegate implementing the IDelegate interface, while ISynchronizeInvoke::BeginInvoke asynchronously calls the provided delegate on the thread the object implementing the ISynchronizeInvoke interface lives on (or on any thread it decides to call it on, for that matter). So the delegate passed to IDelegate::BeginInvoke is a callback, while the delegate passed to ISynchronizeInvoke::BeginInvoke is the delegate to call on the thread selected by the object implementing ISynchronizeInvoke.
- IAsyncResult follows the same semantics as its .NET counterpart, except that it lacks the CompletedSynchronously getter, which had no use in this implementation.
- CAsyncResult and CThreadMethodEntry encapsulate the management of the resources associated with an asynchronous call made from a CDelegate and a CSynchronizeInvoke, respectively.
- CAsyncRequest is simply an abstract class implementing the portions common to CAsyncResult and CThreadMethodEntry.
Before moving forward with the interface and implementation, I would like to address the question of whether calling EndInvoke() is required or not. In the .NET implementation, the documentation on MSDN states that not calling EndInvoke() after a delegate asynchronous call leaks resources, whereas not calling EndInvoke() after a synchronized call (as used by WinForms) apparently causes no leak (this is not officially documented).
Since the documentation of the interface states that EndInvoke() should always be called, it is up to the implementer to officially document any exceptions to the interface rules. Since no official documentation states that EndInvoke() is optional in the WinForms implementation, one can assume it is required. (This is still open to discussion, as the WinForms implementation doesn't really follow the interface definition of the pattern.)
To get around this issue, I modified the interface definition to include an extra parameter stating whether the call is a "fire & forget" call or whether EndInvoke() will be called to retrieve the return value (and free the allocated resources). To stay close to the .NET implementation, this parameter defaults to "fire & forget" for a synchronized invoke and to not "fire & forget" for a delegate asynchronous call.
Below are the rules around the "fire & forget" scenario in this implementation:
- For non-"fire & forget" calls, always call EndInvoke, for both delegate asynchronous and synchronized calls; otherwise resources will be leaked.
- For "fire & forget" calls, never call EndInvoke, for either delegate asynchronous or synchronized calls, as the resources associated with the call have already been disposed of.
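To make these two rules concrete, here is a minimal single-threaded sketch in standard C++; it is not the ATL code from the header. The Call class, its Complete() and End() members and the s_nLive counter are hypothetical names used only for illustration: Complete() stands for the worker finishing the call, and End() stands for EndInvoke.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an asynchronous call record (not the article's CAsyncResult).
class Call
{
public:
    static int s_nLive;  // number of call records currently allocated

    explicit Call(bool bFireAndForget) : m_bFireAndForget(bFireAndForget) { ++s_nLive; }
    ~Call() { --s_nLive; }

    // Worker side: the wrapped function has finished with this result.
    void Complete(std::uintptr_t ulResult)
    {
        m_ulResult = ulResult;
        m_bCompleted = true;
        if (m_bFireAndForget)
            delete this;  // nobody will call End(), so release everything now
    }

    // Client side (the EndInvoke role): must be called exactly once, and only
    // for non-"fire & forget" calls; frees the record and clears the pointer.
    static std::uintptr_t End(Call **ppCall)
    {
        assert(ppCall != nullptr && *ppCall != nullptr);
        assert(!(*ppCall)->m_bFireAndForget && (*ppCall)->m_bCompleted);
        std::uintptr_t ulResult = (*ppCall)->m_ulResult;
        delete *ppCall;
        *ppCall = nullptr;  // the record must never be used again
        return ulResult;
    }

private:
    bool m_bFireAndForget;
    bool m_bCompleted = false;
    std::uintptr_t m_ulResult = 0;
};

int Call::s_nLive = 0;
```

A non-"fire & forget" record survives Complete() until End() releases it; a "fire & forget" record is already gone when Complete() returns, which is why calling End() on it would touch freed memory.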
Below are the signatures and argument details for the core functionality:

IDelegate::BeginInvoke

```cpp
IAsyncResult *BeginInvoke(ULONG_PTR ulParam = 0, IDelegate *pCallback = NULL, LPVOID pvState = NULL, BOOL bFireAndForget = FALSE)
```

- ulParam: The delegate parameter; it is passed to the function wrapped by the delegate.
- pCallback: The callback delegate; an optional delegate wrapping a function to be called when the asynchronous call completes. To simplify the use of callbacks (and of any resources associated with delegates), they are deleted automatically upon completion. (Always use the provided delegate macros to create new delegates; see below for details.)
- pvState: An optional state associated with the asynchronous call; it can be anything you want to keep in context.
- bFireAndForget: Whether or not this call is a "fire & forget" call.
ISynchronizeInvoke::BeginInvoke

```cpp
IAsyncResult *BeginInvoke(IDelegate *pDelegate, ULONG_PTR ulParam = 0, BOOL bFireAndForget = TRUE)
```

- pDelegate: The delegate-wrapped function to call. The call is made on whichever thread the object implementing ISynchronizeInvoke decides to use, typically the thread that object lives on (a GUI thread in the .NET Framework implementation).
- ulParam: The delegate parameter, passed to the function wrapped by pDelegate.
- bFireAndForget: Whether or not this call is a "fire & forget" call.
IDelegate::EndInvoke and ISynchronizeInvoke::EndInvoke

```cpp
ULONG_PTR EndInvoke(IAsyncResult **ppAsyncResult)
```

- ppAsyncResult: The reference pointer of the asynchronous call to end. It is passed as a double pointer because the referenced pointer is set to NULL, signaling that an IAsyncResult reference must never be used after calling EndInvoke (all associated resources have been deallocated).
- Return value: The return value of EndInvoke is the return value of the function wrapped by the delegate; clients must cast it to recover the original type.

EndInvoke has the same signature for both IDelegate and ISynchronizeInvoke. To retrieve the result of an asynchronous call, one must call EndInvoke. As stated above, EndInvoke must always be called to reclaim the resources allocated for the asynchronous call, unless BeginInvoke was called as a "fire & forget" call. If EndInvoke is called before the asynchronous call has completed, it blocks until the call completes. It is also worth mentioning that callbacks are never part of an asynchronous call's time frame; that is, EndInvoke does not wait for the callback to complete. This is important because otherwise calling EndInvoke inside a callback would result in a deadlock.
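The blocking and pointer-clearing behavior of EndInvoke can be illustrated with a small portable sketch. It is not the article's implementation: it starts one std::thread per call instead of using the ATL thread pool, uses a condition variable in place of the Win32 wait event, and the AsyncResult struct and the free functions BeginInvoke/EndInvoke are hypothetical stand-ins.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Pre-defined delegate signature: one argument, one result (ULONG_PTR in the article).
typedef std::uintptr_t (*PFNDELEGATE)(std::uintptr_t);

// Hypothetical, portable stand-in for CAsyncResult.
struct AsyncResult
{
    std::mutex mtx;
    std::condition_variable cv;  // plays the role of the wait handle
    bool bCompleted = false;
    std::uintptr_t ulResult = 0;
    std::thread worker;
};

AsyncResult *BeginInvoke(PFNDELEGATE pfn, std::uintptr_t ulParam)
{
    AsyncResult *pResult = new AsyncResult;
    pResult->worker = std::thread([pResult, pfn, ulParam]() {
        std::uintptr_t ulValue = pfn(ulParam);  // run the wrapped function
        std::lock_guard<std::mutex> lock(pResult->mtx);
        pResult->ulResult = ulValue;
        pResult->bCompleted = true;
        pResult->cv.notify_all();               // signal completion
    });
    return pResult;
}

// Blocks until the call completes, frees its resources and NULLs the caller's pointer.
std::uintptr_t EndInvoke(AsyncResult **ppResult)
{
    AsyncResult *pResult = *ppResult;
    {
        std::unique_lock<std::mutex> lock(pResult->mtx);
        pResult->cv.wait(lock, [pResult] { return pResult->bCompleted; });
    }
    pResult->worker.join();
    std::uintptr_t ulResult = pResult->ulResult;
    delete pResult;
    *ppResult = nullptr;  // the reference must never be used again
    return ulResult;
}
```

Calling EndInvoke right after BeginInvoke simply waits for the worker; the double pointer lets EndInvoke leave the caller holding NULL instead of a dangling pointer.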
An important note about EndInvoke: in the .NET version, calling EndInvoke with an IAsyncResult other than the one returned by the matching BeginInvoke call results in an InvalidOperationException. The same rule applies to this implementation: the IAsyncResult passed to EndInvoke must be the one returned by the matching BeginInvoke. This is important because delegates use a CAsyncResult as their IAsyncResult, while synchronized calls use a CThreadMethodEntry.
Implementing IDelegate is pretty straightforward; however, there are some issues (you know, there are always some issues, oops, I mean challenges). First, C++'s support for variadic arguments is quite limited; I know only of va_list and the use of overloaded templates to achieve the desired behavior. Second, function pointers are very low level and offer no flexibility, making them difficult to adapt to this context. (C++0x may change all of this.) After investigating many options (e.g. one could have used the Boost C++ libraries), I decided not to rack my brain over this and opted for a lazy fix that solves both issues: use only one argument and make the delegate signature pre-defined. The side effects are limited to:
- casting the argument and result back and forth, and
- clients are forced to use the pre-defined signature for all delegates, including callbacks (which are also delegates).
These limitations are enforced by my implementation, not by the design of the pattern.
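Here is what the pre-defined signature means in practice, as a standard C++ sketch: std::uintptr_t stands in for ULONG_PTR, and RepeatArgs and Repeat are hypothetical names. Several real arguments are packed into a struct whose address travels through the single parameter, and a non-integral result travels back the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Every delegate shares this single signature (std::uintptr_t stands in for ULONG_PTR).
typedef std::uintptr_t (*PFNDELEGATE)(std::uintptr_t ulParam);

// Hypothetical argument bundle: several "real" arguments behind one pointer.
struct RepeatArgs
{
    std::string strText;
    int nTimes;
};

// A delegate-compatible function: cast the argument in, cast the result out.
std::uintptr_t Repeat(std::uintptr_t ulParam)
{
    const RepeatArgs *pArgs = reinterpret_cast<const RepeatArgs *>(ulParam);
    std::string *pResult = new std::string;
    for (int i = 0; i < pArgs->nTimes; ++i)
        *pResult += pArgs->strText;
    return reinterpret_cast<std::uintptr_t>(pResult);  // the caller takes ownership
}
```

Nothing checks that the integer really points to a RepeatArgs, which is exactly the loss of argument type safety listed among the limitations; small integral results (like the UINT retrieved in the callback example) can simply be returned by value.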
Creating a new delegate works the same way as in the .NET Framework: simply pass a function pointer to the constructor. To create a delegate for a class instance member function, however, you will need CClassDelegate (see below for more details).
Since this implementation queues asynchronous delegate calls on a thread pool, the first thing a delegate must do is initialize the thread pool. This is done on the first asynchronous call of any delegate; however, one could make the initialization function public and call it when the application starts.
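The "initialize on first use, or eagerly at start-up" choice maps naturally onto a function-local static in standard C++. This is a hypothetical sketch, not what the header does: DelegatePool and GetDelegatePool are illustrative names, and the thread count of 4 is arbitrary.

```cpp
#include <cassert>

// Hypothetical stand-in for the pool shared by all delegates.
class DelegatePool
{
public:
    static int s_nConstructed;  // how many pools were ever created
    explicit DelegatePool(int nThreads) : m_nThreads(nThreads) { ++s_nConstructed; }
    int GetThreadCount() const { return m_nThreads; }
private:
    int m_nThreads;
};
int DelegatePool::s_nConstructed = 0;

// Creates the pool on first use. Since C++11 the initialization of a
// function-local static is thread-safe, so concurrent first calls are fine.
// Calling this once at application start-up gives eager initialization instead.
DelegatePool &GetDelegatePool()
{
    static DelegatePool s_pool(4);  // 4 threads: an arbitrary choice for the sketch
    return s_pool;
}
```

Every BeginInvoke would go through GetDelegatePool(), so only the very first call pays for the construction.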
To queue a new asynchronous call, the implementation simply posts the new request to the thread pool:

```cpp
IAsyncResult *BeginInvoke(ULONG_PTR ulParam = 0, IDelegate *pCallback = NULL,
                          LPVOID pvState = NULL, BOOL bFireAndForget = FALSE)
{
    // The request object owns all the resources of this asynchronous call
    CAsyncResult *pAsyncResult = new CAsyncResult(this, ulParam, pCallback,
                                                  pvState, bFireAndForget);
    BOOL bRes = pAsyncResult->Initialize();
    ATLASSERT(bRes);
    // Hand the request to the ATL pool; a CDelegateWorker thread will run it
    bRes = m_cThreadPool.QueueRequest(pAsyncResult);
    ATLASSERT(bRes);
    return pAsyncResult;
}
```
The machinery that processes queued requests lives in CDelegateWorker, a class compliant with the ATL Worker Archetype. It is a very simple, self-explanatory class: Initialize and Terminate are called once for each thread in the pool, and Execute is called for every request a pool thread dequeues. (See the ATL header file atlutil.h for implementation details.)
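```cpp
class CDelegateWorker
{
public:
    typedef CAsyncResult * RequestType;
    // Once per pool thread, at thread start-up
    BOOL Initialize(LPVOID /*pvParam*/) throw() { return TRUE; }
    // Once per dequeued request (assumed body: run the call, then complete it)
    VOID Execute(CAsyncResult *pAsyncResult, LPVOID /*pvParam*/,
                 LPOVERLAPPED /*pOverlapped*/) throw()
    { pAsyncResult->Execute(); pAsyncResult->Terminate(); }
    // Once per pool thread, at thread shutdown
    VOID Terminate(LPVOID /*pvParam*/) throw() { }
};
```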
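To see the Worker Archetype contract in isolation, here is a minimal portable pool that drives a worker class the same way: Initialize and Terminate once per thread, Execute once per request. It is a hypothetical sketch, not ATL's CThreadPool; SimplePool and CountingWorker are illustrative names, and unlike ATL's archetype, Execute here takes no OVERLAPPED parameter.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

template <class Worker>
class SimplePool
{
public:
    typedef typename Worker::RequestType RequestType;

    explicit SimplePool(int nThreads)
    {
        for (int i = 0; i < nThreads; ++i)
            m_threads.emplace_back([this] { ThreadProc(); });
    }

    ~SimplePool()  // lets the threads drain the queue, then joins them
    {
        {
            std::lock_guard<std::mutex> lock(m_mtx);
            m_bStopping = true;
        }
        m_cv.notify_all();
        for (std::thread &t : m_threads)
            t.join();
    }

    void QueueRequest(RequestType request)
    {
        {
            std::lock_guard<std::mutex> lock(m_mtx);
            m_queue.push_back(request);
        }
        m_cv.notify_one();
    }

private:
    void ThreadProc()
    {
        Worker worker;
        if (!worker.Initialize(nullptr))  // once per thread
            return;
        for (;;)
        {
            std::unique_lock<std::mutex> lock(m_mtx);
            m_cv.wait(lock, [this] { return m_bStopping || !m_queue.empty(); });
            if (m_queue.empty())
                break;  // stopping and nothing left to do
            RequestType request = m_queue.front();
            m_queue.pop_front();
            lock.unlock();
            worker.Execute(request, nullptr);  // once per request, outside the lock
        }
        worker.Terminate(nullptr);  // once per thread
    }

    std::mutex m_mtx;
    std::condition_variable m_cv;
    std::deque<RequestType> m_queue;
    bool m_bStopping = false;
    std::vector<std::thread> m_threads;  // declared last: started after the rest exists
};

// Example worker (hypothetical): each request is a counter to increment.
struct CountingWorker
{
    typedef std::atomic<int> *RequestType;
    static std::atomic<int> s_nInitialized;
    static std::atomic<int> s_nTerminated;

    bool Initialize(void *) { ++s_nInitialized; return true; }
    void Execute(RequestType pCounter, void *) { ++*pCounter; }
    void Terminate(void *) { ++s_nTerminated; }
};
std::atomic<int> CountingWorker::s_nInitialized(0);
std::atomic<int> CountingWorker::s_nTerminated(0);
```

In the article the request type is CAsyncResult *, so each Execute runs one delegate call.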
By now you may have noticed that I haven't mentioned the details of passing a class member function pointer to a new CDelegate, which is another challenge ;-). This is where CClassDelegate<T> comes into play. To accept a class member function pointer, CDelegate would need to use a template, which I didn't like, since CDelegate is a central class in this design and the template would have spread to many other classes through template dependencies. Consequently, I decided to create a tiny class that wraps the details of keeping a class member function pointer. Since that class uses a template, storing it directly in CDelegate would require adding a template to CDelegate, which would defeat the purpose. So an interface, IClassDelegate, was created to decouple CDelegate from the template dependency. The interface basically provides a means to invoke the class member function and to delete the CClassDelegate wrapper automatically. All of this means you can do something like this:
```cpp
VOID SomeClass::SomeFunction()
{
    // CDelegate takes ownership of the CClassDelegate<T> wrapper
    CDelegate *pDelegate = new CDelegate(new CClassDelegate<SomeClass>(this,
        &SomeClass::SomeOtherFunction));
    // ... e.g. pDelegate->BeginInvoke(...)
}
```

without worrying about the memory allocated for the new CClassDelegate<T>. In fact, in a "fire & forget" scenario you don't even need to worry about the memory allocated for pDelegate, since it is also deleted automatically when its time comes. This is all very nice, but creating a delegate for a class instance member function makes every declaration very lengthy, so I also created some macros to make things more concise:
```cpp
VOID SomeClass::SomeFunction()
{
    // Run SomeOtherFunction on the thread pool; SomeCallback is called on completion
    MAKECLSDELEGATE(SomeClass, SomeOtherFunction)->BeginInvoke(0,
        MAKECLSDELEGATE(SomeClass, SomeCallback));
}
```
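There are three macros to ease the creation of delegates; they are self-explanatory:

```cpp
// Delegate for a global (non-member) function
(new CDelegate(&MEMBER))
// Delegate for a member function of the current instance, e.g. MAKECLSDELEGATE(CLASS, MEMBER)
(new CDelegate(new CClassDelegate<CLASS>(this, &CLASS::MEMBER)))
// Delegate for a member function of another instance
(new CDelegate(new CClassDelegate<CLASS>(INSTANCE, &CLASS::MEMBER)))
```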
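The decoupling trick described above (a non-template interface between the non-template delegate and the templated member-function wrapper) is a classic form of type erasure. Here is a simplified, synchronous sketch in standard C++; the names IMemberCall, MemberCall<T>, SimpleDelegate, MAKE_MEMBER_DELEGATE and Counter are hypothetical and only play the roles of IClassDelegate, CClassDelegate<T>, CDelegate and MAKECLSDELEGATE.

```cpp
#include <cassert>
#include <cstdint>

// The single pre-defined signature (std::uintptr_t stands in for ULONG_PTR).
typedef std::uintptr_t (*PFNDELEGATE)(std::uintptr_t);

// Non-template interface (the role IClassDelegate plays in the article).
struct IMemberCall
{
    virtual ~IMemberCall() {}
    virtual std::uintptr_t Invoke(std::uintptr_t ulParam) = 0;
};

// Template wrapper binding an instance to one of its member functions
// (the role of CClassDelegate<T>).
template <class T>
class MemberCall : public IMemberCall
{
public:
    typedef std::uintptr_t (T::*PFNMEMBER)(std::uintptr_t);
    MemberCall(T *pInstance, PFNMEMBER pfnMember)
        : m_pInstance(pInstance), m_pfnMember(pfnMember) {}
    std::uintptr_t Invoke(std::uintptr_t ulParam) override
    {
        return (m_pInstance->*m_pfnMember)(ulParam);
    }
private:
    T *m_pInstance;
    PFNMEMBER m_pfnMember;
};

// Non-template delegate (the role of CDelegate): wraps a free function or
// owns an IMemberCall, so no template parameter leaks into its users.
class SimpleDelegate
{
public:
    explicit SimpleDelegate(PFNDELEGATE pfn) : m_pfn(pfn), m_pMember(nullptr) {}
    explicit SimpleDelegate(IMemberCall *pMember) : m_pfn(nullptr), m_pMember(pMember) {}
    ~SimpleDelegate() { delete m_pMember; }  // the wrapper is deleted automatically
    SimpleDelegate(const SimpleDelegate &) = delete;
    SimpleDelegate &operator=(const SimpleDelegate &) = delete;

    std::uintptr_t Invoke(std::uintptr_t ulParam)
    {
        return m_pMember != nullptr ? m_pMember->Invoke(ulParam) : m_pfn(ulParam);
    }

private:
    PFNDELEGATE m_pfn;
    IMemberCall *m_pMember;
};

// Same shape as the MAKECLSDELEGATE expansion; must be used inside a member function.
#define MAKE_MEMBER_DELEGATE(CLASS, MEMBER) \
    (new SimpleDelegate(new MemberCall<CLASS>(this, &CLASS::MEMBER)))

// Example client class (hypothetical).
class Counter
{
public:
    std::uintptr_t Add(std::uintptr_t ulParam) { m_ulTotal += ulParam; return m_ulTotal; }
    SimpleDelegate *MakeAddDelegate() { return MAKE_MEMBER_DELEGATE(Counter, Add); }
private:
    std::uintptr_t m_ulTotal = 0;
};
```

Only MemberCall<T> knows the class type; SimpleDelegate, and anything that stores one, stays template-free.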
For implementing ISynchronizeInvoke, I thought it would be a great opportunity to use Windows I/O Completion Ports. Even though I/O completion ports are far more capable than this implementation shows, they still provide an efficient and easy way of queuing a request to be processed on a different thread. This makes implementing ISynchronizeInvoke a breeze. Below is the core of queuing and processing the asynchronous calls (note the similarity to queuing an IDelegate asynchronous call):
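```cpp
IAsyncResult *BeginInvoke(IDelegate *pDelegate, ULONG_PTR ulParam = 0,
                          BOOL bFireAndForget = TRUE)
{
    CThreadMethodEntry *pMethod = new CThreadMethodEntry(this, pDelegate, ulParam,
                                                         bFireAndForget);
    BOOL bRes = pMethod->Initialize();
    ATLASSERT(bRes);
    // Queue the request; the completion key carries the request pointer
    bRes = m_cPort.PostQueuedCompletionStatus(0, (ULONG_PTR)pMethod, NULL);
    ATLASSERT(bRes);
    // Wake up the receiving thread's message loop
    ::PostThreadMessage(m_dwThreadId, WM_THREADCALLBACK, 0, 0);
    return pMethod;
}
```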
Pending requests are processed by ProcessPendingInvoke(), which is called by IsThreadCallbackMessage from within a loop mechanism, typically a message pump (e.g. from PreTranslateMessage in WTL). By default, the function processes at most two asynchronous calls per invocation, to minimize the disturbance that executing calls causes on the receiving thread, potentially a GUI thread. You can pass a higher number of requests to process if you're not receiving the calls on a GUI thread.
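```cpp
BOOL IsThreadCallbackMessage(MSG *pMsg)
{
    if (pMsg->message == WM_THREADCALLBACK)
    {
        ProcessPendingInvoke();  // run queued calls on this thread
        return TRUE;
    }
    return FALSE;
}
VOID ProcessPendingInvoke(DWORD dwCount = 2)
{
    if (m_cPort.GetQueuedCount() < 1) return;
    DWORD dwBytes = 0;
    ULONG_PTR ulKey = 0;
    LPOVERLAPPED pOverlapped = NULL;
    // Dequeue at most dwCount requests without blocking (timeout of 0)
    while (dwCount-- > 0 && m_cPort.GetQueuedCompletionStatus(&dwBytes, &ulKey,
                                                              &pOverlapped, 0))
    {
        CThreadMethodEntry *pMethod = reinterpret_cast<CThreadMethodEntry *>(ulKey);
        pMethod->Execute();    // assumed: same Execute/Terminate sequence as CDelegateWorker
        pMethod->Terminate();
    }
}
```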
As you can see, this is pretty straightforward. Using an I/O completion port is a personal choice, and one could use a different mechanism (e.g. pass pMethod …). I also looked at using an APC (queued with QueueUserAPC); however, an APC only runs when the target thread enters an alertable wait state, which a regular message pump doesn't do, so it is of little use in this context.
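The same "post from anywhere, drain a few at a time on the owning thread" idea can be sketched portably with a mutex-protected queue in place of the completion port. CallQueue, Post and ProcessPending are hypothetical names; ProcessPending(2) mirrors the default of ProcessPendingInvoke, and the wake-up message (WM_THREADCALLBACK in the article) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Portable stand-in for the completion-port queue: any thread may post a call,
// and only the owning thread runs them, a few at a time (hypothetical sketch).
class CallQueue
{
public:
    void Post(std::function<void()> fnCall)
    {
        std::lock_guard<std::mutex> lock(m_mtx);
        m_calls.push_back(std::move(fnCall));
    }

    // Called from the owning thread's loop; runs at most nMax pending calls
    // and returns how many it ran.
    std::size_t ProcessPending(std::size_t nMax = 2)
    {
        std::size_t nRun = 0;
        while (nRun < nMax)
        {
            std::function<void()> fnCall;
            {
                std::lock_guard<std::mutex> lock(m_mtx);
                if (m_calls.empty())
                    break;
                fnCall = std::move(m_calls.front());
                m_calls.pop_front();
            }
            fnCall();  // run outside the lock so the call may post again
            ++nRun;
        }
        return nRun;
    }

private:
    std::mutex m_mtx;
    std::deque<std::function<void()>> m_calls;
};
```

Calls posted from a worker thread only run when the owning thread calls ProcessPending, which is what keeps them on that thread.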
CAsyncResult and CThreadMethodEntry
These two classes are the core of an asynchronous call; they contain all the information relating to the call, such as the call's wait handle, a pointer to the caller, etc. Note that their implementation was modeled to fit CDelegateWorker's style (Initialize, Execute, Terminate).
Interestingly enough, the .NET documentation mentions that clients can cast the IAsyncResult returned by a delegate's BeginInvoke to AsyncResult to access additional resources linked to the asynchronous call. See http://msdn.microsoft.com/en-us/library/system.runtime.remoting.messaging.asyncresult.aspx for more details. In the same manner, you can cast an IAsyncResult to CAsyncResult for delegate calls, which is very useful, especially for retrieving the delegate pointer with GetAsyncDelegate() so that you can call EndInvoke() to retrieve the call's return value.
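Both this downcast and the matching rule for EndInvoke can be illustrated with a small standard C++ sketch. The article's callback uses a plain C-style cast; the dynamic_cast check below is an assumption, one possible way to reject an IAsyncResult that was not produced by the matching BeginInvoke (where .NET throws InvalidOperationException, this sketch simply returns false). IAsyncHandle, DelegateResult, MethodEntry and Delegate are hypothetical names standing in for IAsyncResult, CAsyncResult, CThreadMethodEntry and IDelegate.

```cpp
#include <cassert>

struct IAsyncHandle { virtual ~IAsyncHandle() {} };  // plays IAsyncResult
struct Delegate;                                     // plays IDelegate

// Result of a delegate call (plays CAsyncResult): remembers who started it.
struct DelegateResult : IAsyncHandle
{
    explicit DelegateResult(Delegate *pDelegate) : m_pDelegate(pDelegate) {}
    Delegate *GetAsyncDelegate() const { return m_pDelegate; }
private:
    Delegate *m_pDelegate;
};

// Result of a synchronized call (plays CThreadMethodEntry).
struct MethodEntry : IAsyncHandle {};

struct Delegate
{
    DelegateResult *BeginInvoke() { return new DelegateResult(this); }

    // Accepts only a handle created by this delegate's own BeginInvoke.
    bool EndInvoke(IAsyncHandle **ppHandle)
    {
        DelegateResult *pResult = dynamic_cast<DelegateResult *>(*ppHandle);
        if (pResult == nullptr || pResult->GetAsyncDelegate() != this)
            return false;  // wrong kind of handle, or another delegate's call
        delete pResult;
        *ppHandle = nullptr;
        return true;
    }
};
```

A callback that only receives the generic handle downcasts it to reach GetAsyncDelegate(), then ends the call through the delegate it gets back.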
The only real challenge in creating these two classes was implementing their Terminate() member function. Once an asynchronous call has been executed, it must be terminated; that is, the call must be flagged as completed, its wait handle signaled, its resources deallocated and, in a "fire & forget" scenario, EndInvoke() called on the client's behalf. Also, since CAsyncResult is used for delegate asynchronous calls, it must support a callback function as well.
Below are the implementations of the Terminate() member functions of CAsyncResult and CThreadMethodEntry, respectively:
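```cpp
VOID CAsyncResult::Terminate()
{
    ATLASSERT(m_bIsCompleted == FALSE);
    m_bIsCompleted = TRUE;

    // Signal the wait handle before running the callback, so that calling
    // EndInvoke from inside the callback does not block
    ATLASSERT(m_hAsyncWaitEvent != NULL);
    BOOL bRes = ::SetEvent(m_hAsyncWaitEvent);
    ATLASSERT(bRes);

    // The rest of this function is only partially reproduced in this listing:
    // if m_pCallback != NULL, the callback is invoked; then, for "fire & forget"
    // calls, the request releases its own resources (EndInvoke() / delete this).
}

// Ends the call on the client's behalf (the "fire & forget" case)
VOID CAsyncResult::EndInvoke()
{
    IAsyncResult *pAsyncResult = this;
    ATLASSERT(m_pDelegate != NULL);
    m_pDelegate->EndInvoke(&pAsyncResult);  // frees the call and NULLs the pointer
    ATLASSERT(pAsyncResult == NULL);
}

VOID CThreadMethodEntry::Terminate()
{
    ATLASSERT(m_bIsCompleted == FALSE);
    m_bIsCompleted = TRUE;

    ATLASSERT(m_hAsyncWaitEvent != NULL);
    BOOL bRes = ::SetEvent(m_hAsyncWaitEvent);
    ATLASSERT(bRes);

    if (m_bFireAndForget)
    {
        // Nobody else will call EndInvoke: end the call through the caller
        IAsyncResult *pAsyncResult = this;
        ATLASSERT(m_pCaller != NULL);
        m_pCaller->EndInvoke(&pAsyncResult);
        ATLASSERT(pAsyncResult == NULL);
    }
}
```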
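The ordering that makes EndInvoke safe inside a callback (signal completion first, run the callback second) can be shown with a small portable sketch. It is not the header's code: the record uses a mutex and condition variable instead of a Win32 event, and CallRecord, EndCall, ExecuteAndTerminate and StoreResultCallback are hypothetical names. The calls run synchronously in the example to keep it deterministic; in the real design ExecuteAndTerminate would run on a pool thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

struct CallRecord;
typedef std::uintptr_t (*PFNDELEGATE)(std::uintptr_t);
typedef void (*PFNCALLBACK)(CallRecord *pRecord);

// Hypothetical call record: the call, its result, a "wait handle" and an optional callback.
struct CallRecord
{
    CallRecord(PFNDELEGATE pfnCall, std::uintptr_t ulArg, PFNCALLBACK pfnDone)
        : pfn(pfnCall), ulParam(ulArg), pfnCallback(pfnDone) {}

    PFNDELEGATE pfn;
    std::uintptr_t ulParam;
    PFNCALLBACK pfnCallback;
    std::uintptr_t ulResult = 0;
    bool bCompleted = false;
    std::mutex mtx;
    std::condition_variable cv;
};

// EndInvoke equivalent: waits for completion, frees the record, clears the pointer.
std::uintptr_t EndCall(CallRecord **ppRecord)
{
    CallRecord *pRecord = *ppRecord;
    {
        std::unique_lock<std::mutex> lock(pRecord->mtx);
        pRecord->cv.wait(lock, [pRecord] { return pRecord->bCompleted; });
    }
    std::uintptr_t ulResult = pRecord->ulResult;
    delete pRecord;
    *ppRecord = nullptr;
    return ulResult;
}

// What a pool thread would do with a record: execute, then terminate.
void ExecuteAndTerminate(CallRecord *pRecord)
{
    std::uintptr_t ulResult = pRecord->pfn(pRecord->ulParam);
    PFNCALLBACK pfnCallback = pRecord->pfnCallback;  // copied: the record may be freed below
    {
        std::lock_guard<std::mutex> lock(pRecord->mtx);
        pRecord->ulResult = ulResult;
        pRecord->bCompleted = true;
        pRecord->cv.notify_all();  // 1) signal completion first...
    }
    if (pfnCallback != nullptr)
        pfnCallback(pRecord);      // 2) ...then run the callback, which may call EndCall
    // pRecord must not be touched after this point.
}

// Example callback (hypothetical): ends the call from inside the callback.
std::uintptr_t g_ulCallbackResult = 0;
void StoreResultCallback(CallRecord *pRecord)
{
    g_ulCallbackResult = EndCall(&pRecord);
}
```

If steps 1 and 2 were swapped, EndCall inside StoreResultCallback would wait for a completion flag that is only set after the callback returns, which is the deadlock the article warns about.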
If you have used this pattern with the .NET Framework, you should find this C++ implementation very easy to use. Since the implementation takes care of delegate allocations, you are not required to keep a delegate reference pointer, which brings it even closer to the .NET implementation. Also, thanks to the added support for the "fire & forget" scenario, it is easy to call a function asynchronously just to avoid blocking the current thread. However, it is very important to keep track of all the asynchronous calls made one way or another, especially during application shutdown; otherwise, unexpected behavior may occur.
Asynchronous delegate calls
To call a delegate asynchronously, create a CDelegate function wrapper, call BeginInvoke to start the call and EndInvoke to end it. Typically you would call EndInvoke from within the callback, if one is provided, and you would create all delegates using the provided macros. So, assuming you have the following member functions defined in class CMainDlg:
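```cpp
ULONG_PTR CMainDlg::Test1(ULONG_PTR ulParam)
{ /* ... the work to perform asynchronously ... */ return 0; }
ULONG_PTR CMainDlg::Test1Callback(ULONG_PTR ulParam)
{ /* ... called when Test1 completes (see below) ... */ return 0; }
```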
You would make an asynchronous call like this:

```cpp
MAKECLSDELEGATE(CMainDlg, Test1)->BeginInvoke(0, MAKECLSDELEGATE(CMainDlg, Test1Callback));
```
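In the callback function, you would then retrieve the function's return value:

```cpp
// ulParam is the CAsyncResult of the completed call
CAsyncResult *pAsyncResult = (CAsyncResult *)ulParam;
IDelegate *pDelegate = pAsyncResult->GetAsyncDelegate();
IAsyncResult *pResult = pAsyncResult;  // proper upcast instead of a pointer-to-pointer cast
UINT nRes = (UINT)pDelegate->EndInvoke(&pResult);
ATLASSERT(pResult == NULL);            // all resources of the call are released
```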
For "fire & forget" scenarios, simply pass TRUE for bFireAndForget, and the delegate takes full responsibility for deallocating all resources once the asynchronous call has completed:

```cpp
// The callback passed here must not call EndInvoke: the call's resources are freed automatically
MAKECLSDELEGATE(CMainDlg, Test1)->BeginInvoke(0, MAKECLSDELEGATE(CMainDlg, Test1Callback), NULL, TRUE);
```
Synchronized calls
To use synchronized calls (similar to the WinForms implementation) with WTL, the first step is to derive your window class from CSynchronizeInvoke:

```cpp
class CMainDlg :
    public CDialogImpl<CMainDlg>,
    public CWinDataExchange<CMainDlg>,
    public CMessageFilter,
    public CSynchronizeInvoke
{
    // ...
};
```
Then, once your class has been added as a message filter (CMessageLoop::AddMessageFilter), you need to check for thread callbacks as part of the message pump. This can be done by overriding PreTranslateMessage:

```cpp
virtual BOOL PreTranslateMessage(MSG *pMsg)
{
    if (CSynchronizeInvoke::IsThreadCallbackMessage(pMsg))
        return TRUE;
    return CWindow::IsDialogMessage(pMsg);  // usual WTL dialog processing
}
```
Now all the machinery to execute synchronized calls is in place. To make a synchronized call, simply call BeginInvoke:

```cpp
BeginInvoke(MAKECLSDELEGATE(CMainDlg, Test2), 0);
```
The function Test2() will then be executed on the thread running the dialog's message pump, on a subsequent iteration of the message loop.
As you can see, there are differences between this implementation and the .NET Framework's implementation of this portion of the Microsoft asynchronous design patterns. The goal was not to create an identical implementation, but a similar pattern that can be used from C++.
Many areas of the implementation have not been discussed, such as waiting for an asynchronous call to complete using the wait handle, polling an asynchronous call to determine whether it has finished, etc. Even though these features are very straightforward to use, they are worth mentioning. Please refer to the demo application for more examples of how to use this implementation and its features. In most cases, if you have used asynchronous calls in .NET, you will find this implementation's features very easy to use, since they mostly mimic the .NET implementation.
This implementation could really benefit from error handling and exception support. Routing exceptions raised in worker threads back to the caller is very important; they should then be handled by calling EndInvoke within try/catch blocks, as is done in .NET.
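Standard C++11 already provides this kind of exception routing: std::future stores an exception thrown by the asynchronous call and rethrows it from get() on the thread that retrieves the result. Here is a minimal sketch of an EndInvoke-style retrieval built on it; ParsePositive and TryEnd are hypothetical names, and std::async stands in for the delegate thread pool.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <stdexcept>
#include <string>

// A function that may fail on the worker thread (hypothetical example).
int ParsePositive(const std::string &strText)
{
    int nValue = std::stoi(strText);  // throws std::invalid_argument on bad input
    if (nValue <= 0)
        throw std::out_of_range("value must be positive");
    return nValue;
}

// EndInvoke-style retrieval: get() rethrows, on the calling thread, any exception
// thrown by the asynchronous call, so an ordinary try/catch can handle it.
bool TryEnd(std::future<int> &future, int &nResult)
{
    try
    {
        nResult = future.get();
        return true;
    }
    catch (const std::exception &)
    {
        return false;
    }
}
```

A C++ EndInvoke could follow the same idea by capturing the exception on the pool thread (std::current_exception) and rethrowing it (std::rethrow_exception) when the result is claimed.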
Also, this implementation provides only the basics of working asynchronously. Of course, this raises questions such as "How can I cancel an asynchronous operation?" or "How can I report progress from an asynchronous operation?". Good communication between the calling and receiving threads is very important, and ISynchronizeInvoke helps achieve good results, but its use is still quite low level. Looking forward, one could implement a BackgroundWorker class, which could answer some of these questions about cancelling an asynchronous operation and reporting progress.
There is plenty of information on the web regarding the Asynchronous Design Patterns; here are a few links of interest from my bookmarks:
- 23rd August, 2009: Initial post