Cross thread calls in native C++

einaros


Dec 10, 2006

Apache


An article which discusses the need for synchronization in multi-threaded applications, and features a generic framework for making calls across threads: ThreadSynch.

Download source files + Example projects - 271.4 Kb

Preface

Thanks to those of you who take the time to read through the article. If you feel like dropping off a vote (and particularly if it's a low one), please include a comment which mentions what the problem was. I've been getting mostly high votes for this article, apart from the odd 1's or 2's, and I'd really like to know what bothered those voters the most. Feedback is what drives improvement.

Introduction

This article, together with the attached library and demo projects, describes an approach to cross thread synchronized function calls. A cross thread call is, in essence, the process of having one thread instruct another thread to call a function.

Serializing calls to certain code sections, functions and classes is very much the reality of multi-threaded programming, but there are ways to go about this effort without introducing too much code or complex logic. If you're interested, take the time to read through the following paragraphs. They introduce the concept and describe my way of solving it. If you're short on time -- feel free to skip ahead to the section "Using the code", which gives a quick sum-up of the library, ThreadSynch.

ThreadSynch requires

VC++ 8 (2005). Not tested on previous versions, but unlikely to compile on VC6 due to its standard conformance issues.
The excellent boost libraries (http://www.boost.org). Compiled and tested on version 1.33.1.
Love.

ThreadSynch features

A generic way of doing cross thread calls, easily extended by pickup policies.
C++ Exception transport across threads.
Parameter and return value transport across threads.
A guarantee that cross thread calls won't occur if the scheduling times out -- it's safe to postpone the job.

Background

An introduction to synchronized calls

I assume that you, the reader, are at least vaguely familiar with threads, and all the pitfalls they introduce when common data is being processed. If you're not, feel free to read on, but you may find yourself stuck wondering what all the fuss was about in the first place.

A classical example is the worker thread which fires off a callback function in a GUI class, to render some updated output. There are a bunch of different approaches, let alone patterns (e.g. Observer), to use in this case. I'll completely disregard the patterns, and focus on the actual data and notification. The motivation for doing cross thread calls is twofold: 1. to simplify inter-thread notifications, and 2. to avoid cluttering classes and functions with more synchronization code than is absolutely necessary.

Imagine the worker class Worker, and the GUI class SomeWindow. How they are associated makes little or no difference, what's important is that Worker is supposed to call a function, and/or update data in SomeWindow. The application has two threads. One "resides" in Worker, and the other in SomeWindow. Let's say that at a given point in time, the Worker object decides to make a notification to SomeWindow. How can this be done? I can sum up a few of the possible approaches, including major pros/cons.

1. Worker accesses, and updates, a data member in SomeWindow.
- Pros: It's quick.
- Cons: It's dirty. More specifically, it breaks encapsulation. If this operation is done without some kind of interlocking (mutex / critical section / semaphore / etc.), the worker and window threads may both try to access the data member at once, and that is almost certain to wreak havoc on our application. If we're lucky, it'll just cause an access violation. If SomeWindow exposes an object for interlocking, we break the encapsulation even further, unleashing ghosts such as deadlocks.
2. Worker calls a function within SomeWindow, which updates a data member for us.
- Pros: Given the proper interlocking, it's relatively safe.
- Cons: SomeWindow will be bloated with code for interlocking, in the worst possible case one lock object per updatable piece of member data. The introduction of those very locks also arguably weakens the cohesion. Dealing with the complexities of threads, interlocking and synchronization in a verbose way is simply not ideal in a GUI class.
3. Worker sends a Window Message to SomeWindow, with the update data in a structure. SomeWindow deals with the message and somehow handles the data.
- Pros: Relatively safe, if SendMessage is used.
- Cons: Cohesion slightly weakened. Parameter translation and transport can become tiresome, as custom or generic structures are needed for each unique value lineup. The most prominent drawback of this approach is the link to window messages; it's not really practical for non-GUI scenarios.
4. Worker calls a function within SomeWindow, which updates a data member for us, by use of a synchronized re-call.
- Pros: Safe. Relatively efficient. No bloat worth mentioning.
- Cons: Cohesion slightly weakened. The code foundation is a wee bit more complex than it would be without the threads, but it's by no means incomprehensible, and the end-of-the-line code will be quite pleasant.

A framework is born

Throughout the last few years, I've tried a number of approaches to this field of problems. Usually, I've ended up using a mix of #2 and #3 as listed above. While I've made a few abstractions, and integrated this in a threading library, there was nothing major about it. It wasn't till I had a crack at the .NET framework, and more specifically the InvokeRequired / BeginInvoke techniques, that I started pondering doing the same in a native framework. The .NET framework approach really is appealing from a usage point of view, as it introduces a bare minimum of alien code to, say, the business logic. While many would argue that the ideal approach would be to avoid synchronization altogether, and rely on the operating system to deal with the complexities related to cross thread calls and simultaneous data access, that's not likely to be part of any efficiency focused application anytime soon.

I won't go into the details of my first few synchronization frameworks, but rather focus on the one I typed up specifically for this read. It is, as mentioned, based on the ideas from the .NET framework, but it's not quite the same. Given the differences between native and managed code, as well as the syntactical differences, the mechanics have to be a little different, and so does the use. The motivation of the framework is obviously to simplify cross thread calls, which may or may not access shared resources. It goes to great lengths to be safe, flexible, and reliable in terms of its promises to the user. The flexibility is achieved through the introduction of templated policies for the notifications made across the threads, as well as functors and parameter bindings from Boost. I'll get back to the reliable part in a jiffy.

The base principle is quite simple. Thread A needs to update or process data logically related to Thread B. To do this, A wants to issue a call in context of B. Thread B is of a nature which allows it to sleep or wait for commands from external sources, so that'll be the window in which A can make its move. Thread B would ideally be GUI related, a network server / client, an Observer (as in the Observer Pattern) or similar.

What needs to be done is:

Thread A must call a function to schedule execution in Thread B, with or without parameters.
While the call waits to be executed, Thread A must be suspended. If the call doesn't end within a critical period of time, the control must be given back to Thread A, with a notification that the call failed. If A is notified of a call timeout, the call must be guaranteed not to take place.
Thread B is notified that a call should be executed. We'll call this the PickupPolicy, since B will have to pickup an instruction from A to do some task. This is where the policy comes in.
Thread B will execute the scheduled call, which may or may not return a value, and continue about its business.
Thread A returns the resulting value, and also picks up where it left off.
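The steps above can be sketched with standard C++ facilities alone. This is a minimal, portable illustration and not ThreadSynch's implementation: the names CallQueue, schedule, runPending and crossThreadDemo are hypothetical, std::packaged_task and std::future stand in for the library's own call bookkeeping, and the timeout guarantee from the second step is deliberately left out.

```cpp
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Hypothetical call queue owned by "Thread B". Not part of ThreadSynch.
class CallQueue
{
public:
    // Thread A schedules a call and receives a future for its result.
    std::future<std::string> schedule(std::function<std::string()> fn)
    {
        std::packaged_task<std::string()> task(std::move(fn));
        std::future<std::string> result = task.get_future();
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(task));
        return result;
    }

    // Thread B picks up and executes whatever is pending.
    // Returns the number of calls that were executed.
    int runPending()
    {
        int executed = 0;
        for (;;)
        {
            std::packaged_task<std::string()> task;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (pending_.empty())
                    return executed;
                task = std::move(pending_.front());
                pending_.pop();
            }
            task();   // runs in the thread that called runPending()
            ++executed;
        }
    }

private:
    std::mutex mutex_;
    std::queue<std::packaged_task<std::string()>> pending_;
};

// Thread A (the caller of this function) blocks on the future until
// Thread B has executed the call, then continues with the returned value.
std::string crossThreadDemo()
{
    CallQueue queue;
    std::future<std::string> result =
        queue.schedule([] { return std::string("done in B"); });

    std::thread threadB([&queue] {
        while (queue.runPending() == 0)
            std::this_thread::yield();
    });

    std::string value = result.get();
    threadB.join();
    return value;
}
```

Calling crossThreadDemo() from any thread returns the string produced inside the second thread. In this sketch the polling loop plays the role of the pickup mechanism; a real target thread would be woken by a notification rather than spin.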

The pickup policy, or more specifically the way Thread A delivers the notification to Thread B, can involve a number of different techniques. A couple worth mentioning are UserAPCs (user-mode asynchronous procedure call) and Window Messages. QueueUserAPC() allows one to queue a function for calling in context of a different thread, and relies on the other thread to go into alertable wait for the call to be made. Alertable waits have their share of problems, but I'll disregard those for now. In terms of the GUI type thread, window messages are a better alternative. The pickup policies make up a fairly simple part of this play, but they are nevertheless important in terms of flexibility.
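To illustrate what "policy" means here, the following sketch shows a scheduler template whose notification mechanism is chosen at compile time. All names (ApcLikePolicy, MessageLikePolicy, MiniScheduler, requestPickup) are hypothetical, and the policies only return a description of what they would do; a real policy would call into the operating system.

```cpp
#include <string>

// Hypothetical policy: would queue a user-mode APC to the target thread.
struct ApcLikePolicy
{
    static std::string notify(unsigned threadId)
    {
        return "apc:" + std::to_string(threadId);
    }
};

// Hypothetical policy: would post a window message to the target thread.
struct MessageLikePolicy
{
    static std::string notify(unsigned threadId)
    {
        return "msg:" + std::to_string(threadId);
    }
};

// The scheduler never needs to know how the notification is delivered;
// it only forwards to whichever policy it was instantiated with.
template <class PickupPolicy>
class MiniScheduler
{
public:
    std::string requestPickup(unsigned threadId)
    {
        return PickupPolicy::notify(threadId);
    }
};
```

Swapping the delivery mechanism is then a matter of changing one template argument, for example MiniScheduler<ApcLikePolicy> versus MiniScheduler<MessageLikePolicy>, while the calling code stays the same.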

A synchronization example using cross thread calls

Ok, so we've covered the motivation, as well as some of the requirements. It's time to give an example of how the mechanism can be used. For the sake of utter simplicity, I will not bring classes and objects into the puzzle just yet. Just imagine the following simple console program:

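#include <windows.h>
#include <iostream>
#include <string>
using namespace std;

char globalBuffer[20];

// Event handle used by testThread. It is assumed to be created and
// signaled elsewhere in the program (for example with CreateEvent).
HANDLE hExternalEvent;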

DWORD WINAPI testThread(PVOID)
{
    // Keep sleeping while the event is unset
    while(WaitForSingleObjectEx(hExternalEvent, INFINITE, TRUE) != 
          WAIT_OBJECT_0)
    {
        Sleep(10);
    }

    // Alter the global data
    for(int i = 0; i < sizeof(globalBuffer) - 1; ++i)
    {
        globalBuffer[i] = 'b';
    }
    globalBuffer[sizeof(globalBuffer) - 1] = 0; // null terminate

    // Return and terminate the thread
    return 0;
}

int main()
{
    DWORD dwThreadId;
    CreateThread(NULL, 0, testThread, NULL, 0, &dwThreadId);
    ...

There's nothing out of the ordinary so far. We've got the entry point, main, and a function, testThread. When main is executed, it will create and spawn a new thread on testThread. All testThread does in this example, is to wait for an external event to be signaled, and then alter a data structure, globalBuffer. What's important is that the thread is waiting for something to happen, and while it's waiting we can instruct it to do some other stuff. Our objective is therefore to have the thread call another function, testFunction:

string testFunction(char c)
{
    for(int i = 0; i < sizeof(globalBuffer) - 1; ++i)
    {
        globalBuffer[i] = c;
    }
    globalBuffer[sizeof(globalBuffer) - 1] = 0; // null terminate
    return globalBuffer;
}

testFunction will alter the global buffer, setting all elements except the last to the value of the char parameter c, then null terminate it and finally return a new string with the global buffer's content. What we can tell straight away is that testFunction and testThread may alter the same buffer. If our main thread executed testFunction directly, it could get around to altering the first 10 or so elements of the global buffer before being swapped out of the CPU. If the external event in testThread were to be signaled at this point, that thread would also start altering the buffer. The string returned from testFunction would obviously contain anything but what we expect it to.

While this example doesn't make much sense in terms of a real world application as it is, the concept is very much realistic. Imagine, if you wish, that the global buffer represents the text in an edit box within a dialog, and that testThread is supposed to alter this text based on a timer. At certain intervals, external threads may also wish to update the same edit box with additional information, so they call into the GUI's class (which in this simplistic example is represented by testFunction). To avoid crashes, garbled text in the text box, or other freaky results, we want to synchronize the access. We don't want to add a heap of mutexes or critical sections to our code, but rather just have the GUI thread call the function which updates the text. When the GUI thread alone is in charge of updating its resources, we're guaranteed that all operations go about in a tidy order. In other words: there will be no headache-causing crashes and angry customers.
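The interleaving described above can be replayed deterministically in a single thread, which makes the resulting buffer easy to inspect. This is only a simulation under assumed time slices: demoBuffer, fillSlice and simulateInterleaving are hypothetical names, and the point at which the first writer is interrupted (after 10 elements) is chosen to match the description.

```cpp
#include <cstddef>
#include <string>

char demoBuffer[20];

// Writes c into elements [from, to). Stands in for one time slice of a thread.
void fillSlice(char c, std::size_t from, std::size_t to)
{
    for (std::size_t i = from; i < to; ++i)
        demoBuffer[i] = c;
}

// Replays the unlucky schedule: the 'a' writer is interrupted half way,
// the 'b' writer runs to completion, then the 'a' writer resumes.
std::string simulateInterleaving()
{
    const std::size_t last = sizeof(demoBuffer) - 1;
    fillSlice('a', 0, 10);      // main thread: first part of testFunction('a')
    fillSlice('b', 0, last);    // testThread: event signaled, fills everything
    fillSlice('a', 10, last);   // main thread: resumes where it left off
    demoBuffer[last] = 0;       // null terminate
    return demoBuffer;
}
```

The returned string is ten 'b' characters followed by nine 'a' characters, which is neither what the 'a' writer nor what the 'b' writer intended.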

So, instead of adding a whole lot of interlocking code to both testThread and testFunction, which both update the global buffer, we use a cross thread call library to have the thread which owns the shared data do all the work.

int main()
{
    DWORD dwThreadId;
    CreateThread(NULL, 0, testThread, NULL, 0, &dwThreadId);
  
    CallScheduler<APCPickupPolicy>* scheduler = 
        CallScheduler<APCPickupPolicy>::getInstance();
  
    try
    {
        // Create a boost functor with a bound parameter.
        // The functor returns a string, and so will the
        // synchronized call.
        boost::function<string()> callback = 
            boost::bind(testFunction, 'a');

        // Make the other thread call it. The return value
        // is deduced from the functor.
        string dataString = scheduler->syncCall
            (
                dwThreadId,                     // Target thread
                callback,                       // Functor with parameter
                500                             // Milliseconds to wait
            );

        cout << "testFunction returned: " << 
                dataString << endl;
    }
    catch(CallTimeoutException&)
    {
        // deal with the problem
    }
    catch(CallSchedulingFailedException&)
    {
        // deal with the problem
    }
  
    return 0;
}

CallScheduler makes all the difference here. By fetching a pointer to this singleton class, with the preferred pickup policy (in this case the APCPickupPolicy), we can schedule calls to be made in context of other threads, provided that they are open to whatever mechanism the pickup policy uses. In our current example, we know that the testThread wait is alertable, and that suits the APC policy perfectly. To attempt to execute the call in the other thread, we call the syncCall function, with a few parameters. The return type of the call, in this case a string, is deduced from the functor, so no template argument has to be spelled out here. The first parameter is the id of the thread in which we wish to perform the operation, the second parameter is a boost functor, and the third is the number of milliseconds we are willing to wait for the call to be initiated. The use of boost functors also allows us to bind the parameters up front. As you can see in the above call, testFunction should be called with the char 'a' as its sole parameter.

At this point, we wait. The call will be scheduled, and will hopefully be completed. If the pickup policy does its work, the call will be executed in the other thread, and we are soon to get a string from testFunction as returned by syncCall. Should the pickup fail or time out, an exception will be thrown. Consider the example -- it really should make it all pretty clear. In the attached source, there are two example projects. One consists of the code shown above, and the other is a few inches closer to real world use -- in a GUI / Worker thread application.
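The binding and return type deduction can be tried in isolation. The sketch below uses std::function and std::bind, the standard library successors of the Boost facilities used in this article; repeatChar, invokeLater and bindDemo are hypothetical names, and invokeLater simply calls the functor in the current thread rather than scheduling anything.

```cpp
#include <functional>
#include <string>

// Plain function taking one parameter, comparable to testFunction.
std::string repeatChar(char c)
{
    return std::string(5, c);
}

// Hypothetical stand-in for a scheduler entry point. The return type is
// deduced from the functor, so the caller never has to name it.
template <class Functor>
auto invokeLater(Functor f) -> decltype(f())
{
    return f();
}

std::string bindDemo()
{
    // Binding the only parameter leaves a functor that takes no arguments.
    std::function<std::string()> callback = std::bind(repeatChar, 'a');
    return invokeLater(callback);
}
```

Because every parameter is bound before the functor is handed over, the receiving side only ever needs to know how to call something that takes no arguments.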

Limitations, always with the limitations

There are a few restrictions on the use of a framework such as the one described here. Some are merely points to be wary of, while others are showstoppers.

The parameters passed to a function which will be called from another thread should not use the TLS (Thread Local Storage) specifier. A variable declared TLS (__declspec(thread)) will have one copy per thread it's accessed from. In terms of the previous example, the main thread would not necessarily see the same data as testThread, even with the value passed through the synchronized call mechanism to testFunction. In short: there's nothing stopping you from passing TLS, but by doing so you are bound to see some odd behavior. The general guideline is to be thoughtful. Don't pass anything between threads without knowing exactly what the consequences are. Even though the mechanism, or rather principle, of cross thread synchronized calls goes to great lengths to keep the task simple, there are always ways to stumble.

A couple of guidelines and requirements regarding parameter passing and returning, in an example scenario where Thread A does a synchronized call to Function F through Thread B:

If Function F has to return pointers or references, make them const so they cannot be touched by Thread A. Even when they are const, Thread B can free them, and thus make reads from Thread A crash. Don't use pointers or references unless you are absolutely sure this won't happen. Returned pointers or references belong to Thread B.
If Function F accepts pointers or references as parameters from Thread A, make sure that they aren't referenced by Thread B, neither read nor written, once F returns. Passed pointers or references belong to Thread A.
Class types returned by-value from Function F to Thread A must provide a public copy constructor, either compiler generated or user implemented.
Exceptions thrown from Thread B to Thread A must provide a public copy constructor, and not be corrupted by copies.
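As an illustration of exceptions crossing a thread boundary, here is a minimal sketch built on std::exception_ptr from C++11. This is a different technique from the one the guidelines above imply: ThreadSynch, as described, copies exceptions of the listed types, whereas std::exception_ptr keeps a reference to the original exception object. The function name callAndTransport is hypothetical.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Runs a failing piece of work in a second thread and re-throws its
// exception in the calling thread.
std::string callAndTransport()
{
    std::exception_ptr captured;

    std::thread threadB([&captured] {
        try
        {
            throw std::runtime_error("failed in B");
        }
        catch (...)
        {
            // Store the exception instead of letting it escape the thread.
            captured = std::current_exception();
        }
    });
    threadB.join();   // after join, reading captured is safe

    try
    {
        if (captured)
            std::rethrow_exception(captured);
    }
    catch (const std::runtime_error& e)
    {
        return e.what();
    }
    return "no exception";
}
```

The exception is thrown in one thread and handled in another, and the message survives the trip intact.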

Also, though this really goes without saying, synchronized code blocks in multi-threaded environments are more or less bound to cause somewhat of a traffic jam. In some cases, a redesign may be the solution, while in other cases it's simply unavoidable. If a multi-threaded application is to safely access the same resources, interlocking really cannot be omitted, so it's all up to how you make the best of the situation. A quick metaphor, if you will:

Picture yourself having to transport packages in a bunch of cars, back and forth between A and B. It's a tough task to be the driver of all five cars at once, so you ask (read: trick) some friends to help out. A couple of the cars carry lighter loads than the others, so they are able to drive through the entire stretch a wee bit quicker than the rest. After some time, these racing drivers have actually passed the group, gone a full circle, and caught up with the rest of you again. This is of course perfectly alright, if only it weren't for the stretch of the road with only one lane. The quicker cars cannot pass you, until the road offers multiple lanes once more. Horribly inefficient, obviously, so it's decided that the speeders should come to a stop whenever they are forced to slow down. Once they stop, they can do a bunch of other tasks they wouldn't otherwise find time to do, such as surf articles on the web. When the slower drivers are nearing the end of the single lane road, they call the ones who are waiting -- telling them that the road is clearing up, and for them to speed on by.

Though this example may seem silly in real world terms, it makes perfect sense for a multi-threaded application. If a piece of code cannot be accessed by thread A, because a block has been placed by thread B, A may be better off doing some more calculations or operations before re-attempting the lock. So, even though the aforementioned traffic jams are pretty much unavoidable, there are ways to make the best of them.
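The "do something else while the lane is blocked" idea maps onto a non-blocking lock attempt. The sketch below uses std::mutex::try_lock from the standard library; tryLockDemo and its variables are hypothetical names, and for the sake of a terminating demo the blocked thread itself tells the lock holder to release after the first failed attempt.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Returns how many units of "other work" were done while the lock was busy.
int tryLockDemo()
{
    std::mutex lane;                  // the single-lane stretch of road
    std::atomic<bool> held(false);
    std::atomic<bool> release(false);

    // The slow driver occupies the lane until told to clear it.
    std::thread slow([&] {
        std::lock_guard<std::mutex> lock(lane);
        held = true;
        while (!release)
            std::this_thread::yield();
    });

    // Wait until the lane is known to be occupied.
    while (!held)
        std::this_thread::yield();

    int otherWork = 0;
    while (!lane.try_lock())
    {
        ++otherWork;      // useful work instead of blocking on the lock
        release = true;   // demo only: ask the holder to clear the lane
    }
    lane.unlock();

    slow.join();
    return otherWork;
}
```

The first attempt is guaranteed to fail because the other thread holds the lock at that point, so at least one unit of other work is always done before the lock is finally obtained.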

Using the code

We've just about covered everything there is to say. The attached file does have two example projects, in addition to documented code and generated HTML from doxygen tags. I'll quickly sum up the basics, though.

The library is strictly header-based, so in essence all you have to do is include ThreadSynch.h and the pickup policies of your choosing.

To obtain an instance of the scheduler, using the included APCPickupPolicy

The APCPickupPolicy will notify the target thread through use of UserAPCs, provided that the target enters an alertable wait before the specified call timeout runs out.

CallScheduler<APCPickupPolicy>* scheduler = 
    CallScheduler<APCPickupPolicy>::getInstance();

There's an example of this in the attached ThreadSynchTest project.

To obtain an instance of the scheduler, using the included WMPickupPolicy

The WMPickupPolicy will notify the target thread through a user defined window message.

typedef WMPickupPolicy<WM_USER + 1> WMPickup;
CallScheduler<WMPickup>* scheduler = 
    CallScheduler<WMPickup>::getInstance();

The template parameter passed to WMPickupPolicy indicates which message to post to the thread. The receiving thread should deal with this message code in its message loop, such as:

while(GetMessage(&msg, NULL, 0, 0)) 
{ 
    if(msg.message == WMPickup::WM_PICKUP)
    {
        WMPickup::executeCallback(msg.wParam, msg.lParam);
        continue;
    }

    TranslateMessage(&msg);
    DispatchMessage(&msg);
}

There's an example of this in the attached ThreadSynchWM project.

To cross-call a function through another thread

scheduler->syncCall(dwThreadId, function, timeoutInMS);

To cross-call a class member function which returns an int, with a const string reference for parameter

// Class in which our target function resides
MyClass classInstance;

// String parameter
string myString = "hello world";

// Init functor
// Every parameter is bound below, so the functor itself takes no arguments
boost::function<int()> myFunctor = 
    boost::bind(&MyClass::someFunction,     // Function
                &classInstance,             // Instance
                boost::ref(myString));      // Parameter
    
// Make the call
// The return value template specification can be omitted
// in this case, as it's also deduced from the boost functor.
// I've included it here to show how it can be specified,
// and how it must be specified if mere function pointers
// are used in place of the functors.
int x = scheduler->syncCall
    <
        int            // Return type.
    >
    (
        dwThreadId,    // Target thread
        myFunctor,     // Functor to call from target thread
        timeoutInMS    // Number of milliseconds to wait 
                       // for the call to begin
    );

To cross-call a class member function which returns a string, with a const string reference for parameter, and expect a few exceptions might be thrown

// Class in which our target function resides
MyClass classInstance;

// String parameter

string myString = "hello world";

// String for the return value
string myReturnedString;

try
{
    // Init functor
    // Every parameter is bound below, so the functor itself takes no arguments
    boost::function<string()> myFunctor = 
        boost::bind(&MyClass::someFunction,    // Function
                    &classInstance,            // Instance
                    boost::ref(myString));     // Parameter
    
    // Make the call
    myReturnedString = scheduler->syncCall
        <
            string,        // Return type
            ExceptionTypes<std::exception, MyException>
        >
        (
            dwThreadId,    // Target thread
            myFunctor,     // Functor to call from target thread
            timeoutInMS    // Number of milliseconds to 
                           // wait for the call to begin
        );
}
catch(CallTimeoutException&)
{
    // The call timed out, do some other stuff and try again
}
catch(CallSchedulingFailedException&)
{
    // The call scheduling failed, 
    // probably caused by the pickup policy not doing its job
}
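catch(MyException& e)
{
    // Deal with e. Listed before std::exception so that this handler
    // stays reachable if MyException derives from std::exception.
}
catch(std::exception& e)
{
    // Deal with e
}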
catch(UnexpectedException&)
{
    // We didn't expect this one. 
    // It's time to read someFunction's docs.
}

You obviously won't have to catch all these exceptions all the time, but if you feel like it, you may. It's up to you, really. Whether or not you want an exception-safe application, that is.

In most cases, you will want to have a function re-call itself in context of the function's "owner thread", rather than call specific functions such as shown above. The ThreadSynchTestWM example attached shows how to do this in the updateText function.
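The re-call pattern can be sketched without the library. In the example below, a function checks whether it is running in its owner thread and, if not, schedules a call to itself and waits for the owner to run it. TextOwner, pumpOne and recallDemo are hypothetical names, std::packaged_task stands in for the scheduler, and the owner thread polls its queue where a real GUI thread would react to a notification.

```cpp
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class TextOwner
{
public:
    // The constructing thread becomes the owner of text_.
    TextOwner() : owner_(std::this_thread::get_id()) {}

    // Safe to call from any thread; text_ is only ever touched by the owner.
    void updateText(const std::string& text)
    {
        if (std::this_thread::get_id() != owner_)
        {
            // Wrong thread: schedule a re-call of this function and wait.
            std::packaged_task<void()> task([this, text] { updateText(text); });
            std::future<void> done = task.get_future();
            {
                std::lock_guard<std::mutex> lock(mutex_);
                pending_.push(std::move(task));
            }
            done.get();
            return;
        }
        text_ = text;   // owner thread: no lock needed for text_
        updatedBy_ = std::this_thread::get_id();
    }

    // Called by the owner thread. Returns true if a call was executed.
    bool pumpOne()
    {
        std::packaged_task<void()> task;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (pending_.empty())
                return false;
            task = std::move(pending_.front());
            pending_.pop();
        }
        task();
        return true;
    }

    const std::string& text() const { return text_; }
    bool updatedByOwner() const { return updatedBy_ == owner_; }

private:
    std::thread::id owner_;
    std::thread::id updatedBy_;
    std::string text_;
    std::mutex mutex_;
    std::queue<std::packaged_task<void()>> pending_;
};

bool recallDemo()
{
    TextOwner window;   // the calling thread owns the text
    std::thread worker([&window] { window.updateText("from worker"); });
    while (!window.pumpOne())
        std::this_thread::yield();
    worker.join();
    return window.updatedByOwner() && window.text() == "from worker";
}
```

The worker calls updateText exactly as the owner would, and the threading concern stays inside the function instead of leaking into every caller.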

Inner workings

The ThreadSynch library heavily uses templates and preprocessor macros (through e.g. boost's MPL). If you wish to understand exactly how (and why) the library works, you should read through the source code. That being said, I will cover some of the basics here.

There are two main players in each cross thread call, the "client" Thread A and the "target" Thread B. Right before a cross thread call, Thread B is in an unknown state. It's up to the PickupPolicy to either forcefully change that state, or gracefully take care of the scheduled calls when Thread B becomes available (such as when it enters an alertable wait state).

Thread A

Thread A will call CallScheduler::syncCall with a set of template parameters, as well as a target thread, functor and timeout. To get a quick idea of what happens next, consider this activity diagram.

The CallScheduler::syncCall function will essentially allocate a CallHandler instance, which is the structure that takes care of the actual call, once the other thread has picked up on the notification made by the PickupPolicy. CallHandler includes wrapper classes for exception- and return value capturing within Thread B. syncCall's newly created CallHandler instance is enqueued to the specific target thread's call queue, which can be seen in this activity diagram.

When the call has been enqueued, and the PickupPolicy has been notified, syncCall will wait for an event or timeout to occur. Regardless of which happens first, syncCall will follow up by locking the CallHandler. If the scheduled call had already begun executing (but not completed) when the timeout passed, this lock will wait for the call's completion. Upon getting the lock, the state of the scheduled call will be checked. In case of completion, the result will be passed back to syncCall's caller -- that is, either an exception is re-thrown or a return value is returned. If, however, the call had neither completed nor begun when the CallHandler lock was obtained, the call will be removed from the target thread's queue. This guarantees that return values, exceptions and parameters aren't lost. The status returned by syncCall will be the accurate status of what's gone down in Thread B.
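The guarantee that a timed-out call never runs can be sketched with one lock and one state flag per call. This is an assumption-laden simplification and not the library's code: MiniCallHandler, executeIfPending and waitOrCancel are hypothetical names, the call is marked as cancelled instead of being removed from a queue, and the result type is fixed to int.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>

enum class CallState { Pending, Done, Cancelled };

// Hypothetical per-call bookkeeping shared by caller and target.
struct MiniCallHandler
{
    std::mutex lock;
    std::condition_variable finished;
    CallState state = CallState::Pending;
    std::function<int()> fn;
    int result = 0;
};

// Target side: run the call unless the caller has already cancelled it.
// The lock is held for the duration of the call.
bool executeIfPending(MiniCallHandler& h)
{
    std::lock_guard<std::mutex> guard(h.lock);
    if (h.state != CallState::Pending)
        return false;
    h.result = h.fn();
    h.state = CallState::Done;
    h.finished.notify_all();
    return true;
}

// Caller side: wait for completion or timeout, then settle the outcome
// while holding the handler lock. Returns true if a result was produced.
bool waitOrCancel(MiniCallHandler& h, std::chrono::milliseconds timeout, int& out)
{
    std::unique_lock<std::mutex> guard(h.lock);
    h.finished.wait_for(guard, timeout,
                        [&h] { return h.state == CallState::Done; });
    if (h.state == CallState::Done)
    {
        out = h.result;
        return true;
    }
    // Not started: from here on the target will refuse to run the call.
    h.state = CallState::Cancelled;
    return false;
}
```

Because the target holds the lock while the call executes, a caller whose timeout expires mid-call cannot re-acquire the lock until the call has finished, and therefore sees the completed state rather than cancelling a call that is already running.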

Thread B

To rewind a bit, Thread B is going about its business as usual. Then, at some arbitrary (though policy defined) point in time, the PickupPolicy steps in and makes the thread call a function within CallScheduler. That function, executeScheduledCalls, will fetch and execute each and every CallHandler callback scheduled for the current thread, in a first-in-first-out order. See this activity diagram for CallScheduler::executeScheduledCalls.

The scheduled calls will be fetched through the function getNextCall, until no more are found. See this activity diagram for CallScheduler::getNextCall. The key part to this function is the locking of the CallHandler. As opposed to all other lock types used in the library, this one will return immediately if the CallHandler is already locked. The only reason for the lock to be found in place at this point, is that the call has timed out, and syncCall is about to delete it. This de-queue and delete will take place as soon as Thread A obtains a lock on both the CallHandler and the thread queue, which it will when getNextCall returns (and thus releases its scoped lock).

For each executed CallHandler, there are two layers. One utility class takes care of exception trapping (ExceptionExpecter), and another takes care of return value capturing (FunctorRetvalBinder). The results of both these layers will be placed in the CallHandler and processed by Thread A when Thread B completes the call and drops its lock. I won't go into the details of either of these layers, as they are documented in the attached code.

Further studies

I strongly suggest that you take the time to read through the source code, if you are to use this library. It shouldn't be too hard to pick up on the flow of things, given the information in this article. If you find any specific section confusing, please do post a comment here. That also goes for this article -- any suggestions are welcome.

History

December 11th, 2006 (Article): Uploaded article, and library version 0.7.0.
December 12th, 2006 (Article): A few clarifications and extensions.
December 13th, 2006 (Bugfix):
- Added missing return value destructor to FunctorRetvalBinder.
- Library version 0.7.1.
December 15th, 2006 (Article): Fixed formatting for lower resolutions.
March 13th, 2007 (Article & Bugfix):
- Updated an example in the article.
- Fixed minor problems with the syncCall template signature.
- Added a unit test project.
- Library version 0.7.2.
July 15th, 2007 (Bugfix):
- Fixed scoped_lock bug in CallScheduler, which could have caused undefined behavior upon a call timeout.
- Fixed CallHandler destructor, which failed to cleanup exception expecter instance (the effect was a minor memory leak).
- Fixed scoped_try_lock loop to correctly iterate already locked CallHandlers in CallScheduler::getNextCallFromQueue.
- Library version 0.7.3.