Multithreading Tutorial

28 Dec 2006 · CPOL · 20 min read
This article demonstrates how to write a multithreaded Windows program in C++ using only the Win32 API.

Background

When you run two programs on an Operating System that offers memory protection, as Windows and UNIX/Linux do, the two programs are executed as separate processes, which means they are given separate address spaces. This means that when program #1 modifies the address 0x800A1234 in its memory space, program #2 does not see any change in the contents of its memory at address 0x800A1234. With simpler Operating Systems that cannot accomplish this separation of processes, a faulty program can bring down not only itself but other programs running on that computer (including the Operating System itself).

The ability to execute more than one process at a time is known as multi-processing. A process consists of a program (usually called the application) whose statements are executed in an independent memory area. There is a program counter that remembers which statement should be executed next, a stack that holds the arguments passed to functions as well as the variables local to functions, and a heap that holds the program's remaining memory requirements. The heap is used for memory allocations that must persist longer than the lifetime of a single function. In the C language, you acquire memory from the heap with malloc, and in C++, you use the new keyword.

Sometimes, it is useful to arrange for two or more processes to work together to accomplish one goal. One situation where this is beneficial is where the computer's hardware offers multiple processors. In the old days this meant two sockets on the motherboard, each populated with an expensive Xeon chip. Thanks to advances in VLSI integration, these two processor chips can now fit in a single package. Examples are Intel's "Core Duo" and AMD's "Athlon 64 X2". If you want to keep two microprocessors busy working on a single goal, you basically have two choices:

  1. design your program to use multiple processes (which usually means multiple programs), or
  2. design your program to use multiple threads.

So, what's a thread? A thread is another mechanism for splitting the workload into separate execution streams. A thread is lighter weight than a process: it offers less flexibility than a full-blown process, but can be initiated faster because there is less for the Operating System to set up. What's missing? The separate address space. When a program consists of two or more threads, all the threads share a single memory space. If one thread modifies the contents of the address 0x800A1234, then all the other threads immediately see a change in the contents of their address 0x800A1234. Furthermore, all the threads share a single heap. If one thread allocates (via malloc or new) all of the memory available in the heap, then attempts at additional allocations by the other threads will fail.

But each thread is given its own stack. This means thread #1 can be calling FunctionWhichComputesALot() at the same time that thread #2 is calling FunctionWhichDrawsOnTheScreen(). Both of these functions were written in the same program. There is only one program. But, there are independent threads of execution running through that program.

What's the advantage? Well, if your computer's hardware offers two processors, then two threads can run simultaneously. And even on a uni-processor, multi-threading can offer an advantage. Most programs can't perform very many statements before they need to access the hard disk. This is a very slow operation, and hence the Operating System puts the program to sleep during the wait. In fact, the Operating System assigns the computer's hardware resources to somebody else's program during the wait. But, if you have written a multi-threaded program, then when one of your threads stalls, your other threads can continue.

The Jaeschke Magazine Articles

One good way to learn any new programming concept is to study other people's code. You can find source code in magazine articles, and posted on the Internet at sites such as CodeProject. I came across some good examples of multi-threaded programs in two articles written for the C/C++ Users Journal by Rex Jaeschke. In the October 2005 issue, Jaeschke wrote an article entitled "C++/CLI Threading: Part 1", and in the November 2005 issue, he wrote his follow-up article entitled "C++/CLI Threading: Part 2". Unfortunately, the C/C++ Users Journal magazine folded shortly after these articles appeared, but the original articles and Jaeschke's source code are still available online.

You'll notice that the content from the defunct C/C++ Users Journal has been integrated into the Dr. Dobb's Portal website, which is associated with Dr. Dobb's Journal, another excellent programming magazine.

You might not be familiar with the notation C++/CLI. This stands for "C++ Common Language Infrastructure" and is a Microsoft invention. You're probably familiar with Java and C#, two languages that offer managed code, where the runtime environment rather than the programmer is responsible for deallocating all memory allocations made from the heap. C++/CLI is Microsoft's proposal to add managed code to the C++ language.

I am not a fan of this approach, so I wasn't very interested in Jaeschke's original source code. I am sure Java and C# are going to hang around, but C++/CLI attempts to add so many new notations (and concepts) on top of C++, which is already a very complicated language, that I think this language will disappear.

But, I still read the original C/C++ Users Journal article and thought Jaeschke had selected good examples of multi-threading. I especially liked how his example programs were short and yet displayed data corruption when run without the synchronization methods that are required for successful communication between threads. So, I sat down and rewrote his programs in standard C++. This is what I am sharing with you now. The source code I present could also be written in standard C. In fact, that's easier than accomplishing it in C++ for a reason we will get to in just a minute.

This is probably the right time to read Jaeschke's original articles, since I don't plan to repeat his great explanations of multitasking, reentrancy, atomicity, etc. For example, I don't plan to explain how a program is given its first thread automatically and all additional threads must be created by explicit actions by the program (oops). The URLs where you can find Jaeschke's two articles are given above.

Creating Threads Under Windows

It is unfortunate that the C++ language didn't standardize the method for creating threads. Therefore, various compiler vendors invented their own solutions. If you are writing a program to run under Windows, then you will want to use the Win32 API to create your threads. This is what I will demonstrate. The Win32 API offers the following function to create a new thread:

C++
uintptr_t _beginthread( 
   void( __cdecl *start_address )( void * ),
   unsigned stack_size,
   void *arglist 
);

This function signature might look intimidating, but using it is easy. The _beginthread() function takes three passed parameters. The first is the name of the function which you want the new thread to begin executing. This is called the thread's entry-point-function. You get to write this function, and the only requirements are that it take a single passed parameter (of type void*) and that it returns nothing. That is what is meant by the function signature:

void( __cdecl *start_address )( void * ),

The second passed parameter to the _beginthread() function is a requested stack size for the new thread (remember, each thread gets its own stack). However, I always set this parameter to 0, which forces the Windows Operating System to select the stack size for me, and I haven't had any problems with this approach. The final passed parameter to the _beginthread() function is the single parameter you want passed to the entry-point-function. This will be made clear by the following example program:

#include <stdio.h>
#include <windows.h>
#include <process.h>     // needed for _beginthread()

void  silly( void * );   // function prototype

int main()
{
    // Our program's first thread starts in the main() function.

    printf( "Now in the main() function.\n" );

    // Let's now create our second thread and ask it to start
    // in the silly() function.


    _beginthread( silly, 0, (void*)12 );

    // From here on there are two separate threads executing
    // our one program.

    // This main thread can call the silly() function if it wants to.

    silly( (void*)-5 );
    Sleep( 100 );
}

void  silly( void *arg )
{
    printf( "The silly() function was passed %d\n", (int)(INT_PTR)arg );
}

Go ahead and compile this program. Simply request a Win32 Console Program from Visual C++ .NET 2003's New Project Wizard, and then "Add a New Item" which is a C++ source file (.CPP file) in which you place the statements I have shown.

I am providing Visual C++ .NET 2003 workspaces for Jaeschke's (modified) programs, but you need to know the key to starting a multi-threaded program from scratch: you must remember to perform one modification to the default project properties that the New Project Wizard gives you:

  1. Open the Project Properties dialog (select "Project" from the main Visual C++ menu and then select "Properties").
  2. In the left hand column of this dialog, you will see a tree view control named "Configuration Properties", with the main sub-nodes labeled "C/C++", "Linker", etc. Double-click on the "C/C++" node to open this entry up, then click on "Code Generation".
  3. In the right hand area of the Project Properties dialog, you will now see listed "Runtime Library". This defaults to "Single Threaded Debug (/MLd)". [The notation /MLd indicates that this choice can also be accomplished from the compiler command line using the /MLd switch.]
  4. Click on this entry to observe a drop-down list control, and select "Multi-threaded Debug (/MTd)".

If you forget to do this, your program won't compile, and the error message will complain about the _beginthread() identifier.

A very interesting thing happens if you comment out the call to the Sleep() function seen in this example program. Without the Sleep() statement, the program's output will probably only show a single call to the silly() function, with the passed argument -5. This is because the program's process terminates as soon as the main thread reaches the end of the main() function, and this may occur before the Operating System has had the opportunity to create the other thread for this process. This is one place where native Win32 behavior differs from the C++/CLI behavior Jaeschke describes. Evidently, in C++/CLI, each thread has an independent lifetime, and the overall process (which is the container for all the threads) persists until the last thread has decided to die. Not so for straight C++ Win32 programs: the process dies when the primary thread (the one that started in the main() function) dies. The death of this thread means the death of all the other threads.

Using a C++ Member Function as the Thread's Entry-Point-Function

The example program I just listed really isn't a C++ program because it doesn't use any classes. It is just a C language program. The Win32 API was really designed for the C language, and when you employ it with C++ programs, you sometimes run into difficulties. Such as this difficulty: "How can I employ a class member function (a.k.a. an instance function) as the thread's entry-point-function?"

If you are rusty on your C++, let me remind you of the problem. Every C++ member function has a hidden first passed parameter known as the this parameter. Via the this parameter, the function knows which instance of the class to operate upon. Because you never see these this parameters, it is easy to forget they exist.

Now, let's again consider the _beginthread() function, which allows us to specify an arbitrary entry-point-function for our new thread. This entry-point-function must accept a single void* passed parameter. Aye, there's the rub. The function signature required by _beginthread() does not allow the hidden this parameter, and hence a C++ member function cannot be directly activated by _beginthread().

We would be in a bind were it not for the fact that C and C++ are incredibly expressive languages (famously allowing you the freedom to shoot yourself in the foot) and the additional fact that _beginthread() does allow us to specify an arbitrary passed parameter to the entry-point-function. So, we use a two-step procedure to accomplish our goal: we ask _beginthread() to employ a static class member function (which, unlike an instance function, lacks the hidden this parameter), and we send this static class function the this pointer as a void*. The static class function converts the void* parameter back into a pointer to the class instance. Voila! We now know which instance of the class should call the real entry-point-function, and this call completes the two-step process. The relevant code (from Jaeschke's modified Part 1 Listing 1 program) is shown below:

class ThreadX
{
public:

  // In C++ you must employ a free (C) function or a static
  // class member function as the thread entry-point-function.

  static unsigned __stdcall ThreadStaticEntryPoint(void * pThis)
  {
      ThreadX * pthX = (ThreadX*)pThis;   // the tricky cast

      pthX->ThreadEntryPoint();    // now call the true entry-point-function

      // A thread terminates automatically if it completes execution,
      // or it can terminate itself with a call to _endthread().

      return 1;          // the thread exit code
  }

  void ThreadEntryPoint()
  {
     // This is the desired entry-point-function but to get
     // here we have to use a 2 step procedure involving
     // the ThreadStaticEntryPoint() function.

  }
};

Then, in the main() function, we get the two step process started as shown below:

hth1 = (HANDLE)_beginthreadex( NULL, // security
                      0,             // stack size
                      ThreadX::ThreadStaticEntryPoint,// entry-point-function
                      o1,           // arg list holding the "this" pointer
                      CREATE_SUSPENDED, // so we can later call ResumeThread()
                      &uiThread1ID );

Notice that I am using _beginthreadex() rather than _beginthread() to create my thread. The "ex" stands for "extended", which means this version offers additional capability not available with _beginthread(). This is typical of Microsoft's Win32 API: when shortcomings were identified, more powerful augmented techniques were introduced. One of these new extended capabilities is that the _beginthreadex() function allows me to create but not actually start my thread. I elect this choice merely so that my program better matches Jaeschke's C++/CLI code. Furthermore, _beginthreadex() allows the entry-point-function to return an unsigned value, and this is handy for reporting status back to the thread creator. The thread's creator can access this status by calling GetExitCodeThread(). This is all demonstrated in the "Part 1 Listing 1" program I provide (the name comes from Jaeschke's magazine article).
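As a sketch of the creator's side of that status report (reusing the hth1 handle from the call above; the ordering is the important part), the code might look like this:

```cpp
DWORD exitCode = 0;

// Wait until the thread has actually finished; calling GetExitCodeThread()
// on a live thread reports STILL_ACTIVE (259) rather than a real exit code.
WaitForSingleObject( hth1, INFINITE );

GetExitCodeThread( hth1, &exitCode );   // exitCode now holds the value
                                        // returned by the entry-point-function

CloseHandle( hth1 );                    // handles from _beginthreadex() must be
                                        // closed explicitly when you are done
```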

At the end of the main() function, you will see some statements which have no counterpart in Jaeschke's original program. This is because in C++/CLI, the process continues until the last thread exits. That is, the threads have independent lifetimes. Hence, Jaeschke's original code was designed to show that the primary thread could exit and not influence the other threads. However, in C++, the process terminates when the primary thread exits, and when the process terminates, all its threads are then terminated. We force the primary thread (the thread that starts in the main() function) to wait upon the other two threads, via the following statements:

WaitForSingleObject( hth1, INFINITE );
WaitForSingleObject( hth2, INFINITE );

If you comment out these waits, the non-primary threads will probably never get a chance to run, because the process will die when the primary thread reaches the end of the main() function.

Synchronization Between Threads

In the Part 1 Listing 1 program, the multiple threads don't interact with one another, and hence they cannot corrupt each other's data. The point of the Part 1 Listing 2 program is to demonstrate how this corruption comes about. This type of corruption is very difficult to debug, which makes multi-threaded programs very time consuming to develop if you don't design them correctly. The key is to provide synchronization whenever shared data is accessed (either written or read).

A synchronization object is an object whose handle can be specified in one of the Win32 wait functions such as WaitForSingleObject(). The synchronization objects provided by Win32 are:

  • event
  • mutex or critical section
  • semaphore
  • waitable timer

An event notifies one or more waiting threads that an event has occurred.

A mutex can be owned by only one thread at a time, enabling threads to coordinate mutually exclusive access to a shared resource; indeed, the name mutex is a contraction of "mutual exclusion". The state of a mutex object is set to signaled when it is not owned by any thread, and to nonsignaled when it is owned by a thread.

Critical section objects provide synchronization similar to that provided by mutex objects, except that critical section objects can be used only by the threads of a single process (hence they are lighter weight than a mutex). Like a mutex object, a critical section object can be owned by only one thread at a time, which makes it useful for protecting a shared resource from simultaneous access. There is no guarantee about the order in which threads will obtain ownership of the critical section; however, the Operating System will be fair to all threads. Another difference between a mutex and a critical section is that if the critical section object is currently owned by another thread, EnterCriticalSection() waits indefinitely for ownership whereas WaitForSingleObject(), which is used with a mutex, allows you to specify a timeout.
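To make that comparison concrete, here is a minimal sketch of the critical section calls as they are typically used (the function and variable names are mine, not taken from the example programs):

```cpp
#include <windows.h>

CRITICAL_SECTION cs;               // one guard object shared by all threads

void Setup()                       // call once, before any thread uses the guard
{
    InitializeCriticalSection( &cs );
}

void TouchSharedData()             // called concurrently from several threads
{
    EnterCriticalSection( &cs );   // blocks (indefinitely) until we own cs
    // ... read or write the shared data here ...
    LeaveCriticalSection( &cs );   // release ownership so another thread may enter
}

void Teardown()                    // call once, after all the threads are finished
{
    DeleteCriticalSection( &cs );
}
```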

A semaphore maintains a count between zero and some maximum value, limiting the number of threads that are simultaneously accessing a shared resource.

A waitable timer notifies one or more waiting threads that a specified time has arrived.

This Part 1 Listing 2 program demonstrates the Critical Section synchronization object. Take a look at the source code now. Note that in the main() function, we create two threads and ask them both to employ the same entry-point-function, namely the function called StartUp(). However, because the two object instances (o1 and o2) have different values for the mover class data member, the two threads act completely differently from each other. Because in one case isMover = true and in the other case isMover = false, one of the threads continually changes the Point object's x and y values while the other thread merely displays these values. But, this is enough interaction that the program will display a bug if used without synchronization.

Compile and run the program as I provide it to see the problem. Occasionally, the printout will show a discrepancy between the x and y values: the x value will be 1 larger than the y value. This happens because the thread that displays the values interrupted the thread that updates them between the moment when the x value was incremented and the moment when the y value was incremented.

Now, go to the top of the Main.cpp file and find the following statement:

//#define WITH_SYNCHRONIZATION

Uncomment this statement (that is, remove the double slashes). Then, re-compile and re-run the program. It now works perfectly. This one change activates all of the critical section statements in the program. I could have just as well used a mutex or a semaphore, but the critical section is the most light-weight (hence fastest) synchronization object offered by Windows.

The Producer/Consumer Paradigm

One of the most common uses for a multi-threaded architecture is the familiar producer/consumer situation where there is one activity to create packets of stuff and another activity to receive and process those packets. The next example program comes from Jaeschke's Part 2 Listing 1 program. An instance of the CreateMessages class acts as the producer, and an instance of the ProcessMessages class acts as the consumer. The producer creates exactly five messages and then commits suicide. The consumer is designed to live indefinitely, until commanded to die. The primary thread waits for the producer thread to die, and then commands the consumer thread to die.

The program has a single instance of the MessageBuffer class, and this one instance is shared by both the producer and the consumer threads. Via synchronization statements, this program guarantees that the consumer thread can't process the contents of the message buffer until the producer thread has put something there, and that the producer thread can't put another message there until the previous one has been consumed.

Since my Part 1 Listing 2 program demonstrates a critical section, I elected to employ a mutex in this Part 2 Listing 1 program. As with the Part 1 Listing 2 example program, if you simply compile and run the Part 2 Listing 1 program as I provide it, you will see that it has a bug. Whereas the producer creates the five following messages:

1111111111
2222222222
3333333333
4444444444
5555555555

the consumer receives the five following messages:

1
2111111111
3222222222
4333333333
5444444444

There is clearly a synchronization problem: the consumer is getting access to the message buffer as soon as the producer has updated the first character of the new message. But the rest of the message buffer has not yet been updated.

Now, go to the top of the Main.cpp file and find the following statement:

//#define WITH_SYNCHRONIZATION

Uncomment this statement (that is, remove the double slashes). Then, re-compile and re-run the program. It now works perfectly.

Between the English explanation in Jaeschke's original magazine article and all the comments I have put in my C++ source code, you should be able to follow the flow. The final comment I will make is that the GetExitCodeThread() function returns the special value 259 when the thread is still alive (and hence hasn't really exited). You can find the definition for this value in the WinBase header file:

#define STILL_ACTIVE   STATUS_PENDING

where you can find STATUS_PENDING defined in the WinNT.h header file:

#define STATUS_PENDING    ((DWORD   )0x00000103L)

Note that 0x00000103 = 259.

Thread Local Storage

Jaeschke's Part 2 Listing 3 program demonstrates thread local storage. Thread local storage is memory that is accessible only to a single thread. At the start of this article, I said that an Operating System could initiate a new thread faster than it could initiate a new process because all threads share the same memory space (including the heap) and hence there is less that the Operating System needs to set up when creating a new thread. But, here is the exception to that rule. When you request thread local storage, you are asking for certain variables to be instantiated once per thread, so that each thread reads and writes its own private copy.

The declaration which requests thread local storage for a variable is __declspec(thread). Note that this is a Microsoft compiler extension, not part of standard C++.

As with my other example programs, this one will display an obvious synchronization problem if you compile and run it unchanged. After you have seen the problem, go to the top of the Main.cpp file and find the following statement:

//#define WITH_SYNCHRONIZATION

Uncomment this statement (that is, remove the double slashes). Then, re-compile and re-run the program. It now works perfectly.

Atomicity

Jaeschke's Part 2 Listing 4 program demonstrates the problem of atomicity, which is the situation where an operation will fail if it is interrupted mid-way through. This usage of the word "atomic" relates back to the time when an atom was believed to be the smallest particle of matter and hence something that couldn't be further split. An individual assembly language instruction is naturally atomic: it cannot be interrupted half-way through. This is not true of high-level C or C++ statements, which typically compile to several instructions. Whereas you might consider an update to a 64 bit variable to be an atomic operation, it actually isn't on 32 bit hardware. Microsoft's Win32 API offers the InterlockedIncrement() function as the solution for this type of atomicity problem.

This example program could be rewritten to employ 64 bit integers (the LONGLONG data type) and the InterlockedIncrement64() function if it only needed to run under a Windows 2003 Server. But, alas, Windows XP does not support InterlockedIncrement64(). Hence, I was originally worried that I wouldn't be able to demonstrate an atomicity bug in a Windows XP program that dealt only with 32 bit integers. But, curiously, such a bug can be demonstrated as long as we employ the Debug mode settings in the Visual C++ .NET 2003 compiler rather than the Release mode settings. Therefore, you will notice that unlike the other example programs inside the .ZIP file that I distribute, this one is set for a Debug configuration.

As with my other example programs, this one will display an obvious synchronization problem if you compile and run it unchanged. After you have seen the problem, go to the top of the Main.cpp file and find the following statement:

static bool interlocked = false;    // change this to fix the problem

Change false to true, and then re-compile and re-run the program. It now works perfectly because it is now employing InterlockedIncrement().

The Example Programs

In order that other C++ programmers can experiment with these multithreaded examples, I make available a .ZIP file holding five Visual C++ .NET 2003 workspaces for the Part 1 Listing 1, Part 1 Listing 2, Part 2 Listing 1, Part 2 Listing 3, and Part 2 Listing 4 programs from Jaeschke's original article (now translated to C++). Enjoy!

Conclusion

This is my second submission to CodeProject. The first demonstrated how to use Direct3D 8 to model the Munsell color solid so that you could then fly through this color cube as in a video game. I also have a website where I offer a complete introduction to programming, including assembly language programming. My home page is www.computersciencelab.com.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


