The Practical Guide to Multithreading - Part 2

Ajay Vijayvargiya

4.96/5 (112 votes)

Apr 4, 2010

CPOL

43 min read

156160

4988

More of practical situations to use multithreading!

Prolonging...

This article continues the Practical Guide to Multithreading. Please read the first part where I explain the basics of multithreading, synchronous/asynchronous calls, event primitives, and locking/unlocking etc.

For readers who hit this page for the first time, without reading or knowing about the first part, please note that I am not explaining how to use multithreading, nor the syntax, and not how multithreading works. You can read them somewhere else. My primary emphasis is on when, how, and why to use multithreading.

Like the first part, I will be giving practical examples for multithreading. Thus, the article may not follow the usual sequence of exploring the different programming primitives used for multithreading (events, monitors, mutexes, data passing between threads, locks, interlocked operations etc.). It is not needed for you to know these terms in either a theoretical or practical sense. I would explain the terms as they would arrive for the first time.

Practical Examples

In the previous article, I covered an example where the program does some lengthy processing and allows the user to Stop or Suspend/Resume the processing. I will be covering a few more real life programming scenarios where multithreading would be utilized.

The code examples within this article's content are in pseudo-code language. Putting in two different languages breaks the flow for the reader who isn't looking for other languages. Also, this article is quite big; putting actual examples in multiple languages would make it more long. The attached Project Suites have the complete source code in your programming language.

A humble request to all readers: Please do not read this article in one attempt - read, fully understand, look at the attached code, code comments, change the code as you please, run at least 10-15 times with your changes - each and every project/example. Once you gain a firm understanding of a particular example, only then should you move forward. The article stays here!

Multithreaded programming is not about learning thread basics, what different thread primitives are, how the Operating System schedules them etc. It is about practicing, programming, and learning by hands-on experiences. Please have patience - give some time to this nifty thing - multithreading!

The first four examples are not multithreaded per se, but shows how multithreading can be used to achieve them. Rest of the examples utilize the true gleam of multithreading. Each example follows the project details that are in the Project Suite.

[•] Practical Example 4: One Instance of an Application

You want that the user may run only one instance of your application. Well, there are different approaches for the same:

Solution 1: Use the FindWindow API to find the running application, by caption. For example, if your application title is "Advanced MP3 Tag Editor", you can pass this string to the API and it would return the handle to the target window.

This will not work if:

A users creates a folder with the same name as your application title, browses to that folder in Windows Explorer, and the FindWindow would return the window-handle of Explorer!
With some Desktop Tweaking tools, your application window might be hidden (in tray, for instance), or on another virtual desktop!
Your application is at startup phase, and before it actually shows up, the user (the impatient one, like me!) launches the application once again. Whoa! FindWindow fails, and two (or more) instances are now running.

Solution 2: Use the same API to find the window by the window-class (see RegisterClass in MSDN). It would succeed for point 1 above, but would fail for other points. It also requires you to register a window class by some unique name, create and show the window with that name; or knowing the class-name if you use the predefined class. As you can guess, another application might be running with the same predefined window-class (like a dialog box, having "#32770" as the class-name).

Both of the above solutions will not work if your application is a Console or doesn't have any windows. Also, the Elevation and Integrity levels, as with Windows Vista and higher, may cause problems. This article is too short to discuss 'how'.

Solution 3: Enumerate all running processes and find if your application is running. You just match the process name and find your process. Well, anyone can rename any other application (like notepad.exe) to your executable's name (say, advmp3.exe) and run it. Yeah, you got the hint!

Another problem is with multiple users logged in (via Fast User Switching, or by Remote Desktop). Your application might be running on another desktop, but with your user credentials, or vice versa - you (the current user) might have run your application in another user's token! It depends on your application how to react in such a scenario; but FindWindow is absolutely ruled out!

Solution 4: This is an advanced solution and should not be used, and I've not tested/read-about this technique for the points mentioned above. With a linker switch on Visual C++, you can make a global variable shared among all instances of your running processes. Not a reliable solution! Do not know about other compilers.

The best solution is to use a Named Mutex.

What are Mutexes?

Mutex is a thread-synchronization primitive that is used to control access to a shared resource (a variable, an array, a file object, device context, window handle, socket, or any other resource). As mentioned in the first part, a synchronization primitive is not the resource you want to have controlled-access, but it is the lock to control simultaneous access.

A mutex:

Can be named or unnamed. Can be used across processes if it is named (ignore inheritance!).
If a thread acquires a lock to a mutex, another thread (or a thread in another process) cannot lock/gain the mutex. The other thread has to wait for the mutex to become free/unlocked.
Only one thread can have the exclusive access to the mutex. Acquiring a lock means getting exclusive access.
Unlike some other primitives, you can also specify a time-limit to wait for gaining access to a mutex. For example, you can say "wait 5 seconds". You can also specify an infinite timeout. Depending on the return value of the mutex-wait-function, you can find if you've got access to the mutex and act accordingly.
Acquiring a successful lock to a mutex makes the mutex non-signaled. Releasing the lock changes the state to signaled. Read the first part of this article to get a good understanding about signaled and non-signaled states.

You might wonder what all this has to do with preventing an application from running twice? Well, in this series, I first introduced mutex, so I must detail it!

To prevent another instance, you do the following things in your application at startup (i.e., in main):

(Try to) Create a named-mutex. The name should be unique. Like "advmp3$mutex-singleinstance". You can use the GUIDGEN utility to get an absolutely unique mutex name.
If the creation succeeds, this instance is the first instance of the application.
If the creation fails, another instance is already running. Depending on your application, you can:

Set the focus to an existing application (show if it's hidden).
Just show a message-box saying another instance is already running.
Do not give any visual clue, and exit.
In all cases, however, you must exit from the newly created instance.

Here, I present a screenshot from Process Explorer. It shows the mutex used by Windows Media Player, ending with 'OtherInstanceMutex'. Run Process Explorer in Administrator mode, close the handle to this mutex, and you can now run another instance of Windows Media Player and listen to multiple songs!

Does it mean that anyone can delete (close handle) the mutex, and thereby jeopardize the application's internal state? Well, yes - if you create the mutex (or any other kernel object) with the default security attributes (remember the LPSECURITY_ATTRIBUTES parameter of CreateMutex, CreateProcess, which you always set to NULL?) and someone has Admin rights! The .NET Framework also has an option to set security attributes to kernel objects. You can set security-attributes to deny/allow actions to different users, but describing them is not in this article's domain!

The name of the mutex is controlled and stored by the OS. The name of the mutex, along with the mutex, will be removed by the system if the process deletes the mutex, or if the process exits (normally or abnormally). For all kernel objects, the OS keeps track, and would actually delete only if all reference-count is zero. Mutex is a Kernel object, thus they can be shared among processes. Critical Sections (we'll see these later) are user objects, and they cannot be shared between processes. Using a Kernel object requires a round trip from user-mode to kernel mode, and thus they are slower than user-mode synchronization objects.

Further, if you want only one instance of your application, irrespective of the number of logged in users, you must prefix "Global\" to the name of the mutex. Otherwise, it is treated as a local-named mutex. More details can be found in MSDN.

Pseudo-code

if ( OpenMutex ("MyMp3Application_SingleInstance") == true)
{
    // Notify the user that another instance is running 
    // Exit from this instance immediately.
}
else
{ 
    CreateNamedMutex("MyMp3Application_SingleInstance"); // Assume success. Keep mutex open 
}

The following table lists the programming elements for mutex in different programming environments:

What / Where	Windows API	MFC Library	.NET Framework
Mutex data-type	`HANDLE`	`CMutex` class	`Mutex` class
Creating mutex	`CreateMutex`	`CMutex` constructor	new `Mutex`
Locking/ Acquiring mutex	`WaitForSingleObject`	`CMutex::Lock` method	`WaitOne` method
Unlocking/ Releasing mutex	`ReleaseMutex`	`CMutex::Unlock` method	`Mutex.ReleaseMutex` method
Waiting for more than one mutex (all, one)	`WaitForMultipleObects`, `bWaitAll` specifies All/One predicate	Construct `CMultiLock` with multiple `CMutex` objects, then call `Lock` with the required wait-predicate.	`WaitAll` and `WaitAny` methods

Project Details

MFC project: SingleInstance_MFC
,NET project: SingleInstance_NET

[•] Practical Example 5: Lengthy Operation in Background

You must have used Windows Calculator, and gave some input to the Calculator to process that would take a long time. Like calculating the factorial of 100,000! If you try to calculate the factorial of the above value, you'd encounter the following message:

How to Do it?

Before the processing thread starts calculating (or performing any other operation, like code compiling), it starts another thread to watch for this thread. Also, before it launches the other thread (the watcher-thread), it creates a mutex object and locks it. The other thread waits for the same mutex specifying some time interval (say, 5 seconds). Now both threads are running. (You can also use an event-object instead of the mutex, in this case).

If the calculation is done within a specified time (5 sec.), it immediately releases the lock. This gives the lock to the watcher-thread. If it doesn't finish processing in the specified time-limit, it keeps on processing and does not release the lock.

The watcher thread's wait-call, will, anyhow, return in the specified time (here, 5 seconds). It then checks the return value of the wait-call. If it finds that the lock cannot be acquired, it throws up the message box - asking the user to terminate the long processing.

Well, that was one part of the long-processing termination mechanism. The actual-processing thread has to know about the termination. The simplest thing is to terminate the thread forcefully! If you are absolutely sure what the thread is doing, like (here) just calculating based on some numbers, you can do so. But if the processing-thread uses some resource (file, window, DC, socket, file, and so on), you need to request it to exit gracefully. You can use events to do the same, as explained in the previous article. The attached projects have the complete code.

As a last note, you should understand that this example is no more different than the 'Cancellable' example given in the previous part of this article. The only difference is that the watcher-thread is requesting to stop, instead of the user.

Also note that, in the calculator, the processing-thread is actually the main/GUI-thread - that's why the Calculator window doesn't respond and becomes (Not Responding). This is, of course, my assumption, based on the visual feedback and the number of threads I see in the Task Manager!

Project Details

MFC project: LengthCalculation_MFC
.NET project: LengthyCalculation_NET

[•] Practical Example 6: Waiting for an External Program to Finish

Your application launches some external program, like Notepad.exe, and would like the wait for Notepad to finish before you can proceed further. Why, You might ask. Well, in this example, you launched Notepad to allow the modification to some settings file, and expect the user to do changes in a file, save the file, and exit Notepad. As soon as you find out Notepad has exited, you re-read the setting file and act appropriately.

The 'Program and Features' on Windows Vista and higher also performs a similar thing. You launch some setup program (that might be any kind of Installer, not just MSI), and then you try to launch another setup program (Uninstall, Change, or Update), and it gives following error message:

How Can you Achieve the Same?

Well, that's pretty simple! You just call the WaitForSingleObject API or the Process.WaitForExit method on the newly created process' handle, specifying an infinite timeout. Yes, you may need to put this call on another thread; otherwise, your GUI would become unresponsive.

Also note that you can use the same method to wait for any thread you have created within your program. The Calculator example given above may be implemented more efficiently as:

The GUI thread creates two threads, one for the actual number calculation, and another that waits for the calculation-thread to finish within the specified time.
The watcher-thread now waits for the calculation-thread (instead of some event/mutex) for the specified timeout. The rest of the functionality would remain the same. See more comments below.
The GUI-thread would display something like 'Calculating...', disabling all controls, and (optionally) showing/enabling a button to Cancel the calculation.
Now we have three threads in total; the first one just shows 'Calculating...'. The second one performs the actual calculation (this thread utilizes the CPU, not the other two threads). The third thread just waits for some event to occur. If you provide the user the facility to 'Cancel', you must do some extra check in the watcher-thread - to not display a message box saying 'Taking long..Stop now?'. The user has already clicked stop, and your badly written threading code could behave erratically.

If you could not understand the efficient approach listed in the bullets above, don't worry. I just put them for completeness. The attached project doesn't do that way either!

Pseudo-code

ProcessHandle handle = CreateAnotherProcess(Path_To_Executable);
WaitForProcessToExit(handle);
// If control reaches here you can let user know, or take appropriate action

Project Details

MFC project: ProcessWait_MFC
.NET project: ProcessWait_NET

[•] Practical Example 7: Making a Fail-safe Application

It is not about data safety, it is about process safety. For example, you are having some server or service that should keep running no matter what. A device driver, antivirus, database system could be your fail-safe application. Now, if your process terminates, by program crash, or the user terminating your process deliberately, or some other malicious process trying to terminate your process, what would you do?

Here, you just create a dedicated process (not thread) which would wait for process termination, as described above. Now, as soon as it finds the process is terminated, it would re-launch the process (CreateProcess or Process.Start) and would again start waiting on the new process' handle.

Your main process also needs to dedigate a thread to wait for the other process (which is ensuring fail-safety for this process). In that process, you would wait for other processes to terminate, and as soon as you get that it's terminated, you instantly spawn that process again!

As a last note on this: what if some way both processes get terminated at the same time? Well, in that case, you can have two or more watcher-processes wait for the main process to terminate, and wait for all those processes in your main process. I can discuss more on this, but let's end this discussion here. The best thing you can do is to attach some security attribute to your process, and one watcher process, so that a malicious program cannot terminate them!

Pseudo-code and projects are not given for this example. I assume the reader can understand how to achieve the same, and is able to implement it!

Let's Jump into the Realms of Multithreading!

In the above four examples, I used the multithreading concept just to control the execution of program and threads, by just waiting or checking some primitives. Well, as you all know, it is not what multithreading is meant for. Multithreading is mainly used to achieve parallelism, to perform something parallel while some other task is being done. With multiple processors/cores, your program can achieve true simultaneous execution. You can read, or have already read, more on this subject theoretically - so let's stop here, and concentrate on the practical approach!

The following three examples would show you how to use multithreading to improve application response, utilizing the true potential of hardware.

[•] Practical Example 8: Spell Checking While User is Typing

Well, no! I am not elaborating the full fledged spell-checking, giving the appropriate words as suggestions. Just this simple: the program would have some words (say around 100 words) which are considered valid words (so say, keywords). Every other word would be marked as invalid word (i.e., treated a spelling mistake). Read this article for an actual spell checker implementation using Word Automation.

In our example, we would allow the user to type into a text box and let the user add it into some list. As soon as the user types and adds the text to a list, we let the spell-checker-thread know about the text. Due to the complexities involved with inline spelling checking (i.e., as-the-user-types scenario), I am not explaining spell-check on a textbox-content-change event. The reason is simple: the user may type in any sequence, may move the caret (typing-cursor) anywhere, may cut, copy, paste, delete, or do any random text change. Sending the entire text on every text change is not feasible; the text might have modified dramatically in the text box, and the thread might perform a spell check on the text that may not even remotely match the text on the text box.

For this, I will be using a collection of strings. The string-list would be protected by a critical section. The GUI thread would add to this list, and the spell-checker thread would read the element that was added last to the list, in a FILO manner.

What is a Critical Section?

A critical section is very similar to mutex - it allows only one thread to access some protected data. However, these are the differences:

Critical Section is a User Object, whereas Mutex is a Kernel Object. A user-object need no interference with the kernel (OS Core); a Kernel-object requires a round-trip to kernel-mode.
Critical Sections are faster than Mutexes. This rule holds true for any User v/s Kernel object.
Critical Sections cannot be named and cannot be shared among processes.
User objects cannot have security attributes attached to them.
You cannot specify the timeout while waiting on a critical section. That means infinite wait time-out!
Some languages provide a direct and simple mechanism to use critical sections. For example, C# facilitates the lock keyword to implicitly use a critical section, which is more intuitive than using the Monitor class.

The programming elements table:

What / Where	Windows API / C++	MFC Library	.NET Framework
Critical Section data-type	`CRITICAL_SECTION`	`CCriticalSection` class	Static `Monitor` class, or `lock` keyword
Creating a Critical Section	`InitializeCriticalSection`	`CCriticalSection` constructor	-N/A-
Locking/ Acquiring a Critical Section	`EnterCriticalSection`	`CCriticalSection:: Lock`* method	`Monitor.Enter` method
Unlocking/ Releasing a Critical Section	`LeaveCriticalSection`	`CCriticalSection:: Unlock` method	`Monitor.Exit`
Trying to acquire a CS without blocking (i.e., zero timeout)	`TryEnterCriticalSection`	-N/A-	`Monitor.TryEnter`
* `CCriticalSection::Lock` has two overloads; one also takes a timeout parameter. But it is ignored by `CCriticalSection`. Note that `CCriticalSection` derives from `CSyncObject`, which exposes two versions of the `Lock` method. See MSDN for more information.

So, How do We Implement Our Protected String-list?

Try to understand the following diagram:

The GUI thread would lock the critical-section, add the currently entered text to the string-list, and release the critical-section. After this, the GUI thread can do its own work, like adding the text into the list-control, clearing the text box for the next input, and so on. At this time, the spell checker would already have started processing the text!

Similarly, the spell-checker thread would lock the critical-section, would read into some local variable, and release the Mutex. Then it would start spell-checking the text.

From the above diagram, you see that I acquired and released the critical-section in a short span of time. That's the right way of locking and unlocking, so that the other thread doesn't end up waiting for the critical-section for more than it should! That is the reason I put the last item into a local variable and perform the spell-check on the local variable! The GUI thread should also do the same, i.e., perform the text validation etc., before locking!

That's Okay. But What Event has to Do in This Case?

Well, the spell-checker thread cannot read from an empty string list. It cannot wait for a Critical Section. Doing so would block the GUI (think how!). Thus, in both critical section regions, we find the number of items in the list and put it into some local integer variable. Now, after the critical section region, in the GUI thread: we check if the element count is zero, and we set the event. And in the spell-checker thread: we wait for the event to get signaled only if the count is zero. See the attached code in your favorite language. Still, it may be a bit tricky to understand, and I cannot push that into your brain-box by words, picture, or even code - try to understand it yourself - and you will!

What About Letting the User Know of Spelling Errors?

In the above diagram, I mentioned data-sending from the UI thread to the spell-checker thread. We also need the spell-checker to send some data to the GUI thread so that it can display something for the end user.

1. Send back invalid counts

As explained in part 1, we use PostMessage or BeginInvoke with our custom message to let the GUI-thread know about spelling errors. For just passing a count of invalid words, we can simply pass an integer, as we did in part 1:

C++/MFC

// WM_SPELL_CHECK_REPLY is defined as our custom message code.
pThis->PostMessage(WM_SPELL_CHECK_REPLY, nInvalidWordCount);

C#

this.BeginInvoke(this.SpellReplyDelegate, nInvalidWordCount);

If you do not understand the code above (respective to your language), please read the first part of this article.

The GUI thread would get this reply from the spell-checker thread, and would update the count. But this way, you can only send an invalid word-count, and not a valid word count (you need to send both). Also, you may need to reply back with a set of all words that were valid and invalid. Also, the GUI thread needs to know which line contains those valid/invalid words. Though we are sending a line to the spell-checker-thread in a sequential manner, and the thread would receive it in a sequential manner (remember, the spell-checker thread is processing lines in FILO mode); there may be a need where we must reply back (from the spell checker thread to the GUI thread) with the line-number also.

Tall order to understand I know, but let's proceed incrementally. Please note that the things I am discussing below would be very useful in multithreaded programming. So, please understand carefully!

2. Send Back Valid and Invalid Counts

There are two approaches to do this. The first is to pack a valid-count and invalid-count in a single variable - taking counts as 2-byte integers, and packing them into a 4-byte (or 4-4 into 8 byte). But I reserve that approach as an exercise for the reader! The second is to use two arguments for PostMessage / BeginInvoke. The first argument would be the valid word count and the second would be the invalid word count. PostMessage can take one more argument (optional functional argument), and the respective C# delegate can be easily modified to take two (or more) arguments.

3. Send Back a List of All Valid and Invalid Words

Please ignore the slight extra memory usage requirement for this case. In this approach, we put all valid and invalid words in a list. So, we need to have two vector/list/array for each category. Further, we put these two string-lists into some structure. We allocate this structure on the heap (using new), and put valid/invalid words into the respective string-list variable. Finally, we send this structure's reference/pointer to the GUI-thread.

Why this complicated? You might argue!

Well, the number of arguments for PostMessage/BeginInvoke is limited. Even with delegates, you cannot easily code a delegate that takes a variable number of arguments (like 3 valid words, 6 invalid words - 9 arguments!). Let's don't get deep into that!

Secondly, we must allocate the structure on heap (new) since the stack variable would be local only to the current thread/function, and not to the GUI thread. The GUI thread, after utilizing the structure, would delete the structure (C++).

Pseudo-code:

struct SpellResult
{
    StringList ValidWords;
    StringList InvalidWords;
    int Line;
};

/// In spell-checker thread
 
// Before loop which is checking the line nCurrentLine
SpellResult spell_result = new SpellResult; // Assume 'spell_result' is pointer (C++)
spell_info.Line = nCurrentLine;

// Inside the loop, which is checking one line
 
if( IsValidWord (sWord) )
    spell_result.ValidWords.Append(sWord);
else
    spell_result.InvalidWords.Append(sWord);
 
/// After the loop (for current line)
/// C++ ////
PostMessage ( WM_SPELL_CHECK_REPLY, spell_result);
// 'spell_result' is pointer - ignore the syntax

/// C# ////
this.BeginInvoke(this.SpellReplyDelegate, spell_result);

In C++, the GUI thread would typecast the WPARAM to SpellResult* and would utilize the structure for displaying spell errors to the end user, eventually deleting the pointer. For C#, the delegate would take an argument of type SpellResult, so no typecasting is required. Please download the project and look at the code, not shown here for brevity.

What if the User Closes the Window Before the Spell-thread Finishes?

Well, that's an interesting case! There are a set of possible solutions. Firstly, in the example discussed above, you can just let the GUI thread and the process die - the spell-checker-thread would die automatically (since it performs spell checking very fast). Secondly, you can let the spell-checker know that the GUI is now terminating, "quit"! Thirdly, we can wait for the spell-checker-thread to finish the work.

For the first solution, we don't need to do anything. It is already done!
For the second solution, which is appropriate for this example, we let the thread know about the user's wish to exit. (A) We can use some shared variable to let the spell-checker know about the exit statement. Or (B) we can have one more event to let the spell-checker know about the exit-request.
The third solution isn't appropriate for this case. We must obey the user's request and stop spell-checking that the user doesn't want to see! For non-GUI requests, let's postpone the third solution for the next part!

Solution A: Just have a simple bool or int variable as a class member. Set it to false/0 - meaning 'quit-request' is false. Whenever the user attempts to exit, you just set this variable to 1/true. In the spell-checker-thread, while processing each line, just before the CS-lock (see the diagram above), check this variable. If it is true, exit immediately! On 32-bit processors, a read/write to a 32-bit or lower variable is atomic, thus you don't need to synchronize them. But again, not a good idea.

Solution B: Do you recollect the Pause-Resume scenario presented in the first part, where I just used Thread-pausing and Thread-resuming functions to pause/resume the worker thread? Well, this solution is similar to that one. In that example, we needed the worker thread to know one of the two requests from the GUI thread: Stop or Pause. In this case, we need the spell-checker to know one of two requests: new-line-arrived or stop-request-arrived. But as you can understand (I assume you do!), putting this two-event check when there are a few more lines within the spell-thread does not work as expected. Recollect again, we are waiting for the event only when there are zero lines in the string-list (see the diagram again). Thus, the spell-thread must process all lines before it can re-read the event status! But for completeness, I am mentioning this approach also.

How do I Wait for Two (or More) Events in One Call?

The obvious solution you might think, which does not work, is to call the wait-function two or more times! Like:

// Pseudo-code
status_stop = Wait(event_for_stop); // WaitForSingleObject / WaitOne method 
status_newline = Wait(event_for_newline);

And further check the status of events and act accordingly. In this example (and the thread pausing/resuming example also), it would work, since we are passing zero (0) as the time-out parameter. But what if we cannot pass 0 as the wait timeout? An example would be: launch multiple processes and wait for any one of them to finish. Think of the solution! None would work efficiently.

The solution is to use the WaitForMultipleObjects API, or the WaitAny method of the WaitHandle base-class. These routines can wait for a maximum of 64 sync-objects. Any type of object can be specified (any combination of mutex, events, semaphores, process/thread handles etc.). The return value of the routine specifies which event has signaled. Please look at the documentation on how to use these routines; following is just pseudo code:

// Initialize with two event-handles (or any type of handle)
SyncObject event_array[2] = {...};
event_wait_result = WaitMultiple(event_array, 2); // Assume timeout is 0
if( event_wait_result == 0 ) //  first event fired.
   // Handling for first event
else if (event_wait_result == 1)
   // Handling for second event

The event_array holds up the handles/objects for which you want to wait as an atomic-operation. The return value of WaitMultiple specifies which event was signaled, and you act accordingly. So, for the Cancel/Pause scenario, you can have two events, put them into an array, and call WaitMultiple. Likewise, for this new-line/stop-request scenario, you can have two events. But, for this example, the stop-request event must be first in the array. Why? Well, both the WaitForMultipleObjects API and the WaitHandle.WaitAny method would return the lowest index if both events are set. So, the newline would have arrived, and the user desires to close the application, then the close-event must be processed at a priority over the new-line event. It is true that both actions are performed by the end user, thus they'd arrive in newline, stop-request sequence. But in other cases, you must process the event having the higher priority, and thus must be put in proper order in the WaitMultiple call.

The spell checking does not take time at all. So, the stop-request and WaitMultiple is not implemented in this example project. It is used in the Example 9 project. In the course of discussion on the subject, it arrived on the way, and I explained it! I assume you understood it; if not, come back here when you read about Example 9 below.

Project Details

MFC: SpellChecker_MFC
.NET: SpellChecker_NET

[•] The Practical Example 8 : Reading very large file

Open a huge file (say 400 MB) in Notepad. And you know the rest - Notepad will hang, until it opens the entire file and displays it on a text box. Or it may raise some error, mentioning that it cannot open the file.

Open a similar sized Word file (actual Word document) having thousands of pages. It would open the document almost instantly. After you open the document, you can see the number of total pages keeping on increasing. Eventually, the whole file is loaded. But again, depending on the file content, number of pages etc., Word performs a smart move: it doesn't load the entire file into memory (or at least not on visible pages).

Here I would explain loading huge text files only. Instead of loading a file line by line, in one thread, we designate another thread to do the reading. The dedicated thread would read the lines of the file and may do one of the following:

Add the line to some container. The container could be std::vector, std::list, CPtrList, CStringArray, Array, ArrayList - all depends on the language and your preference. The main (GUI) thread may read this container in a thread-safe manner, and would update the UI. For thread safety, we can use critical section (or mutex) as we did in the last example.
Notify the main thread directly of the newly read line, using the PostMessage API or the BeginInvoke method. The main GUI thread would actually add the line to the container, or to the GUI component (like textbox, combobox, listbox etc.). But this wouldn't give effectiveness, as the GUI thread needs to process each line, thus making the GUI non-responsive until the reader-thread is finished.
Optimize the above approach by adding a few lines in some container, post the container (having a few lines - say 100 lines) instead of posting for each line read. This would increase the performance and responsiveness of the GUI thread. The GUI thread would read all lines from the posted mini-container, and would add the lines to the control.

For both the approaches mentioned above, the reader-thread would allocate the buffer for a single line or container, and post the buffer (as a pointer/reference). The GUI thread would utilize the buffer and would be responsible for deleting the buffer (C++).

Use advanced techniques like overlapped IO or IO completion ports. But discussing them at this stage isn't appropriate. I may discuss them in a future article!

We have already seen an example using a container (option 1). Option 2 is ruled out as it is almost the same as single threaded file-reading. Not supposed to implement option 4, in this article!

We use will option 3. Here are the steps (GT: GUI-Thread, RT: Reader-Thread):

[GT] Let the user choose a file to load.
[GT] Attempt to open a text file in read mode. Emit an error to the user if file cannot be opened.
[GT] On successful file open, create a new thread, reader-thread. Pass this file reader handle/object to the reader-thread.
[GT] Would store the file-size before starting the reader-thread. This is needed to show the file-reading progress to the user.
[RT] The reader thread typecasts the argument to the file handle/object and starts the file-reading loop. The loop would read the file line by line.
[RT] The thread would keep the line in a string-list (local, but allocated on the heap). On reaching some threshold - say 100 lines, it sends the string-line container to the GUI-thread for actually appending this much text into the textbox (or another control). Sending is done via PostMessage or BeginInvoke.
GT gets the posted message, translates into a string-list type, and adds all the lines into a control (textbox). Also, it would increase/update the bar or percentage (text), to show the end user how much file was read. Since the file-size has no direct relevance with the varying length of lines, the progress-calculation would just be an indicative estimate.
GT would also free the memory allocated for the string-list that was allocated by RT (C++).
RT would finally send an indication that the file read is complete - it could be zero-lines in a string-list (the container), or some extra message in PostMessage or the C# delegate. It would also close the text file.
GT would process that message as file-reading-finished, and would update the GUI (hiding the progress, enabling the textbox etc.).

Do We Really Need a Reader-thread Even to Read a Small File?

Actually - no. We can optimize by just delegating a large file reading (say, > 1KB). If the file size is less than that, we can just read and add into the textbox control, within the GUI thread.

Here is a diagrammatic representation:

What if the User Closes the Application, Tries to Load Another File, or Requests to Cancel File Loading?

For all the three user requests, we can implement stop-request-handling using one event. If you are following the article carefully, you very well know: (1) what stop-event means, and (2) that two events are not required in this example, since the worker-thread is processing and sending data to the GUI-thread (spell check example was inverse of this example).

Application closure request must be done the way described in the above paragraph. But for the other two requests, we can also:

Disable the controls/menus until file reading is finished so that the user cannot cancel or attempt to open another file.
Let the user open another file in two other different threads. This would require an MDI application.
Let the user Abort (cancel) file reading. This can be done for both SDI and MDI types of applications.

Why Two Different Threads for an MDI Application? You Would Probably be Curious!

Well, I am not covering MDI or multiple file reading in the article (not even in the attached projects). I must, however, clarify this point.

The entire application's GUI runs under a single thread, that is generally (not necessarily) the primary thread, no matter how many windows you create or how many controls/menus you have or place on form/dialogs. In native Windows API (Win32) sense, one message-loop is responsible for all windows (forms, dialogs, controls etc). Thus, if you make an MDI application and let the user open multiple files simultaneously - the GUI thread is still one! This single thread is responsible for handling all user actions, as well the posted or invoked messages! So, if you create three child windows and open three large files in one go - there would be 4 threads only (3 reader-threads and one poor GUI-thread), and not 6 threads! (Please ignore the threads created by the CLR for .NET applications; the GUI-thread is still one).

For instance, try opening Windows Explorer (explorer.exe) and copy some files. You'd see the 'Copying...' window. Look into Task Manager, explorer.exe has created two threads (or even more!) for handling this thing. One is for actually copying (worker-thread), and one is for the GUI. The GUI thread is the 'Copying...' window, which is independent of the Explorer's main window. That's the reason you can close the main window and keep the 'Copying' window open!

One more thing that I am not covering in this article is: thread pooling. Many applications create a few threads in advance and keep them idle. This is to ensure that a request would be processed as soon as possible, without dealing with the time-consuming expense of creating a thread.

(Don't get these two curious topics? Don't worry, I am here to cover them in one of the next parts!)

Project Details

The closure case I mentioned above is not implemented. That means, if the user closes the application before an entire file is read, the reader-thread would not be notified. Behavior undefined. Please check 'File Reading wWthout Spell Check' for this example. Other form/dialog/source-files are for the next example.

MFC: LargeFileReading_MFC
.NET: LargeFileReading_NET

[•] Practical Example 9: Large File Reading and Spell Checking!

That's right boy - we can now play with three threads! Well, it isn't that complicated you might think. We have two approaches to implement this.

Approach 1: On file-open request by the user, we create two threads. One is the file-reader thread that would actually read the file and would send the set of lines to the GUI-thread, as we did in the last example. Another is the spell-checker thread that would wait for the spell-check-requests, as we did in the Spell-Checker Example.

The GUI thread, on receiving lines from the reader-thread, would append to the text box and would send the same set of lines (string-list) to the spell-checker-thread. The spell-checker-thread would check the text-lines for spell checking. It would respond back with the spell-check results to the GUI thread, as we did in example 7. But remember, we need another message/delegate to handle the request from the spell-checker thread. Though we can have the same message/delegate, but that would add more complexity.

Approach 2: On file-open, we create just one thread: reader-thread. The reader thread, before starting to read the file, would create the spell-checker-thread. It would then start to read the text-file, and would send a set of lines to both threads! The spell-checker would report back the spell-results to the GUI-thread only. Another message/delegate is required for the spell-results, in GUI, as explained in approach 1.

We will user Approach 2 for this one. It should be noted that the GUI update is time consuming than other CPU bound operations. We want to let the user use our application smoothly. Thus, less burden on the GUI-thread. Also, unlike the spell-checking example (have you gripped that code?), we will not send regular updates about spelling checks. Instead, we would send a total of the Valid and Invalid words, at the end (if you remember, I elaborated this approach somewhere in this article!). So here is what it's like (GUI-thread not shown):

And here is the abstraction on how these three threads interact with each other:

Since this example does not do anything extra than just combining the previous two examples, there is no need to elaborate it more. I hope the above diagrams are sufficient. For the relevant project, look at 'File Reading with Spell Check' in the LargeFileReading project.

[•] Practical Example 10: Performing Parallel Search

Last example, but definitely not the least! This example presents the dominance where multithreaded programming rules. Before we get into the technical details, let me discuss a real life example.

There are ten laborers (diggers, as mentioned in example 4) who are supposed to dig land. Now assume they have a big ground to dig, but a single shovel or just one digging-machine. In this case, all the other nine diggers must wait until the one who is using the device passes it to others. Now assume they are now given 5 devices to dig. As you know, now five workers can work simultaneously, and only five are waiting for the digging-device. In computing, the digging-device is the processor (or say, the processor-core).

The following diagram illustrates how a single device can impact the overall digging process:

And the following illustrates how the five-devices usage can optimize the digging. Note that they dig on different areas of the same ground.

Now, let's get into the technical arena.

Assume your application is holding up a huge database in its memory for faster retrieval or quick search. The data is not supposed to be modified in any way. You just need to search in memory. You would say - that's simple! Just write up a thread that would perform searching within the container (as mentioned, the container could be a list, array, vector, map...). That's true! But would not give significant improvement on a multiprocessor (2, 48, or even 256 processors) machine. If you write the searching code in one thread, it can process in a sequential manner, not in a parallel manner (here, I'm not talking about multiple search requests, but one search request for some data in say, a 100,000 element container). This would give your process only one processor/core (the shovel), and other processor-cores would be sitting idle. So the optimal solution is to create more than one thread to do the searching.

Like the digging example, we decide which area of the database a particular thread has access to. Unlike digging, we do not modify the database (or any collection in memory). We just search.

For example, let's dedicate two threads to perform the search. We can do it this way:

Create two threads; let them know their range to look into the container. Here, we can say 0 to 49,999 for the first thread, and the other thread searches in the range 50,000 to 99,999. We can also specify processor affinity, so that these two threads run actually on two different processors. But I'm skipping this for simplicity.
The two threads search for the same data (search request) simultaneously! They respond back to the main thread (or some kind-of watcher-thread) in one of the approaches discussed above (see the Spell Checking example).
The watcher-thread or the main thread (which originally sent the request) would then join the search results from these two threads and would notify the end user. Depending on how you give results to the end user, you can return an entire result in one go, or notify the end user about results as they arrive - or in bursts (say 100 results in one burst).

You might wonder what a watcher-thread has to do in this case. Well, do not go for the terms I invent. The main thread which sent the search-query may be the main thread, the GUI-thread, the main-thread of the server application that listens on the socket, or so on. That thread cannot wait for the searching to finish - it must respond to other requests and ask the worker-threads to do the actual job. Now, as I pointed out above, joining the search result is also required; the middle-thread (as I named it: watcher-thread) does the joining work. Also, the middle-thread is responsible for creating the number of actual searcher-threads, depending on the number of records, CPUs in the machine, the load on your application, the response time required etc. I hope you are getting the real feel, enjoyment, and complexities involved in multithreading!!

Relax! I am not covering that much of complexity in the example. Just this simple: at application startup, the application would create a set of integer numbers (say 10 million numbers) and randomize them. This number-collection would act as the database for our application that would be searched with multiple threads depending on the user's query. I've made three searching examples for this one:

Search on the basis of a relational-operator and the number given by the user. The relational operator could be one of (==, <, >, <=, >= and !=).
Search for prime numbers within the number database. I also added a criterion (criteria) the user can supply: search prime numbers greater than. See the example code attached.
Search if number is divisible by or not-divisible by the given number.

All three examples search on the same number-collection, and can also work in parallel with the others. That means you can search for a prime number as well as divisible-by numbers at the same time. For the first two sub-examples given, the thread would collect a set of numbers, and on reaching some threshold value, it would notify the GUI-thread. The GUI-thread would then display the numbers to list-control. The list-control will display the thread-ID from which the data arrived, and the numbers matching the criteria.

For the third example, I take your implicit permission to explain the last thing: Interlocked operations.

As you can see in the Divisibility tab above, only the count of matching numbers is displayed. DivisibilityCounterThread only increments the counter, does not collect the matching number. But wait, there would be multiple instances of the Counter-thread running, and after a certain threshold, the thread must respond to the GUI-thread about the count. It is perfectly possible that each instance of the thread would have a local counter variable and would send that variable to the GUI thread, and the GUI thread (having an actual counter-variable) would add this value and would display it on the UI.

But I use another approach: increment the same-variable from every instance of the Divisibility-Counter thread. For this, we just need an integer variable at the class level (or at any level you prefer, but must be available to the thread for modification), and allow the thread to amend it. As mentioned previously, modifying a 32-bit integer on a 32-bit platform is an atomic operation; but we should not do that. We can simply use a Critical Section/lock to modify the variable in a thread-safe manner.

For example, assume there is an Employee Fund Account where all employees can put money. Since it is a single bank account, multiple deposits may happen at the same time. The following diagram illustrates when data loss may happen not using an interlocked operation:

Using the safe approach, i.e., using an interlocked variable modification, the situation would be different, and is illustrated below:

Thus we will use Interlocked Variable Access. The following table lists the programming elements (these are the most commonly used, but not complete). Please refer the documentation:

Atomic Operation on Variable	Windows API / C++	.NET Framework
Incrementing by one	`InterlockedIncrement` `InterlockedIncrement64`	`Interlocked.Increment`
Decrementing by one	`InterlockedDecrement` `InterlockedDecrement64`	`Interlocked.Decrement`
Adding/reducing by some value (specify negative to reduce)	`InterlockedExchangeAdd` `InterlockedExchangeAdd64`	`Interlocked.Add`
Reading	No function available. Use above functions with 0 as the second parameter. (Read this discussion.)	`Interlocked.Read` (Only for 64-bit. Use `Interlocked.Exchange` with 0 for 32-bit.)
Assigning an other value (term: Exchanging)	`InterlockedExchange` `InterlockedExchange64` `InterlockedExchangePointer` (for pointer assignment)	`Interlocked.Exchange`
Assigning by comparison (only `==` condition)	`InterlockedCompareExchange` `InterlockedCompareExchange64`	`Interlocked. CompareExchange`
APIs suffixed with 64 are for 64-bit variables, others are for 32-bit. The `Interlocked` class has overloaded methods for 32/64-bit, as well as for other data types.

We pass the address or reference of a variable to these functions, and they modify (increment, add, subtract, AND/OR etc.) the variable in a thread safe manner. They are faster than controlled variable modification using locking/unlocking with critical section primitive.

The Division of Work

Workers mutually agree to work on a specific part of the ground, they are intelligent. So, how do we tell the threads what work-area they have to work upon? Well, first thing first - we need all threads to know and make available the work-area, the ground, to work upon (search upon). The ground is the number-collection, as you very well know by now. We make this collection available to the threads; we can place it at the global level, at some class level, or pass the reference of the collection to the thread. The article has got big, please see the attached code on how I did this.

Before any search-thread starts, we calculate the range the thread has to search. The thread count defaults to the number of CPUs in the machine the user is allowed to modify. We calculate the low-index and high-index on which the particular thread would work. These indexes are nothing but array-indexes. For instance, if there are 100 numbers and the user specifies 4 threads, we divide 0-24, 25-49, 50-74 and 75-99 as work-areas for these 4 threads. Please see the code, the code comments on how this division occurs if the total-number-count of the thread-count varies.

The following diagram illustrates how the work is divided for the different threads:

Project Details

The project contains all the three sub-examples given here. The first dialog, at application startup, would ask for the database size (number collection size). It would allocate that many integers, initialize and randomize the number. The same set of numbers would be used by the dialog followed.

MFC: ParallelSearch_MFC
.NET: ParallelSearch_NET

In the End...

Hope you liked the long journey of learning threading in a practical way. I gave my best effort to clarify everything. The attached code would definitely give you more understanding about multithreading, and how and why to program multithreaded applications. The presented code is not perfect, there are flaws, design constrains. I made them to clarify the subject, and did not make the example code perfect. Hope you can utilize the examples and the details mentioned here in your actual applications.

Let the learning curve be amended so that you glide into the realms of actual multithreading, not just reading.

Why MFC?

Because programming for GUI applications using MFC on C++ is much more convenient than the core Windows API for the same. Using other libraries like WX doesn't make sense to me, where people even cannot accept MFC. Although I first made a few applications in .NET, MFC, as well as in WinAPI, the time and effort required to make a core Win32 project did not allow me to continue further. Thus I dropped the idea. I may submit one, if it is demanded at large.

Waiting for feedback!