Click here to Skip to main content
Click here to Skip to main content

A Standard Multi-threaded Dynamic Queue

By , 28 Oct 2010
 

Introduction

This is a standard Windows / C++ implementation of a multi-threaded queue after:

M. Michael and M. Scott. "Nonblocking algorithms and preemption-safe locking on
                          multiprogrammed shared - memory multiprocessors."
Journal of Parallel and Distributed Computing, 51(1):1-26, 1998.

The queue implemented in this article is a de-facto standard in a multithreaded environment for a dynamic queue. The article cited is one of a long list of references. The queue is also explained in the textbook: "C++ Concurrency in Action - Practical Multithreading" by Anthony Williams.

The explicit goal of a multithreaded queue is to let different threads in your application to pass messages in the form of specified objects. For instance, you may have producer threads that do create messages for consumer threads that are able to respond to such messages and perform operations onto them.

The queue presented in this article is the most basic queue as dictated by the requirement of the fastest possible execution times in a real time programming environment.

The algorithm - Brief explanation

A multithreaded queue has basically an initialize section and two operations: push and pop, or alternatively, enqueue and dequeue. In a multithreaded environment, the emphasis is on the locking - unlocking of the resource while concurrently enqueueing or dequeueing items from the queue.

The two operations on the queue must allow the concurrent access to the shared resource. Hence, they must use locks (on mutexes) to hold on in order to regulate access to the shared resource.

This standard implementation of the dynamic queue solves this problem by representing the queue itself as a simple, singly linked list that is never empty; it always contains at least one single dummy node. This allows for parallel concurrent accesses of the head / tail of the queue as one element will always be there in order to prevent concurrent access of the same node. Hence, the two operations push and pop need only a lock each to regulate concurrent access to the tail / head of the queue, respectively.

It never occurs that such locks need to be held simultaneously: each operation will access the head / tail of the queue in complete independence from the other operation. This allows for the fastest implementation possible for a multithreaded queue as each lock will be held only to independently serialize the two operations; i.e., each lock is necessary only to serialize multithread access to the head or tail of the queue but not to serialize simultaneous access to the head and tail of the queue itself.

Hence the two operations push and pop run in complete independence, and may run in parallel without disrupting the state of the queue. The locks are necessary only to safely access each of the two operations alone from multiple threads; i.e., we need to serialize access to the head or tail of the queue. This is the fastest implementation possible for a multithreaded queue using locks. Non-blocking algorithms are possible, but they require increased complexity coding.

The algorithm - Pseudo-code

This is a brief description in pseudo-code of how the queue looks like in a procedural language.

In the code associated with the article, the pseudo code is translated into C++. Hence the initialisation will be performed in the constructor of the queue.

Initialisation

structure node_t  {value: data type, next: pointer to node_t}
structure queue_t {Head: pointer to node_t, Tail: 
          pointer to node_t, H_lock: lock type, T_lock: lock type}

INITIALIZE(Q: pointer to queue_t)
node = new node()              # Allocate a free node
node->next = NULL              # Make it the only node in the linked list
Q->Head = Q->Tail = node       # Both Head and Tail point to it
Q->H_lock = Q->T_lock = FREE   # Locks are initially free

Enqueueing and dequeueing items (push and pop, respectively)

ENQUEUE or PUSH(Q: pointer to queue_t, value: data type)
node = new node()        # Allocate a new node from the free list
node->value = value      # Copy enqueued value into node
node->next = NULL        # Set next pointer of node to NULL
lock(&Q->T_lock)         # Acquire Tail lock in order to access Tail
   Q->Tail->next = node  # Link node at the end of the linked list
   Q->Tail = node        # Swing Tail to node
unlock(&Q->T_lock)       # Release Tail lock

DEQUEUE or POP(Q: pointer to queue_t, pvalue: pointer to data type): boolean
lock(&Q->H_lock)              # Acquire Head lock in order to access Head
   node = Q->Head             # Read Head
   new_head = node->next      # Read next pointer
   if (new_head == NULL)      # Is queue empty?
      unlock(&Q->H lock)      # Release Head lock before return
      return FALSE            # Queue was empty
   endif
   *pvalue = new_head->value  # Queue not empty. Read value before release
   Q->Head = new_head         # Swing Head to next node
unlock(&Q->H_lock)            # Release Head lock
free(node)                    # Free node
return TRUE                   # Queue was not empty, dequeue succeeded

The solution associated to the article in C++ for the queue keeps the "move semantics" presented in the above pseudo code: pointers are used to pass and store by reference the objects (items) composing the queue. This is in contrast with a different solution possible based on a templated approach that, instead, would implement the queue as a template based queue where the elements (items) of the queue are stored by value, i.e., implementing a "copy semantics". In general, in real time programming environments, where the requirement of the fastest possible execution times is of primary concern, the "move semantics" is the standard solution to this kind of problems as it is inherently faster than the "copy semantics" approach. Hence, the queue presented in this article is implemented using pointers and respecting the philosophy of the solution cited in the original paper from where this work stems. A clear exception to this rule is, of course, when the template parameter is itself a pointer; then we will regain the "move semantics" for the queue. A template version of the queue is provided for such purposes.

Using the code

Multiple threads access the queue with the push method in order to queue new items, and multiple threads access the pop method in order to retrieve the items. Each object is passed by reference via its pointer.

Let's declare our queue:

CMultiThreadSingleQueue Queue;

Using the push method is as simple as:

CObject* ptrToObj = &SomeObject;
Queue.Push( (void*) ptrToObj );

Using the pop method is as simple as:

void* ptrToObj = NULL;
while ( Queue.Pop( ptrToObj ) )
{
  // Do something with the objet, for instance
  if ( ptrToObj )
     ((CObject*) ptrToObj)->DoWork();
}

It is important to note that the pop method should be used in a polling loop or synchronised with external events for continuous operation; for instance, in a polling situation, one would use the following approach:

while (true) 
{
   void* ptrToObj = NULL;
   while ( Queue.Pop( ptrToObj ) )
   {
      // Do something with the objet, for instance
      if ( ptrToObj )
         ((CObject*) ptrToObj)->DoWork();
   }
   Sleep(nTimeInMilliseconds); // Polling wait in the loop
   if ( m_bDoStop ) // boolean to exit thread
       break;
}

while for an event based synchronised operation mode, one would use the following approach:

while (true) 
{
   // Wait for event signalled
   WaitForSingleObject(hHandleToEvent,nTimeInMilliseconds);
   // Do work
   void* ptrToObj = NULL;
   while ( Queue.Pop( ptrToObj ) )
   {
      // Do something with the objet, for instance
      if ( ptrToObj )
         ((CObject*) ptrToObj)->DoWork();
   }
   if ( m_bDoStop ) // boolean to exit thread
       break;
}

About the demo application

This application demonstrates the use of some standard "Queues" used in real time programming techniques for inter-thread communication.

The sample offers example of use of:

  1. Class CMultiThreadSingleQueue: Concurrent unbounded queue with First In First Out (FIFO) semantics. This queue's intended use is as a buffer in between multiple producers and multiple consumers. The sample shows the use by 3 producers and 2 consumers.
  2. Class CCircBuffer: Concurrent bounded queue with First In First Out (FIFO) semantics. This queue's intended use is as a buffer in between a single producer and a single consumer. Multiple consumers and producers are not allowed. Messages are passed in between the two threads in FIFO order, and may also be stored in a shared segment memory for inter-process communication.
  3. Class CDoubleBuffer: Flip flop double buffer as it may be used in between two threads only: one producer and one consumer. The double buffer is a structure with two identical buffers that are used in a concurrent way by the two threads: when one thread writes to a buffer, the other thread may read from the second buffer; after these operations end, the buffers are swapped and the process continues. Messages are passed in between the two threads in FIFO order, and may also be stored in a shared segment memory for inter-process communication.

The sample application also demonstrates the techniques used in managing threads for producers and consumers (worker threads) and for writing a simple multi-tabbed dialog application (class CTabCtrlEx). The GridCtrl library must be compiled before the demo app. These working solutions are for Visual Studio 2008 Service Pack 1.

Points of interest

This is the de facto standard multithreaded queue with locks as used in the research.

History

  • 6 October 2010 - First draft.
  • 7 October 2010 - Added more information about specific points.
  • 21 October 2010 - Added a demo Win32/MFC application.

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)

About the Author

federico.strati
Software Developer (Senior)
Italy Italy
Member
Senior Software Developer in C/C++ and Oracle.
Ex-physicist holding a Ph.D. on x-ray lasers.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
AnswerRe: Standard Queue?memberfederico.strati9 Oct '10 - 20:48 
It is a standard in the following sense:
it is adopted in research as test bench against which the other
implementations are benchmarked.
 
Also, it is the founding algorithm of many works such as
the intel threading building blocks "concurrent queue" and
others.
 
It is in widespread use in many technologies.
 
In such sense it is a standard, given the spread of its uses.
 
Cheers
GeneralRe: Standard Queue?memberYuantu Huang9 Oct '10 - 21:51 
1. The original algorithms target multiprocessors, not multithreads, so they use CAS provided by hardware. Your queue uses software critical section, which is very different with LL/SC in MIPS described in the original article
2. TBB targets multi-cores, not multithreads.
3. It is questinable to say that it (original algorithms) is in widespread use in many technologies.
 
The original algorithms and TBB are targeting parallel programing, not single processor sequential multithreads programming.
GeneralRe: Standard Queue?memberfederico.strati10 Oct '10 - 5:18 
Probably there is a misunderstandings here, I term it Standard because
the algorithm is in itself a standard: in the literature it is used
as the standard algorithm to compare to other implementations.
GeneralRe: Standard Queue?memberYuantu Huang10 Oct '10 - 23:45 
No, I did not misunderstand your idea. I reckon your contribution to code project community. I did research of this area for several years and would like to give my point of view: there is no standard implementation for thread-safe queue in the world at all. Actually the critical section is not neccessary for your implementation. Mutex should be enough. Of course, using C++0x thread library to implement thread-safe queue is more effecient and much faster.
 
The simplest implementation (cross platform) thread safe queue is to wrap the std::queue by using std::mutex and std::lock_guard. e.g.: push(const T& t) { std::lock_guard lock(myMutex); myMueue.push(t); }, where std::queue myQueue;
std::mutex MyMutex; Simular wrapper can be applied to pop, front, empty. Same rule can be used to wrap vector, list, map.
 
I believe C++0x thread library, concurrency, and atomic access will be widely used in multithread and multiprocessor (parallel) applications.
GeneralRe: Standard Queue?membertrotwa11 Oct '10 - 4:11 
With a mutex you synchronize normally processes and it is more cpu intensive and slower than a critical section.
Therefore the usage of critical sections to synchronize threads is alright.
GeneralRe: Standard Queue?memberYuantu Huang11 Oct '10 - 23:34 
Basically, critical section suspends other(preemptive) multi-threads, while mutex protects critical section by using ownership flag (similar to binary semaphore). If frequency of shared resource accessed by multithreads at the same time is very high, then critical section may be less overhead. You should be aware that there are different implemetations of critical section, mutex, and semaphore on different OS (say Linux, Windows, RTOS).
GeneralRe: Standard Queue?membertrotwa12 Oct '10 - 23:03 
Last-ditch attempt:
This article and code is for Windows and it is not multi-platform.
You are mixing Windows with Posix mutex,
but they are different as I mentioned before.
Also Critical Sections suspends the threads only in worst case situations,
and using for example spin-locks internal.
Et cetera ...
GeneralRe: Standard Queue?memberYuantu Huang13 Oct '10 - 23:16 
I do not understand what you are talking about. I have been using both critical section and mutex a lot in Windows middleware programming and using Mutex in Linux socket programming. You may have a look at my code at http://www.codeproject.com/KB/IP/Rshd.aspx. If you can convience me I am wrong, please explain what difference among critical section, mutex, and semaphore in your point of view. Here is codeproject community, the goal is to help each other, to learn each other, and to share knowledge each other, isn't.
GeneralRe: Standard Queue?memberfederico.strati14 Oct '10 - 2:18 
I think this is a sterile discussion, the article
discusses the simplest example of a multi thread queue
in a windows environment only, hence using critical
sections light weight objects instead of mutexes.
 
The algorithm for this queue using locks appears to be
of widespread use and hence is termed "standard".
 
You may want to discuss in another article other solutions
implemented using CAS (like using the InterlockedCompareExchange on windows),
solutions that are lock free and have other benefits.
 
This will benefit the user community if you provide
a lock free implementation for windows of the
algorithms presented in the article M. Michael and M. Scott.
"Nonblocking algorithms and preemption-safe locking on
multiprogrammed shared - memory multiprocessors."
Journal of Parallel and Distributed Computing, 51(1):1–26, 1998.
 
Cheers
GeneralPop speedmemberSerge Savostin9 Oct '10 - 1:10 
Hello Federico,
 
Great article, just created something like that by myself.
Before adding a critical section part into my classes I noticed that the Pop() function works quite slow.
The test code:
vector<int> b;
for(int i=0; i<1000000; i++)
    b.push_back(i);
// 1 second passed
for(int i=0; i<1000000; i++)
    b.pop_back(); // not the same but...
// 1 second passed

CTMultiThreadSingleQueue<int> c;
for(int i=0; i<1000000; i++)
    c.Push(i);
// 3 seconds passed
int z;
for(int i=0; i<1000000; i++)
    c.Pop(z);
// 26 seconds passed
 
(I commented out all CCriticalSection code)
If you comment out the try { delete pCurrentNode; } catch(...) { NULL; } line the Pop() function works fine (sure with memory leak).
 
Confused | :confused:

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web01 | 2.6.130523.1 | Last Updated 29 Oct 2010
Article Copyright 2010 by federico.strati
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid