A Standard Multi-threaded Dynamic Queue

federico.strati

4.79/5 (23 votes)

Oct 6, 2010

GPL3

6 min read

159000

5577

This is a standard Windows / C++ implementation of a multi-threaded queue.

Introduction

This is a standard Windows / C++ implementation of a multi-threaded queue after:

M. Michael and M. Scott. "Nonblocking algorithms and preemption-safe locking on
                          multiprogrammed shared - memory multiprocessors."
Journal of Parallel and Distributed Computing, 51(1):1-26, 1998.

Article link

The queue implemented in this article is a de-facto standard in a multithreaded environment for a dynamic queue. The article cited is one of a long list of references. The queue is also explained in the textbook: "C++ Concurrency in Action - Practical Multithreading" by Anthony Williams.

The explicit goal of a multithreaded queue is to let different threads in your application to pass messages in the form of specified objects. For instance, you may have producer threads that do create messages for consumer threads that are able to respond to such messages and perform operations onto them.

The queue presented in this article is the most basic queue as dictated by the requirement of the fastest possible execution times in a real time programming environment.

The algorithm - Brief explanation

A multithreaded queue has basically an initialize section and two operations: push and pop, or alternatively, enqueue and dequeue. In a multithreaded environment, the emphasis is on the locking - unlocking of the resource while concurrently enqueueing or dequeueing items from the queue.

The two operations on the queue must allow the concurrent access to the shared resource. Hence, they must use locks (on mutexes) to hold on in order to regulate access to the shared resource.

This standard implementation of the dynamic queue solves this problem by representing the queue itself as a simple, singly linked list that is never empty; it always contains at least one single dummy node. This allows for parallel concurrent accesses of the head / tail of the queue as one element will always be there in order to prevent concurrent access of the same node. Hence, the two operations push and pop need only a lock each to regulate concurrent access to the tail / head of the queue, respectively.

It never occurs that such locks need to be held simultaneously: each operation will access the head / tail of the queue in complete independence from the other operation. This allows for the fastest implementation possible for a multithreaded queue as each lock will be held only to independently serialize the two operations; i.e., each lock is necessary only to serialize multithread access to the head or tail of the queue but not to serialize simultaneous access to the head and tail of the queue itself.

Hence the two operations push and pop run in complete independence, and may run in parallel without disrupting the state of the queue. The locks are necessary only to safely access each of the two operations alone from multiple threads; i.e., we need to serialize access to the head or tail of the queue. This is the fastest implementation possible for a multithreaded queue using locks. Non-blocking algorithms are possible, but they require increased complexity coding.

The algorithm - Pseudo-code

This is a brief description in pseudo-code of how the queue looks like in a procedural language.

In the code associated with the article, the pseudo code is translated into C++. Hence the initialisation will be performed in the constructor of the queue.

Initialisation

structure node_t  {value: data type, next: pointer to node_t}
structure queue_t {Head: pointer to node_t, Tail: 
          pointer to node_t, H_lock: lock type, T_lock: lock type}

INITIALIZE(Q: pointer to queue_t)
node = new node()              # Allocate a free node
node->next = NULL              # Make it the only node in the linked list
Q->Head = Q->Tail = node       # Both Head and Tail point to it
Q->H_lock = Q->T_lock = FREE   # Locks are initially free

Enqueueing and dequeueing items (push and pop, respectively)

ENQUEUE or PUSH(Q: pointer to queue_t, value: data type)
node = new node()        # Allocate a new node from the free list
node->value = value      # Copy enqueued value into node
node->next = NULL        # Set next pointer of node to NULL
lock(&Q->T_lock)         # Acquire Tail lock in order to access Tail
   Q->Tail->next = node  # Link node at the end of the linked list
   Q->Tail = node        # Swing Tail to node
unlock(&Q->T_lock)       # Release Tail lock

DEQUEUE or POP(Q: pointer to queue_t, pvalue: pointer to data type): boolean
lock(&Q->H_lock)              # Acquire Head lock in order to access Head
   node = Q->Head             # Read Head
   new_head = node->next      # Read next pointer
   if (new_head == NULL)      # Is queue empty?
      unlock(&Q->H lock)      # Release Head lock before return
      return FALSE            # Queue was empty
   endif
   *pvalue = new_head->value  # Queue not empty. Read value before release
   Q->Head = new_head         # Swing Head to next node
unlock(&Q->H_lock)            # Release Head lock
free(node)                    # Free node
return TRUE                   # Queue was not empty, dequeue succeeded

The solution associated to the article in C++ for the queue keeps the "move semantics" presented in the above pseudo code: pointers are used to pass and store by reference the objects (items) composing the queue. This is in contrast with a different solution possible based on a templated approach that, instead, would implement the queue as a template based queue where the elements (items) of the queue are stored by value, i.e., implementing a "copy semantics". In general, in real time programming environments, where the requirement of the fastest possible execution times is of primary concern, the "move semantics" is the standard solution to this kind of problems as it is inherently faster than the "copy semantics" approach. Hence, the queue presented in this article is implemented using pointers and respecting the philosophy of the solution cited in the original paper from where this work stems. A clear exception to this rule is, of course, when the template parameter is itself a pointer; then we will regain the "move semantics" for the queue. A template version of the queue is provided for such purposes.

Using the code

Multiple threads access the queue with the push method in order to queue new items, and multiple threads access the pop method in order to retrieve the items. Each object is passed by reference via its pointer.

Let's declare our queue:

CMultiThreadSingleQueue Queue;

Using the push method is as simple as:

CObject* ptrToObj = &SomeObject;
Queue.Push( (void*) ptrToObj );

Using the pop method is as simple as:

void* ptrToObj = NULL;
while ( Queue.Pop( ptrToObj ) )
{
  // Do something with the objet, for instance
  if ( ptrToObj )
     ((CObject*) ptrToObj)->DoWork();
}

It is important to note that the pop method should be used in a polling loop or synchronised with external events for continuous operation; for instance, in a polling situation, one would use the following approach:

while (true) 
{
   void* ptrToObj = NULL;
   while ( Queue.Pop( ptrToObj ) )
   {
      // Do something with the objet, for instance
      if ( ptrToObj )
         ((CObject*) ptrToObj)->DoWork();
   }
   Sleep(nTimeInMilliseconds); // Polling wait in the loop
   if ( m_bDoStop ) // boolean to exit thread
       break;
}

while for an event based synchronised operation mode, one would use the following approach:

while (true) 
{
   // Wait for event signalled
   WaitForSingleObject(hHandleToEvent,nTimeInMilliseconds);
   // Do work
   void* ptrToObj = NULL;
   while ( Queue.Pop( ptrToObj ) )
   {
      // Do something with the objet, for instance
      if ( ptrToObj )
         ((CObject*) ptrToObj)->DoWork();
   }
   if ( m_bDoStop ) // boolean to exit thread
       break;
}

About the demo application

This application demonstrates the use of some standard "Queues" used in real time programming techniques for inter-thread communication.

The sample offers example of use of:

Class CMultiThreadSingleQueue: Concurrent unbounded queue with First In First Out (FIFO) semantics. This queue's intended use is as a buffer in between multiple producers and multiple consumers. The sample shows the use by 3 producers and 2 consumers.
Class CCircBuffer: Concurrent bounded queue with First In First Out (FIFO) semantics. This queue's intended use is as a buffer in between a single producer and a single consumer. Multiple consumers and producers are not allowed. Messages are passed in between the two threads in FIFO order, and may also be stored in a shared segment memory for inter-process communication.
Class CDoubleBuffer: Flip flop double buffer as it may be used in between two threads only: one producer and one consumer. The double buffer is a structure with two identical buffers that are used in a concurrent way by the two threads: when one thread writes to a buffer, the other thread may read from the second buffer; after these operations end, the buffers are swapped and the process continues. Messages are passed in between the two threads in FIFO order, and may also be stored in a shared segment memory for inter-process communication.

The sample application also demonstrates the techniques used in managing threads for producers and consumers (worker threads) and for writing a simple multi-tabbed dialog application (class CTabCtrlEx). The GridCtrl library must be compiled before the demo app. These working solutions are for Visual Studio 2008 Service Pack 1.

Points of interest

This is the de facto standard multithreaded queue with locks as used in the research.

History

6 October 2010 - First draft.
7 October 2010 - Added more information about specific points.
21 October 2010 - Added a demo Win32/MFC application.