Mutual exclusion objects, such as mutexes, semaphores, and interlocked increment/decrement operations, are slow compared to what you could write if all operations were atomic. For example, rather than this:
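As a stand-in for the listing (the exact original code isn't reproduced here; std::mutex and the counter are illustrative choices, not the original code), trivial work guarded by a mutual exclusion object:

```cpp
#include <mutex>

std::mutex stuffMutex;      // mutual exclusion object guarding the work
bool doingStuff = false;    // hypothetical shared state
int workDone = 0;           // hypothetical "work" done in the critical section

void DoStuffSlow(void)
{
    stuffMutex.lock();      // slow: interlocked/kernel operation
    doingStuff = true;
    ++workDone;             // the actual work is trivial by comparison
    doingStuff = false;
    stuffMutex.unlock();    // slow again
}
```

The lock and unlock calls cost far more than the increment they protect.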
we could use this much faster code:
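A stand-in for the faster version (reconstructed around the doingStuff flag the following discussion refers to; the counter is again illustrative):

```cpp
volatile bool doingStuff = false;  // plain flag instead of a mutex
int workDone = 0;                  // hypothetical "work"

void DoStuffFast(void)
{
    if (doingStuff == false)       // cheap check -- but unsafe:
    {
        doingStuff = true;         // a context switch between the check
        ++workDone;                //  and this assignment breaks everything
        doingStuff = false;
    }
}
```

As written, this only works if the check and the assignment are never separated by a context switch.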
However, the faster code doesn't work, because the threads might context switch between the check doingStuff==false and the next line, doingStuff=true. This is too bad, because mutual exclusion objects are slow while simple variable checks are fast.
There is an exception though, which happily turns out to cover one of the most common uses of mutual exclusion objects: sending data from one thread to another through a queue. This is called the Single Producer Single Consumer model. My implementation takes advantage of this model, uses no mutual exclusion objects at all, and is twice as fast as what you would get otherwise.
While performance profiling my network library RakNet, I found that, at one point, my mutex Lock and Unlock calls took 20% of the total program CPU usage. Refactoring the code to avoid contention helped a lot, but as it turned out, the Lock and Unlock calls themselves were ponderously slow. Was there a better way? As it turns out, using interlocked volatile variables works, and is twice as fast. I've tested the code on a single processor Athlon 3000, a dual core Athlon 4200, and a multi-processor console, and it worked flawlessly over a billion iterations every time.
Using the code
It's important to note that this code is for the special circumstance when you have a single producer single consumer model; only one thread ever writes to a stream and only one thread ever reads from that stream.
The class takes a type and will pre-allocate MINIMUM_LIST_SIZE (a #define) elements in a linked list containing your type. To instantiate the class:

SingleProducerConsumer<MyType> spc;

MyType can be a struct or a native type such as an int.
To write to the queue, simply call
WriteLock, and it will return a pointer to your type.
MyType *myType = spc.WriteLock();
Write your data to the returned pointer:
When you're done, call WriteUnlock. Calls to WriteLock and WriteUnlock do not have to be in order, but you do need the same total number of calls to each. This is OK, although your reader thread won't get any data until the first WriteUnlock:

myType = spc.WriteLock();
myType = spc.WriteLock();
spc.WriteUnlock();
spc.WriteUnlock();
Reading is similar to writing. If no data is available to read, ReadLock will return 0. As with WriteLock, it is OK to do multiple calls to ReadLock before calling ReadUnlock, as long as the total number of calls to each function matches.
Points of Interest
There are three problems facing a programmer who wants to send queued elements from one thread to another:
- Writes getting corrupted due to a context switch to another writer thread while in the middle of a write.
- A context switch immediately after checking the synchronization variable (the problem I described in the introduction).
- Reads getting corrupted due to a context switch while another thread is in the middle of a write.
The first problem is easy to solve: our definition of single producer single consumer means that only one thread will ever write to our class.
The second problem is solved using two volatile variables (the interlock) and branching to the failure condition if either variable fails the check. According to the MSVC help file on the volatile keyword:
The system always reads the current value of a volatile object at the point it is requested, even if the previous instruction asked for a value from the same object. Also, the value of the object is written immediately on assignment.
Each node in the linked list has a volatile pointer to the next element in the list and a volatile
bool indicating if the node data is valid. Before reading, I check to make sure the read pointer does not equal the write pointer. I also check to make sure the boolean variable is not false. If either is the case, then I return 0 rather than return a node that is possibly being written to. Similarly, for writing, I check to make sure the next element of the write pointer is not the read pointer, and the boolean variable is not true. If either is the case, then I allocate a new node in the linked list rather than reuse an old node.
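The checks just described can be sketched in code. The following is a minimal, hypothetical illustration of the technique, not RakNet's actual SingleProducerConsumer source: the class and member names are stand-ins, it supports only one outstanding lock per side, nodes are never freed, and it keeps the article's MSVC-era use of volatile (a modern implementation would use std::atomic with explicit memory ordering):

```cpp
#include <cstddef>

// Minimal sketch of the technique described above (not RakNet's actual code).
// Each node carries a volatile readyToRead flag and a volatile next pointer;
// the lock functions branch to the failure condition if either check fails.
template <class T>
class SpscQueueSketch
{
    struct Node
    {
        T data;
        volatile bool readyToRead;  // true once the writer has finished this node
        Node *volatile next;        // circular linked list
    };

    Node *volatile readPointer;
    Node *volatile writePointer;

public:
    SpscQueueSketch()
    {
        // Pre-allocate a small circular list (stand-in for MINIMUM_LIST_SIZE).
        const int MINIMUM_LIST_SIZE = 8;
        Node *first = new Node;
        first->readyToRead = false;
        Node *current = first;
        for (int i = 1; i < MINIMUM_LIST_SIZE; ++i)
        {
            current->next = new Node;
            current = current->next;
            current->readyToRead = false;
        }
        current->next = first;  // close the circle
        readPointer = first;
        writePointer = first;
        // (Nodes are intentionally never freed in this sketch.)
    }

    // Returns a node the producer may fill in.
    T *WriteLock()
    {
        // Failure condition: the next node is the read pointer, or it still
        // holds unread data -- splice in a fresh node rather than reuse one.
        if (writePointer->next == readPointer || writePointer->next->readyToRead)
        {
            Node *newNode = new Node;
            newNode->readyToRead = false;
            newNode->next = writePointer->next;
            writePointer->next = newNode;
        }
        return &writePointer->data;
    }

    void WriteUnlock()
    {
        writePointer->readyToRead = true;  // publish: reader may now take it
        writePointer = writePointer->next;
    }

    // Returns 0 if no data is available.
    T *ReadLock()
    {
        // Failure condition: queue empty, or the writer hasn't finished.
        if (readPointer == writePointer || readPointer->readyToRead == false)
            return 0;
        return &readPointer->data;
    }

    void ReadUnlock()
    {
        readPointer->readyToRead = false;  // node may be reused by the writer
        readPointer = readPointer->next;
    }
};
```

Single-threaded use is straightforward: fill with WriteLock/WriteUnlock, drain with ReadLock/ReadUnlock; ReadLock returns 0 once the queue is empty.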
The reason this works is that, by the definition of volatile, one variable is finished being written before the other starts being written. So at any moment, at least one of the two variables holds its correct, fully-written value. If either variable signals the failure condition, then I take the failure path (return 0, or allocate an extra node).

If A then B
If not B then not A (Modus Tollens)

A is the statement "either variable signals the failure condition", and B is the statement "I take the failure path". Writing A as (C or D), where C is "the pointer signals failure" and D is "the boolean signals failure":

By De Morgan's law, not (C or D) == (not C and not D)
Hence: If not B then not C and not D

This is the statement:

If I do not take the failure path (I reuse a node), then the pointer does not have the wrong value and the boolean does not have the wrong value.
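The propositional step can also be checked mechanically. A small exhaustive truth-table check (plain C++; the helper names are hypothetical) confirming that whenever "if A then B" holds with A == (C or D), "if not B then not C and not D" follows:

```cpp
// p implies q, as a boolean expression.
bool Implies(bool p, bool q) { return !p || q; }

// Checks every truth assignment of C, D, B: wherever the premise
// "(C or D) implies B" holds, the conclusion "not B implies
// (not C and not D)" must hold too.
bool ContrapositiveHolds(void)
{
    for (int c = 0; c <= 1; ++c)
        for (int d = 0; d <= 1; ++d)
            for (int b = 0; b <= 1; ++b)
            {
                bool C = (c != 0), D = (d != 0), B = (b != 0);
                if (Implies(C || D, B) && !Implies(!B, !C && !D))
                    return false;  // a counterexample would land here
            }
    return true;  // no counterexample in any of the 8 assignments
}
```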
Finally, we get to the third case: what happens if a read gets the wrong value because it happens while a write is still in progress? This could occur if, for example, the high two bytes of the pointer were written but the low two bytes were not. It doesn't matter: even if the boolean is in the middle of a write and has a trashed value, the pointer still has its correct value, because the volatile keyword guarantees the writes happen in order. The same is true for the more likely case of the pointer having a trashed value: even then, the boolean has the correct value, and we enter the failure case. The worst that can happen is that we return 0 from ReadLock when data was actually available, or we allocate an extra node when we could have reused an old one.