Background
I use threads a lot and, with them, I use synchronization objects a lot. But, with minor exceptions, the synchronization objects are unmanaged resources. I used them for a long time, but recently I gained a better understanding of the Monitor methods Wait and Pulse/PulseAll, and with them I was able to create fully managed synchronization primitives.
EventWaitHandles
One of the simplest ways to pass messages from one thread to another, and to wait for those messages, is to use some kind of EventWaitHandle, usually an AutoResetEvent with a shared variable for the data.
AutoResetEvent has many problems: it uses unmanaged resources, it's very slow to create, and if you Close() it while threads are waiting on it, those threads will be kept waiting forever.
When I first saw Pulse() I thought it could solve the problem, but the MSDN documentation says to avoid it. And, in fact, Pulse() alone does not give the same result, as Pulse() will only release a thread that is already waiting. It does not keep the state "signalled" when there are no waiting threads, so the next thread that tries to wait is not released immediately (as happens with an AutoResetEvent).
But there is a way to solve that: use a boolean variable together with the monitor.
So, to signal the object, we do something like this:
lock(_lock)
{
    _value = true;
    Monitor.Pulse(_lock);
}
And WaitOne() waits only if the event is not already signalled. If it is already signalled, it resets the signal and returns immediately:
lock(_lock)
{
    while(!_value)
        Monitor.Wait(_lock);

    _value = false;
}
OK. With this simple structure, I have the equivalent of the AutoResetEvent.Set() and AutoResetEvent.WaitOne() methods.
To make a ManualResetEvent, I need to replace the Monitor.Pulse() with Monitor.PulseAll() and remove the _value = false line.
Surely I will still need the Reset() method, but I am not showing the entire class yet.
As I said before, one of the problems of the AutoResetEvent is that a thread can be kept waiting forever if the AutoResetEvent is closed/disposed while there are threads waiting on it.
To simulate the problem, you could use this code:
using System.Threading;

namespace ConsoleApplication1
{
    public static class Program
    {
        private static AutoResetEvent _are = new AutoResetEvent(false);

        static void Main(string[] args)
        {
            Thread thread = new Thread(_RunThread);
            thread.Start();

            Thread.Sleep(1000);
            _are.Close();
        }

        private static void _RunThread(object state)
        {
            _are.WaitOne();
        }
    }
}
After one second, the AutoResetEvent is disposed, but the thread waiting on it is not released and does not throw an exception for using a disposed object.
If I simply replace the Close() with a Set(), the single waiting thread will be released and it can Close() the event (as there was only one thread waiting). But what happens if more than one thread is waiting on the event?
Calling Set() on a ManualResetEvent releases all threads, but on an AutoResetEvent only one thread is released. Also, if I don't know which thread is the last to finish, I can never be sure whether I can Close()/Dispose() it, so it is better to avoid Close()/Dispose() than to call it too soon.
I decided that my ManagedAutoResetEvent should be disposable, but disposing it only Sets it in an irreversible way. So, when it is no longer needed, you Dispose() it, and all threads currently waiting, or any threads that decide to wait after that, will be released immediately.
So, first let's try it with a normal handle:
using System.Threading;

namespace ConsoleApplication1
{
    public static class Program
    {
        private static AutoResetEvent _are = new AutoResetEvent(false);

        static void Main(string[] args)
        {
            _are.Close();

            Thread thread1 = new Thread(_RunThread);
            thread1.Start();

            Thread thread2 = new Thread(_RunThread);
            thread2.Start();
        }

        private static void _RunThread(object state)
        {
            _are.WaitOne();
        }
    }
}
Here I disposed the AutoResetEvent before the threads had a chance to wait on it. With the normal AutoResetEvent, an ObjectDisposedException is thrown with the message "Safe handle has been closed".
But, replacing the AutoResetEvent with a ManagedAutoResetEvent, the program exits normally:
using System.Threading;
using Pfz.Threading;

namespace ConsoleApplication1
{
    public static class Program
    {
        private static ManagedAutoResetEvent _are = new ManagedAutoResetEvent();

        static void Main(string[] args)
        {
            _are.Dispose();

            Thread thread1 = new Thread(_RunThread);
            thread1.Start();

            Thread thread2 = new Thread(_RunThread);
            thread2.Start();
        }

        private static void _RunThread(object state)
        {
            _are.WaitOne();
        }
    }
}
You can also use the ManagedAutoResetEvent in the first sample and the program will exit after one second.
Why Is This Better?
Well, the best example I can think of is a component that has an AutoResetEvent and can have an unknown number of threads waiting on that event.
Disposing the component should immediately dispose all its inner components, and the AutoResetEvent is one of them. But with normal AutoResetEvents, that will leave many threads waiting forever.
Setting an AutoResetEvent is not really that useful, as it releases only one thread. But an AutoResetEvent that becomes permanently set when disposed will release all waiting threads. Surely the threads must check whether the component was disposed and handle it properly (throw an exception, return null or something similar), but then the call to Dispose() will be in the right place: the component's own Dispose(). No risk of keeping threads waiting forever, no complicated code to discover whether this is the last thread, and no waiting for garbage collection.
So, to make that possible, when disposing the ManagedAutoResetEvent I set _value and _wasDisposed and invoke PulseAll(). Then, all wait methods only reset the event if it was not disposed. The code ended up like this:
using System;
using System.Threading;

namespace Pfz.Threading
{
    public sealed class ManagedAutoResetEvent:
        IAdvancedDisposable,
        IEventWait
    {
        private readonly object _lock = new object();
        private bool _value;
        private bool _wasDisposed;

        public ManagedAutoResetEvent()
        {
        }
        public ManagedAutoResetEvent(bool initialState)
        {
            _value = initialState;
        }

        public void Dispose()
        {
            lock(_lock)
            {
                _wasDisposed = true;
                _value = true;
                Monitor.PulseAll(_lock);
            }
        }

        public bool WasDisposed
        {
            get
            {
                return _wasDisposed;
            }
        }

        public void Reset()
        {
            lock(_lock)
            {
                if (_wasDisposed)
                    return;

                _value = false;
            }
        }
        public void Set()
        {
            lock(_lock)
            {
                _value = true;
                Monitor.Pulse(_lock);
            }
        }

        public void WaitOne()
        {
            lock(_lock)
            {
                while(!_value)
                    Monitor.Wait(_lock);

                if (!_wasDisposed)
                    _value = false;
            }
        }
        public bool WaitOne(int millisecondsTimeout)
        {
            lock(_lock)
            {
                while(!_value)
                    if (!Monitor.Wait(_lock, millisecondsTimeout))
                        return false;

                if (!_wasDisposed)
                    _value = false;
            }
            return true;
        }
        public bool WaitOne(TimeSpan timeout)
        {
            lock(_lock)
            {
                while(!_value)
                    if (!Monitor.Wait(_lock, timeout))
                        return false;

                if (!_wasDisposed)
                    _value = false;
            }
            return true;
        }
    }
}
As you can see, Dispose() uses PulseAll(), and WaitOne() only resets the value if the event was not disposed.
With it, it is really safe to Dispose() the event in the Dispose() method of a class that uses it and be sure that no thread will be kept waiting forever.
More Synchronization Classes based on Monitor.Wait() and Monitor.Pulse()
I presented the equivalent of an AutoResetEvent. By changing Pulse() to PulseAll() and removing the _value = false; from the WaitOne() methods, I made a ManagedManualResetEvent. I tried to put both in the same class, but the performance was not as good, so I opted to keep two completely separate classes.
But that's not all. With the same technique, I was able to create a "real slim" Semaphore. Instead of a boolean, I use an integer as the "still available" count. When you try to Wait() (or Enter()... I really prefer the Enter name), it first checks whether the value is -1 (disposed); using the same principle, when it is disposed everyone is released. If not, it checks the available count: if it is zero, it waits; if it is more than zero, it decrements the value by one and returns.
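That description can be sketched like this (the class and member names are my guesses, not necessarily the ones in the downloadable source):

```csharp
using System.Threading;

// Sketch of a fully managed semaphore following the rules above:
// -1 means disposed (everyone is released), 0 means wait,
// a positive value is the "still available" count.
public sealed class ManagedSemaphoreSketch
{
    private readonly object _lock = new object();
    private int _availableCount;

    public ManagedSemaphoreSketch(int initialCount)
    {
        _availableCount = initialCount;
    }

    public void Enter()
    {
        lock (_lock)
        {
            while (_availableCount == 0)
                Monitor.Wait(_lock);

            if (_availableCount != -1) // not disposed: consume one slot
                _availableCount--;
        }
    }

    public void Exit()
    {
        lock (_lock)
        {
            if (_availableCount == -1)
                return; // disposed: nothing to give back

            _availableCount++;
            Monitor.Pulse(_lock); // wake a single waiting thread
        }
    }

    public void Dispose()
    {
        lock (_lock)
        {
            _availableCount = -1;
            Monitor.PulseAll(_lock); // release everyone, forever
        }
    }
}
```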
Disadvantages
Well, my code has some disadvantages. A normal ManualResetEvent or AutoResetEvent can cross app-domain boundaries; these new components can't.
But, to be honest, I only used cross-process EventWaitHandles when trying to use MemoryMappedFiles. The performance of these events is equivalent on Wait() and Set() (OK, sometimes a little slower), but they are much, much faster to create and destroy. In fact, I don't know why Microsoft created the Pulse()/PulseAll() and Wait() methods and didn't create managed versions of the WaitHandles.
Locks, SpinLocks and Optimistic Locks
The most basic synchronization is not done by WaitHandles; it is done by locks.
While a wait handle waits for something to happen (the event becoming signalled), often for very long times, locks protect resources from simultaneous access: if two threads want to do something, only one obtains the lock and the other waits.
I already used the lock keyword to create my managed WaitHandles, but that's mandatory in order to use the Wait() and Pulse() methods.
In fact, locks come in various types. There are ReaderWriterLocks, in which many readers can hold the lock at the same time, but any change requires an exclusive write lock. There are SpinLocks and so on.
I used locks for years and many times I avoided ReaderWriterLock and ReaderWriterLockSlim.
ReaderWriterLock had some bugs, and the slim version is not that slim. In many situations, I thought it was better to use full locks (the lock keyword) than read locks, as acquiring such a lock consumes a lot of CPU. So, in many cases I spent more time acquiring a read lock than I would have spent waiting for the other thread to finish its work under a full lock.
But when doing my game, I thought it was time to check the new locking mechanisms.
SpinLock - I will never use the actual implementation
SpinLock is said to be very useful when you have many fine-grained locks that are held for very short times. Its optimization is that it does not give up its processor time slice while waiting; it keeps "spinning".
So, when I saw it was good for very short lock times, I thought it would be ideal for me, but I was very disappointed.
The uncontended lock keyword is almost twice as fast as the SpinLock. That made me think the SpinLock is useless: the rare times when spinning makes it faster will be much less common than the times it is slower to acquire an uncontended lock. After all, the more fine-grained the locks, the less often two threads will try to acquire the same lock at the same time.
Even if it can still have some advantages, the locks in my game are uncontended most of the time, so using SpinLocks would only make things slower and more CPU intensive. It was by no means a solution.
But I remembered the time when I programmed for BeOS. They talked about "Benaphores" and about optimistic locking. At the time, I didn't understand all the details, but I decided to try to create my own lock using only the Interlocked class.
Interlocked
For those who don't know, the Interlocked class has atomic methods for adding, subtracting and comparing-and-exchanging values. Normally, if two threads try to add one to the same variable, there is a chance that only one addition takes effect, as both threads first read the initial value (say, 0), add 1 (getting 1) and then store 1.
Interlocked performs those operations guaranteeing that if both threads add one, the final result will be 2 and each thread will receive the result of its own operation correctly; and it is much faster than doing:
lock(someLock)
    _variable++;
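The lost-update problem just described is easy to demonstrate; in this little program (names are mine, for the example only) the plain increment may lose updates while the Interlocked one never does:

```csharp
using System;
using System.Threading;

public static class InterlockedDemo
{
    private static int _unsafeCount;
    private static int _safeCount;

    public static void Main()
    {
        ThreadStart work = () =>
        {
            for (int i = 0; i < 1000000; i++)
            {
                _unsafeCount++;                        // read-modify-write: may lose updates
                Interlocked.Increment(ref _safeCount); // atomic: never loses one
            }
        };

        var a = new Thread(work);
        var b = new Thread(work);
        a.Start(); b.Start();
        a.Join(); b.Join();

        // _safeCount is guaranteed to be 2000000;
        // _unsafeCount may end up smaller.
        Console.WriteLine(_unsafeCount);
        Console.WriteLine(_safeCount);
    }
}
```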
So, to create an exclusive lock using Interlocked, I thought about doing something like this to enter:
while(true)
{
    if (Interlocked.Increment(ref _value) == 1)
        return;

    Interlocked.Decrement(ref _value);
    Thread.Yield();
}
This is an optimistic lock because it starts by adding one and, if the result is one, it has acquired the lock and can return.
If it can't get the lock, it decrements the value and waits. In the first version I used Thread.Yield() to wait because I didn't know spinning well enough to trust it, but the actual implementation uses the SpinWait struct, as it works much better.
I did that original test expecting a bad result, but I was surprised that it outperformed even the lock keyword.
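For completeness, here is the enter loop packaged as a tiny self-contained type together with the matching exit, which the text above implies but does not show (the names are mine; releasing is just the decrement that undoes the successful increment):

```csharp
using System.Threading;

// Sketch of the optimistic exclusive lock described above.
// Note: being a struct, it must not be copied while in use.
public struct OptimisticLockSketch
{
    private int _value;

    public void Enter()
    {
        while (true)
        {
            if (Interlocked.Increment(ref _value) == 1)
                return; // we took the value from 0 to 1: lock acquired

            Interlocked.Decrement(ref _value); // undo our attempt
            Thread.Yield();                    // and let other threads run
        }
    }

    public void Exit()
    {
        // We hold the lock, so our increment is still counted;
        // decrementing gives the lock back.
        Interlocked.Decrement(ref _value);
    }
}
```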
So, I decided to go one step further and create a ReaderWriterLock.
I did that by considering that values from 0 to 65535 (the low 16 bits) are used by readers, while higher values are used by writers.
So, the EnterReadLock() method looks like this:
while(true)
{
    if (Interlocked.Increment(ref _value) <= 65535)
        return;

    Interlocked.Decrement(ref _value);
    Thread.Yield();
}
It increments the value and, if the result is below the write-lock range, there are only readers, so it can return immediately.
And the EnterWriteLock() looks like this:
while(true)
{
    if (Interlocked.Add(ref _value, 65536) == 65536)
        return;

    Interlocked.Add(ref _value, -65536);
    Thread.Yield();
}
It adds 65536 and, if the resulting value is exactly that, there are no readers and no other writer, so it has gained the exclusive lock.
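Putting the two enter loops together with their symmetric exits gives a sketch like the following (again, only a sketch under the 16-bit split described above; the real class does considerably more):

```csharp
using System.Threading;

// Sketch of the spinning reader/writer lock: readers occupy the low
// 16 bits of the counter, a writer adds 65536.
// Note: being a struct, it must not be copied while in use.
public struct SpinReaderWriterSketch
{
    private int _value;

    public void EnterReadLock()
    {
        while (true)
        {
            if (Interlocked.Increment(ref _value) <= 65535)
                return; // only readers present

            Interlocked.Decrement(ref _value); // a writer holds it: undo
            Thread.Yield();
        }
    }

    public void ExitReadLock()
    {
        Interlocked.Decrement(ref _value);
    }

    public void EnterWriteLock()
    {
        while (true)
        {
            if (Interlocked.Add(ref _value, 65536) == 65536)
                return; // no readers and no other writer

            Interlocked.Add(ref _value, -65536); // contention: undo
            Thread.Yield();
        }
    }

    public void ExitWriteLock()
    {
        Interlocked.Add(ref _value, -65536);
    }
}
```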
It worked and, when I compared it to ReaderWriterLockSlim, it was about 5 times faster (that varies with the number of CPUs, but in all my tests it was faster).
The actual class is more complete: it gives preference to the writer after some time, it does not keep incrementing/decrementing while waiting, it uses CompareExchange to do the job and it also supports an upgradeable lock. So, with it, I really replaced all my uses of ReaderWriterLockSlim.
Disadvantages
Surely my technique has disadvantages.
- The first one is the use of SpinWait. If it is going to be a long wait, it will be too processor intensive. This is partially solved by the OptimisticReaderWriterLock that is included in the sample (it is good for optimistic locks, but prepared for long waits... though it is slower on fast waits). Yet you must decide which type of lock to use at compile time, while the waits happen at run time, so if long waits do occur, the SpinReaderWriterLock(Slim) will be a bad choice;
- The second is that I use only an integer to check everything, so I don't know whether the current thread holds the lock or not, and trying to re-acquire a write lock will cause a dead-lock. There will be no exception telling you that you are doing something wrong, and recursion will never be supported;
- There is no check when releasing the lock, so you can enter a write-lock and exit a read-lock. I solved part of this problem by separating the code into two classes. The SpinReaderWriterLockSlim is the real lock and it is a struct. The SpinReaderWriterLock is a class that contains it and has methods that return disposable objects to release the lock, so the appropriate Exit is done and disposing them many times will only exit the lock once. Also, the using keyword can be used. It is slower than the SpinReaderWriterLockSlim but still faster than the .NET ReaderWriterLockSlim;
- It is not abort-safe. The lock keyword is abort-safe but the ReaderWriterLockSlim is not, so I think mine is not that bad; after all, calling Abort() is a bad practice anyway;
- When upgrading to a write lock, you can't call EnterWriteLock. You must call UpgradeToWriteLock or the Upgrade method on the object returned by the UpgradeableLock() method. This can be problematic if you are simply replacing ReaderWriterLockSlims with the SpinReaderWriterLock.
Advantages
- Until now, it has proved to be faster than ReaderWriterLockSlim. I did my tests on a single-CPU computer and on a 4-CPU computer, and in both cases the SpinReaderWriterLock was faster.
- As it uses only an integer to do the job, the object is very small.
- It does not use unmanaged resources, so you don't need to Dispose() it (in fact, it does not have a Dispose method).
SpinLockSlim
A long time after the original post, I decided to add an extra class: the SpinLockSlim. It is similar to my original test, but it uses the SpinWait struct to wait and, instead of incrementing/decrementing a count, it uses the Interlocked.CompareExchange() method. With this, there is no extra cache flush when the lock is not acquired, and it does not use any interlocked method to release the lock; it simply sets the variable to 0. For a long time I thought that was unsafe, but .NET gives a strong guarantee that all writes have "release" semantics, so setting a variable of type int when we don't need to know its previous value can be done without any volatile or Interlocked method; and as releasing an exclusive lock means we know we held the lock, there is no need to read the previous value. With this, I created a lock type that is much faster than the .NET implementation of SpinLock, and on many-core processors (which may perform badly with many consecutive locks [like in the sample]) the SpinLockSlim does a great job.
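The idea just described can be sketched like this (a sketch of the technique, not the actual class from the download; it relies on the release-semantics argument above for the plain write in Exit):

```csharp
using System.Threading;

// Sketch of the SpinLockSlim idea: CompareExchange to enter (no write
// at all while the lock is busy) and a plain write of 0 to exit.
// Note: being a struct, it must not be copied while in use.
public struct SpinLockSlimSketch
{
    private int _locked; // 0 = free, 1 = taken

    public void Enter()
    {
        // Fast path: try to move 0 -> 1 once before spinning.
        if (Interlocked.CompareExchange(ref _locked, 1, 0) == 0)
            return;

        var spinWait = new SpinWait();
        do
        {
            spinWait.SpinOnce(); // spins, then yields as pressure grows
        }
        while (Interlocked.CompareExchange(ref _locked, 1, 0) != 0);
    }

    public void Exit()
    {
        // We own the lock, so we don't need the previous value;
        // an ordinary write is enough to publish the release.
        _locked = 0;
    }
}
```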
The SpinReaderWriterLockSlim (and the old YieldReaderWriterLockSlim) do a worse job than the .NET ReaderWriterLockSlim on my current computer, even when there are only reads. Apparently the .NET implementation, which does many things and spends a lot of time acquiring the lock, gives the CPU caches a chance to work better than these implementations, which are very fast even under heavy contention; yet the SpinLockSlim does a better job in such cases (even being a full lock instead of a reader/writer lock). The SpinReaderWriterLockSlim will continue to be faster when there isn't that amount of forced concurrency (like in the sample).
Sample
In this article, you can download a sample that does speed comparisons between the locks and shows the dead-lock that can happen when using the AutoResetEvent, which does not happen with the ManagedAutoResetEvent.
But if you want a full application that uses all of those resources, see my game at this link as I decided to create all of these to improve it.
Version History
- 30 May 2014: Corrected a bug in the OptimisticReaderWriterLock class that could cause dead-locks.
- 03 May 2013: Replaced the YieldReaderWriterLock classes with the SpinReaderWriterLock classes and added the SpinLockSlim. Updated the sample.
- 23 Aug 2011: First version.