 |
|
 |
since upgrading can produce a deadlock (or its not atomic) why not try following. First aquire write lock, then check if we need write lock. If not downgrade to read lock and read data. Or don't downgrade and write data.
|
|
|
|
 |
|
 |
I believe that acquiring the write lock (even if only temporarily) when the caller wants only to read would mean that only one reader at a time would be able to acquire the lock (in order to acquire the write lock no thread must have a reader lock). That would effectively render the lock a simple critical section.
|
|
|
|
 |
|
 |
Sorry to burst your bubble, but there is a reason you can't just upgrade a reader-writer lock. If two threads call ClaimReader and then call ClaimWriter at the same time they will deadlock EVERY time (both have a reader lock and attempt to get a writer lock without releasing their reader lock--both have to wait for each other to finish before they can proceed). The .NET documentation on this is misleading (actually the name of the function itself is misleading), probably because they didn't realize this when they initially wrote it. When "upgrading", the .NET reader-write lock actually RELEASES the read lock before acquiring the writer lock--meaning that it wasn't really "upgraded" at all, it is EXACTLY the same as a call to ReleaseReaderLock followed by AcquireWriterLock, except that everyone thinks that it switched to a writer lock without releasing the reader lock first. When you call UpgradeToWriterLock in .NET, it is entirely possible that another writer acquired thew lock (and CHANGED the data you were protecting) BEFORE UpgradeToWriterLock returns. If you google for it, you'll see where the MONO guys ran into this and had to fix their implementation to match Microsoft's.
|
|
|
|
 |
|
 |
umm...
looking at his code, it only allows for an upgrade should there be a single reader which would eliminate the deadlock issue...
|
|
|
|
 |
|
 |
Yes, but look at what happens when the calling thread ISN'T the only reader. It keeps going around the loop until it is, or until it acquires the lock (which it NEVER can until it is the only reader).
|
|
|
|
 |
|
 |
Thanks for the comments. I'm testing an implementation that uses TLS and releases the reader lock when escalating from read to write lock. I though about TLS but never having used it before I was worried to open another can of worms.
My system uses so far just one reader/writer lock so the limitation on the number of TLS slots isn't an issue for me.
The execution time is approx. 4800 ms, so there's a 20% gain compared to the non-TLS version. The biggest drain on performance is probably the SetEvent()/WaitForSingleObject() calls that trap into the kernel. During the execution task manager reports that most CPU time is spent in kernel mode (approx 60% kernel and 30-35% user).
class ReadWriteLockTLSRelease {
CRITICAL_SECTION atomizer;
int numReaders;
int numWriters;
HANDLE spinEvent;
static const int WaitSpinTimeout=1000; // If the timeout exipres it's not a big deal, as the ClaimXX() function rechecks if it can claim lock
int writerThreadId;
int tlsSlot;
__forceinline int GetTLSReaderCount()
{
return (int)(INT_PTR)TlsGetValue(tlsSlot);
}
__forceinline void SetTLSReaderCount(int count)
{
TlsSetValue(tlsSlot, (LPVOID)(INT_PTR)count);
}
__forceinline void SpinThreads()
{
PulseEvent(spinEvent); // Unreliable but in case of failure there's always the wait timeout
}
inline bool CheckUpgradingFromReaderToWriter()
{
if(numReaders==0) return false;
int readerCount=GetTLSReaderCount();
if(readerCount>0) {
// exit read lock
SetTLSReaderCount(-readerCount);
numReaders-=readerCount;
return true;
}
return false;
}
inline void CheckRestorePreviousReaderLock()
{
int previous=-GetTLSReaderCount();
if(previous>0) {
SetTLSReaderCount(previous);
numReaders=previous;
}
}
public:
~ReadWriteLockTLSRelease()
{
DeleteCriticalSection(&atomizer);
CloseHandle(spinEvent);
TlsFree(tlsSlot);
}
ReadWriteLockTLSRelease()
{
InitializeCriticalSection(&atomizer);
numReaders=numWriters=0;
spinEvent=CreateEvent(NULL,TRUE,FALSE,NULL);
tlsSlot=TlsAlloc(); // The slot default value is 0 (NULL) for each thread (not clearly documented)
if(tlsSlot==TLS_OUT_OF_INDEXES) throw std::exception("Out of TLS slots");
}
void ClaimReader()
{
int myThreadId=GetCurrentThreadId();
while(true) {
EnterCriticalSection(&atomizer);
if(numWriters>0) {
// Grant read access if thread already has write access
if(myThreadId==writerThreadId) {
LeaveCriticalSection(&atomizer);
return;
}
} else //if(numWriters==0)
break;
LeaveCriticalSection(&atomizer);
WaitForSingleObject(spinEvent, WaitSpinTimeout);
}
_ASSERT(numReaders>=0);
numReaders++;
SetTLSReaderCount(GetTLSReaderCount()+1);
_ASSERT(numWriters==0||(numWriters>0&&writerThreadId==myThreadId));
LeaveCriticalSection(&atomizer);
}
void ReleaseWriter()
{
EnterCriticalSection(&atomizer);
_ASSERT(numWriters>0);
numWriters--;
CheckRestorePreviousReaderLock();
SpinThreads();
LeaveCriticalSection(&atomizer);
}
void ClaimWriter()
{
int myThreadId=GetCurrentThreadId();
while(true) {
EnterCriticalSection(&atomizer);
if(numWriters==1&&myThreadId==writerThreadId)
break; // Reentering write lock
CheckUpgradingFromReaderToWriter();
if(numReaders==0&&numWriters==0)
break;
LeaveCriticalSection(&atomizer);
WaitForSingleObject(spinEvent, WaitSpinTimeout);
}
numWriters++;
_ASSERT(numReaders==0);
writerThreadId=myThreadId;
LeaveCriticalSection(&atomizer);
return;
}
void ReleaseReader()
{
EnterCriticalSection(&atomizer);
int myThreadId=GetCurrentThreadId();
if(numWriters==0) // if numWriters>0 I am also have the writer lock
{
numReaders--;
SetTLSReaderCount(GetTLSReaderCount()-1);
_ASSERT(numReaders>=0);
_ASSERT(numWriters==0);
SpinThreads();
}
LeaveCriticalSection(&atomizer);
}
};
|
|
|
|
 |
|
 |
ok, I still don't see what the problem is... that is infact the desired behaviour. It's a single write, multiple read lock system and as such should wait until all readers have released their locks so the write can complete. This would be the same case in the .NET situation irregardless if it is checking that it is the only reader or no readers.
The check for a single reader simply allows you to not have to release the writers read-lock before it claims the write lock. Irregardles of this optimzation it would still have to claim the writelock which, by definition requires all other read-locks to have been released.
On a side note: The automatic upgrade to a readlock does indeed cause all the difficulties with his multiset and my TLS optimization to the origional code. There really is no need to do auto-upgrades. All the code need do is release the read-lock, claim the write lock and set a flag to indicate that a read-lock should be reaquired upon release of the writelock. This is the optimal solution as it would then allow others in the write-lock queue to get their turn.
|
|
|
|
 |
|
 |
This would work, but is setting a trap for users of the class. The trap is that users are still "holding" a read lock, but the data the lock is protecting can get changed anyway. The code below is an illustration of just such a problem (though it gets FAR more complicated when you have multiple nested function calls). I've used my own reader-writer locks for 10 years now, and the only solution I've come up with is to make it an error to call AcquireWriterLock when a read lock is active on the current thread (ie. throw an exception if this occurs). See the following psuedo-code:
RWLock lock;
int protectedValue;
int protectedValue2;
int function1()
{
ReadLock rl (lock);
if (protectedValue != 10)
{
return 1;
}
function2();
return protectedValue;
}
void function2()
{
WriteLock w1 (lock);
if (++protectedValue2 == 0)
{
++protectedValue;
}
}
The return value from function1 is not necessarily 1 or 10, but you could never know that without looking at the details of function2, which makes the ReadLock totally deceptive--it is only a "lock" as long as no other functions that might get a write lock are called. When you have a system with 15+ levels of function calls in it, that's effectively impossible to determine.
While this illustration isn't particularly useful code, I have run into real scenarios where this is a problem on several occasions.
|
|
|
|
 |
|
 |
Granted.
You could never guarantee that any bit of code is self consistant. Not allowing upgrades would certainly solve the stated problem. However, there are ceratinly a class of programs (state driven rather for example, rather then deeply nested) which may benefit from an upgrade ability (granted, however it shouldn't be any different, and probably more paintainable if an explicit unlock/lock was done, or at a minimum use a function with "upgrade" in it's name).
|
|
|
|
 |
|
 |
I agree, except that "upgrade" is a bad name because it is misleading. That's the term the .NET function uses and the even the internal documentation writer didn't understand it well enough to point out the trap. (The documentation doesn't mention anything about unlocking first, though the sample code would deadlock if it didn't unlock first--not that anyone who didn't fully understand the problem would notice that). The MONO guys also misunderstood what the function did and implemented it incorrectly until a tester pointed out the problem. I suspect that the MS guy who designed the API at first didn't understand the issue and by the time they figured it out, the API had already been released so they couldn't change it.
|
|
|
|
 |
|
 |
I've posted an update to the article that adds a ClaimWriterNoEscalation() method that throws an exception if the thread already has the read lock. ClaimWriterAllowEscalating() instead lets the "implicit" escalation happen and releases the read lock before waiting for the write lock. I'll try using the latter method in my application as it should be fine provided that the thread that gets the write lock knows that some other thread may have gotten hold of the lock while it was waiting for the lock escalation to take place.
|
|
|
|
 |
|
 |
The problem isn't the optimization per se, it's the entire concept of switching a reader lock to a writer lock without first releasing the reader lock. It is theoretically impossible to do such a thing without causing deadlock. Let me illustrate with a code walkthrough:
1. Let's start with threads 1 and 2 both having called ClaimReader.
2. Next, both threads call ClaimWriter.
3. Both threads get their thread id.
4. One of the threads (the winner) wins the race to get the critical section (it doesn't matter since both are exactly the same).
5. The winner thread checks to see if it is the only reader. It isn't, the loser thread is also a reader.
6. The winner thread checks to see if it is already a writer. It isn't.
7. The winner thread checks to see if there are no other active readers or writers. There are. The loser thread is an active reader.
8. The winner thread unlocks the critical section (without having changed the state in any way!).
9. The winner thread waits for the spin event.
10. At this point it is likely that the loser thread acquires the critical section, although it doesn't matter.
11. The new winner goes back to step 5.
There is no way out of the loop. Not only are the threads deadlocked, but they are consuming some CPU going around the loop over and over, which to many developers would make it appear as if deadlock has NOT occured.
|
|
|
|
 |
|
 |
Implementation issues aside there is no general solution that works in all cases and there never will be. The deadlock is avoidable but the irreconcilable contention is not. When I implemented a read/write lock some time ago, I introduced a third type of lock which was an upgradable (I called it "escalatable") read lock. The idea is that the following types of concurrency are supported
1 thread holds write lock
or
0/1 thread holds upgradable read lock, any number of threads hold plain read lock
This is incidentally exactly the way boost::shared_mutex works. The advantage is expensive read/modify/write operations need not block plain readers until they are actually performing the write. At the same time, it is guaranteed that an upgradable read lock can be converted to a write lock atomically (barring starvation), so there is no risk that the expensive operation suddenly fails half-way through because of contention.
My implementation additionally offered upgrade operations for plain read locks, but those could fail. Instead of deadlocking, however, the code would detect this condition and throw an exception in the losing (IOW late) thread. There are some corner cases where the best approach isn't obvious. For example, if a reader (thread A) tries to upgrade its lock, but there is already another thread B with a guaranteed upgradable lock, the function could fail immediately. But maybe thread B is actually going to release the lock without upgrading. In this case, it would have been better to just block A until B releases its lock. The optimum strategy depends on how probable it is for thread B to actually upgrade.
|
|
|
|
 |
|
 |
Ok... if I understand your code, what you really need to know is if the thread requesting the write lock also has a read lock. This will be requested WITHIN the thread requesting the write lock.
So the answer to your problem is actually quite simple and extremly fast.
During object creation just allocate a Thread Local Storage slot (TlsAlloc)and during ClaimReader simply do a TlsSetValue ( m_TlsReadSlot, TlsGetValue( m_TlsReadSlot) + 1 ).
During ReleaseReader do a TlsSetValue ( m_TlsReadSlot, TlsGetValue ( m_TlsReadSlot) - 1 );
Now during the ClaimWriter method you simply need to do an
if ( TlsGetValue( m_TlsReadSlot ) == numReaders )
in place of the ThereIsASingleReaderAndItIsMe() call.
That should be much, much faster then using a multiset (The TLS functions end up being only a few instructions).
steve
|
|
|
|
 |
|
 |
That works if you know you will never have more than a few reader/writer locks, but you can't count on an arbitrary number of TLS slots being available--there is a limit, it isn't very large, and if you are writing this in a DLL, you have no way of knowing how many TLS slots other DLLs in the process have used until you run out.
|
|
|
|
 |
|
 |
Alright sure, it's over 1K on windows NT systems but if you think you need more there's always other ways...
It's certainly possible to have a single TLS value and simply allocate an array per thread containing the maximum number of r/w locks you'll need.... (manage them in any number of ways)... heck, make the array a few million big if you want and just use the vm functions and don't commit the memory.
The speed increase in switching to using TLS data to indicate the threads read state rather then keeping track using a multiset with calls to new/malloc and it's OWN calls to the TLS functions is probably deserving of another rev, even if it's just a special case for people needing only a small number of r/w locks.
|
|
|
|
 |
|
 |
Yes. I believe a proper implementation could rely on getting *one* TLS slot, because that would fail when the DLL is loaded instead of at some random point later. I wouldn't advise making the arrays "a few million big" because you would very quickly run out of virtual address space. Physical memory would be fine, but there IS a limit of 2GB of virtual address space and many systems in use today are already pushing up against that boundary. If you have 50 threads using R/W locks (this is reasonable in many systems--you're using R/W locks so you must have more than 2 threads!), and used 2 million 4-byte values for each thread to track R/W lock access, that's 400MB of virtual address space which is a very significant chunk. A proper implementation would definitely have to be more complicated. If you're not worried about an implementation for general use, this type of simplified system is fine (except for the deadlock problem mentioned in my other thread).
|
|
|
|
 |
|
 |
Hello, one question: How did you create the benchmark results? Does there exist any tool for benchmarking unmanaged C++ programs? Amd Opteron
|
|
|
|
 |
|
 |
The results in the graph are just the execution times of the demo project (click on "Download demo project" at the top of the page). To test the different locks change the typedef at the bottom of ReaderWriterLock.h
Flavio.
|
|
|
|
 |