
Testing reader/writer locks

20 Jan 2009, CPOL
This article describes testing the results of reader/writer locks on Windows XP/Vista Uniprocessor and Multi-core PCs.

Introduction

Various homegrown reader/writer lock objects have been suggested by developers striving to provide better performance than the existing API constructs offer. Recently, Microsoft introduced the SRWLOCK, a new built-in reader/writer primitive for Windows Vista.

Existing API synchronization constructs provide simple and fast access to a shared resource. On the other hand, API primitives (for example, a critical section) might not work well in scenarios where there is high contention on the lock. Lock convoy is not the only problem here. If a writer competes with multiple readers, the performance of a particular reader is not guaranteed and can vary widely, to the extent that its minimum performance may be unacceptable for the application's requirements. As a rule, custom reader/writer lock objects are slower than API constructs. However, the benefit of a custom lock is that a hand-coded object may provide reasonable performance for all reader and writer threads. We will be looking for a solution that compromises between maximum performance of multiple readers and acceptable performance of a writer.

The test results are summarized in Figure 1 and Figure 6. One writer and four readers concurrently accessed a shared resource. The writer thread acquired an exclusive lock, filled the shared buffer with random bytes, and calculated the CRC. Each reader thread acquired a shared lock, calculated the CRC, and compared it with the CRC saved by the writer (an exception was to be thrown on a CRC error, which would indicate an incorrect lock implementation). The following reader/writer lock objects were tested:

  • “CSRW”: the simplest implementation, using the Windows API critical section.
  • “RWFavorNeither” and “RWLockFavorWriters”: taken from the MSDN Magazine article “CONCURRENCY: Synchronization Primitives New To Windows Vista” by Robert Saccone and Alexander Taskov (http://msdn.microsoft.com/en-us/magazine/cc163405.aspx).
  • “Ruediger_Asche_RWLock”: taken from the MSDN article “Compound Win32 Synchronization Objects” by Ruediger R. Asche (http://msdn.microsoft.com/en-us/library/ms810427.aspx); this lock implements the “CRWLock” compound object.
  • “Jim_B_Robert_Wiener_RWLock”: taken from “Multithreading Applications in Win32: The Complete Guide to Threads” by Jim Beveridge and Robert Wiener.
  • SRWLOCK: the new Windows Vista API user-space slim reader/writer lock.

Figure 1. Testing one writer and four readers on a Uniprocessor PC, AMD Sempron(tm) 3000+, 2.00 GHz, 32-bit Windows XP Home Edition SP3, compiled in VC++ 2005 Express. Duration per operation in microseconds and the coefficient of variation, %.
Without locking

reader: t=0.213 (s=0.85%)
writer: t=3.234 (s=0.46%)

Critical section CSRW lock

reader: t1=41586.02 (s1=161.18%) t2=0.974 (s2=40.32%) 
writer: t1=83.78 (s1=202.08%) t2=6.938 (s2=0.69%)

RWFavorNeither lock

reader: t1=1.166 (s1=9.18%) t2=1.09 (s2=7.05%) 
writer: t1=230.71 (s1=118.71%) t2=7.741 (s2=0.86%) 

RWLockFavorWriters lock

reader: t1=1.553 (s1=19.52%) t2=1.431 (s2=10.41%) 
writer: t1=28.7240 (s1=11.4%) t2=8.623 (s2=1.86%)

Jim_B_Robert_Wiener_RWLock

reader: t1=30.202 (s1=8.05%) t2=9.973 (s2=1.26%) 
writer: t1=6.303 (s1=6.84%) t2=6.105 (s2=2.68%) 

Ruediger_Asche_RWLock

reader: t1=2.605 (s1=6.84%) t2=2.444 (s2=1.53%) 
writer: t1=4484987.15 (s1=76.96%) t2=14.438 (s2=1.22%)

The average durations per read/write operation “t”, “t1”, and “t2” (in microseconds) and the coefficients of variation “s”, “s1”, and “s2” (in %) were calculated over 50 tests; each test included 1,000,000 iterations for the writer and 4,000,000 iterations for each reader. Comparing the “t” value (duration per operation without acquiring locks) with the “t1” and “t2” values (durations with thread concurrency present) shows the performance degradation due to contention on the locks. The average duration “t2” was measured from the start to the finish of the worker threads: after all the threads had completed the given number of iterations, the overall duration per read/write operation was calculated. The following pseudo-code illustrates calculating “t2”:

Figure 2. Calculating average duration per operation.
worker_thread_procedure()
{
   ...
   worker.m_perf_counter.start();
   for (int i = 0; i < test_iteration; i++)
   {
      if (!worker.simulate_work())
      {
         // A CRC mismatch indicates an incorrect lock implementation.
         ::RaiseException(STATUS_NONCONTINUABLE_EXCEPTION, 0, 0, NULL);
      }
   }
   worker.m_perf_counter.end();
   worker.m_perf_counter.set_iteration_done(test_iteration);
}

The durations “t2” in Figure 1 show that there is not much difference in the locks' performance. Although using “t2” as the criterion may appear simple and reasonable at first glance, it has several problems. If many reader threads are holding a shared lock, a writer can be completely locked out of acquiring an exclusive lock because of the race for the lock. Being almost blocked, and therefore performing poorly at the beginning, the writer may still show “fine” overall performance at the end of the test, after the reader threads have completed the given “test_iteration”. The “Ruediger_Asche_RWLock” object is an example of a lock that behaves this way: by the time one of the four readers has completed all 4,000,000 iterations, the writer may have completed only 1-7 of its 1,000,000 iterations; see Figure 3:

Figure 3. Individual results of testing the “Ruediger_Asche_RWLock” object. The number of iterations done and the duration per operation in milliseconds.
reader thread 1: n=3999973 (t=0.0025) writer thread: n=7 (t=1433.846) 
reader thread 2: n=4000000 (t=0.0025) 
reader thread 3: n=3999993 (t=0.0025) 
reader thread 4: n=3999901 (t=0.0025)

When all the readers have finished, the writer unblocks and quickly completes the remaining 999,993 iterations. Although the “overall” performance of the “Ruediger_Asche_RWLock” object may appear acceptable (t2=14.438 microseconds per write operation), it is not: note the duration t1=4484987.15 microseconds per write operation in Figure 1. The estimate “t1” is similar to “t2”, but it is calculated over a short period at the beginning of the test, while high contention on the shared/exclusive locks exists. The pseudo-code below illustrates calculating “t1”:

Figure 4. Calculating average duration per operation when all the threads work concurrently.
worker_thread_procedure()
{
   ...
   worker.m_perf_counter.start();
   int i;
   for (i = 0; i < test_iteration; i++)
   {
      if (!worker.simulate_work())
      {
         ::RaiseException(STATUS_NONCONTINUABLE_EXCEPTION, 0, 0, NULL);
      }
      if (g_stop_test)
      {
         break;   // the first thread to finish stops all the others
      }
   }
   worker.m_perf_counter.end();
   ::InterlockedExchange(&g_stop_test, TRUE);
   worker.m_perf_counter.set_iteration_done(i);
}

“t1” and “s1” show the poor performance of the critical section object “CSRW” on Windows XP. Several reader threads can be almost “blocked” under high-contention conditions: while one reader thread completed 4,000,000 iterations, the others may have completed only 6-48. Figure 5 illustrates the deviation in “CSRW” performance across individual threads and tests:

Figure 5. Individual test results when testing the “CSRW” object. The number of iterations done and the duration per operation in milliseconds.
...
test1:

reader thread 1: n=138302 (t=0.013412) writer thread: n= 4151 (t=0.449985) 
reader thread 2: n=4000000 (t=0.000464) 
reader thread 3: n=69812 (t=0.026571) 
reader thread 4: n=3513004 (t=0.000492)

test2:

reader thread 1: n=30 (t=64.038158) writer thread: n= 294219 (t=0.006664)
reader thread 2: n=30 (t=64.035178) 
reader thread 3: n=30 (t=64.032356) 
reader thread 4: n=4000000 (t=0.000480) 
...

Although some reader threads are very fast (0.464-0.492 microseconds per operation), the average reading cost is extremely high (t1=41586.02 microseconds). The high coefficient of variation, s1=161.18%, indicates that the performance of individual reader threads varies to a large extent; it is difficult for the CSRW to guarantee the performance of all the readers.

Even with the new Windows Vista SRWLOCK object, high contention may have a very negative impact on performance. If one writer competes with many readers running on a Multi-core PC, it can be quite difficult for the writer to acquire an exclusive lock. Figure 6 shows poor SRWLOCK writer performance, t1=186.171 microseconds, with a quite high coefficient of variation, s1=25.17%, on a Quad-core PC. The writer could complete only 5,000-30,000 iterations (of 1,000,000) by the time the readers had almost finished all 4,000,000 iterations. When the number of concurrent reader threads changes, SRWLOCK writer performance may vary to a great extent. If the number of readers increases from 2 to 4, most of the lock objects show a performance regression of 40% to 100%, but the SRWLOCK does not: its writer performance degrades from 7.1 to 186.2 microseconds (~26 times).

Figure 6. Testing one writer and four readers on a Quad-CPU PC, Intel Q6700, 2.66 GHz, Hyper-Threading disabled, 64-bit Windows Vista Home Premium SP1, compiled in VC++ 2008 Express. Duration per operation in microseconds and the coefficient of variation, %.
Without locking

reader: t=0.106 (s=1.45%)
writer: t=1.416 (s=0.79%)

Critical section CSRW lock

reader: t1=19.337 (s1=48.99%) t2=1.322 (s2=5.65%) 
writer: t1=2.018 (s1=3.17%) t2=2.194 (s2=4.71%) 

Windows Vista SRWLOCK lock

reader: t1=0.991 (s1=11.77%) t2=0.881 (s2=8.2%) 
writer: t1=186.171 (s1=25.17%) t2=5.204 (s2=1.31%) 

RWFavorNeither lock

reader: t1=1.274 (s1=6.59%) t2=1.381 (s2=3.07%) 
writer: t1=72.13 (s1=5.31%) t2=6.981 (s2=1.64%) 

RWLockFavorWriters lock

reader: t1=6.311 (s1=4.83%) t2=5.511 (s2=3.48%) 
writer: t1=21.948 (s1=4.34%) t2=21.127 (s2=5.06%) 

Jim_B_Robert_Wiener_RWLock

reader: t1=61.688 (s1=1.03%) t2=59.166 (s2=0.59%) 
writer: t1=84.911 (s1=9.15%) t2=85.005 (s2=9.6%) 

Ruediger_Asche_RWLock

reader: t1=1.755 (s1=11.69%) t2=1.803 (s2=7.95%) 
writer: t1=2107697.58 (s1=108.77%) t2=10.607 (s2=5.17%)

The decision on which reader/writer lock to employ depends on the hardware and software architecture and on the particular application scenario. However, the test results show that a likely candidate is the “RWLockFavorWriters” object. The cost of its locks is relatively low (1.6-6.3 microseconds for a reader and 22-29 microseconds for a writer, respectively). Unlike the others, the “RWLockFavorWriters” object performs consistently across different hardware/software platforms and test scenarios. If a writer competes with two readers (instead of the four readers in Figure 1 and Figure 6), both the readers and the writer perform ~30-60% faster on the Uniprocessor and Quad-core PCs, while the coefficient of variation stays relatively low (the same improvement for the “RWFavorNeither” object raises its coefficient of variation to 60-70%). On the other hand, depending on the number of concurrent reader threads, the kernel CPU load can be high (~10-40%), and a large number of thread context switches per second (~150,000-200,000) can cause additional overhead. In contrast to the custom locks, the new user-space Windows slim lock performs very well, loading the kernel CPU at ~0% and generating a small number of context switches per second (~10-1,000). The SRWLOCK object seems to be the best reader/writer lock to employ if the writer performance is acceptable and scalability with respect to an increasing number of reader threads is considered; as the number of reader threads grows, high contention on the lock may negatively impact the SRWLOCK writer performance.

Conclusion

  • Using the Windows API critical section (the CSRW lock) as a reader/writer lock can be a disadvantage in scenarios with multiple readers and high contention on the lock.
  • Windows Vista SRWLOCK seems to be the best object to employ, provided the potential regression of writer performance with an increasing number of readers is acceptable.
  • The “RWLockFavorWriters” lock object is a high-performance, robust implementation of a reader/writer lock for a wide range of application scenarios, performing consistently on both Uniprocessor and Multi-core platforms and giving reasonable parity between reader and writer performance.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Valery Grebnev
Software Developer
Canada Canada

Comments and Discussions

 
One Question to locking (KarstenK, 21-Jan-09; replies by Luc Pattyn and Valery Grebnev)

KarstenK wrote:
I have to write a logfile which I may* access from different threads. I have locked the writing via a critical section. Is this the best solution, or would you prefer another one?

*It isn't the standard case, but it happens (and crashed).

Re: One Question to locking (Valery Grebnev, 22-Jan-09)

The article was about lock behavior under high contention (for instance, when multiple threads issue more than 30-60 read/write requests per millisecond). If reads and writes are infrequent, the critical section API primitive may be the best (simple and fast) object to employ. I would say the same about your “logger” application: if you log messages infrequently, say once per 10-100 milliseconds, the critical section would be OK. If more (or superior) performance is needed, we may consider decreasing contention on the locks, or a lock-free solution, using fairly standard and simple techniques:
 
- Queue all read/write requests to a dedicated IO thread.
- If performance is still not good, try a queue of ring buffers, or a double-buffering technique using ring buffers; a pool of worker IO threads might be used.
- If performance is still insufficient, try a lock-free solution for multiple threads using file mapping into shared memory (no dedicated IO thread or thread pool is needed in this case).
 
As an alternative, you may try some non-standard solutions, for example Oracle Berkeley DB (which is free and allows you to easily issue more than 30 concurrent IO requests per millisecond).



Last Updated 21 Jan 2009
Article Copyright 2009 by Valery Grebnev
Everything else Copyright © CodeProject, 1999-2014