 |
|
 |
Hi, I get this error message when I try and compile this thread code. I wanted to test it out and see how it worked but i can't get it to compile.
XYCriticalSection.cpp:7: error: cannot convert 'void**' to 'long int*' for argument '1' to 'LONG InterlockedCompareExchange(long int*, long int, long int)'
this is the error I get,
Cupcakes make me smile
|
|
|
|
 |
|
|
 |
|
 |
Greetings,
Thanks for an excellent article on creating thread-safe programs. I am in the process of converting a large, existing code-base to be thread-safe.
Right now, my issue is dealing with a large volume of global and static variables. Do you have any hints or tips on how to deal with these so that they are thread-safe? I've looked at TLS (Thread Local Storage) in windows, but it only seems to work with simple data types.
All help is appreciated.
-Dharmesh
|
|
|
|
 |
|
 |
I am not familiar with TLS so can't give you any help on that.
To make the global and static variables thread-safe, the straight forword way is associate each variable with a lock (critical section). Always set the lock before accessing the variable and release the lock after you are done. However, things can be more complicated than this, the following are just some common sense when programming in a multithreaded environment:
(a) You threads may have to access several global resources at once. You have to watch for deadlock in this case. If you arrange all the global resources in a certain order and always lock resources in the same order in your code (always lock resource A before locking resource B, for example), you can prevent deadlock. Avoid nested critical sections if possible.
(b) Besides global and static variables, you have to make other shared resources thread-safe, this includes files and db connections, etc.
(c) You don't need to protect those resources that will never be accessed by multiple threads simultaneously.
(d) Be very careful with error handling. For example, you have code that locks and releases a resource, but an error within the critical section could cause the code to skip the line that releases the resource. This will make all your other threads block. You need to catch and handle all errors occurred within the critical section to make sure the locked resource is released propertly.
Hope this helps.
My articles and software tools
|
|
|
|
 |
|
 |
I need some help making a program that will change seconds into hours, min and second. like 12500 = 3 hours 28 min and 20 second help
|
|
|
|
 |
|
 |
int nTotalSeconds = 12500;
int nHours = nTotalSeconds/3600; // = 3
int nMinutes = (nTotalSeconds%3600)/60; // = 28
int nSeconds = nTotalSeconds%60; // = 20
My articles and software tools
|
|
|
|
 |
|
 |
a question to me:
how can i face "m_cs" in "Class SafeObj" when making a "copy construct" function.
Should i do the same as that?:
SafeObj::SafeObj(const SafeObj& obj)
{
...
m_cs = obj.m_cs;
...
}
thank your code!
|
|
|
|
 |
|
 |
I think the approaches being used to compare this spinlock against a critical section are a bit flawed.
First, you should test how fast each can lock and unlock the lock without and any chance of a contention. Thus, just test the two in a tight loop in the main thread. Maybe 1,000,000 iterations would be enough.
Then, you should look at how each performs when it is unable to get the lock first try.
With the critical section, you are looking at a mostly constant 600 instructions as per the MSJ information. When the lock becomes available you get immediate placement of the thread on the queue for scheduling. Critical sections use a synchronization event that will only wake one waiting thread at a time, thus no wasted CPU on a race condition when the event is set and more than one thread is waiting. This event isn't created unless a thread has to wait on the critical section.
With the spinlock, it takes an unknown amount of CPU time to place the thread in a timer state. Then every 50ms the thread will be scheduled but getting the lock isn't guaranteed. That is just wasted CPU time. What is worse is that it forces context switches to test the availability of the lock. The more threads you have waiting for the lock, the more interruptions to the locking thread thus extending the amount of time the thread has the lock. When the spinlock is released, there is an average of a 25ms dead time where the lock is available, threads are waiting, but no thread takes advantage of it. Since most spinlocks or critical sections are used for short term locks, the average delay is probably closer to 50ms.
So, when you compare the two you get:
1. Where there is no lock contention, both systems perform about the same.
2. Where there is contention:
2.a. The critical sections has no extra overhead other than the event and any transient kernel structure required to place the thread in a waiting state. The critical section wakes a thread only when there is a very high chance the lock will be available. When a lock is made available, a thread is immediately scheduled to acquire the lock.
2.b. The spinlock only has the extra overhead of the kernel timer for the sleep. Threads are periodically wakened, thus forcing a context switch with no reasonable guarantee that the lock will be available. When a lock is made available, no thread is scheduled to acquire the lock. This effectively makes the lock unavailable for an average of 25ms unless another thread requests the lock between the time it is released and a waiting thread wakes to test the lock.
The big problem here is that you are looking at CPU time which is totally flawed (but I have yet to see any real solid proof that a critical section takes more CPU time.) When you look at CPU utilization, the spinlock system is inferior due to the extended waits when there is a contention.
Tim Smith
Descartes Systems Sciences, Inc.
|
|
|
|
 |
|
 |
I have done some of the tests you suggested but still can't come to a clear conclusion.
If there is no contention, my spin lock just does an InterlockedCompareExchange which is reasanbly fast (equal or better than EnterCriticalSection).
When there is extremely large number of threads contending for the same lock, despite all the theory and background provided by you and others (thanks, btw), I still don't see a clear disadvantage of my spin lock (in my test results). I think I know why but haven't really verify it:
Let's say there are 2000 threads waiting to acqure the spin lock. You would believe that every 50 ms the CPU has to do something for each of these threads therefore wasting a high percentage of cycles. But this is probably not the case: if the CPU is too busy, it won't switch to each thread to execute the Sleep(50) statement (at least not every 50 ms) therefore it won't waste a lot of cycles on these threads.
This discussion has gone far beyond my knowledge of OS and C++, so I better stop here.
|
|
|
|
 |
|
 |
Ok, I think I finally have come up with a test I like. This test doesn't test how much CPU time it takes. To be perfectly honest, that is very hard to test. What this test does it see how long it takes for n threads to lock the lock m times while holding the lock for s milliseconds during each lock.
Here is the code:
#include <stdio.h>
#include <process.h>
#include "XYLock.h"
#define MAX_THREADS 10
#define MAX_LOCKS 100
#define LOCK_WAIT 45
#define USE_CS 0
CRITICAL_SECTION cs;
XYCriticalSection myCriticalSection;
unsigned __stdcall ThreadProc( void *dummy )
{
for (int i = 0; i < MAX_LOCKS; i++)
{
#if USE_CS
::EnterCriticalSection (&cs);
#else
XYLock myLock(&myCriticalSection);
#endif
::Sleep (LOCK_WAIT);
#if USE_CS
::LeaveCriticalSection (&cs);
#endif
}
return 0;
}
void main()
{
HANDLE ahHandles [MAX_THREADS];
::InitializeCriticalSection (&cs);
for(int i=0;i<MAX_THREADS;i++)
{
ahHandles [i] = (HANDLE) _beginthreadex (NULL, 0, ThreadProc, NULL, CREATE_SUSPENDED, NULL);
if ((int) ahHandles [i] == -1)
{
printf("Failed to create a thread\n");
return;
}
}
DWORD dwStart = ::GetTickCount ();
for (i = 0; i < MAX_THREADS; i++)
{
::ResumeThread (ahHandles [i]);
}
for (i = 0; i < MAX_THREADS; i++)
{
::WaitForSingleObject (ahHandles [i], INFINITE);
}
DWORD dwEnd = ::GetTickCount ();
printf ("%d\r\n", dwEnd - dwStart);
}
Results:
Test 1, 100 threads, 100 locks each, 10 ms wait
Spinlock: 103449ms
CriticalSection: 100144ms
Test 2, 10 threads, 100 locks each, 100 ms wait
Spinlock: 100194ms
CriticalSection: 100144ms
Test 3, 10 threads, 100 locks each, 45 ms wait
Spinlock: 50072ms
CriticalSection: 500072ms
In these three test cases the spinlock is either slower or just as fast as the critical section. The reason being that the spinlock wastes time in the sleep while the critical section wastes no time between an unlock and a new lock.
Please don't take this as trashing the article. The only thing I am taking issue with is that the spinlock is faster than the critical section. Other than that I think it is informative to see a spinlock implementation.
Tim Smith
Descartes Systems Sciences, Inc.
|
|
|
|
 |
|
 |
Ok, this is a result I was not expecting at ALL. My best guess was that your spinlock would work well when the lock wasn't being held for long at all, so I ran the following test.
Test 4, 100 threads, 100 locks each, 0 ms wait
Spinlock: 4957ms
CriticalSection: 81ms
I guess what this shows is that even with just 100 threads and the lock not being held for any length of time, there are a fair number of contentions. Looking at the data, maybe 1 in 100.
Tim Smith
Descartes Systems Sciences, Inc.
|
|
|
|
 |
|
 |
Thanks for providing the test result. I used your code unmodified on my workstation (NT 4.0, single CPU, 350 mhz), here is the comparison (the numbers in parentheses are mine):
Results:
Test 1, 100 threads, 100 locks each, 10 ms wait
Spinlock: 103449ms (102417ms)
CriticalSection: 100144ms (105101ms)
Test 2, 10 threads, 100 locks each, 100 ms wait
Spinlock: 100194ms (100194ms)
CriticalSection: 100144ms (100194ms)
Test 3, 10 threads, 100 locks each, 45 ms wait
Spinlock: 50072ms (50072ms)
CriticalSection: 500072ms (50072ms)
Test 4, 100 threads, 100 locks each, 0 ms wait
Spinlock: 4957ms (4957ms)
CriticalSection: 81ms (2503ms to 4968ms)
Except "Test 1" and "Test 4", our results are identical. But I wouldn't conclude that spin lock is slightly faster in test case 1, it is definitely slower in test case 4.
I think this kind of test is also flawed and unrealistic just like my previous tests. For example, in test case 1, if you watch the thread count in the task manager, you will see that the total number of threads for this process goes down by 1 every second. It takes almost exactly 1 second to execute the whole loop in the "ThreadProc" function (100*10 ms = 1 second). That means, the code was executed by the CPU almost sequentially (first thread 1, then thread 2, then thread 3, ... etc.)!
P.S.
Tim wrote:
"Ok, I think I finally have come up with a test I like."
Tell me, how many test results you throw away before finding the one you like? Sorry, I am just kidding. I really appreciate your effort and I learned a lot from your posts, not limited to the ones for this article.
|
|
|
|
 |
|
 |
I think this kind of test is also flawed and unrealistic just like my previous tests. For example, in test case 1, if you watch the thread count in the task manager, you will see that the total number of threads for this process goes down by 1 every second. It takes almost exactly 1 second to execute the whole loop in the "ThreadProc" function (100*10 ms = 1 second). That means, the code was executed by the CPU almost sequentially (first thread 1, then thread 2, then thread 3, ... etc.)!
Well, I wouldn't say the test was flawed then, I would say the spinlock is flawed. In the case of CPU bound processes, the spinlock fails where the critical section works just fine. In the case of the critical section, the thread count remained 100 until the very end.
Tell me, how many test results you throw away before finding the one you like?
None. The test was specifically designed to expose the problems of the spinlock (poor utilization of an unlocked lock while threads are waiting). In fact, there is no way the spinlock could ever perform better than a critical section with this test.
But, this test models real world reasonably well. It tests how long it takes for n threads to do m units of work. About the only thing you can add is another wait after the lock is unlocked. Maybe then the spinlock would reliably operate as fast as the critical section. But the real problem is that the spinlock isn't a good all around solution. It fails with CPU bound threads. But even with non CPU bound threads, the spinlock can never perform better than the critical section.
Tim Smith
Descartes Systems Sciences, Inc.
|
|
|
|
 |
|
 |
"In the case of the critical section, the thread count remained 100 until the very end."
This is definitely NOT what I have observed: In Test 1 the thread count goes down by 1 every second, the behavior is very consistent. If you are talking about Test 4 then you can't really see anything from the task manager because it happens so fast.
"About the only thing you can add is another wait after the lock is unlocked. Maybe then the spinlock would reliably operate as fast as the critical section. But the real problem is that the spinlock isn't a good all around solution."
I did exactly that, the spin lock performs slightly better but the difference is so small (less then 2%) that I don't think it is fair to reach any conclusion from it.
Even if we look at your result only, it is not conclusive at all. I can easily run the same code to get opposite result (while spin lock is slightly better) in Test 1, 2, 3 (I can't do that for Test4, I think it is too extreme anyway ).
Thanks for the enlightening discussions.
|
|
|
|
 |
|
 |
I don't know what to tell you. Every system I have tried the code on show that the critical section is always faster and fair to the threads. It makes me wonder what OS you are using or if it has been damaged or strangely configured.
I also noticed that the spinlock version cause significantly more context switches than the critical section version.
Tim Smith
Descartes Systems Sciences, Inc.
|
|
|
|
 |
|
 |
I don't think you can make the conclusion you did about sequential execution. Calls to Sleep() force a context switch (i.e. you're nearly gauranteed another thread will be given CPU time), which means that there's definately not a sequential ordering here. You can actually easily prove this by putting in a printf command that prints the thread id in the loop.
William E. Kempf
|
|
|
|
 |
|
 |
"You can actually easily prove this by putting in a printf command that prints the thread id in the loop."
Thanks for the suggestion. I added a printf statement to output the current thread id, it proved my conclusion!
The problem is, the call to Sleep() is within the critical section so other threads can't get the lock anyway. Even if I put a call to Sleep(0) outside the critical section, things are not changing too much (still sequential most of the times). To make a real difference, I have to increase the sleep time.
|
|
|
|
 |
|
 |
The bigger problem than spinning or CPU time is that once threads request the lock, they aren't ever guaranteed to receive it.
Example:
Thread A takes the lock and enters the critical section. Thread B tries to enter the critical section, but fails, so sleeps for 50ms. During that time, thread A releases the lock, loops around, requests, and re-acquires it. Thread B wakes up, checks the lock, and goes back to sleep. If this continues, Thread B will rarely (or never) gain access to the critical section (starve).
Using kernel synchronization routines prevents this from happening - but it's of course not impossible to accomplish in software.
Good try, though! I appreciate your desire to avoid consuming kernel resources. Although, I think that Windows critical sections store their data in the local address space, so they don't consume much kernel memory anyway.
- James
|
|
|
|
 |
|
|
 |
|
 |
Don't take it too badly, though - the first time I tried to code something 'thread safe', I used this:
void CriticalFunction()
{
static bool IsBusy = false;
while(IsBusy);
IsBusy = true;
...
IsBusy = false;
}
and then wondered why it didn't work.
- James
|
|
|
|
 |
|
 |
Doh, I just posted a message pointing this out too.
Tim Smith
Descartes Systems Sciences, Inc.
|
|
|
|
 |
|
 |
Being a man known as too "argumentative" and "defensive", I did the following tests to justify my reputation.
First I modified my test program and increased the thread count to 2000 (which is the most I can create on my NT 4.0 workstation), the CPU usage jumped to 100% as I expected.
Then I realized that it's the "printf" statement that was burning all the CPU. So I commented out the "printf" statement and retested the program: the CPU usage was down to about 50%, except in the very beginning when all the threads were created.
Finally, I changed the code again to use the system functions (EnterCriticalSection, LeaveCriticalSection, etc.) To my surprise, the CPU usage jumped to 100% again.
With my implementation, the CPU usage is about 50%, but NEVER 100%. With the system functions, the CPU usage is ALWAYS 100%!
So I don't care about what others say, I am going to trust my own eyes on this one.
Here are the exact code used in the tests.
/********** test 1 ************/
#include < stdio.h >
#include < process.h >
#include "XYLock.h"
XYCriticalSection myCriticalSection;
void ThreadProc( void *dummy )
{
while(true)
{
{
XYLock myLock(&myCriticalSection);
try
{
throw "a test";
}
catch(...)
{
}
}
::Sleep(10);
}
}
void main()
{
for(int i=0;i<2000;i++)
{
if(_beginthread(ThreadProc,0,NULL )==-1)
{
return;
}
}
while(true)
{
{
XYLock myLock(&myCriticalSection);
try
{
throw "a test";
}
catch(...)
{
}
}
::Sleep(10);
}
}
/************ test 2 ****************/
#include < stdio.h >
#include < process.h >
#include < windows.h >
CRITICAL_SECTION cs;
void ThreadProc( void *dummy )
{
while(true)
{
{
EnterCriticalSection(&cs);
try
{
throw "a test";
}
catch(...)
{
}
LeaveCriticalSection(&cs);
}
::Sleep(10);
}
}
void main()
{
InitializeCriticalSection(&cs);
for(int i=0;i<2000;i++)
{
if(_beginthread(ThreadProc,0,NULL )==-1)
{
return;
}
}
while(true)
{
{
EnterCriticalSection(&cs);
try
{
throw "a test";
}
catch(...)
{
}
LeaveCriticalSection(&cs);
}
::Sleep(10);
}
DeleteCriticalSection(&cs);
}
/*************** end of code ****************/
|
|
|
|
 |
|
 |
I just did your test and found no differences between the two.
Given that threads are excessively delayed by 50ms in your spinlock, it degrades the overall application performance. Also, the 50ms delay could starve a thread of CPU time. Threads should be serviced on a first come first serve basis. In theory, a thread might never get the lock if the lock is being heavily used and is never available when the lock is released by another thread. Of course, this is a known problem with spinlocks. However, spinlocks are primarily designed for a specific application where the lock isn't going to be held for long and thus having multiple people waiting on the lock is rare.
Also, ::Sleep requires system resources. Since NT has origins in VMS, it would be a safe bet that sleeps are just implemented as kernel timer objects. These require system resources. Critical sections could very easily be implemented so that they do not use system resources unless a thread is waiting. They could also support first come first serve lock grants. They would also lack any arbitrary delay in scheduling a new thread of execution once the lock is released.
Now I am not saying your spinlock is bad. A spinlock is a spinlock is a spinlock. But I have yet to see any proof that your spinlock is better. It doesn't seem to be faster. An the 50ms delay is a killer.
Tim Smith
Descartes Systems Sciences, Inc.
|
|
|
|
 |
|
 |
Tim wrote:
"I just did your test and found no differences between the two."
Are you sure about that? Please note that you have to test using the code in my comment, not the code in the downloaded zip file (it has too much "sleep" to erase the difference). On my machine the testing results are very consistent: my method uses about 50% CPU while using the system functions will always consume 100%.
Tim wrote:
"And the 50ms delay is a killer."
The 50ms delay happens only when the critical section is already in use and you can even change the constructor to pass in the value of your choice.
For a "light weight" solution, I think my method is ok most of times. But you made good points about "first come first serve".
|
|
|
|
 |
|
 |
Actually, FIFO isn't always the best behavior for locking, though it does gaurantee fairness. However, FIFO isn't so easy to implement if you keep thread priorities in mind. Long and short of it is that these things are hard to implement correctly and most of us are better off leaving the implementation to the system developers.
As for your 50% vs. 100% findings... it's easy to explain and actually points out design flaws in your implementation, believe it or not. The critical section is a unique beast that's a combination of a classic spinlock (basically your design but with out any sleep) and a more robust system object such as the MUTEX. EnterCriticalSection() first spins on the lock for a given number of iterations (you can actually control this with InitializeCriticalSectionAndSpinCount() to some extent, though the count is forced to 0 on single processor machines, which is a good thing) and then waits on the system object. The spin allows for fast acquisition if no one currently owns the lock, while the wait is performed in a manner gauranteed to be both efficient (as in no CPU time is wasted) and correct. The reason that the CriticalSection test jumps to 100% utilization is that the period of time in which the lock is held is small enough that many of these threads will be in the spin section at the same time (which takes CPU). In your implementation these same threads are much more likely to be held in the Sleep(), where no CPU is being used. However, they have to try to acquire the lock again when they wake up, so in fact the CriticalSection performs faster then your implementation despite the fact that your exagerated test makes it appear that too much CPU is used.
A spin lock is good when there are few threads that will fight for the lock and each will retain it for a very short period of time. A CriticalSection is good when there's a moderate number of threads that will fight for the lock, with no issues with the length of ownership. A MUTEX is good when there will be a large number of threads that will fight for the lock (like in your test case). Note that there are other times when a MUTEX would be preferred... such as when you want to share the synchronization object across process boundaries.
William E. Kempf
|
|
|
|
 |
|