Introduction
Deadlocks are a common problem in multithreaded programming, and the most frequent source of trouble is the critical section. Code often needs to hold more than one lock at a time, and if you do not pay attention to the order in which the locks are taken, or to the context in which they are taken (e.g., from within a callback), a deadlock can form. (Deadlocks can also arise in other ways, e.g., two threads each waiting for the other to signal an event, but those cases are not discussed here.)
As with anything related to threads, timing is everything. The most problematic deadlocks are those that occur rarely; they have an amazing tendency to occur at your client's site...
What if we could make the rare case the normal case? Recall that a potential deadlock usually goes unnoticed because the two threads that might deadlock happen not to be at the problematic places at the same time. So, all we have to do is record their "visits" to the problematic places and verify that the lock order is always the same. If it is not, we output the stack traces and notify the developer that we found a mismatch in the lock order.
The attached ZIP file contains a DLL that does exactly that. The DLL intercepts the common monitor calls (Enter, Exit, and TryEnter, including, of course, the .NET lock keyword; it can easily be extended to support others if you use them) and keeps track of the lock order. Once it finds a problem in the order, it creates two stack traces and directs you to a sample of the problematic locks. After fixing the error, repeat the test to see whether another flow has a problem with the detected locks.
Note that the deadlock does not need to actually occur; it is only important that the suspected flows (or all flows) are executed at least once.
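The bookkeeping described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code inside the attached DLL: the class name LockOrderTracker and its OnAcquired/OnReleased methods are hypothetical, and the sketch only covers the direct two-lock case.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical sketch; NOT the implementation inside slockimp.dll.
public static class LockOrderTracker
{
    // Compares lock objects by identity, ignoring any Equals override.
    private sealed class RefComparer : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object a, object b) { return ReferenceEquals(a, b); }
        int IEqualityComparer<object>.GetHashCode(object o) { return RuntimeHelpers.GetHashCode(o); }
    }

    // Locks currently held by the calling thread, in acquisition order.
    [ThreadStatic] private static List<object> held;

    // Protects the shared 'after' table.
    private static readonly object sync = new object();

    // after[x] = every lock that some thread acquired while holding x.
    private static readonly Dictionary<object, HashSet<object>> after =
        new Dictionary<object, HashSet<object>>(new RefComparer());

    // Call right after the real lock on 'obj' has been taken.
    // Returns true if this acquisition reverses an order recorded earlier.
    public static bool OnAcquired(object obj)
    {
        if (held == null) held = new List<object>();
        bool conflict = false;
        lock (sync)
        {
            foreach (object outer in held)
            {
                if (ReferenceEquals(outer, obj)) continue; // recursive lock, no ordering involved

                // Was 'outer' ever taken while 'obj' was held? Then the order is reversed.
                HashSet<object> takenUnderObj;
                if (after.TryGetValue(obj, out takenUnderObj) && takenUnderObj.Contains(outer))
                    conflict = true;

                // Record the order seen now: 'obj' was taken while 'outer' was held.
                HashSet<object> takenUnderOuter;
                if (!after.TryGetValue(outer, out takenUnderOuter))
                {
                    takenUnderOuter = new HashSet<object>(new RefComparer());
                    after[outer] = takenUnderOuter;
                }
                takenUnderOuter.Add(obj);
            }
        }
        held.Add(obj);
        return conflict;
    }

    // Call right before (or after) the real lock on 'obj' is released.
    public static void OnReleased(object obj)
    {
        if (held == null) return;
        for (int i = held.Count - 1; i >= 0; i--)
        {
            if (ReferenceEquals(held[i], obj)) { held.RemoveAt(i); return; }
        }
    }
}
```

If one flow acquires a and then b, and a later flow (even on the same thread, long after the first has finished) acquires b and then a, the second call to OnAcquired(a) returns true. No deadlock has to occur for the mismatch to be noticed. Note that this sketch keeps strong references to every lock object it has seen, which is acceptable for a debugging aid but not for production code.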
Using the code
- Add the file incslock.cs to your project
- Add a reference to the slockimp.dll
- Compile your component and execute it
Analyzing the stacks (slockimp based)
- Once a problem is detected, the console (if one exists) reports that the last lock conflicts with the nth lock of the stack
- Two files are created in the working directory: first_xxx.txt and now_yyy (xxx and yyy represent numbers)
- Go to the "now" file and find the last lock (prior to the last four calls, which are internal to the DLL)
  This is the lock that caused the problem when it was locked
- In order to find the other problematic lock, you can either:
  - Spot the nth lock from the beginning of the stack (not counting locks that were locked recursively, and locks that were locked and then released)
  - Find the last lock in the "first" file
- Go over the "first" file stack, and find the place in which the lock from 3.a was locked
This is the second version of the implementation, which now supports more complex scenarios like the dining philosophers (thanks to Sergey's question below). The stack files are numbered as follows: 0_xxx.txt, 1_xxx.txt, and so on...
0_xxx points to the lock that when locked caused the problem. Other files point to other locks that created a kind of circular waiting.
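Circular waiting of this kind can be modeled as a cycle in a directed graph whose edges mean "this lock was acquired while that one was held". The following is a minimal sketch of that idea, not the DLL's actual algorithm (its source is not shown here). The class name LockOrderGraph is hypothetical, and locks are identified by integer ids purely for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the real DLL may detect circular waiting differently.
public sealed class LockOrderGraph
{
    // edges[x] = ids of the locks that were acquired while lock x was held.
    private readonly Dictionary<int, HashSet<int>> edges =
        new Dictionary<int, HashSet<int>>();

    // Records "inner was acquired while outer was held".
    // Returns true if the new edge closes a cycle, i.e. a possible deadlock.
    public bool AddEdge(int outer, int inner)
    {
        if (outer == inner) return false; // recursive lock, not an ordering problem

        // A cycle appears if 'outer' was already reachable from 'inner'.
        bool closesCycle = Reaches(inner, outer, new HashSet<int>());

        HashSet<int> set;
        if (!edges.TryGetValue(outer, out set))
        {
            set = new HashSet<int>();
            edges[outer] = set;
        }
        set.Add(inner);
        return closesCycle;
    }

    // Depth-first search: is 'target' reachable from 'from' along recorded edges?
    private bool Reaches(int from, int target, HashSet<int> visited)
    {
        if (from == target) return true;
        if (!visited.Add(from)) return false; // already explored
        HashSet<int> next;
        if (!edges.TryGetValue(from, out next)) return false;
        foreach (int n in next)
        {
            if (Reaches(n, target, visited)) return true;
        }
        return false;
    }
}
```

With three philosophers and forks 0, 1, and 2, the edges 0 to 1 and 1 to 2 are accepted, and adding the edge 2 to 0 is reported as closing a cycle even though no pair of locks was ever taken in directly opposite order.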
Points of interest
Notice that you do not need to change a single existing line of code; you only add two files to your project. The trick is to make the compiler use our reference, rather than .NET's, for implementing the monitor calls. (Inside the DLL itself, the locks are still locked and released properly, so your program should behave correctly as far as critical sections are concerned.) This trick is similar to replacing a header file in C.
Notice that the DLL is not meant for production, since it affects performance. Also notice that the DLL allows recursive locking of the same lock. The DLL will, however, report a possible deadlock even if the waiting time is not infinite. (Although this is a false alarm, it indicates bad behavior: such code might hurt performance, and it would deadlock if the timeout were set to infinite.)
|
Sorry, as I suspected, your slockimp2 code does not detect the indirect deadlock of the dining philosophers either. It is too bad that I cannot see your source code, but -- frankly -- I simply don't know good criteria for detecting this condition in any essentially universal way.
Please see my sample code below. I tried to make it as simple as I could. It works as a console application with just one source file.
First, I made sure I'm using your monitor, not the default System.Threading.Monitor, so I renamed its namespace to Debug.Threading (included in my sample below). I also tried calling slockimp.imp directly (the two commented-out lines below). (By the way, your trick of replacing the class by using the same namespace does not always work: in my VS2005 solution the build issues a warning saying that the default System.Threading.Monitor was used, so I renamed the namespace.)
I used only two philosophers, which is enough to show the issue, and a simplest form of the Fork code. Anyway, the deadlock is clearly demonstrated, as well as the inability of your slockimp2 to give it any help.
Could you please analyze this with my sample and give us your ideas? Thank you.
////////////// CODE SAMPLE
using System;

namespace Debug.Threading {
    using System.Threading;
    class Monitor {
        public static void Enter(object obj) { slockimp.imp.DoLock(obj); } //Enter
        public static void Exit(object obj) { slockimp.imp.DoUnlock(obj); } //Exit
        public static bool TryEnter(object obj) { return slockimp.imp.DoTryEnter(obj); } //TryEnter
        public static bool TryEnter(object obj, int millisecondsTimeout) { return slockimp.imp.DoTryEnter(obj, millisecondsTimeout); } //TryEnter
        public static bool TryEnter(object obj, TimeSpan timeout) { return slockimp.imp.DoTryEnter(obj, timeout); } //TryEnter
    } //Monitor
} //namespace Debug.Threading

namespace DiningPhilosophers {
    using System.Text;
    using System.Threading;

    internal class Player {
        internal Player(string name) { this.Name = name; }
        internal void ShowNamedMessage(string format) {
            Console.WriteLine(string.Format(format, this.Name));
        } //ShowNamedMessage
        string Name;
    } //Player

    internal class Fork : Player {
        internal Fork(string name) : base(name) {}
        internal void PickUp() {
            Debug.Threading.Monitor.Enter(this);
            //slockimp.imp.DoLock(this);
            ShowNamedMessage("Fork {0} is taken");
            Thread.Sleep(1000); //SA!!! this is done to ensure deadlock in a very first iteration
        } //PickUp
        internal void PutDown() {
            Debug.Threading.Monitor.Exit(this);
            //slockimp.imp.DoUnlock(this);
            ShowNamedMessage("Fork {0} is put down");
        } //PutDown
    } //class Fork

    internal class Philosopher : Player {
        internal Philosopher(string name, Fork left, Fork right) : base(name) {
            this.Left = left;
            this.Right = right;
            Thread = new Thread(delegate() {
                while (true) {
                    Think();
                    Left.PickUp();
                    Right.PickUp();
                    Eat();
                    Left.PutDown();
                    Right.PutDown();
                } //loop
            });
        } //Philosopher
        internal void Start() { this.Thread.Start(); }
        void Eat() { ShowNamedMessage("Philosopher {0} is eating"); } //Eat
        void Think() { ShowNamedMessage("Philosopher {0} is thinking"); } //Think
        Thread Thread;
        Fork Left, Right;
    } //class Philosopher

    class Program {
        static void Main(string[] args) {
            Console.WriteLine("Process started");
            Fork fA = new Fork("A");
            Fork fB = new Fork("B");
            Philosopher pA = new Philosopher("A", fA, fB);
            Philosopher pB = new Philosopher("B", fB, fA);
            pA.Start();
            pB.Start();
            Console.ReadKey();
        } //Main
    } //class Program

} //namespace DiningPhilosophers
////////////// END OF CODE SAMPLE
Thank you very much.
Sergey A Kryukov
|
Sergey,
Please read my answer to SAM below. The DLL cannot prevent a deadlock once it happens, and your code makes sure that it will happen.
Its purpose is to take the cases in which the deadlock does not happen 99.9% of the time (the hardest deadlocks to find, which occur at your customer's site but never in your lab) and alert you about them in your lab. In my answer to SAM I explained the rationale for this behavior and suggested a way to use the DLL to detect deadlocks that always happen.
Assume thread 1 wants to lock A then B, and thread 2 wants to lock B then A. Assume the first thread has already locked A. Now the second thread locks B and is about to lock A. I cannot declare a deadlock here, since nothing is wrong in this scenario yet (I am not aware of thread 1's intention to lock B). So the code continues and tries to lock A, but this attempt leaves the lock that protects my data structure locked, and the first thread, if it goes on to lock B, gets stuck on my data-structure lock. Not protecting my data structure is not an option, since then false deadlocks would be detected and true potential deadlocks might be missed.
Regarding VS 2005, this is true. Please read my suggestions to mickridgway2 below.
Eran
modified on Saturday, August 9, 2008 3:42 AM
|
One additional remark about my suggestion to SAM below. If you are planning to use timeouts, it will increase the chance of detecting the possible deadlock, but for the DLL to detect it successfully, the problematic locks must be able to lock at least once! If I take your example and try it repeatedly, even with timeouts, there is a good chance it won't detect the deadlock. But this is usually not the case in most software.
In addition, a deadlock that happens 100% of the time can usually be found easily during development: I assume you execute parts of your code as you develop, so it is easy to tell which new methods or locks created the deadlock.
Eran
|
Can it detect more complex cases than the textbook case of a deadlock, which is possible with just two threads and one lock/semaphore? How about the dining philosophers problem and other more complex cases? Thank you.
Sergey A Kryukov
|
Sergey,
Thank you for your important question. The originally posted file did not handle this issue, since it was created to quickly find such a textbook case in existing code that misbehaved. Nevertheless, your remark is important, so I uploaded a second file that should also deal with an indirect circular lock.
Eran
|
Wow! Eran, thank you very much for such a quick and productive reply. Now, I should give it a try. Thanks again.
Sergey A Kryukov http://www.SAKryukov.org
|
No problem. Keep in mind it wasn't tested enough but I think it should be fine.
-- modified at 16:25 Tuesday 18th September, 2007
|
Hi there,
I am getting:
The predefined type 'System.Threading.Monitor' is defined in multiple assemblies in the global alias; using definition from 'c:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\mscorlib.dll'
I am on .NET 2.0 using VS2005sp1.
Is there something I am missing?
|
Hi,
This is the "trick": you should be able to make the compiler use the Monitor that is defined in the cs file (included in the zip). In VS 2003 the compiler warns you about it and says that it chose the newer version (that is, not the one in mscorlib).
For VS2005 (or .NET 2) you should be able to work around this using aliases. I never tried it, but I think it should not be a problem to get the compiler to use the incslock.cs version instead of its default mscorlib. Let me know if you are having problems.
Eran
|
I can't seem to get the compiler to ever use the version in incslock.cs. It always chooses mscorlib.
Any ideas on how to get this going with aliases?
|
Hi,
Well I didn't have much time to explore this issue, but if you can't find the way to do it then I would suggest the following:
Rename the namespace in the incslock.cs file I uploaded (System -> _System), then add: using Monitor = _System.Threading.Monitor;
This solves the issue only if you call Monitor.SomeLockMethod directly. As for the lock keyword: since it is syntactic sugar for the fully qualified System.Threading.Monitor, I would suggest a search-and-replace across your entire code to turn it into a TimedLock-style lock (google it). This way you keep the robustness of the try..finally that the lock keyword hides, and in the implementation you use the _System.Threading.Monitor version (or use Monitor =...).
Sorry for not finding an easier solution, but if I'll find something I will update later on.
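The alias suggestion can be illustrated roughly as follows. This is a sketch under assumptions, not the contents of incslock.cs: the Monitor class below is a hypothetical stand-in that only counts calls and forwards to the real monitor, and AliasDemo is an illustrative name.

```csharp
using System;
using System.Threading;
// The alias takes precedence over System.Threading.Monitor for the simple name "Monitor".
using Monitor = _System.Threading.Monitor;

namespace _System.Threading
{
    // Hypothetical stand-in for the renamed class from incslock.cs.
    public static class Monitor
    {
        public static int EnterCalls;

        public static void Enter(object obj)
        {
            EnterCalls++; // a real detector would record the lock order here
            global::System.Threading.Monitor.Enter(obj);
        }

        public static void Exit(object obj)
        {
            global::System.Threading.Monitor.Exit(obj);
        }
    }
}

public static class AliasDemo
{
    // Returns how many acquisitions went through the replacement Monitor.
    public static int Run()
    {
        object gate = new object();

        Monitor.Enter(gate);             // resolved through the alias
        try { /* critical section */ }
        finally { Monitor.Exit(gate); }

        lock (gate) { }                  // still compiled against System.Threading.Monitor

        return Monitor.EnterCalls;
    }
}
```

Only the explicit Monitor.Enter call is routed through the replacement class; the lock statement bypasses it, which is why the lock keyword has to be rewritten separately.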
-- modified at 15:51 Thursday 18th October, 2007
|
Hi Eransha,
I too am running into this issue, but your workarounds don't work for my project and what I need.
First, my project is in VS2005, and like the previous poster, I get:
Warning : The predefined type 'System.Threading.Monitor' is defined in multiple assemblies in the global alias; using definition from 'c:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\mscorlib.dll'
...so the official/unmodified version is being used, not the incslock version, and I don't see any mechanism to override this and force the linker to use the incslock version.
Second, my app doesn't directly use Monitor or call Monitor.Enter, so your suggestion of renaming the namespace to _System and calling _System.Threading.Monitor isn't a workaround I can use.
Any suggestions? Can you look into this? It's a good article, but due to this I can't use it.
Thanks,
Jeff
|
Hi Jeff,
I guess your code uses the "lock" statement. As I suggested in my previous answer, you should be able to search and replace all "lock" keywords automatically with a TimedLock-style pattern based on the using keyword.
e.g. if your code looks like this:
lock(someobject) { }
It will look like this:
using (TimedLock.Lock(someobject)) { }
With the VS2005 IDE's search & replace, this should be easy to do.
In your implementation of the TimedLock, use explicitly the _Monitor version rather than the Monitor of mscorlib.
Hope that helped, Eran
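For readers unfamiliar with the idiom, a TimedLock helper might look roughly like the sketch below. It is an illustration only, not code from the article or the DLL: the 10-second default timeout is an arbitrary assumption, and the sketch calls the standard System.Threading.Monitor, so to route the calls through the detector you would call the renamed Monitor class instead.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of the TimedLock idiom.
public struct TimedLock : IDisposable
{
    private readonly object target;

    private TimedLock(object o) { target = o; }

    // Acquires the lock with an assumed default timeout of 10 seconds.
    public static TimedLock Lock(object o)
    {
        return Lock(o, TimeSpan.FromSeconds(10));
    }

    // Acquires the lock or throws TimeoutException if it cannot be taken in time.
    public static TimedLock Lock(object o, TimeSpan timeout)
    {
        // Swap in the renamed Monitor class here to make the detector see the call.
        if (!Monitor.TryEnter(o, timeout))
            throw new TimeoutException("Timed out waiting for lock");
        return new TimedLock(o);
    }

    // Releases the lock; invoked automatically at the end of a using block.
    public void Dispose()
    {
        Monitor.Exit(target);
    }
}
```

Because the using statement expands to try..finally, the lock is released even if the body throws, just as with the lock keyword. A thread that cannot get the lock within the timeout fails with an exception rather than hanging forever.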
|
No can do. Yes, search & replace is simple, but with the project I'm working on, we're not using lock(). We're using a mix of background threads and Event threads. I've seen some locking going on in the UI (in the UI, when a UI thread gets locked) at times, and it would be nice if there was a tool that was able to point out when a deadlock or a blocking occurs, but we're not using lock().
Thanks,
Jeff
|
Thanks for the DLL. My university project had a few deadlocks in it. It took me three days to go over the locks I use and find the problematic ones. I just tried the DLL on the problematic project, and it directed me to another deadlock that I wasn't aware of, since it hadn't occurred (yet...).
|
public static bool DoTryEnter(object obj)
{
    bool flag = Monitor.TryEnter(obj); // <<<<< deadlock will be here!
    if (flag)
    {
        UpdateOnLock(obj); ////// and this code is unreachable
    }
    return flag;
}
|
Sam,
The most important thing to realize is that the DLL is NOT meant to prevent the deadlock from occurring. As I wrote in the article, the DLL still performs the actual locking; otherwise, other things in your program would not work right. (Assume you have a data structure that you protect with a different lock: if the DLL did not perform the actual locking, your program might fail for other reasons.)
The DLL is meant to be used in cases where your product works fine most of the time. Assume that your product works fine 99.9% of the time because the threads that lock in opposing order do not do so at the same time. In this scenario the DLL has the chance to record the order of your locks, and once it detects a wrong ordering it fails your program (changing it from working fine 99.9% of the time to not working at all!) and takes you to the two different stacks of code that caused the possible deadlock.
If, on the other hand, your program DEADLOCKs 100% of the time, the DLL won't work. But then again, you would find that problem in your testing environment. Furthermore, trying to prevent the lock from being taken, or querying for the lock's availability, would hurt detection in the complex scenario in which your software works fine 99.9% of the time.
My advice is that if your program does fail 100% of the time, change the locks to use finite timeouts, and the DLL will detect the deadlock scenario.
-- modified at 2:44 Thursday 13th September, 2007