Click here to Skip to main content
Click here to Skip to main content

Non Kernel Semaphore

By , 11 Jun 2012
 

Introduction

I've been using threads a long time in .NET and honestly the topic interests me very much. When I began to get deeper into multithreading I came to use other threading constructs rather than just the Monitor (lock) and Interlocked classes. Interlocked and volatile mechanisms are user mode constructs meaning that they do not transition into the kernel. Kernel level objects like Mutex and Semaphores are created and managed by the operating system and are system wide. If we use a semaphore in our application we must leave user mode and transition into kernel mode just to create it. There is a slight performance hit that we incur since the kernel does not trust us. While this hit is negligible it can begin to cost a lot when used in different types of scenarios.

Background

I like the functionality that the semaphore provides however in some cases I would like to use it in just one process and not transition into kernel mode and take a hit. I've decided to write a very simple non-kernel semaphone.

What it has:

  1. Fast creation time since it just a regular .Net non-kernel object
  2. Only transitions to kernal mode when the maximum capacity has been met and more threads want to use it.

What it lacks:

  1. It's not system wide which makes sense because it 'mostly' stays in user mode.
  2. It does use Monitor which can transition into kernel mode, however this only occurs when all the slots have been filled.

The code

 public class NonKernalSemaphore 
    {
        private readonly object padlock = new object();
        private readonly int maxSlots;
        private int usedSlots;

 
        public NonKernalSemaphore(int maxSlots)
        {
            this.maxSlots = maxSlots;
        }

        public void Enter()
        {
            lock (padlock)
            {
                while (usedSlots == maxSlots)
                {
                    Monitor.Wait(padlock);
                }

                usedSlots++;
            }
        }

        public void Release()
        {
            lock (padlock)
            {
                if (usedSlots > 0)
                {
                    usedSlots--;
                    Monitor.Pulse(padlock);
                }
            }
        }
    }

I wanted to keep this class as simple as possible without adding fluff so that I could cater for the basic usage scenario.

NonKernelSemaphore takes the maximum number of threads allowed at one time as constructor parameter maxSlots. It also keeps a count of how many threads are currently using the semaphore using the usedSlots field.

Enter Method Explanation

When a consumer calls the Enter() method a lock, padlock, is first accquired. We now check if there are any slots remaining in the semaphore. Obviously if usedSlots is equal to the maxSlots then all the slots are occupied. In this case we wait until another thread releases the slot by calling the Release() method. By calling Monitor.Wait() we transition into kernel mode and let the thread scheduler take care of things. Our scope at this point is still only process wide.

Release Method Explanation

Again in this method we must first call lock on the shared padlock object. We now decrease the usedSlots count which releases a slot in the semaphore. Then we awaken all the threads who are waiting on a slot by calling Monitor.PulseAll(). When the threads wake up they must first check if all the slots are used again hence the while condition (usedSlots == maxSlots).

Optimization

One optimization I was considering is only calling Monitor.Pulse() only if there are threads waiting for an available slot. This is definitely better than blindly calling the method. In this case we just need to keep track of how much many threads are waiting by introducing a new variable, waitingCount of type int incrementing it whenever a thread is waiting. The Release() method will then check if there are threads waiting. If there is then call Monitor.Pulse() otherwise just decrease usedSlots as usual. Here is the code:

 

 public class NonKernalSemaphore 
    {
        private readonly object padlock = new object();
        private readonly int maxSlots;
        private int usedSlots;
        private int waitingCount;

        public NonKernalSemaphore(int maxSlots)
        {
            this.maxSlots = maxSlots;
        }

        public void Enter()
        {
            lock (padlock)
            {
                if (usedSlots == maxSlots)
                {
                    waitingCount++;

                    do
                    {
                        Monitor.Wait(padlock);

                    } while (usedSlots == maxSlots);

                    waitingCount--;
                }

                usedSlots++;
            }
        }

        public void Release()
        {
            lock (padlock)
            {
                if (usedSlots > 0)
                {
                    usedSlots--;

                    if (waitingCount > 0)
                    {
                        Monitor.Pulse(padlock);
                    }
                }
            }
        }
    }





Updates

Originally I used Monitor.PulseAll() however a Code Project reader ran some tests found that Pulse() was a bit faster.

Tests

Machine: Windows 7 Professional (64), Intel Xeon(R) 2.4 Ghz, 12 GB RAM, .Net 4.0, Release Mode

Test 1

Creation Time (Milliseconds) after 1 million iterations:

NonKernelSemaphore: 76

Semaphore: 2263

SemaphoreSlim: 623

Test 2

Enter/Wait/Release Time

Max concurrent slots: 8

Number of threads: 64

My test procedure included something like:

       public static void DoNonKernel()
        {
            nks.Enter();
            
            nksc++;
            nks.Release();
        }

        public static void DoSemaphore()
        {
            s.WaitOne();
            
            sc++;
            s.Release(1);
        }

        public static void DoSemaphoreSlim()
        {
            sl.Wait();
            
            slc++;
            sl.Release();
        }

I ran this a few times. Here are the results in milliseconds:

SemaphoreSlim:453.0453
Semaphore:241.0241
NonKernelSemaphore:249.0249

SemaphoreSlim:288.0288
Semaphore:249.0249
NonKernelSemaphore:249.0249

SemaphoreSlim:278.0278
Semaphore:264.0264
NonKernelSemaphore:285.0285

SemaphoreSlim:235.0235
Semaphore:273.0273
NonKernelSemaphore:246.0246

SemaphoreSlim:226.0226
Semaphore:237.0237
NonKernelSemaphore:217.0217

SemaphoreSlim:228.0228
Semaphore:224.0224
NonKernelSemaphore:381.0381

Points Of Interest

Feedbacks welcomed!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

FatCatProgrammer
Software Developer (Senior) Finance Industry
United States United States
Currently pursuing 'Programming Nirvana' (The ineffable ultimate in which one has attained disinterested wisdom and compassion as it relates to programming)
 
Acknowledgements:
 
Microsoft Certified Technologist for WPF and .Net 3.5 (MCTS)
Microsoft Certified Technologist for WCF and .Net 3.5 (MCTS)
Microsoft Certified Application Developer for .Net (MCAD)
Microsoft Certified Systems Engineer (MCSE)
Microsoft Certified Professional (MCP)
 
Sun Certified Developer for Java 2 Platform (SCD)
Sun Certified Programmer for Java 2 Platform (SCP)
Sun Certified Web Component Developer (SCWCD)
 
CompTIA A+ Certified Professional
 
Registered Business School Teacher for Computer Programming and Computer Applications (2004)
(University of the State of New York Education Department)
 
Graduated from University At Stony Brook

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
SuggestionIssuememberMaxim Kartavenkov8-Oct-12 6:17 
You have an issue with the locking. See: your both two methods performs locking on the object - both uses lock(padlock) statement, but in one method you are performing waiting operation inside lock, which will cause deadlocking in case if one method will be waiting inside Enter method in one thread and in other thread you will call Release. This is a major issue in multithreading applications - waiting for a long period inside critical sections should be avoided, but your code can wait infinite.
 
Maxim.
GeneralRe: IssuememberFatCatProgrammer8-Oct-12 13:25 
Thanks for your comment. However you are incorrect as Monitor.Wait releases the lock when it waits.
 
That's just how it works
Relativity

GeneralRe: IssuememberMaxim Kartavenkov8-Oct-12 21:12 
Then it should works if lock reacqure after wait and you are right to use Pulse instead of PulseAll, as otherwise if can enter few times instead of one.I suggest you to add functionaly to unblock waiters and discard entering on disposing - this will avoid deadlocking in there - just saw comments abt deadlocks. And yes critical sections will be working faster than kernel approach.
 
Maxim.
GeneralRe: IssuememberFatCatProgrammer9-Oct-12 13:52 
Actually even if a PulseAll is called only one thread will acquire the lock. We just waste a lot of cycles waking threads up, having them contend for a lock, and then putting the losers back to sleep. Smile | :)
Relativity

GeneralMy vote of 3 [modified]memberozbear18-Jun-12 13:48 
unsure if it is accurate

modified 19-Jun-12 21:23pm.

GeneralRe: My vote of 3memberFatCatProgrammer18-Jun-12 16:26 
Hi, what I'm saying is that they will spin in user mode for approximately 20 ms (depends on the architecture). If there are empty slots a lock/release will occur rather quickly and a thread that calls Enter is more likely to acquire the lock before transitioning to kernel mode.
 
BTW, please no one liners. Fully explain your point. Thanks
Relativity

Questionlock statement is just a disguised Monitor.Entermemberozbear18-Jun-12 13:46 
You say that a kernel transition only occurs when the number of slots is filled causing a Monitor.Wait() to be issued.
Are you saying the lock statements, which acquire a mutual exclusion lock and are really just a disguised Monitor.Enter/Exit calls, do not also cause a transition to kernel mode?
AnswerRe: lock statement is just a disguised Monitor.EntermemberFatCatProgrammer18-Jun-12 16:25 
Hi, what I'm saying is that they will spin in user mode for approximately 20 ms (depends on the architecture). If there are empty slots a lock/release will occur rather quickly and a thread that calls Enter is more likely to acquire the lock before transitioning to kernel mode.
Relativity

GeneralRe: lock statement is just a disguised Monitor.Entermemberozbear19-Jun-12 15:23 
You haven't answered my question so I will repeat it: Does a Monitor.Enter call, itself, cause a transition to kernel mode or not?
If it does, then you haven't avoided kernel transitions at all, which I thought was the whole point.
Just askin'.
GeneralRe: lock statement is just a disguised Monitor.EntermemberFatCatProgrammer20-Jun-12 1:07 
That's what I'm saying...Moniter.Enter will spin in user mode for a approx 20 ms give or take only then it will transition to kernel. If there are open slots then the call to Enter is lightening fast and this allows threads to acquire the lock before transitioning to kernel mode. It's that simple.
Relativity

GeneralMy vote of 4memberoggenok11-Jun-12 12:45 
Actually i don't understand at all what you are trying to achieve. Mutex'es, semaphores and all other sorts of synchronization primitives must by very definition have global scope. Anything else makes no sense at all.
Please don't reinvent the flat tire. Read and re-read Dijkstra's original article - http://www.cs.utexas.edu/~EWD/transcriptions/EWD01xx/EWD123.html. It's dense with meaning.
Your scheme will work in a very limited number of cases, but it is not a general solution.
 
I've voted 4 for your article because it raises some interesting points, but i could just as well have given it a -1 if that was possible.
- oggenok

GeneralMy vote of 4memberVitaly Tomilov10-Jun-12 7:50 
I don't like Sleep functions!
GeneralRe: My vote of 4memberFatCatProgrammer11-Jun-12 4:24 
Thanks for the feedback. I removed the 'Sleep' in the tests re-ran the tests and updated the article. It should be updated shortly in Code Project. Please update your vote if necessary.
 
Also, I really hate one liners. Please tell me WHY you hate Thread.Sleep. I know that it has become 'popular' to not like it. However I would like your explanation since I'm only using it for a simulation here.
Relativity

GeneralMy vote of 4 [modified]mvpPaulo Zemek6-Jun-12 10:59 
I will revote when you correct it, but waitingCount-- should be inside the if.
Ok... you corrected it, so I revoted. I tried to change the title here, but codeproject does not keep the new title.

modified 7-Jun-12 0:23am.

GeneralRe: My vote of 4memberFatCatProgrammer6-Jun-12 13:05 
Oops can't believe I missed that! thanks
Relativity

QuestionGood, but...mvpPaulo Zemek6-Jun-12 6:40 
I myself wrote an article on the topic (in fact, for creating managed versions of AutoResetEvent, ManualResetEvent, Semaphore and ReaderWriterLock) here: .Managed Thread Synchronization[^]
 
But my question is: Did you measure the time to use the hashset in the optimized version? And, can't a simple integer be used instead (a simple count on how many threads are waiting)?
AnswerRe: Good, but...memberFatCatProgrammer6-Jun-12 7:41 
Yes, however the same thread can wait, wake up and wait again in the 'while' loop. This can result in false increments. At least the Hashset only adds the thread id once. And when we wake up and wait again the Hashset will not duplicate the entry. This results in an accurate count.
Relativity

GeneralRe: Good, but...mvpPaulo Zemek6-Jun-12 7:50 
In fact... that is something I was about to comment...
Put the add (or the increment) before the while, not inside it.
 
Also, is it really possible that a thread awakes from a Wait when the state is still invalid using Pulse only (with PulseAll it is... I am not sure about Pulse).
GeneralRe: Good, but...mvpPaulo Zemek6-Jun-12 8:47 
In fact, the implementation can be:
 
public void Enter()
{
  lock (padlock)
  {
    if (usedSlots == maxSlots)
    {
      waitingThreadCount++;
 
      do
      {
        Monitor.Wait(padlock);
      } while (usedSlots == maxSlots)
 
      waitingThreadCount--;
    }
 
    usedSlots++;
  }
}
 

But I am also not sure if the while is needed at all. In my own implementation I did it, but I really think that it may return from Wait in an invalid state if there is a PulseAll somewhere (in my implementation, there is as I can "Release" many times at once).
GeneralRe: Good, but...memberFatCatProgrammer6-Jun-12 9:15 
Thanks I like the idea. I re-ran the performance tests after making the change and DID NOT see any much gain. However I like the idea of not using the HashSet. I think i'll leave that in.
Relativity

GeneralRe: Good, but...mvpPaulo Zemek6-Jun-12 10:54 
I am not sure if your test is really like that... but I see a Thread.Sleep in the test...
To really check the performance, you should execute the test millions of times, but without any Sleep. That will show how many time is really spend in the acquisitions/release of the semaphore.
But either way... I think an int is always faster than a hashset and, in that case, the hashset is not to be seen, so there are no problems in losing it.
 
Edit: Also it will be good to see the differences where they are (or may be) waiting clients to obtain the semaphore and the semaphore got immediately approach. In fact, as happens with locks, many times we use them to prevent some kind of excessive concurrency, but often there is no concurrency at all (in fact, your optimization is for that... no one waiting). So, show the speed advantage of that approach in a separate speed comparison.
SuggestionGood article, but expecting more [modified]memberSridhar Patnayak6-Jun-12 1:16 
Good Knowledge sharing article, add your thread performance result and describe it, which will give good voting
 
Keep it up Thumbs Up | :thumbsup:

modified 6-Jun-12 7:23am.

GeneralRe: Good article, but expecting morememberFatCatProgrammer6-Jun-12 3:28 
Look at the message by Yaroslav Tatarenko below. It has the performance metrics there.
I may add that into this article when I get a chance.
Relativity

GeneralRe: Good article, but expecting morememberFatCatProgrammer6-Jun-12 6:28 
Tests are there now!
Relativity

GeneralMy vote of 1memberYaroslav Tatarenko4-Jun-12 20:34 
There are 2 problems:
1. Solution has deadlock
2. Poor performance compared to kernel semaphore and SemaphoreSlim
GeneralRe: My vote of 1memberFatCatProgrammer5-Jun-12 1:34 
Thank you for the comment. However you can't just say something and not back it up. Please elaborate so that I can look into it! A deadlock is when two threads are waiting for each others lock to be released in order to acquire them. Here there is only one lock. Please explain!
Relativity

GeneralRe: My vote of 1memberYaroslav Tatarenko5-Jun-12 7:12 
Pay attention to usedSlots field. It is a source of problem when Monitor was pulsed. Just add volatile keyword Wink | ;)
But this is a first problem. What about second?
GeneralRe: My vote of 1memberFatCatProgrammer5-Jun-12 11:26 
You're going to have to be specific here. I appreciate your feedback but you keep leaving me one liners.
If there is something wrong specifically point it out! Let's not keep goose chasing.
 
As for the second..
What are your performance numbers? Give them to me.
Relativity

GeneralRe: My vote of 1memberYaroslav Tatarenko5-Jun-12 21:51 
OK, my test is 32 threads, concurrent limit is 8 (usedSlots is 8), each thread increase counter using Interlocked function.
Test system is Core i7 4.6GHz/16GB/Windows 7 x64/.NET 4
Results
NonKernelSemaphore 00:00:35.3012097
SemaphoreSlim 00:00:33.9184206
Semaphore 00:00:32.7732066
I tested with another variants of source parameters (thread count and concurrent limit) and found that Semaphore is not the best choice for all source parameter sets. Sometimes SemaphoreSlim has better result, but in all cases NonKernelSemaphore has the worst result.
I little change your code and exchange Monitor.PulseAll to Monitor.Pulse (I didn't understand what a reason to use PulseAll) and ...
NonKernelSemaphore 00:00:33.3612481
This result is better than SemaphoreSlim in my test Wink | ;)
If you want I can share my test solution Smile | :)
Unfortunately I haven't created deadlock test yet. May be if I have more time I will create it. Sorry.
GeneralRe: My vote of 1memberFatCatProgrammer6-Jun-12 0:40 
Thank you very much for putting in the time and getting some results. Now that we have the numbers (empirical data) we have a starting point for a real discussion.
 
I was considering using the Pulse() instead of PulseAll() since PulseAll() will awaken all the waiting threads who then must all content for the lock even though only one will succeed. I will change to Pulse() since it offers some performance improvements. (nice find)
 
Also you did not include object creation time in your tests. You should do that as well if you want to cover more cases.
 
As for the deadlock....I'm still waiting.....Smile | :)
Now if you can reason out why a deadlock can occur then that is good enough for me. Give me the reason and scenario. You may not ever be able to observe a deadlock in your testing but that doesn't mean the possibility isn't there! And I may have to change some more code Smile | :)
 
By the way since NonSemaphore beat the SemaphoreSlim in your test cases does a vote of ONE still apply? Smile | :)
Relativity

GeneralRe: My vote of 1 [modified]memberMember 861853811-Jun-12 23:01 
public static void DoNonKernel()
{
nks.Enter();
nks.Enter();
 
nksc++;
nks.Release();
nks.Release();
}
 
With maxSlots == 1 this will cause a lockup or "deadlock" if you will.
 
However, the .NET Semaphore has the same behaviour. So, this does not seem to be a bug. I believe the fact that a Critical Section and Mutex are re-entrant within the same thread muddies the issue a little.
 
Also, why would waitingCount ever be anything other than 0 in the Release() method? Competing threads would wait on the lock, not in the locked section.

modified 12-Jun-12 5:25am.

GeneralRe: My vote of 1memberFatCatProgrammer12-Jun-12 15:10 
"Also, why would waitingCount ever be anything other than 0 in the Release() method? Competing threads would wait on the lock, not in the locked section."
 
This is not a true statement. Many threads could be waiting and the waitingCount increases. When Monitor.Wait is called the thread 'releases' the lock and waits for a pulse allowing more threads in come it and wait if necessary.
Relativity

Generalvolatile is redundantmemberOleksandr Kucherenko6-Jun-12 3:00 
The volatile modifier is usually used for a field that is accessed by multiple threads without using the lock statement to serialize access. Using the volatile modifier ensures that one thread retrieves the most up-to-date value written by another thread.
 
http://msdn.microsoft.com/en-us/library/x13ttww7(v=vs.71).aspx[^]
Good Luck
Alex Kucherenko

GeneralRe: volatile is redundantmemberYaroslav Tatarenko6-Jun-12 3:56 
It is true because Monitor.Enter and Leave create memory barriers, but Monitor.Wait is not.
GeneralRe: volatile is redundantmvpPaulo Zemek6-Jun-12 6:47 
Where did you get that info?
As far as I know, Monitor.Wait is completely safe regarding memory barriers, as it is the Enter and Exit pair.
GeneralRe: volatile is redundantmemberFatCatProgrammer6-Jun-12 7:36 
I agree
Relativity

GeneralRe: volatile is redundant [modified]memberYaroslav Tatarenko6-Jun-12 7:45 
I use public sources of .NET 2.0 (you can find it by pattern "sscli20_20060311"). The function Monitor.Wait() extracts a suitable header of object and finds sync block that corresponds for this header. After that it analyzes linked events of sync block and determines when operation can be completed. If linked events has been raised the wait operation will terminate immediately, elsewhere check thread repeats. If another thread is there the function will enqueue thread with event in global thread queue, registers callback and going to thread sleep. Nothing else.
Monitor.Pulse dequeues pair thread and event and raise event. After event rise system will call callback function. This function blocks specified thread and waiting for rising linked event. In our case it will be risen when function Monitor.Leave was called.
PS:
Text from MSDN article:
This method returns when the calling thread reacquires the lock on the object.
I haven't found any references to memory barrier in this article. Where did you get information about barriers in Wait function?

modified 6-Jun-12 14:00pm.

GeneralRe: volatile is redundantmvpPaulo Zemek6-Jun-12 7:55 
So, if it returns immediately... that means that between the lock, the check for the while (that returned false) and the call to Wait someone pulsed that thread.
Considering it can only be pulsed inside a lock, I think such situation can only arrive if between the Wait itself the state got updated and, in this case, memory is safe.
 
If that's not the case, Wait really waits... and, in this case, it is already doing the MemoryBarrier. So... it is safe.
GeneralRe: volatile is redundantmemberYaroslav Tatarenko6-Jun-12 8:08 
So, if it returns immediately... in this case, memory is safe.
I agree
and, in this case, it is already doing the MemoryBarrier.
Could you explain this idea more? I do not understand.
GeneralRe: volatile is redundantmvpPaulo Zemek6-Jun-12 8:10 
System events (used for the wait) always do memory barriers.
So, putting a volatile will only lose time doing another memory barrier.
GeneralRe: volatile is redundantmemberYaroslav Tatarenko6-Jun-12 8:21 
Jesus Christ! You are right! I forgot it Frown | :(
 
FatCatProgrammer, I'm sorry. I'm fixing rating immediately.
GeneralRe: volatile is redundantmemberFatCatProgrammer6-Jun-12 9:25 
Apologies accepted Smile | :) . Nothing wrong with a good discussion!
Relativity

GeneralRe: volatile is redundantmemberabdurahman ibn hattab7-Jun-12 9:32 
Probably if Wait reacquires the lock then this implies that it does the memory barrier again (while reacquiring the lock). Otherwise the implementation would be not consistent and thus buggy.
 
Talking in the terms of practice:
I never experienced synchonization problems when monitor was used without the explicit volatile keyword during my 10+ years experience with .NET monitor. Even the compiler handles this situation and takes off the optimization of data reloading.
 
So, if you can't deliver the reproduction sample then you have to deliver silence Laugh | :laugh:
 
EDIT: sorry, did not notice that it was already answered

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web04 | 2.6.130617.1 | Last Updated 11 Jun 2012
Article Copyright 2012 by FatCatProgrammer
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid