Posted 13 Mar 2017



Threading - Under the Hood

19 Mar 2017 · CPOL · 7 min read
This article explores the performance, scalability and limitations of the various .NET Threading Implementations.

A Threading Implementation is simply a way to create threads, adding parallelism and concurrency to applications.

All the research and analysis provided in this article is proven programmatically, and the source code is provided. The results are certainly interesting and useful for threading-intensive applications.


None of the Threading Implementations create threads immediately. Requests for threads are queued and the .NET Runtime decides when to create the threads. The first few threads are created nearly immediately but the speed at which subsequent threads are created depends on the Threading Implementation.

Explicit vs. Implicit Threading

.NET offers various Threading Implementations that can be categorized into two Threading Categories:

  • Explicit Threading
    • Threads are created explicitly: new Thread(new ThreadStart(Work)).Start();
    • All threads are created nearly immediately.
    • Thousands of threads can be queued.
  • Implicit Threading
    • Threads are created implicitly: After a task is queued, the thread is created automatically in the background.
    • Only the first few threads are created nearly immediately.
    • Millions of tasks can be queued.
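The two categories can be sketched in a few lines. This is a minimal, self-contained illustration; the class and variable names here are not taken from the article's download:

```csharp
using System;
using System.Threading;

class ExplicitVsImplicitDemo
{
    static void Main()
    {
        //Explicit: a dedicated thread is requested for this one piece of work.
        var explicitThread = new Thread(() => Console.WriteLine("explicit thread"));
        explicitThread.Start();
        explicitThread.Join();

        //Implicit: the work item is queued; the runtime decides when
        //(and on which pooled thread) it actually runs.
        using (var done = new ManualResetEventSlim())
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Console.WriteLine("pooled thread");
                done.Set();
            });
            done.Wait();
        }
    }
}
```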

4 Threading Implementations

There are 4 Threading Implementations in .NET, each a way to create threads and add parallelism and concurrency to applications:

  • Asynchronous Invoke
  • Explicit Threading
  • Task Parallel Library (TPL)
  • Thread Pool

Asynchronous Invoke

Asynchronous Invoke is in the Implicit Threading Category. Following is basic sample code:

delegate void Delegate_SimulateWork();

void CreateThread_Via_AsynchronousInvoke()
{
    new Delegate_SimulateWork(SimulateWork).BeginInvoke(null, null);
}

void SimulateWork() { /*simulate work*/ }
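Note that delegate `BeginInvoke` follows the older Asynchronous Programming Model and is only supported on the .NET Framework; .NET Core and later throw PlatformNotSupportedException at runtime. A self-contained sketch (not from the article's download) that also pairs `BeginInvoke` with `EndInvoke`, which waits for completion and surfaces any exception the work threw:

```csharp
using System;

class AsyncInvokeDemo
{
    delegate void Delegate_SimulateWork();

    static void SimulateWork() => Console.WriteLine("working");

    static void Main()
    {
        //BeginInvoke queues the work to a thread-pool thread and
        //returns an IAsyncResult; EndInvoke blocks until it completes.
        var work = new Delegate_SimulateWork(SimulateWork);
        IAsyncResult handle = work.BeginInvoke(null, null);
        work.EndInvoke(handle);
    }
}
```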

Explicit Threading

There is only one Threading Implementation in the Explicit Threading Category, and this Threading Implementation is also called: Explicit Threading. Following is basic sample code:

void CreateThread_Via_ExplicitThreading()
{
    new Thread(new ThreadStart(SimulateWork)).Start();
}

void SimulateWork() { /*simulate work*/ }

Task Parallel Library (TPL)

Task Parallel Library (TPL) is in the Implicit Threading Category. Following is basic sample code:

void CreateThread_Via_TaskParallelLibrary()
{
    Task.Run(SimulateWork);
}

void SimulateWork() { /*simulate work*/ }
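A self-contained sketch of the same idea; the `Task.Run` call is the conventional TPL entry point and is assumed here rather than taken from the article's download:

```csharp
using System;
using System.Threading.Tasks;

class TplDemo
{
    static void SimulateWork() => Console.WriteLine("task running");

    static void Main()
    {
        //The task is queued to the TPL's default scheduler (backed by the
        //thread pool); the runtime decides when a worker picks it up.
        Task task = Task.Run(SimulateWork);
        task.Wait();
    }
}
```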

Thread Pool

Thread Pool is in the Implicit Threading Category. Following is basic sample code:

void CreateThread_Via_ThreadPool()
{
    ThreadPool.QueueUserWorkItem(SimulateWork);
}

void SimulateWork(object state) { /*simulate work*/ }
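A self-contained sketch (names illustrative); the `ManualResetEventSlim` is only there so the console program does not exit before the pooled work item runs:

```csharp
using System;
using System.Threading;

class ThreadPoolDemo
{
    //The WaitCallback signature requires an object parameter, even when unused.
    static void SimulateWork(object state) =>
        Console.WriteLine("pool work, state = {0}", state ?? "null");

    static void Main()
    {
        using (var done = new ManualResetEventSlim())
        {
            //The work item is queued; a pool thread runs it when one is free.
            ThreadPool.QueueUserWorkItem(s => { SimulateWork(s); done.Set(); }, null);
            done.Wait();
        }
    }
}
```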

Threading Implementation Analysis Software

This software analyses Threading Implementations:

  • Determines Max & Safe Queue Limits
  • Tests the speed at which Worker Threads are created
  • Tests the number of Worker Threads that can be created
  • Compares the Threading Implementations

The software is easy to use, as buttons are only enabled at the right time; further details on the software follow the screenshot below.

[Screenshot: Threading Implementation Analysis Software]

The following is a brief description of each of the controls in the above screenshot:

  • Buttons
    • 4 Buttons at the top left: these initiate analysis of the 4 Threading Implementations.
    • Stop Analysis: this stops any analysis that may be underway.
    • Update Threading Implementation Comparison: updates the data grid at the bottom with the analysis results.
  • Chosen Queue Limit: the software determines the Max Queue Limit of a Threading Implementation based on resource consumption; however, if the Chosen Queue Limit is hit before resources run out, then the Chosen Queue Limit will be used as the Max Queue Limit.
  • Labels:
    • Analysis Phase: the phase of the analysis that is underway.
    • Threading Implementation: the Threading Implementation that is being analysed.
    • Max Queue Limit: the maximum number of tasks or threads that can be queued before the application throws an OutOfMemoryException.
    • Safe Queue Limit: the Safe Queue Limit is 90% of the Max Queue Limit; however, if the Chosen Queue Limit was used as the Max Queue Limit, then the Safe Queue Limit is 100% of the Max Queue Limit.
    • Tasks Or Threads Queued: the number of Tasks Or Threads Queued as the analysis progresses.
    • Active Worker Thread Count: the number of threads actively simulating work as the analysis progresses.
    • Thread Count: the total number of threads in the application process.
    • Analysis Phase Start Time: the time when the current Analysis Phase started.
    • Analysis Phase Duration: the duration of the current Analysis Phase.
    • Average Worker Spawn Time (ms): the average time in milliseconds that it takes to spawn a worker thread.
    • Workers Spawned In First Second: the number of worker threads spawned in the first second after starting to queue Tasks Or Threads.
    • Process Memory Utilization (MB): the amount of memory the application process is using.
  • Data Grid Columns:
    • Safe Limit: same as the 'Safe Queue Limit' label.
    • Workers: same as the 'Active Worker Thread Count' label.
    • Ave. Spawn Time: same as the 'Average Worker Spawn Time (ms)' label.
    • Workers In 1 Sec: same as the 'Workers Spawned In First Second' label.
    • Memory Used: same as the 'Process Memory Utilization (MB)' label.

Threading Analysis Phases

  • Idle: no analysis is taking place.
  • Determine Max Queued Tasks Or Threads: determine the maximum number of tasks or threads that can be queued for a Threading Implementation.
  • Wait For Queue To Clear: wait for the queue of tasks or threads to be cleared before continuing to the next phase.
  • Analyse Thread Creation: tests how many worker threads can be spawned and how long it takes.
  • Analysis Complete: analysis is complete and the user can update the Threading Implementation Comparison data grid.

Implicit Threading Analysis Does Not Complete

Unless a lower Queue Limit is chosen, millions of tasks are queued when analysing Implicit Threading Implementations, and today's hardware generally can't run millions of concurrent threads; as such, Implicit Threading analysis generally does not reach the Analysis Complete phase. The user can watch the average time to create a thread and the number of threads created. There comes a point where thread creation practically stops or becomes too slow; this represents the limitations of the Threading Implementation being analysed. At that point, the Threading Implementation Comparison data grid can be updated and the analysis stopped.

Threading Implementation Comparison

The data in the Threading Implementation Comparison data grid is the point of this whole exercise. From this data we can see the following strengths:

  • Explicit Threading
    • Spawns more worker threads
    • Spawns threads much faster
  • Implicit Threading
    • Allows millions of tasks to be queued

The following chart visualizes the Threading Implementation Comparison data. To visualize the data on the same chart, it is necessary to divide the Safe Queue Limit by 10,000 to ensure all the values are in the same numeric range (0-3000).

Note: results will vary based on Computer/Server Specs; however, the strengths/weaknesses of the various Threading Implementations should remain constant.

[Chart: Threading Implementation Comparison]

Executing The Code Inside Visual Studio vs. Outside

It is interesting to note the following improvements when executing the code outside of Visual Studio:

  • Double the Memory Availability for the application and as a result:
    • Explicit Threading
      • Double the Queue Limit
      • Spawns double the number of worker threads.
    • Implicit Threading:
      • Double the Queue Limits
      • Strangely, there is no real improvement in the number of worker threads spawned.
  • Performance:
    • Explicit Threading: spawns worker threads 6X faster
    • Implicit Threading: strangely, no real improvement

Code In Article

All the code is very well documented, so it should be easy to find your way. Only the most important/interesting code will be shown in this article.

Simulate Work

The 4 Threading Implementations use the following code to simulate work.

//This delegate is only used by the Asynchronous Invoke Threading Implementation.
delegate void Delegate_SimulateWork();

/// <summary>
/// This overloaded function is only used by the Thread Pool Threading Implementation.
/// </summary>
/// <param name="state">This parameter is not used; simply set it to null.</param>
void SimulateWork(object state)
{
    SimulateWork();
}

/// <summary>
/// This function is used by all the Threading Implementations to simulate work.
/// </summary>
void SimulateWork()
{
    //This function is called millions of times by numerous threads, so thread safety is a concern.
    //It is necessary to use the Interlocked functionality to ensure the accuracy of these variables.
    Interlocked.Increment(ref activeWorkerThreadCount);

    //Threads simulate work indefinitely until workerThreadsContinueSimulatingWork is set to false.
    //Often threads simulate work here for 20+ minutes depending on the analysis duration.
    while (workerThreadsContinueSimulatingWork)
    {
        //Keep this worker alive until signalled to stop.
    }

    Interlocked.Decrement(ref activeWorkerThreadCount);
    Interlocked.Decrement(ref tasksOrThreads_queued);
}
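The need for `Interlocked` can be demonstrated in isolation: a plain `++` is a read-modify-write sequence, so concurrent threads can lose updates. A small standalone demo (not part of the article's code):

```csharp
using System;
using System.Threading;

class InterlockedDemo
{
    static int unsafeCount, safeCount;

    static void Main()
    {
        var threads = new Thread[8];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < 100000; j++)
                {
                    unsafeCount++;                        //read-modify-write: updates can be lost
                    Interlocked.Increment(ref safeCount); //atomic: never loses an update
                }
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();

        //safeCount is always 800000; unsafeCount is usually less under contention.
        Console.WriteLine($"unsafe: {unsafeCount}, safe: {safeCount}");
    }
}
```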

Queue Task Or Thread

The following code queues tasks or threads according to the Threading Implementation being analysed.

void QueueTask_or_thread()
{
    switch (threadingImplementationToAnalyze)
    {
        case Constants.ThreadingImplementation.AsynchronousInvoke:
            new Delegate_SimulateWork(SimulateWork).BeginInvoke(null, null);
            break;
        case Constants.ThreadingImplementation.ExplicitThreading:
            new Thread(new ThreadStart(SimulateWork)).Start();
            break;
        case Constants.ThreadingImplementation.TaskParallelLibrary:
            Task.Run(SimulateWork);
            break;
        case Constants.ThreadingImplementation.ThreadPool:
            ThreadPool.QueueUserWorkItem(SimulateWork);
            break;
    }
}

Determine Max Queued Tasks Or Threads

This code is used to determine the maximum number of tasks or threads that can be queued for a Threading Implementation.

try
{
    while (true)
    {
        //Exit Point: Analysis will exit here if the Stop Analysis button is clicked.
        if (!applicationIsInAnalysisMode)
            break;

        //Check for a Memory Fail Point.
        //This is the point where too much memory is used and an OutOfMemoryException is thrown.
        //A Memory Fail Point indicates the Max Queued Tasks Or Threads.
        //This check works for all Threading Implementations except Explicit Threading.
        if (threadingImplementationToAnalyze != Constants.ThreadingImplementation.ExplicitThreading)
        {
            if (tasksOrThreads_queued % 100000 == 0) //Performance optimization
            {
                try
                {
                    new System.Runtime.MemoryFailPoint(100);
                }
                catch (Exception)
                {
                    break; //Memory Fail Point hit: the maximum has been reached.
                }
            }
        }

        //Chosen Queue Limit (chosen by the user):
        //If the system resources don't first limit the Max Queued Tasks Or Threads,
        //then the Chosen Queue Limit is the Max Queued Tasks Or Threads.
        if (tasksOrThreads_queued == chosenQueueLimit)
            break;

        //The tasksOrThreads_queued variable is used to count the Max Queued Tasks Or Threads.
        QueueTask_or_thread();
        Interlocked.Increment(ref tasksOrThreads_queued);
    }
}
catch (Exception)
{
    //Do nothing.
    //Explicit Threading throws an OutOfMemoryException; however, it is expected and handled here.
    //Max explicit threads is determined by creating explicit threads until an error is thrown.
}
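The MemoryFailPoint check can also be exercised on its own: constructing one throws InsufficientMemoryException when the requested number of megabytes is unlikely to be available, and disposing it releases the reservation without allocating anything. A standalone sketch, not from the article's download:

```csharp
using System;
using System.Runtime;

class MemoryFailPointDemo
{
    static bool CanAllocate(int megabytes)
    {
        try
        {
            //Succeeds only if the runtime expects the allocation to fit.
            using (new MemoryFailPoint(megabytes))
                return true;
        }
        catch (InsufficientMemoryException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine("100 MB available: " + CanAllocate(100));
    }
}
```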

Determine Safe Queue Limit

Here the Safe Queue Limit is determined based on the Max Queue Limit and the Chosen Queue Limit.

maxQueueLimit = tasksOrThreads_queued;
if (maxQueueLimit == chosenQueueLimit)
    //The system did not run out of resources.
    //As such, the Chosen Queue Limit is considered a safe limit.
    safeQueueLimit = chosenQueueLimit;
    //The system did run out of resources.
    //As such, the Safe Queue Limit must be less than the Max Queued Tasks Or Threads.
    //Safe Queue Limit is set to 90% of the Max Queued Tasks Or Threads.
    safeQueueLimit = (int)((double)maxQueueLimit * .90);

Analyze Thread Creation

Now that the Safe Queue Limit is known, the application queues tasks or threads until it reaches the Safe Queue Limit. The user can then see how many worker threads are spawned and how long it takes.

for (tasksOrThreads_queued = 0; tasksOrThreads_queued < safeQueueLimit; tasksOrThreads_queued++)
{
    //Exit Point: Analysis will exit here if the Stop Analysis button is clicked.
    if (!applicationIsInAnalysisMode)
        break;

    QueueTask_or_thread();
}
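Average worker spawn time can be measured along the same lines with a `Stopwatch`. A rough standalone sketch; the counter and constant names are illustrative, not from the article's code:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class SpawnTimeDemo
{
    static int started;

    static void Main()
    {
        const int threadCount = 100;
        var stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < threadCount; i++)
            new Thread(() =>
            {
                Interlocked.Increment(ref started);
                Thread.Sleep(1000); //simulate work
            }).Start();

        //Poll until every worker has reported in, then compute the average.
        while (Volatile.Read(ref started) < threadCount)
            Thread.Sleep(1);

        stopwatch.Stop();
        double average = stopwatch.ElapsedMilliseconds / (double)threadCount;
        Console.WriteLine($"Average spawn time: {average:F3} ms");
    }
}
```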


See Something - Say Something

The goal is to have clear, error free content and your help in this regard is much appreciated. Be sure to comment if you see an error or potential improvement. All feedback is welcome.


This article has taken an in-depth look 'under the hood' of .NET threading. Though it is by no means fully comprehensive, it highlights major differences in performance and scalability between the various .NET Threading Implementations. Hopefully this article and software will be helpful next time you develop a threading-intensive application and need to choose a Threading Implementation.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Marco-Hans Van Der Willik
South Africa South Africa
Marco-Hans architected and developed Business Integration Software that integrates South Africa's entire food retail industry with Rhodes Food Group. His software integrates 5 JSE-listed, blue-chip companies: Rhodes Food Group, Shoprite Holdings, Pick n Pay, Woolworths, and The Spar Group.

No business challenge is beyond understanding; no solution is out of reach. Perfect engineering at every level is priority. Marco-Hans has architected and developed large scale solutions for local and international blue chip companies in industries ranging from banking through retail and manufacturing to distribution.
