
Multi core programming using Task Parallel Library with .NET 4.0

By varun_manipal, 9 Apr 2012
 

Introduction

Nowadays, virtually all personal computers and workstations come with multiple cores, yet most .NET applications fail to harness the full potential of this computing power. Even when developers attempt to do so, it is generally by means of low-level manipulation of threads and locks. This often leads to code that is either unreadable or full of potential concurrency bugs. These bugs often go undetected when the code runs on a single-core machine.

The Task Parallel Library allows you to write code that is human-readable, less error-prone, and that adapts to the number of cores available. Your software can therefore take advantage of better hardware as its environment is upgraded, without code changes.

What kind of Performance Boost are we talking about?

What is the first thing you try when you see parts of your code not performing well? Lazy loading, LINQ queries, optimizing for loops, and so on. We often overlook parallelizing the time-consuming, independent units of work.

Most often the CPU will show you the following story during your performance intensive routines.

Shouldn’t your CPU be utilized more like this?

Task Parallel Library

The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces in the .NET Framework 4.0. The TPL scales the degree of concurrency dynamically to efficiently use all the cores that are available. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.

The Task Parallel Library introduces the concept of “Task”. Task parallelism is the process of running these tasks in parallel. A Task is an independent unit of work, which runs within a program. Benefits of identifying tasks within your system are:

  • More efficient and more scalable use of system resources.
  • More programmatic control than is possible with a thread or work item.

The Task Parallel Library uses threads under the hood to execute these tasks in parallel. The runtime decides dynamically whether to use additional threads, and how many.

Why Tasks? Why not threads?

Creating a thread is expensive. Creating a large number of threads within your application also brings the overhead of context switching. In a single-core environment this can even hurt performance, since a single core has to serve all the threads.

With tasks, on the other hand, the runtime decides dynamically whether additional threads of execution are needed. It uses the ThreadPool under the hood to distribute the work, avoiding the overhead of thread creation and unnecessary context switching where possible.

Fig 1. The time difference between a traditional Thread based approach, and a task based approach.

The following code snippet shows the creation of parallel tasks using Threads and Task.
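As a minimal sketch of such a comparison (not the downloadable sample itself), the same CPU-bound work can be started once with dedicated threads and once with tasks. The SumTo work item and the class and method names are hypothetical and chosen only for illustration; Task.Factory.StartNew is used because it is available in .NET 4.0.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ThreadsVersusTasks
{
    // Hypothetical CPU-bound work item: sums the integers 1..n.
    public static long SumTo(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    // Thread-based version: one dedicated thread is created per work item.
    public static long[] RunWithThreads(int[] inputs)
    {
        var results = new long[inputs.Length];
        var threads = new Thread[inputs.Length];
        for (int i = 0; i < inputs.Length; i++)
        {
            int index = i; // copy so that each lambda captures its own value
            threads[i] = new Thread(() => results[index] = SumTo(inputs[index]));
            threads[i].Start();
        }
        foreach (Thread t in threads) t.Join(); // wait for every thread by hand
        return results;
    }

    // Task-based version: the scheduler decides how many pool threads to use.
    public static long[] RunWithTasks(int[] inputs)
    {
        var tasks = new Task<long>[inputs.Length];
        for (int i = 0; i < inputs.Length; i++)
        {
            int n = inputs[i]; // copy so that each lambda captures its own value
            tasks[i] = Task.Factory.StartNew(() => SumTo(n));
        }
        Task.WaitAll(tasks);

        var results = new long[inputs.Length];
        for (int i = 0; i < tasks.Length; i++) results[i] = tasks[i].Result;
        return results;
    }
}
```

Both methods return the same values; the difference is that the thread version always creates one thread per input, while the task version leaves that decision to the runtime.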

You can download the sample used above. 

So how is this different from creating a thread? One of the first advantages of using Tasks over Threads is that it becomes easier to get the best performance out of whatever system your application runs on. For example, if I fire off multiple threads that all do heavy CPU-bound work on a single-core machine, the work is likely to take significantly longer. Threading has overhead: if you try to execute more CPU-bound threads than the machine has cores to run them, you can run into problems. Each time the CPU switches from one thread to another there is a bit of overhead, and with many threads running at once this switching can happen so often that the work takes longer than if it had simply been executed synchronously. This diagram might help spell that out a bit better:

As you can see, if we aren’t switching between pieces of work, we don’t pay for context switches between threads. With switching, the total cumulative time to process is much longer, even though the same amount of work is done. If two cores were available, we could simply execute the two sets of work on them simultaneously, which gives the highest possible efficiency.

Why Tasks? Why not ThreadPools?

Now that we have a rough idea of Tasks and what they can do, let us look at them in a little more detail and see how they differ from using the ThreadPool directly.

Let us see how you can start a new execution on a ThreadPool:
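A minimal sketch of queuing work on the ThreadPool might look like this. The class name ThreadPoolStart, the method name Queue and its Action parameter are assumptions made for illustration; ThreadPool.QueueUserWorkItem is the actual framework call.

```csharp
using System;
using System.Threading;

static class ThreadPoolStart
{
    // Queues 'work' on a ThreadPool thread and returns immediately.
    public static void Queue(Action work)
    {
        // QueueUserWorkItem expects a WaitCallback, which receives an object 'state';
        // the state is not used in this sketch.
        ThreadPool.QueueUserWorkItem(state => work());
    }
}
```

A call such as ThreadPoolStart.Queue(() => Console.WriteLine("Running on a pool thread")) starts the work, but nothing tells the caller when it has finished or what it produced.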

Let us see what you have to do if you wish to wait for the thread to finish:
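One common way to do this, sketched here under the assumption that a ManualResetEvent is used for signalling, is to have the work item set an event that the caller blocks on. The class and method names and the trivial computation are hypothetical.

```csharp
using System;
using System.Threading;

static class ThreadPoolWait
{
    // Runs a small computation on the ThreadPool and blocks until it is done.
    public static int ComputeAndWait()
    {
        int result = 0;
        using (var done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(state =>
            {
                result = 6 * 7; // hypothetical work
                done.Set();     // signal completion by hand
            });

            done.WaitOne();     // block the caller until the work item signals
        }
        return result;
    }
}
```

Note how the result has to be passed back through a captured variable, and how the caller would wait forever if the work item threw before reaching done.Set().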

Messy, isn’t it?

What if you have to wait for 15 threads to finish?

How do you capture the return values from multiple threads?

How do you return the control back to GUI thread?

There are answers to these questions, such as delegates and raising events, but they become error-prone once we drill into a chain of multi-threaded actions.

Let us see how Tasks handle this situation elegantly:

Creation of a new Task
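A minimal sketch of task creation with Task.Factory.StartNew, which is available in .NET 4.0. The class and method names are hypothetical; the first method returns a value through Task<int>, the second returns nothing.

```csharp
using System;
using System.Threading.Tasks;

static class TaskCreation
{
    // Starts a task that produces a value; the caller reads it from Task<int>.Result.
    public static Task<int> StartSquare(int x)
    {
        return Task.Factory.StartNew(() => x * x);
    }

    // Starts a task that only performs an action and returns no value.
    public static Task StartLogging(string message)
    {
        return Task.Factory.StartNew(() => Console.WriteLine(message));
    }
}
```

Reading TaskCreation.StartSquare(7).Result blocks until the task has finished and then yields the computed value, so the return value no longer has to be smuggled out through shared variables.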

Waiting on Tasks:
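As a sketch of how waiting for many units of work might look, Task.WaitAll blocks until every task in the array has completed. The method name WaitForMany and the squaring work are illustrative assumptions.

```csharp
using System;
using System.Threading.Tasks;

static class TaskWaiting
{
    // Starts 'count' tasks, waits for all of them, and adds up their results.
    public static int WaitForMany(int count)
    {
        var tasks = new Task<int>[count];
        for (int i = 0; i < count; i++)
        {
            int n = i; // copy so that each lambda captures its own value
            tasks[i] = Task.Factory.StartNew(() => n * n);
        }

        Task.WaitAll(tasks); // a single call, whether there are 2 tasks or 15

        int sum = 0;
        foreach (Task<int> task in tasks) sum += task.Result;
        return sum;
    }
}
```

A single task can be awaited with its Wait() method, and Task.WaitAny returns as soon as the first of several tasks has completed.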

Execute another Async task when the current task is done:
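A continuation can be attached with ContinueWith; the following is a minimal sketch with hypothetical names. The continuation receives the finished task (the antecedent) and can read its result.

```csharp
using System;
using System.Threading.Tasks;

static class TaskContinuation
{
    // Computes x * x on one task, then formats the result in a continuation.
    public static Task<string> SquareThenFormat(int x)
    {
        Task<int> compute = Task.Factory.StartNew(() => x * x);

        // The continuation is scheduled only after 'compute' has completed.
        return compute.ContinueWith(antecedent => "Result: " + antecedent.Result);
    }
}
```

In a GUI application, passing TaskScheduler.FromCurrentSynchronizationContext() as the scheduler argument of ContinueWith is one way to run the continuation back on the UI thread, assuming the call is made from that thread.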

In real world scenarios, we often have multiple operations which we want to perform asynchronously. Look at the following code snippet and see how you can model it alternatively.
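One possible way to model several asynchronous operations whose results are combined at the end is sketched here with TaskFactory.ContinueWhenAll. This is an illustration under assumptions, not the article's original snippet: the TotalLength method and the string-length work are hypothetical stand-ins for real operations.

```csharp
using System;
using System.Threading.Tasks;

static class CombinedOperations
{
    // Measures each text on its own task and sums the lengths once all are done.
    // 'texts' must contain at least one element, because ContinueWhenAll
    // rejects an empty task array.
    public static Task<int> TotalLength(string[] texts)
    {
        var tasks = new Task<int>[texts.Length];
        for (int i = 0; i < texts.Length; i++)
        {
            string text = texts[i]; // copy so that each lambda captures its own value
            tasks[i] = Task.Factory.StartNew(() => text.Length);
        }

        // The continuation runs once, after every task in the array has completed.
        return Task.Factory.ContinueWhenAll<int, int>(tasks, finished =>
        {
            int total = 0;
            foreach (Task<int> task in finished) total += task.Result;
            return total;
        });
    }
}
```

The caller gets a single Task<int> back and never has to track the individual operations.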

Parallel Extensions:

Parallel extensions were introduced along with the Task Parallel Library to achieve data parallelism. Data parallelism refers to scenarios in which the same operation is performed concurrently (that is, in parallel) on the elements of a source collection or array. .NET provides the Parallel.For and Parallel.ForEach constructs for this purpose.

Let us see how we can use these:
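A minimal sketch of both constructs follows. The matrix multiplication mirrors the kind of workload measured in the figures, but it is not the downloadable sample; the class and method names are hypothetical. Each row of the result is independent of the others, which is what makes the outer loop safe to parallelize.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelLoops
{
    // Multiplies two matrices, computing the rows of the result in parallel.
    public static double[,] Multiply(double[,] a, double[,] b)
    {
        int rows = a.GetLength(0);
        int inner = a.GetLength(1);
        int cols = b.GetLength(1);
        if (b.GetLength(0) != inner)
            throw new ArgumentException("Inner matrix dimensions must agree.");

        var result = new double[rows, cols];

        // Each iteration writes only to row i, so iterations do not interfere.
        Parallel.For(0, rows, i =>
        {
            for (int j = 0; j < cols; j++)
            {
                double sum = 0;
                for (int k = 0; k < inner; k++)
                    sum += a[i, k] * b[k, j];
                result[i, j] = sum;
            }
        });

        return result;
    }

    // Records the length of every word, processing the words in parallel.
    public static IDictionary<string, int> MeasureLengths(IEnumerable<string> words)
    {
        // A concurrent collection is used because several iterations may write at once.
        var lengths = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, word => { lengths[word] = word.Length; });
        return lengths;
    }
}
```

The loop bodies look the same as in a sequential for or foreach loop; only the loop header changes.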

The Parallel.ForEach construct uses multiple cores in the same fashion and thus improves performance.

The following graphs show how parallel extensions improve the performance of the system:

Fig 1. Matrix multiplication running on a Dual Core machine. The parallel extensions consume less time.

Fig 2. Matrix multiplication running on a Quad Core machine. The same code consumes far less time without any modifications.

Fig 3. Matrix multiplication running on a single core machine. The execution time remains identical.

You can download the code from the following link [Download]

Conclusion

The parallel extensions and the Task Parallel Library help developers leverage the full potential of the available hardware. The same code adjusts itself to give you the benefits across different hardware. It also improves the readability of the code and thus reduces the risk of introducing the nasty bugs that drive developers crazy.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

varun_manipal
Software Developer (Senior)
India


Comments and Discussions

 
[General] My vote of 5 | Sperneder Patrick (member) | 5 Feb '13 - 20:08
Really clear and understandable article. hi-five! :)
[Question] Parallel programming in .NET Framework 4.0 | Dmitry Skorik (member) | 29 Jan '13 - 16:50
Parallel programming in . NET Framework 4.0
 
http://www.enterra-inc.com/techzone/parallel-programming-in-net-framework-4-0/
[Answer] Re: Parallel programming in .NET Framework 4.0 | jibesh (member) | 29 Jan '13 - 16:54
you are spamming the forum please delete all your recent posts else you will be reported.
Jibesh V P

[General] My vote of 5 | deepakdynamite (member) | 24 Dec '12 - 17:53
Nice explanations with examples
[General] My vote of 5 | Thornik (member) | 9 Nov '12 - 0:57
My vote for 5 is a "thank" that you show well: MS again reinvented wheel. Because there is absolutely no difference between tasks and threads if you create threads as much as CPUs you have. And well... some syntax sugar to wait multiple threads, that's it! MS _WASTE_ our time while we was waiting for a really new features.
[General] My vote of 5 | devvvy (member) | 31 Oct '12 - 16:39
great one (I missed out on this until today!)
[General] My vote of 5 | sandippatil (member) | 6 Oct '12 - 22:04
abcd
[General] My vote of 5 | Ensamblador (member) | 1 Oct '12 - 15:45
very instructive, tanks
[General] My vote of 5 | Sirius B (member) | 23 May '12 - 13:13
Short, clear, and backed by numbers. 5 stars 4 sho!
[General] My vote of 5 | Аslam Iqbal (member) | 17 May '12 - 11:23
After a lot of testing I found that Parallel processing always win. So my vote is 5. Thanks for sharing.
[General] My vote of 5 | Mihai MOGA (member) | 12 May '12 - 18:28
This is a great inspiring article. I am pretty much pleased with your good work. You put really very helpful information. Keep it up once again.
[General] My vote of 5 | Monjurul Habib (member) | 10 May '12 - 19:29
5
[Question] nice | CIDev (member) | 19 Apr '12 - 3:10
Very nice, especially for a first article. Please keep writing. The only thing I would suggest is to cover things in even more depth.
Just because the code works, it doesn't mean that it is good code.

[Question] Why not always use this? | ninj4n (member) | 17 Apr '12 - 3:54
The performance improvement of the Parallel.ForEach seems quite convincing. Is there a reason not to use this or the Parallel.For every time a loop is needed?
 
Thanks!
[Answer] Re: Why not always use this? | Paul89 (member) | 17 Apr '12 - 6:59
In general the code using multi-threading is far more difficult to write correctly and debug. It can be even dangerous if used recklessly. Also in some simple loops there might be no performance gain at all. Iterating a few times might not overcome the cost of maintaining multiple threads.
[General] Re: Why not always use this? | varun_manipal (member) | 17 Apr '12 - 10:13
Hi Paul89,
 
The multi-threading do adds some complexity to the code. However, TPL is supposed to provide a controlled execution which is easier to read as well.
 
I guess the key to identify here is if the code is an ideal candidate for parallel execution or not.
[General] Re: Why not always use this? | Аslam Iqbal (member) | 17 May '12 - 11:25
yeah, you are right.
[Answer] Re: Why not always use this? | varun_manipal (member) | 17 Apr '12 - 10:11
Hi Ninj4n,
 
The trick to using parallel programming is to identify the segments of code which are ideal candidate for Parallel processing.
 
You would want to add parallelism to independent units of code, however, if the executing unit isn't independent, Parallelism is a "No".
 
Ex: Imagine bubble sorting an array within a For loop, where every (i+1)th iteration is dependent on the ith execution, Adding Parallelism will lead to incorrect results there.
 
However, matrix multiplication (The one in the download sample) is a perfect example of independent unit of work and thus is suited for Parallelism.
 
Hope this helps.
[General] Re: Why not always use this? | ninj4n (member) | 17 Apr '12 - 20:57
Okay, I get that. It was just that, compared to other parallelism methods, this seemed a very easy way, and it might have been one of those things you could just do "blindly". But clearly that's not the case. Thanks for the answer!
[Answer] Re: Why not always use this? | RugbyLeague (member) | 22 Apr '12 - 23:55
on my i7 machine if I set to zero each byte in a byte array of 10 million within a normal "for" loop it is far faster than doing the same using a "Parallel.For" loop.
[General] Re: Why not always use this? | varun_manipal (member) | 23 Apr '12 - 1:54
Thanks for the feedback RugbyLeague.
 
It is always important to pick up the right context to run within the parallel loops. Any operation which is dependent on the earlier iteration execution might not be the right candidate.
 
Having said that, it would be interesting to have a look at your code snippet.
Would be interesting to have a look at it and see how we can leverage the parallelism here.
 
BTW Nice machine you have there ;)
[General] Re: Why not always use this? | RugbyLeague (member) | 23 Apr '12 - 2:01
Code snippet here:
 

 
        using System;
        using System.Threading.Tasks;

        class Program
        {
            // Fills the same buffer twice: once with Parallel.For, once with a plain for loop.
            static void Main(string[] args)
            {
                const int SIZE = 1000000000;

                Console.WriteLine("Assign");
                byte[] buffer = new byte[SIZE];

                Console.WriteLine("Parallel");
                Parallel.For(0, SIZE, i => buffer[i] = 255);

                Console.WriteLine("Traditional");
                for (int i = 0; i < SIZE; ++i)
                    buffer[i] = 255;

                Console.WriteLine("Finished");
            }
        }

[General] Re: Why not always use this? | RugbyLeague (member) | 23 Apr '12 - 2:11
Interestingly if I use the .Net stopwatch the parallel version is faster but when I use a profiler the tradional version is faster - I suppose the profiler must be adding a lot of overhead or not timing it effectively
[General] good examples! | dave.dolan (member) | 10 Apr '12 - 6:48
This is much better than the amorphous "hello from task 1 or 2" example everyone else writes. Now we can actually see how it works and what it's good for!
[General] Re: good examples! | varun_manipal (member) | 11 Apr '12 - 1:38
Thanks Dave

Last Updated 9 Apr 2012
Article Copyright 2012 by varun_manipal
Everything else Copyright © CodeProject, 1999-2013