The Basics of Task Parallelism via C#

logicchild

Apr 30, 2011

CPOL

An article that explains the basics of task parallel programming.

Preface

The trend toward parallel hardware means that .NET Framework developers should learn about the Task Parallel Library (TPL). Parallel work is commonly viewed in two ways. Data parallelism uses the input data of an operation as the means of partitioning the work into smaller pieces. The data is divided among the available hardware processors, and an independent operation, typically the same one, is then executed concurrently on the elements of each partition.

Task parallelism relies on the fact that the program is already decomposed into individual parts (statements, methods, and so on) that can be run in parallel. More to the point, task parallelism views a problem as a stream of instructions that can be broken into sequences called tasks, which can execute simultaneously. For the computation to be efficient, the operations that make up a task should be largely independent of the operations taking place inside other tasks. The data-decomposition view focuses on the data required by the tasks and how it can be decomposed into distinct chunks. The computation associated with the data chunks will only be efficient if the chunks can be operated upon relatively independently. While these two views are obviously interdependent when deciding to go parallel, they are best learned separately. A powerful reference about tasks and compute-bound asynchronous operations is Jeffrey Richter's book, "CLR via C#, 3rd Edition". It is a good read.
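As a rough illustration of the two views, the sketch below uses the TPL's Parallel class, which this article does not otherwise cover. The array of squares and the two console messages are made up for the example; Parallel.For applies one operation to every element (data parallelism), while Parallel.Invoke runs two unrelated operations that may execute at the same time (task parallelism).

```csharp
using System;
using System.Threading.Tasks;

public class Program {
    public static void Main() {
        // Data parallelism: the same operation is applied to every element,
        // and the index range is partitioned among the available processors.
        Int32[] squares = new Int32[8];
        Parallel.For(0, squares.Length, i => squares[i] = i * i);
        Console.WriteLine(String.Join(", ", squares));

        // Task parallelism: two different, independent operations that
        // may run simultaneously. Their relative order is not guaranteed.
        Parallel.Invoke(
            () => Console.WriteLine("Operation A (hypothetical workload)"),
            () => Console.WriteLine("Operation B (hypothetical workload)"));
    }
}
```

Each loop iteration writes to its own array slot, so the iterations are independent of one another, which is the property both views require for efficient execution.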

In this brief article, we will focus on some of the characteristics of the Task class in the System.Threading.Tasks namespace. To perform a simple task, create a new instance of the Task class, passing a System.Action delegate that represents the workload you want performed as a constructor argument. You can explicitly create the Action delegate so that it refers to a named method, use an anonymous method, or use a lambda expression. Once you have created an instance of Task, call the Start() method. Your Task is then passed to the task scheduler, which is responsible for assigning threads to perform the work. Here is an example:

using System;
using System.Threading.Tasks;

public class Program {
    public static void Main() {
        // use an Action delegate and named method
        Task task1 = new Task(new Action(printMessage));
        // use an anonymous delegate
        Task task2 = new Task(delegate { printMessage(); });
        // use a lambda expression and a named method
        Task task3 = new Task(() => printMessage());
        // use a lambda expression with a statement body
        Task task4 = new Task(() => { printMessage(); });

        task1.Start();
        task2.Start();
        task3.Start();
        task4.Start();
        Console.WriteLine("Main method complete. Press <enter> to finish.");
        Console.ReadLine();
    }
    private static void printMessage() {
        Console.WriteLine("Hello, world!");
    }
}
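The program above depends on Console.ReadLine to keep the process alive while the tasks run on thread pool threads. As a minimal sketch of an alternative, the version below waits for the tasks explicitly with Task.WaitAll. It also shows Task.Factory.StartNew, which creates and starts a task in a single call. The message strings are made up for the example.

```csharp
using System;
using System.Threading.Tasks;

public class Program {
    public static void Main() {
        // Create the task first, then start it explicitly.
        Task task1 = new Task(() => Console.WriteLine("Hello from task 1"));
        task1.Start();

        // Create and start a task in one call.
        Task task2 = Task.Factory.StartNew(
            () => Console.WriteLine("Hello from task 2"));

        // Block until both tasks have finished, so Main does not
        // return while the work is still running.
        Task.WaitAll(task1, task2);
        Console.WriteLine("Both tasks have completed.");
    }
}
```

The two greetings may appear in either order, but the final line is always printed last because WaitAll does not return until both tasks are done.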

To get a result from a task, create an instance of Task<T>, where T is the data type of the result that will be produced, and return a value of that type from your Task body. To read the result, query the Result property of the Task you have created. For example, let's say that we have a method called Sum. We can construct a Task<TResult> object, passing the operation's return data type as the generic TResult argument:

using System;
using System.Threading.Tasks;
public class Program {

    private static Int32 Sum(Int32 n)
    {
        Int32 sum = 0;
        for (; n > 0; n--)
            checked { sum += n; }
        return sum;
    }

    public static void Main() {
        Task<Int32> t = new Task<Int32>(n => Sum((Int32)n), 1000);
        t.Start();
        t.Wait(); 

        // Get the result (the Result property internally calls Wait) 
        Console.WriteLine("The sum is: " + t.Result);   // An Int32 value
    }
}

Produces:

The sum is: 500500

If the compute-bound operation throws an unhandled exception, the exception is swallowed and stored in a collection, and the thread pool thread is allowed to return to the thread pool. When the Wait method or the Result property is later invoked, it throws a System.AggregateException object that wraps the stored exceptions. You can use a CancellationTokenSource to cancel a Task. To do so, we must rewrite our Sum method so that it accepts a CancellationToken, and then create a CancellationTokenSource object whose token is passed to the task:

using System;
using System.Threading;
using System.Threading.Tasks;
public class Program {

    private static Int32 Sum(CancellationToken ct, Int32 n) {
        Int32 sum = 0;
        for (; n > 0; n--) {
            ct.ThrowIfCancellationRequested();

            //Thread.Sleep(0);   // Simulate taking a long time
            checked { sum += n; }
        }
        return sum;
    }

    public static void Main() {
        CancellationTokenSource cts = new CancellationTokenSource();
        Task<Int32> t = new Task<Int32>(() => Sum(cts.Token, 1000), cts.Token);
        t.Start();
        cts.Cancel();

        try {
            // If the task got canceled, Result will throw an AggregateException
            Console.WriteLine("The sum is: " + t.Result);   // An Int32 value
        }
        catch (AggregateException ae) {
            ae.Handle(e => e is OperationCanceledException);
            Console.WriteLine("Sum was canceled");
        }
    }
}

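Because Cancel is called right after Start, this typically outputs that the task was canceled (a task this short could also finish before the cancellation request is seen):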

Sum was canceled
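The exception behavior described before the cancellation example can be observed in the same way. The sketch below reuses the original Sum method but passes 100000, an input chosen here because the running total exceeds Int32.MaxValue, so the checked block throws an OverflowException inside the task. The exception surfaces only when Result is queried, wrapped in an AggregateException.

```csharp
using System;
using System.Threading.Tasks;

public class Program {
    private static Int32 Sum(Int32 n) {
        Int32 sum = 0;
        for (; n > 0; n--)
            checked { sum += n; }   // throws OverflowException on overflow
        return sum;
    }

    public static void Main() {
        // 1 + 2 + ... + 100000 does not fit in an Int32.
        Task<Int32> t = new Task<Int32>(n => Sum((Int32)n), 100000);
        t.Start();

        try {
            // Querying Result rethrows the stored exception,
            // wrapped in an AggregateException.
            Console.WriteLine("The sum is: " + t.Result);
        }
        catch (AggregateException ae) {
            foreach (Exception inner in ae.InnerExceptions)
                Console.WriteLine("Task failed: " + inner.GetType().Name);
        }
    }
}
```

This prints "Task failed: OverflowException". AggregateException holds a collection because a task, particularly one with child tasks, can accumulate more than one exception.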

Calling Wait or querying Result blocks the calling thread until the task finishes. There is a better way to find out when a task has completed running: when a task completes, it can start another task. In the code below, when the task executing Sum completes, it starts another task (also on some thread pool thread) that displays the result. Apart from the final Wait call, which is there for testing only, the thread that executes the code below does not block waiting for either of these two tasks to complete; the thread is allowed to execute other code or, if it is a thread pool thread itself, it can return to the pool to perform other operations. Note that the task executing Sum could complete before ContinueWith is called.

using System;
using System.Threading.Tasks;
public class Program {
    private static Int32 Sum(Int32 n)
    {
        Int32 sum = 0;
        for (; n > 0; n--)
            checked { sum += n; }
        return sum;
    }
    public static void Main() {
        // Create and start the Task, then continue with another task
        Task<Int32> t = new Task<Int32>(n => Sum((Int32)n), 1000);
        t.Start();
        // notice the use of the Result property
        Task cwt = t.ContinueWith(task => Console.WriteLine(
                        "The sum is: " + task.Result));
        cwt.Wait();  // For the testing only
    }
}

Produces a similar result:

The sum is: 500500

If the task executing Sum has already completed by the time ContinueWith is called, this is not a problem: the ContinueWith method sees that the Sum task is complete and immediately starts the task that displays the result. Tasks also support parent/child relationships. Examine the code below:

using System;
using System.Threading;
using System.Threading.Tasks;
public class Program {
    private static Int32 Sum(Int32 n)
    {
        Int32 sum = 0;
        for (; n > 0; n--)
            checked { sum += n; }
        return sum;
    }

    public static void Main() {

        Task<Int32[]> parent = new Task<Int32[]>(() => {
            var results = new Int32[3];   // Create an array for the results

            // This task creates and starts 3 child tasks
            new Task(() => results[0] = Sum(100), 
                TaskCreationOptions.AttachedToParent).Start();
            new Task(() => results[1] = Sum(200), 
                TaskCreationOptions.AttachedToParent).Start();
            new Task(() => results[2] = Sum(300), 
                TaskCreationOptions.AttachedToParent).Start();

            // Returns a reference to the array
            // (even though the elements may not be initialized yet)
            return results;
        });

        // When the parent and its children have
        // run to completion, display the results
        var cwt = parent.ContinueWith(parentTask => 
                            Array.ForEach(parentTask.Result, Console.WriteLine));

        // Start the parent Task so it can start its children
        parent.Start();

        cwt.Wait(); // For testing purposes
    }
}

This produces the parent/child task results:

5050
20100
45150

Internally, Task objects contain a collection of ContinueWith tasks, meaning you can call ContinueWith several times on a single Task object. When the task completes, all of its ContinueWith tasks are queued to the thread pool. It is important to note that tasks do not replace threads: they run on threads.

Recall that when the CLR initializes, the thread pool has no threads in it. Internally, the thread pool maintains a queue of operation requests. When your application wants to perform an asynchronous operation, you call a method that appends an entry to the thread pool's queue. The thread pool's code extracts entries from this queue and dispatches each entry to a thread pool thread. If there are no threads in the thread pool, a new thread is created. This is why, when writing managed applications, you rarely need to create threads by hand. There is one pool per process, and therefore the ThreadPool class offers only static methods. To schedule a work item for execution, you call the QueueUserWorkItem method, passing in a WaitCallback delegate. But recall again that we can avoid the limitations of ThreadPool's QueueUserWorkItem, such as having no built-in way to learn when the operation has completed or to obtain its return value, by creating a Task object.
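To illustrate attaching several continuations to one task, here is a minimal sketch that reuses the Sum method from the earlier examples. The names onSuccess and onFault are made up for the example, and the TaskContinuationOptions values restrict each continuation to one outcome of the antecedent task.

```csharp
using System;
using System.Threading.Tasks;

public class Program {
    private static Int32 Sum(Int32 n) {
        Int32 sum = 0;
        for (; n > 0; n--)
            checked { sum += n; }
        return sum;
    }

    public static void Main() {
        Task<Int32> t = new Task<Int32>(n => Sum((Int32)n), 1000);

        // First continuation: runs only if Sum returns normally.
        Task onSuccess = t.ContinueWith(
            task => Console.WriteLine("The sum is: " + task.Result),
            TaskContinuationOptions.OnlyOnRanToCompletion);

        // Second continuation on the same task: runs only if Sum throws.
        // Here Sum succeeds, so this continuation is canceled instead.
        Task onFault = t.ContinueWith(
            task => Console.WriteLine("Sum threw: " +
                task.Exception.InnerException.GetType().Name),
            TaskContinuationOptions.OnlyOnFaulted);

        t.Start();
        onSuccess.Wait();  // For testing only
    }
}
```

With an input of 1000 this prints "The sum is: 500500". Only onSuccess is waited on, because waiting on a continuation that was canceled (onFault in this run) would itself throw an AggregateException.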

References

Jeffrey Richter, CLR via C#, 3rd Edition.