Deriving from System.Threading.Tasks.Task - Tying The Knot in C#

davidbakin

Feb 4, 2014

CPOL

You can subclass Task, but it is tricky. Here's how to do it by "tying the knot": using lambdas and variable capture to implement lazy evaluation.

Download source, VS2012 project, and debug and release binaries - 32.8 KB

Introduction

Perhaps you have some calculation or other operation that you want to run asynchronously in your C# program, and you want to use a Task¹. The expected way to do this is to just provide the task factory with the code you want to run, frequently in the form of a lambda:

    Task t = Task.Factory.StartNew( () => MyLongAsynchronousCalculation() );

That works well if the lambda is pretty much self-contained, or uses resources from the object that's creating the asynchronous activity.

But perhaps the asynchronous activity is somewhat complicated, has ongoing state, and is, in other words, the kind of thing you'd like to encapsulate in a class. There are two basic object-oriented ways to approach this: use subclassing, or use composition.

Each technique has its pros and cons in general, but in the specific case of using a Task, composition is generally preferred. No less an authority than Jon Skeet ² says

    I wouldn't personally extend Task<T>, I'd compose it instead.³

But…why? One deficiency of nearly all current object oriented languages is that they don't have first-class support for delegation; that makes subclassing the preferred approach over composition for many design problems where behavior from another class is to be used. And, also, the .NET Framework developers know proper object oriented design ⁴ and they could have made the Task class sealed, to prevent subclassing, like many other classes in the Framework—but they didn't.

As soon as you try subclassing Task you discover the major problem: Task takes the code it is to run as an Action delegate parameter to its constructor and it doesn't provide any setter for that delegate that you can use after the Task is constructed—even if you're not starting the Task right away. In the constructor you can't make that delegate refer to the Task subclass instance you're creating.

    class D : Task {
        public D() : base(??) { }
        public void Run() { … }
    }
    …
    D d = new D();

What do you put where the question marks are? You can't simply refer to D's method Run: It is not a static member so you need an object reference—and you don't have one. You can't use the this keyword there: it isn't allowed by the language. There's no good choice, so you're stuck?

Or are you? That's what this article describes: A way to properly derive from Task so that it runs a method in an instance of your subclass. The technique is borrowed from lazy functional languages ⁵ and is called "Tying The Knot."⁶ ⁷

Using a value before it is computed: Tying the Knot

Don't care about the theory and want to cut to the chase? Skip ahead to the next section, Tying the Knot in C#: closures and variable capture.

In a pure functional language all values are immutable. If a value has multiple fields, they can't be modified after it is constructed. So how do you construct a data structure that refers to itself? This comes up in (for example) cyclic lists, or graph structures that aren't DAGs.

Consider, for example, representing rational numbers in the range [0..1) as a linked list of base 10 digits. The rational \(\frac{1}{8} = 0.125\) is finite, so that's easy. But what about \(\frac{1}{7} = 0.\overline{142857}\), which is a non-terminating repeating decimal fraction?⁹

In an imperative language it isn't hard since you can just clobber a field after it is created.

    var OneEighth = new SinglyLinkedList<int> { 1, 2, 5 };
    Console.WriteLine("1/8 = " + string.Join(",", OneEighth.Take(30).Select(e => e.ToString())));
        Console: 1/8 = 1,2,5

    var OneSeventh = new SinglyLinkedList<int> { 1, 4, 2, 8, 5, 7 };
    OneSeventh.Next.Next.Next.Next.Next.Next = OneSeventh;
    Console.WriteLine("1/7 = " + string.Join(",", OneSeventh.Take(30).Select(e => e.ToString())));
        Console: 1/7 = 1,4,2,8,5,7,1,4,2,8,5,7,1,4,2,8,5,7,1,4,2,8,5,7,1,4,2,8,5,7

In a lazy functional language (like Haskell ¹⁰ ) you must do it differently since lists are immutable. But the language has a feature called "letrec", for "let recursive" (where "let" is the binding construct in the language; in Haskell an ordinary let is already recursive), that allows you to refer to a variable before its value has been computed, as long as you don't use that value yet!

    oneEighth = 1 : 2 : 5 : []

    oneSeventh = let x = 1 : 4 : 2 : 8 : 5 : 7 : x
                 in  x

Here, the name x refers to the list under construction and is also used to construct the tail of the list. It works because x refers to a memory location that isn't going to be referenced until some code uses the variable oneSeventh and traverses past the 6th element of the list. (Note the difference between a value and a variable: The variable is the location that can hold a value.)

That is tying the knot!

Tying the Knot in C#: closures and variable capture

Tired of reading about the darn knots? You just came here to learn how to derive from Task? Skip to the next section, Back to the problem: Subclassing the Task class and you'll be set.

Given that C# is not a lazy evaluation language¹² how is tying the knot to be implemented?

We need both delayed evaluation, and being able to bind a value after a data structure has been built. Two related language mechanisms will work together. First, to provide delayed evaluation, we'll introduce an extra level of indirection ¹³ by using a delegate—a pointer-to-method; typically, in C#, this will be written as a lambda expression. Second, to provide ex post facto binding we'll use the excellent C# implementation of variable capture—C# almost has true closures ¹⁴ —to bind a value after a data structure has been built.

The combination works like this:

1. Create a variable to provide a binding location for some value, but don't provide the value.

2. Create a lambda expression that closes over that variable, and returns the value in the variable. Since creating the lambda doesn't dereference the variable to get the value, it is fine.

3. Create the data structure, passing in the lambda to the creation routine, which will save it somewhere but not invoke it yet.

4. Store the data structure into the closed-over variable.

5. Evaluate the lambda, which dereferences the variable to get the data structure, and does something with it (like store it in some field internal to itself).

To make this concrete, suppose we have a class T where the constructor takes an Action, called A, which it stores away in a readonly field, so that there is no way to (re-)set A after the instance of T is constructed. And then it has a method T.M which is called sometime later, after construction is finished, and invokes the Action A.

(And further suppose that we can't change T to provide a setter for A or anything else to fix up this situation.)

Now suppose that we subclass T with a class D and we want the Action A to run a method on itself, an instance of D. Normally our Action A would look something like this:

    class D : T {
        public D(Action a) : base(a) { }
        …
            Action f = () => this.Foo();
        …
    }

But that won't work because we don't have this yet, and actually we can't get a reference to our new instance of D until after its base constructor, a member of T, runs and the D constructor returns.

The way to solve this is to use variable capture in creating the lambda expression:

    …
        D d = null;                 // 1: Create a variable to hold a D.
        Action g = () => d.Foo();   // 2: Capture the variable in our lambda.
        d = new D(g);               // 3: Create a new D, passing in our lambda,
                                    //    and store it in the variable we captured.
        d.M();                      // 4: Execute the method that is going to
                                    //    run our Action and call d.Foo().
    …
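
Fleshed out into a complete program, the fragment above might look like the following. This is only a sketch: T, D, M, and Foo follow the names used in the text, while the Log list, the TieTheKnot helper, and the Program class are assumptions added here so the effect can be observed.

```csharp
using System;
using System.Collections.Generic;

// T stores its Action in a readonly field; it cannot be replaced after construction.
class T
{
    private readonly Action a;
    public T(Action a) { this.a = a; }
    public void M() { a(); }              // invoked some time after construction
}

class D : T
{
    // Hypothetical: records calls so we can see that Foo ran on this instance.
    public readonly List<string> Log = new List<string>();
    public D(Action a) : base(a) { }
    public void Foo() { Log.Add("Foo ran on the D instance"); }
}

static class Program
{
    // Hypothetical helper wrapping steps 1 to 3 from the fragment above.
    public static D TieTheKnot()
    {
        D d = null;                       // 1: variable with no instance yet
        Action g = () => d.Foo();         // 2: lambda captures the variable, not its value
        d = new D(g);                     // 3: the instance is stored in the captured variable
        return d;
    }

    static void Main()
    {
        D d = TieTheKnot();
        d.M();                            // 4: runs the Action, which now finds the instance
        Console.WriteLine(d.Log[0]);
    }
}
```

If T's constructor invoked the Action itself, d would still be null at that point and the lambda would throw a NullReferenceException. The technique relies on the call being deferred until after step 3.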

The real trick is going to be to get an API that can accept an Action or Func instead of the object (an instance of some specific type) it is expecting. But fortunately, in our problem of deriving from Task, that problem is already solved: Task's constructor takes the code to run as an Action (some overloads also take an arbitrary state object), so we can pass our delaying lambda in as that Action!
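
To illustrate the more general point, an API that takes a Func in place of the object it would otherwise expect, here is the 1/7 list from the previous section rebuilt with nodes that are never modified after construction. It is a sketch under stated assumptions: the Node class, its Digits method, and the OneSeventh helper are hypothetical names invented for this example, not part of the article's download.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable list node. The tail is supplied as a Func, so it is
// not evaluated until someone walks past this node.
sealed class Node
{
    public readonly int Digit;
    private readonly Func<Node> next;     // a null Func marks the end of the list

    public Node(int digit, Func<Node> next) { Digit = digit; this.next = next; }

    public Node Next { get { return next == null ? null : next(); } }

    public IEnumerable<int> Digits()
    {
        for (Node n = this; n != null; n = n.Next)
            yield return n.Digit;
    }
}

static class Program
{
    public static Node OneSeventh()
    {
        Node x = null;                        // binding location, no value yet
        Node tail = new Node(7, () => x);     // last digit; its tail is x, read lazily
        foreach (int digit in new[] { 5, 8, 2, 4, 1 })
        {
            Node rest = tail;                 // fresh variable per iteration for capture
            tail = new Node(digit, () => rest);
        }
        x = tail;                             // tie the knot: x now names the first node
        return x;
    }

    static void Main()
    {
        Console.WriteLine("1/7 = " + string.Join(",", OneSeventh().Digits().Take(12)));
        // Console: 1/7 = 1,4,2,8,5,7,1,4,2,8,5,7
    }
}
```

No field of any Node is assigned after its constructor returns. The only thing that changes is the captured local variable x, which plays the role of the letrec binding.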

Back to the problem: Subclassing the Task class

Back to the Task that's left to us: How do we subclass Task?

As previously described, Task takes an Action which is the code to be run when the Task is started.¹⁵ We want it to run a method on our subclass. There are only two problems left to be solved: Where does the Action come from, and how does it get communicated to the Task?

To answer the first question: The Action, as well as the derived instance, will be created in a factory method. The constructor of our derived class will have protected access so a developer can't create one directly.

To answer the second question: It would be easy enough to have our derived class have a constructor that, in addition to its other arguments for initializing itself, took the Action and immediately passed it into its base class:

    protected D(Action a) : base(a) { … }
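
For a single subclass, that simpler approach might be sketched as follows. The class name D and its constructor come from the text; the Create factory, the Run method, and the Result field are assumptions made for this example.

```csharp
using System;
using System.Threading.Tasks;

class D : Task
{
    public int Result;                    // hypothetical state owned by the subclass

    // Protected, so callers must go through the factory method.
    protected D(Action a) : base(a) { }

    public static D Create()
    {
        D d = null;                       // variable to be captured
        d = new D(() => d.Run());         // the lambda reads d only when the Task runs
        d.Start();
        return d;
    }

    private void Run() { Result = 6 * 7; }
}

static class Program
{
    static void Main()
    {
        D d = D.Create();
        d.Wait();
        Console.WriteLine(d.Result);      // 42
    }
}
```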

And then this article would be over. But I prefer to not have to write this code more than once. I would like to provide a generic abstract class to do all the work. It will derive from Task and all my various subclasses will derive from it.

As soon as I do this, however, I run into a problem: my factory method will be generic in my most-derived subclass. That's so that, as a factory, it can return an instance of that subclass (instead of a superclass—its own type—that would need to be cast to the actual subclass). But if it is generic in the type that it is creating (and returning) then it can only create instances of types that have a zero-argument constructor (due to the way the generic constraint new() works). And with that zero-argument constructor, how can the Action be passed in?

The answer is somewhat unsatisfactory: It will be passed in via a static field that the constructor can reference. And that's our last problem to solve: the factory method must be serialized, so that it is safe to set the static field and then immediately create a new object whose constructor reads that field. Otherwise, if the factory method were called on two different threads simultaneously, a race condition could give one of the new instances the Action meant for the other.

Really, this is a complication that is only necessary if you want to have a reusable abstract base class to own all the code that handles passing the Action in to the Task.

Anyway, the code for the abstract generic class DeriveFromTaskBase is in the zip archive associated with this article, so I'll only comment on highlights here.

The public API of DeriveFromTaskBase

The public API of DeriveFromTaskBase consists of the factory method Create that creates an instance and starts it, and an abstract method Run that must be overridden to provide the subclass-specific computation that is the entire purpose for subclassing Task.

The Create factory method takes an optional Action<T>, called beforeStartInitializer, which is run just before the instance's Run method is called. Its purpose is to provide a chance to initialize the instance and make up for the fact that there is only a zero-argument constructor. The Action<T> you provide will be given the instance itself and can set properties or run methods on that instance. (Remember that when you create the Action with a lambda expression you can capture any values you need at that time.) If you also, or alternatively, have things you can do at construction time (that, necessarily, can't rely on any outside inputs), you can (optionally) override the method Constructor and do that initialization.

    /// <summary>
    /// Abstract base class for classes that want to derive from Task.
    /// </summary>
    public abstract class DeriveFromTaskBase : Task
    {
        #region Public interface
        public static T Create<T>(Action<T> beforeStartInitializer = null)
            where T : DeriveFromTaskBase, new()
        {
            …
        }

        public virtual void Constructor() { }

        public abstract void Run();
        #endregion

The construction of the derived instance

The constructor is fairly simple: referring to a static field that holds the indirection Action it simply passes that Action to the base Task and then calls the (optional) Constructor method.

        private static Action thisDeferred;

        protected DeriveFromTaskBase() : base(thisDeferred)
        {
            Constructor();
        }

The factory which "ties the knot" and creates the derived instance

Interestingly, there are two knots to be tied!

The factory method grabs a lock to ensure serialization. Then it provides a location for the to-be-created instance and provides a location for the true Action. It ties the first knot by creating the indirection-Action that invokes the true Action by capturing its location. It then performs new T() to finally create the derived instance you're really looking for. And it ties the second knot by creating the true Action that captures the location of the new instance. After all that it starts the instance—and the Task calls its start Action which calls the true Action which calls the instance's Run method, and at last the Task is going!

        private static readonly object createLock = new object();
        private static Action thisDeferred;

        public static T Create<T>(Action<T> beforeStartInitializer = null)
            where T : DeriveFromTaskBase, new()
        {
            T t = null;

            lock (createLock)
            {
                Action thisDeferredInner = null;

                thisDeferred = () => thisDeferredInner();

                t = new T();

                thisDeferredInner = () =>
                    {
                        if (null != beforeStartInitializer)
                            beforeStartInitializer(t);
                        t.Run();
                    };
            }
            t.Start();
            return t;
        }

(By now you shouldn't need comments to understand the above code…but no worries: There are comments in the sources that are in the zip.)
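
To show how a subclass might use this, here is a sketch that condenses the listings above into one compilable unit and adds a small derived class. The base class is taken from the fragments shown in this article (not from the zip, which may differ); SumTask, Limit, and Total are hypothetical names invented for the example.

```csharp
using System;
using System.Threading.Tasks;

// Condensed from the listings above so that the example compiles on its own.
public abstract class DeriveFromTaskBase : Task
{
    private static readonly object createLock = new object();
    private static Action thisDeferred;

    protected DeriveFromTaskBase() : base(thisDeferred) { Constructor(); }

    public static T Create<T>(Action<T> beforeStartInitializer = null)
        where T : DeriveFromTaskBase, new()
    {
        T t = null;
        lock (createLock)
        {
            Action thisDeferredInner = null;
            thisDeferred = () => thisDeferredInner();   // first knot
            t = new T();
            thisDeferredInner = () =>                   // second knot
            {
                if (beforeStartInitializer != null) beforeStartInitializer(t);
                t.Run();
            };
        }
        t.Start();
        return t;
    }

    public virtual void Constructor() { }
    public abstract void Run();
}

// Hypothetical subclass: sums the integers 1..Limit on the Task's thread.
public sealed class SumTask : DeriveFromTaskBase
{
    public int Limit { get; set; }
    public long Total { get; private set; }

    public override void Run()
    {
        for (int i = 1; i <= Limit; i++) Total += i;
    }
}

static class Program
{
    static void Main()
    {
        // The initializer stands in for the constructor arguments we cannot pass.
        SumTask task = DeriveFromTaskBase.Create<SumTask>(t => t.Limit = 100);
        task.Wait();
        Console.WriteLine(task.Total);    // 5050
    }
}
```

Because SumTask is itself a Task, the caller can Wait on it, attach continuations, or pass it to Task.WaitAll without a separate handle.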

Article Summary

Is it worth it? Well, that depends. For the particular case in hand, utilizing a Task to run complicated code that needs ongoing state, if you were starting from scratch then it would be easiest to write a class that, instead of deriving from Task, simply owns a Task instance. That is, use composition.
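
For comparison, the composition approach might be sketched like this. The class SumCalculation and its members (Completion, Start, Total) are hypothetical names chosen for the example. Note that no knot is needed: inside the constructor body, as opposed to the base(...) initializer, this is available.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical class that owns a Task instead of deriving from it.
sealed class SumCalculation
{
    private readonly int limit;

    public long Total { get; private set; }
    public Task Completion { get; private set; }

    public SumCalculation(int limit)
    {
        this.limit = limit;
        // The method group Run is bound to this instance; that is legal here
        // because we are in the constructor body.
        Completion = new Task(Run);
    }

    public void Start() { Completion.Start(); }

    private void Run()
    {
        for (int i = 1; i <= limit; i++) Total += i;
    }
}

static class Program
{
    static void Main()
    {
        var calc = new SumCalculation(100);
        calc.Start();
        calc.Completion.Wait();
        Console.WriteLine(calc.Total);    // 5050
    }
}
```

The cost is that callers hold two related objects, the calculation and its Task, and must keep track of which one to pass where.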

But, YNK.¹⁶ Now that this class is written for you (and explained!) you may find yourself needing actual Tasks doing complicated operations for one reason or another, and keeping track of the relationship between the calculating instance and the Task instance may be annoying.

Then, also, in the general case (moving away from Tasks), knowing how to tie the knot as a technique in general, and the particular way you specifically do it in C#, with lambdas and variable capture, may prove useful to you (as well as interesting). I hope so.

Oh, one more thing. Now that you understand the previous section you should be easily able to understand the diagram at the top of this article. What's that you say? You can't? Well, it's not your fault. Fact is, it isn't a very good diagram; it's just the best I could do. Please feel free to provide me with a better diagram (let me know in a comment) and I'll be glad to put it up to replace the one I came up with and give you the credit for it.

Article Revision History

01-FEB-2014: Original article.

Footnotes

¹ That is, a System.Threading.Tasks.Task.

² Jon Skeet's C# In Depth is an excellent book, as is Jon Skeet's blog.

³ See Jon Skeet's answer on SO.

⁴ They wrote a book on it: Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries.

⁵ E.g., Haskell.

⁶ A full description of Tying the Knot. The canonical algorithm for tying the knot is repmin, which is a one pass algorithm for building a tree which is the same shape as the given tree except each leaf has the minimum leaf value from the original tree.

⁷ What's with these footnotes anyway? It's a CodeProject article on the web, not an academic paper!⁸

⁸ I dunno, I just thought it would be funny.

⁹ Here's an exercise for the reader: Given this representation of rationals in the range [0..1), write the equality function. Ensure that it finds the following two representations of the same number, \(\frac{1}{5} = 0.2 = 0.1\overline{9}\), to be equal.

¹⁰ Ibid.¹¹ 4.

¹¹ Oh boy! I have always wanted to use Ibid.!

¹² Lazy evaluation (wikipedia) means that an expression is not evaluated when it is bound to a variable, but only when it is used. Nearly all languages use strict evaluation where expressions are fully evaluated when they are bound to a variable (or procedure argument). In fact, this is why some logging libraries sometimes go out of their way to use non-language mechanisms, like preprocessor macros in C/C++, to provide efficient processing of formatted messages and their arguments: They are trying to improve performance and reduce the overhead of logging by avoiding the evaluation of message arguments unless the message's log/trace level is high enough that the log message will actually be written to some sink.

¹³ The Fundamental Theorem of Software Engineering:

    We can solve any problem by introducing an extra level of indirection.

¹⁴ There's some question about whether C# closures are "true" closures or not. There was quite a discussion on Wikipedia, and also at the programming language blog Lambda The Ultimate, with respect to the Wikipedia article on closures. Some people will accept nothing less than closures which "bind return" (that is, allow for call-with-current-continuation). Other people believe that the closures in C# 3.0 and JavaScript are as close to "true" closures as makes no difference.

¹⁵ There's a second way to get data into a Task: Pass an Async State Object into it at construction time. But this has the same problem as the start Action: There are no setters that you can use to change the Async State Object after the constructor has run. You could instead provide a custom class wrapper around the instance reference (and whatever other information you want to pass in). Filled with a null, you would pass it in at construction time but keep a reference to it. Then you could fill its field with the reference to the new instance as soon as you had it, then after that, start the instance. But…by the time you did that, you might as well have done it this way.

¹⁶ YNK = You Never Know.