To those who have already read the article
If you have already read the article and only want to know what's new, click here. The second update is here.
I already wrote the article Yield Return Could Be Better, and I must say that async/await could also be better if a stack-saving mechanism were implemented to do real cooperative threading. I am not saying that async/await is a bad thing, but the same capability could be added without compiler changes (enabling any .NET compiler to use it), perhaps with keywords to make its usage explicit. Unlike last time, I will not only talk about the advantages: I will provide a sample implementation of a stack-saver and show its benefits.
Understanding the async/await pair
The async/await pair was planned for .NET 5, but it is already available in the 4.5 CTP. Its promise is to make asynchronous code easier to write, which it indeed does.
But my problem with it is: why do people want to use the asynchronous pattern to begin with?
The main reason is: to keep the UI responsive.
We can already keep the UI responsive using secondary threads. So, what's the real difference?
Well, let's see this pseudo-code:
using(var reader = ExecuteReader())
    while(reader.ReadRecord())
        listbox.Items.Add(reader.Record);
Very simple: a reader is created and, while there are records, they are added to a listbox. But imagine that there are 60 records and that each ReadRecord takes one second to complete. If you put that code in the Click of a Button, your UI will freeze for an entire minute.
If you put that code in a secondary thread, you will have problems when adding the items to the listbox, so you will need to use something like listbox.Dispatcher.Invoke to really update the listbox.
With the new await keyword, your method will need to be marked as async, and you will need to change the while line to while(await reader.ReadRecordAsync()).
And your UI will be responsive.
Your UI became responsive by a simple call to await? And what is that await really doing?
Well, here is where the complexity really lives. The await is, in fact, registering a continuation and then allowing the current method to return immediately (in the case of a Button Click, the thread is free to process further UI messages). Everything that comes after the await will be stored in another method, and any data used before and after the await keyword will live in another class created by the compiler and passed as a parameter to that continuation.
Then there is the implementation of ReadRecordAsync. This one may be considered the hardest part, as it may use some kind of real asynchronous completion (like the IO completion ports of the Operating System) or it may still use a secondary thread, like a Task run on the ThreadPool.
If it still uses secondary threads, you may wonder how it is going to be faster than a normal secondary thread.
Well... it is not going to be faster; it may even be a little slower, as by default it needs to send a message back to the UI thread when the process is completed.
But if you are going to update the UI, you would already need to do that.
Some speed advantage may reside in the fact that the current thread may already start something else (instead of waiting doing nothing), and also in the ThreadPool usually used by Tasks, which limits the number of concurrent work items. That is, some work items need to end so new work items (Tasks) can start. With normal threads, we risk having too many threads trying to run at once (many more than the real processor count), when it would be faster to let some threads simply wait to start (too many threads also occupy too many OS resources).
Noticing the obvious
Independent of the benefits of the ThreadPool and the ease of use of the async keyword, did you notice that when you put an await in a method, the current thread is free to do another job (like processing further UI messages)?
And that at some point such an await will receive a result and continue? With that, you can very easily start five different jobs. Each one, at the end, will continue running on the same thread (probably the UI thread).
It is not hard to see those jobs as "slim" threads. As a Job, they start, they "block" awaiting, and they continue. The real thread can do other things during the "blocking" part, but the same already happens with the CPU when a real thread enters a blocking state (the CPU continues doing other things while the thread is blocked).
Such Jobs don't necessarily have priorities; they run as a simple queue in their manager thread, but every time they finish or enter a "wait state", they allow the next job to run.
So, they all run in the same real thread, and one Job must await or finish to allow others to run. That's cooperative threading.
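This queue-of-jobs idea is easy to sketch outside .NET too. Below is a minimal Python analogy of mine (not the article's code): generators play the role of Jobs, and every yield is the point where one job lets the next run on the same thread.

```python
from collections import deque

def run_jobs(jobs):
    """Round-robin scheduler: each job runs until it yields or finishes."""
    queue = deque(jobs)
    log = []
    while queue:
        job = queue.popleft()
        try:
            log.append(next(job))  # run the job until its next yield
            queue.append(job)      # it yielded: requeue it behind the others
        except StopIteration:
            pass                   # it finished: drop it
    return log

def job(name, steps):
    for i in range(steps):
        yield "%s:%d" % (name, i)  # cooperative "pause point"

print(run_jobs([job("a", 2), job("b", 2)]))
# → ['a:0', 'b:0', 'a:1', 'b:1']
```

Both jobs interleave on a single thread, and a job only gives up control at its own pause points, which is exactly the cooperative behavior described above.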
It could be better
I said at the beginning that it could be better, so: how?
Well, real cooperative threads will do the same as the await keyword, but without the await keyword, without returning a Task, and consequently making the code more prepared for future changes.
You may think that code using await is already prepared for future changes, but do you remember my pseudo-code?
using(var reader = ExecuteReader())
    while(reader.ReadRecord())
        listbox.Items.Add(reader.Record);
Imagine that you update it to use the await keyword. At this moment, only the ReadRecord method is asynchronous, so the code ends up like this:
using(var reader = ExecuteReader())
    while(await reader.ReadRecordAsync())
        listbox.Items.Add(reader.Record);
But in the future, the ExecuteReader method (which is almost instantaneous today) may take 5 seconds to respond. What do I do then?
I would have to create an ExecuteReaderAsync that returns a Task and replace every call to ExecuteReader() with await ExecuteReaderAsync(). That would be a giant breaking change.
Wouldn't it be better if ExecuteReader itself were able to say "I am going to sit and wait, so let another job run in my place"?
Pausing and resuming a Job
Here is where all the problems are concentrated, and here is the reason the await keyword exists. Well, I think the people at Microsoft got so fascinated that they could change the compiler to manage secondary callstacks using objects and delegates (effectively creating the continuation) that they forgot they could create a full new callstack and switch to it.
If you don't know what the callstack is, you may have already seen it in the debugger window. It keeps track of all methods that are currently executing and their variables. If method A calls method B, which then calls method C, it holds the exact position in method C, the position in B to continue from when C returns, and also the position in A to continue from when B returns.
A continuation is the hard version of this. In fact, simply continuing with another method is easy; the problem is having a try/catch block in method A and putting a continuation in B that is still inside the same try/catch. In practice, the compiler will create an entire try/catch in method A and another in method B, both executing the same code in the catch (probably with an additional method to be reused by the catch code).
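The try/catch point becomes clear with a saved execution frame. In the sketch below (a Python analogy of mine, using a generator as the saved frame), the try/except written once in the method keeps protecting the code after the suspension point, with no duplicated handler in a separate continuation method.

```python
def guarded():
    try:
        yield "before"            # suspension point inside the try block
        raise ValueError("boom")  # raised only after resuming
    except ValueError:
        yield "caught"            # the original handler still applies

g = guarded()
print(next(g))  # → 'before'
print(next(g))  # → 'caught'
```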
If, instead of managing a "secondary callstack" in a continuation, they created a completely new callstack, replaced the thread's callstack with the new one and, at wait points, restored the original callstack, it would be much simpler, as all the code that uses the callstack would continue to use it. There would be no additional methods or different control flows to deal with.
Such an alternative callstack is what I called a StackSaver in the other article, but my original idea was misleading. It does not need to save and restore part of the callstack. It is a completely separate callstack that can be used in place of the normal callstack (and that restores the original callstack at waits or as its last action). A "single pointer" change (or even a single CPU register change) does all the job.
Good theory, but it will not work
The .NET team made a lot of changes to support the "compiler magic" that makes async work, and I say that if we can simply create new callstacks, we can have the same benefits with code that is even easier to use and more maintainable; all we need is to be able to switch from one callstack to another.
That looks too simple, and maybe you think I am missing something, even if you don't know what, and so you believe it will not work.
Well, that's why I created my simulation of a StackSaver: to prove that it works.
My simulation uses full threads to store the callstack; after all, there is no other way to switch from one callstack to another at the moment. But this is a simulation, and it proves my point.
Even though they are full threads, I am not simply letting them run in parallel, as that would bring all the problems related to concurrency (and would be normal threading). The StackSaver class is fully synchronized with its main thread, so only one runs at a time.
This gives the sensation that:
- Calling StackSaver.Execute starts executing the other callstack "in the current thread";
- When the action running in the StackSaver ends or calls StackSaver.YieldReturn, control goes back to the original callstack.
The only big difference of my StackSaver is that anything that uses the Thread identity (like WPF) will notice that it is another thread.
So it is not a real replacement, but it works for my simulation purposes and already allows creating a yield return replacement without any compiler tricks.
You didn't read wrong, and I am not making a mistake: by default, the StackSaver allows for a yield return replacement, not for an async/await replacement.
Doing the async/await replacement with the StackSaver
To use the StackSaver as an async/await replacement, we must have a thread that deals with one or more Jobs.
I am calling the class that creates such a thread CooperativeJobManager.
It runs as an eternal loop. If there are no jobs, it waits (real thread waiting, not job waiting). If there are one or more Jobs, it dequeues a Job and makes it run. As soon as the Job returns (by a yield return or by finishing) and the original caller regains execution, the manager checks whether it should put the Job in the queue again (as the last one) or not.
The only problem then is waiting for something. When a Job requests a "blocking" operation, it must create a CooperativeWaitEvent, set up how the async part of the job really works (maybe using the ThreadPool, maybe using IO completion ports), mark itself as waiting, and then yield return.
The main callstack, after seeing that the Job is waiting, will not put it in the execution queue again. But when the real operation ends and "Sets" the wait event, the event will requeue the job.
It is as simple as that, and here is the code of the CooperativeJobManager:
public sealed class CooperativeJobManager:
    IDisposable
{
    private readonly HashSet<CooperativeJob> _allTasks = new HashSet<CooperativeJob>();
    internal readonly Queue<CooperativeJob> _queuedTasks = new Queue<CooperativeJob>();
    internal bool _waiting;
    private bool _wasDisposed;

    public CooperativeJobManager()
    {
        // One real thread runs all the cooperative jobs.
        var thread = new Thread(_RunAll);
        thread.Start();
    }

    public void Dispose()
    {
        _wasDisposed = true;
    }

    public bool WasDisposed
    {
        get { return _wasDisposed; }
    }

    private void _RunAll()
    {
        CooperativeJob task = null;
        while(true)
        {
            if (_queuedTasks.Count == 0)
            {
                if (task == null)
                {
                    if (_wasDisposed && _allTasks.Count == 0)
                        return;

                    // Real thread waiting, as there is no job to run.
                    _waiting = true;
                    while (_queuedTasks.Count == 0);
                }
            }
            else if (task != null)
                _queuedTasks.Enqueue(task); // requeue as the last one

            if (_queuedTasks.Count != 0)
            {
                _waiting = false;
                task = _queuedTasks.Dequeue();
            }

            CooperativeJob._current = task;
            if (!task._Continue() || task._waiting)
                task = null; // finished or waiting: do not requeue
        }
    }

    public CooperativeJob Run(Action action)
    {
        if (action == null)
            throw new ArgumentNullException("action");

        var result = new CooperativeJob(this);
        var stackSaver = new StackSaver(() => _Run(result, action));
        result._stackSaver = stackSaver;
        _allTasks.Add(result);
        _queuedTasks.Enqueue(result);
        return result;
    }

    private void _Run(CooperativeJob task, Action action)
    {
        CooperativeJob._current = task;
        action();
        CooperativeJob._current = null;
    }
}
With it, you can call Run passing an Action, and that action will start as a Job.
If the action never calls CooperativeJob.YieldReturn or some cooperative blocking call, it will effectively execute directly.
If the action does some kind of yield or cooperative wait, then another job can run in its thread.
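To make the manager's waiting logic concrete, here is a Python analog of mine (an assumption for illustration, not a translation of the C# above): a job that yields a wait event is unscheduled until something sets the event, mirroring the CooperativeWaitEvent flow.

```python
from collections import deque

class WaitEvent:
    """Stand-in for CooperativeWaitEvent: set() is called by the async part."""
    def __init__(self):
        self.is_set = False
    def set(self):
        self.is_set = True

class JobManager:
    def __init__(self):
        self.queue = deque()
        self.waiting = []
    def run(self, generator):
        self.queue.append(generator)
    def step(self):
        """One scheduler turn: requeue woken jobs, then run one job."""
        still_waiting = []
        for event, job in self.waiting:
            if event.is_set:
                self.queue.append(job)           # the event was "Set": requeue
            else:
                still_waiting.append((event, job))
        self.waiting = still_waiting
        if not self.queue:
            return False                         # nothing to do
        job = self.queue.popleft()
        try:
            yielded = next(job)
        except StopIteration:
            return True                          # the job finished
        if isinstance(yielded, WaitEvent):
            self.waiting.append((yielded, job))  # unschedule until set
        else:
            self.queue.append(job)               # plain yield: requeue last
        return True

manager = JobManager()
event = WaitEvent()
done = []

def job():
    yield event            # "block" the job, not the thread
    done.append("resumed")

manager.run(job())
manager.step()             # the job parks on the event
event.set()                # the asynchronous part completed
manager.step()             # the job is requeued and finishes
print(done)                # → ['resumed']
```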
Now imagine this in your old Windows Forms application. At each UI event, you call CooperativeJobManager.Run to execute the real code.
In that code, any operation that may block (like accessing databases, files, or even Sleeps) allows another job to run. And that's all: you have fully asynchronous code that does not have the complications of multi-threading and really looks like synchronous code.
The source for download is done in .NET 3.5, and I am sure it could work even under .NET 1.0 (maybe with some changes).
The real missing piece is the StackSaver class which, as I already told you, uses real threads in this implementation, so it is useful mainly for demonstration purposes.
Advantages of cooperative threading over async/await done by the compiler
- It will be available to any .NET compiler if it is in a class like the one presented here.
- You will not cause a breaking change if a method that today does not "block" starts to "block" in the future.
- You will have an easier continuation style, because you can simply avoid continuations. In any place where you would need a continuation, create a new Job, which may "block" without affecting your thread's responsiveness.
- The callstack will be used normally, avoiding a CPU register used to store a reference to the "state" on top of the one already used by the callstack, which should make things a little faster.
- By having the callstack there, it will be easier to debug.
Advantages of the async/await done by the compiler over cooperative threading
I can only see one. It is explicit, so users can't say they faced an asynchronous problem when they wrote synchronous code.
But that can easily be solved in cooperative threading with flags that effectively tell the CooperativeJob that it cannot "block", raising an exception if a "blocking" call is made. It is certainly easier to mark an area as "must not run other jobs here" than to have to await 10 times to do 10 different reads or writes.
Blocking versus "Blocking"
From my writing, you may have noticed that a "blocking" call is not the same as a blocking call.
A "blocking" call blocks the current job but lets the thread run freely. A real blocking call blocks the thread and, when it returns, continues running the same job.
Surely it may be problematic to have a framework full of both blocking and "blocking" calls. But Microsoft is already reinventing everything with Metro (and even Silverlight has a network API that is asynchronous only).
So, why not replace all thread-blocking calls with job-blocking calls and make programming async software as easy as writing normal blocking software?
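The distinction can be sketched like this (again a Python analogy of mine): a job-"blocking" sleep yields control so other jobs run on the same thread, while a thread-blocking sleep would stall them all.

```python
from collections import deque

def cooperative_sleep(ticks):
    """A job-"blocking" sleep: gives the thread away once per tick."""
    for _ in range(ticks):
        yield

def worker(name, out):
    out.append(name + " start")
    yield from cooperative_sleep(2)  # "blocks" the job, not the thread
    out.append(name + " end")

def run(jobs):
    queue = deque(jobs)
    while queue:
        job = queue.popleft()
        try:
            next(job)
            queue.append(job)
        except StopIteration:
            pass

out = []
run([worker("a", out), worker("b", out)])
print(out)  # → ['a start', 'b start', 'a end', 'b end']
```

While "a" is "sleeping", "b" makes progress on the very same thread; with a real time.sleep, "b" would only start after "a" finished.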
Did you like the idea?
Then ask Microsoft to add real cooperative threading through a stack-saver by clicking this link and then voting for it.
I only made a very simple sample to show the difference between a real thread-blocking call and a job-blocking call.
I am surely missing better samples and maybe I will add them later. Do not let the simplicity of the sample hide the real potential of the callstack "switching" mechanism, which can enable better versions of asynchronous code and of yield return, and also open a lot of new scenarios for cooperative programming, making it easier to write more isolated code that can both scale and be prepared for future improvements without breaking changes.
POLAR - The first implementation of a StackSaver
I am finally presenting the first version of a StackSaver for .NET itself (even if it is a simulation), but this is not the first time I show a working version of the concept. I already presented it working in my POLAR language.
The language is still a hybrid between compilation and interpretation, but it uses the stack-saver as a real callstack replacement, and it would be relatively easy to implement asynchronous calls in it using the Job concept instead of the Task concept.
I don't have a date for it, as I am doing too many things at the moment (like still adapting to a new country), but I can guarantee that it could be capable of working with such Jobs without even knowing how to deal with secondary threads.
Coroutines and Fibers
When I started writing this article, I didn't really know what coroutines were, and I had no idea what a fiber was.
Well, at this moment I am really considering renaming my StackSaver class to Coroutine, as that is what it really provides.
And Fibers are OS resources that allow saving the callstack and jumping to another one; they are the resource needed to create coroutines.
I did try to implement the StackSaver class using Fibers through P/Invoke, but unfortunately unmanaged Fibers don't really work in .NET.
I really think it is related to garbage collection: when searching for root objects, .NET will not see the "alternative callstacks" created by unmanaged fibers and will collect objects that are still alive, but unseen.
Either way, at this moment I will keep the names StackSaver and "Job", as a Job is similar to a Task but the name does not clash with the Task class.
Update - Trying to explain better
From the comments, I understand that I did not give the best explanation and people are getting confused by my claims.
If you look at the source code of the StackSaver, you will see threads that block. So don't look at the code of the StackSaver. Look at its idea:
You create a StackSaver with a delegate. When you call stackSaver.Execute, it will execute that delegate until it ends or until it finds a YieldReturn.
When yielding, the original caller returns to its execution and, when it calls Execute again, the statement of the delegate just after the YieldReturn continues. This generates the exact same effect as the yield return used by enumerators.
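That Execute/YieldReturn contract is easy to model with a Python generator (the names Execute and YieldReturn come from the article; the implementation below is my illustrative assumption):

```python
class StackSaverSketch:
    """Execute runs the delegate until it ends or hits a yield;
    the next Execute continues right after that yield."""
    def __init__(self, generator_function):
        self._frame = generator_function()  # the saved "callstack"
        self._done = False
    def execute(self):
        if self._done:
            return False
        try:
            next(self._frame)  # run until YieldReturn or the end
            return True
        except StopIteration:
            self._done = True
            return False

trace = []

def delegate():
    trace.append("first part")
    yield                      # plays the role of YieldReturn
    trace.append("second part")

saver = StackSaverSketch(delegate)
saver.execute()                # runs up to the YieldReturn
trace.append("caller runs")
saver.execute()                # resumes just after the YieldReturn
print(trace)
# → ['first part', 'caller runs', 'second part']
```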
The async/await replacement is based on a kind of "scheduler" that I call CooperativeJobManager. That scheduler is able to wait when it has 0 jobs scheduled, or it runs one job after the other when there are jobs scheduled.
The only thing missing by default is the capacity to "unschedule" a job while it is waiting and to reschedule it when the asynchronous part gets a result.
That is done by marking the job as waiting and "yield returning". The scheduler then does not reschedule that job immediately; only when the "wait event" is set is the job scheduled again.
If the scheduler used the ThreadPool, it would have the same trait as async/await in the sense that, after awaiting, the job may be continued by another thread.
If that is still not enough to understand, I am already considering creating a C++ version of the code that does not use Threads in the StackSaver.
But the rest of the code (which uses the StackSaver) will be the same... and I am not sure if C++ code will really help get the idea across.
A better example on why my proposed approach is more prepared for future changes
I said that my approach is more prepared for future changes, but the examples were too abstract. That may be one of the reasons for the confusion. So, let's focus on something more real.
Let's imagine a very simple interface for getting ImageSources. The interface has a Get method that receives a filename. Very simple, but let's see two completely different implementations. One loads all the images on startup, so when an image is asked for, it is always there and returns immediately.
The other always loads images when asked. It does not try to do any caching.
Now, let's imagine that when I click a button, I get all the images (let's say there are hundreds of them), generate the thumbnails for all of them in a single image, and then save it. Here comes the problem with asynchronous code: how can the interface return an ImageSource if the image loading is asynchronous?
The answer is: the interface can't return an ImageSource. It should return a Task instead.
In the end, with the Task-based asynchronous code, we will:
- Create 100 Tasks, even when using the implementation that has all images in memory.
- Create one extra task for the method that generates the thumbnails.
- Finally, when saving, generate an extra task for the file save (even if we don't use it, the asynchronous Write will create it).
- In fact, there are some more tasks still, as opening and reading are two different asynchronous things, as are creating and writing the files.
As you can see, a lot of tasks are created here, even when the implementation has everything in memory.
It is possible to store the tasks themselves in the cache (and that will avoid some of the async magic), but we will still have a higher overhead when reading the results from the cache that has everything in memory.
With my proposed "job synchronous/thread asynchronous" code:
- One job is created to execute all the code.
- The 100 image gets will not "block" at all with the cache that has all images already loaded, or they will "block" the Job 100 times when loading the images with the other implementation.
- After getting or loading all images with "synchronous" semantics, it executes the thumbnail generation normally and then saves the image.
- Then, by ending the method, the job ends.
Total jobs? 1. If we use the implementation that has all images in memory, we will have faster code, because we will receive ImageSources as results, not Tasks from which to then get the results.
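The contrast in the two lists above can be sketched as follows (a Python analogy of mine; the cache and all names are invented for the example): a job-style Get returns a cached value with no suspension and no wrapper object, and only yields to the scheduler when it really has to load.

```python
def cached_get(cache):
    """Job-style Get: returns directly when cached, 'blocks' only to load."""
    def get(name):
        if name in cache:
            return cache[name]       # no suspension, no Task-like wrapper
        yield                        # "block" the job while loading
        cache[name] = name + "-img"  # pretend the load completed
        return cache[name]
    return get

def make_thumbnails(get, filenames):
    """Looks synchronous; only suspends when a get really has to wait."""
    images = []
    for name in filenames:
        images.append((yield from get(name)))
    return [img.upper() for img in images]  # stand-in for thumbnailing

cache = {"a": "a-img"}               # "a" preloaded, "b" must be loaded
job = make_thumbnails(cached_get(cache), ["a", "b"])
suspensions = 0
try:
    while True:
        next(job)                    # one scheduler turn while the job waits
        suspensions += 1
except StopIteration as stop:
    result = stop.value

print(suspensions, result)
# → 1 ['A-IMG', 'B-IMG']
```

One job ran the whole operation; the preloaded image never suspended, and only the real load did, which is exactly the point of the list above.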
Still think that Task-based asynchrony is better?
If you believe that Task-based asynchrony is better because it may use secondary threads if needed, then think again, as Job-based asynchrony can too. The secondary threads, if any, are used by the real asynchronous code (when loading or reading a file, IO completion ports can be used).
After the asynchronous action ends, it asks for the continuation Task to be scheduled (with a Job, the job is rescheduled).
If the image loading itself may use hardware acceleration to convert bytes into an image representation and so returns a Task too, well, a Job can also start that hardware asynchronous code and be put to sleep, resuming its execution when the generated image is ready.
All the advantages of the Task-based approach, which I can summarize as "can be continued later, be it on the same thread or on another thread", are there. Most of the disadvantages (like getting lost when a [ThreadStatic] value is not there anymore) are present too. But all your methods can continue to return the right value types (not Tasks).
With my proposed solution, if some code may end up calling synchronous or asynchronous code (like the interface that may return images directly or load them), you don't need to generate extra Tasks only to be sure that it will work when the code is asynchronous. Simply let the Job "block" and be rescheduled later.
I hope it makes more sense now.
Update 2 - Discussion with Eugene Sadovoi
After a lot of talk with Eugene Sadovoi, I am sure I was not clear enough. So, for those who are still lost, I am sorry. I really tried to omit some things to make the article shorter and easier to read, but apparently I achieved the opposite.
And, for those who simply want more info, I will try to give it now. So, some new "viewpoints" on the matter:
- What differentiates a Task from a Job.
- Under the hood, what really changes.
- What changes for users.
Tasks versus Jobs... or, may I say... Jobs == Tasks
Not only may the words Job and Task have the same meaning, they are effectively the same thing. Throughout my article, I tried to use the word Job to represent a cooperative Job, while a Task represents the .NET classes.
But the only thing that is really needed for a Task to become a Job is the possibility to "pause" at any moment. With the await keyword, we can only pause the current method, and only if it returns a Task. We can't pause the caller of the current method.
If await were capable of pausing the current Task, be it the Task returned by this method, the Task that called this method directly, or the Task that called an unknown number of methods before reaching the current method, then the Task would be a Job, and await would really mean "make the current Task/Job wait and let the current thread do something else".
Under the hood
So, all my article is in fact about what happens under the hood: how can we make the current Task be paused at any moment?
The Task is an implementation detail. What users want is to use the await keyword... and, when using it, they really want to say: while waiting for this result, allow the current thread to do something else.
With the current compiler implementation, it is impossible for a method to return void and make the caller's Task await. The compiler makes the current task "return a continuation to continue later". I think that is too much implementation detail; users don't want that.
With cooperative threading, which is in fact based on some kind of stack saving/switching mechanism, we can really make a Task wait at any moment.
It is not required to register a continuation and return through all the methods on the call stack (and have those register continuations too, if needed). The code can simply say "await now, independent of how many things I have on the callstack" and then have the continuation code as its next instruction. That takes care of all the other methods on the callstack (the callers), not only the current method.
Finally, what changes for users?
Tasks are not created for every method that may await. They are created at "key points" only.
For a WPF or Windows Forms application, that means that every "UI event" must create a Task/Job, so it can await at any moment.
As long as you don't need parallel execution, you simply write synchronous code that works with asynchronous sub-methods. But when you really want parallel execution, you create Tasks over the methods to call (which become delegates) and use things like Task.WaitAll.
OK... let's compare:
Maintenance - My solution works like any blocking code and does not require changes if, in the future, an inner method starts to block.
Learning curve - As you don't really change the code, it is easy to learn.
Speed - Considering that a Job can be a "pausable Task", all optimizations done to Tasks can be done to Jobs.
Speed 2 - With the state machine (used by the actual Task implementation), you always pay a small cost to return to the exact same position in the method, while that cost is fixed with the stack-saving mechanism (and I am not even sure there aren't optimizations or specific CPU instructions to save the stack/registers).
Memory - My approach may use more memory for the callstack, but it may end up allocating far fewer Task objects, so it may even end up using less memory. I will consider the actual implementation and mine equivalent here; neither is really better.
Context switches - As with any await use, they will only happen when the Operating System preempts the current thread in favor of another real thread (which is unavoidable) or when the current "Task/Job" yields or enters some await state.
To compiler developers - It will not require other compilers to change, as Tasks will be a "pausable" and "awaitable" class. As there are no compiler tricks, there is no chance of one compiler generating a better state machine than another, and no chance of one compiler supporting it while others don't.
Also, with the exact same implementation, all the errors that users may face will be the same, independent of the compiler used. With the compiler-based trick, it is possible that some compilers have one kind of issue while other compilers have others.