I frequently see discussions about asynchronous code in forums and, as a broad concept with many different implementations (that happens to be related to other abstract concepts in the same situation), it creates many divergent opinions and many unnecessary "flame wars". So, I decided to write this post hoping it clarifies some points.
Asynchronous is Not New
Stating the obvious, asynchronous programming is not something new. I will not go too far back, so I will use Windows messages as an example. Every time we use the PostMessage function, we are dealing with asynchronous execution. That is, we post a message requesting some action and, in the most common situation, this message is queued to be processed later by the same thread. It is asynchronous because the statement after the PostMessage call executes before that message is processed.
If you have never used the PostMessage function, you may already have used things like Control.BeginInvoke (from Windows Forms), Dispatcher.BeginInvoke (present in WPF, Silverlight and Store Apps) and even Control.Invalidate(), also from Windows Forms. If you write a Button.Click handler and call Invalidate() on some control, that control is not redrawn immediately; it is redrawn only after the Click handler returns. Usually you don't notice the difference, but try calling Invalidate() and then Thread.Sleep(10000) and you will see that the redraw happens only after the ten seconds, even though the Invalidate() call came first. This was often a problem when updating progress bars inside big loops, as the real update only happened after the loop ended.
Yet, most people don't see this kind of call as real asynchronous programming and it is mostly ignored in discussions about asynchronous code.
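To make the ordering concrete, here is a minimal C# sketch. It assumes a hypothetical MessageQueue class that only imitates the behaviour described above (it is not the real Windows queue, and the class and method names are my own): posting only enqueues the work, so the statement after the post runs first.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a thread's message queue (not the real Win32 queue).
public sealed class MessageQueue
{
    private readonly Queue<Action> _pending = new Queue<Action>();

    // Like PostMessage: only enqueues the work and returns immediately.
    public void Post(Action message) => _pending.Enqueue(message);

    // Like the message loop: runs the queued messages one at a time, in order.
    public void RunPending()
    {
        while (_pending.Count > 0)
            _pending.Dequeue()();
    }
}

public static class PostOrderDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var queue = new MessageQueue();

        // The "Click handler": it requests a redraw, then keeps working.
        queue.Post(() => log.Add("redraw"));
        log.Add("statement after Post");

        // Only when the handler returns does the loop reach the posted message.
        queue.RunPending();
        return log;
    }
}
```

Calling PostOrderDemo.Run() gives a log in which "statement after Post" comes before "redraw", which is the same effect as the Invalidate() followed by Thread.Sleep(10000) experiment.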
Asynchronous Results and Continuations
This is probably what most people think of when they talk about asynchronous execution. In this case, we request an action to be executed (maybe in parallel, maybe some time in the future) and we want something to be executed after the requested action finishes, usually because some kind of result is generated and must be processed. This receives names like "continuation", "future" and many others. I will use "continuation".
Talking about the Windows API, PostMessage queues a message to be processed later and the only result the function gives is whether the message was posted correctly. SendMessage is the most common way to send a message that must give a result, but such a call only returns after the message is processed, which makes it synchronous (and when one thread sends a message to another, the first thread really waits until the second thread finishes processing the message).
Yet, by using your own messages, it is possible to use PostMessage to request something to be executed later and still have a way to execute something after that. That is, either expect a "response message" to be sent back or give a function pointer (delegate, callback or whatever term you prefer) to be invoked when the action finishes. In this case, when two threads are involved, one may request an action to be done by the other thread and continue dealing with other stuff. When the other thread finishes, it will either post a result message to the first thread or execute the callback that deals with the result (in the latter case, the callback is also executed on the secondary thread, not the first).
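The "response message" variant can be sketched as follows. This is only an illustration under my own assumptions: a BlockingCollection of delegates plays the role of the requesting thread's message queue, and ContinuationDemo is a hypothetical name. The secondary thread does the work and posts the continuation back, so the continuation runs on the requesting thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class ContinuationDemo
{
    // Returns the thread ids involved and the computed result.
    public static (int callerThread, int continuationThread, int result) Run()
    {
        // Hypothetical "message queue" of the requesting thread.
        var mainQueue = new BlockingCollection<Action>();
        int continuationThread = -1, result = 0;

        // Request: compute on a secondary thread, then post the
        // continuation back instead of running it on that thread.
        var worker = new Thread(() =>
        {
            int sum = 0;
            for (int i = 1; i <= 100; i++) sum += i;   // the requested action
            mainQueue.Add(() =>                        // the "response message"
            {
                continuationThread = Environment.CurrentManagedThreadId;
                result = sum;
            });
            mainQueue.CompleteAdding();
        });
        worker.Start();

        // The requesting thread is free to do other work here; it runs
        // the continuation when it gets to that message in its queue.
        foreach (var message in mainQueue.GetConsumingEnumerable())
            message();

        worker.Join();
        return (Environment.CurrentManagedThreadId, continuationThread, result);
    }
}
```

If the worker invoked the delegate directly instead of adding it to the queue, we would have the callback variant, with the continuation running on the secondary thread.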
Threads? Why threads?
My last sentence is problematic. I used two threads as an example of asynchronous execution, and many developers immediately get furious if we say "thread" when talking about asynchronous execution, claiming that asynchronous code is unrelated to the use of threads. This is often the cause of the flame wars.
If we mean operating system threads, it is true that asynchronous code is not bound to threads. It is possible that some code posts a message to the current thread and the handler for that message invokes a callback or sends a "finished" message back to the same thread.
Yet, there are two things to consider:
- Why would the code keep posting messages to process things later on the same thread when it is possible to process those messages immediately?
- Conceptually, what is a thread?
Answers for Question One
Question one has many answers, so I will focus on only two:
- The most common is that some actions are naturally asynchronous, IO operations for example. Disk reads/writes, network sends/receives, etc. are much slower than the CPU, so the CPU can send a request to a device and continue processing other things. Eventually, the device will notify the CPU that it finished the requested job. In our application, the thread will be free to do something else in the meantime. The asynchronous action is not using an actual operating system thread to run, yet it is running in parallel, with its own performance, logical steps and even physical steps... and, already touching on the second answer, this parallel execution and sequence of steps is another "thread"; it is simply not an operating system thread.
- Another possible answer is to split long tasks into smaller ones. Then, between the small tasks, other messages/asynchronous tasks can be processed, and it is even possible to put some kind of cancellation logic between the steps. Talking about well-known environments, User32 controls, Windows Forms controls and even WPF controls use messages all the time, and most (if not all) UI events are handled by the main thread. So, we should not block the main thread. The example I gave before of calling Thread.Sleep(10000) is something that we aren't supposed to do (my purpose was only to show the thread being blocked).
If we have real actions that will take a long time to run, it is better if we allow the other messages to be processed while we do our long job. This can be achieved by using secondary threads, by explicitly processing other messages in a "nested" manner from time to time, or by splitting a single long run into many messages that are posted to the same thread.
Actually, none of those solutions is perfect. Using other threads can introduce concurrency problems and may require posting messages back to the main thread anyway. Processing other messages in a nested manner can be very problematic if many different actions do this, as the call stack keeps growing and the actions that start first are the ones that finish last. Splitting a long job into many smaller jobs, especially when the programming language doesn't help with it, can be pretty hard, as we can't use high-level loops and must create our own "state machine".
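The third option, splitting a long job into messages posted to the same thread, can be sketched like this. ChunkedSum and its members are hypothetical names, and a plain Queue of delegates stands in for the message queue. Notice that the loop counter has to become a field: that is the hand-written "state machine".

```csharp
using System;
using System.Collections.Generic;

// Sums 1..limit in small slices; the loop state lives in fields because
// each slice returns to the message loop instead of staying inside a "for".
public sealed class ChunkedSum
{
    private readonly Queue<Action> _queue;   // hypothetical message queue
    private readonly int _limit, _chunk;
    private int _next = 1;

    public long Total { get; private set; }
    public bool CancelRequested { get; set; }
    public bool Finished { get; private set; }

    public ChunkedSum(Queue<Action> queue, int limit, int chunk)
    {
        _queue = queue; _limit = limit; _chunk = chunk;
    }

    public void Start() => _queue.Enqueue(Step);

    private void Step()
    {
        if (CancelRequested) return;             // cancellation between steps
        int end = Math.Min(_next + _chunk - 1, _limit);
        for (; _next <= end; _next++) Total += _next;
        if (_next > _limit) Finished = true;
        else _queue.Enqueue(Step);               // re-post the rest of the job
    }
}

public static class ChunkedSumDemo
{
    public static (long total, int otherMessages) Run()
    {
        var queue = new Queue<Action>();
        var job = new ChunkedSum(queue, limit: 1000, chunk: 100);
        int otherMessages = 0;

        job.Start();
        queue.Enqueue(() => otherMessages++);    // e.g. a paint or click message
        while (queue.Count > 0) queue.Dequeue()();

        return (job.Total, otherMessages);
    }
}
```

In this sketch the other message is processed between the first and the second slice, instead of waiting for the whole sum to finish.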
Answer for Question Two
All code flows that have a start (possibly many intermediate actions) and an end can be seen as threads.
In fact, when computers had only one CPU the purpose of threads (talking about the operating system resource) was to have parallel execution flows. Of course, they had to be paused from time to time to allow other threads to run as there was only one CPU to do the job and, guess what, having to wait for an IO result is one of the ways to pause a thread.
When we deal with asynchronous execution, we can still see an execution flow happening, which happens to be "paused" between a request and the continuation. For example, to copy a file we open one file for reading, we create another file, we allocate memory, we read some bytes from the first file (putting them into the allocated memory), we write those bytes to the second file and, considering we have big files and a good strategy, we keep a loop reading and then writing instead of putting everything in memory at once. Finally, we close each one of the files.
Does it matter if it was synchronous and a secondary OS thread paused during the disk activity or asynchronous and only that "code flow" was paused while the thread was free to do other things?
If you believe it matters, let me explain:
Synchronous code, using an exclusive operating system thread for the entire copy task
Every time an IO operation is invoked, the request is sent to the device, the thread is put to sleep and the CPU is free to do other things, which means executing other threads if there are other threads waiting. When the result is ready, an interrupt is used to give the notification to the CPU and the operating system either wakes up the thread immediately (by executing it) or puts it into the list of threads that have work to do.
Asynchronous code, sharing an operating system thread with other work
Every time an IO operation is invoked, the request is sent to the device, the current call returns and the thread is free to do other things, which means processing other messages if there are any, or maybe pausing this thread and allowing other threads to run on the current CPU. When the result is ready, an interrupt is used to give the notification to the CPU and the operating system ends up posting a message to the thread telling it that the IO operation finished, also waking up the thread or putting it into the list of threads that have work to do if it was sleeping.
That is, aside from some extra overhead in the second case, the same thing is happening. But in one case we put an entire operating system thread to sleep, while in the other case we use the operating system thread as if it were the CPU and we put only the "conceptual thread" to sleep. To avoid using the same name, that conceptual thread receives another name, like "Task". And it also happens to be a cooperative kind of threading in all the implementations I have seen so far.
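In C#, the two versions of the file copy could look like the sketch below. The FileCopy class name and the buffer sizes are my own choices for illustration; the flow is the same in both methods, and only what gets "paused" during the IO differs.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class FileCopy
{
    // Synchronous: the calling OS thread sleeps inside every Read and Write.
    public static void Copy(string source, string destination)
    {
        using (var input = File.OpenRead(source))
        using (var output = File.Create(destination))
        {
            var buffer = new byte[81920];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
        }
    }

    // Asynchronous: same steps, but at each await only this "code flow"
    // is paused and the thread can go back to do other things.
    public static async Task CopyAsync(string source, string destination)
    {
        using (var input = new FileStream(source, FileMode.Open, FileAccess.Read,
                                          FileShare.Read, 4096, useAsync: true))
        using (var output = new FileStream(destination, FileMode.Create, FileAccess.Write,
                                           FileShare.None, 4096, useAsync: true))
        {
            var buffer = new byte[81920];
            int read;
            while ((read = await input.ReadAsync(buffer, 0, buffer.Length)) > 0)
                await output.WriteAsync(buffer, 0, read);
        }
    }
}
```

Here the compiler generates the state machine that keeps the loop position and the local variables alive between the awaits.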
Overhead and Performance
It seems that every time I end a topic I leave something problematic behind. This time I said the asynchronous code has some extra overhead compared to the synchronous one. Yet, read any document about large scale applications and you will see that creating new threads per task is considered a no-no and that asynchronous APIs should be used whenever possible, improving memory utilisation and even performance. So, how can they have more overhead and be the best option?
The answer to this is much more tied to implementation details than it should be and, even between different operating systems, many of these details are the same.
One of the most compelling arguments to use continuations and reuse a single thread to do many things instead of creating a new thread per task is that "operating system threads are expensive". To create. To destroy. To keep in memory.
So, talking about performance only, it doesn't matter if asynchronous calls have a small overhead over synchronous ones when, to use the synchronous calls, we need to create a new thread. Of course, it all depends, as we may create a single thread and do millions of faster synchronous calls. In this case, talking only about performance, it is a good idea. But if we plan to keep the thread alive only for the duration of a small file copy, maybe we lose more time creating and destroying the thread than the entire task would take to execute with the asynchronous overhead.
At least for 32-bit processes on Windows, a real problem of threads is the memory consumption. It isn't even the amount of memory really used. It's the amount of memory that's reserved for the callstack.
For example, in .NET, threads have a reserved callstack of 1 MB by default. Windows doesn't allocate 1 MB of real memory immediately, so it is not a real problem for the entire computer if we only use 1 KB of that 1 MB. But it is a problem for the process itself, as the reserved 1 MB must be a contiguous block of address space that can't be used to allocate anything else. If we create two thousand threads, even if they are all sleeping and using a minimum amount of their callstack, we will reserve practically the entire user address space (2 GB by default for a 32-bit process) for the threads' callstacks and the application will not be able to allocate memory to do anything else.
As I said, this is an implementation detail. In some cases, we can really request smaller callstacks, but what if we need that 1 MB, even if for a single millisecond? In this case, we need the callstack to be able to grow to 1 MB, which means having 1 MB reserved from the start.
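As a small illustration of both points, the sketch below uses the Thread constructor that receives a maximum stack size and a helper that does the address space arithmetic. The SmallStackDemo class and its method names are hypothetical, and the operating system may round the requested stack size.

```csharp
using System.Threading;

public static class SmallStackDemo
{
    // Runs 'work' on a thread that requests 'stackBytes' of callstack
    // instead of the default size.
    public static void RunWithStack(ThreadStart work, int stackBytes)
    {
        var thread = new Thread(work, stackBytes);
        thread.Start();
        thread.Join();
    }

    // Address space reserved by 'threadCount' callstacks, in MB.
    public static long ReservedMegabytes(int threadCount, int stackBytes)
        => (long)threadCount * stackBytes / (1024 * 1024);
}
```

ReservedMegabytes(2000, 1024 * 1024) gives 2000, which is nearly all of the 2048 MB available to a default 32-bit process, while the same 2000 threads with 256 KB stacks would reserve 500 MB.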
But compare this to stack objects and lists that use a single contiguous block of memory. They can really grow from time to time by doing a new allocation and copying items from the small block of memory to the bigger one. They don't need to reserve all the memory from the beginning. So, why does the callstack work differently?
Mostly, historical reasons. This is how the callstack worked in the past and it is also a common practice to get pointers to items in the callstack. So, if the callstack is "resized" by actually allocating a new block of memory somewhere else, all pointers to the old address will be pointing to invalid memory.
That is, it would be relatively simple to make the callstack work like those stack objects that do this kind of resize, but it would be a problem to make all the existing applications follow the new rules.
So, how do continuations deal with the callstack?
Continuations actually use "new rules". They don't have a callstack reserved for future calls. When an OS thread runs a continuation, the callstack of that thread is used for the synchronous calls. All the information that the continuations may require between calls uses a more modern approach, be it typed objects that are allocated or reused just before entering the "wait mode", be it smart stacks that can actually grow and shrink without having predefined "reserved" sizes.
That is, while OS threads require the memory to be reserved for the future, using a big size that's considered safe for everything, the continuations only use the amount of memory required to hold the information they really need.
Is it possible to create threads that use better stacks as the callstack?
Yes, but maybe it is not acceptable. As I already said, all the existing applications and libraries would need to follow the new rules. Mixing the two callstack kinds is simply not possible. Also, as an extra level of implementation detail, many processors use the callstack directly and they will not magically use the new callstack approach. That is, it is possible that the fast CALL or equivalent instructions provided by the CPU will not be adequate for the new strategy, so all the "CALLs" will become slower in this situation, which is clearly a bad thing. Of course, this depends on the processor, but a smart solution for one processor that is a terrible solution for other processors is probably not going to become mainstream... Or maybe it will, and that may become a major differentiator between "good" and "bad" processors.
Another advantage of continuations that share a thread, compared to using one thread per task, is the reduced context switching. Continuations are actually a kind of cooperative multithreading while normal OS threads use preemptive multithreading, meaning that from time to time the CPU is interrupted to check if there's another thread scheduled to run. If there is, some more time is spent activating that thread (putting all the CPU register values back to the state needed by that thread).
Putting this into numbers, consider that those frequent interruptions make a 1 second action take 1 second and 10 milliseconds when the actual thread switch is not required (the check happens, and that's all). If there's interruption + activation of another thread, it will take 1.1 seconds (and I am ignoring the time spent processing the other thread).
So, by these numbers alone, and considering a situation where 1000 external tasks end up giving a result at the same time (very improbable, but I want to show numbers), the time to process all results with CPU-intensive work of 1 second each will be close to 1100 seconds if each result is processed by a different preemptive thread, but only 1010 seconds if all the results are processed by the same preemptive thread (and probably 1001 seconds if the thread wasn't preemptive at all, but that's not possible when the OS is preemptive). Also, with many threads and a fair distribution, the first result will be ready only after 1000 seconds (it's as if all threads advance very slowly but in "parallel"), while with single-threaded processing of the asynchronous results the first result will be available after about 1 second. Of course, if you are looking at the progress, the other jobs will not even have been started. Yet each result is ready about one second after it really starts. That is, even if the total time is not that different (1010 versus 1100), the average time to get a result is very different: with a single thread the results arrive between 1 and 1010 seconds, an average of about 505, while with many threads they arrive between a little more than 1000 and 1100 seconds, an average of about 1050.
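These averages can be checked with a short calculation. As a simplification of the numbers above, I assume that result k of the single thread is ready after 1.01 k seconds and that, with many threads, the results are spread evenly between 1000 and 1100 seconds:

```latex
\bar{t}_{\text{single}}
  = \frac{1}{1000}\sum_{k=1}^{1000} 1.01\,k
  = 1.01 \cdot \frac{1000 + 1}{2}
  \approx 505.5\ \text{s}
\qquad
\bar{t}_{\text{many}}
  \approx \frac{1000 + 1100}{2}
  = 1050\ \text{s}
```

So the total time differs by less than 10%, while the average wait for a result roughly doubles.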
Why are the OS threads preemptive?
Well, actually the lack of preemptiveness is a problem. For example, in a cooperative multithreading operating system and a single CPU, if one application does an infinite loop without allowing other applications to run, the computer simply hangs. There's no way to open something like the task manager to kill the application. Maybe the problem is the fact that all threads became preemptive by default (or as the only option, depending on the operating system and environment), but I am not going to discuss the threading model this time, so let's continue with the asynchronous discussion.
Problems of Asynchronous Code
"The only problem of asynchronous code is that people don't understand continuations. As soon as they understand continuations, there are no problems at all."
I have heard and read this kind of argument very often. Well, I agree that things become simpler when people understand continuations, but we still have problems regardless of how well we understand them. So, let's see them:
Problem 1: Continuations Can Only Exist When Synchronous Results Aren't Expected
Remember my example about copying files? Let's say that the start point is a button click. The user clicks a button, and we start the copy operation. If we have at least one asynchronous continuation, that means that the Click handler will actually return and the continuation will execute sometime after the Click handler finished. This is fine because the Button's Click doesn't expect any results.
Now, what happens if you are implementing an interface and the interface methods give results? And I am not talking about Task<T> results from .NET or anything similar available in other languages/environments. I mean real results, be it a string or a CustomerRecord. Well, at that moment, you can't start a new asynchronous execution anymore. In some cases, it is possible to fake it, like starting the asynchronous operation and synchronously waiting, but this is not always possible (an asynchronous execution that will start on the same thread, after the current method returns, is a case that would deadlock if you tried to wait synchronously) and it definitely kills the purpose of the asynchronous execution, which is to allow the thread to be free to do something else.
Of course, some people can still argue that the interface is wrong but let's face it: 90% of the interfaces aren't written to be asynchronous. Yet, many interfaces may require asynchronous actions if, for example, they are used for remote communication. Also, making absolutely all methods on all interfaces have asynchronous signatures is definitely a performance killer and will probably over-complicate the majority of implementations.
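The deadlock case mentioned above can be reproduced in a few lines. This is a sketch under my own assumptions: a plain Queue of delegates stands in for the thread's message queue, SyncWaitDemo is a hypothetical name, and a timeout is used only so that the example terminates instead of hanging.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SyncWaitDemo
{
    // Returns true if the synchronous wait succeeded, false if it timed out.
    public static bool Run()
    {
        var queue = new Queue<Action>();             // hypothetical message queue
        var completion = new TaskCompletionSource<string>();

        // The asynchronous operation: its final step is queued on this same thread.
        queue.Enqueue(() => completion.SetResult("customer record"));

        // A synchronous interface method that tries to wait for the result.
        // Without the timeout this would block forever: the queued step can
        // only run after this method returns to the message loop.
        bool completed = completion.Task.Wait(TimeSpan.FromMilliseconds(200));

        while (queue.Count > 0) queue.Dequeue()();   // the loop resumes too late
        return completed;
    }
}
```

Run() returns false: the wait gives up before the message that would complete the task has any chance to be processed.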
Problem 2: Resource Ownership
The first problem is related to a synchronous caller that expects a result. So, if the situation is different and the caller doesn't expect a result, is it safe to start an asynchronous operation?
In .NET, we have this situation very often when we implement events. Most events aren't expected to generate any kind of result (that is, they are void returning and even the event args object doesn't have any property used as the result). So, it is safe to start the asynchronous action, right?
The right answer is "not always". If you receive any input parameter that can be modified or freed, then the continuation should not use such a parameter. In some cases, it is possible to copy the needed data while preparing the continuation. In some cases it is not.
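A small sketch of the problem, with hypothetical names (OwnershipDemo, ReadLaterAsync, CopyThenReadLaterAsync) and a task called resume that only simulates "some time later": the caller reuses its buffer as soon as the handlers return, so the continuation that kept the original reference sees the new content.

```csharp
using System.Text;
using System.Threading.Tasks;

public static class OwnershipDemo
{
    // Risky: the continuation reads a buffer that the caller still owns.
    public static async Task<string> ReadLaterAsync(byte[] callerBuffer, Task resume)
    {
        await resume;                                   // the handler returns to the caller here
        return Encoding.ASCII.GetString(callerBuffer);  // sees whatever is there now
    }

    // Safer: copy the needed data while preparing the continuation.
    public static async Task<string> CopyThenReadLaterAsync(byte[] callerBuffer, Task resume)
    {
        var copy = (byte[])callerBuffer.Clone();
        await resume;
        return Encoding.ASCII.GetString(copy);
    }

    public static (string risky, string safe) Run()
    {
        var buffer = Encoding.ASCII.GetBytes("first");
        var resume = new TaskCompletionSource<bool>();

        Task<string> risky = ReadLaterAsync(buffer, resume.Task);
        Task<string> safe = CopyThenReadLaterAsync(buffer, resume.Task);

        // Both "handlers" have returned; the caller reuses its buffer.
        Encoding.ASCII.GetBytes("later", 0, 5, buffer, 0);

        resume.SetResult(true);                         // the continuations run now
        return (risky.Result, safe.Result);
    }
}
```

Run() returns ("later", "first"). Copying works here because the data is a small array; with a stream, a connection or anything that the caller may dispose, a copy is often not possible.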
Problem 3: "Cooperative multi-threading"
If one continuation doesn't allow other continuations (or even main messages) to run, they will never run. This is the same problem faced by cooperative multi-threading, which pre-emptive OS threads have solved. Continuations that run on the same thread brought it back.
In many cases, things will work fine because many continuations do small jobs and then finish or schedule another continuation. Yet it can be a problem for the same reasons it was in real cooperative multithreading, as some big CPU-intensive tasks can take minutes, hours or even days to finish (or may never finish) and the other continuations will need to wait to run. This is even worse because of the next problem.
Problem 4: Maybe Synchronous Execution
For the following example, I am using C# code. Consider that imageProvider is of an interface type and may have different implementations, that _stopRequested is a boolean variable that will be set to true when the Cancel button is pressed, and that this code is running on the main thread.
    while (true) {
        var image = await imageProvider.TryLoadNextImage();
        if (image == null || _stopRequested) break;
    }
As there's an await, many developers may think the main thread will be free to process other messages, including the Cancel button's Click. Yet, it is possible that an implementation always returns new images synchronously (like a counter, for example).
If that's the case, the application will hang when executing this loop. This happens because await does not allow other messages to be processed when the asynchronous method has already completed synchronously. This is actually a performance optimization to avoid excessive message posting when some actions can finish synchronously. But in this case all continuations will be synchronous, so the main thread will never be able to process the Cancel button click or any other message.
I know this is a contrived example and using C# but it shows a problem that may happen in many different situations, including asynchronous solutions that exist in other programming languages, as most of them try to avoid posting messages when it is not necessary. Sometimes this problem will not be noticed, sometimes it will simply make the application less responsive and some other times it will completely hang the application.
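The effect can be reproduced outside a UI application with the sketch below. QueueContext is a hypothetical, minimal SynchronizationContext that only imitates a message loop, and the loop is limited to 1000 items so that it terminates. Task.Yield() is a real .NET method that always schedules the continuation instead of continuing synchronously, and it is one possible way (not necessarily the best one) to give other messages a chance to run.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-threaded "message loop" context, standing in for a UI thread.
public sealed class QueueContext : SynchronizationContext
{
    public readonly Queue<Action> Pending = new Queue<Action>();
    public override void Post(SendOrPostCallback d, object state)
        => Pending.Enqueue(() => d(state));
}

public static class MaybeSyncDemo
{
    private static bool _stopRequested;

    // Always completes synchronously, like the "counter" provider.
    private static Task<int> TryLoadNextAsync(int n) => Task.FromResult(n);

    private static async Task<int> LoopAsync(bool yieldEachIteration)
    {
        int loaded = 0;
        while (!_stopRequested && loaded < 1000)
        {
            await TryLoadNextAsync(loaded);       // completed task: no yield here
            loaded++;
            if (yieldEachIteration)
                await Task.Yield();               // always goes back to the loop
        }
        return loaded;
    }

    // Returns how many items were loaded before the loop stopped.
    public static int Run(bool yieldEachIteration)
    {
        var previous = SynchronizationContext.Current;
        var context = new QueueContext();
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            _stopRequested = false;
            Task<int> loop = LoopAsync(yieldEachIteration);
            context.Post(_ => _stopRequested = true, null);   // the Cancel click
            while (context.Pending.Count > 0) context.Pending.Dequeue()();
            return loop.Result;
        }
        finally { SynchronizationContext.SetSynchronizationContext(previous); }
    }
}
```

Run(false) loads all 1000 items before the Cancel message is even looked at, while Run(true) stops after 2 items, because the Cancel message gets its turn between two iterations.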
Problem 5: Threads!
Continuations don't require new threads to be used but that doesn't mean they aren't used at all. Some APIs may actually use a secondary thread to do a big job and they will invoke the continuation from that secondary thread. Sometimes, there's no way to send a message back to the original thread (it wasn't a "message based" thread, for example), so some other thread may be used to execute the continuation.
So, all the problems that exist when multiple threads are involved may still be present when dealing with asynchronous execution. And that's not all, there's an extra twist: Each continuation may be running on a different thread. That means that any thread specific data may be lost, which can affect transaction, security and lots of other things.
For example, imagine a method that does this: Impersonates an account, copies all the files available to the impersonated account and restores the previous impersonation.
If this method is synchronous and the users don't want the application to hang, they will need to create a new thread to invoke this method. Everything will work fine.
If this method is asynchronous? I will give a .NET example again. Methods that use await naturally "capture the calling context". This means that when this method is run from the main thread, the continuations will run on the main thread and everything will be fine. Yet the same context doesn't always mean the same thread. When this method is invoked from the thread pool, or from any thread that doesn't have a context, the continuations will execute on any thread from the thread pool, which is not necessarily the thread that made the call, even when the code started on a thread pool thread. Considering that impersonation is per OS thread, bad things will happen.
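The sketch below shows the loss of thread-specific data without doing any real impersonation. A [ThreadStatic] field, which I named _impersonatedUser, stands in for the per-thread impersonation state, Task.Delay simulates the asynchronous file copy, and ThreadDataDemo and CopyFilesAsync are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadDataDemo
{
    // Stands in for per-thread state such as an impersonation token.
    [ThreadStatic] private static string _impersonatedUser;

    private static async Task<(int before, int after, string userAfter)> CopyFilesAsync()
    {
        _impersonatedUser = "backup-account";             // "impersonate"
        int before = Environment.CurrentManagedThreadId;
        await Task.Delay(50);                             // asynchronous "file copy"
        // No context was captured, so this part runs on a thread pool thread.
        return (before, Environment.CurrentManagedThreadId, _impersonatedUser);
    }

    public static (int before, int after, string userAfter) Run()
    {
        Task<(int before, int after, string userAfter)> pending = null;
        // A dedicated thread without a SynchronizationContext starts the method.
        var starter = new Thread(() => pending = CopyFilesAsync());
        starter.Start();
        starter.Join();
        return pending.Result;
    }
}
```

In this sketch userAfter comes back as null: the continuation runs on a thread that never "impersonated" anything, so the rest of the work, including the code that should restore the previous state, sees the wrong per-thread data.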
Considering all the points I've shown here, I believe there are three conclusions that we can draw.
- Asynchronous execution is always tied to threads, at least on the conceptual level of the word.
- Asynchronous execution increases the complexity of applications, even when we understand how it works (and it is definitely a source of bugs when we don't).
- Asynchronous execution is needed and gives many advantages when used properly.