I have written two C# articles about things that could be better in C# (Yield Return Could Be Better and Async/Await Could Be Better). Even though they aren't directly about threading, both problems could have been solved by better threading options, so why not write another article?
Today: Threading - Cooperative or Preemptive
At this moment I think that all major operating systems are preemptive. That is, even if one thread enters an infinite loop on a single-core computer, other threads are still able to run.
Users may notice a slowdown on single-core computers, but they can continue to use other programs, or even call the task manager to kill an unresponsive application.
Before preemptive threading there was cooperative threading (and it still exists in many different forms). In it, a thread only pauses its execution to let other threads run when the running thread itself asks to do so, which means that on a single-core computer an infinite loop will make the entire machine unresponsive.
By this simple fact, it is obvious that preemptive threading is better.
Today the number of cores is increasing and yet, instead of trying to use more and more threads, languages are apparently all creating solutions to reduce the number of threads. So, why is that happening?
I can see three major reasons for this:
- Memory consumption
- Performance loss due to context switches
- Performance loss due to synchronization
At least in .NET, each new thread starts with a call-stack size of 1 MB. So, on a 2 GB computer we can't create more than about two thousand threads without occupying all of the computer's memory. And we must remember that there are always other programs running, and that programs use memory for many other things.
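The 1 MB figure is only a default: the `Thread` constructor has an overload that takes a maximum stack size, so a thread that only needs a shallow call-stack can reserve far less. A minimal sketch (note that the runtime may round very small values up to a platform minimum):

```csharp
using System;
using System.Threading;

class SmallStackDemo
{
    static void Main()
    {
        // Default threads reserve 1 MB of stack each. This overload
        // requests only 128 KB, allowing many more threads per GB of
        // memory, as long as their call-stacks stay shallow.
        var thread = new Thread(
            () => Console.WriteLine("Running with a small stack"),
            maxStackSize: 128 * 1024);

        thread.Start();
        thread.Join();
    }
}
```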
Performance loss due to context switches
Changing from one thread to another is a context switch. Even though a single switch is fast, excessive context switches make things slow.
But even if context switches cost zero time, they could still make things seem slower.
Imagine that you start 10 threads in parallel and each takes 1 second to finish (really using the CPU for one second, not simply waiting). If each thread runs for 1 millisecond and is then switched out, on a single-core computer we will only see the first result after almost 10 seconds.
If we avoided the context switching completely, we would see the first result after 1 second, the second result after one more second... and only the last would take 10 seconds.
For the user, this is surely better. In fact, if each thread were responding to a different user, we would have one happy user (waiting only 1 second), one unhappy user (waiting 10 seconds), and all the others in between. With the preemptive approach, all of them wait 10 seconds (so all are unhappy).
But the OS doesn't let that happen because it simply does not know that each thread will take 1 second. After all, what happens if the first thread takes 2 hours to finish? It is better for one user to wait 2 hours and all the others 10 seconds than to make everyone wait 2 hours.
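The arithmetic behind the example can be checked with a quick calculation (the numbers - 10 jobs, 1000 ms each, 1 ms quantum - are illustrative, not measured):

```csharp
using System;

class SchedulingDemo
{
    static void Main()
    {
        const int jobs = 10;      // 10 CPU-bound jobs
        const int jobMs = 1000;   // each needs 1 second of CPU time
        const int quantumMs = 1;  // round-robin time slice

        for (int i = 0; i < jobs; i++)
        {
            // Round-robin: job i gets its 1000th (final) quantum only
            // after 999 full rounds of all 10 jobs, plus its own slot
            // in the last round.
            int roundRobin = (jobMs / quantumMs - 1) * jobs * quantumMs
                             + (i + 1) * quantumMs;

            // Run-to-completion: jobs simply execute one after another.
            int runToCompletion = (i + 1) * jobMs;

            Console.WriteLine(
                $"Job {i}: round-robin {roundRobin} ms, " +
                $"run-to-completion {runToCompletion} ms");
        }
        // First result arrives at 9991 ms with round-robin,
        // versus 1000 ms without preemption.
    }
}
```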
Performance loss due to synchronization
Here is where we see the real performance loss on multi-core computers.
Context switching on its own already has a price, be it on a single-core or a multi-core computer, but thread synchronization has much more impact on multi-core computers.
By default, when two or more threads want to work on the same data (assuming the code is well written), they use locks. That means two things:
- All threads except the one that acquires the lock must enter a wait state. If there are other threads ready to run, there will be a context switch (which has a performance price). Even if spinlocks are used, some time is still spent waiting.
- Memory barriers are issued and cached data may be invalidated. So, even if one thread comes right after another on the same CPU, the CPUs can't be sure their cached copies are up-to-date and may request data from main memory again (I am unsure what kinds of optimizations exist to avoid excessive cache invalidation, but there is always some cost).
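A minimal example of the pattern the list above describes - two threads sharing one counter through a `lock`, where the losing thread must wait and every acquisition implies a memory barrier:

```csharp
using System;
using System.Threading;

class LockDemo
{
    static int counter;
    static readonly object gate = new object();

    static void Main()
    {
        void Work()
        {
            for (int i = 0; i < 100_000; i++)
            {
                lock (gate)    // the losing thread waits here (a context
                {              // switch or a spin), and the barrier makes
                    counter++; // sure no CPU works on a stale cached copy
                }
            }
        }

        var a = new Thread(Work);
        var b = new Thread(Work);
        a.Start(); b.Start();
        a.Join(); b.Join();

        Console.WriteLine(counter); // always 200000; without the lock,
                                    // lost updates would make it smaller
    }
}
```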
What does that mean?
Having 1000 threads is a memory problem, independent of the other problems. This makes many people say that too many threads don't scale because of memory consumption, even when the threads are kept waiting. In fact it is the opposite: the more memory you have, the more threads you can keep in memory. So threads do scale with memory; what they don't scale with is the performance losses.
The performance losses caused by context switching and synchronization mean that doubling the number of CPUs will not double the performance. Depending on the time spent waiting to obtain locks, that creates a limit: after a certain point you can add more CPUs to process more threads in parallel, but those CPUs will spend most of their time waiting to acquire locks, so the overall performance may stay the same (or even decrease in extreme cases). That's why too many threads don't scale.
How could it be better?
.NET is already trying to make things better with the ThreadPool and the Task Parallel Library.
With a perfectly tuned ThreadPool, we would have only one thread per CPU. Those threads then run all the work items they receive, one after the other (the threads run in parallel with each other, but extra items are simply queued and run as soon as a previous item finishes). Then, to avoid keeping a CPU blocked on a slow operation instead of taking a new work item, there are the upcoming async/await keywords, which cause the current Task to "pause" and let another work item take its place.
So, while there are items for the CPUs to process, they never stop and they don't switch contexts except if they need to "block" (at least not inside the process).
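A sketch of that idea, written with the async/await syntax as it later shipped (here `Task.Delay` stands in for a real network or database wait):

```csharp
using System;
using System.Threading.Tasks;

class AwaitDemo
{
    // While the delay is pending, no thread is blocked: the work item
    // "pauses" and its pool thread is free to run other queued items.
    static async Task<string> FetchAsync(int id)
    {
        await Task.Delay(100);  // stands in for a network/database wait
        return $"result {id}";
    }

    static async Task Main()
    {
        // 100 concurrent operations in flight, yet the ThreadPool only
        // needs a handful of threads (roughly one per CPU) to run them.
        var tasks = new Task<string>[100];
        for (int i = 0; i < tasks.Length; i++)
            tasks[i] = FetchAsync(i);

        string[] results = await Task.WhenAll(tasks);
        Console.WriteLine(results.Length); // 100
    }
}
```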
So, it is already better, isn't it?
I just said that it could be better and then explained why it already is better (and becoming even better). So, what's my point?
Well, here is where I really want to start.
Theory - How the OS could have better threading
If we look at what is happening, the OS always creates preemptive threads, and environments like .NET try to reuse those threads to do many different pieces of work.
Those pieces of work, relative to each other, form a kind of cooperative threading. So, why not have both?
To avoid one program making the computer unresponsive, keep programs (processes) preemptive. But, by default, threads inside an application should use cooperative threading.
Note that I am not saying this change should be applied to existing applications.
And I say by default because there are situations where we must make a thread fully preemptive. But those should be the exceptions, not the rule.
With a simple change like this, we would solve all the problems related to excessive context switches.
You have 1000 threads? OK, you have the memory consumption of 1000 threads, but if you have only four CPUs, only four threads will be scheduled at a time. If they don't block or explicitly ask to pause, they will never be preempted in favor of other threads in their process.
As threads would never pause at unknown moments, this would also reduce some of the synchronization problems. I say some because different CPUs may still run threads in parallel that share data; but a thread that acquires a lock would never be unscheduled at a random moment, letting another thread run, try to acquire the same lock, maybe spin a little, and then be unscheduled in turn (effectively causing two context switches plus a spin delay with no progress).
Memory consumption - I don't know if Windows uses 1 MB per thread by default or if that is a .NET choice. But I really think that starting with a smaller call-stack and dynamically allocating more when needed would make many more threads available to run.
Excessive synchronization - Simply making threads non-preemptive and processes preemptive already reduces a lot of problems. But good programmers usually know where the biggest synchronization issues are... unfortunately, they can't tell the OS about them. So, a solution to avoid synchronization problems would be to tell the OS which threads constantly share data. I will call that a ThreadGroup. The basic idea is: threads of a ThreadGroup can't be scheduled in parallel. This would avoid the need for locks in many cases and, when locks are still needed (like when acquiring a lock and then calling a blocking operation), they could be single-threaded locks, which avoid invalidating the CPU caches.
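No OS offers this today, but the idea can be approximated in user mode. The sketch below is purely illustrative: `ThreadGroup`, `StartMember` and `Yield` are my own names for the hypothetical API, and a semaphore stands in for what the kernel scheduler would do. Because at most one member runs at a time, shared state needs no further locking between explicit `Yield()` calls:

```csharp
using System;
using System.Threading;

// User-mode approximation of the hypothetical ThreadGroup: member
// threads take turns, so at most one of them runs at any moment.
class ThreadGroup
{
    // The single "turn" token; holding it means "I am the running member".
    private readonly SemaphoreSlim turn = new SemaphoreSlim(1, 1);

    public Thread StartMember(Action<ThreadGroup> body)
    {
        var t = new Thread(() =>
        {
            turn.Wait();   // run only while no other member is running
            try { body(this); }
            finally { turn.Release(); }
        });
        t.Start();
        return t;
    }

    // A member calls Yield() to voluntarily let another member run.
    public void Yield()
    {
        turn.Release();
        turn.Wait();
    }
}
```

With this, two member threads can increment the same counter without a `lock` around the increment, because they can only lose the CPU (relative to each other) at a `Yield()` call.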
Cooperative threading also has a big advantage that is usually overlooked: Control.
For example, when I read the documentation on fibers I got irritated. It treats fibers as "slim threads" that lack a good scheduling mechanism.
But that's not true. If threads only start or continue when we ask, we can very easily create coroutines, or simple enumerators that run as another thread fully dependent on the "creator" thread.
So, if the OS starts to use threads the way I am saying, we could:
- Ignore the existence of asynchronous operations. We could simply create new threads for everything, as by default threads would only cause a context switch when they block or voluntarily yield. In effect, threads would be like Tasks in .NET.
- Implement things like yield return without compiler tricks, making it much easier to yield from inner methods.
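The second point can already be demonstrated with today's preemptive threads, at the cost of two semaphore handshakes per item (a cost a cooperative scheduler would not pay). The sketch below is my own illustration, not an existing API: the producer thread runs only between `MoveNext()` calls, handing one value at a time back to its creator. Because the producer is an ordinary method, it can yield from nested calls, which the compiler-generated iterator cannot do:

```csharp
using System;
using System.Threading;

// "yield return" built from a plain thread instead of compiler rewriting.
class ThreadedEnumerator<T>
{
    private readonly SemaphoreSlim produce = new SemaphoreSlim(0, 1);
    private readonly SemaphoreSlim consume = new SemaphoreSlim(0, 1);
    private T current;
    private bool done;

    // body receives a "yield" delegate it may call from any nesting depth.
    public ThreadedEnumerator(Action<Action<T>> body)
    {
        var worker = new Thread(() =>
        {
            produce.Wait();        // run only once MoveNext is called
            body(value =>
            {
                current = value;   // hand one value to the creator...
                consume.Release();
                produce.Wait();    // ...and sleep until the next MoveNext
            });
            done = true;
            consume.Release();
        });
        worker.IsBackground = true;
        worker.Start();
    }

    public bool MoveNext()
    {
        if (done) return false;
        produce.Release();         // wake the producer
        consume.Wait();            // wait for its next value (or its end)
        return !done;
    }

    public T Current => current;
}

class Demo
{
    static void Main()
    {
        var e = new ThreadedEnumerator<int>(yield =>
        {
            for (int i = 1; i <= 3; i++)
                yield(i);          // could just as well be in a nested call
        });
        while (e.MoveNext())
            Console.WriteLine(e.Current); // prints 1, 2, 3
    }
}
```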
What Am I Trying To Achieve?
Nothing, really. I simply wanted to show a viewpoint.
I don't expect Windows to simply change how it works, and such a change would really cause a lot of problems for existing applications. But if the option existed for new applications, I would really like to use it. In my applications I never let a thread enter an infinite loop. If one takes too much time to return without making a blocking call, I am fully capable of asking for a preemptive thread.
But in most cases I create secondary threads for long-running operations that involve long waits, like network ones. With such a threading mode, things would be simpler. Being able to create ThreadGroups, and also to store ThreadGroup variables, would really allow me (and many other programmers) to create new threads for long database or network operations and completely avoid synchronization issues. That would be, again, a kind of asynchronous programming, but without the problems linked to asynchronous programming.
When will it happen? I hope in the near future, but I am not sure it will ever happen for a Windows-compatible OS.