The Task Scheduler schedules and coordinates tasks at run time. A task is a unit of work that performs a specific job. The Task Scheduler manages the details that are related to efficiently scheduling tasks on computers that have multiple computing resources.
The Windows operating system provides a preemptive kernel-mode scheduler: a round-robin, priority-based mechanism that gives every task exclusive access to a computing resource for a given time slice and then switches to another task. Although this mechanism provides fairness (every thread makes forward progress), it comes at a cost in efficiency. Many computation-intensive algorithms, for example, do not require fairness; instead, it is important that related tasks finish in the least overall time. Cooperative scheduling enables an application to schedule work more efficiently.
Cooperative scheduling is a mechanism that gives every task exclusive access to a computing resource until the task finishes or until the task yields its access to the resource.
The user-mode cooperative scheduler enables application code to make its own scheduling decisions. Because cooperative scheduling enables many scheduling decisions to be made by the application, it reduces much of the overhead that is associated with kernel-mode synchronization.
The Concurrency Runtime uses cooperative scheduling together with the preemptive scheduler of the operating system to achieve maximum usage of processing resources.
In this article, we will discuss the design of the Task Scheduler and examine how it works internally. To do so, CppDepend is used to analyze the Concurrency Runtime source code.
The Concurrency Runtime provides the Scheduler interface so that a specific scheduler adapted to the application's needs can be implemented.
Let's discover the classes implementing this interface:
The Concurrency Runtime provides two implementations of the scheduler, ThreadScheduler and UMSThreadScheduler, and as shown in the dependency graph, the SchedulerBase class contains the behavior common to these two classes.
Is the Scheduler flexible? A good indicator of flexibility is to search for all the abstract classes used by the scheduler.
As shown in the dependency graph, the Scheduler uses many abstract classes; this enforces low coupling and makes the scheduler more flexible, so adapting it to other needs will be easy.
To explain the role of each abstract class used by the scheduler, we will discuss the scheduler's responsibilities.
There are three major responsibilities assigned to the Task Scheduler:
1. Getting Resources (Processors, Cores, Memory)
When the scheduler is created, it asks the runtime resource manager for resources, as explained in this article.
The scheduler communicates with the resource manager through dedicated interfaces, and the resource manager uses the scheduler policy to decide which resources to allocate to the scheduler.
The policy, as shown in this dependency graph, is assigned when the scheduler is created.
If no scheduler exists, the Concurrency Runtime creates a default scheduler with a default policy by invoking the GetDefaultScheduler method.
The Task Scheduler enables applications to use one or more scheduler instances to schedule work; an application can invoke Scheduler::Create to add another scheduler that uses a specific policy.
The Concurrency::PolicyElementKey enumeration defines the policy keys that are associated with the Task Scheduler.
Here’s an article explaining the utility of each policy key, and the default value of each key.
The following collaborations between the scheduler and the resource manager show the role of each interface involved in the allocation.
Asking for resource allocation:
Getting resources from the resource manager:
2. Managing Task Queues
When the Scheduler is created, tasks can be assigned to it for execution; the Scheduler stores these tasks in queues. To enforce class cohesion, the queues are not managed directly by the ThreadScheduler class but by the schedule groups.
A schedule group affinitizes, or groups, related tasks together. Every scheduler has one or more schedule groups. Use schedule groups when you require a high degree of locality among tasks, for example, when a group of related tasks benefit from executing on the same processor node.
As shown in the following graph, the runtime provides two kinds of schedule groups, a cache-local kind (CacheLocalScheduleGroup) and a fair kind; the choice between the two, as will be explained later, impacts how the scheduler chooses the next task to execute.
Every scheduler has a default schedule group for every scheduling node. The runtime creates one scheduling node for every processor package or non-uniform memory access (NUMA) node. If you do not explicitly associate a task with a schedule group, the scheduler chooses which group to add the task to.
As shown in the following dependency graph, the SchedulingRing class is responsible for managing schedule groups: it holds a list of groups and creates them.
The schedule group contains three kinds of queues:
1. FIFO Queue
This queue contains lightweight tasks. A lightweight task resembles the function that you provide to the Windows CreateThread API; therefore, lightweight tasks are useful when you adapt existing code to use the scheduling functionality of the Concurrency Runtime.
A lightweight task is represented by the RealizedChore class, and the FIFO queue of the schedule group is represented by a dedicated field.
Let's search for all the methods that use this queue directly:
So we can add a lightweight task to the group by invoking ScheduleTask.
Here's an interesting article about lightweight tasks.
2. Work Stealing Queue
There's only one FIFO queue associated with the schedule group, but the schedule group references a list of work-stealing queues: for each worker thread, there's a local queue associated with it.
A thread that is attached to a scheduler is known as an execution context, or just context, so this local queue is actually associated with a context. The Context class provides a programming abstraction for an execution context and offers the ability to cooperatively block, unblock, and yield the current context.
To be sure that only the Context creates this kind of queue, let's search for the methods that access it directly:
So the context is responsible for creating this queue, and each context has a local work-stealing queue associated with it.
To illustrate the behavior of the work-stealing algorithm, let’s imagine that we have two worker threads allocated to the scheduler.
As explained before, for each worker thread, there’s a local queue associated with it.
Imagine that we have three tasks in the queue of Worker Thread 1: tasks 3 and 4 are waiting to be executed while task 5 is running.
When the Dispatch method of Worker Thread 2 finds nothing in its own queue, task 3 is moved, or "stolen", from its original queue and handed to the available worker thread.
How can we create a task managed by a work-stealing queue?
To answer this question, let's search for the methods that indirectly invoke this queue:
As shown in this dependency graph, this kind of task can be created by using task_group. Using task_group to add a new task is better than using Scheduler::ScheduleTask to create a lightweight task, because the work-stealing algorithm makes better use of the virtual processors allocated to the scheduler. On the other hand, ScheduleTask can make it easier to migrate existing code.
3. Unblocked Context Queue
The Context class lets you block or yield the current execution context. Blocking or yielding is useful when the current context cannot continue because a resource is not available.
The Context::Block method blocks the current context; a blocked context yields its processing resources so that the runtime can perform other tasks. The Context::Unblock method unblocks a blocked context.
When a context is unblocked and available to be executed, it's added to the runnable-contexts queue, which is represented by a dedicated field.
Here's a dependency graph showing some cases where a context is added to the runnable queue:
So a context is added to this queue when it's unblocked or when a virtual processor is retired from the scheduler.
3. Dispatching Tasks
The scheduler searches for work to execute; this work can be:
- An unblocked context
- A lightweight task
- A task in a work-stealing queue
As explained before, all this work is stored in queues managed by schedule groups, and each group is managed by a scheduling ring.
When a virtual processor is allocated to the scheduler, a ThreadProxy object is created and associated with this processor; after the creation, the Dispatch method of the ThreadProxy is invoked.
As shown in the following dependency graph, and as explained before, the Concurrency Runtime uses abstract classes to enforce low coupling; the actual Dispatch implementation invoked depends on the implementation chosen by the runtime, and this choice is driven by the scheduler policy.
The concrete implementation of Dispatch invokes the Dispatch method of the execution context.
Here's a dependency graph showing the methods invoked by the concrete implementation of Dispatch.
So the algorithm for finding the next piece of work to execute is implemented by the WorkSearchContext class.
Let's discover all the classes used directly by WorkSearchContext to achieve its responsibility:
The responsibility of WorkSearchContext is to give us a WorkItem to execute; this item can be a runnable context, a realized chore, or an unrealized chore.
To better understand the collaboration between these classes, let's search for the methods used directly by WorkSearchContext:
WorkSearchContext iterates over the ScheduleGroup classes through the scheduling rings, and for each group it searches for work to execute.
The WorkSearchContext class is created by the VirtualProcessor class, and as shown in the following dependency graph, the algorithm used is specified when the VirtualProcessor is initialized; to choose it, the virtual processor asks the Scheduler for the SchedulingProtocol policy value, which describes the scheduling algorithm the scheduler will use. WorkSearchContext is then told which algorithm to use by being passed a value from this policy.
So this class implements two algorithms for finding work:
Cache Local Algorithm
This algorithm looks for runnable contexts within the current schedule group, then realized chores, then unrealized chores; if there's no more work in the current schedule group, it looks at the next group in the same scheduling ring, and when it has exhausted the current scheduling ring, it moves on to the next one.
So the scheduler prefers to continue to work on tasks within the current schedule group before moving to another schedule group.
This algorithm is implemented by the WorkSearchContext::SearchCacheLocal method; as shown by this dependency graph, it invokes other methods to search for runnable contexts, realized chores, and unrealized chores.
Another specificity of this algorithm is that unblocked contexts are cached per virtual processor and are typically scheduled in a last-in, first-out (LIFO) fashion by the virtual processor that unblocked them.
To verify this behavior, here's a dependency graph of the methods invoked when searching for a runnable context:
This cache-local algorithm is the default chosen by the scheduler if none is specified.
Fair Algorithm
With the fair algorithm, the scheduler prefers to round-robin through schedule groups after executing each task. Unblocked contexts are typically scheduled in a first-in, first-out (FIFO) fashion, and virtual processors do not cache unblocked contexts.
This algorithm is implemented by the WorkSearchContext::SearchFair method; as shown by this dependency graph, it invokes other methods to search for runnable contexts, realized chores, and unrealized chores.
After this analysis, we can say that the Concurrency Runtime task scheduler is well designed: it's characterized by high cohesion and very low coupling, it's very flexible, and by default it provides an optimized algorithm that makes better use of hardware resources.
Filed under: CodeProject