Posted 15 Sep 2010

Concurrency Runtime: Task Scheduler

The Task Scheduler schedules and coordinates tasks at run time. A task is a unit of work that performs a specific job. The Task Scheduler manages the details that are related to efficiently scheduling tasks on computers that have multiple computing resources.

The Windows OS provides a preemptive kernel-mode scheduler: a round-robin, priority-based mechanism that gives every task exclusive access to a computing resource for a given time period and then switches to another task. Although this mechanism provides fairness (every thread makes forward progress), it comes at some cost in efficiency. For example, many computation-intensive algorithms do not require fairness; instead, it is important that related tasks finish in the least overall time. Cooperative scheduling enables an application to schedule work more efficiently.

Cooperative scheduling is a mechanism that gives every task exclusive access to a computing resource until the task finishes or until the task yields its access to the resource.

The user-mode cooperative scheduler enables application code to make its own scheduling decisions. Because cooperative scheduling enables many scheduling decisions to be made by the application, it reduces much of the overhead that is associated with kernel-mode synchronization.

The Concurrency Runtime uses cooperative scheduling together with the preemptive scheduler of the operating system to achieve maximum usage of processing resources.

In this article, we will discuss the Task Scheduler design and examine how it works internally. To do so, we use CppDepend to analyze the Concurrency Runtime source code.

Scheduler Design

The Concurrency Runtime provides the Scheduler interface so that a specific scheduler can be implemented to suit application needs.

Let's look at the classes that implement this interface:

The Concurrency Runtime provides two implementations of the scheduler, ThreadScheduler and UMSThreadScheduler, and as shown in the dependency graph, SchedulerBase contains all the behavior common to these two classes.

Is the Scheduler flexible? A good indicator of flexibility is to search for all abstract classes used by the scheduler.

As shown in the dependency graph, the Scheduler uses many abstract classes. This enforces low coupling and makes the scheduler more flexible, so adapting it to other needs is easy.

To explain the role of each abstract class the scheduler uses, we will discuss the scheduler's responsibilities.

There are three major responsibilities assigned to the Task Scheduler:

1. Getting Resources (Processors, Cores, Memory)

When the scheduler is created, it asks the runtime's Resource Manager for resources, as explained in this article.

The scheduler communicates with the resource manager using IResourceManager, ISchedulerProxy and IScheduler interfaces.

The Resource Manager uses the scheduler policy to decide which resources to allocate to the scheduler.

The policy as shown in this dependency graph is assigned when the scheduler is created.

If no scheduler exists, the Concurrency Runtime creates a default one by invoking the GetDefaultScheduler method, and a default policy is used.

The Task Scheduler enables applications to use one or more scheduler instances to schedule work, and an application can invoke Scheduler::Create to add another scheduler that uses a specific policy.

The Concurrency::PolicyElementKey enumeration defines the policy keys that are associated with the Task Scheduler.

Here's an article explaining the purpose and default value of each policy key.

The following collaborations between the scheduler and the Resource Manager show the role each interface plays in the allocation.

Ask for resource allocation:

Getting resources from resource manager:

2. Managing Task Queues

When a Scheduler is created, tasks can be assigned to it for execution; the Scheduler stores these tasks in queues. To enforce class cohesion, the queues are not managed directly by the ThreadScheduler class but by the ScheduleGroupBase class.

A schedule group affinitizes, or groups, related tasks together. Every scheduler has one or more schedule groups. Use schedule groups when you require a high degree of locality among tasks, for example, when a group of related tasks benefit from executing on the same processor node.

As shown in the following graph, the runtime provides two kinds of schedule groups: FairScheduleGroup and CacheLocalScheduleGroup. As will be explained later, the choice between these two groups affects the algorithm the scheduler uses to pick the next task to execute.

Every Scheduler has a default schedule group for every scheduling node. The runtime creates one scheduling node for every processor package or Non-Uniform Memory Architecture (NUMA) node. If you do not explicitly associate a task with a schedule group, the scheduler chooses which group to add the task to.

As shown in the following dependency graph, the SchedulingRing is responsible for managing schedule groups; it contains the list of groups and creates them.

The schedule group contains three kinds of queues:

1. FIFO Queue

This queue contains lightweight tasks. A lightweight task resembles the function that you provide to the Windows API CreateThread function; therefore, lightweight tasks are useful when you adapt existing code to use the scheduling functionality of the Concurrency Runtime.

A lightweight task is represented by the RealizedChore class, and the FIFO queue of the schedule group is represented by the m_realizedChores field.

Let's search for all the methods that use this queue directly:

So we can add a lightweight task to the group by invoking ScheduleGroupBase::ScheduleTask or Scheduler::ScheduleTask.

Here's an interesting article about lightweight tasks.

2. Work Stealing Queue

There's only one FIFO queue associated with the schedule group, but the schedule group references a list of work-stealing queues: for each worker thread, there's a local queue associated with it.

A thread that is attached to a scheduler is known as an execution context, or just context, so this local queue is actually associated with the Context class.

The Context class provides a programming abstraction for an execution context and offers the ability to cooperatively block, unblock, and yield the current context.

To verify that only Context creates this kind of queue, let's search for the methods that access the m_workQueues field directly.

So the context is responsible for creating this queue, and for each context, there's a local work-stealing queue associated with it.

To illustrate the behavior of the work-stealing algorithm, let’s imagine that we have two worker threads allocated to the scheduler.

As explained before, for each worker thread, there’s a local queue associated with it.

Imagine that there are three tasks in the queue of Worker Thread 1: tasks 3 and 4 are waiting to be executed while task 5 is running.

The Dispatch method of the other worker finds that there is nothing in its queue, so task 3 is moved, or "stolen," from its original queue and distributed to the available worker thread.

How can we create a task managed by a work-stealing queue?

To answer this question, let’s search for methods indirectly invoking the CreateWorkQueue method.

As shown in this dependency graph, this kind of task can be created by using the task_group class.

Using task_group to add a new task is better than using Scheduler::ScheduleTask to create a lightweight task; indeed, the work-stealing algorithm makes better use of the virtual processors allocated to the scheduler.

However, ScheduleTask can be the better choice when migrating existing code that uses the CreateThread API.

3. Unblocked Context Queue

The Context class lets you block or yield the current execution context. Blocking or yielding is useful when the current context cannot continue because a resource is not available.

The Context::Block method blocks the current context. A context that is blocked yields its processing resources so that the runtime can perform other tasks. The Context::Unblock method unblocks a blocked context.

When a context is unblocked and available to be executed, it's added to the runnable context queue; this queue is represented by the m_runnableContexts field.

Here's a dependency graph showing some cases where a context is added to the runnable queue:

So a context is added to the queue when it's unblocked or when a virtual processor is retired from the scheduler.

3. Dispatching Tasks

The scheduler searches for work to execute; the work can be:

  • Unblocked context
  • Lightweight task
  • Task in work stealing queues

As explained before, all this work is stored in queues managed by schedule groups, and each group is managed by a scheduling ring.

When a virtual processor is allocated to the scheduler, a ThreadProxy is created and associated with this processor; after the creation, the Dispatch method of the ThreadProxy is invoked.

As shown in the following dependency graph, and as explained before, the Concurrency Runtime uses abstract classes to enforce low coupling. The actual Dispatch invoked depends on the implementation chosen by the runtime; this choice is driven by the scheduler policy.

The concrete implementation of Dispatch invokes the Dispatch method of the execution context.

Here's a dependency graph showing the methods invoked by the concrete implementation of the Context::Dispatch method:

So the algorithm for finding the next work to execute is implemented by the WorkSearchContext class.

Let's discover all the classes used directly by WorkSearchContext to fulfill its responsibility:

The responsibility of WorkSearchContext is to give us a WorkItem to execute; it can be an InternalContextBase, a RealizedChore, or an _UnrealizedChore.

To better understand the collaboration between these classes, let's search for the methods used directly by WorkSearchContext:

So WorkSearchContext iterates over the SchedulingRing and ScheduleGroup classes by using SchedulerBase methods.

For each schedule group, it searches for a runnable context, a RealizedChore, or an _UnrealizedChore.

The WorkSearchContext class is created by the VirtualProcessor class. As shown in the following dependency graph, the algorithm to use is specified when the VirtualProcessor is initialized; for that, it asks the Scheduler for the SchedulingProtocol, which describes the scheduling algorithm that the scheduler will use.

The WorkSearchContext is told which algorithm to use by passing it a value from the Algorithm enum.

So this class implements two algorithms to find work:

Cache Local Algorithm

This algorithm looks for runnable contexts within the current schedule group, then realized chores, and then unrealized chores. If there's no more work in the current schedule group, it looks at the next group in the same scheduling ring, and when it has exhausted all the work in the current scheduling ring, it looks in the next ring.

So the scheduler prefers to continue to work on tasks within the current schedule group before moving to another schedule group.

This algorithm is implemented by the WorkSearchContext::SearchCacheLocal method, and as shown in this dependency graph, this method invokes other methods to search for runnable contexts, RealizedChores, or _UnrealizedChores.

Another characteristic of this algorithm is that unblocked contexts are cached per virtual processor and are typically scheduled in a last-in, first-out (LIFO) fashion by the virtual processor that unblocked them.

To verify this behavior, here's a dependency graph of the methods invoked when searching for a runnable context:

This is the default algorithm chosen by the scheduler if none is specified.

Fair Algorithm

In this case, the scheduler prefers to round-robin through schedule groups after executing each task. Unblocked contexts are typically scheduled in a first-in-first-out (FIFO) fashion. Virtual processors do not cache unblocked contexts.

This algorithm is implemented by the WorkSearchContext::SearchFair method, and as shown in this dependency graph, this method invokes other methods to search for runnable contexts, RealizedChores, or _UnrealizedChores.

After this analysis, we can say that the Concurrency Runtime's Task Scheduler is well designed: it is characterized by high cohesion and very low coupling, it is very flexible, and it provides by default an optimized algorithm to better exploit hardware resources.


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Issam Lahlali
Software Developer (Senior)
United States United States
CppDepend lead developer.
