A developer's log book: csharp

Since my first article on the Task-based Asynchronous Pattern (TAP) with C# .NET, I have successfully implemented several communication libraries with the async/await syntactic sugar.
I will soon need to train my teammates on this pattern and have been working on a series of short articles to get them started.
The purpose of this first article is to introduce the basic concepts of asynchronous programming in C# with lots of visual content, which is probably what is missing the most in the articles found online. My intention here is to simplify the theory and focus only on the essence of asynchronous programming.

The theory

Asynchronous programming is meant to capture parts of your code into schedulable blocks called Tasks. These blocks will be executed in the background with a set of shared resources called the ThreadPool.

Let's take a look at how a classic single-threaded program would execute on a system.

A program (or code) is made of functions that call other functions. In a single-threaded application, this code will be executed sequentially by the same Thread.

Now if we want to leverage TAP with the same code, here is what the code would look like.

The functions of my code are now captured in Task objects that are scheduled for execution. This time, it is not necessarily the same Thread that will execute each block: it depends on thread availability in the ThreadPool. However, the code will be executed in the exact same order.

So why is it different?

If a function f(x) executes a blocking operation (e.g. an I/O read), the Thread will remain blocked until the operation completes. That thread will be unavailable for other operations during that time.
If a Task T awaits the same operation asynchronously, the execution of the Task will be suspended until the operation completes. During that time, the Thread will be released and free to execute other Tasks if necessary.

So functionally, both versions are equivalent, but in terms of system resource consumption, they do not work identically.
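To make the difference concrete, here is a minimal sketch. The method names are hypothetical, and Thread.Sleep / Task.Delay only stand in for a real blocking read and its asynchronous counterpart.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingVsAwaitingDemo
{
    // Blocking wait: the calling thread is held for the whole duration.
    public static int BlockingWait(int milliseconds)
    {
        Thread.Sleep(milliseconds);
        return milliseconds;
    }

    // Asynchronous wait: the thread is released at the await
    // and the method is resumed once the delay has elapsed.
    public static async Task<int> AwaitedWaitAsync(int milliseconds)
    {
        await Task.Delay(milliseconds);
        return milliseconds;
    }

    public static async Task Main()
    {
        Console.WriteLine(BlockingWait(100));

        Task<int> pending = AwaitedWaitAsync(100);
        Console.WriteLine("free to do other work while waiting");
        Console.WriteLine(await pending);
    }
}
```

Both methods return the same value; only the way the thread is used while waiting differs.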

How is this possible?

A Task is a stateful object (see TaskStatus). For every async method, the compiler generates a state machine that records how far the method has progressed each time it reaches an await statement.
When it awaits an asynchronous operation that has not completed yet, my Task is put aside (its status reads WaitingForActivation) and the thread is released. The system can detect when an I/O completes and will resume the execution where it left off by restoring the execution context.
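The status of a suspended Task can be observed directly. In this sketch, a TaskCompletionSource plays the role of the pending I/O so that completion is under our control; WaitForSignalAsync is a hypothetical name. On current .NET runtimes, the Task returned by an async method reports WaitingForActivation while it is suspended on an await.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskStatusDemo
{
    // Awaits a signal controlled by the caller, standing in for an I/O completion.
    public static async Task<string> WaitForSignalAsync(Task<string> signal)
    {
        string value = await signal;
        return "received " + value;
    }

    public static void Main()
    {
        var io = new TaskCompletionSource<string>();

        Task<string> task = WaitForSignalAsync(io.Task);
        Console.WriteLine(task.Status); // WaitingForActivation

        io.SetResult("data");           // the "I/O" completes
        Console.WriteLine(task.Result); // received data
        Console.WriteLine(task.Status); // RanToCompletion
    }
}
```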

Pros and Cons

Taken in isolation, a piece of code that is executed synchronously will perform better than its asynchronous version. As previously explained, the execution of asynchronous code requires the creation of a state machine and depends on thread availability.
Using TAP makes my system more scalable than its synchronous version. The resources of my system are only used when necessary, which allows me to support a higher workload.

Asynchronous vs Parallel

A common mistake is to mix up these two concepts. The purpose of asynchronous programming is not to offer a simplified framework for parallel processing. Most of the time, you should not even use Task.Run or Task.Factory.StartNew, and I believe that's what creates confusion. TAP is not a multitasking framework, it is a "promise for execution" framework.

With that said, TAP provides a few interesting methods, such as Task.WhenAll and Task.WhenAny, if you want several Task objects to run concurrently.
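As an illustration, here is a small sketch of Task.WhenAll. FetchAsync is a hypothetical operation simulated with Task.Delay; the point is that all three operations are started before any of them is awaited, so they overlap in time.

```csharp
using System;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical I/O operation, simulated with Task.Delay.
    public static async Task<int> FetchAsync(int id, int delayMs)
    {
        await Task.Delay(delayMs);
        return id;
    }

    public static async Task<int[]> FetchAllAsync()
    {
        // Start the three operations first...
        Task<int> a = FetchAsync(1, 300);
        Task<int> b = FetchAsync(2, 200);
        Task<int> c = FetchAsync(3, 100);

        // ...then wait for all of them at once.
        // Results come back in the order the tasks were passed in.
        return await Task.WhenAll(a, b, c);
    }

    public static async Task Main()
    {
        int[] ids = await FetchAllAsync();
        Console.WriteLine(string.Join(",", ids)); // 1,2,3
    }
}
```

The total waiting time is close to the longest delay rather than the sum of the three.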

In this article, we'll focus on C# async/await keywords and explain what they're intended for.

When?

Asynchronous programming was mainly designed to avoid blocking a thread because of an I/O operation (serial port read, HTTP request, database access...). It can also be used to handle CPU-bound operations like expensive calculations.

How?

C# has built-in syntactic sugar keywords (async/await) for easily writing asynchronous code without dealing with callbacks. They also help with making asynchronous calls on existing synchronous interfaces/APIs (although it is really not the recommended approach). This is known as the Task-based Asynchronous Pattern (TAP).

This entire mechanism relies on the Task and Task<T> objects.

Task vs Thread

A Thread is a worker. It is an OS object which executes a job (e.g. some code) in parallel.
A Task is a job that needs to be scheduled and executed on available workers, possibly in a ThreadPool. It is a promise for execution.
A ThreadPool is a group of Threads that .NET will handle for you. They are shared workers that your application can rely on to execute jobs asynchronously.

With an embedded SW engineering background, it is very tempting for me to instantiate my own Thread objects in my application. It gives me confidence in how my application's jobs are scheduled over time. But generally speaking, it's a mistake.

Threads are expensive objects in terms of memory (about 1 MB of stack per thread by default in .NET) but also in terms of performance. On a resource-limited system, having several threads per application will put pressure on the CPU and slow your system down. .NET will manage the ThreadPool for you, keep threads alive for reuse and take your system's limitations into consideration when doing so. In short, with .NET, prefer Task.Run over creating your own threads for multithreading.
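The two approaches can be compared side by side. This is only a sketch with hypothetical method names: the first version pays for a dedicated thread that is created and destroyed for a single job, the second one queues the job to the ThreadPool.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadVsTaskDemo
{
    public static int Square(int x)
    {
        return x * x;
    }

    // Dedicated thread: created, used once, then destroyed.
    public static int SquareOnDedicatedThread(int x)
    {
        int result = 0;
        var thread = new Thread(() => result = Square(x));
        thread.Start();
        thread.Join(); // blocks the caller until the thread has finished
        return result;
    }

    // Task.Run: the job is queued to the ThreadPool and executed by a reused worker.
    public static Task<int> SquareOnThreadPoolAsync(int x)
    {
        return Task.Run(() => Square(x));
    }

    public static async Task Main()
    {
        Console.WriteLine(SquareOnDedicatedThread(7));       // 49
        Console.WriteLine(await SquareOnThreadPoolAsync(7)); // 49
    }
}
```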

Note: The only situation I came across that required the creation of my own Thread was when developing a Windows service. When the operating system shuts down and your background service still has pending Task objects, the service won't be stopped. This is because the ThreadPool cannot be released.

Definitions

Before showing some examples, we need to understand the meaning of the keywords:

async

Method modifier which allows the usage of await within the method's body.

public static async Task RequestDataOverHttp()
{
    await RequestDataAsync();
}

Keep in mind that declaring a method as async does not make it asynchronous. If await were removed from the code above, the method would execute synchronously despite the async modifier.
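This can be checked with a small experiment. The method below is a hypothetical example: it is declared async but never awaits, so its body runs entirely on the caller's thread and the returned Task is already completed when the call returns. The compiler reports warning CS1998 for such methods, which is silenced on purpose here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncWithoutAwaitDemo
{
#pragma warning disable CS1998 // async method without await: intentional here
    // Marked async but never awaits: the whole body runs on the caller's thread.
    public static async Task<int> GetThreadIdWithoutAwait()
    {
        Thread.Sleep(50); // simulated work, blocks the caller
        return Environment.CurrentManagedThreadId;
    }
#pragma warning restore CS1998

    public static void Main()
    {
        int callerThread = Environment.CurrentManagedThreadId;

        Task<int> task = GetThreadIdWithoutAwait();

        Console.WriteLine(task.IsCompleted);            // True
        Console.WriteLine(task.Result == callerThread); // True
    }
}
```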

await

Suspends the enclosing method until the awaited operation completes, and yields control back to the caller in the meantime.

public static async Task RequestDataOverHttp()
{
    await RequestDataAsync();

    // The code below won't be executed until RequestDataAsync completes
    Console.WriteLine("Data received");
}

This keyword relies on a state machine generated in the background to handle job completion. This is what's going to happen:

Module A calls RequestDataOverHttp
RequestDataOverHttp starts the execution of RequestDataAsync on the same thread. Here, await captures the SynchronizationContext before suspending
await yields and A continues its processing
RequestDataAsync completes and unlocks the internal state machine. If a SynchronizationContext was captured, the rest of RequestDataOverHttp is posted to it; otherwise .NET looks for an available Thread in the ThreadPool to resume RequestDataOverHttp
Console finally shows "Data received"
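The steps above can be reproduced in a console application (where there is no SynchronizationContext). In this sketch, RequestDataAsync is replaced by a Task controlled through a TaskCompletionSource, which is an assumption made to keep the ordering deterministic; the Log list is also hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitSequenceDemo
{
    public static readonly List<string> Log = new List<string>();

    // requestData stands in for the Task returned by RequestDataAsync.
    public static async Task RequestDataOverHttp(Task requestData)
    {
        Log.Add("request started");
        await requestData;        // yields to the caller if not completed yet
        Log.Add("Data received"); // resumed once requestData has completed
    }

    public static async Task Main()
    {
        var response = new TaskCompletionSource<bool>();

        Task pending = RequestDataOverHttp(response.Task); // module A calls the method
        Log.Add("caller continues");                       // A keeps processing

        response.SetResult(true);                          // the request completes
        await pending;

        Console.WriteLine(string.Join(" -> ", Log));
        // request started -> caller continues -> Data received
    }
}
```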

The most complicated aspect of this mechanism is understanding how processing can continue on the same thread when hitting an await statement. That is possible thanks to the SynchronizationContext and TaskScheduler objects.

SynchronizationContext & TaskScheduler

SynchronizationContext

It is a representation of the environment in which a job is executed. Concretely, this object contains a worker which is usually a thread but can also be a group of threads (ThreadPool), a network instance, a CPU core...

This is what allows code to be executed on another Thread. For instance, in WPF and Windows Forms, controls can only be updated from the UI Thread. By calling control.BeginInvoke from a regular thread, we're queuing a delegate to be executed on the UI Thread.

Under the hood, delegates are queued into the context with Post() or Send(). That's basically what a context does: it's a sort of work queue for a Thread.
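To illustrate the "work queue for a Thread" idea, here is a minimal sketch of a custom context. QueueSynchronizationContext is a hypothetical class written for this example (it is not how WPF or Windows Forms implement theirs): Post() queues the delegate, and the owner thread drains the queue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context: every posted delegate is queued and run by one owner thread.
public sealed class QueueSynchronizationContext : SynchronizationContext
{
    private readonly BlockingCollection<KeyValuePair<SendOrPostCallback, object>> queue =
        new BlockingCollection<KeyValuePair<SendOrPostCallback, object>>();

    public override void Post(SendOrPostCallback d, object state)
    {
        queue.Add(new KeyValuePair<SendOrPostCallback, object>(d, state));
    }

    // Runs queued work on the calling thread until Complete() is called.
    public void RunOnCurrentThread()
    {
        foreach (var item in queue.GetConsumingEnumerable())
        {
            item.Key(item.Value);
        }
    }

    public void Complete()
    {
        queue.CompleteAdding();
    }
}

public static class QueueContextDemo
{
    private static async Task<bool> WorkAsync(QueueSynchronizationContext context, int ownerThread)
    {
        await Task.Delay(50); // the continuation is posted to the captured context
        bool sameThread = Environment.CurrentManagedThreadId == ownerThread;
        context.Complete();   // nothing more to run: let the owner thread leave its loop
        return sameThread;
    }

    // Returns true if the code after the await ran on the thread that owns the context.
    public static bool ResumedOnOwnerThread()
    {
        var context = new QueueSynchronizationContext();
        SynchronizationContext previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            Task<bool> work = WorkAsync(context, Environment.CurrentManagedThreadId);
            context.RunOnCurrentThread(); // plays the role of a UI message loop
            return work.GetAwaiter().GetResult();
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }

    public static void Main()
    {
        Console.WriteLine(ResumedOnOwnerThread()); // True
    }
}
```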

TaskScheduler

We've seen that calling control.BeginInvoke will queue a delegate for the UI Thread, which means that it schedules work. This method is part of ISynchronizeInvoke, which is implemented by the Control object.

When creating a Task, the scheduling behavior depends on the situation we're in:

On Task creation, the work is first scheduled into the SynchronizationContext of the current thread, if there is one.
As not all threads have a SynchronizationContext, the TaskScheduler will otherwise schedule the work using the ThreadPool as the default choice.
If the Task has been created inside another Task, the context of the primary Task will be reused (this is configurable).

Here is a more detailed summary of situations:

Calling thread	Has SynchronizationContext?	Behavior
Console application	No	Default TaskScheduler used (ThreadPool)
Custom thread	No	Default TaskScheduler used (ThreadPool)
ThreadPool	No	All Tasks executed on ThreadPool
UI Thread	Yes	Tasks queued on UI Thread
ASP.NET Core web application	No	All Tasks executed on ThreadPool
ASP.NET (.NET Framework) web application	Yes	Each request has its own context. Tasks are scheduled within the context of the request.
Library code	Unknown	Unexpected behavior, potential deadlock

Task.ConfigureAwait(bool continueOnCapturedContext)

The default behavior of await can be overridden by calling ConfigureAwait(false):

public async Task<string> ReadStringAsync()
{
    return await httpResponse.Content.ReadAsStringAsync().ConfigureAwait(false);
}

With this call, we indicate that the rest of the method does not have to resume in the caller's context, which means that it will be scheduled on the ThreadPool.
When to do that? If the caller is the UI Thread and the method does not update UI elements, doing so is actually better in terms of performance, as the continuation does not have to wait for the UI Thread to be available. It also prevents deadlocks if the caller is doing something like ReadStringAsync().Result (see good practices below), which is also why it is a good practice to call ConfigureAwait(false) in library code.
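The effect of ConfigureAwait can be observed by counting how many continuations are posted to the caller's context. CountingSynchronizationContext is a hypothetical helper written for this sketch; it relies on the base SynchronizationContext.Post, which queues the delegate to the ThreadPool.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context that counts the continuations posted to it.
public sealed class CountingSynchronizationContext : SynchronizationContext
{
    private int postCount;

    public int PostCount
    {
        get { return Volatile.Read(ref postCount); }
    }

    public override void Post(SendOrPostCallback d, object state)
    {
        Interlocked.Increment(ref postCount);
        base.Post(d, state); // the base implementation queues to the ThreadPool
    }
}

public static class ConfigureAwaitDemo
{
    private static async Task WaitAsync(bool continueOnCapturedContext)
    {
        await Task.Delay(50).ConfigureAwait(continueOnCapturedContext);
    }

    // Returns how many times the caller's context was asked to run a continuation.
    public static int CountPosts(bool continueOnCapturedContext)
    {
        var context = new CountingSynchronizationContext();
        SynchronizationContext previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            // Blocking is safe here only because this context never needs the blocked thread.
            WaitAsync(continueOnCapturedContext).GetAwaiter().GetResult();
            return context.PostCount;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountPosts(true));  // 1
        Console.WriteLine(CountPosts(false)); // 0
    }
}
```

With ConfigureAwait(false), the context is never involved, which is why a blocked UI Thread can no longer prevent the continuation from running.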

Usage

Case 1 : I/O bound code

The application awaits an operation which returns a Task<T> inside of an async method.

Synchronous version of an I/O bound method

public string RequestVersion()
{
    // Send request
    client.Send(new GetVersionFrame());
    // Wait for the response
    return client.WaitResponse();
}

Asynchronous version of it

public async Task<string> RequestVersionAsync()
{
    // Send request
    await client.SendAsync(new GetVersionFrame());
    // Wait for the response
    return await client.WaitResponseAsync().ConfigureAwait(false);
}

Case 2 : CPU bound code

The application awaits an operation which is started on a background thread with the Task.Run method inside an async method.

Synchronous version of a CPU bound method

public List<double> ComputeCoefficients()
{
    List<double> coefficients = new List<double>();

    coefficients.Add(ComputeA());
    coefficients.Add(ComputeB());
    coefficients.Add(ComputeC());
    return coefficients;
}

Asynchronous version of it

public async Task<List<double>> ComputeCoefficientsAsync()
{
    List<double> coefficients = new List<double>();

    coefficients.Add(await Task.Run(() => ComputeA()));
    coefficients.Add(await Task.Run(() => ComputeB()));
    coefficients.Add(await Task.Run(() => ComputeC()));
    return coefficients;
}
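Note that the version above still computes the three coefficients one after the other: each Task.Run is awaited before the next one starts. If the computations are independent, a possible variant is to start them all and await them together with Task.WhenAll. This is a sketch: the ComputeA/B/C bodies below are made-up placeholders for real CPU-bound work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class CoefficientsDemo
{
    // Placeholder CPU-bound work (assumption: the real computations are independent).
    private static double SumOfSquareRoots(int count)
    {
        double sum = 0;
        for (int i = 1; i <= count; i++)
        {
            sum += Math.Sqrt(i);
        }
        return sum;
    }

    public static double ComputeA() { return SumOfSquareRoots(1000000); }
    public static double ComputeB() { return SumOfSquareRoots(2000000); }
    public static double ComputeC() { return SumOfSquareRoots(3000000); }

    public static async Task<List<double>> ComputeCoefficientsInParallelAsync()
    {
        // Queue the three computations to the ThreadPool before awaiting any of them.
        Task<double> a = Task.Run(() => ComputeA());
        Task<double> b = Task.Run(() => ComputeB());
        Task<double> c = Task.Run(() => ComputeC());

        // Results are returned in the order the tasks were passed in: A, B, C.
        double[] results = await Task.WhenAll(a, b, c);
        return new List<double>(results);
    }

    public static async Task Main()
    {
        List<double> coefficients = await ComputeCoefficientsInParallelAsync();
        Console.WriteLine(coefficients.Count); // 3
    }
}
```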

Good practices

Naming

Name asynchronous methods with Async suffix to indicate that the call won't block the caller's thread.

public async Task FooAsync()
{
    await client.DownloadAsync();
}

Async indicates that the method will offload part of the work to an underlying API (e.g. the OS networking API).

CPU-bound work

Consider using background threads via Parallel.ForEach or Task.Run for CPU-bound work rather than relying on await alone, unless you're working in a library where you shouldn't do that (see below).

Don't block in async code

1. Bad code

public void Foo()
{
    // Blocks the calling thread until the download completes
    var result = client.DownloadAsync().Result;
}

or 2. Very bad code

public void Foo()
{
    // Blocks both the calling thread and a ThreadPool thread
    var result = Task.Run(() => client.DownloadAsync().Result).Result;
}

At some point, the async method will be executed/resumed on the ThreadPool, but if there are no available threads, you'll end up with a deadlock. If example 1 is called from the UI Thread, the continuation is queued for the UI Thread, which is itself blocked on the Result call: deadlock.

As asynchronous code relies on the execution context, don't block on an asynchronous method unless you own the calling thread or it is the application's main thread. As a general rule: call sync code from sync code and async code from async code, and try not to mix them. The application's top layer has control over the context, so it can choose whether to use sync or async code.

Note: using Task.Run to delegate some tasks to the ThreadPool while keeping the UI responsive is generally okay.

No Task.Run in a library

This rule is related to the previous one. Callers should be the ones to call Task.Run because they have control over the execution context. Functionally, Task.Run will work, but it also introduces a performance cost because of an additional thread switch.
Additionally, if a library needs to support both sync and async methods, there should be no relation between them. We shouldn't call async methods from sync code, or we might run into deadlock issues.

Do not use async void

public async void FooAsync()
{
    await client.DownloadAsync();
}

As there is no Task object to be returned, exceptions cannot be caught by the caller and will be posted to the SynchronizationContext (the UI Thread for example).
Also, the caller is unable to know when the execution has finished: it's a "fire and forget" mechanism.

Instead, use

public async Task FooAsync()
{
    await client.DownloadAsync();
}
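With a Task return type, the caller can both wait for completion and catch failures. Here is a minimal sketch, where FailAsync is a hypothetical method that simulates a failing download:

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncTaskExceptionDemo
{
    // Hypothetical operation that fails after a short delay.
    public static async Task FailAsync()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("download failed");
    }

    public static async Task<string> TryCallAsync()
    {
        try
        {
            // The exception is stored in the returned Task and rethrown by await.
            await FailAsync();
            return "no error";
        }
        catch (InvalidOperationException ex)
        {
            return "caught: " + ex.Message;
        }
    }

    public static async Task Main()
    {
        Console.WriteLine(await TryCallAsync()); // caught: download failed
    }
}
```

Had FailAsync been declared async void, the try/catch in the caller would not have seen the exception.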

Friday, February 21, 2020

Purpose of asynchronous programming (.NET)

Friday, September 13, 2019

C# Asynchronous programming