Monday, October 5, 2020

SignalR Core: Create a pub/sub messaging component


I've been using the publish-subscribe pattern a lot in my applications and I really love it. Many times over the years, this simple yet very powerful messaging pattern has helped me keep my code clean and has been a key service of the systems I've designed because of the flexibility it offers.

In one of the projects I am working on though, some of the application's modules have been moved from the host to a remote client. Both are connected to the same network.

Moving a module isn't a big deal in my architecture because of the DDD approach I started taking months ago. Each module can be considered a bounded context and is independent.

However, as my messaging service was only capable of dispatching messages locally, the remote modules could no longer listen to the app's notifications.

The existing code

The interface of my messaging module is very simple:

public interface IServiceMessaging
{    
    void Subscribe<T>(object subscriber, Action<T> handler) where T : new();
    void Unsubscribe<T>(object subscriber) where T : new();
    Task PublishAsync<T>(T data) where T : new();
}

A client would simply do the following to listen to a particular message type:

messaging.Subscribe<StatusIoMessage>(this, OnStatusIoMessage);

void OnStatusIoMessage(StatusIoMessage message)
{
    Console.WriteLine($"Received : {message.Type} with {message.Symbol} = {message.Value}");
}

At the moment, the implementation of this service wraps a third-party library to do the job. The idea is to extend the communication to the remote modules.

SignalR implementation

After looking around for available solutions, SignalR seemed like a good choice to achieve our goal. Other candidates were considered, but SignalR stood out for its simplicity and for being a key component of .NET Core.

On the client side, a HubConnection object is needed to start publishing messages to the remote hub. Here is what the code looks like:

private HubConnection BuildConnection()
{    
    string url = $"https://{address}:{port}{MessagingHub.MainRoute}";
    if (hubConnection == null)
    {
        hubConnection = new HubConnectionBuilder()
            .WithUrl(url)
            .AddNewtonsoftJsonProtocol(options =>
            {
                options.PayloadSerializerSettings.TypeNameHandling = Newtonsoft.Json.TypeNameHandling.Objects;
                options.PayloadSerializerSettings.TypeNameAssemblyFormatHandling =
                    Newtonsoft.Json.TypeNameAssemblyFormatHandling.Full;
                options.PayloadSerializerSettings.Converters.Add(new StringEnumConverter());
            })
            .WithAutomaticReconnect()
            .Build();

        // Placeholder no-op handler registration.
        hubConnection.On("__unused__", () => { });

        hubConnection.Closed += HubConnection_Closed;
        hubConnection.Reconnecting += HubConnection_Reconnecting;
        hubConnection.Reconnected += HubConnection_Reconnected;
    }

    return hubConnection;
}

Address, port and main route are properties of my messaging class, read from the application's configuration file.

Note that I am using the Newtonsoft JSON serializer instead of the default Microsoft JSON protocol, because Newtonsoft offers more configuration options (type-name handling, custom converters, and so on).

BuildConnection is called when my messaging service initializes, immediately followed by a connection to the hub:

hubConnection = BuildConnection();
await hubConnection.StartAsync(token);

Publishing a message is now possible with:

await hubConnection.SendAsync(TargetNames.Publish, message);

Why TargetNames.Publish? SignalR relates client/hub calls through what it calls "target methods". A target method name is basically a key used to route the calls. When sending a message to the hub, this key is one of the hub's method names. Here, TargetNames refers to a static class where I've listed all the magic strings needed to do the binding.

public static readonly string Publish = "PublishAsync";
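
For reference, here is a minimal sketch of what the full TargetNames class could look like (the GetEventNameFor convention is explained further down; the exact contents are specific to my code):

public static class TargetNames
{
    // Hub method invoked by clients to publish a message.
    public static readonly string Publish = "PublishAsync";

    // Internal convention: the event name for a message type is "On" + type name.
    public static string GetEventNameFor<T>() => GetEventNameFor(typeof(T).Name);

    public static string GetEventNameFor(string typeName) => $"On{typeName}";
}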

Alright, now let's have a look at the hub's side. My hub runs in self-hosted mode, without IIS.

The configuration of the hub (MessageHub) is done in the Startup class:

public void ConfigureServices(IServiceCollection services)
{
    services.AddSignalR(options =>
    {
        options.EnableDetailedErrors = true;
    })
    .AddNewtonsoftJsonProtocol(options =>
    {
      options.PayloadSerializerSettings.TypeNameHandling = Newtonsoft.Json.TypeNameHandling.Objects;
      options.PayloadSerializerSettings.TypeNameAssemblyFormatHandling = Newtonsoft.Json.TypeNameAssemblyFormatHandling.Full;
      options.PayloadSerializerSettings.Converters.Add(new StringEnumConverter());
    });
}

public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    app.UseRouting();
    app.UseEndpoints(endpoints =>
    {
        endpoints.MapHub<MessagingHub>(MessagingHub.MainRoute);
    });
}

Then, when my app starts, I need to build and configure the self-hosted web server (the generic host with Kestrel):

host = Host.CreateDefaultBuilder()
    .ConfigureWebHostDefaults(webBuilder =>
    {
      webBuilder.UseStartup<Startup>()
          .UseUrls($"https://{ipAddress}:{Port}");
    })
    .ConfigureServices((context, services) => { services.AddSingleton<MessagingHub>(); })
    .Build();

and finally, here is the hub's code:

public class MessagingHub : Microsoft.AspNetCore.SignalR.Hub
{
    // Error callback used to surface publishing failures (declaration assumed;
    // the original snippet uses it without showing it).
    public static event Action<string> OnError;

    public async Task PublishAsync(object message)
    {
        if (message == null)
        {
            throw new ArgumentNullException(nameof(message));
        }

        try
        {
            await Clients.Others.SendAsync(
                TargetNames.GetEventNameFor(message.GetType().Name), message);
        }
        catch (Exception e)
        {
            OnError?.Invoke($"Failed to publish message with e={e}");
        }
    }
}

There you can see the PublishAsync method that is called by remote clients. Pay attention to the target method name used to call subscribers: in my implementation, I generate a key based on the type of message being dispatched. GetEventNameFor() only prepends "On" to the type name, but that's just an internal convention of my code. Any key would work, provided the clients use the same one when subscribing:

public void Subscribe<T>(object subscriber, Action<T> handler) where T : new()
{
    if (subscriber == null)
    {
        throw new ArgumentNullException(nameof(subscriber));
    }
    
    if (handler == null)
    {
        throw new ArgumentNullException(nameof(handler));
    }

    string eventName = TargetNames.GetEventNameFor<T>();
    hubConnection.On<T>(eventName, handler);
}

Now that everything is wired up, my service can be shared by both remote and local modules. 

public void Subscribe<T>(object subscriber, Action<T> handler) where T : new() 
{
    if (subscriber == null) 
    { 
        throw new ArgumentNullException(nameof(subscriber)); 
    }
    
    if (handler == null) 
    { 
        throw new ArgumentNullException(nameof(handler)); 
    }
    
    remoteHub.Subscribe<T>(subscriber, handler); 
    localHub.Subscribe<T>(subscriber, handler); 
}

public void Unsubscribe<T>(object subscriber) where T : new() 
{ 
    if (subscriber == null) 
    { 
        throw new ArgumentNullException(nameof(subscriber)); 
    }

    remoteHub.Unsubscribe<T>(subscriber); 
    localHub.Unsubscribe<T>(subscriber); 
}

public async Task PublishAsync<T>(T data) where T : new() 
{ 
    if (data == null) 
    { 
        throw new ArgumentNullException(nameof(data)); 
    }

    try 
    { 
        await remoteHub.PublishAsync<T>(data); 
        await localHub.PublishAsync<T>(data); 
    } 
    catch (MessagingHubException e) 
    { 
        Log?.Error($"Failed to publish {typeof(T)} with exception={e}"); 
    } 
}

I have encapsulated the local and the remote implementations in my global messaging service. The components that were using the messaging service are now connected to the remote modules as well.

In my next article I'll introduce how to add a point-to-point messaging capability to this service with SignalR. 

Thursday, May 7, 2020

Using a unit of work for transactional operations

This article shows, with a very basic example, how a unit of work can be used in an application. If you have chosen to work with the repository pattern and have concerns about transactional operations, this article might be for you.

Initial context


Say we have an e-Shop where clients can place orders. Our database contains tables for our customers, our suppliers and the orders. A first approach based on the repository pattern would look like the following.


The application controller directly accesses the repositories and updates all the tables sequentially when the client places an order. As each access to the database is isolated, it cannot perform all the updates together. This structure works but is subject to inconsistencies: the controller needs to perform the updates in the right order, and the desired relations are built one at a time. If the application crashes or if the transaction is somehow interrupted before completion, the database is left partially updated and contains fragments of information that may no longer be valid.

Unit of work to the rescue


"A unit of work keeps track of everything you do during a business transaction that affect the database"


On this new diagram, the domain logic accesses the unit of work instead of the repositories. The unit of work encapsulates and exposes the repositories required for the order. With this representation, the purpose of the unit of work becomes clearer:

The client has a single method, Complete(), to persist all pending changes.

The implementation of this method can bring the notion of transaction into the picture, which is safer in terms of data consistency.

Theoretically, your application will need several units of work to group all the data that changes together in one place. It does not mean that a repository can only be accessible from one unit of work: we can imagine here a second unit of work to manage the user accounts that would encapsulate CustomerSecuritySettingsRepository and CustomerRepository as well. Make sure you're dealing correctly with concurrent accesses in your repositories and you're good to go.
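
To illustrate, here is a minimal sketch of what the unit of work for our orders could look like. All names are hypothetical and the repository interfaces are reduced to markers:

using System;
using System.Threading.Tasks;

// Hypothetical repository abstractions for the e-Shop example.
public interface ICustomerRepository { /* Add, Find, ... */ }
public interface ISupplierRepository { /* ... */ }
public interface IOrderRepository { /* ... */ }

// The unit of work exposes the repositories involved in placing an order
// and persists every tracked change through a single Complete() call,
// typically wrapped in one database transaction.
public interface IOrderUnitOfWork : IDisposable
{
    ICustomerRepository Customers { get; }
    ISupplierRepository Suppliers { get; }
    IOrderRepository Orders { get; }

    // Commits all pending changes atomically.
    Task CompleteAsync();
}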

Here is how the unit of work reflects in most popular persistence tools:
* ITransaction in NHibernate
* DataContext in LINQ to SQL
* ObjectContext in EntityFramework

Wrapping-up


This article demonstrated a very simple setup of the unit of work pattern to make changes in your persistent storage efficiently and consistently. Martin Fowler describes it very well here:
https://martinfowler.com/eaaCatalog/unitOfWork.html


Wednesday, March 4, 2020

Continuous package delivery with Azure DevOps

Background


Since my team started to work with Azure DevOps, we've been exploring its potential progressively: Boards, Repos, Pipelines...
During the first phase we became accustomed to how each of these services works, and the entire team now feels comfortable with them.
At this point, I consider the team's efficiency to be equal to what it was before Azure, except we are using different tools now.

Stopping our exploration there would have been a shame considering the amount of manual labor that can be automated with Azure. Our development process should ideally converge toward what is commonly called a "software factory", where most of the steps from code commit to software package delivery are automated.
I am not saying CI/CD here, but the philosophy is equivalent. Continuous deployment of our built system is (at the time of writing) incompatible with the business: packages are deployed manually in production by running an installer from a USB key, there is no automatic update support, and the machines' connectivity is limited to workshop automation. Not the ideal situation for DevOps, but I believe we will get there sooner or later.

Our first step toward continuous deployment focuses on how each system component is developed in the team. The source code now resides in Azure Repos, but the integrator still checks out the code to build the entire solution at once on his computer, which means build scripts to update, versions to bump, and all binaries packed into a single installer. There is no opening for code reuse between teams: our repositories are private and the components are never archived anywhere.

As I used to work with NuGet packages in the past, all my developments are always packed and published to Azure Artifacts. These packages are automatically fetched by Visual Studio at build time and hence consumed as binaries, not code. My team really wants to do the same, but they have limited knowledge of NuGet, pipeline configuration and artifact management. Besides, the integrator complains about the versioning because he's the only one who knows how to increment the versions. I seized that opportunity to think about a solution where my team would focus only on code, without ever caring about versioning, packing and publishing of the artifacts.

Automation stages


Automation here means an Azure pipeline: the steps listed below need to be scripted in an azure-pipelines.yml file.



Versioning


It is critical to have a consistent versioning strategy across the development team, so that a major (i.e. breaking) change can easily be told apart from a minor (i.e. compatible) one by looking at the version. Besides, the version plays a key role in maintainability, as it points the developers to the right code snapshot in history when fixing whatever bug is found on a client site.

I have oriented my team toward Semantic Versioning 2.0 (SemVer 2), which is compatible with our traditional versioning strategy.
The version is made of the classic three digits plus additional metadata, as shown below.

Major.Minor.Patch-PreRelease.Counter+Build

Part       | Reason for change                                      | Nature
-----------|--------------------------------------------------------|--------------------------
Major      | Incompatible changes made to the public API            | Mandatory version number
Minor      | New features added, backward compatibility preserved   | Mandatory version number
Patch      | Bug fixes, backward compatibility preserved            | Mandatory version number
PreRelease | Pre-release alphanumeric tag to denote the version     | Optional build metadata
Counter    | Pre-release version counter                            | Optional build metadata
Build      | Build alphanumeric tag to denote the version           | Optional build metadata

This versioning pattern is already supported by the Azure Artifacts service and by NuGet (since NuGet 4.3.0 and Visual Studio 2017).

Now that we have the strategy, we have to define how to stamp this version in the binaries.

GitVersion


GitVersion is one of the tools promising automated SemVer 2.0 versioning: it generates the right version for your current code depending on the branch you are on and by looking back at the history of the repository.

GitVersion comes as a command-line tool that can be executed from any git repository. It works out of the box with the GitHubFlow and GitFlow branching strategies and generates a version without ever modifying the code. The version can then be published as environment variables or injected into AssemblyInfo.
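
For a .NET assembly, stamping boils down to the classic versioning attributes. As an illustration (values borrowed from the JSON output shown further down), an AssemblyInfo stamped by GitVersion would contain something like:

using System.Reflection;

[assembly: AssemblyVersion("2.3.0.0")]
[assembly: AssemblyFileVersion("2.3.0.0")]
[assembly: AssemblyInformationalVersion(
    "2.3.0-alpha.2+Branch.dev.Sha.b5753b8ab047485908674e7a0c956009abff5528")]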

Below is an overview of the versions generated by GitVersion with the default configuration.



On this diagram, each arrow represents a branching or merging operation. The black labels show the version that is returned by GitVersion when executed from each branch.

Here are the default metadata settings built into GitVersion.

Branch   | Versioning rule                                               | PreRelease tag
---------|----------------------------------------------------------------|---------------
master   | Last tag used as base version                                  | (none)
hotfix   | Patch incremented on creation                                  | beta
releases | Branch name used as next version                               | beta
dev      | Minor incremented when merging from master or release branch   | alpha
features | Inherited from the source branch                               | branch name

Counter: incremented and reset automatically by GitVersion based on the references of a branch; it keeps incrementing until a higher digit gets incremented.
Build: incremented on each commit since the last tag.

Note: to be used in .NET projects, the AssemblyVersion and AssemblyFileVersion attributes must be deleted from AssemblyInfo and a dependency must be added to the GitVersionTask package.

Build and unit tests


Nothing special to say about this part.
My team was already using pipelines with this setup, and nothing needs to be modified for continuous package delivery.

Pack


GitVersion publishes multiple version strings as environment variables. The output usually looks like this when printed in JSON format:

{
  "Major":2,
  "Minor":3,
  "Patch":0,
  "PreReleaseTag":"alpha.2",
  "PreReleaseTagWithDash":"-alpha.2",
  "PreReleaseLabel":"alpha",
  "PreReleaseNumber":2,
  "WeightedPreReleaseNumber":2,
  "BuildMetaData":"",
  "BuildMetaDataPadded":"",
  "FullBuildMetaData":"Branch.dev.Sha.b5753b8ab047485908674e7a0c956009abff5528",
  "MajorMinorPatch":"2.3.0",
  "SemVer":"2.3.0-alpha.2",
  "LegacySemVer":"2.3.0-alpha2",
  "LegacySemVerPadded":"2.3.0-alpha0002",
  "AssemblySemVer":"2.3.0.0",
  "AssemblySemFileVer":"2.3.0.0",
  "FullSemVer":"2.3.0-alpha.2",
  "InformationalVersion":"2.3.0-alpha.2+Branch.dev.Sha.b5753b8ab047485908674e7a0c956009abff5528",
  "BranchName":"dev",
  "Sha":"b5753b8ab047485908674e7a0c956009abff5528",
  "ShortSha":"b5753b8",
  "NuGetVersionV2":"2.3.0-alpha0002",
  "NuGetVersion":"2.3.0-alpha0002",
  "NuGetPreReleaseTagV2":"alpha0002",
  "NuGetPreReleaseTag":"alpha0002",
  "VersionSourceSha":"0f42b52188fcda73f3e407063db85695ce4ace1a",
  "CommitsSinceVersionSource":2,
  "CommitsSinceVersionSourcePadded":"0002",
  "CommitDate":"2020-02-28"
}

There is a version string especially dedicated to NuGet packages: NuGetVersion. So all there is to do here is inject that value into the packing task:

 # Package assemblies
  - task: NuGetCommand@2
    displayName: 'Packaging the artifact'
    inputs:
      command: 'pack'
      packagesToPack: '**/*.csproj;!**/*Tests.csproj'
      versioningScheme: 'byEnvVar'
      versionEnvVar: GitVersion.NuGetVersion
      includeReferencedProjects: true
      configuration: 'Release'

Publish


When a build completes, the created package resides in what Azure calls the artifact staging directory, a local folder on the build agent. This location is not accessible to others, so if the team wants to share the package within the organization, they have to publish the artifact.

In Azure, artifacts are stored in Feeds. A Feed is a repository for specific types of packages (npm, PyPI, NuGet, ...). All teams in Azure are free to create one or several Feeds depending on their needs.

Each Feed can have several Views. A View acts as an overlay of the Feed and is intended to filter its content. This concept was originally introduced to define several stages before releasing an artifact. By default, each Feed comes with three Views, @Local, @PreRelease and @Release, which respectively store development, release candidate and production artifacts. The diagram below summarizes these concepts.



By default, all packages are published into @Local. This View should only be visible to developers, to avoid interlocks during development.
When a release candidate is ready, the integrator promotes the package from @Local to @PreRelease. The package becomes visible to the testers for verification and validation.
When a package is finally validated, the integrator generates a new package and promotes it to @Release. The package becomes visible to all stakeholders within the organization.

Each Feed can define a maximum retention time for the packages it stores. When that period expires, the package is deleted. This retention policy only applies to @Local; promoted packages won't be deleted by it.

It is up to each team to configure the permission levels for each view.

After configuring our Azure Artifacts feed with the proper permission levels and retention time, we were ready to roll out the first automated package publication.
It worked as expected for the test project. One of my input requirements was that my team should focus on code only, which means they should never have to configure the pipeline for their project. As the pipeline configuration file lives in the project repository, I looked for a way of reusing existing pipeline configuration files ...

Pipeline template


Since December 2019, Azure supports templating, i.e. the reuse of pipeline config files located in external repositories. My team and I arrived just in time! :)

Below is the template that I have pushed to a 'TeamProcess' repository:

# File : base-netfull-pipeline.yml
#
# Azure pipeline configuration to build .NET Framework projects and publish
# them as NuGet artifacts into GF.MS.LAS.Machine Azure feed
parameters:
# Solution path in repository
- name: 'solution'
  default: '**/*.sln'
  type: string
# Target build platform
- name: 'buildPlatform'
  default: 'Any CPU'
  type: string
# Build configuration
- name: 'buildConfiguration'
  default: 'Release'
  type: string
# Build virtual image
- name: 'vmImage'
  default: 'windows-latest'
  type: string
# Source feed
- name: 'feed'
  default: '7ea4c5d0-fe57-441e-9fac-f026c9bb1207'
  type: string
# Packages to pack
- name: 'packagesToPack'
  default: '**/*.csproj;!**/*Tests.csproj'
  type: string
# Packages to push
- name: 'packagesToPush'
  default: '$(Build.ArtifactStagingDirectory)/**/*.nupkg;!$(Build.ArtifactStagingDirectory)/**/*.symbols.nupkg'
  type: string
# Should NuGet include referenced projects as packages and/or DLLs in the artifact?
- name: packageAddReferences
  type: boolean
  default: true


jobs:
- job: Build
  pool:
    vmImage: ${{ parameters.vmImage }}
  steps:
  # Install NuGet utility
  - task: NuGetToolInstaller@1
    displayName: 'Installing NuGet utility'

  # Generate SemVer version
  - task: DotNetCoreCLI@2
    displayName: 'Install gitversion'
    inputs:
      command: 'custom'
      custom: 'tool'
      arguments: 'install -g gitversion.tool'

  - task: DotNetCoreCLI@2
    displayName: 'Gitversion setup'
    inputs:
      command: 'custom'
      custom: 'gitversion'
      arguments: '/output buildserver'

  # Restore project dependencies
  - task: NuGetCommand@2
    displayName: 'Restoring dependencies of the package'
    inputs:
      command: 'restore'
      restoreSolution: '${{ parameters.solution }}'
      feedsToUse: 'select'
      vstsFeed: '${{ parameters.feed }}'

  # Build
  - task: VSBuild@1
    displayName: 'Building solution'
    inputs:
      solution: '${{ parameters.solution }}'
      platform: '${{ parameters.buildPlatform }}'
      configuration: '${{ parameters.buildConfiguration }}'

  # Execute unit tests
  - task: VSTest@2
    displayName: 'Executing unit tests'
    inputs:
      platform: '${{ parameters.buildPlatform }}'
      configuration: '${{ parameters.buildConfiguration }}'

  # Package assemblies
  - task: NuGetCommand@2
    displayName: 'Packaging the artifact'
    inputs:
      command: 'pack'
      packagesToPack: '${{ parameters.packagesToPack }}'
      versioningScheme: 'byEnvVar'
      versionEnvVar: GitVersion.NuGetVersion
      includeReferencedProjects: ${{ parameters.packageAddReferences }}
      configuration: '${{ parameters.buildConfiguration }}'

  # Publish assemblies
  - task: NuGetCommand@2
    displayName: 'Publishing the artifact to feed'
    inputs:
      command: 'push'
      packagesToPush: '${{ parameters.packagesToPush }}'
      nuGetFeedType: 'internal'
      publishVstsFeed: '${{ parameters.feed }}'

To reuse this template, a project will create its own azure-pipelines.yml file with the following content:

# File: azure-pipelines.yml
resources:
  repositories:
    - repository: templates
      type: git
      name: TeamProcess

# Template reference
jobs:
- template: Process/Pipelines/net/base-netfull-pipeline.yml@templates

Conclusion


With a few days invested in reading MSDN and other literature about the setup of Azure, I managed to create a fully automated NuGet package continuous delivery flow. Automatic versioning of the code is something I had never thought of in the past, but it is a real game changer: without it, creating this delivery flow would have been trickier, with additional scripting and/or code commits prior to the build. The only difficulty I faced was related to the GitVersion add-on(s) in Azure; many of them co-exist in the marketplace and it's really confusing. My recommendation is to use the .NET CLI instead, which is a robust workaround to the add-ons.


Friday, February 21, 2020

Purpose of asynchronous programming (.NET)

Since my first article on the task-based asynchronous pattern (TAP) with C# .NET, I have successfully implemented several communication libraries with the async/await syntactic sugar.
I will soon need to train my teammates on this pattern and have been working on a series of short articles to get them started.
The purpose of this first article is to introduce the basic concepts of asynchronous programming in C# with lots of visual content, which is probably what is missing the most from the articles found online. My intention here is to simplify the theory and focus only on the essence of asynchronous programming.

The theory


Asynchronous programming is meant to capture parts of your code into schedulable blocks called Tasks. These blocks are executed in the background by a set of shared resources called the ThreadPool.



Let's take a look at how a classic single-threaded program executes on a system.



A program (or code) is made of functions calling other functions. In a single-threaded application, this code is executed sequentially by the same Thread.

Now if we want to leverage TAP with the same code, here is what it would look like.



The functions of my code are now captured in Task objects that are scheduled for execution. This time, it is not necessarily the same Thread that executes each block; it depends on thread availability in the ThreadPool. However, the code is still executed in the exact same order.

So why is it different?


If f(x) executes a blocking operation (e.g. an I/O read), the Thread remains blocked until the operation completes. That thread is unavailable for other operations during that time.
If T executes a blocking operation, the execution of the Task is suspended until the blocking operation completes. During that time, the Thread is released and free to execute other Tasks if necessary.

So functionally, both codes are equivalent, but in terms of system resource consumption they do not behave identically.
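
Here is a minimal sketch of the difference, using a file read as the blocking operation:

using System.IO;
using System.Threading.Tasks;

class ReadExample
{
    // Synchronous version: the calling thread is blocked for the whole
    // duration of the I/O read and cannot do anything else.
    public static string ReadSync(string path) => File.ReadAllText(path);

    // Asynchronous version: the thread is released while the I/O is pending;
    // execution resumes (possibly on another thread) when the read completes.
    public static async Task<string> ReadAsync(string path) =>
        await File.ReadAllTextAsync(path);
}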

How is this possible?


A Task is a stateful object (see TaskStatus). Every async method is compiled into a state machine, and whenever the code hits an await statement, that state machine drives the execution of the Task.
When the code awaits a blocking asynchronous sub-function, my Task enters the WaitingForChildrenToComplete state and is put aside. The system can detect when the I/O completes and resumes the execution where it left off by reloading the execution context.
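
A tiny illustration of this stateful nature (a sketch, with a hypothetical "data.txt" file):

using System;
using System.IO;
using System.Threading.Tasks;

class StatusExample
{
    static async Task ShowStatusAsync()
    {
        Task<string> read = File.ReadAllTextAsync("data.txt");
        // The Task is a stateful object: Status reflects where the state
        // machine currently is (e.g. WaitingForActivation while pending).
        Console.WriteLine(read.Status);
        string content = await read;
        Console.WriteLine(read.Status); // RanToCompletion
    }
}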

Pros and Cons 



  • Code that is executed synchronously performs better than its asynchronous version. As previously explained, executing asynchronous code requires the creation of a state machine and depends on thread availability.
  • Using TAP makes my system more scalable than its synchronous version. The resources of my system are only used when necessary, which allows it to support a higher workload.


Asynchronous vs Parallel


A common mistake is to mix up these two concepts. The purpose of asynchronous programming is not to offer a simplified framework for parallel processing. Most of the time you should not even use Task.Run or Task.Factory.StartNew, and I believe that's what creates the confusion. TAP is not a multitasking framework; it is a "promise of execution" framework.

With that said, TAP provides a few interesting methods if you want to parallelize the execution of your Task objects: Task.WhenAll and Task.WhenAny.
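
For example, a sketch of two downloads awaited together with Task.WhenAll (the URLs are placeholders):

using System.Net.Http;
using System.Threading.Tasks;

class WhenAllExample
{
    static readonly HttpClient client = new HttpClient();

    // Both requests are started first, so they run concurrently;
    // WhenAll produces a task that completes when both are done.
    static async Task<string[]> DownloadBothAsync()
    {
        Task<string> first = client.GetStringAsync("https://example.com/a");
        Task<string> second = client.GetStringAsync("https://example.com/b");
        return await Task.WhenAll(first, second);
    }
}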


 