Showing posts with label uncategorized. Show all posts

Wednesday, March 4, 2020

Continuous package delivery with Azure DevOps

Background


Since my team started to work with Azure DevOps, we've been exploring its potential progressively: Boards, Repos, Pipelines ...
During the first phase we became accustomed to how each of these individual services works, and the entire team now feels comfortable with them.
At this point, I consider the team's efficiency to be equal to what it was before using Azure, except that we are using different tools now.

Stopping our exploration there would have been a pity considering the amount of manual labor that can be automated with Azure. Our development process should ideally converge to what is commonly called a "software factory", where most of the steps from code commits to software package delivery are automated.
I am not saying CI/CD here, but the philosophy is equivalent. Continuous deployment of our built system is (at the time of writing) incompatible with the business: the packages are deployed manually in production by running an installer from a USB key, there is no automatic update support, and the machines' connectivity is limited to workshop automation. Not the ideal situation for DevOps, but I believe we will get there sooner or later.

Our first step toward continuous deployment focuses on how each system component is developed within the team. The source code now resides in Azure Repos, but the integrator still checks out the code and builds the entire solution at once on his computer, which means build scripts to update, versions to bump and all binaries packed into a single installer. There is no opening for code reuse between teams: our repositories are private and the components are never archived anywhere.

As I have worked with NuGet packages in the past, all my developments are packed and published to Azure Artifacts. These packages are automatically fetched by Visual Studio at build time and hence consumed as binaries, not code. My team really wants to do the same, but they have limited knowledge of NuGet, pipeline configuration and artifact management. Besides, the integrator complains about versioning because he is the only one who knows how to increment the versions. I seized that opportunity to think about a solution where my team would focus only on code, without ever caring about versioning, packing and publishing the artifact.

Automation stages


Automation here means an Azure pipeline: the steps listed below need to be scripted in an azure-pipelines.yml file.



Versioning


It is critical to have a consistent versioning strategy across the development team, so that a major (i.e. breaking) change can easily be distinguished from a minor (i.e. compatible) one just by looking at the version. Besides, the version plays a key role in maintainability, as it shall point the developers to the right code snapshot in history to fix whatever bug is found on the client site.

I have oriented my team to Semantic Versioning 2.0 (SemVer 2) which is compatible with our traditional versioning strategy.
The version consists of the classic three numbers plus optional pre-release and build metadata, as shown below.

Major.Minor.Patch-PreRelease.Counter+Build

Part | Reason for change | Nature
Major | Incompatible changes made to public API | Mandatory version number
Minor | New features added but backward compatibility preserved | Mandatory version number
Patch | Bug fixes with backward compatibility preserved | Mandatory version number
PreRelease | Pre-release alphanumeric tag to denote the version | Optional pre-release metadata
Counter | Pre-release version counter | Optional pre-release metadata
Build | Build alphanumeric tag to denote the version | Optional build metadata

This versioning pattern is already supported by the Azure Artifacts service and by NuGet (since NuGet 4.3.0 and Visual Studio 2017).
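To make the pattern concrete, here is a minimal Python sketch that splits a version string into the parts listed above. The regular expression is a simplified reading of SemVer 2.0 (it does not enforce every rule of the specification), and the names parse_version and is_breaking_upgrade are mine, chosen for illustration only.

```python
import re
from typing import NamedTuple, Optional

# Simplified SemVer 2.0 pattern: Major.Minor.Patch[-PreRelease][+Build]
SEMVER_RE = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(?:-(?P<prerelease>[0-9A-Za-z.-]+))?"
    r"(?:\+(?P<build>[0-9A-Za-z.-]+))?$"
)


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    prerelease: Optional[str]  # e.g. "alpha.2" (tag and counter)
    build: Optional[str]       # e.g. "Branch.dev.Sha.b5753b8"


def parse_version(text: str) -> Version:
    """Split a version string into its SemVer parts."""
    match = SEMVER_RE.match(text)
    if match is None:
        raise ValueError(f"not a SemVer 2.0 version: {text!r}")
    return Version(
        int(match["major"]),
        int(match["minor"]),
        int(match["patch"]),
        match["prerelease"],
        match["build"],
    )


def is_breaking_upgrade(old: str, new: str) -> bool:
    """True when the Major number increased, i.e. the public API may have broken."""
    return parse_version(new).major > parse_version(old).major
```

With this, a consumer of a package can tell at a glance that going from 1.4.2 to 2.0.0 deserves a careful review, while 1.4.2 to 1.5.0 should be a safe upgrade.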

Now that we have the strategy, we have to define how to stamp this version in the binaries.

GitVersion


GitVersion is one of the tools that promises to be an automated SemVer 2.0 versioning system which will generate the right version for your current code depending on the branch you are on and by looking back at the history of the code.

GitVersion comes as a command line tool that can be executed from any git repository. It works out-of-the-box with GitHubFlow and GitFlow branching strategies and will generate a version without ever modifying the code. The version can then be published as an environment variable or injected into AssemblyInfo.

Below is an overview of the versions generated by GitVersion with the default configuration.



On this diagram, each arrow represents a branching or merging operation. The black labels show the version that is returned by GitVersion when executed from each branch.

Here are the default metadata settings built into GitVersion.

Branch | Versioning rule | PreRelease
master | Last tag used as base version | (none)
hotfix | Patch incremented on creation | beta
releases | Branch name used as next version | beta
dev | Minor incremented when merging from master or release branch | alpha
features | | branch name

Counter (all branches): Incremented and reset automatically by GitVersion based on the references of a branch. Will keep incrementing until a higher digit gets incremented.
Build (all branches): Incremented on each commit since last tag

Note: To be used in .NET projects, the attributes AssemblyVersion and AssemblyFileVersion shall be deleted from AssemblyInfo and a dependency on the GitVersionTask package shall be added.

Build and unit tests


Nothing special to be said on that part.
My team was already using pipelines with that setup and nothing needs to be modified for continuous package delivery.

Pack


GitVersion publishes multiple version strings as environment variables. The output usually looks like this when printed in JSON format:

{
  "Major":2,
  "Minor":3,
  "Patch":0,
  "PreReleaseTag":"alpha.2",
  "PreReleaseTagWithDash":"-alpha.2",
  "PreReleaseLabel":"alpha",
  "PreReleaseNumber":2,
  "WeightedPreReleaseNumber":2,
  "BuildMetaData":"",
  "BuildMetaDataPadded":"",
  "FullBuildMetaData":"Branch.dev.Sha.b5753b8ab047485908674e7a0c956009abff5528",
  "MajorMinorPatch":"2.3.0",
  "SemVer":"2.3.0-alpha.2",
  "LegacySemVer":"2.3.0-alpha2",
  "LegacySemVerPadded":"2.3.0-alpha0002",
  "AssemblySemVer":"2.3.0.0",
  "AssemblySemFileVer":"2.3.0.0",
  "FullSemVer":"2.3.0-alpha.2",
  "InformationalVersion":"2.3.0-alpha.2+Branch.dev.Sha.b5753b8ab047485908674e7a0c956009abff5528",
  "BranchName":"dev",
  "Sha":"b5753b8ab047485908674e7a0c956009abff5528",
  "ShortSha":"b5753b8",
  "NuGetVersionV2":"2.3.0-alpha0002",
  "NuGetVersion":"2.3.0-alpha0002",
  "NuGetPreReleaseTagV2":"alpha0002",
  "NuGetPreReleaseTag":"alpha0002",
  "VersionSourceSha":"0f42b52188fcda73f3e407063db85695ce4ace1a",
  "CommitsSinceVersionSource":2,
  "CommitsSinceVersionSourcePadded":"0002",
  "CommitDate":"2020-02-28"
}

There is a version string especially dedicated to NuGet packages : NuGetVersion. So all there is to do here is to inject that value into the packing task :

  # Package assemblies
  - task: NuGetCommand@2
    displayName: 'Packaging the artifact'
    inputs:
      command: 'pack'
      packagesToPack: '**/*.csproj;!**/*Tests.csproj'
      versioningScheme: 'byEnvVar'
      versionEnvVar: GitVersion.NuGetVersion
      includeReferencedProjects: true
      configuration: 'Release'
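For the curious, the relation between the SemVer fields and the NuGetVersion string in the sample output above can be reproduced with a few lines of Python. This is only a sketch that mirrors the sample (label followed by the counter padded to four digits); it is not GitVersion's actual implementation, and the function name legacy_nuget_version is hypothetical.

```python
import json

# Subset of the GitVersion output shown earlier in this post.
SAMPLE = """
{
  "MajorMinorPatch": "2.3.0",
  "PreReleaseLabel": "alpha",
  "PreReleaseNumber": 2,
  "NuGetVersion": "2.3.0-alpha0002"
}
"""


def legacy_nuget_version(gitversion_json: str) -> str:
    """Rebuild a NuGetVersion-like string from the individual GitVersion fields."""
    fields = json.loads(gitversion_json)
    version = fields["MajorMinorPatch"]
    label = fields["PreReleaseLabel"]
    if not label:
        # Stable release: no pre-release suffix at all.
        return version
    # Pre-release: label immediately followed by the zero-padded counter.
    return f"{version}-{label}{int(fields['PreReleaseNumber']):04d}"


print(legacy_nuget_version(SAMPLE))  # 2.3.0-alpha0002
```

The padding is what keeps alpha0002 sorted before alpha0010 for clients that compare pre-release tags as plain strings.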

Publish


When a build completes, the created package resides in what Azure calls the artifact staging directory ($(Build.ArtifactStagingDirectory)), a folder on the build agent. This location is not accessible outside the build, so if the team wants to share the package within the organization, the package has to be published as an artifact.

In Azure, the artifacts are stored in Feeds. A Feed is a repository for specific types of packages (npm, pypi, NuGet,…). All teams in Azure are free to create one or several Feeds depending on their needs.

Each Feed can have several Views. A View acts as an overlay of the Feed and is intended to filter its content. This concept was originally introduced to define several stages before releasing an artifact. By default, each Feed comes with 3 Views: @Local, @PreRelease and @Release, which respectively store development, release candidate and production artifacts. The diagram below summarizes these concepts.



By default, all packages are published into @Local. This View shall only be visible to developers to avoid interlocks during development.
When a release candidate is ready, the integrator shall promote the package from @Local to @PreRelease. The package becomes visible to the testers for verification and validation.
When a package is finally validated, the integrator generates a new package and promotes it to @Release. The package becomes visible to all stakeholders within the organization.

Each Feed can define a maximum retention time for the package it stores. When the delay expires, the package is deleted. This retention delay is only applied to @Local and promoted packages won't be deleted by the defined policy.
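The promotion and retention rules can be pictured with a toy model in Python. To be clear, this is not the Azure Artifacts API: the Package class and the promote and apply_retention functions are made up for this post, and they only restate the rule described above (packages that never left @Local expire, promoted ones are kept).

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Package:
    name: str
    version: str
    published: date
    # Every package starts in the @Local view only.
    views: set = field(default_factory=lambda: {"@Local"})


def promote(package: Package, view: str) -> None:
    """Make the package visible in an additional view (e.g. @PreRelease)."""
    package.views.add(view)


def apply_retention(packages, today: date, max_age_days: int):
    """Keep promoted packages, and @Local-only packages younger than the limit."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        p for p in packages
        if p.views != {"@Local"} or p.published >= cutoff
    ]
```

For example, with a 30-day limit, a two-month-old alpha package that was never promoted disappears, while a two-month-old package promoted to @Release stays in the Feed.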

It is up to each team to configure the permission levels for each view.

After configuring our Azure Artifacts feed with the proper permission levels and retention time, we were ready to roll out the first automated package publication.
It worked as expected for the test project. One of my input requirements was that my team needs to focus on code only which means that they should never configure the pipeline for their project. As the pipeline configuration file stands in the project repository, I looked for a way of reusing existing pipeline configuration files ...

Pipeline template


Since December 2019, Azure supports templating with the reuse of pipeline config files located in external repositories. My team and I arrived just in time ! :)

Below is the template that I have pushed to a 'TeamProcess' repository:

# File : base-netfull-pipeline.yml
#
# Azure pipeline configuration to build .NET Framework projects and publish
# them as NuGet artifacts into GF.MS.LAS.Machine Azure feed
parameters:
# Solution path in repository
- name: 'solution'
  default: '**/*.sln'
  type: string
# Target build platform
- name: 'buildPlatform'
  default: 'Any CPU'
  type: string
# Build configuration
- name: 'buildConfiguration'
  default: 'Release'
  type: string
# Build virtual image
- name: 'vmImage'
  default: 'windows-latest'
  type: string
# Source feed
- name: 'feed'
  default: '7ea4c5d0-fe57-441e-9fac-f026c9bb1207'
  type: string
# Packages to pack
- name: 'packagesToPack'
  default: '**/*.csproj;!**/*Tests.csproj'
  type: string
# Packages to push
- name: 'packagesToPush'
  default: '$(Build.ArtifactStagingDirectory)/**/*.nupkg;!$(Build.ArtifactStagingDirectory)/**/*.symbols.nupkg'
  type: string
# Whether NuGet shall include referenced projects as package dependencies and/or DLLs in the artifact
- name: packageAddReferences
  type: boolean
  default: true


jobs:
- job: Build
  pool:
    vmImage: ${{ parameters.vmImage }}
  steps:
  # Install NuGet utility
  - task: NuGetToolInstaller@1
    displayName: 'Installing NuGet utility'

  # Generate SemVer version
  - task: DotNetCoreCLI@2
    displayName: 'Install gitversion'
    inputs:
      command: 'custom'
      custom: 'tool'
      arguments: 'install -g gitversion.tool'

  - task: DotNetCoreCLI@2
    displayName: 'Gitversion setup'
    inputs:
      command: 'custom'
      custom: 'gitversion'
      arguments: '/output buildserver'

  # Restore project dependencies
  - task: NuGetCommand@2
    displayName: 'Restoring dependencies of the package'
    inputs:
      command: 'restore'
      restoreSolution: '${{ parameters.solution }}'
      feedsToUse: 'select'
      vstsFeed: '${{ parameters.feed }}'

  # Build
  - task: VSBuild@1
    displayName: 'Building solution'
    inputs:
      solution: '${{ parameters.solution }}'
      platform: '${{ parameters.buildPlatform }}'
      configuration: '${{ parameters.buildConfiguration }}'

  # Execute unit tests
  - task: VSTest@2
    displayName: 'Executing unit tests'
    inputs:
      platform: '${{ parameters.buildPlatform }}'
      configuration: '${{ parameters.buildConfiguration }}'

  # Package assemblies
  - task: NuGetCommand@2
    displayName: 'Packaging the artifact'
    inputs:
      command: 'pack'
      packagesToPack: '${{ parameters.packagesToPack }}'
      versioningScheme: 'byEnvVar'
      versionEnvVar: GitVersion.NuGetVersion
      includeReferencedProjects: ${{ parameters.packageAddReferences }}
      configuration: '${{ parameters.buildConfiguration }}'

  # Publish assemblies
  - task: NuGetCommand@2
    displayName: 'Publishing the artifact to feed'
    inputs:
      command: 'push'
      packagesToPush: '${{ parameters.packagesToPush }}'
      nuGetFeedType: 'internal'
      publishVstsFeed: '${{ parameters.feed }}'

To reuse this template, a project will create its own azure-pipelines.yml file with the following content:

# File: azure-pipelines.yml
resources:
  repositories:
    - repository: templates
      type: git
      name: TeamProcess

# Template reference
jobs:
- template: Process/Pipelines/net/base-netfull-pipeline.yml@templates

Conclusion


With a few days invested in reading MSDN and other literature about setting up Azure, I managed to create a fully automated NuGet package continuous delivery flow. Automatic versioning of the code is something I never thought of in the past, but it is a real game changer. Without it, creating this delivery flow would have been trickier, with additional scripting and/or code commits prior to build. The only difficulty I faced was related to the GitVersion add-on(s) in Azure. Many of them coexist in the marketplace and it's really confusing. My recommendation is to use the DotNetCoreCLI task instead, which is a robust workaround to the add-on.


Wednesday, September 4, 2019

Leading a software project from A to Z

Although building a project from ground up is a very challenging task that requires solid nerves, patience, attention to details, good communication skills and mostly motivation, it is also the most interesting work that you will do in your career. You will live the full adventure, not just a part of it where you're usually asked to do something with little to no freedom at all. This time, you are setting up the rules, you are picking the tools and the technologies and you are designing YOUR solution.

Here are some rules that you better follow if you don't want to turn your project into a nightmare.

1. Do a pre-study


The initial requirements are usually very general, giving an overview of what we want but not how to make it. Even if I tell you to develop the exact same watch as the Fitbit Ionic, you can't just rush into the development. You will have to understand the technology, analyze the materials, define how the product shall interact with users, computers, etc.

Build a solid pre-study document where every aspect of the product is described, possibly listing different potential solutions with their pros and cons. This will naturally lead you to your next step: the product architecture specification.

2. Take all the time you need to write good specifications


Spoiler alert: if you were to develop your product alone, this step is where you would spend one third of your time.

Business managers will always push you to provide the product asap, never caring about QUALITY. To be honest, they do care but differently:

  • For developers, quality is all about architecture, code metrics, tests
  • For business managers, quality is about functional aspects. As long as it works, the quality level is high
  • For QA, quality means following the process. Are all the documents ready and signed ? If yes, quality is alright

What about clients ?

Clients will always want the best product for their money. Their quality perception is based on usability, performance, materials, robustness and ... support services!
If a company cannot sustain its own products, it will dig its own grave because of its high operational cost.

How to prevent this ?

It all starts with specifications. Developers complain about documents that we write but never read, and there is a little bit of truth in that, but the exercise is worth it because it will highlight many aspects that you would otherwise have ignored in your design (error cases, serviceability, deployability, update strategy, ...).

Where do I start ?

My advice is that you first create diagrams to visualize the features. If your project is not 100% software, then create an SADT diagram with interactions between functions (or go for SysML if you have time). Doing this will help you distribute the functions between hardware and software.



Now, on software side, think about a design concept and try to put it in a document where you'll elaborate the following points:

* Use case view: Think about how the users will access/use your product and put it in a simple UML Use Case diagram



* Development view: define the modules that will compose your software and use a diagram to show how they'll interact with simple arrows. For instance, this is where I usually put my layered architecture overview



* Logical view: Use this part to define your objects/interfaces. Do not go into details ! At this stage, we just want to understand the intent of each module. I usually insert class diagrams with my interfaces in here.



* Process view: Up to here, the reader only had a static overview of your design. Now you tell him how the modules that you described will behave at runtime (processes, threads, message queues, timers, ...). If the product has performance requirements, explain here how they'll be met.

* Components/package view: A package is a deployable unit. How will you organize your modules/components into these packages ?


Once done, you can move forward with the Software Requirements Specifications (usually called SRS). I strongly recommend you to google 'IEEE SRS' and to get inspired by the IEEE template. It gives very good indications on how your specifications should be written.

In a nutshell:
  • Each requirement shall be testable: do not use vague words like 'hot', 'cold', 'fast enough' ... Give concrete values that a tester can validate.
  • Each requirement shall be uniquely identified: use IDs for all of your requirements. They will be used for traceability in the entire V cycle.
  • Split the specifications following the component structure that you defined in the design concept. For each component, define the inputs, the processing and the outputs
  • For views, insert mockups and specify the actions of each control
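The first two rules are easy to check automatically. Below is a small Python sketch of such a check; the word list and the function name lint_requirements are my own assumptions for the example, not something taken from the IEEE template.

```python
import re
from collections import Counter

# Words that usually make a requirement impossible to test (illustrative list).
VAGUE_WORDS = {"hot", "cold", "fast", "slow", "enough", "quickly"}


def lint_requirements(requirements):
    """Check a list of (identifier, text) pairs.

    Returns a list of human-readable problems: duplicated identifiers
    first, then vague words found in each requirement text.
    """
    problems = []
    counts = Counter(req_id for req_id, _ in requirements)
    for req_id, count in counts.items():
        if count > 1:
            problems.append(f"{req_id}: identifier used {count} times")
    for req_id, text in requirements:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for vague in sorted(words & VAGUE_WORDS):
            problems.append(f"{req_id}: vague word '{vague}'")
    return problems
```

Running it on a specification draft before a review catches the obvious issues, so the reviewers can concentrate on the content.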

3. Pick up your gears


Some technology choices might already be defined at this stage (they influence your design concept). If not, select the frameworks, development environments, source code manager, issue tracker and all other tools that will be necessary to develop/build/deploy your project.
This information usually goes into a 'quality plan' document. This is also where we usually define the branching model, coding rules, planning overview... DO THIS. Especially if you're a team of developers. You'll notice inconsistencies in code, in source code manager and/or in issue tracker if you don't specify the rules somewhere.

4. Organize the work


The components you defined have natural dependencies on each other. Focus first on the ones that have the fewest dependencies.
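One way to turn this advice into an actual work order is a topological sort of the dependency graph: a component is only scheduled once everything it depends on is done. Here is a minimal sketch using Python's standard graphlib module (Python 3.9+); the component names are invented for the example.

```python
from graphlib import TopologicalSorter


def build_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return the components ordered so that dependencies always come first.

    `dependencies` maps each component to the set of components it relies on.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical layered architecture.
components = {
    "ui": {"services"},
    "services": {"drivers", "logging"},
    "drivers": {"logging"},
    "logging": set(),
}

print(build_order(components))
```

If two components depend on each other, graphlib raises a CycleError, which is a useful early warning that the design needs another look.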

Tasks
For each component, create a story/task in your issue tracking tool. This task can be broken down into smaller tasks by the developer who picks it up. Evaluate the difficulty of development for each task and rate it properly, either by duration or with story points (recommended).
If you're lucky to work with tools like Jira, create epics for the features that you'd like to implement and organize your tasks. At the very beginning, you can create an epic for the first prototype.

Milestones
Your business manager expects only one delivery but creating intermediate releases along the way is very helpful for demos, regression investigation, test of deployments/updates...
Define the milestones (v1.0.0, v1.1.0, v1.2.0...) and detail what's going to be included in each of them. Be realistic with delivery dates and do not try to anticipate too much. When managing a project, people tend to make promises and set unrealistic release dates. In the end, almost all the projects I've seen have been delayed, and the funny thing is that it wasn't even due to the development. Clients can make new requests or change existing ones, the business can down-prioritize your project, one of the technologies used can lose support, a key developer can resign ... Unexpected things will happen for sure, so be prepared.

Ideally, adopt agile methodology and work in sprints.

5. Plan -> Execute -> Check -> Act -> Plan ...


Note: If you're working agile, you're already covering this.

Always re-assess your software requirements depending on business needs. Ideally, bring the client into the loop and ensure that he is aligned with your plan and satisfied with your latest features. If not, plan for the changes and execute.

Also, even if it seems obvious to me, unit tests are not optional. I really encourage you to track your metrics during this phase and to make the effort to keep them in the green zone.

If you're coordinating other developers, make sure they always understand what they should do. If necessary, take the time to explain your design to them over and over again. Don't be afraid of repeating things; it can be annoying, but you have to be as tolerant as possible with your manpower.

6. Release often


If your milestone is supposed to end in 3 months, do not wait 3 months to release.
Releasing means bringing all the features together, versioning and deploying. Do it at least at the end of each sprint (if you're not working agile, plan weekly builds). It will reduce the integration penalty and, ideally, will allow you to anticipate on functional/user acceptance tests.

For the versioning, we usually use three numbers:
  • Major: Only incremented when major changes are made to the code with an impact for the end-user.
  • Minor: Only incremented when minor changes are made to the code without really changing the features but rather extending them slightly. "Look and feel" changes are also usually considered to be minor changes if they do not impact the UX (e.g. usability) heavily.
  • Micro/Build: Incremented for code rework, minor bug fixes, improvement of test coverage, ...

It's not a rule though, you're free to use your own versioning as long as it is consistent over time and that it helps you to identify your software easily.
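As an illustration, here is how incrementing such a version could look in Python. Resetting the lower numbers to zero when a higher one is incremented is a common convention that I am assuming here, and the function name bump is hypothetical.

```python
def bump(version: str, part: str) -> str:
    """Increment one part of a 'major.minor.micro' version string."""
    major, minor, micro = (int(number) for number in version.split("."))
    if part == "major":
        # Change with an impact for the end user: reset minor and micro.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Slight feature extension: reset micro only.
        return f"{major}.{minor + 1}.0"
    if part == "micro":
        # Rework, minor bug fix, test coverage improvement.
        return f"{major}.{minor}.{micro + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

Whatever scheme you pick, putting it in a small function like this (or in your build script) is a cheap way to keep it consistent over time.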


7. Validation


Validation is the most neglected aspect, yet it makes all the difference when executed properly; in my eyes it is a must-do. Ideally, think about validation when writing functional requirements, because each requirement will need to be testable. If a requirement is too vague to be tested, it can't be implemented; it's as simple as that. As a V&V engineer, I should be able to write my test plan based on the functional requirements without any doubt about what needs to be tested.

To illustrate this, here are some examples:

Req 1.1: As a driver, when I press the brake pedal, my car should stop.

NOT TESTABLE: I see at least 2 reasons :
  • this requirement is not time-bound: if my test proves that the car stops after 30 minutes, the system is valid. 
  • 'should' to be replaced by 'shall'. We want no doubt about what the system shall do.

Req 1.1: As a driver, when I press the brake pedal, my car shall stop within x seconds, with x given by the following relation: x = (speed/10)²

TESTABLE
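Because the requirement now contains a concrete bound, it translates directly into a check that a tester could automate. A minimal Python sketch follows; the function names are mine, and the unit of speed is whatever the specification defines (the requirement above does not state it).

```python
def max_stopping_time(speed: float) -> float:
    """Upper bound in seconds allowed by Req 1.1: x = (speed/10)²."""
    return (speed / 10) ** 2


def meets_req_1_1(speed: float, measured_stop_time: float) -> bool:
    """True when the measured stopping time is within the required bound."""
    return measured_stop_time <= max_stopping_time(speed)
```

With the first wording of the requirement, no such function could be written: there is simply no value to compare the measurement against.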

8. Conclusion


This article gives you an overview of what you'll have to do if you drive a project, and lots of other aspects have not been detailed, such as reporting, interfacing with other stakeholders, CI/CD ... It can be scary and you'll make mistakes (all of us do), but as long as you try to set up a framework for you and your team and keep executing as defined, you'll be right enough. As a developer, I like to have consistency in my code. Well, as a project leader, I like to have consistency in my project & team.



Thursday, October 4, 2018

Create NuGet packages for Managed and Native libraries

Managed libraries


Nuspec file


Before jumping into nuspec file creation, you should know that NuGet can generate a package directly from a Visual Studio project:

nuget pack MyPackage.csproj

If you're willing to enter advanced mode, then you need a nuspec file.

First of all, instead of reinventing the wheel, check the official documentation here: https://docs.microsoft.com/en-us/nuget/create-packages/creating-a-package

Now, here is a shortcut :
    1. Create a .nuspec file and fill it as follows:

    <?xml version="1.0"?>
    <package >
    <metadata>
        <id>Mypackage</id> <!-- Must be unique to avoid confusion during package resolutions -->
        <version>1.0.0</version> <!-- major.minor.micro[.optional] -->
        <title>My first package</title> <!-- Title of the library as it will appear in nuget manager -->
        <authors>Deadpool</authors> <!-- Author of the component -->
        <owners>Marvel</owners> <!-- Legal owner of the component -->
        <description>
        This is a short description of the component and describes what it is intended for.
        </description>
        <releaseNotes>
        [Optional] If this version has any particularity, it should be listed here
        </releaseNotes>
        <summary>
        Same as description but shorter
        </summary>
        <language>en-US</language>
        <projectUrl>http://mygitserver.com/mypackage</projectUrl>
        <requireLicenseAcceptance>false</requireLicenseAcceptance>
        <licenseUrl>http://opensource.org/licenses/Apache-2.0</licenseUrl> <!-- Or any other -->
        <copyright>Copyright if any</copyright>
        <dependencies> <!-- List any dependency that should be checked out with this package -->
            <group targetFramework="net35"> <!-- check all target frameworks here : https://docs.microsoft.com/en-us/nuget/reference/target-frameworks#supported-frameworks -->
                <dependency id="MyDependency" version="2.0.1" /> <!-- Here we specify to use MyDependency v2.0.1 built for .NET3.5 -->
                <!-- Others listed here -->
            </group>
        </dependencies> 
        <references></references> 
        <tags></tags> <!-- [Optional] any tags that would help your team finding your package here -->
    </metadata>
    <files>
        <!-- This will copy all the files from release build output to a lib subfolder in the package -->
        <!-- All references placed under lib will be automatically added to your project -->
        <!-- lib subfolder is mandatory. Create a folder for each supported frameworks -->
        <file src="bin\Release\**" target="lib\net35" exclude="**.pdb" /> 
        <!-- [Optional] Add documentation to the library -->
        <file src="doc\**" target="doc" />
    </files>
    </package>

Once this is done, creating the package is straightforward:

nuget pack Mypackage.nuspec

Notes that might help :

  •  Be careful with dependencies. If your package A depends on a package B 1.0.0.3258 with the latest digit being a build number, ensure that the exact same version of package B is packaged and stored in your nuget server. The build number is re-generated at each build and if it changes, the dependency resolution will fail. Ideally, remove this last digit from version reference.
  •  If your managed library relies on a native DLL that is platform-specific, you'll need to add the DLLs to your package as follows:

<?xml version="1.0"?>
<package >
  <metadata>
    <id>Phidgets</id>
    <version>2.1.8</version>
    <title>Phidgets</title>
    <authors>Me</authors>
    <owners>Me</owners>
    <description>
      Phidget interfacing library
    </description>
    <releaseNotes></releaseNotes>
    <summary>
      Phidget interfacing library for .NET
    </summary>
    <language>en-US</language>
    <projectUrl>https://mygitserver.com/Phidgets</projectUrl>
    <requireLicenseAcceptance>false</requireLicenseAcceptance>
    <licenseUrl>http://opensource.org/licenses/Apache-2.0</licenseUrl>
    <copyright></copyright>
    <dependencies>
    </dependencies>
    <references></references>
    <tags></tags>
  </metadata>
  <files>
        <file src="lib\net20\**" target="lib\net20" exclude="**.pdb;" />
        <!-- Platform specific libraries are usually placed under a runtime subfolder in the package. -->
        <!-- The second level specifies the platform with an RID identifier (see here : https://docs.microsoft.com/en-us/dotnet/core/rid-catalog) -->
        <!-- Those libraries won't be copied to your output directory automatically. For that, either use a .targets file (see further in this article) or add a build step in your main project -->
        <file src="native\x86\**" target="runtimes\win-x86\native" />
        <file src="native\x64\**" target="runtimes\win-x64\native" />
  </files>
</package>
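Coming back to the first note about build numbers in dependency versions: if your build generates four-part versions, dropping the last part before writing the dependency reference can be scripted. A tiny Python sketch, with a function name of my own:

```python
def strip_build_number(version: str) -> str:
    """Keep only major.minor.micro from a version such as '1.0.0.3258'."""
    parts = version.split(".")
    return ".".join(parts[:3])


print(strip_build_number("1.0.0.3258"))  # 1.0.0
```

A version that already has three parts is returned unchanged, so the function can be applied blindly to every dependency.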

Native libraries


Native packages can be generated with a third party app called CoApp if you're not running under Windows 10. Unfortunately, this tool is not maintained anymore, and Win10 users will have to do that manually.

The good news is: it's not that hard.

If you understood the steps for a managed package, you've almost understood how to create a native package as well. Here is an example of nuspec file for a native library:

<?xml version="1.0"?>
<package >
  <metadata>
    <id>stringlib</id>
    <version>4.1.0</version>
    <title>formatlib</title>
    <authors>Me</authors>
    <owners>Me</owners>
    <description>
      Provides string formatting functions for C/C++
    </description>
    <releaseNotes></releaseNotes>
    <summary>
      String formatting library for C/C++
    </summary>
    <language>en-US</language>
    <projectUrl>https://mygitserver.com/stringlib</projectUrl>
    <requireLicenseAcceptance>false</requireLicenseAcceptance>
    <licenseUrl>https://opensource.org/licenses/BSD-3-Clause</licenseUrl>
    <copyright></copyright>
    <dependencies>
    </dependencies>
    <references></references>
    <tags>native</tags> <!-- This is important ! -->
  </metadata>
  <files>
      <!-- Note that now, binaries are placed conventionally under a directory named build -->
      <file src="bin\x86\Release\**" target="build\native\libs" exclude="**.pdb"/>
      <file src="include\**" target="build\native\interface" />
      <file src="doc\**" target="doc" />
  </files>
</package>

AFAIK, there is no clear rule on how one should manage the package structure. The structure I use is:

build
|-native
|-|-libs --> binaries go here
|-|-interface --> public headers go here
doc 

If I want to support several platforms, I can do that:
build
|-native
|-|-libs 
|-|-|-win-x86 --> binaries go here
|-|-|-win-x64 --> binaries go here
|-|-|-linux-x86 --> binaries go here
|-|-interface --> public headers go here
doc 

or

build
|-native
|-|-libs 
|-|-|-v100 --> VS2010 binaries go here
|-|-|-v110 --> VS2012 binaries go here
|-|-interface --> public headers go here
doc 

It is up to you to choose how you'd like them stored. Try to be generic (platform->architecture->toolchain).
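If you generate the nuspec file from a script, the target folder for each set of binaries can be computed rather than typed by hand. Here is a minimal Python sketch of the platform->architecture->toolchain idea; the exact folder naming (platform and architecture joined with a dash, toolchain as an optional last level) is just one possible convention, and lib_target is a hypothetical helper.

```python
from pathlib import PureWindowsPath
from typing import Optional


def lib_target(platform: str, architecture: str, toolchain: Optional[str] = None) -> str:
    """Build the package-relative folder where a set of binaries should go."""
    path = PureWindowsPath("build", "native", "libs", f"{platform}-{architecture}")
    if toolchain:
        # Optional last level, e.g. v110 for VS2012 binaries.
        path = path / toolchain
    return str(path)


print(lib_target("win", "x64", "v110"))
```

The returned string uses backslashes, like the target attributes in the nuspec examples of this article.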

Once the nuspec file is done, we have to add build rules that will automatically be merged into our solution. To do that, we can create a .props file, a .targets file, or both!
These files will be imported into our project and Visual Studio will execute them during the build.

These files need to be placed under 'build' subfolder in your nuget package and shall be named with your package id:

build
|-native
|-|-libs --> binaries go here
|-|-interface --> public headers go here
|-stringlib.props
|-stringlib.targets
doc 

Copy and paste the following code into your props file :

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" 
    xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <ItemDefinitionGroup>
        <ClCompile>
            <AdditionalIncludeDirectories>$(MSBuildThisFileDirectory)native\interface\;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
        </ClCompile>
    </ItemDefinitionGroup>
    <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
        <ClCompile>
            <RuntimeLibrary>MultiThreadedDebug</RuntimeLibrary>
        </ClCompile>
        <Link>
            <AdditionalLibraryDirectories>$(MSBuildThisFileDirectory)native\libs\;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>
            <!-- Put your lib name here -->
            <AdditionalDependencies>stringlib.lib;%(AdditionalDependencies)</AdditionalDependencies>
        </Link>
    </ItemDefinitionGroup>
    <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
        <ClCompile>
            <RuntimeLibrary>MultiThreaded</RuntimeLibrary>
        </ClCompile>
        <Link>
            <AdditionalLibraryDirectories>$(MSBuildThisFileDirectory)native\libs\;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>
            <!-- Put your lib name here -->
            <AdditionalDependencies>stringlib.lib;%(AdditionalDependencies)</AdditionalDependencies>
        </Link>
    </ItemDefinitionGroup>
</Project>

This file tells Visual Studio to add your package directories to the include and library directories, and it adds the library as a link dependency of the project.

Say you'd also like to copy your DLLs automatically to your output directory. In this case, copy and paste the following code into your targets file:

<?xml version="1.0" encoding="utf-8" ?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <ItemGroup>
        <StringLibBinaries Include="$(MSBuildThisFileDirectory)native\libs\*.dll" />
    </ItemGroup>
    <Target Name="StringLibCustomTarget" AfterTargets="Build">
        <Copy SourceFiles="@(StringLibBinaries)" DestinationFolder="$(OutDir)" />
    </Target>
</Project>

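We're doing this in a targets file because NuGet imports the props file at the top of the project file and the targets file at the bottom. A build event declared in a props file ends up competing with the build events already defined in your project (one overwrites the other), whereas a target hooked with AfterTargets simply runs in addition to them.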

Alright, now that you're done, just pack your nuget package:

nuget pack stringlib.nuspec

Notes that might help :

  •  If you're planning to pack static libraries, you might encounter link issues due to mismatched iterator debug levels (_ITERATOR_DEBUG_LEVEL) between a Debug project and a Release library, or the other way round. Unfortunately, the only solution I found here was to pack both Debug and Release libraries and to link the right one against your project depending on your configuration:

<AdditionalLibraryDirectories>$(MSBuildThisFileDirectory)native\libs\$(Configuration)\;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>

  •  Same trick can be used if you have packed libraries for different visual studio toolchains

<AdditionalLibraryDirectories>$(MSBuildThisFileDirectory)native\libs\$(PlatformToolset)\;%(AdditionalLibraryDirectories)</AdditionalLibraryDirectories>

  •  [Still to be investigated] Some packages do not install properly from the NuGet manager. The installation starts and stops without the green check mark that confirms the installation. I noticed that this was related to the name (i.e. the ID) of the package. Surprisingly, adding numbers to the ID (and hence to the names of the props and targets files) resolved the issue. This has been seen with VS2012, using NuGet v2.8.

Other things to know about Nuget

Check which version of nuget you need according to the version of visual studio you are using to avoid confusion !

List remote packages

nuget list -Source <url or server name>

Delete package on remote

nuget delete mypackage 1.0.0 -Source <url or server name>

Push a package to remote

nuget push mypackage.1.0.0.nupkg -Source <url>[/subfolder]

with subfolder typically being native or managed.

Force visual studio to install packages in a local directory

NuGet configuration can be overridden by adding a nuget.config file at the root of your solution.
To change the install directory, add this to the file:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <config>
    <add key="repositoryPath" value="..\packages" />
  </config>
</configuration>

In this example, packages will be placed in the parent folder.
Other options can be overridden as listed here : https://docs.microsoft.com/en-us/nuget/reference/nuget-config-file#example-config-file

Monday, August 13, 2018

Migration from SVN to GIT

Step 0 : Define migration strategy



Option 1 [recommended] : If possible, do not bother to migrate the SVN history to git but rather keep the old repositories archived and start over in git from the tip of the trunk.

Option 2 : Migrate trunk and tags only.

Option 3 [can be complex] : Migrate the entire repository.

I picked option 3 because my colleagues wanted to keep all of the history. IMO this is not necessary as long as the old repositories are still accessible. Plus, the migrated history is not perfectly identical to SVN and can be more confusing than helpful.


Step 1 : Create a file with authors


We should convert the authors' svn identifiers to the git format (Firstname Name <mail>).
To do this, run the following command from your SVN working copy:

svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors.txt


Now edit the file and fill in the missing information to end up with something like this:

MMO = Mickey Mouse <mickey.mouse@disney.com>
DDU = Donald Duck <donald.duck@disney.com>
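
If awk is not available (on a plain Windows machine for instance), the same skeleton file can be produced with a few lines of Python. This is only a sketch: the function name is mine, and it assumes the usual `svn log -q` layout where revision lines look like `r42 | MMO | <date>`.

```python
def authors_from_svn_log(log_text):
    """Build 'user = user <user>' lines from the output of `svn log -q`."""
    users = set()
    for line in log_text.splitlines():
        # Revision lines start with 'r' and use '|' as a field separator
        if line.startswith("r") and "|" in line:
            users.add(line.split("|")[1].strip())
    # Sorted and de-duplicated, like `sort -u` in the shell version
    return ["{0} = {0} <{0}>".format(user) for user in sorted(users)]


# Made-up sample of `svn log -q` output
sample_log = """------------------------------------------------------------------------
r3 | DDU | 2018-08-13 10:00:00 +0200 (Mon, 13 Aug 2018)
------------------------------------------------------------------------
r2 | MMO | 2018-08-12 09:00:00 +0200 (Sun, 12 Aug 2018)
------------------------------------------------------------------------
r1 | DDU | 2018-08-11 08:00:00 +0200 (Sat, 11 Aug 2018)
------------------------------------------------------------------------
"""

for author_line in authors_from_svn_log(sample_log):
    print(author_line)
```

The real names and mail addresses still have to be filled in by hand afterwards.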

Step 2 : Migrate dependencies first


If your main project is using externals, those will need to be converted to submodules (other options are possible like subtree or third party git externals but in most cases, submodule will do just fine).

To do that, repeat the following commands for each dependency repository:

git svn clone --trunk=<trunk_dir> --branches=<branches_subdir> --tags=<tags_subdir> --no-metadata --authors-file=authors.txt <svn_repo_url> .


Note: if the svn repository follows the standard 'trunk,branches,tags' structure, replace '--trunk=<trunk_dir> --branches=<branches_subdir> --tags=<tags_subdir>' with --stdlayout

Now export the branch list to a file.

git branch -r > branches.txt


Filter the list of branches in branches.txt and keep only those you want to migrate.
Copy tags to a separate file for now (tags.txt).

Now for each entry in branches.txt, do:

git checkout -b <local_branch_name> <remote_branch_name>


Use the branch name in branches.txt for <remote_branch_name>.

For each tag, do the following:

git checkout -b <local_branch_name> <remote_branch_name>
git tag <local_branch_name>
git checkout master
git branch -D <local_branch_name> 

git-svn exports tags as branches, so we check each of them out, create a real git tag at that commit and then delete the temporary local branch.
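
With many branches and tags, typing these commands by hand gets tedious. Here is a rough sketch that turns the lines of branches.txt and tags.txt into the list of git commands described above. The function name is hypothetical, and it assumes that tag references contain `tags/` and that the trunk reference ends with `trunk`; check this against your own `git branch -r` output first.

```python
def migration_commands(remote_refs):
    """Return the git commands (as argument lists) for the given remote refs."""
    commands = []
    for ref in (raw.strip() for raw in remote_refs):
        if not ref:
            continue
        name = ref.split("/")[-1]
        if name == "trunk":
            continue  # trunk is already available as master
        if "tags/" in ref:
            # A tag: temporary branch, real tag, back to master, cleanup
            commands.append(["git", "checkout", "-b", name, ref])
            commands.append(["git", "tag", name])
            commands.append(["git", "checkout", "master"])
            commands.append(["git", "branch", "-D", name])
        else:
            # A regular branch: create the local branch
            commands.append(["git", "checkout", "-b", name, ref])
    return commands


# Made-up example of what branches.txt and tags.txt could contain
example_refs = ["  origin/trunk", "  origin/feature-x", "  origin/tags/v1.0"]
for command in migration_commands(example_refs):
    print(" ".join(command))
```

Each argument list can then be passed to `subprocess.run(command, check=True)` from inside the cloned repository.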

Now run the following command to clean history from empty commits (migration side-effect):

git filter-branch --prune-empty -f -- --all


Delete trunk branch

git branch -D trunk


Configure remote git repository

Here I assume that you have already created a remote git repository to host your code.

git remote add origin <git_remote_repo_url>


Finally push the repo

git push origin --mirror


Step 3 : Migrate the main project repository


Repeat steps from Step 2 until trunk deletion.
From there, we will need to resolve externals.

We will start with master branch, but these steps need to be repeated for each branch.

git checkout master


Run the following command to export the list of externals to a file.

git svn show-externals --id=origin/trunk > externals.txt


Note: In my case, this command wouldn't work for other branches and it is particularly slow. I managed to achieve the same thing with the svn command line in the old repository, for each branch.

svn propget svn:externals -R <externals_dir> > externals.txt


Suppose our svn:externals were in a folder named Externals, here is what should be done for each submodule:

git submodule add --force <dependency_git_remote_repo_url> ./Externals/<submodule_dir>


Then we need to pin the submodule to the commit that matches the version used in svn.

cd ./Externals/<submodule_dir>


For tags:
git checkout <commit_id_or_tag_label>


For branches:
git checkout -b <local_name> <remote_git_branch_name>


cd ../..


Repeat these steps for all submodules. Once done, commit:

git commit -m "Migrated svn:externals properties to .gitmodules"


Repeat these steps for all branches. Before starting, for each branch, do cleanup to avoid confusing error messages:

rm -rf ./Externals


Finally, when all branches have been processed, push the repository to remote server:

git remote add origin <git_remote_repo_url>

git push origin --mirror

Tuesday, November 21, 2017

Host a django app in Apache [Windows]

This article explains how to host a django application in an Apache server. The following steps are valid for both 32-bit and 64-bit machines, but be sure to install all of the tools for the same architecture.

For the following steps, we imagine that our project is located in C:/tools/SupervisionTool and is structured as shown below:

SupervisionTool/
  Analyzers/
    migrations/
    static/
    templates/
    ...
  SupervisionTool/
    settings.py
    urls.py
    wsgi.py
  manage.py


Before going further, be sure that you have the Administrator rights on the machine.

1. Install Python 3.6 and ensure that it is added to PATH
2. Install the latest version of WAMP
3. Install VC15 redistributable tools
4. Create an environment variable named MOD_WSGI_APACHE_ROOTDIR and set it to apache install directory (c:/<wamp install dir>/bin/apache/apache<version>)
5. If you haven't created a virtual environment for your django application, start with

pip install virtualenvwrapper-win
mkvirtualenv SupervisionTool

<install all the packages necessary for your application (django,pypiwin32,...)>

Then

workon SupervisionTool
pip install mod_wsgi
mod_wsgi-express module-config

The last command will print the lines that will be necessary to configure Apache. You will see something like:


LoadFile "c:/python36/python36.dll"
LoadModule wsgi_module "c:/users/administrateur/envs/supervisiontool/lib/site-packages/mod_wsgi/server/mod_wsgi.cp36-win_amd64.pyd"

Copy these lines and add them to c:/<wamp install dir>/bin/apache/apache<version>/conf/httpd.conf

6. Open c:/<wamp install dir>/bin/apache/apache<version>/conf/extra/httpd-vhosts.conf and replace the content with the following lines:

# virtual SupervisionTool
<VirtualHost *:80>
    ServerName localhost
    ServerAlias supervisiontool@lni-swissgas.lni.ads
    ErrorLog "logs/supervisiontool.error.log"
    CustomLog "logs/supervisiontool.access.log" combined
    WSGIScriptAlias /  "C:/tools/SupervisionTool/SupervisionTool/wsgi.py"
    <Directory "C:/tools/SupervisionTool/SupervisionTool">
        <Files wsgi.py>
            Require all granted
        </Files>
    </Directory>

    Alias /Analyzers/static "C:/tools/SupervisionTool/static"
    <Directory "C:/tools/SupervisionTool/static">
        Require all granted
    </Directory>  
</VirtualHost>
# end virtual SupervisionTool


Note: Replace C:/tools/SupervisionTool with your install directory.

7. Run the following command from the project root dir to gather all of the static files (admin included) into a single directory (defined by STATIC_ROOT), from which they will be served:

python manage.py collectstatic

8. Set ALLOWED_HOSTS in SupervisionTool/settings.py to ['*'] (or list the IP addresses or host names that will give access to the app)
9. Open SupervisionTool/wsgi.py and replace its content with :

# Activate the virtualenv of the application
activate_this = 'C:/Users/administrateur/Envs/SupervisionTool/Scripts/activate_this.py'
# execfile() only exists in Python 2, hence exec() here
exec(open(activate_this).read(),dict(__file__=activate_this))

import os
import sys
import site

# Add the site-packages of the chosen virtualenv to work with
site.addsitedir('C:/Users/administrateur/Envs/SupervisionTool/Lib/site-packages')

# Add the app's directory to the PYTHONPATH
sys.path.append('C:/tools/SupervisionTool')
sys.path.append('C:/tools/SupervisionTool/SupervisionTool')

# Tell django which settings module to load
os.environ['DJANGO_SETTINGS_MODULE'] = 'SupervisionTool.settings'

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()


10. Open Services in Windows, find Apache (here wampapache) and set the start mode to Automatic. Finally, start the apache service.

11. Type localhost in your web browser and verify that your app is accessible.
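
For the collectstatic and ALLOWED_HOSTS steps, the relevant part of settings.py could look like the excerpt below. Take it as a sketch: the host names are placeholders, and the STATIC_URL / STATIC_ROOT values are my assumption, chosen so that they line up with the Alias declared in the virtual host above.

```python
# SupervisionTool/settings.py (excerpt), adapt the values to your setup

# Hosts allowed to reach the app. Explicit names are safer than ['*'].
# Placeholder values: replace them with your own host names or IP addresses.
ALLOWED_HOSTS = ["localhost", "127.0.0.1"]

# URL prefix under which Apache serves the static files
# (assumed to match: Alias /Analyzers/static "C:/tools/SupervisionTool/static")
STATIC_URL = "/Analyzers/static/"

# Directory filled by `python manage.py collectstatic`
STATIC_ROOT = "C:/tools/SupervisionTool/static"
```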


Troubleshooting



  • Compilation error with mod_wsgi-express module-config : The most probable reason is that you mixed 32/64 bits architectures when installing the tools. Try a cleanup and reinstall.
  • Apache fails to start in Services : Potentially a syntax error in configuration files. To get more information about this, right click on Wamp tools in tray icon menu, select Tools->Show Apache loaded Modules
  • Still not working ? Ensure that you configured PYTHONPATH correctly on host (here, it should be set to C:/tools/SupervisionTool/Analyzers)


Thursday, October 5, 2017

Application logs

Logging is a fundamental mechanism for the application. It brings valuable information to support teams when something goes wrong and saves a lot of time in bug investigation. But this utility can easily turn into a "spam engine" if the developers do not carefully select what/when/how much to log.

What do we expect from logs

A support team will look for relevant events or user actions that can explain a troublesome situation. Here is an example:

09:50:05 [INFO] [AppOperation] Mix configured with CMP=Nitrogen, CST=SynthAir, C=245,8 ppm
09:50:05 [INFO] [Solver] Computed flows for (C=245,8 ppm) : MFC1=1200 mlmin, MFC2=2450 mlmin
09:50:06 [INFO] [AppCalibration] Compensation of MFC1=1200 mlmin to MFC1=1202 mlmin
09:50:06 [INFO] [AppCalibration] Compensation of MFC2=2450 mlmin to MFC2=2449 mlmin
09:50:06 [INFO] [Peripherals] Setting MFC1=1202 mlmin
09:50:06 [INFO] [Peripherals] Setting MFC2=2449 mlmin
09:50:06 [INFO] [Peripherals] Setting Hardware state to Mix2Levels
09:50:06 [INFO] [AppOperation] Mix started
09:52:08 [ERROR] [AppAlarm] (Stability Alarm ON) MFC1 flow is not stable (Measured=1015 mlmin)

In this scenario, the user complains about poor results on his device. We suppose that the alarm was not signaled on a visual indicator to inform the user. With this bunch of logs, developers can immediately:
  • Pinpoint the origin of the issue 
  • Get the conditions of the operation that led to the issue and try to run the same experiment at the office
Obviously, none of these actions will solve the problem, but the information that the logs gave to the developer highlights which functional part went wrong and helps him follow the right track to resolve the client issue.
It is essential for developers to have the following information in logs:
  • Timestamp 
  • Log level : developers will filter the errors or warnings first to find an obvious reason for the malfunction 
  • Synthetic information : long phrases are useless. A log must get to the point, there is no literature trophy for well-written logs. Most of the time, use action verbs and parameters. 
  • Workable data : the parameters listed in our logs specify numerical values with a unit. Without the unit, the developer cannot investigate. Plus, the location where the error occurred is necessary. In our example, the names of the classes are displayed. Either do this or log functional area names (ex: Config, Security, Alarms, ...). 

Other things to have in mind:
  • Keep files small : do not go over a few MB for text files (I do not include DB here). Opening and parsing a heavy file is long and inefficient. Use several small rolling files instead. 
  • Keep short history : there is no need to keep traces from last week if nothing happened there. This is closely related to file size for text logs.
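
As an illustration, here is a minimal sketch of such a setup with Python's standard logging module. The line format mimics the example above, and small rolling files are obtained with RotatingFileHandler. The helper name, the file name and the size limits are assumptions of mine, not values taken from a real project.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def build_logger(area, log_path, max_bytes=2 * 1024 * 1024, backups=3):
    """Return a logger for one functional area, writing to small rolling files."""
    logger = logging.getLogger(area)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    if not logger.handlers:  # do not attach the handler twice
        # Rolls over to app.log.1, app.log.2, ... once max_bytes is reached
        handler = RotatingFileHandler(log_path, maxBytes=max_bytes, backupCount=backups)
        # Timestamp, level, functional area, then a short message
        handler.setFormatter(logging.Formatter(
            "%(asctime)s [%(levelname)s] [%(name)s] %(message)s", datefmt="%H:%M:%S"))
        logger.addHandler(handler)
    return logger


log_file = os.path.join(tempfile.mkdtemp(), "app.log")
peripherals = build_logger("Peripherals", log_file)
peripherals.info("Setting MFC1=%d mlmin", 1202)
```

With these values, at most 4 files of about 2 MB are kept on disk, which covers both the file size and the history length.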

What to log

  • Application events: notifications, alarms, state changes, ... 
  • Operation-related information : settings update, user actions, ...

What NOT to log

  • Periodic data : if live values need to be logged regularly, put them in a separate log file and preferably go for a DB that can handle large amount of entries better. Developers can then refer to timestamps for investigations. 
  • GUI input errors : if a user enters invalid value, it is not an error for the application, it is an error for the form. As long as the frontend does not interact with the application, whatever happens between the user and the form does not have an impact on the application.

Where to log

A software architecture is built with modules interacting with each other. The business logic of the application is dispatched between these modules. So basically, each functional area of the application shall manage its own logs. To do things neatly, centralize the logging features in a service shared by all components.
Do not repeat information. Take the following example: A calls B, B fails. This is the type of log that we should expect from it:

10:00:01 [INFO] [A] Action A1 started
10:00:01 [INFO] [B] Setting xxx=1234
10:00:01 [ERROR] [A] Action A1 failed (Code=Invalid Setting Error)

and not something like :

10:00:01 [INFO] [A] Action A1 started
10:00:01 [INFO] [B] Setting xxx=1234
10:00:01 [ERROR] [B] Given value is invalid
10:00:01 [ERROR] [A] Action A1 failed (Code=Invalid Setting Error)

The error occurred at a known place, so either log it from there or from the client (recommended).
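
The "A calls B, B fails" case can be sketched like this in Python. B reports the failure to its caller with an exception and does not log an error itself, and A, the client, logs the failure exactly once. All the names and the validity range are invented for the example.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] [%(name)s] %(message)s",
    datefmt="%H:%M:%S",
)
log_a = logging.getLogger("A")
log_b = logging.getLogger("B")


class InvalidSettingError(Exception):
    """Raised by module B when a setting is rejected."""


def set_value(value):
    """Module B: logs the action, reports a failure by raising."""
    log_b.info("Setting xxx=%s", value)
    if not 0 <= value <= 1000:  # hypothetical validity range
        raise InvalidSettingError("Invalid Setting Error")


def action_a1(value):
    """Module A (the client): the only place where the failure is logged."""
    log_a.info("Action A1 started")
    try:
        set_value(value)
    except InvalidSettingError as error:
        log_a.error("Action A1 failed (Code=%s)", error)
        return False
    return True


action_a1(1234)
```

Running this produces the three expected lines (two INFO, one ERROR) and nothing from B at ERROR level.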

Debugging information

Take the following example of code:

void ExecuteUserRequest()
{
   int env = GetEnvironmentValue();
   if(env < THRESHOLD)
   {
      doA();
   }
   else
   {
      doB();
   }
}

Here, ExecuteUserRequest behaves differently depending on an external factor. We can choose to log the action (doA or doB) with an INFO level and add debugging information to help the developer understand why this precise action occurred :

void ExecuteUserRequest()
{
   int env = GetEnvironmentValue();
   LogDebug("env = %d", env);
   if(env < THRESHOLD)
   {
      LogInfo("Executing A");
      doA();
   }
   else
   {
      LogInfo("Executing B");
      doB();
   }
}
Generally speaking, Debug level gives execution details and exposes implementation data to the developer.

Memo

Here are some basic rules that I use to write my application logs.

Code type | Log level
User input error (GUI) | No log
Frequently accessed functions (ex: sensor value read) | Debug, or no log if we log in a text file. Use a DB if possible.
Implementation detail (subroutines, intermediate values, ...) | Debug
User action or application event | Info
Alarm or abnormal situation that does not result in an error immediately | Warn
Operation failure on user action | Warn if the error is normal given the context (the product acts as specified), Error otherwise
Application error (internal non-critical errors or errors managed by the application) | Error
Critical non-recoverable error (unmanaged exception, errors that lead to a non-functioning application) | Fatal

Friday, July 21, 2017

Build U-Boot for BeagleCore

BeagleCore is a variant of the BeagleBone Black. It consists of 2 modules, BCM1 and BCS2:
* BCM1 is the System-On-Module that contains most of the BeagleBone Black design
* BCS2 is a mainboard with the form factor of the original BeagleBone Black. The BCM1 is soldered on the BCS2 and the pair is equivalent to a BBB.

Recently, I needed to run U-Boot on a custom board built with a BCM1. I followed some tutorials on the internet and the result was that U-Boot never started. After some investigations, I found out that the EEPROM that contains BBB identification is not present on BCM1, but is on BCS2 instead. The consequence is that U-Boot cannot identify the board and the initialization procedure fails.
In this post, I will detail every step to make U-Boot run on BCM1.

Note: All of these steps were performed on Windows 10 with bash (ubuntu terminal).

Install tools

sudo apt-get install gcc-arm-linux-gnueabi

Create an alias to simplify the following commands


alias armmake='make -j8 ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- '

Grab U-Boot

git clone git://git.denx.de/u-boot.git u-boot
cd u-boot

Edit the code


Usually, we make a patch for this type of thing. I'll try to post it later.
We will modify some parts of the code to stub the EEPROM so that U-Boot behaves exactly as if the board were a BBB.

Open board_detect.c and add the following right after the #include statements:
static struct ti_common_eeprom stub_eeprom = 
{
    TI_EEPROM_HEADER_MAGIC,
    "A335BNLT",
    "00C0",
    "1716BBBK2450",
    "",
    {{0x11,0x22,0x33,0x44,0x55}, // Dummy Mac addresses
     {0x11,0x22,0x33,0x44,0x55},
     {0x11,0x22,0x33,0x44,0x55}},
    11749353662462671858,
    12656917672993063804
};

Replace the content of the function 'eeprom_am_set' with
{
    return 0;
}

Do the same with 'eeprom_am_get'.

Open board_detect.h.

Replace TI_EEPROM_DATA definition with:
 #define TI_EEPROM_DATA ((struct ti_common_eeprom *)\
                &stub_eeprom)

Config & Compile

armmake  distclean
armmake am335x_boneblack_defconfig
armmake

Copy to an SD Card


Format an SD card to FAT32 (boot flag ON).
Go to your U-Boot directory and copy MLO to SD Card first.
Then copy u-boot.img to SD-Card.

And there you go ! U-Boot should now start normally.
Here is an example of my uEnv.txt :
bootpart=0:1
bootdir=
bootfile=uImage
fdtfile=am335x-boneblack.dtb
fdtaddr=0x80F80000
loadaddr=0x80200000
optargs=quiet
mmcdev=0
mmcroot=/dev/mmcblk0p2 ro
mmcrootfstype=ext4 rootwait
loadbootenv=load mmc ${mmcdev} ${loadaddr} ${bootenv}
loadfdt=load mmc ${bootpart} ${fdtaddr} ${bootdir}/${fdtfile}
uenvcmd=load mmc 0 ${loadaddr} ${bootfile};run loadfdt;setenv bootargs console=${console} ${optargs};bootm ${loadaddr} - ${fdtaddr}


Tuesday, June 20, 2017

Machine learning basics

There are several types of machine learning but we will focus on the following ones in this article:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

Supervised learning


Supervised learning is basically used to make predictions about future data from a labeled (categorized) training dataset.


A dataset is a table where:

  • each row is a sample
  • each column is a feature
  • each row is labeled with a class label

 Classification


A supervised learning task with discrete class labels is called a classification task.
Classification is a subcategory of supervised learning where the goal is to predict the categorical class labels of new instances based on past observations.

Example:

Dataset with [data ; label]
Data 0 : [ I like sport ; Present ]
Data 1 : [ I love shopping ; Present ]
Data 2 : [ I was in Amsterdam ; Past ]
Data 3 : [ I did something wrong ; Past ]
Data 4 : [ I do exercise every day ; Present ]

New data:

I did not know --> Past
I love running --> Present

With 2 possible class labels, the task is a binary classification task.
With more, it is a multi-class classification task.

Example of multi-class dataset:

[Picture of cat; cat]
[Picture of dog; dog]
[Picture of mouse; mouse]

Here the machine learning system would be able to recognize a dog, a cat or a mouse but wouldn't succeed with any other animal because it is not part of our dataset.

Typical example of a two-dimensional dataset for a binary classification task:

Data 0 : [  [0;0] ; Orange]
Data 1 : [  [1;1.5] ; Orange]
Data 2 : [  [1;2] ; Orange]
Data 3 : [  [1;2.8] ; Orange]
Data 4 : [  [2;1.5] ; Orange]
Data 5 : [  [2;2.5] ; Orange]
Data 6 : [  [3;0] ; Blue]
Data 7 : [  [3;1.5] ; Blue]
Data 8 : [  [3;2] ; Blue]
Data 9 : [  [4;2.8] ; Blue]
Data 10 : [  [4;1.5] ; Blue]
Data 11 : [  [4;2.5] ; Blue]
Data 12 : [  [5;3] ; Blue]

It is two-dimensional because each sample of the dataset has 2 values (usually named x1, x2). If we represent these samples on a 2-dimensional graph, we would see the Orange points grouped on the left (x1 ≤ 2) and the Blue points grouped on the right (x1 ≥ 3).

The prediction would be based on the distribution of the samples. A new point with x1 ≥ 3 would be predicted as Blue and a point with x1 ≤ 2 would be predicted as Orange.
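
To make this concrete, here is a small sketch of one possible classifier for this dataset: a nearest-centroid classifier, which computes the mean point of each class and predicts the class whose mean is the closest. This is just one simple technique among many, picked for the example, and the two new points at the end are made up.

```python
SAMPLES = [
    ((0, 0), "Orange"), ((1, 1.5), "Orange"), ((1, 2), "Orange"),
    ((1, 2.8), "Orange"), ((2, 1.5), "Orange"), ((2, 2.5), "Orange"),
    ((3, 0), "Blue"), ((3, 1.5), "Blue"), ((3, 2), "Blue"),
    ((4, 2.8), "Blue"), ((4, 1.5), "Blue"), ((4, 2.5), "Blue"),
    ((5, 3), "Blue"),
]


def train(samples):
    """Compute the centroid (mean point) of each class."""
    grouped = {}
    for point, label in samples:
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }


def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def predict(centroids, point):
    """Return the label whose centroid is the closest to the point."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], point))


centroids = train(SAMPLES)
print(predict(centroids, (4.5, 1.0)))  # falls on the Blue side
print(predict(centroids, (0.5, 2.0)))  # falls on the Orange side
```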

Regression


Regression is also called prediction of continuous outcomes. In regression analysis we are given a series of numbers (x, or predictors) and response variables (y, or outcomes) and we try to find a relationship between them to predict a future outcome.

Ex:

with [x;y]
Data 0 : [ 0 ; 0 ]
Data 1 : [ 1 ; 1.5 ]
Data 2 : [ 1.5 ; 1 ]
Data 3 : [ 2 ; 2 ]
Data 4 : [ 2.5 ; 2.6 ]
Data 5 : [ 3 ; 3.2 ]
Data 6 : [ 4 ; 3.9 ]

Several types of algorithm can be selected to process the input data. The simplest one is linear regression, which fits a straight line through the samples.

The computed curve will be used to predict the outcome of new data.
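
As a sketch, here is an ordinary least-squares fit of a straight line y = slope * x + intercept on the dataset above, written in plain Python. The function names are mine, and a real project would rather rely on a library.

```python
X = [0, 1, 1.5, 2, 2.5, 3, 4]
Y = [0, 1.5, 1, 2, 2.6, 3.2, 3.9]


def fit_line(xs, ys):
    """Least-squares estimate of (slope, intercept) for y = slope * x + intercept."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(X, Y)


def predict_outcome(x):
    """Predict the outcome of a new value with the fitted line."""
    return slope * x + intercept


print("y = %.3f * x + %.3f" % (slope, intercept))
print("prediction for x=5 : %.2f" % predict_outcome(5))
```

On this dataset the slope comes out close to 0.98 and the intercept close to 0.07, so the outcome predicted for x = 5 is a little under 5.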

Reinforcement learning


Here the goal is to develop a system (agent) that improves its performance based on interactions with its environment. The system receives a feedback (reward) for every one of its actions. Each reward informs it of the quality of its action.
The agent will learn a series of actions that maximizes this reward via an empirical trial-and-error approach.

A typical example is AlphaGo, from Google's DeepMind, which beat the best Go players.

Unsupervised learning


In supervised learning, we include the right answer (labels) into the dataset. Here, we don't know the right answer beforehand. We are dealing with uncategorized data with an unknown structure.
With unsupervised learning, we can explore the structure of our data to extract meaningful information without an outcome or a reward.

Clustering


Clustering is an exploratory data analysis technique which groups data together by similarity (unsupervised classification).
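
A classic clustering algorithm is k-means. The sketch below is a deliberately naive version of it: the initial centroids are simply picked from the input list and the number of iterations is fixed. The points are made up and no label is given to the algorithm.

```python
def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iterations=10):
    """Group points into k clusters. Returns (labels, centroids)."""
    # Naive initialization: pick k points spread over the input list
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid
        labels = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


unlabeled = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(unlabeled, k=2)
print(labels)
```

The two groups are found from the positions alone, which is the whole point of unsupervised learning.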

Dimensionality reduction


Dimensionality reduction is another unsupervised learning field. Processing huge amounts of data results in performance and storage issues. To limit them, unsupervised dimensionality reduction preprocesses the data to remove noise and retain only the relevant information.


Thursday, June 15, 2017

Django + Js : display counting timedelta in view

Suppose you have an entry in DB with a timestamp 'start_date'. You want to display the time elapsed since 'start_date' in your view and you want that delay to grow in real time like a clock. In the following example, the user can select one entry at a time and the counter has to be updated accordingly.

Note: To achieve this it is important to work with UTC timestamps in both DB and your view

Here is what my code in django views.py looks like:

import calendar

from django.http import HttpResponse, JsonResponse

from .models import MyEntries  # assumption: adjust to wherever your model lives


def get_entry_info(request):
    """
    Recover information for the specified entry name
    """
    entry_name = request.POST.get("name", None)
    if request.method == "POST" and entry_name is not None:
        data = {}
        # Get entry object 
        entry = MyEntries.objects.get(name=entry_name)
        # Milliseconds since the Unix epoch, computed in UTC
        # (time.mktime would interpret the fields as server local time)
        data['start_date'] = calendar.timegm(entry.start_date.utctimetuple()) * 1000
        return JsonResponse(data)
    else:
        return HttpResponse("Invalid entry name", status=400)

Here is what my javascript code looks like:
var startDate;
var durationTimer;

// Loads content into the information panel
// @param entryname : name of the entry to be displayed in the information panel
function reloadInformationPanel(entryname){
    // Get entry information from server
    $.ajax({
        headers: { "X-CSRFToken": '{{ csrf_token }}' },
        url: "{% url 'get_entry_info' %}",
        method: 'POST', 
        dataType: 'json',
        data: {
            'name': entryname, // outgoing data
        },
        success: function (data) {        
            // data.start_date is in milliseconds since the Unix epoch (UTC)
            startDate = new Date(data.start_date);
            startTime();
        },
        error: function(xhr,errmsg,err) {
            console.log(xhr.status + ": " + xhr.responseText); 
        }
    });
}

// Starts (or restarts) the timer that displays the entry duration
function startTime() {
   // Stop the timer of the previously selected entry, if any
   clearTimeout(durationTimer);
   // Both values are epoch milliseconds, so no timezone conversion is needed
   var elapsed = Math.floor((Date.now() - startDate.getTime()) / 1000);
   var hours = Math.floor(elapsed / 3600);
   var minutes = Math.floor((elapsed % 3600) / 60);
   var seconds = elapsed % 60;
   document.getElementById('entry_duration').innerHTML = hours + "h" + minutes + "m" + seconds + "s";
   durationTimer = setTimeout(startTime, 500);
}

Now simply map reloadInformationPanel to the entry selection button ;)

Thursday, April 27, 2017

What is AJAX ?

AJAX = Asynchronous Javascript And Xml

Ajax is not a technology by itself but rather a combination of existing technologies (HTML/CSS/DOM/Javascript/XML/JSON).

What is it intended for ?


Ajax is typically used to refresh data on a web page without having to reload the entire page (i.e. asynchronously). It involves a web browser sending HTTP requests (GET/POST) to a server and processing the response to finally manipulate the page DOM (i.e. the HTML tags). The user's view is thereby dynamically updated.


Ajax requests are executed by Javascript code and the response is also handled in Javascript. All of this happens on the client's side (the web browser).

What is the format of the data ?


Originally, it was XML but nowadays, JSON is preferred (JavaScript Object Notation).


Example of ajax requests


jQuery

<script>
...
$.ajax({
        headers: { "X-CSRFToken": getCookie("csrftoken") },
        url: "myurl",
        method: 'POST', // or another (GET), whatever you need
        data: {
            'mydata': 'value', // outgoing data
        },
        
        success: function (data) {        
            // success callback
            // you can process data returned by server here
        }
        });
...
</script> 

Pure javascript

<script type="text/javascript">
function ajaxFunction()
{
  var xmlhttp;
  if (window.XMLHttpRequest)
  {
    // code for IE7+, Firefox, Chrome, Opera, Safari
    xmlhttp = new XMLHttpRequest();
  }
  else if (window.ActiveXObject)
  {
    // code for IE6, IE5
    xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
  }
  else
  {
    alert("Your browser does not support XMLHTTP!");
    return;
  }
  xmlhttp.onreadystatechange = function()
  {
    // readyState 4 = request finished, status 200 = OK
    if (xmlhttp.readyState == 4 && xmlhttp.status == 200)
    {
      document.myForm.time.value = xmlhttp.responseText;
    }
  };
  xmlhttp.open("GET", "time.asp", true);
  xmlhttp.send(null);
}
</script>

Friday, April 7, 2017

Hard drive layout and boot requirements

How does software boot from hard drive ?
What are the requirements to boot any operating system from a hard drive ?
What do MBR, FAT, cluster, sector ... mean ?

 

MBR, the cornerstone

The Master Boot Record is a disk area located at:
  • Logical address 0 (if the hard drive uses LBA)
  • Cylinder 0, Head 0, Sector 1 (if the hard drive uses CHS addressing. Used by old hard drives)
which contains the OS boot code (i.e. bootstrap) and a partition table.
This memory area has a size of 512 bytes.
[Figure: mbr.gif]
Basically, if anything happens to this area, the hard drive is brain dead.
Usually, the machine executes the bootstrap, which is the 1st level bootloader: it scans the partition table, finds the active partition and jumps to its first sector, where the 2nd level bootloader shall be located. Others, like embedded systems where the CPU has its own 1st level bootloader, read the partition table directly and achieve the same task.
[Figure: partitionentry.jpg]
Every partition has a beginning address, an ending address and a size given in sectors.
By reading this table, a machine locates the active partition and jumps to its starting address, where another boot record is located. Basically, a boot record is present at the first sector of each partition.
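
To illustrate how the table is read, here is a sketch in Python that decodes the 4 partition entries of an MBR sector. It relies on the classic MBR layout (entries of 16 bytes starting at offset 446, status byte 0x80 for the active partition, little-endian LBA fields). The function name and the sample sector are made up for the example.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446  # 4 entries of 16 bytes, right before the signature
ENTRY_SIZE = 16


def parse_mbr(sector):
    """Return the used partition entries of a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xAA":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for index in range(4):
        offset = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        # status, CHS start (skipped), type, CHS end (skipped), LBA start, size
        status, kind, lba_start, sector_count = struct.unpack_from(
            "<B3xB3xII", sector, offset)
        if kind == 0:
            continue  # unused entry
        partitions.append({
            "index": index,
            "active": status == 0x80,
            "type": kind,
            "lba_start": lba_start,
            "sector_count": sector_count,
        })
    return partitions


# Made-up sector with a single active FAT32 (type 0x0C) partition
sample = bytearray(SECTOR_SIZE)
struct.pack_into("<B3xB3xII", sample, PARTITION_TABLE_OFFSET, 0x80, 0x0C, 2048, 1000)
sample[510:512] = b"\x55\xAA"
print(parse_mbr(bytes(sample)))
```

On a real disk, the sector would be read from the raw device (the first 512 bytes) instead of being built by hand.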

 

Wait, a sector ? What is this ?

When speaking about hard drives, the same vocabulary is always used:
  • sector: amount of bytes (almost always 512 bytes for compatibility purposes)
  • cluster: amount of sectors. This number depends on the disk format settings and can be read from the boot record.
Note: When working with non-volatile memories like eMMC, another unit is sometimes used : the block. If the memory is formatted as a bootable hard drive, a block equals a sector.
The cluster is the unit used by file systems.
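To make these units concrete, here is a small sketch of the arithmetic involved. The helper names are hypothetical, and the geometry used in the usage note (16 heads, 63 sectors per track) is only an assumed example. The CHS to LBA conversion is the usual formula, with sectors numbered from 1.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a CHS address to a logical block address (sectors start at 1)."""
    if sector < 1:
        raise ValueError("CHS sector numbers start at 1")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def cluster_size_bytes(bytes_per_sector, sectors_per_cluster):
    """Size of one cluster, the allocation unit of the file system."""
    return bytes_per_sector * sectors_per_cluster
```

With this formula, Cylinder 0, Head 0, Sector 1 maps to logical address 0 whatever the geometry is, which is why both addresses given for the MBR designate the same sector. A disk formatted with 512-byte sectors and 8 sectors per cluster allocates space in units of 4096 bytes.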

 

FAT

The File Allocation Table (FAT) is one of the disk formats used by computers (old Windows machines, boot disks for some Linux machines, ...).
There are 3 types of FAT that are commonly used : FAT12, FAT16 and FAT32.
Here is an overview of the layout of a FAT disk:
fatlayout.jpg
One of the biggest differences between FAT32 and the older FAT layouts is that the root directory is now part of the data area.

 

OK about FAT layout, but where is the MBR ?

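In the reserved area, the first sector of a FAT volume is ... a boot record ! On a partitioned drive this is the boot record of the partition rather than the MBR of the whole drive, but it ends with the same 55h AAh signature.
Here are the settings that are stored in the boot record of a FAT12 disk: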
Bytes Description
0-2 Jump to bootstrap. Legacy bytes, almost always equal to 0xEB 0x3C 0x90, where 0xEB is the short jump instruction for x86 machines (sometimes 0xE9, the near jump, is used instead)
3-10 OEM Name / Version (string)
11-12 Number of bytes per sector (512,1024,2048,4096)
13 Number of sectors per cluster (1,2,4,8,16,32,64,128)
14-15 Number of reserved sectors (FAT32 -> 32, others -> 1)
16 Number of FAT copies
17-18 Number of root directory entries (0 for FAT32, 512 recommended for FAT16)
19-20 Total number of sectors in the filesystem (N/A for FAT32)
21 Media descriptor type (0xF0: Floppy disk, 0xF8: Hard disk)
22-23 Number of sectors per FAT (0 for FAT32)
24-25 Number of sectors per track
26-27 Number of heads (2 for a double sided floppy disk)
28-29 Number of hidden sectors
30-509 Bootstrap
510-511 Signature, always equal to 55h AAh (signature of the MBR)
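As an illustration, the fields at bytes 11 to 27, which are shared by all three FAT types, can be decoded with Python's struct module. This is a sketch under the assumption that all multi-byte values are little-endian; the field names are my own and do not come from any library.

```python
import struct
from collections import namedtuple

# Hypothetical field names for bytes 11-27 of the boot record.
BootParams = namedtuple(
    "BootParams",
    "bytes_per_sector sectors_per_cluster reserved_sectors fat_copies "
    "root_entries total_sectors media sectors_per_fat sectors_per_track heads",
)


def parse_boot_params(sector: bytes) -> BootParams:
    """Decode bytes 11-27 of a FAT boot record (little-endian values)."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a boot record: 55h AAh signature missing")
    # H = 2 bytes, B = 1 byte: 17 bytes in total, from offset 11 to 27.
    return BootParams(*struct.unpack_from("<HBHBHHBHHH", sector, 11))
```

On a FAT32 volume, root_entries and sectors_per_fat come back as 0 and the real values have to be read from the FAT32 extensions.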
Here are the FAT16 extensions:
Bytes Description
11-27 Identical
28-31 Number of hidden sectors
32-35 Total number of sectors in the filesystem (replaces byte 19-20 of FAT12)
36 Logical drive number
37 Reserved
38 Extended signature
39-42 Serial Number of partition
43-53 Volume label or no name (string)
54-61 File system type ("FAT12 ", "FAT16 ", "FAT "), padded with spaces to 8 characters
62-509 Bootstrap
510-511 Signature 55h AAh
Here are the FAT32 extensions:
Bytes Description
11-35 Identical
36-39 Sectors per FAT
40-41 Mirror flags (Bits 0-3: number of the active FAT, Bits 4-6: reserved, Bit 7: 1=single active FAT, 0=all FATs are updated at runtime)
42-43 Filesystem version
44-47 First cluster of root directory (usually 2). This is the important improvement of FAT32: the root directory is no longer stored at a fixed place with a fixed size, so it can grow.
48-49 Filesystem information sector (FSINFO) in reserved area (usually 1)
50-51 Backup boot sector location (can be 0)
52-63 Reserved
64 Logical drive number
65 Reserved
66 Extended signature
67-70 Serial number of partition
71-81 Volume label
82-89 Filesystem type ("FAT32 ")
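Once these fields are known, the position of each region of the FAT layout can be computed. The sketch below uses hypothetical function names and assumes 32-byte directory entries and cluster numbering starting at 2. All sector numbers are relative to the boot record of the volume. For FAT32, pass the "Sectors per FAT" value read from bytes 36-39.

```python
def fat_regions(reserved_sectors, fat_copies, sectors_per_fat,
                root_entries, bytes_per_sector=512):
    """Return the first sector of each region, relative to the boot record."""
    fat_start = reserved_sectors
    root_dir_start = fat_start + fat_copies * sectors_per_fat
    # Each directory entry is 32 bytes; round up to whole sectors.
    # On FAT32 root_entries is 0, so the fixed root directory region is empty.
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    data_start = root_dir_start + root_dir_sectors
    return {
        "fat_start": fat_start,
        "root_dir_start": root_dir_start,
        "root_dir_sectors": root_dir_sectors,
        "data_start": data_start,
    }


def first_sector_of_cluster(cluster, data_start, sectors_per_cluster):
    """Clusters are numbered from 2, so cluster 2 is the start of the data area."""
    if cluster < 2:
        raise ValueError("data clusters are numbered from 2")
    return data_start + (cluster - 2) * sectors_per_cluster
```

For a classic 1.44 MB floppy (1 reserved sector, 2 FATs of 9 sectors, 224 root entries), this places the FAT at sector 1, the root directory at sector 19 and the data area at sector 33.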

The FAT area

A FAT is a table with one entry per cluster. Each entry has a size of:
  • 12 bits for FAT12
  • 16 bits for FAT16
  • 32 bits for FAT32
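Each entry specifies whether the cluster is free, whether it is the last cluster of a file, or which cluster follows it in the file.
The first 2 entries of a FAT are reserved and do not describe any data cluster, which is why cluster numbering starts at 2. The first one contains 0xFFFFFFF8, where 0xF8 is the media descriptor, while the other one contains the end of file marker (0xFFFFFFFF). These are the FAT32 values; only the low 28 bits of a FAT32 entry are significant.
The FAT area is split in two, FAT #1 followed by FAT #2, where FAT #2 is a mirror of FAT #1.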
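Reading a file therefore means following a chain of entries until the end of file marker is reached. Here is a minimal sketch for FAT16, where each entry is a little-endian 16-bit value. The function names are hypothetical; the sketch assumes that values from 0xFFF8 upwards mark the end of a chain and that 0xFFF7 marks a bad cluster.

```python
import struct

FAT16_BAD = 0xFFF7  # assumption: standard FAT16 bad cluster marker
FAT16_EOC = 0xFFF8  # values from 0xFFF8 to 0xFFFF end a chain


def fat16_entry(fat: bytes, cluster: int) -> int:
    """Read the 16-bit FAT entry that describes the given cluster."""
    return struct.unpack_from("<H", fat, cluster * 2)[0]


def cluster_chain(fat: bytes, first_cluster: int):
    """Return the list of clusters used by a file, in order."""
    chain = []
    cluster = first_cluster
    while cluster < FAT16_EOC:
        # A free entry, a bad cluster or a loop means the chain is corrupted.
        if cluster < 2 or cluster == FAT16_BAD or cluster in chain:
            raise ValueError("corrupted cluster chain")
        chain.append(cluster)
        cluster = fat16_entry(fat, cluster)
    return chain
```

The starting cluster comes from the directory entry of the file. FAT12 works the same way, except that two 12-bit entries are packed into three bytes, so the lookup in fat16_entry would have to be adapted.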

 

FSINFO

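The File system information sector is located at logical sector 1 (in the reserved area).
It caches the number of free clusters and the most recently allocated cluster, so that a driver does not have to scan the whole FAT to find free space.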
Offset Description Size
00h First Signature (52h 52h 61h 41h) 1 Double Word
04h Unknown, Currently (Might just be Null) 480 Bytes
1E4h Signature of FSInfo Sector (72h 72h 41h 61h) 1 Double Word
1E8h Number of Free Clusters (Set to -1 if Unknown) 1 Double Word
1ECh Cluster Number of Cluster that was Most Recently Allocated. 1 Double Word
1F0h Reserved 12 Bytes
1FCh Unknown or Null 2 Bytes
1FEh Boot Record Signature (55h AAh) 2 Bytes
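The table above translates quite directly into code. This is only a sketch with hypothetical names: it checks the three signatures and returns the two counters, reporting None when a field is set to -1 (0xFFFFFFFF), meaning unknown.

```python
import struct

UNKNOWN = 0xFFFFFFFF  # -1 stored as an unsigned double word


def parse_fsinfo(sector: bytes):
    """Decode the free cluster count and last allocated cluster of FSINFO."""
    if len(sector) < 512:
        raise ValueError("an FSINFO sector is 512 bytes long")
    if sector[0x000:0x004] != b"\x52\x52\x61\x41":
        raise ValueError("first signature missing")
    if sector[0x1E4:0x1E8] != b"\x72\x72\x41\x61":
        raise ValueError("FSInfo signature missing")
    if sector[0x1FE:0x200] != b"\x55\xaa":
        raise ValueError("boot record signature missing")
    free_clusters, last_allocated = struct.unpack_from("<II", sector, 0x1E8)
    return {
        "free_clusters": None if free_clusters == UNKNOWN else free_clusters,
        "last_allocated": None if last_allocated == UNKNOWN else last_allocated,
    }
```

These values are only hints: a driver should be ready to fall back to scanning the FAT when they are unknown or look inconsistent.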

Root directory

This area contains an entry for each file located at the root of the filesystem. Every entry has the following properties:
Offset Description Size
00h Filename 8 bytes
08h Filename extension 3 bytes
0Bh File attributes 1 byte
0Ch Reserved 10 bytes
16h Time created or last updated 2 bytes
18h Date created or last updated 2 bytes
1Ah Starting cluster for this file 2 bytes
1Ch File size in bytes 4 bytes
A subdirectory is a file with specific attributes (byte 0Bh).
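To illustrate, here is a sketch that decodes one 32-byte entry following the table above. The names are hypothetical, and it relies on a few details that the table does not spell out: the directory flag is bit 4 (10h) of the attributes byte, the time packs hours, minutes and seconds divided by 2 into 5, 6 and 5 bits, and the date packs the year since 1980, the month and the day into 7, 4 and 5 bits.

```python
import struct

ATTR_DIRECTORY = 0x10  # assumption: bit 4 of the attributes byte


def parse_dir_entry(raw: bytes):
    """Decode one 32-byte directory entry."""
    name, ext, attr, _reserved, time, date, cluster, size = struct.unpack(
        "<8s3sB10sHHHI", raw
    )
    base = name.decode("ascii", "replace").rstrip()
    suffix = ext.decode("ascii", "replace").rstrip()
    return {
        "name": base + "." + suffix if suffix else base,
        "is_directory": bool(attr & ATTR_DIRECTORY),
        # hours, minutes, seconds (stored with a 2 second resolution)
        "time": (time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2),
        # year, month, day
        "date": (1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F),
        "first_cluster": cluster,
        "size": size,
    }
```

The first_cluster value is the entry point into the FAT: from there, the cluster chain gives every cluster that belongs to the file.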

 