Thursday, October 5, 2017

Application logs

Logging is a fundamental mechanism for the application. It brings valuable information to support teams when something goes wrong and saves a lot of time in bug investigation. But this utility can easily turn into a "spam engine" if the developers do not carefully select what/when/how much to log.

What do we expect from logs

A support team will look for relevant events or user actions that can explain a troublesome situation. Here is an example:

09:50:05 [INFO] [AppOperation] Mix configured with CMP=Nitrogen, CST=SynthAir, C=245,8 ppm
09:50:05 [INFO] [Solver] Computed flows for (C=2445,8 pp) : MFC1=1200 mlmin, MFC2=2450 mlmin
09:50:06 [INFO] [AppCalibration] Compensation of MFC1=1200 mlmin to MFC1=1202 mlmin
09:50:06 [INFO] [AppCalibration] Compensation of MFC2=2450 mlmin to MFC1=2449 mlmin
09:50:06 [INFO] [Peripherals] Setting MFC1=1200 mlmin
09:50:06 [INFO] [Peripherals] Setting MFC2=2449 mlmin
09:50:06 [INFO] [Peripherals] Setting Hardware state to Mix2Levels
09:50:06 [INFO] [AppOperation] Mix started
09:52:08 [ERROR] [AppAlarm] (Stability Alarm ON) MFC1 flow is not stable (Measured=1015 mlmin)

In this scenario, the user complains about poor results on his device. We suppose that the Alarm was not signaled on a visual indicator to inform the user. With this bunch of logs, developers can immediately:
  • Point the origin of the issue 
  • Get the conditions of the operation that lead to the issue and try to run the same experiment at his office
Obviously, none of these actions will solve the problem but the information that logs gave to the developer will highlight which functional part went wrong, and help him to follow the right track to resolve the client issue.
 It is essential for developers to have the following information in logs:
  • Timestamp 
  • Log level : developers will filter the errors or warnings first to find an obvious reason for the malfunction 
  • Synthetic information : Long phrases are useless: a log must go the point, there is no litterature trophy for well-written logs. Most of the time, use action verbs and parameters. 
  • Workable data : the parameters listed in our logs specify numerical values with a unit. It seems obvious to say that without the unit, the developer cannot investigate. Plus, the location where the error occured is necessary. In our example, the name of classes are displayed. Either do this or use functional area names logged (ex: Config, Security, Alarms, ...). 

Other things to have in mind:
  • Keep files small : do not go over a few MB for text files (I do not include DB here). Opening and parsing a heavy file is long and inefficient. Use several small rolling files instead. 
  • Keep short history : there is no need to keep traces from last week if nothing happened there. This is closely related to file size for text logs.

What to log

  • Application events: notifications, alarms, state changes, ... 
  • Operation-related information : settings update, user actions, ...

What NOT to log

  • Periodic data : if live values need to be logged regularly, put them in a separate log file and preferably go for a DB that can handle large amount of entries better. Developers can then refer to timestamps for investigations. 
  • GUI input errors : if a user enters invalid value, it is not an error for the application, it is an error for the form. As long as the frontend does not interact with the application, whatever happens between the user and the form does not have an impact on the application.

Where to log

A software architecture is built with modules interacting with each others. The business logic of the application is dispatched between these modules. So basically, each functional area of the application shall manage its own logs. To do things neatly, centralize the logging features in a service shared by all components.
Do not repeat information. Take the following example: A calls B, B fails. This is the type of log that we should expect from it:

10:00:01 [INFO] [A] Action A1 started
10:00:01 [INFO] [B] Setting xxx=1234
10:00:01 [ERROR] [A] Action A1 failed (Code=Invalid Setting Error)

and not something like :

10:00:01 [INFO] [A] Action A1 started
10:00:01 [INFO] [B] Setting xxx=1234
10:00:01 [ERROR] [B] Given value is invalid
10:00:01 [ERROR] [A] Action A1 failed (Code=Invalid Setting Error)

The error occured at a known place, so either log it from there or from the client (recommended).

Debugging information

Take the following example of code:

void ExecuteUserRequest()
{
   env = GetEnvironmentValue();
   if(env < THRESHOLD)
   {
      doA();
   }
   else
   {
      doB();
   }
}
Here, myFunc behaves depending on an external factor. Here, we can chose to log the action (doA or doB) with an INFO level and add debugging information to the user to help him understand why this precise action occured :

void ExecuteUserRequest()
{
   env = GetEnvironmentValue();
   LogDebug("env = %d", env);
   if(env < THRESHOLD)
   {
      LogInfo("Executing A");
      doA();
   }
   else if(env >= THRESHOLD)
   {
      LogInfo("Executing B");
      doB();
   }
}
Generally speaking, Debug level gives execution details and exposes implementation data to the developer.

Memo

Here are some basic rules that I use to write my application logs.

Code type Log level
user input error (GUI) No log.
Frequently accessed functions (ex: sensor value read) Debug or no log if we log in a text file. Use a DB if possible.
Implementation detail (subroutines, intermediate values,...) Debug
User action or application event Info
Alarm or abnormal situation that does not result in an error immediately Warn
Operation failure on user action Warn if the error is normal given the context (the product acts as specified). Error otherwise.
Application error (internal non-critical errors or errors managed by the application) Error
Critical non recoverable error (unmanaged exception, errors that lead to a non-functioning application) Fatal

0 comments:

Post a Comment