Observability patterns Flashcards
Observability patterns
- Log aggregation
- Application metrics
- Audit logging
- Distributed tracing
- Exception tracking
- Health check API
- Log deployments and changes
Log aggregation: contexts
You have applied the Microservice architecture pattern. The application consists of multiple services and service instances that are running on multiple machines. Requests often span multiple service instances.
Each service instance writes information about what it is doing to a log file in a standardized format. The log file contains errors, warnings, information and debug messages.
Log aggregation: problem
How to understand the behavior of an application and troubleshoot problems?
Log aggregation: forces
Any solution should have minimal runtime overhead
Log aggregation: solution
Use a centralized logging service that aggregates logs from each service instance. The users can search and analyze the logs. They can configure alerts that are triggered when certain messages appear in the logs.
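Aggregated logs are easier to search when every instance writes them in one machine-readable format. A minimal sketch, assuming one JSON object per line is the chosen format: the field names, the `JsonFormatter` class and the service name `order-service` are illustrative assumptions, and the in-memory buffer stands in for the log file that a log shipper would forward to the centralized service.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str, instance: str):
        super().__init__()
        self.service = service
        self.instance = instance

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": self.service,    # which service wrote the line
            "instance": self.instance,  # which instance of that service
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(service: str, instance: str, stream) -> logging.Logger:
    """Create a logger that writes standardized JSON lines to `stream`."""
    logger = logging.getLogger(f"{service}.{instance}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service, instance))
    logger.handlers = [handler]
    return logger


# Stand-in for the log file that would be shipped to the aggregator.
buffer = io.StringIO()
log = build_logger("order-service", "instance-1", buffer)
log.warning("payment gateway slow: %d ms", 950)
```

Because every line carries the service and instance names, the aggregator can filter by either one without parsing free-form text.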
Log aggregation: examples
AWS CloudWatch
Log aggregation: result issue
Handling a large volume of logs requires substantial infrastructure
Log aggregation: related
Distributed tracing - include the external request id in each log message
Exception tracking - as well as logging exceptions, report them to an exception tracking service
Application metrics: contexts
You have applied the Microservice architecture pattern.
Application metrics: problem
How to understand the behavior of an application and troubleshoot problems?
Application metrics: forces
Any solution should have minimal runtime overhead
Application metrics: solution
Instrument a service to gather statistics about individual operations. Aggregate metrics in a centralized metrics service, which provides reporting and alerting. There are two models for aggregating metrics:
- push - the service pushes metrics to the metrics service
- pull - the metrics service pulls metrics from the service
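The two models can share the same instrumentation and differ only in who initiates the transfer. A minimal sketch, assuming an in-process registry: `MetricsRegistry`, `timed`, `render` and `push` are hypothetical names, and the text produced by `render` is a plain format made up for this sketch, not the format of any particular metrics service.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict


class MetricsRegistry:
    """Collects a call count and total duration per operation."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.total_seconds = defaultdict(float)

    @contextmanager
    def timed(self, operation: str):
        """Time one operation and record it, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.counts[operation] += 1
            self.total_seconds[operation] += time.perf_counter() - start

    def snapshot(self) -> Dict[str, dict]:
        return {
            op: {"count": self.counts[op], "total_seconds": self.total_seconds[op]}
            for op in self.counts
        }

    def render(self) -> str:
        """Pull model: the service exposes this text and the metrics service fetches it."""
        lines = []
        for op, stats in sorted(self.snapshot().items()):
            lines.append(f'{op}_count {stats["count"]}')
            lines.append(f'{op}_seconds_total {stats["total_seconds"]:.6f}')
        return "\n".join(lines)

    def push(self, send: Callable[[dict], None]) -> None:
        """Push model: the service hands its metrics to the metrics service."""
        send(self.snapshot())


registry = MetricsRegistry()
for _ in range(2):
    with registry.timed("create_order"):
        pass  # the business operation would run here

received = []                    # stand-in for the metrics service
registry.push(received.append)   # push model
exposed = registry.render()      # pull model: what a scrape would read
```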
Application metrics: example
Instrumentation libraries:
- Coda Hale/Yammer Java Metrics Library
- Prometheus client libraries
Metrics aggregation services:
- Prometheus
- AWS CloudWatch
Application metrics: result benefits
It provides deep insight into application behavior
Application metrics: result drawbacks
Metrics code is intertwined with the business logic, which makes the business logic more complicated
Application metrics: result issues
Aggregating metrics can require significant infrastructure
Audit logging: context
You have applied the Microservice architecture pattern.
Audit logging: problem
How to understand the behavior of users and the application and troubleshoot problems?
Audit logging: forces
It is useful to know what actions a user has recently performed: customer support, compliance, security, etc.
Audit logging: solution
Record user activity in a database.
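A minimal sketch of this solution, assuming a single SQLite table is an acceptable audit store: the table name `audit_log`, its columns, the helper functions and the sample user id are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the audit database and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audit_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               occurred_at TEXT NOT NULL,
               user_id TEXT NOT NULL,
               action TEXT NOT NULL,
               detail TEXT
           )"""
    )
    return conn


def record_action(conn: sqlite3.Connection, user_id: str, action: str, detail: str = "") -> None:
    """Append one user action to the audit log."""
    conn.execute(
        "INSERT INTO audit_log (occurred_at, user_id, action, detail) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), user_id, action, detail),
    )
    conn.commit()


def recent_actions(conn: sqlite3.Connection, user_id: str, limit: int = 10):
    """Return the user's most recent actions, newest first."""
    rows = conn.execute(
        "SELECT action, detail FROM audit_log WHERE user_id = ? ORDER BY id DESC LIMIT ?",
        (user_id, limit),
    )
    return rows.fetchall()


conn = open_audit_db()
record_action(conn, "user-42", "login")
record_action(conn, "user-42", "change_email", "new address pending confirmation")
```

Calling `record_action` from inside business methods is exactly the intertwining that this pattern's drawback describes.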
Audit logging: result benefits
Provides a record of user actions
Audit logging: result drawbacks
The auditing code is intertwined with the business logic, which makes the business logic more complicated
Audit logging: related
Event Sourcing is a reliable way to implement auditing
Distributed tracing: context
You have applied the Microservice architecture pattern. Requests often span multiple services. Each service handles a request by performing one or more operations, e.g. querying a database or publishing messages.
Distributed tracing: problem
How to understand the behavior of an application and troubleshoot problems?
Distributed tracing: forces
- External monitoring only tells you the overall response time and number of invocations - no insight into the individual operations
- Any solution should have minimal runtime overhead
- Log entries for a request are scattered across numerous logs
Distributed tracing: solution
Instrument services with code that
- Assigns each external request a unique external request id
- Passes the external request id to all services that are involved in handling the request
- Includes the external request id in all log messages
- Records information (e.g. start time, end time) about the requests and operations performed when handling an external request in a centralized service
This instrumentation might be part of the functionality provided by a Microservice Chassis framework.
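The four instrumentation steps can be sketched in a few functions. This is an illustration under stated assumptions, not a tracing library: the header name `X-Request-Id`, the helper names and the in-memory `recorded_spans` list (a stand-in for the centralized service) are all hypothetical.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager
from typing import Dict, List

REQUEST_ID_HEADER = "X-Request-Id"  # assumed header name for this sketch
_request_id = contextvars.ContextVar("request_id", default=None)

recorded_spans: List[dict] = []  # stand-in for the centralized service


def start_request(incoming_headers: Dict[str, str]) -> str:
    """Reuse the caller's external request id, or assign a new one at the edge."""
    request_id = incoming_headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    _request_id.set(request_id)
    return request_id


def outgoing_headers() -> Dict[str, str]:
    """Headers to attach to calls to other services."""
    return {REQUEST_ID_HEADER: _request_id.get()}


def log_line(message: str) -> str:
    """Prefix a log message with the external request id."""
    return f"[request_id={_request_id.get()}] {message}"


@contextmanager
def span(operation: str):
    """Record the start and end time of one operation."""
    start = time.time()
    try:
        yield
    finally:
        recorded_spans.append({
            "request_id": _request_id.get(),
            "operation": operation,
            "start": start,
            "end": time.time(),
        })


# Edge service: no incoming id, so a new one is assigned.
edge_id = start_request({})
with span("query_orders"):
    edge_line = log_line("loading orders")

# Downstream service: receives the id through the propagated headers.
downstream_id = start_request(outgoing_headers())
```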
Distributed tracing: result benefits
- It provides useful insight into the behavior of the system including the sources of latency
- It enables developers to see how an individual request is handled by searching across aggregated logs for its external request id
Distributed tracing: result issues
Aggregating and storing traces can require significant infrastructure
Distributed tracing: related
Log aggregation - the external request id is included in each log message
Exception tracking: context
You have applied the Microservice architecture pattern. The application consists of multiple services and service instances that are running on multiple machines. Errors sometimes occur when handling requests. When an error occurs, a service instance throws an exception, which contains an error message and a stack trace.
Exception tracking: problem
How to understand the behavior of an application and troubleshoot problems?
Exception tracking: forces
- Exceptions must be de-duplicated, recorded, investigated by developers and the underlying issue resolved
- Any solution should have minimal runtime overhead
Exception tracking: solution
Report all exceptions to a centralized exception tracking service that aggregates and tracks exceptions and notifies developers.
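De-duplication is the part of this solution that benefits most from an example. A minimal sketch, assuming exceptions are grouped by service, exception type and the location of the innermost stack frame: `ExceptionTracker` is a hypothetical in-memory stand-in for the tracking service, and the fingerprint rule is one possible choice among many.

```python
import hashlib
import traceback
from collections import Counter


class ExceptionTracker:
    """In-memory stand-in for a centralized exception tracking service."""

    def __init__(self, notify):
        self.occurrences = Counter()
        self.notify = notify  # called once per new kind of exception

    def report(self, service: str, exc: BaseException) -> str:
        """Record an exception and return the fingerprint it was grouped under."""
        frames = traceback.extract_tb(exc.__traceback__)
        location = f"{frames[-1].filename}:{frames[-1].lineno}" if frames else "unknown"
        key = f"{service}|{type(exc).__name__}|{location}"
        fingerprint = hashlib.sha256(key.encode()).hexdigest()[:12]
        self.occurrences[fingerprint] += 1
        if self.occurrences[fingerprint] == 1:
            # Notify developers only the first time this fingerprint is seen.
            self.notify(f"new exception in {service}: {type(exc).__name__}: {exc}")
        return fingerprint


notifications = []
tracker = ExceptionTracker(notifications.append)


def charge_card(amount):
    raise ValueError(f"card declined for amount {amount}")


fingerprints = []
for amount in (10, 20):
    try:
        charge_card(amount)
    except ValueError as exc:
        fingerprints.append(tracker.report("payment-service", exc))
```

Both failures are raised from the same line, so they share a fingerprint and produce a single notification with an occurrence count of two.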
Exception tracking: result benefits
It is easier to view exceptions and track their resolution
Exception tracking: result drawbacks
The exception tracking service is additional infrastructure
Exception tracking: related
Log aggregation - exceptions should be logged as well as reported to a tracking service
Health Check API: context
You have applied the Microservice architecture pattern. Sometimes a service instance can be incapable of handling requests yet still be running. For example, it might have run out of database connections. When this occurs, the monitoring system should generate an alert. Also, the load balancer or service registry should not route requests to the failed service instance.
Health Check API: problem
How to detect that a running service instance is unable to handle requests?
Health Check API: forces
- An alert should be generated when a service instance fails
- Requests should be routed to working service instances
Health Check API: solution
A service has a health check API endpoint (e.g. HTTP /health) that returns the health of the service. The API endpoint handler performs various checks, such as:
- the status of the connections to the infrastructure services used by the service instance
- the status of the host, e.g. disk space
- application specific logic
A health check client - a monitoring service, service registry or load balancer - periodically invokes the endpoint to check the health of the service instance.
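The endpoint handler can be sketched as a function that runs every registered check and summarizes the result. This is a sketch under assumptions: the names `handle_health` and `disk_space_ok`, the `UP`/`DOWN` labels, and the choice of HTTP status 200 for healthy and 503 for unhealthy are conventions picked for the example, not something the pattern prescribes.

```python
import shutil
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


def disk_space_ok(path: str = ".", min_free_bytes: int = 1) -> bool:
    """Host check: is there at least `min_free_bytes` free on the disk?"""
    return shutil.disk_usage(path).free >= min_free_bytes


def handle_health(checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Run every check and return (HTTP status code, response body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "UP" if check() else "DOWN"
        except Exception:
            # A check that raises counts as a failed check.
            results[name] = "DOWN"
    healthy = all(state == "UP" for state in results.values())
    body = {"status": "UP" if healthy else "DOWN", "checks": results}
    return (200 if healthy else 503), body


def pool_exhausted() -> bool:
    """Simulates an instance that has run out of database connections."""
    raise ConnectionError("no database connections available")


ok_code, ok_body = handle_health({"database": lambda: True, "app": lambda: True})
bad_code, bad_body = handle_health({"database": pool_exhausted, "app": lambda: True})
```

A health check client would call the endpoint on a schedule and treat any non-200 response as a signal to alert or to stop routing requests to the instance.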
Health Check API: result benefits
The health check endpoint enables the health of a service instance to be periodically tested
Health Check API: result drawbacks
The health check might not be sufficiently comprehensive, or the service instance might fail between health checks, so requests might still be routed to a failed service instance
Health Check API: related
Service registry - the service registry invokes the health check endpoint
Log deployments and changes: context
You have applied the Microservice architecture pattern.
Log deployments and changes: problem
How to understand the behavior of an application and troubleshoot problems?
Log deployments and changes: forces
It is useful to see when deployments and other changes occur, since issues usually occur immediately after a change
Log deployments and changes: solution
Log every deployment and every change to the (production) environment.
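A minimal sketch of how such a log supports troubleshooting, assuming changes are kept as timestamped events that can be queried by time window. `ChangeEvent`, `log_change`, `changes_before` and the sample entries (service name, version number, timestamps) are hypothetical illustration data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class ChangeEvent:
    occurred_at: datetime
    service: str
    description: str


change_log: List[ChangeEvent] = []  # stand-in for a durable change log


def log_change(service: str, description: str,
               occurred_at: Optional[datetime] = None) -> ChangeEvent:
    """Record a deployment or other change to the production environment."""
    event = ChangeEvent(occurred_at or datetime.now(timezone.utc), service, description)
    change_log.append(event)
    return event


def changes_before(issue_time: datetime, window: timedelta) -> List[ChangeEvent]:
    """Return the changes made within `window` before an issue started."""
    return [e for e in change_log
            if issue_time - window <= e.occurred_at <= issue_time]


issue_started = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
log_change("order-service", "deployed version 1.4.2", issue_started - timedelta(hours=6))
log_change("order-service", "raised connection pool size", issue_started - timedelta(minutes=5))

# Which changes happened in the 30 minutes before the issue?
suspects = changes_before(issue_started, timedelta(minutes=30))
```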
Log deployments and changes: result benefits
Enables deployments and changes to be easily correlated with issues, leading to faster resolution