High Availability Practices Flashcards
To build knowledge of how to keep applications highly available
What can affect availability
System Maintenance, Software Updates, Infrastructure issues, Malicious Attacks, System load and dependencies. Additionally, in the cloud, latency and provider issues.
How is availability measured
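Availability is typically expressed as a percentage of uptime, committed to in an SLA and described by its number of 9s. For example, five 9s means 99.999%, which allows about 5.26 minutes of downtime per year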
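The arithmetic behind the 9s can be sketched as follows. The function name and the 365-day year are assumptions made for illustration only.

```python
def allowed_downtime_minutes(availability_percent: float, days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, over a period of `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# Print the yearly downtime budget for two to five 9s.
for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes per year")
```

Each extra 9 cuts the downtime budget by a factor of ten.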
How do you monitor availability
Create a Health Check Endpoint
What should a health check endpoint monitor
Subsystems like storage, databases and third-party dependencies
What should a health check endpoint return, and should it be secured
An HTTP status code (for example, 200 when healthy) plus content describing the state of each checked component. Yes, it should be secured
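A minimal sketch of a health check that returns a status code plus content, assuming a shared-secret token for access control. The function names, the token scheme, and the use of 503 for an unhealthy result are illustrative assumptions, not a prescribed design.

```python
import hmac
import json
from typing import Callable, Dict, Tuple


def run_health_check(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each subsystem check and return (status_code, json_body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception as exc:  # a crashing check counts as unhealthy
            results[name] = f"error: {exc}"
    healthy = all(state == "ok" for state in results.values())
    body = json.dumps({"status": "healthy" if healthy else "unhealthy",
                       "checks": results})
    return (200 if healthy else 503), body


def handle_health_request(token: str, expected_token: str,
                          checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Reject callers without the (hypothetical) shared token, then run the checks."""
    if not hmac.compare_digest(token, expected_token):
        return 401, json.dumps({"error": "unauthorized"})
    return run_health_check(checks)
```

In practice the checks would probe storage, databases and third-party dependencies, and the function would sit behind whatever web framework the application uses.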
What are some methods that can be employed to ensure high availability
Queues/Streams and Throttling
How can throttling be employed
Set a limit on each individual user's access, monitor usage metrics, and reject requests once the limit is exceeded
Disable or degrade nonessential services so that critical services can function, for example, a video call can switch to audio only during bandwidth issues
Prioritize certain users to satisfy high impact customers’ requirements
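A minimal sketch of per-user throttling with higher limits for priority users, assuming a fixed time window. The class name, parameters, and the fixed-window approach are illustrative choices; token buckets and sliding windows are common alternatives.

```python
import time


class FixedWindowThrottle:
    """Allow each user a fixed number of requests per time window."""

    def __init__(self, default_limit, window_seconds, priority_limits=None,
                 clock=time.monotonic):
        self.default_limit = default_limit
        self.window_seconds = window_seconds
        self.priority_limits = priority_limits or {}  # user_id -> higher limit
        self.clock = clock                            # injectable for testing
        self._windows = {}                            # user_id -> (window_start, count)

    def allow(self, user_id):
        """Return True if the request is accepted, False if it is rejected."""
        now = self.clock()
        limit = self.priority_limits.get(user_id, self.default_limit)
        start, count = self._windows.get(user_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window expired; start a new one
        if count >= limit:
            self._windows[user_id] = (start, count)
            return False
        self._windows[user_id] = (start, count + 1)
        return True
```

A caller would typically answer a rejected request with HTTP 429 so the client knows to back off.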
How can a queue be employed
Introduce a Queue between the task and service
The tasks are placed in the Queue, and the service pulls them off at its own pace
In more advanced implementations, the service can be autoscaled based on Queue size
If a response is expected, the service must provide a mechanism for returning it. This pattern isn’t suitable when low-latency responses are required
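The queue steps above can be sketched with an in-process queue and a single worker thread. This is a toy stand-in for a real message broker; the function name, the `None` sentinel, and the bounded queue size are assumptions for illustration.

```python
import queue
import threading


def run_with_queue(tasks, handler, max_queue_size=100):
    """Feed tasks through a bounded queue to one worker and collect the results.

    Assumes `handler` does not raise and that no task is None
    (None is used as the shutdown sentinel).
    """
    task_queue = queue.Queue(maxsize=max_queue_size)
    results = []

    def worker():
        # The service drains the queue at its own pace.
        while True:
            task = task_queue.get()
            if task is None:
                break
            results.append(handler(task))

    service = threading.Thread(target=worker)
    service.start()
    for task in tasks:
        task_queue.put(task)  # blocks while the queue is full (back-pressure)
    task_queue.put(None)      # tell the worker to stop
    service.join()
    return results
```

The producer never calls the service directly, so a burst of tasks fills the queue instead of overloading the service.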
What are some resiliency patterns
Bulkhead, Circuit Breaker, Compensating Transaction, Retry, Leader Election, Scheduler Agent Supervisor. If on AWS: Multiserver Pattern, MultiDatacenter Pattern, Floating IP
What is the bulkhead resiliency pattern
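Partition services into groups and limit each group to its own pool of resources, so a failure in one group cannot exhaust the others. Define the partitions around business and technical requirements; for example, high-priority customers get more resources. Leverage libraries like Polly or Hystrix, or container resource limits, to enforce the partitions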
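A minimal sketch of a bulkhead, assuming each partition is capped by a count of concurrent calls. The class and exception names are hypothetical and do not mirror the Polly or Hystrix APIs.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a partition has no free slots."""


class Bulkhead:
    """Cap the number of concurrent calls allowed inside one partition."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Reject immediately rather than queueing, so one overloaded
        # partition cannot tie up callers that belong to other partitions.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"partition '{self.name}' is full")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()


# Separate pools: high-priority customers get more slots than standard ones.
standard_pool = Bulkhead("standard", max_concurrent=1)
premium_pool = Bulkhead("premium", max_concurrent=2)
```

Because each pool has its own semaphore, exhausting `standard_pool` leaves `premium_pool` untouched.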
What is the circuit breaker resiliency pattern
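When calls to a service keep failing, the caller stops sending requests to it for a period (the circuit opens) and fails fast instead, so the failing service does not drag down the application. After a timeout, a trial request is allowed through; if it succeeds, normal calls resume.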
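A minimal, single-threaded sketch of a circuit breaker with closed, open and half-open behaviour. The class name, thresholds, and timeout values are illustrative assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is refused because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock        # injectable for testing
        self.failures = 0
        self.opened_at = None     # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```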
What is the compensating transaction resiliency pattern
Records each step of a workflow and, if a later step fails, undoes the completed steps by running a compensating action for each.
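A minimal sketch of a compensating transaction, assuming each workflow step is supplied together with the action that undoes it. The function name and the (action, compensation) pair format are assumptions for illustration.

```python
def run_workflow(steps):
    """Run (action, compensation) pairs in order.

    If any action raises, run the compensations of the steps that already
    completed, in reverse order, then re-raise the original error.
    """
    completed = []
    try:
        for action, compensation in steps:
            action()
            completed.append(compensation)
    except Exception:
        for compensation in reversed(completed):
            compensation()
        raise
```

This sketch assumes the compensations themselves succeed; real implementations usually make them idempotent so they can be retried.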
What is the retry resiliency pattern
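Retry a failed call to a service when the failure is likely to be transient, waiting between attempts (for example, with increasing delays) and giving up after a bounded number of tries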
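A minimal sketch of retrying with exponential backoff and random jitter. The function name, default values, and the choice of which exceptions count as transient are assumptions for illustration.

```python
import random
import time


def retry(func, max_attempts=4, base_delay=0.5, max_delay=8.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func, retrying on the exceptions in `retry_on` with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts; surface the last error
            # Double the delay ceiling each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

Errors outside `retry_on` propagate immediately, since retrying a non-transient failure only adds load.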
What is the leader election resiliency pattern
A single task instance is elected as leader and coordinates the actions of the other, subordinate instances. If the leader fails, a new leader is elected.
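One common way to elect a leader is a lease that instances compete to hold. The sketch below assumes that approach and uses an in-memory object as a stand-in for a shared store; the class and method names are hypothetical, and a real system would need an atomic operation in a store that all instances can reach.

```python
import threading


class LeaseStore:
    """In-memory stand-in for a shared store holding a single leader lease."""

    def __init__(self):
        self._lock = threading.Lock()
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, instance_id, now, ttl):
        """Acquire or renew the lease; return True if instance_id is now leader."""
        with self._lock:
            lease_free = self.holder is None or now >= self.expires_at
            if lease_free or self.holder == instance_id:
                self.holder = instance_id
                self.expires_at = now + ttl
                return True
            return False
```

The leader renews the lease before it expires. If the leader stops renewing, the lease lapses and the next instance to call `try_acquire` takes over.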