SRE Flashcards
What is an SRE?
Site Reliability Engineer
SREs treat operations as a software problem: software is the primary tool for managing and improving systems (automation).
What is toil?
Toil is mundane, repetitive work that follows a strict pattern and could therefore be automated.
3 good practices for managing toil
1) Set aside dedicated time for reducing/automating toil.
2) Do a cost-benefit analysis to determine whether automating the toil is worth it.
3) Keep a toil reduction backlog.
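One rough way to frame the cost-benefit analysis in item 2 is to compare the hours needed to build the automation with the hours of manual work it saves each week. This is a minimal sketch only; the function name `breakeven_weeks`, its parameters, and the numbers are assumptions made up for illustration.

```python
def breakeven_weeks(build_hours, manual_hours_per_week, upkeep_hours_per_week=0.0):
    """Weeks until the automation pays for itself, or None if it never does."""
    # Net time saved each week once the automation is in place.
    saved_per_week = manual_hours_per_week - upkeep_hours_per_week
    if saved_per_week <= 0:
        return None  # upkeep costs at least as much as the manual work
    return build_hours / saved_per_week


# Hypothetical numbers: 40 h to build, 5 h/week of toil, 1 h/week of upkeep.
weeks = breakeven_weeks(build_hours=40, manual_hours_per_week=5, upkeep_hours_per_week=1)
print(weeks)  # 10.0
```

A short break-even time suggests the automation is worth doing; `None` suggests the item should stay manual or be redesigned.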
What are 3 responsibilities of an SRE?
1) Eliminating toil
2) Working to service levels
3) Managing failure
What is SLA?
Service Level Agreement
It is a contract between a service provider and the users of the service that states the level of service (for example, a target uptime) the provider commits to.
What is SLO?
Service Level Objective
- target goal/objective for the uptime of a service
- service health is defined in terms of multiple SLOs
- service levels are user-focused and based on user experience
- goals have to be actually achievable
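The idea that service health is defined in terms of multiple SLOs can be sketched as a simple check that every objective is met. This is an illustrative sketch, not a standard API; the `SLO` class, the `unmet_slos` helper, and the targets and measurements are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    name: str
    target: float    # required minimum, e.g. 0.999 (assumed value)
    measured: float  # observed value over the same time window


def unmet_slos(slos):
    """Return the names of SLOs whose measured value is below the target."""
    return [s.name for s in slos if s.measured < s.target]


# Hypothetical objectives for one service.
slos = [
    SLO("availability", target=0.999, measured=0.9995),
    SLO("requests served under 300 ms", target=0.95, measured=0.93),
]
missed = unmet_slos(slos)
print("healthy" if not missed else f"unhealthy, missed: {missed}")
```

The service counts as healthy only when the list of missed objectives is empty.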
What is SLI?
Service Level Indicator
- ongoing measurement of the system, used to check that an SLO is being met
- has to be measurable
- for example, the availability/success rate is given by: (number of responses with a status code other than a 5xx server error) / (total number of requests made)
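The availability calculation above can be sketched in a few lines. This is a minimal example under stated assumptions: the `availability` function and the sample status codes are made up, and it counts every 5xx code (not only 500) as a failure.

```python
def availability(status_codes):
    """Fraction of requests that did not end in a server error (5xx)."""
    if not status_codes:
        raise ValueError("no requests in the measurement window")
    # Assumption: any status code of 500 or above counts as a failed request.
    ok = sum(1 for code in status_codes if code < 500)
    return ok / len(status_codes)


# Hypothetical sample: 10 requests, 2 of them server errors.
codes = [200, 200, 404, 500, 200, 503, 200, 200, 301, 200]
sli = availability(codes)
print(f"{sli:.0%}")  # 80%
```

The resulting SLI can then be compared with the SLO target for the same time window.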
What are 3 ways that SREs differ from the traditional dev/ops split?
- SREs review nuances of production; Dev/ops review the entire life cycle.
- SREs focus on standards and metrics; Dev/ops focus on team synthesis.
- SREs focus on a product’s system; Dev/ops focus on the development and delivery.