Final Part 1 Flashcards
Blue/Green Deployment
Lets you launch a new version of your application alongside the old version, then monitor and test the new version before you reroute traffic to it. If issues are detected, you can roll back to the old version.
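A minimal sketch of the idea, not a real load balancer: the class name BlueGreenRouter, the environment names, and the is_healthy callback are all hypothetical. It shows that traffic is rerouted only after the new environment passes a check, and that rolling back is the same swap in reverse.

```python
class BlueGreenRouter:
    """Toy router (hypothetical) that sends all traffic to one environment."""

    def __init__(self, live, idle):
        self.live = live  # environment currently serving customers
        self.idle = idle  # environment running the new version

    def release(self, is_healthy):
        """Reroute traffic to the idle environment only if it passes the check."""
        if is_healthy(self.idle):
            self.live, self.idle = self.idle, self.live
            return True
        return False

    def roll_back(self):
        """Send traffic back to the previous environment."""
        self.live, self.idle = self.idle, self.live


router = BlueGreenRouter(live="blue", idle="green")
router.release(is_healthy=lambda env: True)  # stand-in for real monitoring
```

The old environment stays running after the switch, which is what makes the rollback cheap.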
Canary Deployment
Releases software changes to select customers as a way of testing functionality and reliability in production while limiting the number of customers potentially impacted.
What should you do once you’re satisfied that no customer is negatively impacted by the Canary Deployment?
You can slowly increase the number of customers who receive the new version of your application.
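One common way to pick the "select customers" is to hash each customer ID into a stable bucket. This is a sketch under that assumption; the function name in_canary and the 0-99 bucketing scheme are illustrative, not something the flashcards specify.

```python
import hashlib


def in_canary(customer_id, percent):
    """Return True if this customer should receive the new version.

    Each customer is hashed into a stable bucket from 0 to 99, so raising
    `percent` only ever adds customers to the canary group.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent
```

Because the bucket for a given customer never changes, moving from 5 percent to 25 percent keeps everyone who already had the new version on it.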
Rolling Deployment
Installation of software updates on one server or server subset at a time, rather than updating all servers or server subsets at the same time.
How is a new version deployed across instances in a Rolling Deployment?
One instance at a time, or one cluster (subset) at a time.
Window Size
The size of your grouping
A window size of one will proceed ______ machine at a time, whereas a window size of four will deploy the new version to four servers at the same time.
one
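The effect of window size can be sketched by splitting a server list into batches. The function name rolling_batches and the server names are hypothetical.

```python
def rolling_batches(servers, window_size):
    """Split servers into the groups a rolling deployment would update in turn."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    return [
        servers[i:i + window_size]
        for i in range(0, len(servers), window_size)
    ]


servers = ["web1", "web2", "web3", "web4", "web5"]
one_at_a_time = rolling_batches(servers, 1)   # five batches of one server
four_at_a_time = rolling_batches(servers, 4)  # one batch of four, then one of one
```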
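What’s the advantage of Rolling Deployment?
Compared with a traditional upgrade, which updates every server at the same time, a rolling deployment updates only one server or subset at a time, so the rest keep serving traffic.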
Feature flag or Feature toggle
Conditional feature that can be hidden from customers.
What do feature flags help solve?
Maintaining continuous delivery or continuous deployment while not releasing the functionality to customers until you’re ready.
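A minimal sketch of a feature flag, assuming flags live in a plain dictionary. The flag name new_checkout and both functions are hypothetical. The unfinished code path is deployed but stays hidden until the flag is turned on.

```python
# Hypothetical flag store; real systems often load this from configuration.
FLAGS = {"new_checkout": False}


def is_enabled(flag, flags=FLAGS):
    """Unknown flags default to off, so unfinished work stays hidden."""
    return flags.get(flag, False)


def checkout(flags=FLAGS):
    """Choose the code path based on the flag rather than on what is deployed."""
    if is_enabled("new_checkout", flags):
        return "new checkout flow"
    return "current checkout flow"
```

Releasing the feature then means changing the flag value, not deploying again.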
What happens if you deploy partially completed work?
It reduces the number of feature branches and merging you have to manage throughout the process and alerts other developers of your work long before it’s finished.
What needs to be monitored after releasing software?
Performance, Availability, Security and More
Telemetry
Fancy word for collecting data on the behavior of your systems.
What does telemetry enable your system to do?
Regularly update you on how things are going, which keeps you from digging into logs only when something goes wrong.
How does telemetry create records on its own behavior?
Independently
What can you do to benefit from telemetry?
You must set up your application and infrastructure in such a way that data collection and reporting are possible.
What are the two components that Telemetry requires?
Data collection and Metrics management
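The two components can be sketched in a few lines: record() stands in for data collection and summary() for metrics management. The class name, metric name, and summary fields are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


class Metrics:
    """Toy in-memory metrics store (hypothetical)."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, name, value):
        """Data collection: store one observation for a named metric."""
        self._samples[name].append(value)

    def summary(self, name):
        """Metrics management: reduce raw observations to a small report."""
        values = self._samples.get(name, [])
        if not values:
            return {"count": 0}
        return {"count": len(values), "mean": mean(values), "max": max(values)}


metrics = Metrics()
for latency_ms in (100, 200, 300):
    metrics.record("request_latency_ms", latency_ms)
report = metrics.summary("request_latency_ms")
```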
What three terms in site reliability engineering are important to keep in mind for monitoring systems?
Service-Level Agreement (SLA), Service-Level Objective (SLO), and Service-Level Indicator (SLI).
Service-Level Agreement (SLA)
Defines the level of service a customer expects from a supplier, for example availability.
Service-Level Objective (SLO)
Is the specific reliability, availability, or performance target that a vendor promises to meet within an SLA.
Service-Level Indicator (SLI)
Measures how well a company actually meets the SLO promises that it sets within SLAs
Similarities and differences between SLAs, SLOs and SLIs
An SLA is a contract.
An SLO is a specific goal that is defined in a contract.
An SLI measures the extent to which teams comply with the SLO promises they make in SLA contracts.
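A short sketch of how an SLI is compared against an SLO, assuming availability is measured as the fraction of successful requests. The function names and the example target are illustrative.

```python
def availability_sli(successful_requests, total_requests):
    """SLI: the measured fraction of requests that succeeded."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


def meets_slo(sli, slo_target):
    """SLO: the target that the measured SLI is compared against."""
    return sli >= slo_target


sli = availability_sli(successful_requests=9990, total_requests=10000)
within_target = meets_slo(sli, slo_target=0.995)  # assumed target of 99.5 percent
```

The SLA is the contract that states the target and what happens if it is missed; that part is not something code can express.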
Release engineering
Relatively new and fast-growing discipline of software engineering that can be concisely described as building and delivering software.
What do release engineers have a solid (if not expert) understanding of?
Source code management, compilers, build configuration languages, automated build tools, package managers, and installers.
Also development, configuration management, test integration, system administration, and customer support.
What do running reliable services require?
A reliable release process
What should SREs know about their binaries and configurations?
That they are built in a reproducible, automated way so that releases are repeatable and aren’t “unique snowflakes.”
What kinds of tools do Release Engineers mostly envision and develop?
Tools that report on a host of metrics, such as how much time it takes for a code change to be deployed into production (in other words, release velocity) and statistics on what features are being used in build configuration files.
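As a sketch of a release velocity report, the function below takes (committed, deployed) timestamp pairs and returns the median time to production in hours. The function name, the input shape, and the choice of the median are assumptions.

```python
from datetime import datetime
from statistics import median


def release_velocity_hours(changes):
    """Median hours from a code change being committed to being deployed.

    `changes` is a list of (committed_at, deployed_at) datetime pairs.
    """
    lead_times = [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed in changes
    ]
    return median(lead_times)


changes = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0)),   # 2 hours
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 13, 0)),   # 4 hours
]
velocity = release_velocity_hours(changes)
```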
What do Google’s best practices cover?
All elements of the release process. Examples include compiler flags, formats for build identification tags, and required steps during a build.
What’s an incident or service outage?
Essentially, an incident is any technical disruption of your business. Incidents come in all shapes, sizes, and severities.
Instead of focusing on preventing humans from making mistakes, what do DevOps processes recommend?
Creating and implementing automated systems along the entire development process.
What are four areas primed for automation?
– Code: Software developers design and build solutions via code. Developers manage their source code and often work on the same portion of a codebase simultaneously.
– Integration: Code changes must be merged from multiple developers into the master branch of a code repository.
– Deployment: After being merged, the code must be deployed. This can often mean releasing updates, changing configurations, and even deprecating services.
– Infrastructure: An application must be run on hardware. Depending on the updates to code, infrastructure may need to be instantiated, provisioned, or terminated.
What are some benefits of moving with the crowd when choosing solutions for your team?
Popular tools often have the best documentation and answers on technical forums.
Popular tools are often open-source software (OSS).
Empirically
By means of observation or experience rather than theory or pure logic. Example: “empirically tested methods.”
Mean Time To Repair (MTTR)
The average time your business is impacted during incidents. The abbreviation is also sometimes used for mean time to recovery (the time your team takes to resolve an issue) and mean time to respond (the time an organization takes to acknowledge and begin responding to a problem).
What should you include when calculating the MTTR?
Latency, the time from when the failure first occurred to when it was detected. This is likely calculated after the incident is resolved.
MTTR formula
MTTR = total time of impact / number of incidents
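The formula can be applied directly to a list of impact durations. A minimal sketch, with the function name mttr and the sample durations chosen for illustration:

```python
from datetime import timedelta


def mttr(impact_durations):
    """MTTR = total time of impact / number of incidents."""
    if not impact_durations:
        raise ValueError("at least one incident is required")
    total_impact = sum(impact_durations, timedelta())
    return total_impact / len(impact_durations)


incidents = [timedelta(minutes=30), timedelta(minutes=60), timedelta(minutes=90)]
average_impact = mttr(incidents)  # 180 minutes of impact over 3 incidents
```

Each duration should run from when the failure first occurred, not from when it was detected, so that latency is included.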
What steps can incidents be broken down into?
Discovery, Response, Restoration, Reflection, and Preparation.
What is the purpose of breaking an incident into different phases?
To better understand each step of the unplanned work.
Discovery
This phase starts when the issue is detected. Services can be impacted for a period of time before you realize it.
Response
This phase is the scramble of trying to determine the source of the issue.
Restoration
At this point, you’ve identified the issue and are working on solving it. This phase is often one of the shortest ones of the incident. After you know what’s happening, you usually discover a straightforward fix, even if it means rolling back a deployment or reverting to the last issue-free build.
Reflection
This phase is where a post-incident review takes place.
Preparation
During the preparation phase, engineers complete the work determined necessary during the post-incident review.
What two incident phases are often forgotten after the service is restored?
Reflection and Preparation.
Distributed System
A collection of components networked across multiple computers.
Are the components of a distributed system independent or dependent?
They are independent. They can fail without impacting other services and work concurrently.
Why are companies moving to cloud hosting?
Running applications at scale requires efficient use of infrastructure and the cost of underutilizing hardware adds up quickly.
What are two concepts that accompany the transition to distributed systems?
Microservices and Containers
Microservices
Style of architecture that separates logic into loosely coupled services.
Containers
Enable engineering teams to package applications with dependencies and provide an isolated, ephemeral environment (an environment meant to last for a limited amount of time).