Final Part 1 Flashcards

Question

What do running reliable services require?

Answer 1

A reliable release process

Answer 2

That they are built in a reproducible, automated way so that releases are repeatable and aren’t “unique snowflakes.”
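One way to make "repeatable" concrete is to derive a build ID from every build input, so identical inputs always yield an identical ID. The sketch below assumes the inputs are a config dict and a map of source files. The key names (`compiler_flags`, `build_tag_format`) and the function `build_fingerprint` are hypothetical. Real reproducible builds also need pinned toolchains and environments; this only illustrates the "same inputs, same output" idea.

```python
import hashlib
import json

def build_fingerprint(config: dict, sources: dict[str, bytes]) -> str:
    """Hypothetical helper: derive a build ID from every build input, in a fixed order."""
    digest = hashlib.sha256()
    # sort_keys=True makes the config serialization independent of key order.
    config_bytes = json.dumps(config, sort_keys=True).encode("utf-8")
    digest.update(len(config_bytes).to_bytes(8, "big") + config_bytes)
    # Walk source files in sorted order so the ID never depends on listing order.
    for path in sorted(sources):
        for chunk in (path.encode("utf-8"), sources[path]):
            # Length-prefix each field so "ab"+"c" and "a"+"bc" hash differently.
            digest.update(len(chunk).to_bytes(8, "big") + chunk)
    return digest.hexdigest()

# Hypothetical build inputs; key names are illustrative only.
config_a = {"compiler_flags": ["-O2", "-Wall"], "build_tag_format": "v{major}.{minor}"}
config_b = {"build_tag_format": "v{major}.{minor}", "compiler_flags": ["-O2", "-Wall"]}
sources = {"main.c": b"int main(void) { return 0; }\n"}

print(build_fingerprint(config_a, sources) == build_fingerprint(config_b, sources))  # True
```

Changing any input, such as a compiler flag, changes the ID. Two builds that report the same ID were therefore produced from the same inputs.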

Answer 3

Tools that report on a host of metrics, such as how much time it takes for a code change to be deployed into production (in other words, release velocity) and statistics on what features are being used in build configuration files.
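Release velocity as described above can be sketched as the delay between a code change being committed and that change reaching production. The sketch below assumes each change is recorded as a hypothetical `(committed_at, deployed_at)` pair and summarizes velocity with the median. The median is one reasonable choice because a single slow release does not skew it much.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical change records: (committed_at, deployed_to_production_at)
changes = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 15, 0)),    # 6 hours
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 3, 10, 0)),   # 24 hours
    (datetime(2024, 5, 3, 8, 30), datetime(2024, 5, 3, 10, 30)),  # 2 hours
]

def lead_times(records):
    """Commit-to-production delay for each change."""
    return [deployed - committed for committed, deployed in records]

def median_lead_time(records):
    """Summarize release velocity as the median commit-to-deploy delay."""
    return median(lead_times(records))

print(median_lead_time(changes))  # 6:00:00
```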

Answer 4

All elements of the release process. Examples include compiler flags, formats for build identification tags, and required steps during a build.

Answer 5

Essentially, an incident is any technical disruption of your business. Incidents come in all shapes, sizes, and severities.

Answer 6

Creating and implementing automated systems along the entire development process.

Answer 7

– Code: Software developers design and build solutions via code. Developers manage their source code and often work on the same portion of a codebase simultaneously.
– Integration: Code changes from multiple developers must be merged into the master branch of a code repository.
– Deployment: After being merged, the code must be deployed. This can mean releasing updates, changing configurations, and even deprecating services.
– Infrastructure: An application must run on hardware. Depending on the code updates, infrastructure may need to be instantiated, provisioned, or terminated.

Answer 8

Popular tools often have the best documentation and the most answers on technical forums. Popular tools are also often open-source software (OSS).

Answer 9

By means of observation or experience rather than theory or pure logic. Example: "empirically tested methods."

Answer 10

The average time your business is impacted during incidents. The term is also sometimes used to describe the mean time to recovery (the amount of time your team takes to resolve an issue) or the mean time to respond (the time an organization takes to acknowledge and initiate a response to a problem).

Answer 11

Latency, the time from when the failure first occurred to when it was detected. This is likely calculated after the incident is resolved.
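Detection latency is the difference between two timestamps, and averaging it across incidents is simple arithmetic. The minimal sketch below assumes each incident has a hypothetical `(failure_started_at, detected_at)` pair. The failure start time is usually reconstructed after the incident is resolved.

```python
from datetime import datetime, timedelta

def detection_latency(failure_started_at: datetime, detected_at: datetime) -> timedelta:
    """Time from when a failure first occurred to when it was detected."""
    if detected_at < failure_started_at:
        raise ValueError("an issue cannot be detected before it occurs")
    return detected_at - failure_started_at

def mean_detection_latency(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average detection latency over (failure_started_at, detected_at) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    latencies = [detection_latency(start, seen) for start, seen in incidents]
    return sum(latencies, timedelta()) / len(latencies)

# Hypothetical incidents; start times are typically reconstructed after resolution.
incidents = [
    (datetime(2024, 5, 1, 14, 2), datetime(2024, 5, 1, 14, 20)),  # 18 minutes
    (datetime(2024, 5, 9, 3, 0), datetime(2024, 5, 9, 3, 6)),     # 6 minutes
]
print(mean_detection_latency(incidents))  # 0:12:00
```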

Answer 12

MTTR = total time of impact / number of incidents
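The formula above translates directly into code. The minimal sketch below assumes each incident's impact has already been measured as a duration. The example durations are hypothetical.

```python
from datetime import timedelta

def mttr(impact_durations: list[timedelta]) -> timedelta:
    """MTTR = total time of impact / number of incidents."""
    if not impact_durations:
        raise ValueError("need at least one incident")
    total_impact = sum(impact_durations, timedelta())
    return total_impact / len(impact_durations)

# Hypothetical month with three incidents: 30 + 90 + 60 = 180 minutes of impact.
durations = [timedelta(minutes=30), timedelta(minutes=90), timedelta(minutes=60)]
print(mttr(durations))  # 1:00:00
```

A single long incident raises MTTR noticeably, which is one reason teams often review the individual incident durations alongside the average.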

Answer 13

Discovery, Response, Restoration, Reflection, and Preparation.

Answer 14

To better understand each step of the unplanned work.

Answer 15

This phase starts when the issue is detected. Services can be impacted for a period of time before you realize it.

Answer 16

This phase is the scramble of trying to determine the source of the issue.

Answer 17

At this point, you’ve identified the issue and are working on solving it. This is often one of the shortest phases of an incident: once you know what’s happening, you can usually find a straightforward fix, even if it means rolling back a deployment or reverting to the last issue-free build.
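To check whether fixing really is the short part of an incident, you can split the active incident time at its milestones. The sketch below assumes each incident records three hypothetical timestamps: when the issue was detected, when its cause was identified, and when service was restored. Mapping these milestones onto the named phases is an assumption, so the code uses neutral labels.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

class IncidentTimeline(NamedTuple):
    """Hypothetical milestone timestamps recorded for one incident."""
    detected_at: datetime          # the issue is noticed
    cause_identified_at: datetime  # the source of the issue is found
    restored_at: datetime          # service is back to normal

def phase_durations(t: IncidentTimeline) -> dict[str, timedelta]:
    """Split the active part of an incident into investigation time and fix time."""
    return {
        "investigating": t.cause_identified_at - t.detected_at,
        "fixing": t.restored_at - t.cause_identified_at,
    }

incident = IncidentTimeline(
    detected_at=datetime(2024, 5, 1, 14, 20),
    cause_identified_at=datetime(2024, 5, 1, 15, 5),
    restored_at=datetime(2024, 5, 1, 15, 15),
)
for phase, duration in phase_durations(incident).items():
    print(phase, duration)
```

In this made-up incident, investigation takes 45 minutes and the fix takes 10, which matches the pattern the card describes.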

Answer 18

This phase is where a post-incident review takes place.

Answer 19

During the preparation phase, engineers complete the work determined necessary during the post-incident review.

Answer 20

Reflection and Preparation.

Answer 21

A collection of components networked across multiple computers.

Answer 22

They are independent: each service can fail without impacting the others, and services can work concurrently.
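The sketch below illustrates that independence in a single process. The service functions are hypothetical, and threads from `concurrent.futures` stand in for separate services. Real microservices run as separate processes that communicate over a network. The caller queries all three services concurrently, and a failure in one service only degrades that service's result.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for independent services.
def inventory_service():
    return "inventory: 42 items"

def recommendations_service():
    raise ConnectionError("recommendations unavailable")

def reviews_service():
    return "reviews: 4.5 stars"

def call_all(services):
    """Call every service concurrently; one failure does not stop the others."""
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in services.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()  # re-raises the service's exception, if any
            except Exception as exc:  # isolate the failure to this one service
                results[name] = f"degraded ({exc})"
    return results

services = {
    "inventory": inventory_service,
    "recommendations": recommendations_service,
    "reviews": reviews_service,
}
for name, outcome in call_all(services).items():
    print(name, "->", outcome)
```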

Answer 23

Running applications at scale requires efficient use of infrastructure, and the cost of underutilized hardware adds up quickly.
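A quick calculation shows how idle capacity adds up. The minimal sketch below assumes cost scales linearly with provisioned cores. The fleet size, usage, and price per core-hour are made-up illustrative numbers, not real pricing.

```python
def idle_cost(provisioned_cores: int, used_cores: int,
              cost_per_core_hour: float, hours: float) -> float:
    """Money spent on capacity that was paid for but sat unused."""
    if not 0 <= used_cores <= provisioned_cores:
        raise ValueError("used_cores must be between 0 and provisioned_cores")
    idle_cores = provisioned_cores - used_cores
    return idle_cores * cost_per_core_hour * hours

# Hypothetical fleet: 100 cores provisioned, 25 in use, illustrative price, ~730 hours/month.
monthly = idle_cost(provisioned_cores=100, used_cores=25, cost_per_core_hour=0.05, hours=730)
print(f"utilization: {25 / 100:.0%}, idle spend: ${monthly:,.2f}")
```

At 25% utilization, three quarters of the spend buys nothing, and that waste grows linearly with fleet size and time.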

Answer 24

Microservices and Containers

Answer 25

A style of architecture that separates logic into loosely coupled services.

Answer 26

They enable engineering teams to package applications with their dependencies and provide an isolated, ephemeral environment (an environment meant to last for a limited amount of time).

(50 cards)