Enterprise Flashcards

1
Q

What is the Waterfall Model?

A

Cascades the three fundamental activities of the software development process (exploration, development, operation) so that they happen sequentially

2
Q

What is the iterative/incremental model?

A
  • Iterative because the feed-forward between activities is augmented with feed-back between them
  • Incremental because the interleaved activities regularly deliver small additional pieces of functionality
3
Q

What is the enabler of the exploration activity, and how is the activity described?

A
  • The enabler is lean cycle evolution
  • It is described by monoliths and microservices
4
Q

What is the enabler of the development activity, and how is the activity described?

A
  • The enabler is version control
  • It is described by Continuous Integration and Continuous Delivery
5
Q

What is the enabler of the operations activity, and how is the activity described?

A
  • The enabler is cloud computing
  • It is described by DevOps and Site Reliability Engineering
6
Q

Why is the incremental model better than the waterfall model?

A

It is much easier to go back and change any issues that may occur during development, reducing the cost of failure

7
Q

What are the disadvantages of the waterfall model?

A

Project specifications may change over time, but the waterfall model makes it hard to change them late in the project, so the finished product risks being obsolete.

8
Q

Why is version control important?

A

Companies in industry may still use an earlier version of software and refuse to upgrade due to potential issues that may arise from doing so

9
Q

What are two ways for an enterprise to apply scientific method to its activity?

A
  • Build-Measure-Learn Cycle
  • Learn-Measure-Build Cycle
10
Q

What is the learn phase?

A

Enterprise comes up with a hypothesis about the marketplace and decides what empirical data would validate this hypothesis

11
Q

What is the measure phase?

A

Enterprise tests its hypothesis by collecting the empirical data

12
Q

What is the build phase?

A

Enterprise creates a Minimum Viable Product if the empirical data looks good

13
Q

What is a technology pivot?

A

This happens when it becomes clear that the product could deliver its value more efficiently using a different technology.
Example: Microsoft Office was purchased up-front and installed on a single computer with a one-time license; now it is a subscription service in the cloud.

14
Q

What is a zoom-in pivot?

A

This happens when a product feature becomes a product.
Example: Flickr was a game called Game Neverending. One feature allowed players to share photographs, which became more popular than the game itself.

15
Q

What is a zoom-out pivot?

A

This happens when a product becomes a product feature.
Example: DotCloud allowed developers to focus on code while scaling, deployment and load balancing were taken care of for them. Customers wanted to move applications between clouds and DotCloud was reorganised to manage new Docker containers

16
Q

What is a customer segment pivot?

A

A customer segment pivot happens when a product solves a problem, but not the one intended.
Example: YouTube began as an online dating site, but no one uploaded a dating video, so it was opened up to allow any video to be uploaded.

17
Q

What is a customer need pivot?

A

A customer need pivot happens when a product solves a problem, but not the most important one
Example: Twitter was a podcasting platform made obsolete by Apple’s iTunes; in a last-ditch effort, the company provided an SMS-based social network instead.

18
Q

What is Concierge MVP?

A

Gets a person to work with the customer to refine how the product will work - there is no clear solution hypothesis

19
Q

What is a Wizard of Oz MVP?

A

Gets a person to simulate how the product will work - there is a clear solution hypothesis

20
Q

What is a landing page MVP?

A

Creates a page where potential customers can find out about a product idea, and perhaps even pledge money to fund its development

21
Q

What is a video MVP?

A

Shows how the product might work, and asks customers to sign up for it - there is a clear solution hypothesis

22
Q

What sort of experiment is a startup?

A

An experiment where you have a hypothesis that you are trying to test

23
Q

What is the biggest waste that product development faces today?

A

Building things very efficiently that nobody wants

24
Q

What is the universal constant of all successful startups?

A

The pivot

25
What is validated learning?
Learning (quantitatively) how to build a sustainable business; everything else is a waste of time
26
Is agile development right for a startup?
No, in agile development the customer is known whereas the specifications are not. In a startup, neither is known
27
What is a monolithic system?
It consists of a single program run by a single process formed from a collection of modules that communicate by procedure calls
28
Why is it easy to develop a monolith?
All the code is in one language and in one place, which avoids unnecessary translation between many different languages
29
Why is it easy to test a monolithic system?
It is straightforward to build the system under test as a single executable against which a suite of tests can be run automatically. It is then simple to locate the error
30
How is a monolithic system scaled?
Vertical scaling, which involves replacing an existing machine with a new, more powerful one that can run the monolithic system better
31
How is an enterprise using a monolith likely to store its data?
Using a centralised database, which makes it easier to ensure the accuracy, completeness, and consistency of data
32
What are the two states of a transaction?
  • Commits and the database is moved to a new consistent state
  • Aborts and the database is restored to its previous consistent state
33
What properties does a transaction have?
  • Atomicity: transactions either succeed completely or fail completely
  • Consistency: transactions begin with the database in one consistent state and end in another consistent state
  • Isolation: the effect of performing transactions concurrently is the same as performing them sequentially
  • Durability: once a transaction has succeeded, its effects persist, even in the presence of system failures
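The commit and abort outcomes, and atomicity in particular, can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the account names, and the `transfer` helper are all assumptions made up for this example; they are not part of the course material.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as a single transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True   # committed: new consistent state
    except ValueError:
        return False  # aborted: previous consistent state restored


def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

committed = transfer(conn, "alice", "bob", 30)   # succeeds completely
aborted = transfer(conn, "alice", "bob", 500)    # fails completely, nothing changes
```

After the second call, neither of its two updates is visible: the failed transfer leaves the balances exactly as the first transfer left them.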
34
What is Fred Brooks' observation?
The first version should always be viewed as a prototype
35
What is Gall's Law?
A complex system that works evolves from a simple system that worked. For monolithic MVPs, it implies starting with a simpler design that can be refined over time
36
What is the You Aren't Gonna Need It observation?
Functionality should not be developed unless it is required to avoid unnecessary complexity
37
How might Conway's Law apply to teams creating a monolithic MVP?
A system's design mirrors the structure of the organisation that created it. For monolithic MVPs, the design will reflect the communication structure of the team.
38
Why might it be better to invert Conway's Law?
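Inverting Conway's Law means structuring the organisation to match the intended system design, e.g. one department for each module, so that the organisation's structure supports that design rather than working against it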
39
What does the phrase Eating Your Own Dogfood mean?
Development teams should use their own product to identify and address issues, ensuring its quality and usability
40
What needs to be in the first version of a product?
Only what is necessary to learn whether the plan is correct or not
41
What should the heuristic be for any kind of startup advice?
Does it minimize total time through the build-measure-learn loop?
42
What are actionable metrics?
Metrics about per-customer behaviours that can be measured
43
What did the eBay V2 architecture consist of?
The eBay V2 architecture consisted of a single 3.4 million lines-of-code C++ library, which hit the 16k compiler limit of the number of methods per class.
44
What lesson can be learned from the evolution of the eBay architecture (and others)?
  • No one starts with microservices.
  • Past a certain scale, everyone ends up with microservices.
  • Most enterprises never reach the scale where microservices become necessary (fewer than 1% do).
45
What architecture is appropriate for the Starting phase of an enterprise?
A monolithic architecture is appropriate, designed to be "just enough" to meet near-term, evolving customer needs as cheaply as possible.
46
What does "just enough" architecture look like?
"Just enough" architecture employs simple, familiar technology. This is often a rapid prototyping framework such as Ruby on Rails or PHP, allowing quick iterations and minimal complexity.
47
Why should an enterprise prefer to buy rather than build software?
Buying software is typically faster, cheaper, and better than developing it in-house. Open-source solutions are preferred where possible to avoid unnecessary reinvention.
48
What architecture is appropriate for the Scaling phase of an enterprise?
A microservices architecture is appropriate, allowing teams to design, develop, deploy, and operate their services independently. This provides scalability and flexibility as the enterprise grows.
49
How should incremental changes be handled?
Incremental changes should be as small as possible. Large changes should be decomposed into smaller ones while maintaining backward/forward compatibility of data and interfaces to minimize disruption.
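One common way to keep data backward and forward compatible is a "tolerant reader", sketched below under stated assumptions. The message format, the field names, and the `parse_customer` function are hypothetical; the sketch only shows the idea of defaulting missing fields and ignoring unknown ones.

```python
import json


def parse_customer(payload):
    """Read a customer message while tolerating older and newer formats."""
    data = json.loads(payload)
    return {
        "name": data["name"],
        # "email" was added in a later version (assumption); the default
        # keeps messages written by older producers readable.
        "email": data.get("email", ""),
        # Any fields this reader does not know about are simply ignored,
        # so newer producers can add fields without breaking it.
    }


old_message = '{"name": "Ada"}'
new_message = '{"name": "Ada", "email": "ada@example.com", "phone": "555-0100"}'

from_old = parse_customer(old_message)
from_new = parse_customer(new_message)
```

Because the reader accepts both formats, the producer and the consumer can be changed in separate small steps rather than in one large coordinated release.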
50
What is the system of record?
The system of record is the single service that owns any given piece of data. Any other copies of that data are read-only, non-authoritative cached versions.
51
What architecture is appropriate for the Optimizing phase of an enterprise?
The architecture should be stable, focusing on sustainable, incremental improvements in functionality and efficiency rather than large-scale changes.
52
How does a microservices system work?
It consists of multiple programs that run as multiple processes and communicate by sending messages over a network
53
What is one commonly used network protocol?
Representational State Transfer (REST), a set of conventions for using the HyperText Transfer Protocol (HTTP)
54
What are the resources described by archetypes?
  • Document for file-like resources, where a GET operation reads the resource, a PUT operation updates it, and a DELETE operation deletes it
  • Controller for external resources, where a POST operation causes the resources to carry out some task
  • Collection for directory-like resources, where a GET operation lists the resources in the directory and a POST operation creates a new one with an invented name
  • Store for directory-like resources, where a GET operation lists the resources in the directory and a PUT operation creates a new one with a given name
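The difference between the collection and store archetypes can be sketched with a small in-memory model. This is not a real HTTP server: the `Directory` class and its method names are hypothetical stand-ins for the GET, POST, PUT and DELETE operations, and the controller archetype is left out.

```python
import itertools


class Directory:
    """In-memory stand-in for a directory-like resource (no real HTTP)."""

    def __init__(self):
        self.documents = {}
        self._ids = itertools.count(1)

    def get(self):
        # GET on a collection or store lists the resources it contains.
        return sorted(self.documents)

    def post(self, body):
        # Collection: the server invents the name of the new document.
        name = str(next(self._ids))
        self.documents[name] = body
        return name

    def put(self, name, body):
        # Store: the client supplies the name of the new document.
        # The same call also models PUT on a document, which updates it.
        self.documents[name] = body
        return name

    def get_document(self, name):
        # Document: GET reads the resource.
        return self.documents[name]

    def delete_document(self, name):
        # Document: DELETE removes the resource.
        del self.documents[name]


orders = Directory()                      # used as a collection
order_id = orders.post({"item": "book"})  # name chosen by the server

profiles = Directory()                    # used as a store
profiles.put("alice", {"email": "alice@example.com"})  # name chosen by the client
profiles.put("alice", {"email": "a@example.com"})      # PUT again updates it
```

Repeating the PUT leaves a single `alice` document, whereas repeating the POST would create a second order under a new invented name.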
55
How is an enterprise using microservices most likely to store its data?
In a distributed database made up of databases accessed by individual microservices. The distributed organisation makes it difficult to ensure that the accuracy, completeness, and consistency of data are maintained
56
How does a "two-phase commit" work in a distributed transaction?
  • A coordinator transaction asks a number of participant transactions to vote on whether they are prepared to commit to a change
  • Each participant holds locks on its data involved in the transaction until the coordinator decides to commit or abort
  • If all participants vote to commit, then the coordinator instructs all participants to commit. Otherwise, if any participant votes to abort or times out, the coordinator instructs all participants to abort
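The voting and decision phases described above can be sketched in a single process. The `Participant` class and the service names are hypothetical, there is no network, and a timeout is simply treated as a "no" vote; a real implementation would also need durable logging.

```python
class Participant:
    """A hypothetical participant that locks its data while it waits."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.locked = False
        self.state = "idle"

    def prepare(self):
        # Phase 1: take the lock and vote on whether commit is possible.
        self.locked = True
        return self.can_commit

    def commit(self):
        self.state = "committed"
        self.locked = False

    def abort(self):
        self.state = "aborted"
        self.locked = False


def two_phase_commit(participants):
    """Coordinator: commit everywhere or abort everywhere."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if the vote was unanimous.
    decision = all(votes)
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


ok = two_phase_commit([Participant("orders"), Participant("payments")])

failing = [Participant("orders"), Participant("payments", can_commit=False)]
failed = two_phase_commit(failing)
```

In the second run, the `orders` participant was willing to commit but is still told to abort, because the decision has to be the same for everyone.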
57
How do "sagas" work in a distributed transaction?
  • A coordinator transaction asks each participant to commit or abort in sequence
  • Each participant holds locks on its data involved in the transaction only until it decides to commit or abort
  • If a participant aborts, the sequence ends immediately and compensating transactions are made to undo the work of those participants that have already committed
  • A saga sacrifices atomicity and relies on eventual consistency
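The sequence-then-compensate behaviour of a saga can be sketched as follows. The `run_saga` function and the order-processing steps are assumptions for illustration; each step here is a local function call rather than a call to a remote service.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo on failure."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # A participant aborted: end the sequence and compensate
            # the already committed steps, most recent first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log = []


def reserve_stock():
    log.append("stock reserved")


def release_stock():
    log.append("stock released")


def charge_card():
    raise RuntimeError("card declined")


def refund_card():
    log.append("card refunded")


succeeded = run_saga([
    (reserve_stock, release_stock),
    (charge_card, refund_card),
])
```

Between the reservation and its compensation, other readers could observe the reserved stock. That window is the atomicity a saga gives up in exchange for not holding locks across the whole transaction.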
58
How is a system of microservices most easily scaled?
By horizontal scaling, which involves adding machines, each of which can run one or more microservice instances. The number of machines allocated to run one microservice need not be the same as the number allocated to run another
59
Why might it be risky to migrate from a monolithic MVP to a microservices one?
It must be done quickly, as taking too long may lose customers
60
What is the Strangler Fig Design Pattern?
Initially, the modules of the monolith are put behind a facade, which serves as a proxy that manages all communication with and between them. Subsequently, modules may be replaced by microservices one-at-a-time, updating the facade after each replacement
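A minimal sketch of the facade at the heart of this pattern is shown below. The `Facade` class, the module names, and the replacement `billing_service` function are hypothetical; in practice the replacement would be reached over the network rather than called directly.

```python
class Facade:
    """Proxy that routes each call to whichever implementation owns it."""

    def __init__(self, monolith_modules):
        self._routes = dict(monolith_modules)

    def replace(self, name, microservice):
        # Swap one module for its replacement; callers are unaffected.
        self._routes[name] = microservice

    def call(self, name, *args):
        return self._routes[name](*args)


def monolith_billing(amount):
    return f"monolith billed {amount}"


def monolith_search(term):
    return f"monolith searched {term}"


def billing_service(amount):
    # Stand-in for a call to a separately deployed microservice.
    return f"billing-service billed {amount}"


facade = Facade({"billing": monolith_billing, "search": monolith_search})
before = facade.call("billing", 10)

facade.replace("billing", billing_service)  # one module at a time
after = facade.call("billing", 10)
still_monolith = facade.call("search", "shoes")
```

Callers only ever talk to the facade, so each replacement is a small change to one route and can be reversed if the new service misbehaves.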
61
What is the second-system effect, and how does it relate to microservices migration?
It suggests that engineers tend to overcomplicate their second system. In microservice migration, teams must avoid unnecessary complexity when breaking apart a monolith
62
What is Jeff Bezos' two-pizza rule?
Teams should be small enough to be fed with two pizzas. In microservices, small, independent teams are ideal for maintaining and developing individual services efficiently
63
What is the Big Ball of Mud architecture?
A system whose code has become an unstructured mess. Migration to microservices may be prompted when the monolith has become just this
64
What is technical debt?
The cost incurred by coding functionality in a quick and dirty fashion. Migration to microservices may occur because accumulated technical debt creates the need to code properly
65
What are the nine common characteristics of microservices?
  • Componentization via services
  • Organized around business capabilities
  • Products not Projects
  • Smart endpoints and dumb pipes
  • Decentralised governance
  • Decentralised data management
  • Infrastructure automation
  • Design for failure
  • Evolutionary design
66
What is a component?
A component is a unit of software that is:
  • Independently replaceable
  • Independently upgradable
67
Why should one organise around business capabilities?
The focus is on the customer not on internal metrics.
68
Should endpoints be smart or dumb?
Endpoints should be smart, with the pipes between them being simple (dumb).
69
What is the rule for microservice data management?
Every service should be responsible for its own data store. You can only talk to another data store through its API
70
What do you have to assume in any distributed system?
You must assume things are going to break. Each part of the distributed system must be tested to ensure other parts are not affected if something breaks
71
How big is a microservice?
  • Wakeling: Size of the API
  • Fowler: There is a wide, undefined range of how big a microservice is (e.g. 4 people, 200 services or 30 people, 60 services)
72
What things must be sorted out, before going down the microservices route?
  • Rapid provisioning
  • Basic monitoring
  • Rapid application deployment
  • DevOps culture
73
What are two common repo models?
  • Monorepo: One giant repo for all microservices; any commit triggers the production of multiple microservices
  • Multirepo: One repo per service; any commit triggers the production of a single service
74
What are common branching models?
  • Feature-based development: Developers create new branches based on the needs of the project. Long-lived feature branches may be created that are merged back weeks or months later
  • Trunk-based development: Developers work on a single main branch. Short-lived branches may be created and merged back within minutes
75
What are the essential practices of version control?
  • Run commit tests locally
  • Wait for commit tests
  • Avoid commits on a broken build
  • Never go home on a broken build
  • Be prepared to revert
  • Avoid commenting out tests
  • Take responsibility for breakages
76
VERSION CONTROL 2
77
What three benefits does a version control system provide?
  • Step back to safety
  • Share changes easily
  • Store changes somewhere safe
78
What are the three models of version control?
  • Mono-repo: Everything in one big repository
  • Multi-repo: Independent things in repositories
  • Multi-repo': Interdependent things in repositories
79
Why does a mono-repo have the three benefits that a version control system provides?
  • Step back to safety by stepping back all components
  • Share changes easily by changing any component
  • Store changes somewhere safe by saving all components/dependencies together
80
Why might a multi-repo not have the three benefits that a version control system provides?
The communication between components and the specification of which versions of the components work together are not stored anywhere.
81
What are two solutions to the multi-repo problem?
Two solutions to the multi-repo problem are to build independently deployable components that:
  • Have fixed, well-understood APIs
  • Have flexible, backwards/forwards compatible APIs
82
Why do the solutions to the multi-repo problem have the three benefits that a version control system provides?
The solutions to the multi-repo problem have the three benefits that a version control system provides because it is possible to:
  • Step back to safety by stepping back any component
  • Share changes easily by coordinating updates (although this is not easy)
  • Store changes somewhere safe by storing components separately
83
Why is multi-repo’ the worst of all worlds?
Components cannot be developed independently or deployed independently.
84
What is continuous integration?
Continuous Integration (CI) is the practice of quickly integrating newly developed code with the rest of the application code. This saves time when the application is ready to be released. This process is usually automated and produces a build artefact at the end of the process.
85
What are the four releases in the Traditional Product Delivery?
  • The alpha release
  • The beta release
  • The release candidate
  • The release
86
What are the three environments in Modern Feature Delivery?
  • The development environment: The work of a single development team is put together. Updated throughout a two-week sprint.
  • The staging environment: The work of multiple development teams is put together. Updated at the end of a two-week sprint.
  • The production environment: The work of multiple development teams becomes available to customers. Updated when the business considers the time is right.
87
What is shift left testing?
An approach in software development that emphasizes moving testing activities earlier in the development process for improved software quality, better test coverage, continuous feedback and a faster time to market.
88
What tests are carried out in a test pyramid in development?
  • Unit tests
  • Service tests
  • End-to-end tests
89
What is a unit test?
Unit tests are run to ensure that functions work properly. There may be thousands of unit tests, performed in seconds by testing frameworks.
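As an illustration, here is a minimal unit test written with Python's standard `unittest` framework. The `apply_discount` function is a hypothetical function under test, invented for this sketch.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical function under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (100 - percent) / 100, 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_applies_percentage(self):
        # 25% off 200.0 leaves 150.0.
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_rejects_invalid_percentage(self):
        # Percentages outside 0-100 are rejected.
        with self.assertRaises(ValueError):
            apply_discount(200.0, 150)
```

If this were saved in a file such as `test_discount.py` (a made-up name), it could be run with `python -m unittest test_discount`. Each test exercises one function in isolation, which is what makes thousands of them fast enough to run in seconds.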
90
What is a service test?
Service tests are run to ensure that services work properly. There may be hundreds of service tests, performed in a few minutes by testing frameworks
91
What is an end-to-end test?
End-to-end tests are run to ensure that the application works properly. There may be tens of end-to-end tests, performed in several minutes by mimicking user interaction, often through a GUI.
92
What is a test snow cone?
This phenomenon appears when test automation focuses mainly on end-to-end (E2E) tests, with fewer integration tests (IT) and even fewer unit tests (UT). With software testing ice cream cones, the majority of testing is done manually. Automated UI tests are a close second, integration tests are in the middle, and unit tests lag far behind. This is not scalable and should be avoided.
93
What is a brittle test?
A test that fails because another service fails
94
What is a flaky test?
A test that sometimes fails because another service fails, perhaps due to a time-out or race condition
95
What is the Normalisation of Deviance?
The idea that over time we become so accustomed to things being wrong that we start to accept them as being normal and not a problem. This means that we need to find and eliminate flaky tests as soon as we can before we start to accept failing tests as being normal and not a problem — “it always fails like that”.
96
What are build light indicators?
A build light indicator displays the current status of a continuous integration pipeline — green when the build is successful, and red when it fails. As the number of build targets increases, build light indicators have to be replaced by monitor screens throughout the building to display the current status of a continuous integration pipeline
97
CONTINUOUS INTEGRATION 2
98
What is integration hell?
An anti-pattern of software development that brings together the pieces of a software system (far too) late.
99
What is the point of Rule 1: run commit tests locally?
The point of Rule 1: run commit tests locally is that the deployment pipeline is a valuable shared resource that one should avoid blocking with unnecessary test failures.
100
What is the point of Rule 2: Wait for the results?
The point of Rule 2: Wait for the results is that those who make changes are there ready to fix any problems immediately.
101
What is the point of Rule 3: Fix or Revert Failures Within 10 Minutes?
The point of Rule 3: Fix or Revert Failures Within 10 Minutes is to avoid blocking useful progress by others.
102
What is the point of Rule 4: If a team mate breaks the rules, revert their changes?
The point of Rule 4: If a team mate breaks the rules, revert their changes is to avoid others blocking useful progress.
103
What is the point of Rule 5: If someone else notices you caused a failure before you notice, it’s a build sin?
The point of Rule 5: If someone else notices you caused a failure before you notice, it’s a build sin is to encourage you to pay more attention.
104
What is the point of Rule 6: Once commit passes, move on to your next task?
The point of Rule 6: Once commit passes, move on to your next task is that rapid, automated testing frees up time to do new, useful work.
105
What is the point of Rule 7: if any test fails, it is the responsibility of the committer?
The point of Rule 7: If any test fails, it is the responsibility of the committer is that someone takes responsibility for a failure and its fix
106
What is the point of Rule 8: It is the responsibility of everyone who may be responsible to agree who will fix a failure?
The point of Rule 8: It is the responsibility of everyone who may be responsible to agree who will fix a failure is that someone (of many people) takes responsibility for a failure and its fix.
107
What is the point of Rule 9: Monitor the progress of your change?
The point of Rule 9: Monitor the progress of your change is that the software can be rejected as soon as it is shown not to be in a releasable state.
108
What is the point of Rule 10: address any pipeline failure immediately?
The point of Rule 10: address any pipeline failure immediately is to keep the pipeline clear for other changes, whatever that costs.
109
What is Continuous Delivery?
Continuous Delivery (CD) automatically moves a software product from a source code repository through to the staging environment. At the press of a “release” button, it could be moved on to the production environment for use by customers.
110
What is Continuous Deployment?
Continuous Deployment (CD) automatically moves a software product from a source code repository to the production environment. Without the need to press a “release” button, it is available for use by customers.
111
What are the principles of Continuous Delivery?
  • Create a repeatable process
  • Automate almost everything
  • Version control for everything
  • If it hurts, do it more frequently
  • Build quality in
  • Done means released
  • Everyone is responsible
  • Continuous improvement
112
What is A/B testing?
A small percentage of customer traffic is sent to a new, working interface in the production environment. If customers appear unhappy, all customer traffic is sent to the old interface.
113
What is canary testing?
A small percentage of customer traffic is sent to a new, possibly working version in the production environment. If customers appear unhappy, all customer traffic is sent to the old version.
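One way to send a small, stable percentage of traffic to the new version is to hash each user into a bucket, sketched below. The `bucket` and `choose_version` functions and the user ID format are assumptions for this example, not a description of any particular traffic router.

```python
import hashlib


def bucket(user_id):
    """Map a user ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_version(user_id, canary_percent):
    """Send roughly `canary_percent` of users to the new version."""
    return "new" if bucket(user_id) < canary_percent else "old"


users = [f"user-{i}" for i in range(1000)]  # hypothetical user IDs

# Roll out to about 5% of users.
canary_users = [u for u in users if choose_version(u, 5) == "new"]

# Customers appear unhappy: set the percentage to 0 so that
# all traffic goes back to the old version.
after_rollback = {choose_version(u, 0) for u in users}
```

Hashing keeps each user on the same version between requests, and rolling back is a configuration change rather than a redeployment.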
114
What is Blue/Green Testing?
The production environment (blue) is exchanged with the staging environment (green) - this may be done by updating a routing table. If customers appear unhappy, the exchange is reversed; otherwise, it is made permanent.
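The exchange and its reversal can be sketched as a swap of two entries in a routing table. The `Router` class and the `live` and `idle` keys are hypothetical names chosen for this illustration.

```python
class Router:
    """Hypothetical routing table with one live and one idle environment."""

    def __init__(self, production, staging):
        self.table = {"live": production, "idle": staging}

    def swap(self):
        # Exchange the two environments by updating the routing table.
        self.table["live"], self.table["idle"] = (
            self.table["idle"],
            self.table["live"],
        )

    def route(self, request):
        return f"{self.table['live']} handled {request}"


router = Router(production="blue", staging="green")

router.swap()                           # release: green now serves customers
after_release = router.route("GET /")

router.swap()                           # customers unhappy: reverse the exchange
after_rollback = router.route("GET /")
```

Because the previous environment stays intact while idle, reversing the release is the same cheap operation as making it.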
115
How can we achieve continuous delivery?
Through fast, automated feedback on the production readiness of your applications every time there is a change — to code, infrastructure, or configuration.
116
What condition should software always be in?
The condition software should always be in is production-ready or releasable
117
How does continuous delivery help to avoid the biggest source of waste in the software development process?
Continuous delivery helps to avoid the biggest source of waste in the software development process because it is so much easier to get new, experimental features into production.
118
When should testing be done?
All the time, not just once the software has been developed.
119
Who is responsible for quality?
Everyone is responsible for quality.
120
What is more important than delivering functionality?
Keeping the system working and in a good state is more important than delivering functionality.
121
How does continuous delivery reduce the risk of release?
Continuous delivery reduces the risk of release because releasing a small, extensively tested change, and being able to revert immediately is not a risky thing to do
122
What is cloud computing?
Cloud computing is the delivery of computing services—including servers, storage, databases, networking, software, and more—over the internet ("the cloud"). This allows businesses to avoid the costs of owning and maintaining physical data centers and servers.
123
How does cloud computing achieve economies of scale?
Cloud providers centralize computing resources in large data centers, allowing them to optimize resource usage, reduce operational costs, and pass savings onto customers. This model mirrors the way electricity utilities function.
124
What is a staging environment in cloud computing?
A staging environment is a scaled-down replica of the production environment where applications are tested before deployment. In cloud computing, many companies rent staging environments rather than owning them.
125
Why has cloud computing become successful?
  • Broad network access
  • On-demand self-service
  • Measured service
  • Rapid elasticity
  • Resource pooling
126
What does broad network access mean in cloud computing?
It means cloud services are accessible over standard networks, including Virtual Private Networks (VPNs), allowing users to connect from anywhere.
127
What is on-demand self-service in cloud computing?
This allows users to provision and manage computing resources as needed without requiring human interaction with the provider, typically through a web interface or API.
128
What is measured service in cloud computing?
Cloud providers monitor and measure resource usage (such as compute power and storage) for billing and optimization purposes.
129
What is rapid elasticity in cloud computing?
Rapid elasticity allows users to quickly scale computing resources up or down based on demand, ensuring efficient resource utilization.
130
What is resource pooling in cloud computing?
Cloud providers allocate virtual machines to physical ones dynamically, enabling multitenancy where multiple customers share the same infrastructure securely.
131
What are the two phases of cloud computing?
  • Serverful computing
  • Serverless computing
132
What is serverful computing?
Serverful computing involves renting virtualized computing resources where users manage their applications and infrastructure.
133
What are the models of serverful computing?
  • Infrastructure-as-a-Service (IaaS): Provides raw computing resources (e.g., virtual machines).
  • Platform-as-a-Service (PaaS): Provides computing platforms with built-in tools and services.
  • Software-as-a-Service (SaaS): Provides access to applications on a subscription basis.
134
What are virtual machines (VMs)?
VMs are software-based emulations of physical computers, managed by a hypervisor, that allow multiple operating systems to run on a single physical server.
135
What are containers in cloud computing?
Containers are lightweight, isolated environments that run applications without needing a full operating system. They share the host OS kernel, making them more efficient than VMs.
136
How does the cost model of serverful computing work?
Serverful computing charges customers based on resource allocation on a rental basis, similar to renting a car for transportation.
137
How are microservices implemented in a serverful model?
Microservices in a serverful model can be implemented by running each microservice on either a dedicated virtual machine or within a container.
138
What is serverless computing?
Serverless computing abstracts infrastructure management away from developers, allowing them to deploy code that runs only when needed, without provisioning or managing servers.
139
What are the models of serverless computing?
  • Backend-as-a-Service (BaaS): Provides pre-built backend services (e.g., authentication, database storage).
  • Function-as-a-Service (FaaS): Runs code in response to triggers or events without requiring persistent infrastructure.
140
How does serverless computing work under the hood?
Serverless implementations use "hidden" containers to execute function code on-demand, with cloud providers managing scaling and resource allocation.
141
How does the cost model of serverless computing work?
Serverless computing charges customers based on execution time (pay-as-you-go), similar to paying for a taxi ride rather than renting a car.
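The rental and taxi cost models can be compared with some simple arithmetic. All the prices and workload figures below are made up for illustration; they are not real provider rates.

```python
def serverful_cost(hours_rented, price_per_hour):
    """Rental model: pay for every hour the resource is allocated."""
    return hours_rented * price_per_hour


def serverless_cost(invocations, seconds_per_invocation, price_per_second):
    """Pay-as-you-go model: pay only for the time code actually runs."""
    return invocations * seconds_per_invocation * price_per_second


# Hypothetical month: one machine rented for 720 hours, versus
# 100,000 function calls that each run for 0.2 seconds.
monthly_serverful = serverful_cost(hours_rented=720, price_per_hour=0.05)
monthly_serverless = serverless_cost(
    invocations=100_000,
    seconds_per_invocation=0.2,
    price_per_second=0.00002,
)

serverless_is_cheaper = monthly_serverless < monthly_serverful
```

With these made-up numbers the lightly used workload is cheaper on the serverless model. A workload that runs almost constantly could reverse the comparison, just as a taxi becomes expensive for very long journeys.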
142
How are microservices implemented in a serverless model?
Microservices can be implemented by mapping a single microservice to a single function instance or multiple function instances. The latter may introduce maintenance and performance challenges.
143
What are some challenges of mapping microservices to multiple function instances?
  • Maintenance complexity (tracking multiple function instances).
  • Performance issues (cold start delays and instance lifecycle management).
144
How long did it take to get a new server at the FT ready for code to be deployed in (1) an FT data centre, and (2) an AWS data centre?
Getting a new server at the FT ready for code to be deployed took:
  • In an FT data centre: 120 days
  • In an AWS data centre: minutes
145
Should one worry about vendor lock-in?
One should worry less about vendor lock-in than about moving slowly by choosing to do everything oneself.
146
What was the deployment frequency before the FT moved to the cloud and afterwards?
The deployment frequency before the FT moved to the cloud was 12 releases per year; afterwards it was about 30,000 changes per year.
147
Do you have to choose between speed and stability?
You do not have to choose between speed and stability — moving fast means breaking things less, and fixing things faster.
148
Why should you use a queue?
You should use a queue to avoid coupling with synchronous calls — producers and consumers are not reliant on each other.
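A minimal sketch of the decoupling, using Python's standard `queue.Queue` and one worker thread as a stand-in for a real message broker. The function names and the `None` sentinel convention are assumptions made for this example.

```python
import queue
import threading
from typing import List


def producer(q: queue.Queue, items: List[str]) -> None:
    """Put work on the queue without knowing who will process it, or when."""
    for item in items:
        q.put(item)
    q.put(None)  # sentinel: tells the consumer there is no more work


def consumer(q: queue.Queue, results: List[str]) -> None:
    """Take work off the queue at its own pace."""
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item.upper())


def run(items: List[str]) -> List[str]:
    q: queue.Queue = queue.Queue()
    results: List[str] = []
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()
    producer(q, items)  # never calls the consumer directly
    worker.join()
    return results


print(run(["order-1", "order-2"]))
```

The producer only ever talks to the queue, so a slow or restarting consumer delays processing but does not block or break the producer.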
149
What should you focus on when developing a distributed system?
One should focus on resilience and redundancy when developing a distributed system.
150
Why should one adopt business-focused monitoring?
One should adopt business-focused monitoring because checking a few key business capabilities shows that, fundamentally, the system is OK.
151
Why should one test infrastructure recovery plans?
One should test infrastructure recovery plans because until you do, you cannot be sure that the plan works.
152
Why does the team that builds a system have to be the one that runs it too?
The team that builds a system has to be the one that runs it too because only the team that works on a system day-to-day has a chance of working out what is wrong with it, and you build things differently if you have to respond at 3am.
153
What is DevOps?
DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). Its goal is to shorten the system development life cycle while delivering high-quality software continuously.
154
How is DevOps currently viewed in the industry?
DevOps is in a state of flux, with some viewing it as a concrete methodology and others as an evolving concept.
155
What is the significance of the phrase "You Build It, You Run It" in DevOps?
This phrase, attributed to Amazon CTO Werner Vogels, emphasizes that developers should take operational responsibility for their code. This approach fosters better service quality, customer interaction, and continuous improvement through feedback loops.
156
What does the CALMS acronym in DevOps stand for?
- Culture
- Automation
- Lean
- Measurement
- Sharing
157
Why is culture important in DevOps?
A strong DevOps culture promotes collaboration with shared values, reducing conflicts and fostering innovation.
158
What is a blameless culture in DevOps?
A blameless culture focuses on learning from mistakes instead of assigning blame. This promotes continuous improvement and knowledge sharing.
159
How did Toyota’s NUMMI plant demonstrate the importance of culture?
At NUMMI, Toyota retrained GM workers with a high-trust, continuous improvement culture, which led to the production of the highest quality cars in America within three months.
160
Why is automation critical in DevOps?
Automation minimizes manual tasks, reducing the probability of deployment failures and increasing operational efficiency.
161
What are the benefits of automation in DevOps?
Automation ensures repeatable, documented processes, improving velocity, transparency, and freeing up time for innovation.
162
What is Jidoka in Toyota’s automation approach?
Jidoka, or "automation with a human touch," integrates human wisdom into automation. Machines can detect abnormalities and halt processes, while human operators can intervene when necessary.
163
What is the Lean principle in DevOps?
Lean in DevOps focuses on eliminating waste to enhance efficiency and reduce unnecessary delays.
164
What are some ways to eliminate waste in DevOps?
- Limiting work in progress (WIP) to prevent interruptions.
- Reducing handoffs to enhance communication and coordination.
165
How do Kanban boards help eliminate waste?
Kanban boards visualize work, helping teams identify inefficiencies such as waiting, overproduction, and unnecessary motion.
166
What role does measurement play in DevOps?
Measurement involves obsessively monitoring metrics and logs to detect and resolve problems quickly.
167
What are metrics in DevOps?
Metrics are recorded values that measure system behavior over time, providing insights into system performance and potential issues.
168
How does Toyota use measurement in its production process?
Toyota marks factory floors in tenths of their length to track bottlenecks and guide managers to areas needing improvement.
169
Why is sharing important in DevOps?
Sharing knowledge fosters collaboration between development and operations teams, leading to quicker problem detection and resolution.
170
How can teams foster better sharing in DevOps?
Teams can build relationships by inviting members from different departments to meetings, informal gatherings, and problem-solving discussions.
171
What is Genchi Genbutsu, and how does it relate to DevOps?
Genchi Genbutsu ("go and see") is a Toyota principle where managers observe processes firsthand, ensuring a deeper understanding of operations and fostering collaboration.
172
What are the differing concerns of developers and operators?
The differing concerns of developers and operators are agility and stability.
173
What is DevOps in its purest form?
DevOps in its purest form is about breaking down the (metaphorical) wall between developers and operators.
174
Why should one reduce organisation silos?
One should reduce organisation silos because success comes from cooperation between cross-functional teams.
175
Why should one accept failure as normal?
One should accept failure as normal because any system that humans build is inherently unreliable.
176
Why should one implement gradual change?
One should implement gradual change because it is hard to find bugs in large, million-line changes.
177
Why should one leverage tooling and automation?
One should leverage tooling and automation because work must be turned into repeatable patterns that can be automated.
178
Why should one measure everything?
One should measure everything because we must have numbers to support the DevOps investment and there must be clear metrics for success.
179
How does SRE reduce organisational silos?
SRE reduces organisational silos by sharing ownership with developers, by using the same tooling, and by adopting measures of availability that force conversations between SRE and development.
180
How does SRE accept failure as normal?
SRE accepts failure as normal by using Service Level Objectives (SLOs), which force one to admit a system may be unreliable, and by conducting blameless postmortems when that unreliability occurs.
181
How does SRE implement gradual change?
SRE implements gradual change by moving fast to reduce the cost of failure through small iterative deployments.
182
How does SRE leverage tooling and automation?
SRE leverages tooling and automation by ensuring that tasks done manually this year should be done automatically next year, so eliminating toil.
183
How does SRE measure everything?
According to Vargo, SRE measures everything by measuring not only system metrics, such as reliability, but also human metrics, such as the amount of toil.
184
What is Site Reliability Engineering (SRE)?
Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. Its main goals are to create scalable and highly reliable software systems. It emerged from Google as a way to ensure that large-scale services remain reliable and scalable.
185
How does SRE relate to DevOps?
SRE can be seen as an implementation of DevOps principles. In particular, SRE teams often implement the CALMS principles—Culture, Automation, Lean, Measurement, and Sharing—as a structured way of improving reliability and collaboration between development and operations.
186
What does CALMS stand for in DevOps?
- Culture: Promoting a culture of shared responsibility.
- Automation: Automating repetitive tasks.
- Lean: Reducing waste and inefficiency.
- Measurement: Tracking performance with metrics.
- Sharing: Open communication and knowledge exchange.
187
How does SRE implement the 'Culture' principle of DevOps?
SRE often has its own team or embeds engineers within development teams. A key cultural practice is conducting blameless postmortems after incidents, which aim to foster a safe environment for learning rather than assigning blame.
188
What is a blameless postmortem and why is it important in SRE?
A blameless postmortem is a document created after an incident that focuses on understanding what went wrong and how to prevent it in the future, without blaming individuals. This encourages transparency and continuous improvement.
189
What role does organizational learning play in SRE?
Organizational learning in SRE involves circulating postmortem reports to improve collective knowledge and resilience. It transforms failures into opportunities for system-wide improvement.
190
How does SRE implement the 'Automation' principle?
SREs use software to eliminate toil—manual, repetitive operations work that adds little enduring value. Automation allows teams to focus more on engineering tasks than on reactive support.
191
What is 'toil' in the context of SRE?
Toil is work that is manual, repetitive, automatable, tactical, devoid of lasting value, and scales linearly with service growth. Reducing toil through automation is a key objective of SRE.
192
What is the ideal balance between engineering and operations work for an SRE team?
At companies like Google, SREs are expected to spend at least 50% of their time on engineering work, with the remainder handling support tickets, incidents, and on-call duties.
193
What is 'pager fatigue' and how is it managed in SRE?
Pager fatigue occurs when SREs are overwhelmed with too many incidents during on-call shifts. Best practice suggests handling no more than two incidents per 8–12 hour shift to allow thorough resolution and proper postmortems.
194
How does SRE implement the 'Lean' principle of DevOps?
SRE reduces waste by limiting work-in-progress using control loops like error budgets, and by polarizing time—clearly separating development and operational tasks by time blocks.
195
What is an error budget?
An error budget is defined as the difference between the agreed reliability level (SLO) and the observed reliability. If the system performs within this budget, new features can be released; if not, development is paused to focus on stability.
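A minimal sketch of the release decision described above, taking the budget as observed reliability minus the SLO target. The function names and the request counts are assumptions chosen for illustration.

```python
def observed_reliability(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that succeeded in the measurement window."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return (total_requests - failed_requests) / total_requests


def error_budget_left(slo: float, total_requests: int, failed_requests: int) -> float:
    """Observed reliability minus the agreed target; negative means overspent."""
    return observed_reliability(total_requests, failed_requests) - slo


def can_release(slo: float, total_requests: int, failed_requests: int) -> bool:
    """Release new features only while some error budget remains."""
    return error_budget_left(slo, total_requests, failed_requests) > 0


# Hypothetical month with a 99.9% SLO and one million requests.
print(can_release(0.999, 1_000_000, 400))   # 400 failures: still inside the budget
print(can_release(0.999, 1_000_000, 1500))  # 1,500 failures: budget overspent
```

With a 99.9% target and one million requests, up to 1,000 failures fit inside the budget, so the first call permits a release and the second does not.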
196
What does 'polarizing time' mean in the context of Lean SRE practices?
Polarizing time means dedicating distinct periods solely to development or operations work, reducing context-switching and enhancing focus and productivity.
197
How does SRE approach the 'Measurement' principle?
SREs obsessively monitor a few key metrics chosen based on user needs, intuition, and experience. These metrics guide decisions and inform about service health.
198
What are SLIs, SLOs, and SLAs in SRE?
- SLI (Service Level Indicator): A quantitative metric that measures service performance.
- SLO (Service Level Objective): A target value or range for an SLI.
- SLA (Service Level Agreement): A formal agreement that outlines the consequences of meeting or failing to meet an SLO.
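How the three terms relate can be sketched with a latency example. The 200 ms threshold, the targets, the sample latencies and the "service credit" consequence are all invented for illustration; real SLIs and SLA terms vary by service.

```python
from typing import List


def latency_sli(latencies_ms: List[float], threshold_ms: float) -> float:
    """SLI: fraction of requests served within the latency threshold."""
    fast = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return fast / len(latencies_ms)


def meets_slo(sli: float, slo_target: float) -> bool:
    """SLO: the target that the SLI is expected to reach."""
    return sli >= slo_target


def sla_consequence(slo_met: bool) -> str:
    """SLA: what the agreement says happens (hypothetical wording)."""
    return "no action" if slo_met else "service credit owed"


samples = [120.0, 80.0, 300.0, 95.0]  # assumed request latencies in ms
sli = latency_sli(samples, threshold_ms=200.0)
print(sli, meets_slo(sli, 0.9), sla_consequence(meets_slo(sli, 0.9)))
```

Here three of the four sample requests are fast enough, so the SLI is 0.75, which misses a 0.9 target and triggers the assumed SLA consequence.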
199
What is the purpose of obsessive metric monitoring in SRE?
Obsessive monitoring ensures that the system remains within acceptable performance boundaries and enables early detection and resolution of problems before they impact users.
200
How does SRE implement the 'Sharing' principle of DevOps?
Through open communication and the dissemination of knowledge, tools, and techniques. Both development and operations must share insights to align their objectives and enhance service reliability.
201
Why is tool sharing important in SRE?
Sharing tools ensures consistency in managing environments and enables self-service deployments, reducing dependencies and delays in workflows.
202
Why is knowledge sharing critical in SRE?
Knowledge sharing ensures that development is aware of operational concerns and vice versa. This bidirectional communication improves system design and reliability.
203
What makes a good alert, and why might a Site Reliability Engineer (SRE) care?
A good alert is one that is actionable and is for something that cannot be fixed without a human being; if automated remediation is possible, at least try that first. A Site Reliability Engineer cares about good alerts because they lose sleep over bad ones.
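The rule above can be sketched as a small decision function. The name `handle_problem`, its parameters and the three outcome strings are hypothetical; a real alerting pipeline would be configured in a monitoring tool rather than hand-written like this.

```python
from typing import Callable, Optional


def handle_problem(remediate: Optional[Callable[[], bool]], actionable: bool) -> str:
    """Decide what to do with a detected problem.

    remediate: optional automated fix that returns True if it worked.
    actionable: whether a human could actually do something about it.
    """
    # Try the automated fix first, if one exists.
    if remediate is not None and remediate():
        return "auto-remediated"
    # Only wake a human for something they can act on.
    if actionable:
        return "page on-call"
    # Otherwise record it without interrupting anyone.
    return "log only"


print(handle_problem(remediate=lambda: True, actionable=True))
print(handle_problem(remediate=lambda: False, actionable=True))
print(handle_problem(remediate=None, actionable=False))
```

Only the middle case, where automation failed and a human can act, results in a page.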
204
What is a reliability theatre, and why might a Site Reliability Engineer care?
A traditional Network Operations Centre (NOC) or war room is seen as a reliability theatre that impresses only the general public. An SRE cares about a reliability theatre because it may limit the effectiveness of incident response.
205
What is a snowflake and why might an SRE care?
A snowflake is a production server that is kept running through regular manual configuration tweaks made via the command line. An SRE cares about snowflakes because they are hard to reproduce and debug.
206
What are pets, cattle and poultry, and why might an SRE care?
Pets are virtual (snowflake) servers with names that need individual attention; cattle are virtual servers with numbers that need group attention; poultry are virtual containers with numbers that need group attention. An SRE cares about pets, cattle and poultry because of their (decreasing) administrative cost.
207
Why is autonomous > automated, and why might an SRE care?
Autonomous > automated because it is less work. An SRE cares about this because autonomous systems can take away a world of pain from the on-call rotation.
208
What advantages are there to embedding an SRE in a development team?
The advantages of embedding an SRE in a development team are that it builds trust between development and SRE, and that SRE gets input into system design from the very beginning.
209
What is the right number of nines?
The right number of nines is a decision made on the basis of how much downtime the business can tolerate.
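To make the trade-off concrete, this small sketch converts a number of nines into the downtime it allows per year. It assumes a 365-day year and counts all unavailability as downtime; the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime permitted by an availability of N nines (2 means 99%)."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes per year")
```

Two nines allow roughly 3.65 days of downtime a year, three nines about 8.76 hours, and five nines only about 5.3 minutes. Each extra nine cuts the allowance tenfold.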
210
Why is it dangerous to improve a system without revising its Service Level Agreement?
It is dangerous to improve a system without revising its Service Level Agreement because customers will consider the delivered level of reliability to be the agreed level.