Well-Architected Framework - Operational Excellence Flashcards

Question 1

Q

What is the Operational Excellence Pillar?

Answer

A

It includes the ability to support development and run workloads effectively, gain insight into their operations, and continuously improve supporting processes and procedures to deliver business value.

The operational excellence pillar provides an overview of design principles, best practices, and questions.

You can find prescriptive guidance on implementation in the Operational Excellence Pillar whitepaper.

Question 2

Q

What are the Operational Excellence Design Principles?

Answer

A

Perform operations as code
Make frequent, small, reversible changes
Refine operations procedures frequently
Anticipate failure
Learn from all operational failures

Question 3

Q

Perform operations as code

Answer

A

In the cloud, you can apply the same engineering discipline that you use for application code to your entire environment. You can define your entire workload (applications, infrastructure) as code and update it with code. You can implement your operations procedures as code and automate their execution by triggering them in response to events. By performing operations as code, you limit human error and enable consistent responses to events.
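
As an illustration of an operations procedure written as code and triggered by an event, here is a minimal Python sketch. The registry, the event shape, and the names `runbook`, `rotate_logs`, and `handle` are all hypothetical and do not come from any AWS API; a real procedure would call a provider service instead of returning a string.

```python
from typing import Callable, Dict

# Hypothetical registry that maps an event type to the procedure that handles it.
RUNBOOKS: Dict[str, Callable[[dict], str]] = {}


def runbook(event_type: str):
    """Register a function as the automated response to one event type."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        RUNBOOKS[event_type] = func
        return func
    return register


@runbook("disk_usage_high")
def rotate_logs(event: dict) -> str:
    # Assumption: the event carries the affected host name.
    # This sketch only describes the action it would take.
    return f"rotated logs on {event['host']}"


def handle(event: dict) -> str:
    """Run the registered procedure, or escalate when none exists."""
    procedure = RUNBOOKS.get(event["type"])
    if procedure is None:
        return f"escalated {event['type']} to on-call"
    return procedure(event)


print(handle({"type": "disk_usage_high", "host": "web-1"}))  # rotated logs on web-1
print(handle({"type": "unknown_alarm"}))  # escalated unknown_alarm to on-call
```

Because the same function runs every time the event arrives, the response is consistent and does not depend on someone following manual steps correctly.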

Question 4

Q

Make frequent, small, reversible changes

Answer

A

Design workloads to allow components to be updated regularly. Make changes in small increments that can be reversed if they fail (without affecting customers when possible).

Question 5

Q

Refine operations procedures frequently

Answer

A

As you use operations procedures, look for opportunities to improve them. As you evolve your workload, evolve your procedures appropriately. Set up regular game days to review and validate that all procedures are effective and that teams are familiar with them.

Question 6

Q

Anticipate failure

Answer

A

Perform “pre-mortem” exercises to identify potential sources of failure so that they can be removed or mitigated. Test your failure scenarios and validate your understanding of their impact. Test your response procedures to ensure that they are effective, and that teams are familiar with their execution. Set up regular game days to test workloads and team responses to simulated events.
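
One way to test a failure scenario is to inject a simulated fault and check that the response behaves as expected. The Python sketch below assumes a hypothetical price lookup with a cache fallback; `fetch_price` and `failing_primary` are illustrative names, not part of any real service.

```python
from typing import Callable, Dict, Tuple


def fetch_price(primary: Callable[[str], float],
                cache: Dict[str, float],
                sku: str) -> Tuple[float, str]:
    """Return the price and the source that served it."""
    try:
        return primary(sku), "primary"
    except ConnectionError:
        # Response procedure under test: fall back to the cached value.
        return cache[sku], "cache"


def failing_primary(sku: str) -> float:
    """Simulated outage of the primary dependency."""
    raise ConnectionError("simulated outage")


# Game-day style check: inject the failure and confirm the fallback works.
price, source = fetch_price(failing_primary, {"sku-1": 9.99}, "sku-1")
print(price, source)
```

Running a check like this on a schedule validates both the fallback and the team's understanding of its impact before a real outage occurs.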

Question 7

Q

Learn from all operational failures

Answer

A

Drive improvement through lessons learned from all operational events and failures. Share what is learned across teams and through the entire organization.

Question 8

Q

Four best practice areas for operational excellence in the cloud

Answer

A

Organization
Prepare
Operate
Evolve

Question 9

Q

OPS 1: How do you determine what your priorities are?

Answer

A

Everyone needs to understand their part in enabling business success. Have shared goals in order to set priorities for resources. This will maximize the benefits of your efforts.

Question 10

Q

OPS 2: How do you structure your organization to support your business outcomes?

Answer

A

Your teams must understand their part in achieving business outcomes. Teams need to understand their roles in the success of other teams, the role of other teams in their success, and have shared goals. Understanding responsibility, ownership, how decisions are made, and who has authority to make decisions will help focus efforts and maximize the benefits from your teams.

Question 11

Q

OPS 3: How does your organizational culture support your business outcomes?

Answer

A

Provide support for your team members so that they can be more effective in taking action and supporting your business outcome.

Question 12

Q

Prepare for operational excellence

Answer

A

Understand your workloads and their expected behaviors.
Design your workload so that it provides the information necessary to understand its internal state (for example, metrics, logs, events, and traces).
Iterate to develop monitoring for the health of your workload, identify when outcomes are at risk, and enable effective responses.
Enable situational awareness (for example, changes in state, user activity, privileged access, and utilization counters).
Adopt approaches that improve the flow of changes into production and that enable refactoring, fast feedback on quality, and bug fixing.
Provide fast feedback on quality and enable rapid recovery from changes that do not have desired outcomes.
Mitigate the impact of issues introduced through the deployment of changes.
Plan for unsuccessful changes so that you are able to respond faster if necessary.
Test and validate the changes you make.
Be aware of planned activities in your environments so that you can manage risk.
Emphasize frequent, small, reversible changes to limit the scope of change.
Evaluate the operational readiness of your workload, processes, procedures, and personnel to understand the operational risks related to your workload. Use a consistent process (including manual or automated checklists) to know when you are ready to go live with your workload or a change. This also enables you to find any areas that you need to make plans to address. Have runbooks that document your routine activities and playbooks that guide your processes for issue resolution.
Understand the benefits and risks to make informed decisions to allow changes to enter production.
AWS enables you to view your entire workload (applications, infrastructure, policy, governance, and operations) as code. This means you can apply the same engineering discipline that you use for application code to every element of your stack and share these across teams or organizations to magnify the benefits of development efforts. Use operations as code in the cloud and the ability to safely experiment to develop your workload, your operations procedures, and practice failure. Using AWS CloudFormation enables you to have consistent, templated, sandbox development, test, and production environments with increasing levels of operations control.

Question 13

Q

OPS 4: How do you design your workload so that you can understand its state?

Answer

A

Design your workload so that it provides the information necessary across all components (for example, metrics, logs, and traces) for you to understand its internal state. This enables you to provide effective responses when appropriate.
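
A minimal sketch of what emitting that information could look like, using only the Python standard library. The record fields (`component`, `trace_id`, `metrics`) and the function names are assumptions chosen for illustration, not a prescribed AWS format.

```python
import json
import time
import uuid
from typing import List


def telemetry_record(component: str, message: str, trace_id: str, **metrics: float) -> str:
    """Build one structured log line that carries a trace id and optional metrics."""
    record = {
        "timestamp": time.time(),
        "component": component,
        "trace_id": trace_id,
        "message": message,
        "metrics": metrics,
    }
    return json.dumps(record, sort_keys=True)


def handle_request(order_id: str) -> List[str]:
    """Process a hypothetical order and return the telemetry it emitted."""
    trace_id = uuid.uuid4().hex  # one id ties together every record for this request
    start = time.perf_counter()
    lines = [telemetry_record("api", f"received order {order_id}", trace_id)]
    # ... the real work of the request would happen here ...
    latency_ms = (time.perf_counter() - start) * 1000
    lines.append(telemetry_record("api", "order processed", trace_id, latency_ms=latency_ms))
    return lines


for line in handle_request("order-42"):
    print(line)
```

Sharing one `trace_id` across records lets you follow a single request through every component, while the `metrics` field gives monitoring something to aggregate.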

Question 14

Q

OPS 5: How do you reduce defects, ease remediation, and improve flow into production?

Answer

A

Adopt approaches that improve flow of changes into production, that enable refactoring, fast feedback on quality, and bug fixing. These accelerate beneficial changes entering production, limit issues deployed, and enable rapid identification and remediation of issues introduced through deployment activities.

Question 15

Q

OPS 6: How do you mitigate deployment risks?

Answer

A

Adopt approaches that provide fast feedback on quality and enable rapid recovery from changes that do not have desired outcomes. Using these practices mitigates the impact of issues introduced through the deployment of changes.
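
One common approach of this kind is a staged rollout that checks quality at each stage and rolls back on the first bad signal. The Python sketch below is an assumption-laden illustration: `staged_rollout`, the stage percentages, and the error-rate threshold are hypothetical, and `error_rate_at` stands in for whatever monitoring you actually use.

```python
from typing import Callable, Sequence, Tuple


def staged_rollout(stages: Sequence[int],
                   error_rate_at: Callable[[int], float],
                   max_error_rate: float = 0.01) -> Tuple[int, bool]:
    """Expose a change in stages; return (traffic percent on the change, completed)."""
    exposed = 0
    for percent in stages:
        exposed = percent  # shift this share of traffic to the new version
        if error_rate_at(percent) > max_error_rate:
            # Fast feedback failed: roll back so no traffic uses the change.
            return 0, False
    return exposed, True


# A change that starts failing once a quarter of traffic reaches it.
print(staged_rollout([5, 25, 100], lambda p: 0.05 if p >= 25 else 0.0))  # (0, False)
# A healthy change completes the rollout.
print(staged_rollout([5, 25, 100], lambda p: 0.0))  # (100, True)
```

Starting with a small share of traffic limits how many customers see a faulty change before it is reverted.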

Question 16

Q

OPS 7: How do you know that you are ready to support a workload?

Answer

A

Evaluate the operational readiness of your workload, processes and procedures, and personnel to understand the operational risks related to your workload.
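
A readiness evaluation can be partly automated as a checklist. The Python sketch below assumes each check is a function that returns True or False; the `Check` class, `evaluate_readiness`, and the example check names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    name: str
    passed: Callable[[], bool]  # returns True when the item is satisfied


def evaluate_readiness(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Return (ready, names of failed checks) for a go-live decision."""
    failed = [check.name for check in checks if not check.passed()]
    return (not failed, failed)


# Example checklist; the results are hard-coded for illustration only.
checklist = [
    Check("runbook exists for routine activities", lambda: True),
    Check("playbook exists for issue resolution", lambda: True),
    Check("on-call personnel trained", lambda: False),
]

ready, gaps = evaluate_readiness(checklist)
print(ready, gaps)  # False ['on-call personnel trained']
```

The list of failed checks doubles as the set of areas that still need a plan before the workload or change goes live.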