Well-Architected Framework WP - Operational Excellence Flashcards

1
Q

Operational Excellence

A

Practices and procedures for managing production workloads

Covers how planned changes are executed and how unexpected events are responded to

Change execution and responses should be automated. All processes should be documented, tested, and regularly reviewed

2
Q

Design Principles (PAMRAL)

A

Perform operations with code

Annotate documentation

Make frequent, small reversible changes

Refine operations procedures frequently

Anticipate Failure

Learn from all operational failures

3
Q

Areas of Operational Excellence (POE)

A

Prepare

Operate

Evolve

4
Q

Preparation for Operational Excellence

A

To prepare, consider:
operational priorities
design for operations
operational readiness

======

use checklists to ensure workloads are ready for production

Workloads should have runbooks and playbooks

runbook - operations guidance for routine activities

playbook - for responding to unexpected events
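One way to treat a runbook as code is to model it as an ordered list of named steps that stops at the first failure. This is only a minimal sketch, not an AWS API: `Step`, `run_runbook`, and the example step names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One runbook step: a name plus an action that returns True on success."""
    name: str
    action: Callable[[], bool]


def run_runbook(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns a log of what happened so the run can be reviewed later.
    """
    log = []
    for step in steps:
        ok = step.action()
        log.append(f"{step.name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            log.append("stopped: hand off to the playbook for this failure")
            break
    return log


# Hypothetical steps; real actions would call your own tooling.
restart_service = [
    Step("drain traffic", lambda: True),
    Step("restart process", lambda: True),
    Step("verify health check", lambda: False),
    Step("restore traffic", lambda: True),
]

for line in run_runbook(restart_service):
    print(line)
```

Returning a log rather than only printing keeps each run reviewable, which fits the "documented, tested, reviewed" goal.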

5
Q

Preparation best practices

A

In AWS, use CloudFormation to ensure environments have all required resources and that configuration is based on tested best practices

Use Auto Scaling

Use AWS Config to make rules for automatically tracking and responding to changes

Use tagging
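As a rough illustration of a CloudFormation template with tagging, the sketch below builds a minimal template as a Python dictionary and prints it as JSON. The logical ID `LogBucket`, the tag keys, and the `build_template` helper are assumptions chosen for the example.

```python
import json


def build_template(environment: str, owner: str) -> dict:
    """Return a minimal CloudFormation template with one tagged S3 bucket."""
    tags = [
        {"Key": "Environment", "Value": environment},
        {"Key": "Owner", "Value": owner},
    ]
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: one tagged bucket for application logs",
        "Resources": {
            "LogBucket": {  # hypothetical logical ID
                "Type": "AWS::S3::Bucket",
                "Properties": {"Tags": tags},
            }
        },
    }


template = build_template("production", "ops-team")
print(json.dumps(template, indent=2))
```

Keeping the template in version control lets every environment be created from the same reviewed definition.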

6
Q

Preparation questions

A

what best practices are you using

how are you doing configuration management

Keep documentation current

7
Q

Operational Excellence - Operations

A

operations should be standardized and manageable

Focus on automation, small frequent changes, QA testing

Use logs and metrics

Set up pipelines for continuous integration and deployment

Should be able to revert changes
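The idea of small, reversible changes can be sketched as a version history where the latest change can always be undone. The `Deployment` class below is a hypothetical illustration, not part of any AWS service.

```python
from typing import List


class Deployment:
    """Tracks released versions so the latest change can be reverted."""

    def __init__(self, initial_version: str) -> None:
        self._history: List[str] = [initial_version]

    @property
    def current(self) -> str:
        return self._history[-1]

    def deploy(self, version: str) -> None:
        """Record a new version as the current one."""
        self._history.append(version)

    def rollback(self) -> str:
        """Revert to the previous version; the first version cannot be reverted."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self.current


service = Deployment("v1")
service.deploy("v2")
service.deploy("v3")
service.rollback()
print(service.current)  # prints "v2"
```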

8
Q

Operations - questions

A

How are you evolving your workload while minimizing impact of change

how do you monitor workload

9
Q

Operational Excellence - Responses

A

responses should be automated

for alerting, mitigation, remediation, rollback and recovery

responses should follow a predefined playbook

In AWS, SNS can deliver the alerts and notifications that start these responses

10
Q

responses questions

A

how do you respond to unplanned events

how is escalation managed when responding to unplanned events

11
Q

Key AWS Services for defining priorities

A

AWS Config inventories your AWS resources and configurations

Service Catalog creates a standardized set of service offerings

Use Auto Scaling and SQS to increase automation

12
Q

Key AWS Services for Operations

A

CodeCommit, CodeDeploy, and CodePipeline to manage code changes
CloudTrail to audit API activity

13
Q

Key AWS Services for Responses

A

CloudWatch alarms for setting thresholds for alerting and notification

CloudWatch Events for triggering notifications and automated responses
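A threshold-based alarm can be sketched as the set of parameters CloudWatch's PutMetricAlarm API accepts. The helper name `cpu_alarm_params`, the instance ID, the topic ARN, and the 80% threshold are placeholders assumed for illustration; nothing is sent to AWS here.

```python
def cpu_alarm_params(instance_id: str, topic_arn: str, threshold: float = 80.0) -> dict:
    """Build keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                # seconds per datapoint
        "EvaluationPeriods": 2,       # periods evaluated against the threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # notify this SNS topic on ALARM
    }


params = cpu_alarm_params(
    "i-0123456789abcdef0",                            # placeholder instance ID
    "arn:aws:sns:us-east-1:111122223333:ops-alerts",  # placeholder topic ARN
)
# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["AlarmName"])
```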

14
Q

Key AWS Services for defining priorities / preparation

A

AWS Support, including support center. Business and Enterprise Support customers get access to additional checks and reviews

AWS Cloud Compliance for regulatory and compliance requirements

AWS Trusted Advisor for optimizations

15
Q

Key AWS Services for designing for operations

A

CloudWatch to monitor resources and applications
CloudFormation to create version-controlled templates for your infrastructure
AWS Developer Tools to enable safe, rapid delivery of software
AWS X-Ray to trace user requests through the entire application for analysis and debugging

16
Q

Design for Operations - Key Points

A

View entire workload as code, define and update as code.

Align engineering practices for defect reduction, rapid fixes. Use logging for visibility into architecture

17
Q

Key AWS Services for operational readiness

A

AWS Lambda to enable operational procedures as code that can be triggered by events

AWS Config to track changes to CloudFormation Stacks

EC2 Systems Manager to automate management tasks on EC2 instances

18
Q

2 Considerations for Operational Success

A

Understanding operational health

responding to events

19
Q

Operate - Understanding Operational Health

A

To understand Operational Health, use metrics to implement dashboards

Send log data to CloudWatch Logs, define baseline metrics

Send CloudWatch Logs to Elasticsearch and use Kibana
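Defining a baseline metric can be as simple as summarizing historical samples and flagging values that fall far outside them. The sketch below assumes a mean and standard deviation baseline with a three-sigma tolerance; the function names and sample latencies are made up for illustration.

```python
from statistics import mean, stdev
from typing import List, Tuple


def baseline(samples: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of historical samples."""
    return mean(samples), stdev(samples)


def is_anomalous(value: float, samples: List[float], tolerance: float = 3.0) -> bool:
    """Flag a value more than `tolerance` standard deviations from the mean."""
    avg, spread = baseline(samples)
    return abs(value - avg) > tolerance * spread


latency_ms = [100.0, 102.0, 98.0, 101.0, 99.0]  # illustrative history
print(is_anomalous(100.5, latency_ms))  # prints False
print(is_anomalous(150.0, latency_ms))  # prints True
```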

20
Q

AWS Service health and personal health dashboards

A

Under the AWS Shared Responsibility Model, these dashboards cover part of your monitoring: they provide alerts and remediation guidance when AWS experiences events

21
Q

Key AWS Services for understanding Operational Health

A

CloudWatch - metrics, dashboards
CloudWatch Logs - monitor, store logs from various sources
Elasticsearch Service (ES) - use for log analytics, monitoring
Personal Health Dashboard - alerts, remediation when AWS experiences issues
Service Health Dashboard - shows realtime AWS service availability

22
Q

Operate - Responding to Events

A

Anticipate planned and unplanned operational events

AWS lets you script responses and trigger their execution via code

Automate execution of runbook and playbook actions

23
Q

Ways to automatically respond to events

A

Create CloudWatch Events rules that trigger responses through targets such as Lambda functions, SNS topics, and ECS tasks

Configure CloudWatch alarms to perform EC2 actions or Auto Scaling actions, or to send notifications to an SNS topic

Use SNS to invoke Lambda
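Rule-based event matching can be sketched as follows. This is a deliberately simplified matcher written for illustration, assuming a pattern where a list means "any of these values" and a nested dictionary means "match these nested fields". Real event patterns support more operators, and `matches` is a hypothetical helper, not an AWS function.

```python
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified matcher: every pattern field must be present in the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event fields.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # List pattern: the event value must be one of the listed values.
            return False
    return True


stopped_instances = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

print(matches(stopped_instances, event))  # prints True
```

A matched event would then be handed to a target such as a Lambda function or an SNS topic.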

24
Q

Key AWS Service for responding to events

A

AWS Lambda to define operational procedures as code that can be triggered

also:
CloudWatch - collects logs and metrics, and can trigger responses to events
CloudWatch Events - delivers a near-real-time stream of events that can be matched to rules
SNS - lets you invoke Lambda
Auto Scaling
EC2 Systems Manager - automate management tasks on EC2 instances
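An operational procedure triggered through SNS might look like the Lambda handler sketched below. It assumes the notification is a CloudWatch alarm message (JSON containing `AlarmName` and `NewStateValue`). The playbook wording and the locally built `sample_event` are illustrative only.

```python
import json
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any) -> List[str]:
    """Lambda entry point: read each SNS message and choose a response."""
    actions = []
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
        # CloudWatch alarm notifications carry AlarmName and NewStateValue.
        if message.get("NewStateValue") == "ALARM":
            actions.append(f"run playbook for {message['AlarmName']}")
        else:
            actions.append(f"no action for {message.get('AlarmName', 'unknown')}")
    return actions


# Local invocation with a hand-built event in the SNS-to-Lambda shape.
sample_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({"AlarmName": "high-cpu", "NewStateValue": "ALARM"})}}
    ]
}
print(handler(sample_event, None))  # prints ['run playbook for high-cpu']
```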

25
Q

Evolve

A

Continuously improve over time

implement small, frequent changes

Learn from experience

Share learnings

26
Q

Evolve - what to do with aggregated logs in AWS?

A

create detailed history of all your operational activities, workloads and infrastructure to analyze operations over time

27
Q

Evolve - how to use CloudTrail?

A

track API activity to know what’s happening across your accounts

Track AWS developer tools activities with CloudTrail and CloudWatch

These add detailed activity history of deployments and outcomes to CloudWatch Logs data
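To make "track API activity" concrete, the sketch below counts calls per API action from a few hand-written records. The field names `eventTime`, `eventSource`, and `eventName` follow CloudTrail's record format; the record values and the `summarize` helper are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def summarize(records: Iterable[Dict[str, str]]) -> List[Tuple[str, int]]:
    """Count API calls by eventSource and eventName, most frequent first."""
    counts = Counter(f"{r['eventSource']}:{r['eventName']}" for r in records)
    return counts.most_common()


# Hand-written sample records; values are illustrative only.
records = [
    {"eventTime": "2024-01-01T10:00:00Z",
     "eventSource": "codedeploy.amazonaws.com", "eventName": "CreateDeployment"},
    {"eventTime": "2024-01-01T11:00:00Z",
     "eventSource": "codedeploy.amazonaws.com", "eventName": "CreateDeployment"},
    {"eventTime": "2024-01-01T11:05:00Z",
     "eventSource": "codedeploy.amazonaws.com", "eventName": "StopDeployment"},
]

for action, count in summarize(records):
    print(action, count)
```

The same kind of summary over real exported logs shows how often deployments are started and stopped over time.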

28
Q

Evolve - why ingest CloudWatch Logs data into Elasticsearch?

A

To use its built-in support for Kibana to create visualizations and perform analysis

29
Q

Evolve - why export CloudWatch data to S3?

A

To analyze it with Amazon Athena and use QuickSight to perform analysis and create visualizations

30
Q

Key AWS Services for Evolving

A

Elasticsearch to analyze log data and gain insights

also:
Amazon QuickSight - business analytics service for visualization and analysis
Amazon Athena - serverless interactive query service to analyze data in S3
S3 - collect and archive logs
CloudWatch - collect logs and metrics, create dashboards

31
Q

Evolve - ways to share learnings

A

Use IAM to give access to resources across accounts

Use AWS CodeCommit to share applications, procedures, libraries, documentation

Share compute standards by giving access to AMIs

Share CloudFormation templates

Authorize Lambda functions across accounts

32
Q

Key AWS Services for sharing learnings

A

IAM

also:
SNS - notify subscribers when resources are published
CodeCommit - version controlled repository for operations as code
Lambda
CloudFormation - standardized templates
AMIs