Well Architected Framework WP - Operational Excellence Flashcards
Operational Excellence
practices and procedures for managing production workloads
how planned changes are executed and responses to unexpected events
change execution and responses should be automated. All processes should be documented, tested, and reviewed
Design Principles (PAMRAL)
Perform operations with code
Annotate documentation
Make frequent, small reversible changes
Refine operations procedures frequently
Anticipate Failure
Learn from all operational failures
Areas of Operational Excellence (POE)
Prepare
Operate
Evolve
Preparation for Operational Excellence
To prepare consider:
operational priorities
design for operations
operational readiness
use checklists to ensure workloads are ready for production
Workloads should have runbooks and playbooks
runbook - operations guidance
playbook - for responding to unexpected events
Preparation best practices
In AWS use CloudFormation to ensure environments have all required resources and that configuration is based on tested best practices
Use Auto Scaling
Use AWS Config to make rules for automatically tracking and responding to changes
Use tagging
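A minimal sketch of how a tagging rule could be checked automatically. The required tag keys and the inventory below are hypothetical; in practice the resource list would come from an inventory source such as AWS Config.

```python
# Hypothetical tag policy: these keys are illustrative, not an AWS requirement.
REQUIRED_TAGS = {"Owner", "Environment", "Workload"}


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Return the required tag keys absent from a resource, sorted for stable output."""
    return sorted(set(required) - set(resource_tags))


def audit(resources):
    """Map each non-compliant resource id to its missing tag keys."""
    report = {}
    for resource_id, tags in resources.items():
        gaps = missing_tags(tags)
        if gaps:
            report[resource_id] = gaps
    return report


# Made-up inventory used only to exercise the check.
inventory = {
    "i-0abc": {"Owner": "ops", "Environment": "prod", "Workload": "web"},
    "i-0def": {"Owner": "ops"},
}
print(audit(inventory))
```

Only the second resource is reported, because it lacks the Environment and Workload tags.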
Preparation questions
what best practices are you using
how are you doing configuration management
Keep documentation current
Operational Excellence - Operations
operations should be standardized and manageable
Focus on automation, small frequent changes, QA testing
Use logs and metrics
Set up pipelines for continuous integration and deployment
Should be able to revert changes
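A rough sketch of what a small, reversible change looks like in code: the change is applied to a copy, checked, and discarded if the check fails. The config keys and the health rule are hypothetical stand-ins for real QA tests and metrics.

```python
def apply_change(config, change, health_check):
    """Apply one small change; keep the old config if the health check fails."""
    candidate = {**config, **change}  # work on a copy so the original stays untouched
    if health_check(candidate):
        return candidate, True
    return config, False  # revert: return the last known-good config


def healthy(cfg):
    # Hypothetical rule standing in for real tests and monitoring.
    return cfg["max_connections"] >= cfg["min_connections"]


current = {"min_connections": 2, "max_connections": 10}
current, first_ok = apply_change(current, {"max_connections": 20}, healthy)
current, second_ok = apply_change(current, {"max_connections": 1}, healthy)
print(current, first_ok, second_ok)
```

The first change is accepted; the second fails the check, so the workload stays on the configuration from the first change.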
Operations - questions
How are you evolving your workload while minimizing impact of change
how do you monitor workload
Operational Excellence - Responses
responses should be automated
for alerting, mitigation, remediation, rollback and recovery
responses should follow a predefined playbook
in AWS you can use SNS for some of this
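One way to picture "responses follow a predefined playbook" is a lookup from event type to an ordered list of steps. Everything here is an illustrative assumption: the alarm names, the step functions, and the event shape. A real setup would publish notifications through a service such as SNS instead of appending to a list.

```python
def notify(event, log):
    log.append(f"notify on-call about {event['alarm']}")


def rollback(event, log):
    log.append(f"roll back {event['service']} to previous release")


def restart(event, log):
    log.append(f"restart {event['service']}")


# Hypothetical playbooks: each alarm name maps to its ordered response steps.
PLAYBOOKS = {
    "HighErrorRate": [notify, rollback],
    "InstanceUnhealthy": [notify, restart],
}


def respond(event):
    """Run the playbook for an event; unknown events only notify a human."""
    log = []
    for step in PLAYBOOKS.get(event["alarm"], [notify]):
        step(event, log)
    return log


print(respond({"alarm": "HighErrorRate", "service": "checkout"}))
```

Falling back to notification alone for unrecognized events keeps escalation explicit instead of guessing at a remediation.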
responses questions
how do you respond to unplanned events
how is escalation managed when responding to unplanned events
Key AWS Services for defining priorities
AWS Config inventories your AWS resources and configurations
Service Catalog creates a standard set of service offerings
Use Auto Scaling and SQS to increase automation
Key AWS Services for Operations
CodeCommit
CodeDeploy
CodePipeline to manage code changes
CloudTrail to audit
Key AWS Services for Responses
CloudWatch alarms for setting thresholds for alerting, notification
CloudWatch Events for triggering notifications and automated responses
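A sketch of a threshold alarm defined in code, assuming boto3 is installed and credentials are configured. The alarm name, threshold, instance id, and SNS topic ARN are placeholders. The parameters are built separately so they can be reviewed and tested without calling AWS.

```python
def cpu_alarm_params(instance_id, topic_arn, threshold=80.0):
    """Build keyword arguments for CloudWatch put_metric_alarm (values are illustrative)."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # seconds per evaluation window
        "EvaluationPeriods": 2,     # consecutive breaching windows before alarming
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # notify this SNS topic when the alarm fires
    }


def create_alarm(params):
    """Send the alarm definition to CloudWatch; needs boto3 and AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Placeholder identifiers, not real resources.
params = cpu_alarm_params("i-0abc", "arn:aws:sns:us-east-1:123456789012:ops-alerts")
print(params["AlarmName"], params["Threshold"])
```

Calling create_alarm(params) would register the alarm; it is left uncalled here because it requires a live account.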
Key AWS Services for defining priorities / preparation
AWS Support, including Support Center. Business and Enterprise Support customers get access to additional checks and reviews
AWS Cloud Compliance for regulatory and compliance requirements
AWS Trusted Advisor for optimizations
Key AWS Services for designing for operations
CloudWatch to monitor resources and applications
CloudFormation to create version-controlled templates for your infrastructure
AWS Developer Tools to enable safe, rapid delivery of software
AWS X-Ray to trace user requests through entire application for analysis, debugging
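To illustrate the CloudFormation card above, here is a minimal sketch that generates a template as JSON text suitable for committing to version control. The logical id, description, and tag values are hypothetical; the single S3 bucket is just a small example resource.

```python
import json


def build_template(bucket_logical_id="AppLogsBucket", environment="prod"):
    """Return a minimal CloudFormation template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template: one tagged S3 bucket",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "Tags": [{"Key": "Environment", "Value": environment}],
                },
            }
        },
    }


# sort_keys gives stable output, so diffs between commits show only real changes.
template_text = json.dumps(build_template(), indent=2, sort_keys=True)
print(template_text)
```

Because the template is plain text, each infrastructure change can be reviewed and reverted like any other code change.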