AWS Well-Architected Framework Flashcards
What are the 5 pillars of the Well-Architected Framework?
- Operational Excellence
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
Well-Architected: What is a “component”?
It is the code, configuration, and AWS resources that together deliver against a requirement. A component is a unit of technical ownership and is decoupled from other components.
Well-Architected: What is a “workload”?
It is a set of components that work together. Usually it is the level of detail that business and tech leaders communicate about.
Well-Architected: What is a “milestone”?
Milestones mark key changes in the architecture as it evolves throughout the product lifecycle.
Well-Architected: What is “architecture”?
Generally it is how components work together in a workload. How they communicate and interact is the focus of architecture diagrams.
Well-Architected: What is the “technology portfolio”?
It is the collection of workloads required for the business to operate.
Well-Architected: Discuss tradeoffs between pillars.
Business decisions can drive engineering priorities. You might reduce cost at the expense of reliability in some contexts, or accept higher cost for greater reliability in others. However, security and operational excellence are generally not traded off against the other pillars.
How are risks mitigated when distributing decision making authority?
In 2 ways:
- Have practices (ways of doing things, processes, standards) that enable each team, and put experts in place who ensure teams raise the bar on the standards they need to meet.
- Implement mechanisms that carry out automated checks to ensure standards are being met.
What is a fundamental part of Amazon’s innovation process?
Working backward from the customer so that products are built in response to customer needs.
How did the Well-Architected Framework come about?
It is the customer-facing implementation of Amazon’s internal review process, in which principal engineering thinking across field roles has been codified.
What are the general design principles that come out of the Well-Architected Framework?
- Stop guessing capacity needs.
- Test systems at production scale.
- Automate to make architectural experimentation easier.
- Allow for architectures that evolve over time as we learn more.
- Drive architectures using data.
- Improve through game days to simulate events in production.
Operational Excellence: What are the 5 design principles?
- Perform operations as code
- Make frequent, small, reversible changes
- Refine operations procedures frequently
- Anticipate failure
- Learn from operational failures
Operational Excellence: Why would you want to make small and reversible changes frequently?
So that components can be updated regularly and any change can be reversed if it fails.
Operational Excellence: How would you refine operations procedures frequently?
Set up game days that simulate production situations, validate that procedures are effective, and confirm that teams are familiar with them.
Operational Excellence: How would you anticipate failure?
Perform pre-mortem exercises to identify potential sources of failure, then test those failure scenarios. Also test response procedures against simulated events.
Operational Excellence Practice Areas: What are the four best practice areas for operational excellence in the cloud?
- Organization
- Prepare
- Operate
- Evolve
Operational Excellence Practice Areas: Describe the “Organization” practice area
- Evaluate customer needs and be aware of guidelines/obligations defined by governance and external factors (compliance requirements).
- Evaluate threats to the business and impact of risks between competing interests.
- Ensure owners are in place for each application, workload, platform, and infrastructure component. Processes and procedures need identified owners too, both for their definition and for their performance.
- Recognize business value of each component and define responsibilities of team members to act appropriately.
- Encourage experimentation and seek multiple diverse perspectives
Operational Excellence Practice Areas: How do you determine what your priorities are?
Everyone needs to understand their part. When goals are shared across the organization, priorities for resources can be set more effectively.
Operational Excellence Practice Areas: How do you structure your organization to support your business outcomes?
Teams must understand their part: their role in the success of other teams, other teams’ roles in their success, and the goals they share.
Operational Excellence Practice Areas: How does organizational culture support business outcomes?
Provide support to team members so that they can be effective in taking action and in supporting business outcomes.
Operational Excellence Practice Areas: How do you prepare for operational excellence?
Understand your workloads and their expected behavior so that you can design them to provide insight and build procedures to support them. Develop the telemetry needed to monitor workload health, identify when outcomes are at risk, and enable effective response. Adopt approaches that improve the flow of changes into production and enable refactoring. Adopt approaches that provide fast feedback on quality and rapid recovery from undesirable changes. Use a consistent process to know when a workload can go live.
Operational Excellence Practice Areas: How do you design your workload so that you can understand its state?
Design it so that the information emitted across all components lets you understand its internal state and respond effectively:
- Metrics
- Logs
- Traces
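The three signals above can be illustrated with a minimal Python sketch. It assumes a hypothetical request handler and an in-memory counter standing in for a real metrics service; the function name, metric names, and log fields are illustrative only.

```python
import json
import time
import uuid
from collections import Counter

# Hypothetical in-memory metric store; a real workload would publish these
# counters to a monitoring service instead.
METRICS = Counter()


def handle_request(path, trace_id=None):
    """Handle one request and emit a metric, a structured log line, and a trace ID."""
    # Trace: reuse the caller's trace ID so the request can be followed
    # across components; create one if this is the first hop.
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()

    # Stand-in for real work: only paths under /orders are known.
    status = 200 if path.startswith("/orders") else 404

    # Metric: a counter keyed by outcome.
    METRICS[f"requests.status_{status}"] += 1

    # Log: one JSON record per request, carrying the trace ID.
    log_line = json.dumps({
        "trace_id": trace_id,
        "path": path,
        "status": status,
        "duration_ms": round((time.perf_counter() - start) * 1000, 3),
    })
    return trace_id, log_line
```

Passing the returned trace ID into the next component's call is what lets logs from different components be joined into one trace.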
Operational Excellence Practice Areas: How do you reduce defects, ease remediation, and improve flow into production?
Favor approaches that enable refactoring, fast feedback on quality, and bug fixing. These enable rapid identification and remediation of issues introduced through deployment activities.
Operational Excellence Practice Areas: How do you mitigate deployment risks?
Adopt approaches that provide fast feedback on quality and enable rapid recovery from undesirable changes.
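One way to picture fast feedback plus rapid recovery is a canary check that falls back to the previous version. This is a minimal sketch, not a deployment tool: the function name, the 1% threshold, and the caller-supplied error_rate_for function are all assumptions made for illustration.

```python
def deploy_with_rollback(current_version, new_version, error_rate_for,
                         max_error_rate=0.01):
    """Return the version that should serve traffic after a canary check.

    error_rate_for is a hypothetical caller-supplied function that returns
    the observed error rate (between 0 and 1) for a version under canary traffic.
    """
    observed = error_rate_for(new_version)
    if observed > max_error_rate:
        # Fast feedback says the change is bad: keep the previous version.
        return current_version
    return new_version
```

For example, with observed rates of 0.002 for "v2" and 0.2 for "v3", promoting "v2" succeeds while the attempt to promote "v3" leaves "v2" in place.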
Operational Excellence Practice Areas: How to know if you’re ready to support a workload?
Evaluate operational readiness of workload, processes, procedures, and personnel to understand operational risks to workload.
Operational Excellence Practice Areas: What are some additional steps to prepare?
- Implement operations activities as code
- Use pre-mortems to anticipate failure
- Use resource tags and resource groups with consistent tagging strategy
- Tag for organization, cost accounting, access controls, and execution of automated operations activities
- Plan what to do with live systems that don’t comply with changes to the go-live checklist
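The tagging bullets above lend themselves to an automated check. A minimal sketch, assuming hypothetical required tag keys (Owner, CostCenter, Environment) and resources represented as plain dictionaries rather than fetched from AWS:

```python
# Hypothetical tag keys; substitute whatever the organization's tagging
# strategy actually requires.
REQUIRED_TAGS = {"Owner", "CostCenter", "Environment"}


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or blank on one resource."""
    return sorted(k for k in required if not resource_tags.get(k, "").strip())


def audit(resources):
    """Map resource ID -> missing tag keys, listing non-compliant resources only."""
    report = {}
    for resource_id, tags in resources.items():
        gaps = missing_tags(tags)
        if gaps:
            report[resource_id] = gaps
    return report
```

A report like this could feed cost accounting or trigger an automated operations activity for untagged resources.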
Operational Excellence Practice Areas: What does operational health include?
Both the health of the workload and the health of the operations activities performed in support of the workload (deployment and incident response). Establish baselines for improvement, investigation, and intervention; collect and analyze metrics; then validate your understanding of operations success and how it changes over time.
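A baseline only helps if current values are compared against it. A minimal sketch, assuming a simple rule (flag values more than three standard deviations above the baseline mean) that is an illustrative choice and not something the framework prescribes:

```python
from statistics import mean, pstdev


def breaches_baseline(baseline, current, sigmas=3.0):
    """Return True if current exceeds the baseline mean by more than
    `sigmas` population standard deviations."""
    mu = mean(baseline)
    sd = pstdev(baseline)
    return current > mu + sigmas * sd
```

With a latency baseline of [100, 110, 90, 105, 95] ms, the mean is 100 and the threshold is roughly 121 ms, so 120 ms would pass and 150 ms would be flagged for investigation.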
Operational Excellence Practice Areas: When to use runbooks and playbooks?
Use runbooks for well-understood events, and playbooks to help with investigation and resolution of issues.
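A runbook for a well-understood event can be expressed as code: an ordered list of steps that stops at the first failure. This is a minimal sketch; the step names and the context dictionary are hypothetical.

```python
def run_runbook(steps, context):
    """Run (name, step) pairs in order, stopping at the first step that fails.

    Each step is a callable that takes the shared context dict and returns
    True on success or False on failure.
    """
    completed = []
    for name, step in steps:
        if not step(context):
            return {"ok": False, "failed_step": name, "completed": completed}
        completed.append(name)
    return {"ok": True, "failed_step": None, "completed": completed}
```

When a step fails, the result names it, which is the natural point to hand over to a playbook for investigation.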
Operational Excellence Practice Areas: How to prioritize responses to events?
Start with the business and customer impact.
Operational Excellence Practice Areas: What to do when an alert is raised?
Make sure an associated process is in place, with an identified owner responsible for executing it. The personnel required, along with escalation triggers, also need to be defined.
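The pieces named above (process, owner, escalation trigger) can be captured in a small routing table. A minimal sketch, where the alert name, team names, runbook path, and 15-minute trigger are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRoute:
    runbook: str             # process to execute when the alert fires
    owner: str               # who is accountable for the response
    escalate_to: str         # who is paged if the owner does not acknowledge
    escalate_after_min: int  # escalation trigger, in minutes


# Hypothetical routing table; every alert should have an entry.
ROUTES = {
    "high-error-rate": AlertRoute(
        "runbooks/rollback.md", "team-checkout", "oncall-manager", 15
    ),
}


def responder(alert_name, minutes_unacknowledged, routes=ROUTES):
    """Return who should be handling the alert right now."""
    route = routes.get(alert_name)
    if route is None:
        # An alert without an owner or process is itself a finding.
        raise LookupError(f"alert {alert_name!r} has no owner or process defined")
    if minutes_unacknowledged >= route.escalate_after_min:
        return route.escalate_to
    return route.owner
```

Raising on an unknown alert makes a missing owner visible immediately instead of letting the alert go unanswered.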
Operational Excellence Practice Areas: How would operational status of workloads be communicated?
Through dashboards and notifications tailored to the target audience (customers, business, developers, operations) so that expectations are managed. Inform each audience immediately when normal operations resume.
Operational Excellence Practice Areas: How would you generate dashboard views of collected metrics?
Can use CloudWatch or third-party applications to aggregate and present business, workload, and operations level views of operations activities. Can use X-Ray, CloudWatch, CloudTrail, and VPC Flow Logs to enable identification of workload issues.
Operational Excellence Practice Areas: How do you understand health of your workload?
Define, capture, and analyze workload metrics.
Operational Excellence Practice Areas: How do you understand health of operations?
Define, capture, and analyze operations metrics.
Operational Excellence Practice Areas: How do you manage workload and operations events?
Prepare and validate procedures for responding to events to minimize disruption.
Operational Excellence Practice Areas: How would you evolve your workloads?
Learn, share, and improve to sustain excellence. Dedicate work cycles to making continuous improvements. Perform post-incident analysis of customer-impacting events and identify contributing factors along with preventative actions.
Operational Excellence Practice Areas: How to get log data to properly analyze workload performance?
Can export log data to S3 and use Glue to prepare it for analytics, with metadata stored in the Glue Data Catalog. Athena can then be used to analyze the log data using SQL.
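To show the kind of SQL aggregation one might run over log data, here is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in for Athena. The table name access_logs and its columns are hypothetical, and a real Athena table would be defined over files in S3 through the catalog.

```python
import sqlite3

# Standard SQL that counts requests, server errors, and average latency per path.
QUERY = """
SELECT path,
       COUNT(*) AS requests,
       SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END) AS errors,
       AVG(latency_ms) AS avg_latency_ms
FROM access_logs
GROUP BY path
ORDER BY errors DESC, path
"""


def summarize(rows):
    """Load (path, status, latency_ms) rows into an in-memory table and summarize them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE access_logs (path TEXT, status INTEGER, latency_ms REAL)"
        )
        conn.executemany("INSERT INTO access_logs VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Sorting by error count first puts the paths most in need of attention at the top of the result.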
Operational Excellence Practice Areas: How do you evolve operations?
Dedicate time and resources for continuous incremental improvement to evolve effectiveness and efficiency of operations.
Security: What are the main design principles?
- Implement a strong identity foundation.
- Enable traceability
- Apply security at all layers
- Automate security best practices
- Protect data in transit and at rest
- Keep people away from manually processing data
- Prepare for security events
Security Practice Areas: How do you securely operate your workload?
- Apply best practices to every area of security
- Take organizational and workload level requirements and processes and apply them to all areas
- Stay up to date on industry sources, threat intelligence to evolve threat model and control objectives
- Automate security processes, testing, and validation to scale security operations
- Segregate workloads by account based on function and compliance/data sensitivity requirements
Security Practice Areas: How do you manage identities for people and machines?
Need two types of identities when operating secure AWS workloads:
- Human identities: admins/devs/operators/users to access environments and applications.
- Machine identities: service applications, operational tools, and workloads that need an identity to make requests, for example to read data. These include EC2 instances, Lambda functions, and external parties who need access.
Security Practice Areas: How to manage permissions for people and machines?
Grant user access using a least-privilege approach, with password requirements and MFA enforced. Programmatic access should use only temporary, limited-privilege credentials, such as those issued by AWS Security Token Service (AWS STS).
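Least privilege can be partly checked by scanning policy documents for wildcards. A minimal sketch, assuming a policy already loaded as a Python dictionary in the standard IAM JSON shape (Statement, Effect, Action, Resource). It flags only a bare "*" and would miss broader patterns such as "s3:*", so it is a starting point and not a full analyzer.

```python
def _as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy):
    """Return indexes of Allow statements whose Action or Resource is a bare "*"."""
    flagged = []
    for i, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(i)
    return flagged
```

A statement that allows s3:GetObject on one bucket's objects passes, while a statement allowing every action on every resource is flagged.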
Security Practice Areas: How do you detect and investigate security events?
Process logs and events, and monitor them, to support auditing, automated analysis, and alarming. CloudTrail logs AWS API calls, CloudWatch provides monitoring of metrics with alarming, and AWS Config provides configuration history. GuardDuty continuously monitors for malicious activity.
Security Practice Areas: What are the different types of detective controls?
- Conduct an inventory of assets and their detailed attributes to support better decision making and lifecycle controls, and to establish operational baselines.
- Use internal auditing, an examination of controls related to information systems, to ensure that practices meet policies and requirements.
Security Practice Areas: How would you make data handling more predictable and reliable?
Define a data-retention lifecycle or define where data will be preserved, archived, or deleted.
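A retention lifecycle can be written down as a simple rule keyed on data age. A minimal sketch, where the 90-day and 365-day thresholds are hypothetical and would come from the organization's actual retention policy:

```python
from datetime import date


def retention_action(created, today, archive_after_days=90, delete_after_days=365):
    """Decide what the lifecycle should do with data created on `created`."""
    age_days = (today - created).days
    if age_days >= delete_after_days:
        return "delete"
    if age_days >= archive_after_days:
        return "archive"
    return "preserve"
```

Encoding the rule once and applying it everywhere is what makes data handling predictable: the same age always leads to the same action.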
Security Practice Areas: How do you protect your network resources?
Any workload that has some form of network connectivity requires multiple layers of defense to protect from external and internal network-based threats.
Security Practice Areas: How do you protect your compute resources?
Compute resources (EC2 instances, containers, Lambda functions, etc.) need multiple layers of defense to help protect from external and internal threats.