AWS Well-Architected Framework Flashcards

Question

Operational Excellence Practice Areas: How to know if you're ready to support a workload?

Answer 1

Evaluate operational readiness of workload, processes, procedures, and personnel to understand operational risks to workload.

Answer 2

1. Implement operations activities as code 2. Use pre-mortems to anticipate failure 3. Use resource tags and resource groups with consistent tagging strategy 4. Tag for organization, cost accounting, access controls, and execution of automated operations activities 5. Plan what to do with live systems that don't comply with changes to the go-live checklist
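
A consistent tagging strategy is easier to keep when it is checked by code. The sketch below is one possible check, not an AWS API: the required keys in `REQUIRED_TAG_KEYS` and the function name `missing_tags` are assumptions made for illustration.

```python
# Hypothetical required tag keys; substitute your organization's own standard.
REQUIRED_TAG_KEYS = {"Owner", "CostCenter", "Environment"}


def missing_tags(resource_tags: dict[str, str]) -> set[str]:
    """Return the required tag keys that are absent or blank on a resource."""
    return {
        key
        for key in REQUIRED_TAG_KEYS
        if not resource_tags.get(key, "").strip()
    }


# Example resource that lacks a CostCenter tag.
tags = {"Owner": "data-team", "Environment": "prod"}
print(sorted(missing_tags(tags)))
```

A check like this could run before go-live so that untagged resources are caught early.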

Answer 3

Both the health of the workload and health of operations activities performed in support of the workload (deployment and incident response). Establish baselines for improvement, investigation, intervention, collect and analyze metrics, then validate understanding of operations success and changes over time.

Answer 4

Runbooks for well understood events and playbooks to help with investigation and resolution of issues.

Answer 5

Start with the business and customer impact

Answer 6

Make sure an associated process to be executed is in place with an identified owner. The personnel required along with escalation triggers need to be in place.

Answer 7

Through dashboards and notifications tailored to the target audience (customer, business, developers, operations) and manage expectations. Inform them immediately when normal operations resume.

Answer 8

Can use CloudWatch or third party applications to aggregate and present business, workload, and operations level views of operations activities. Can use X-Ray, CloudWatch, CloudTrail, and VPC Flow Logs to enable identification of workload issues.

Answer 9

Define, capture, analyze workload metrics

Answer 10

Define, capture, analyze operations metrics

Answer 11

Prepare and validate procedures for responding to events to minimize disruption

Answer 12

Learn, share, improve to sustain excellence. Dedicate work cycles to make continuous improvements. Perform post-incident analysis of customer impacting events and identify contributing factors along with preventative action.

Answer 13

Can export data into S3 and use Glue to prepare data for analytics with metadata stored in the catalog. Athena can be used to analyze log data using SQL.

Answer 14

Dedicate time and resources for continuous incremental improvement to evolve effectiveness and efficiency of operations.

Answer 15

1. Implement a strong identity foundation. 2. Enable traceability 3. Apply security at all layers 4. Automate security best practices 5. Protect data in transit and at rest 6. Keep people away from manually processing data 7. Prepare for security events

Answer 16

1. Apply best practices to every area of security 2. Take organizational and workload level requirements and processes and apply them to all areas 3. Stay up to date on industry sources, threat intelligence to evolve threat model and control objectives 4. Automate security processes, testing, and validation to scale security operations 5. Segregate workloads by account based on function and compliance/data sensitivity requirements

Answer 17

Need two types of identities when operating secure AWS workloads: 1. Human identities: admins/devs/operators/users to access environments and applications. 2. Machine identities: service applications, workloads etc. to make requests to read data. EC2 instances, Lambda functions or external parties who need access.

Answer 18

User access should be granted using a least-privilege approach with password requirements and MFA enforced. Programmatic access should only be performed using temporary, limited-privilege credentials issued by the AWS Security Token Service (AWS STS).
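
As a rough illustration of least privilege, the sketch below builds an IAM policy document that allows only object reads under a single S3 prefix. The bucket name, prefix, and helper name `read_only_bucket_policy` are hypothetical; only the policy JSON structure and the `s3:GetObject` action come from AWS.

```python
import json


def read_only_bucket_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document allowing only reads under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Scope the grant to objects below the prefix, nothing else.
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            }
        ],
    }


# Hypothetical bucket and prefix, used only for illustration.
policy = read_only_bucket_policy("example-reports-bucket", "monthly/")
print(json.dumps(policy, indent=2))
```

Anything not listed in an Allow statement is denied by default, which is why the policy stays this small.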

Answer 19

Process logs and events, and monitor them for auditing, automated analysis, and alarming. CloudTrail logs API activity and CloudWatch provides monitoring of metrics, while Config provides configuration history. GuardDuty continuously monitors for malicious activity.

Answer 20

1. Conduct inventory of assets and detailed attributes for better decision making and lifecycle controls and establish operational baselines 2. Use internal auditing, examination of controls related to information systems to ensure that practices meet policies and requirements.

Answer 21

Define a data-retention lifecycle or define where data will be preserved, archived, or deleted.
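
A retention lifecycle can be expressed as a simple age-based rule. This is a minimal sketch under assumed thresholds; `ARCHIVE_AFTER_DAYS`, `DELETE_AFTER_DAYS`, and `retention_action` are hypothetical names, and real values would come from policy and regulation.

```python
from datetime import date

# Hypothetical thresholds in days; not taken from the framework.
ARCHIVE_AFTER_DAYS = 90
DELETE_AFTER_DAYS = 365 * 7


def retention_action(created: date, today: date) -> str:
    """Decide whether data is preserved, archived, or deleted by its age."""
    age_days = (today - created).days
    if age_days >= DELETE_AFTER_DAYS:
        return "delete"
    if age_days >= ARCHIVE_AFTER_DAYS:
        return "archive"
    return "preserve"


print(retention_action(date(2024, 1, 1), date(2024, 6, 1)))
```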

Answer 22

Any workload that has some form of network connectivity requires multiple layers of defense to protect from external and internal network-based threats.

Answer 23

Compute resources (EC2 instances, containers, Lambda functions, etc) need multiple layers of defense to help protect from external and internal threats.

Answer 24

1. Implement stateful and stateless packet inspection 2. Use Amazon Virtual Private Cloud to create private, secured, scalable environment 3. Define topology within that environment with gateways, routing tables, and public/private subnets

Answer 25

1. Enforce boundary protection 2. Monitor points of ingress/egress 3. Comprehensive logging, monitoring, alerting

Answer 26

Can tailor or harden the configuration of an Amazon Elastic Compute Cloud (EC2) instance, an Amazon Elastic Container Service (ECS) container, or an AWS Elastic Beanstalk instance and persist this configuration to an immutable Amazon Machine Image (AMI). All new virtual servers can be launched with this AMI to receive this hardened configuration.

Answer 27

Categorize organizational data based on levels of sensitivity (data classification) and encrypt data to make it unreadable to anyone unauthorized.

Answer 28

1. As an AWS customer, maintain full control over data 2. Rotate encryption keys regularly 3. Log file access and changes 4. Choose storage classes like S3 Standard, S3 Standard-IA, and S3 One Zone-IA, which are designed for 99.999999999% durability 5. Implement versioning to protect against accidental overwrites 6. AWS never initiates movement of data between Regions

Answer 29

Categorize based on criticality and sensitivity to help determine protection and retention controls

Answer 30

Implement multiple controls to reduce risk of unauthorized access and can use server-side encryption to store data in an encrypted form.

Answer 31

Can arrange for HTTPS encryption/decryption handled by Elastic Load Balancing if needed

Answer 32

1. Detailed logging of file access and changes 2. Events to be processed and trigger tools that automate responses through use of AWS APIs 3. Provision tooling ahead of time and a "clean room" using AWS CloudFormation to carry out forensics in a safe, isolated environment

Answer 33

Preparation is key to minimize disruption. Make sure that there's a way to quickly grant access for the security team and to automate the isolation of instances and the capture of data and state for forensics.

Answer 34

1. Automatically recover from failure by triggering automation when a threshold is breached 2. Test recovery procedures and automate different failure scenarios 3. Scale horizontally to increase aggregate workload availability with smaller resources that don't share a common point of failure 4. Stop guessing capacity and instead automate addition/removal by monitoring demand and utilization 5. Change infrastructure using automation that can be tracked and reviewed
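
The effect of scaling horizontally can be estimated with a standard formula: if any one of N replicas can serve requests and replicas fail independently (an assumption, and the reason they must not share a common point of failure), aggregate availability is 1 - (1 - a)^N. The function name `parallel_availability` is hypothetical.

```python
def parallel_availability(single: float, replicas: int) -> float:
    """Availability of N independent replicas where any one can serve."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The workload is down only when every replica is down at once.
    return 1.0 - (1.0 - single) ** replicas


for n in (1, 2, 3):
    print(n, parallel_availability(0.99, n))
```

Under these assumptions, two replicas that are each 99% available give roughly 99.99% combined.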

Answer 35

1. Foundations 2. Workload Architecture 3. Change Management 4. Failure Management

Answer 36

They are the requirements where the scope extends beyond a single workload or project. One example is having sufficient network bandwidth to the data center.

Answer 37

Quotas/limits exist to prevent accidentally provisioning more resources than necessary and to limit request rates on API operations to protect services from abuse.

Answer 38

Workloads exist in multiple environments and they could be cloud environments that are a mix of being publicly or privately accessible. Plans must include network considerations such as intra and inter-system connectivity, public IP address management, private IP address management, and domain name resolution.

Answer 39

Use a service-oriented architecture (SOA) or a microservices architecture. SOA is the practice of making software components reusable via service interfaces. Microservices go further to make components smaller and simpler.

Answer 40

Distributed systems rely on communication networks to interconnect components like servers and services. Workloads must operate reliably despite data loss or latency in these networks. Components of the distributed system need to operate in a way that doesn't impact other components or the workload itself. This ensures better MTBF (mean time between failures).

Answer 41

MTTR - mean time to recovery
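
MTBF and MTTR combine into a commonly used steady-state availability estimate: availability = MTBF / (MTBF + MTTR). The sketch below works through that arithmetic; the function name and the sample figures are illustrative assumptions.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and recovery."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative figures: a failure every 999 hours, one hour to recover.
print(availability(999.0, 1.0))
# Halving recovery time raises availability without changing failure rate.
print(availability(999.0, 0.5))
```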

Answer 42

Logs/metrics can be configured to send notifications when thresholds are crossed or significant events occur. Can recognize low-performance thresholds and even recover automatically in response.
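
Threshold-based alerting can be sketched as a rule that fires when enough recent datapoints cross a limit. This is a simplified stand-in, not the CloudWatch alarm implementation; the function name `breaches` and the sample CPU values are assumptions.

```python
def breaches(datapoints: list[float], threshold: float, required: int) -> bool:
    """True when at least `required` datapoints exceed the threshold."""
    over = sum(1 for value in datapoints if value > threshold)
    return over >= required


# Hypothetical CPU utilization samples in percent.
cpu = [70.0, 85.0, 91.0, 88.0]
if breaches(cpu, threshold=80.0, required=3):
    print("notify on-call and trigger recovery automation")
```

Requiring several breaching datapoints rather than one helps avoid alerting on a single spike.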

Answer 43

Scalable workloads provide elasticity to add/remove resources automatically to match current demand at any given point in time.

Answer 44

Controlled changes are necessary to deploy new functionality and to ensure workloads are running known software that can be patched/replaced.

Answer 45

Back up data, applications, and configuration to meet requirements for recovery time objectives and recovery point objectives.

Answer 46

Fault isolated boundaries limit the effect of a failure within a workload to a limited number of components. Components outside of this boundary are unaffected by this failure and so impact on workload can be limited.

Answer 47

Workloads with requirement for high availability and low mean time to recovery (MTTR) must be architected for resiliency

Answer 48

After designing workloads to be resilient to the stresses of production, testing is essential to ensure they will operate as designed with the resiliency expected.

Answer 49

Having backups and redundant workload components in place is the start of DR strategy. RTO (recovery time objective) and RPO (recovery point objective) are used to restore workloads and are set based on business needs. Incorporate the probability of disruption and cost of recovery to help inform business value of providing disaster recovery.
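
One way to reason about RTO and RPO is to compare them with what a backup plan can actually deliver. In the sketch below, worst-case data loss is taken to be the interval between backups, which is a simplifying assumption; the function name and the sample durations are hypothetical.

```python
from datetime import timedelta


def meets_objectives(
    backup_interval: timedelta,
    restore_duration: timedelta,
    rpo: timedelta,
    rto: timedelta,
) -> bool:
    """Check a backup plan against recovery point and recovery time objectives."""
    # Data written just after a backup is lost, so loss is bounded by the interval.
    rpo_ok = backup_interval <= rpo
    # The workload must be restored within the recovery time objective.
    rto_ok = restore_duration <= rto
    return rpo_ok and rto_ok


print(
    meets_objectives(
        backup_interval=timedelta(hours=1),
        restore_duration=timedelta(minutes=30),
        rpo=timedelta(hours=4),
        rto=timedelta(hours=1),
    )
)
```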

Answer 50

1. Democratize advanced technologies 2. Go global in minutes 3. Use serverless architectures 4. Experiment often 5. Consider mechanical sympathy

Answer 51

1. Selection 2. Review 3. Monitoring 4. Tradeoffs

Answer 52

Multiple solutions and features are used for optimal performance. Use a data-driven approach to select patterns and implementation.

Answer 53

1. Instances 2. Containers 3. Functions

Answer 54

Virtualized servers that can have capabilities changed with a button or API call. These capabilities range from varying SSDs to multiple GPUs.

Answer 55

They are a method of operating system virtualization to run an application and its dependencies in resource-isolated processes. AWS Fargate provides serverless compute for containers; alternatively, EC2 can be used to keep control over installation/configuration/management of the compute environment.

Answer 56

Either the Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS) can be used.

Answer 57

They abstract the execution environment from the code that we want to execute. Lambda allows us to run code without running an instance.

Answer 58

It varies based on architectural design, usage patterns, and configuration settings. It can use different compute solutions for various components and have different features enabled for performance.

Answer 59

1. Object Storage 2. Block Storage 3. File Storage

Answer 60

S3 is an example that makes data accessible from any internet location and is designed for durability.

Answer 61

Block storage provides highly available, consistent, low-latency storage for each virtual host and is analogous to direct-attached storage (DAS) or a storage area network (SAN). Amazon Elastic Block Store (EBS) is designed for workloads that require persistent storage accessible by EC2 instances and helps you tune applications with the right storage capacity, performance, and cost.

Answer 62

This provides access to a shared file system across systems. Amazon Elastic File System (EFS) is ideal to store large content repositories, development environments, media stores, or user home directories. Amazon FSx lets you launch popular file systems to use the rich feature sets and fast performance of open source and commercially-licensed file systems.

Answer 63

It varies based on the kind of access method (file/block/object), patterns of access (random/sequential), required throughput, frequency of access (online/offline/archival), frequency of update (WORM/dynamic), and availability/durability constraints.

Answer 64

1. Relational 2. Key-value 3. Document 4. In-memory 5. Graph 6. Time-series 7. Ledger

Answer 65

Balance availability, consistency, partition tolerance, latency, durability, scalability, and query capability. Can use different databases for various subsystems with different features enabled to improve performance.

Answer 66

Networking solutions are configured for workloads. Optimal solution varies based on latency, throughput requirements, jitter, and bandwidth. In addition, need to validate physical constraints (user/on-prem resources) and offset with edge locations or resource placement.

Answer 67

Product features like Enhanced Networking, EBS-optimized instances, S3 transfer acceleration, and dynamic Amazon CloudFront are available to optimize network traffic. Also networking features like Route 53 latency routing, VPC endpoints, Direct Connect, and Global Accelerator to reduce network distance or jitter.

Answer 68

Take advantage of Regions, placement groups, and edge services.

Answer 69

Choose from finite options to begin with but keep an eye on what technologies and approaches become available over time.

Answer 70

Implement a performance review process and apply Deming's plan-do-check-act (PDCA) cycle to drive iterative improvement.

Answer 71

Can use CloudWatch to monitor workload, respond to system-wide performance changes, optimize resource utilization and get a unified view of operational health. X-Ray can help developers analyze and debug production, distributed applications by identifying root causes and performance bottlenecks. On a more general note, implement automated triggers to minimize false positives. Plan for game days with simulations in the production environment.

Answer 72

Trade consistency, durability, and space for time and latency. Can go global in minutes and, for example, add read-only replicas to information stores to reduce load on the primary database. To further evaluate a tradeoff, use a systematic approach (like load testing) to explore its value.

Answer 73

1. Implement Cloud Financial Management 2. Adopt a consumption model and only pay for what you use 3. Measure overall efficiency with business output and associated tech costs 4. Stop spending money on undifferentiated heavy lifting like racking/stacking/powering servers 5. Analyze and attribute expenditure

Answer 74

1. Practice cloud financial management 2. Expenditure and usage awareness 3. Cost-effective resources 4. Manage demand and supply resources 5. Optimize over time

Answer 75

Sometimes best to just go to market quickly with new features and meeting deadlines rather than investing in up-front cost optimization. Temptation exists to overcompensate "just in case" and this might lead to over-provisioned and under-optimized deployments. This may only make sense when lifting and shifting resources from on-prem environment to the cloud.

Answer 76

Can use Cost Explorer with Athena and the Cost and Usage report. Supplement the team with experts in cost optimization and include people with supplementary skill sets in analytics and project management. Finally, try to improve on existing programs and processes instead of trying to build new ones from scratch. Leads to achieving outcomes much faster.

Answer 77

After creating an account structure with AWS Organizations or AWS Control Tower, can use resource tagging to apply business and organization information to usage and cost. Use Cost Explorer for visibility into costs, and control them with notifications in AWS Budgets and controls in AWS Identity and Access Management (IAM) and Service Quotas.

Answer 78

Establish policies/mechanisms to ensure appropriate costs are incurred while objectives are achieved. Employ a checks-and-balances approach to innovate without overspending.

Answer 79

Implement change control and resource management from project inception to end of life. This makes sure that we shut down or terminate unused resources to reduce waste.

Answer 80

These tags can be used to categorize and track AWS costs. When applying tags to resources such as EC2 instances or S3 buckets, AWS generates a cost and usage report with the usage and tags. The tags can be applied to represent organization categories (cost centers, workload names, owners) to organize costs across multiple services. For high level insights and trends, use daily granularity. For deeper analysis, move to hourly granularity with the cost and usage report. Combine tagged resources with entity lifecycle tracking (employees, projects) to identify orphaned resources or projects no longer generating value to the organization and should be decommissioned.
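
Attributing cost by tag amounts to grouping billing line items on a tag value. The sketch below uses a made-up list of line items rather than the real Cost and Usage Report schema; the field names `cost` and `tags` and the function `cost_by_tag` are assumptions.

```python
from collections import defaultdict


def cost_by_tag(line_items: list[dict], tag_key: str) -> dict[str, float]:
    """Sum cost per value of one tag; untagged items are grouped together."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item["tags"].get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical line items, not real billing data.
items = [
    {"cost": 10.0, "tags": {"CostCenter": "analytics"}},
    {"cost": 2.5, "tags": {"CostCenter": "analytics"}},
    {"cost": 4.0, "tags": {}},
]
print(cost_by_tag(items, "CostCenter"))
```

The "untagged" bucket is a useful place to start looking for orphaned resources.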

Answer 81

Amazon EC2, Amazon EBS, and Amazon S3 are building block services. Managed services like RDS and DynamoDB are application level services. By selecting the right building blocks and managed services, workload cost can be optimized.

Answer 82

Can pay for compute capacity by the hour with no minimum commitments required.

Answer 83

Savings of up to 75% off On-Demand pricing are available.

Answer 84

Spot instances let you use unused EC2 capacity and get savings of up to 90% off On-Demand pricing. Spot Instances are appropriate when a system can tolerate using a fleet of servers where individual servers can come and go dynamically such as: 1. Stateless web servers 2. Batch processing 3. Using HPC and big data
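
To see what a discount of that size means in practice, the arithmetic can be sketched as below. The hourly price and hours are hypothetical, and 90% is the stated upper bound rather than a typical figure; actual Spot prices vary.

```python
def discounted_cost(on_demand_hourly: float, hours: float, discount: float) -> float:
    """Cost after applying a fractional discount to On-Demand pricing."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return on_demand_hourly * hours * (1.0 - discount)


# Hypothetical price of $0.10 per hour for 1,000 instance-hours.
on_demand = discounted_cost(0.10, 1000, 0.0)
spot_best_case = discounted_cost(0.10, 1000, 0.90)
print(f"On-Demand: ${on_demand:.2f}, Spot at best case: ${spot_best_case:.2f}")
```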

Answer 85

Plan and monitor data transfer charges so that architectural decisions to minimize costs can be made. Small but effective architectural changes can reduce operational costs drastically over time.

Answer 86

For a workload with balanced spend and performance, make sure everything paid for is used and avoid underutilizing resources. Skewed utilization metric in either direction has an adverse impact on your organization in either: 1. Operational costs through degraded performance from over utilization 2. Wasted AWS expenditures due to over provisioning

Answer 87

Use auto scaling with demand or time-based approaches to add/remove resources as needed. If anticipating demand, can save money and ensure resources match needs. Can also use Amazon API Gateway to implement throttling or Amazon SQS to implement a queue in your workload. These will allow you to modify demand on workload components. Lastly, think about pattern of usage, time to provision new resources, and predictability of the demand pattern. A correctly sized queue or buffer, tuned to the time within which workload demands must be answered, gives an optimal balance between the two.
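
Throttling is often implemented with a token bucket, which allows short bursts while capping the sustained request rate. The class below is a minimal sketch of that technique, not API Gateway's own implementation; the class name, parameters, and the explicit `now` argument (used to keep the example deterministic) are assumptions.

```python
class TokenBucket:
    """Allow bursts up to `capacity` and a sustained `rate` requests per second."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=2.0)
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)]
print(decisions)
```

Requests that are rejected here could instead be placed on a queue, which is the buffering approach the card describes.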

Answer 88

As new services are released, review existing architectural decisions to make sure they continue to be the most cost effective. Note that workloads can be optimized incrementally while minimizing effort to implement the change. Components in the workload can also be replaced with new services to achieve increases in efficiency. Be aggressive in decommissioning resources, entire services, and systems that you no longer require.

Answer 89

To identify any critical issues that might need addressing or areas that could be improved.

Answer 90

A set of actions to improve the experience of a customer using the workload.

Answer 91

Each team member who builds architectures should use this framework to continually review their work instead of holding formal review meetings. This allows team members to update answers as the architecture evolves and to keep delivering features.

Answer 92

They should be applied at key milestones in the product lifecycle: early on in the design phase to avoid one-way doors that are difficult to change, and then before the go-live date. After going into production, the workload will evolve and its architecture will change; if anything significant comes out of that, follow a set of hygiene processes including a Well-Architected Review.

Answer 93

Have a series of informal conversations about their architecture where you can glean answers to most questions. Then follow it up with 1 or 2 meetings to gain clarity or dive deep on ambiguous areas. Items to use: 1. Meeting room with whiteboards 2. Print outs of diagrams or design notes 3. Action list of questions that require out-of-band research to answer

Answer 94

A list of issues prioritized based on business context. The impact of the issues on the day-to-day work of the team should be accounted for - identify opportunities to address issues early and then work on creating business value.

(118 cards)