AWS Well-Architected Framework Flashcards

1
Q

What are the 5 pillars of the Well-Architected Framework?

A
  1. Operational Excellence
  2. Security
  3. Reliability
  4. Performance Efficiency
  5. Cost Optimization
2
Q

Well-Architected: What is a “component”?

A

It is the code, configuration, and AWS resources that together deliver against a requirement. A component is a unit of technical ownership and is decoupled from other components.

3
Q

Well-Architected: What is a “workload”?

A

It is a set of components that work together. Usually it is the level of detail that business and tech leaders communicate about.

4
Q

Well-Architected: What is a “milestone”?

A

Milestones mark key changes in the architecture as it evolves throughout the product lifecycle.

5
Q

Well-Architected: What is “architecture”?

A

Generally it is how components work together in a workload. How they communicate and interact is the focus of architecture diagrams.

6
Q

Well-Architected: What is the “technology portfolio”?

A

It is the collection of workloads required for the business to operate.

7
Q

Well-Architected: Discuss tradeoffs between pillars.

A

Business decisions can drive engineering priorities. You may sometimes lower cost at the expense of reliability, or accept higher cost to gain reliability. However, security and operational excellence are generally not traded off against the other pillars.

8
Q

How are risks mitigated when distributing decision making authority?

A

In 2 ways:

  1. Have practices (ways of doing things, processes, standards) that enable each team to make decisions, and put experts in place who ensure that teams raise the bar on the standards they need to meet.
  2. Implement mechanisms that carry out automated checks to ensure standards are being met.
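
A minimal sketch of what such an automated check might look like. The `Resource` record and the two rules are hypothetical illustrations, not an AWS API:

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical, simplified description of a deployed resource."""
    name: str
    encrypted: bool
    tags: dict = field(default_factory=dict)


# Hypothetical standards: each rule name maps to a pass/fail predicate.
STANDARDS = {
    "encryption-enabled": lambda r: r.encrypted,
    "owner-tag-present": lambda r: bool(r.tags.get("owner")),
}


def check_standards(resources):
    """Return {resource name: [failed rule names]} for non-compliant resources."""
    findings = {}
    for resource in resources:
        failed = [rule for rule, passes in STANDARDS.items() if not passes(resource)]
        if failed:
            findings[resource.name] = failed
    return findings


if __name__ == "__main__":
    fleet = [
        Resource("orders-db", encrypted=True, tags={"owner": "team-a"}),
        Resource("scratch-bucket", encrypted=False),
    ]
    print(check_standards(fleet))
```

Running a check like this on every deployment keeps decision making distributed while still surfacing deviations from the shared standards.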
9
Q

What is a fundamental part of Amazon’s innovation process?

A

Working backward from the customer so that products are built in response to customer needs.

10
Q

How did the Well-Architected Framework come about?

A

It is the customer-facing implementation of Amazon’s internal review process, in which principal engineering thinking across field roles has been codified.

11
Q

What are the general design principles that come out of the Well-Architected Framework?

A
  1. Stop guessing capacity needs.
  2. Test systems at production scale.
  3. Automate to make architectural experimentation easier.
  4. Allow for architectures that evolve over time as we learn more
  5. Drive architectures using data
  6. Improve through game days to simulate events in production
12
Q

Operational Excellence: What are the 5 design principles?

A
  1. Perform operations as code
  2. Make frequent, small, reversible changes
  3. Refine operations procedures frequently
  4. Anticipate failure
  5. Learn from operational failures
13
Q

Operational Excellence: Why would you want to make small and reversible changes frequently?

A

So that components can be updated regularly and any change that fails can be reversed.

14
Q

Operational Excellence: How would you refine operations procedures frequently?

A

Set up game days that simulate production situations and validate that procedures are effective and teams are aware.

15
Q

Operational Excellence: How would you anticipate failure?

A

Perform pre-mortem exercises to identify potential sources of failure so that they can be removed or mitigated. Test the failure scenarios, and test response procedures against simulated events.

16
Q

Operational Excellence Practice Areas: What are the four best practice areas for operational excellence in the cloud?

A
  1. Organization
  2. Prepare
  3. Operate
  4. Evolve
17
Q

Operational Excellence Practice Areas: Describe the “Organization” practice area

A
  1. Evaluate customer needs and be aware of guidelines/obligations defined by governance and external factors (compliance requirements).
  2. Evaluate threats to the business and impact of risks between competing interests.
  3. Ensure owners are in place for each application, workload, platform, and infrastructure component. Separate owners for definition and performance.
  4. Recognize business value of each component and define responsibilities of team members to act appropriately.
  5. Encourage experimentation and seek multiple diverse perspectives
18
Q

Operational Excellence Practice Areas: How do you determine what your priorities are?

A

Everyone needs to understand their part in enabling business success. When goals are shared across the organization, priorities for resources can be set more effectively.

19
Q

Operational Excellence Practice Areas: How do you structure your organization to support your business outcomes?

A

Teams must understand their part in achieving business outcomes: their role in the success of other teams, other teams’ roles in their success, and the goals they share.

20
Q

Operational Excellence Practice Areas: How does organizational culture support business outcomes?

A

Provide support for team members so that they can be more effective in taking action and supporting business outcomes.

21
Q

Operational Excellence Practice Areas: How do you prepare for operational excellence?

A

Need to understand workloads and expected behavior to provide insight and build procedures to support them. Develop telemetry necessary to monitor workload health, identify risky outcomes, and enable effective response. Adopt approaches that improve flow of changes into production and enable refactoring. Adopt approaches that provide fast feedback on quality and rapid recovery from undesirable changes. Use a consistent process to know when a workload can go live.

22
Q

Operational Excellence Practice Areas: How do you design your workload so that you can understand its state?

A

Design the workload so that the information emitted across its components gives you what you need to understand its internal state and respond effectively:

  1. Metrics
  2. Logs
  3. Traces
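
A minimal sketch of how a single request might emit all three signals, using only the Python standard library. The `handle_request` function, the in-memory `metrics` counter, and the record fields are hypothetical; a real workload would send these to a telemetry backend:

```python
import json
import time
import uuid
from collections import Counter

metrics = Counter()  # metric name -> count (stand-in for a metrics backend)


def handle_request(path, trace_id=None):
    """Process one request and return the structured log record it emits."""
    # Trace: a single ID follows the request across components.
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()
    status = 404 if path == "/missing" else 200  # placeholder business logic
    duration_ms = (time.perf_counter() - start) * 1000

    # Metric: an aggregated numeric signal.
    metrics[f"requests.{status}"] += 1

    # Log: a discrete, structured event that carries the trace ID.
    record = {
        "trace_id": trace_id,
        "path": path,
        "status": status,
        "duration_ms": round(duration_ms, 3),
    }
    print(json.dumps(record))
    return record
```

Carrying the same `trace_id` in every log record is what lets logs from different components be stitched into one trace.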
23
Q

Operational Excellence Practice Areas: How do you reduce defects, ease remediation, and improve flow into production?

A

Adopt approaches that enable refactoring, fast feedback on quality, and bug fixing. These enable rapid identification and remediation of issues introduced through deployment activities.

24
Q

Operational Excellence Practice Areas: How do you mitigate deployment risks?

A

Adopt approaches that provide fast feedback on quality and enable rapid recovery from undesirable changes.

25
Q

Operational Excellence Practice Areas: How to know if you’re ready to support a workload?

A

Evaluate the operational readiness of the workload, processes, procedures, and personnel to understand the operational risks related to the workload.

26
Q

Operational Excellence Practice Areas: What are some additional steps to prepare?

A
  1. Implement operations activities as code
  2. Use pre-mortems to anticipate failure
  3. Use resource tags and resource groups with consistent tagging strategy
  4. Tag for organization, cost accounting, access controls, and execution of automated operations activities
  5. Plan what to do with live systems that don’t comply with changes to the go-live checklist
27
Q

Operational Excellence Practice Areas: What does operational health include?

A

Both the health of the workload and the health of the operations activities performed in support of it (such as deployment and incident response). Establish metric baselines for improvement, investigation, and intervention; collect and analyze the metrics; then validate your understanding of operations success and how it changes over time.

28
Q

Operational Excellence Practice Areas: When to use runbooks and playbooks?

A

Use runbooks for well-understood events, and playbooks to aid the investigation and resolution of issues.

29
Q

Operational Excellence Practice Areas: How to prioritize responses to events?

A

Start with the business and customer impact

30
Q

Operational Excellence Practice Areas: What to do when an alert is raised?

A

Every alert should have an associated process to execute, with a specifically identified owner. The personnel required to resolve the event, along with escalation triggers, must be identified in advance.

31
Q

Operational Excellence Practice Areas: How would operational status of workloads be communicated?

A

Through dashboards and notifications tailored to the target audience (customers, business, developers, operations), so that they can take appropriate action and their expectations are managed. Inform them when normal operations resume.

32
Q

Operational Excellence Practice Areas: How would you generate dashboard views of collected metrics?

A

Use CloudWatch or third-party applications to aggregate and present business-, workload-, and operations-level views of operations activities. X-Ray, CloudWatch, CloudTrail, and VPC Flow Logs can help identify workload issues.

33
Q

Operational Excellence Practice Areas: How do you understand health of your workload?

A

Define, capture, and analyze workload metrics.

34
Q

Operational Excellence Practice Areas: How do you understand health of operations?

A

Define, capture, and analyze operations metrics.

35
Q

Operational Excellence Practice Areas: How do you manage workload and operations events?

A

Prepare and validate procedures for responding to events to minimize disruption

36
Q

Operational Excellence Practice Areas: How would you evolve your workloads?

A

Learn, share, and improve to sustain excellence. Dedicate work cycles to making continuous improvements. Perform post-incident analysis of customer-impacting events, and identify contributing factors along with preventative actions.

37
Q

Operational Excellence Practice Areas: How to get log data to properly analyze workload performance?

A

Export log data to S3 and use Glue to prepare it for analytics, with metadata stored in the Glue Data Catalog. Athena can then be used to analyze the log data using SQL.
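
As an illustration of the kind of SQL aggregate one might run over log data, here is a sketch that uses Python's built-in `sqlite3` as a local stand-in for Athena. The `access_logs` table, its columns, and the sample rows are hypothetical, and Athena's SQL dialect differs in places from SQLite's:

```python
import sqlite3

# Hypothetical access-log rows: (path, HTTP status, latency in ms).
ROWS = [
    ("/checkout", 200, 120),
    ("/checkout", 500, 900),
    ("/search", 200, 80),
    ("/search", 200, 100),
]

# Per-path request count, server-error count, and average latency.
QUERY = """
    SELECT path,
           COUNT(*) AS requests,
           SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END) AS errors,
           AVG(latency_ms) AS avg_latency_ms
    FROM access_logs
    GROUP BY path
    ORDER BY path
"""


def summarize(rows):
    """Load rows into an in-memory table and return per-path statistics."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access_logs (path TEXT, status INTEGER, latency_ms REAL)"
    )
    conn.executemany("INSERT INTO access_logs VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(row)
```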

38
Q

Operational Excellence Practice Areas: How do you evolve operations?

A

Dedicate time and resources for continuous incremental improvement to evolve effectiveness and efficiency of operations.

39
Q

Security: What are the main design principles?

A
  1. Implement a strong identity foundation.
  2. Enable traceability
  3. Apply security at all layers
  4. Automate security best practices
  5. Protect data in transit and at rest
  6. Keep people away from data (reduce or eliminate direct access to, and manual processing of, data)
  7. Prepare for security events
40
Q

Security Practice Areas: How do you securely operate your workload?

A
  1. Apply best practices to every area of security
  2. Take organizational and workload level requirements and processes and apply them to all areas
  3. Stay up to date on industry sources, threat intelligence to evolve threat model and control objectives
  4. Automate security processes, testing, and validation to scale security operations
  5. Segregate workloads by account based on function and compliance/data sensitivity requirements
41
Q

Security Practice Areas: How do you manage identities for people and machines?

A

Need two types of identities when operating secure AWS workloads:

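  1. Human identities: administrators, developers, operators, and end users who need access to AWS environments and applications.
  2. Machine identities: service applications, operational tools, and workloads that make requests to AWS services, for example to read data. These include machines running in AWS, such as EC2 instances or Lambda functions, as well as external parties who need access.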
42
Q

Security Practice Areas: How to manage permissions for people and machines?

A

User access should be granted using a least-privilege approach, with password requirements and MFA enforced. Programmatic access should be performed only with temporary, limited-privilege credentials, such as those issued by the AWS Security Token Service (STS).
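
A sketch of what least privilege can look like in an IAM policy document, built as a plain Python dictionary. The bucket name and both helper functions are hypothetical; the wildcard check is deliberately simple and only catches a bare `*`:

```python
import json


def read_only_bucket_policy(bucket_name):
    """Build an IAM policy document allowing only object reads from one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            }
        ],
    }


def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def is_overly_broad(policy):
    """Flag Allow statements whose action or resource is the bare wildcard '*'."""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        if "*" in _as_list(statement["Action"]):
            return True
        if "*" in _as_list(statement["Resource"]):
            return True
    return False


if __name__ == "__main__":
    policy = read_only_bucket_policy("example-bucket")
    print(json.dumps(policy, indent=2))
    print(is_overly_broad(policy))
```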

43
Q

Security Practice Areas: How do you detect and investigate security events?

A

Process logs and events, and monitor them for auditing, automated analysis, and alarming. CloudTrail records AWS API calls, CloudWatch provides monitoring of metrics with alarming, and AWS Config provides configuration history. GuardDuty continuously monitors for malicious or unauthorized activity.

44
Q

Security Practice Areas: What are the different types of detective controls?

A
  1. Conduct an inventory of assets and their detailed attributes to support better decision making and lifecycle controls, and to help establish operational baselines.
  2. Use internal auditing, an examination of controls related to information systems, to ensure that practices meet policies and requirements.
45
Q

Security Practice Areas: How would you make data handling more predictable and reliable?

A

Define a data-retention lifecycle or define where data will be preserved, archived, or deleted.
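
A minimal sketch of a retention lifecycle expressed as a rule. The two thresholds and the function name are hypothetical; real values would come from business and compliance requirements:

```python
from datetime import date

# Hypothetical retention thresholds, in days since the record was created.
ARCHIVE_AFTER_DAYS = 90
DELETE_AFTER_DAYS = 365


def retention_action(created, today):
    """Return 'preserve', 'archive', or 'delete' based on the record's age."""
    age_days = (today - created).days
    if age_days >= DELETE_AFTER_DAYS:
        return "delete"
    if age_days >= ARCHIVE_AFTER_DAYS:
        return "archive"
    return "preserve"


if __name__ == "__main__":
    today = date(2024, 6, 1)
    for created in (date(2024, 5, 1), date(2024, 1, 1), date(2023, 1, 1)):
        print(created, retention_action(created, today))
```

Encoding the lifecycle as explicit thresholds is what makes data handling predictable: the same record age always yields the same action.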

46
Q

Security Practice Areas: How do you protect your network resources?

A

Any workload that has some form of network connectivity requires multiple layers of defense to protect from external and internal network-based threats.

47
Q

Security Practice Areas: How do you protect your compute resources?

A

Compute resources (EC2 instances, containers, Lambda functions, etc) need multiple layers of defense to help protect from external and internal threats.

48
Q

Security Practice Areas: What are some examples of protecting network resources?

A
  1. Implement stateful and stateless packet inspection
  2. Use Amazon Virtual Private Cloud (VPC) to create a private, secured, and scalable environment
  3. Define topology within that environment with gateways, routing tables, and public/private subnets
49
Q

Security Practice Areas: What are some examples of control methodologies for defense in depth for infrastructure protection?

A
  1. Enforce boundary protection
  2. Monitor points of ingress/egress
  3. Implement comprehensive logging, monitoring, and alerting
50
Q

Security Practice Areas: How to harden configuration in a secure and scalable manner?

A

You can tailor and harden the configuration of an Amazon EC2 instance, Amazon ECS container, or AWS Elastic Beanstalk instance, and persist this configuration to an immutable Amazon Machine Image (AMI). All new virtual servers launched with this AMI receive the hardened configuration.

51
Q

Security Practice Areas: How would you protect data at a high level?

A

Categorize organizational data based on levels of sensitivity (data classification) and encrypt data to make it unreadable to anyone unauthorized.

52
Q

Security Practice Areas: What practices facilitate protection of data?

A
  1. As an AWS customer, maintain full control over data
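  2. Encrypt data and manage keys, including regular key rotation
  3. Use detailed logging that captures important content such as file access and changes
  4. Choose resilient storage: S3 Standard, S3 Standard-IA, and S3 One Zone-IA are designed to provide 99.999999999% durability of objects over a given year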
  5. Implement versioning to protect against accidental overwrites
  6. AWS never initiates movement of data between Regions
53
Q

Security Practice Areas: How do you classify your data?

A

Categorize data based on criticality and sensitivity to help determine appropriate protection and retention controls.

54
Q

Security Practice Areas: How to protect data at rest?

A

Implement multiple controls to reduce the risk of unauthorized access or mishandling. For example, server-side encryption can be used to store data in an encrypted form.

55
Q

Security Practice Areas: How to protect data in transit?

A

Encrypt traffic between endpoints. If needed, HTTPS encryption and decryption can be handled by Elastic Load Balancing.

56
Q

Security Practice Areas: What practices facilitate effective incident response?

A
  1. Detailed logging that captures important content such as file access and changes
  2. Events can be processed automatically and trigger tools that automate responses through the use of AWS APIs
  3. Pre-provision tooling and a “clean room” using AWS CloudFormation so that forensics can be carried out in a safe, isolated environment
57
Q

Security Practice Areas: How do you anticipate, respond to, and recover from incidents?

A

Preparation is key to minimizing disruption. Make sure there is a way to quickly grant access to the security team, and automate the isolation of instances and the capture of data and state for forensics.

58
Q

Reliability: What are the main Reliability design principles?

A
  1. Automatically recover from failure by triggering automation when a threshold is breached
  2. Test recovery procedures, using automation to simulate different failure scenarios
  3. Scale horizontally to increase aggregate workload availability with smaller resources that don’t share a common point of failure
  4. Stop guessing capacity and instead automate addition/removal by monitoring demand and utilization
  5. Change infrastructure using automation that can be tracked and reviewed
59
Q

Reliability Practice Areas: What are the four best practice areas for reliability?

A
  1. Foundations
  2. Workload Architecture
  3. Change Management
  4. Failure Management
60
Q

Reliability Practice Areas: What are the foundational requirements?

A

They are the requirements where the scope extends beyond a single workload or project. One example is having sufficient network bandwidth to the data center.

61
Q

Reliability Practice Areas: How do you manage service quotas and constraints?

A

Quotas/limits exist to prevent accidentally provisioning more resources than necessary and to limit request rates on API operations to protect services from abuse.
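
One way to manage quotas is to compare current usage against each limit and warn before the limit is reached. A minimal sketch, assuming usage and quota values have already been collected into plain dictionaries; the quota names and the 80% warning ratio are hypothetical:

```python
def quota_warnings(usage, quotas, warn_ratio=0.8):
    """Return sorted names of quotas whose usage is at or above warn_ratio of the limit."""
    return sorted(
        name
        for name, limit in quotas.items()
        if usage.get(name, 0) >= warn_ratio * limit
    )


if __name__ == "__main__":
    quotas = {"vpcs-per-region": 5, "elastic-ips": 5}
    usage = {"vpcs-per-region": 4, "elastic-ips": 1}
    print(quota_warnings(usage, quotas))
```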

62
Q

Reliability Practice Areas: How do you plan network topology?

A

Workloads often exist in multiple environments, which can include cloud environments that are publicly or privately accessible. Plans must include network considerations such as intra- and inter-system connectivity, public IP address management, private IP address management, and domain name resolution.

63
Q

Reliability Practice Areas: How do you design your workload service architecture?

A

Use a service-oriented architecture (SOA) or a microservices architecture. SOA is the practice of making software components reusable via service interfaces. A microservices architecture goes further, making components smaller and simpler.

64
Q

Reliability Practice Areas: How do you design interactions in a distributed system to prevent failures?

A

Distributed systems rely on communication networks to interconnect components such as servers and services. Workloads must operate reliably despite data loss or latency in these networks. Components of the distributed system need to operate in a way that doesn’t negatively impact other components or the workload itself. These practices improve mean time between failures (MTBF).

65
Q

Reliability Practice Areas: What is a metric used to gauge ability to mitigate or withstand failures?

A

MTTR - mean time to recovery
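
A worked example relating MTTR to MTBF, assuming the common steady-state formula availability = MTBF / (MTBF + MTTR). The uptime, repair time, and failure count below are hypothetical:

```python
def mtbf(total_uptime_hours, failure_count):
    """Mean time between failures: uptime divided by number of failures."""
    return total_uptime_hours / failure_count


def mttr(total_repair_hours, failure_count):
    """Mean time to recovery: repair time divided by number of failures."""
    return total_repair_hours / failure_count


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical 720-hour month: 2 failures, 2 hours of total repair time.
    between = mtbf(718, 2)
    recovery = mttr(2, 2)
    print(f"MTBF={between} h, MTTR={recovery} h, "
          f"availability={availability(between, recovery):.4%}")
```

Halving MTTR in this example raises availability without changing how often failures occur, which is why recovery speed is tracked alongside failure frequency.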

66
Q

Reliability Practice Areas: How do you monitor workload resources?

A

Logs and metrics can be configured to send notifications when thresholds are crossed or significant events occur. Monitoring lets a workload recognize when low-performance thresholds are crossed or failures occur, so that it can recover automatically in response.
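
A simplified sketch of threshold-based alarming: the alarm fires only when the metric stays above the threshold for several consecutive datapoints, which reduces noise from brief spikes. The function and its parameters are hypothetical, and real alarm evaluation in a service such as CloudWatch has more options than shown here:

```python
def alarm_state(datapoints, threshold, periods):
    """Return 'ALARM' if the last `periods` datapoints all exceed `threshold`.

    Returns 'INSUFFICIENT_DATA' when fewer than `periods` datapoints exist,
    and 'OK' otherwise.
    """
    if periods < 1:
        raise ValueError("periods must be at least 1")
    if len(datapoints) < periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


if __name__ == "__main__":
    cpu_percent = [50, 85, 90, 95]
    print(alarm_state(cpu_percent, threshold=80, periods=3))
```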

67
Q

Reliability Practice Areas: How do you design workload to adapt to changes in demand?

A

Scalable workloads provide elasticity to add/remove resources automatically to match current demand at any given point in time.
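
A sketch of one simple scaling rule: size the fleet in proportion to how far a utilization metric is from its target, then clamp to configured bounds. This proportional rule is an illustrative assumption, not the exact algorithm used by any AWS scaling policy, and all names are hypothetical:

```python
import math


def desired_capacity(current, metric_value, target_value, minimum, maximum):
    """Scale capacity proportionally to metric/target, clamped to [minimum, maximum]."""
    proposed = math.ceil(current * metric_value / target_value)
    return max(minimum, min(maximum, proposed))


if __name__ == "__main__":
    # 4 instances at 90% average CPU with a 60% target.
    print(desired_capacity(current=4, metric_value=90, target_value=60,
                           minimum=2, maximum=10))
```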

68
Q

Reliability Practice Areas: How to implement change?

A

Controlled changes are necessary to deploy new functionality and to ensure that workloads are running known software that can be patched or replaced.

69
Q

Reliability Practice Areas: How do you back up data?

A

Back up data, applications, and configuration to meet your requirements for recovery time objectives (RTO) and recovery point objectives (RPO).
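
A minimal sketch of checking a backup plan against its objectives, assuming that worst-case data loss equals one full backup interval and that recovery time equals the measured restore duration. The function name and the sample values are hypothetical:

```python
from datetime import timedelta


def meets_objectives(backup_interval, restore_duration, rpo, rto):
    """Compare worst-case data loss and restore time against RPO and RTO."""
    return {
        "rpo_met": backup_interval <= rpo,   # data lost since the last backup
        "rto_met": restore_duration <= rto,  # time needed to restore service
    }


if __name__ == "__main__":
    result = meets_objectives(
        backup_interval=timedelta(hours=24),
        restore_duration=timedelta(hours=2),
        rpo=timedelta(hours=4),
        rto=timedelta(hours=4),
    )
    print(result)
```

In this example nightly backups miss a 4-hour RPO even though the restore itself is fast enough, so the backup frequency would need to increase.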

70
Q

Reliability Practice Areas: How to use fault isolation to protect your workload?

A

Fault-isolated boundaries limit the effect of a failure within a workload to a limited number of components. Components outside the boundary are unaffected by the failure, which limits the impact on the workload as a whole.

71
Q

Reliability Practice Areas: How to design workloads to withstand component failures?

A

Workloads with a requirement for high availability and low mean time to recovery (MTTR) must be architected for resiliency.

72
Q

Reliability Practice Areas: How do you test reliability?

A

After designing a workload to be resilient to the stresses of production, testing is essential to ensure that it operates as designed and delivers the resiliency expected.

73
Q

Reliability Practice Areas: How to plan for disaster recovery?

A

Having backups and redundant workload components in place is the start of a DR strategy. RTO (recovery time objective) and RPO (recovery point objective) are the objectives for restoring the workload, and are set based on business needs. Incorporate the probability of disruption and the cost of recovery to help inform the business value of providing disaster recovery.

74
Q

Performance Efficiency: What are the design principles of Performance Efficiency?

A
  1. Democratize advanced technologies
  2. Go global in minutes
  3. Use serverless architectures
  4. Experiment often
  5. Consider mechanical sympathy
75
Q

Performance Efficiency Practice Areas: What are the 4 best practice areas for performance efficiency?

A
  1. Selection
  2. Review
  3. Monitoring
  4. Tradeoffs
76
Q

Performance Efficiency Practice Areas: How do you select the best performing architecture?

A

Optimal performance for a workload often requires multiple solutions and features. Use a data-driven approach to select patterns and implementation.

77
Q

Performance Efficiency Practice Areas: What are the 3 forms that compute is available in?

A
  1. Instances
  2. Containers
  3. Functions
78
Q

Performance Efficiency Practice Areas: What are instances?

A

Virtualized servers whose capabilities can be changed with a button or an API call. They come in different families and sizes, with capabilities that include solid-state drives (SSDs) and graphics processing units (GPUs).

79
Q

Performance Efficiency Practice Areas: What are containers?

A

They are a method of operating system virtualization that runs an application and its dependencies in resource-isolated processes. AWS Fargate provides serverless compute for containers; alternatively, Amazon EC2 can be used when you need control over the installation, configuration, and management of the compute environment.

80
Q

Performance Efficiency Practice Areas: What are some container orchestration platforms?

A

Either the Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS) can be used.

81
Q

Performance Efficiency Practice Areas: What are functions?

A

They abstract the execution environment from the code that we want to execute. AWS Lambda allows us to run code without running an instance.

82
Q

Performance Efficiency Practice Areas: How to select compute solution?

A

The optimal compute solution varies based on architectural design, usage patterns, and configuration settings. An architecture can use different compute solutions for various components and enable different features to improve performance.

83
Q

Performance Efficiency Practice Areas: What are the different types of storage available?

A
  1. Object Storage
  2. Block Storage
  3. File Storage
84
Q

Performance Efficiency Practice Areas: What is object storage?

A

Object storage provides a scalable, durable platform that makes data accessible from any internet location. Amazon S3 is an example.

85
Q

Performance Efficiency Practice Areas: What is block storage?

A

Block storage provides highly available, consistent, low-latency storage for each virtual host and is analogous to direct-attached storage (DAS) or a Storage Area Network (SAN). Amazon Elastic Block Store (EBS) is designed for workloads that require persistent storage accessible by EC2 instances, and it helps you tune applications with the right storage capacity, performance, and cost.

86
Q

Performance Efficiency Practice Areas: What is File Storage?

A

This provides access to a shared file system across multiple systems. Amazon Elastic File System (EFS) is ideal for large content repositories, development environments, media stores, or user home directories. Amazon FSx lets you launch popular file systems so that you can use the rich feature sets and fast performance of open source and commercially licensed file systems.

87
Q

Performance Efficiency Practice Areas: How do you select your storage solution?

A

It varies based on the kind of access method (file/block/object), patterns of access (random/sequential), required throughput, frequency of access (online/offline/archival), frequency of update (WORM/dynamic), and availability/durability constraints.

88
Q

Performance Efficiency Practice Areas: What are the types of database engines available?

A
  1. Relational
  2. Key-value
  3. Document
  4. In-memory
  5. Graph
  6. Time-series
  7. Ledger
89
Q

Performance Efficiency Practice Areas: How do you select your database solution?

A

Balance availability, consistency, partition tolerance, latency, durability, scalability, and query capability. Can use different databases for various subsystems with different features enabled to improve performance.

90
Q

Performance Efficiency Practice Areas: How do you configure your networking solution?

A

The optimal networking solution for a workload varies based on latency, throughput requirements, jitter, and bandwidth.

Physical constraints, such as user or on-premises resources, determine location options; these can be offset with edge locations or resource placement.

91
Q

Performance Efficiency Practice Areas: What are some products offered by AWS for networking?

A

Product features like Enhanced Networking, EBS-optimized instances, S3 transfer acceleration, and dynamic Amazon CloudFront are available to optimize network traffic.

Networking features such as Route 53 latency-based routing, VPC endpoints, Direct Connect, and Global Accelerator are also available to reduce network distance or jitter.

92
Q

Performance Efficiency Practice Areas: How can you improve networking performance?

A

Take advantage of Regions, placement groups, and edge services.

93
Q

Performance Efficiency Practice Areas: How do you evolve workload to take advantage of new releases?

A

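When architecting a workload, only a finite set of options is available to choose from. Keep evaluating the new technologies and approaches that become available over time, since they could improve the performance of the workload.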

94
Q

Performance Efficiency Practice Areas: How would you evaluate a poorly performing architecture?

A

Implement a performance review process and apply Deming’s plan-do-check-act (PDCA) cycle to drive iterative improvement.

95
Q

Performance Efficiency Practice Areas: How do you monitor resources to ensure they are performing?

A

Can use CloudWatch to monitor workload, respond to system-wide performance changes, optimize resource utilization and get a unified view of operational health.

X-Ray can help developers analyze and debug production, distributed applications by identifying root causes and performance bottlenecks.

More generally, tune monitoring to avoid false positives, and use automated triggers to avoid human error. Plan game days, with simulations conducted in the production environment, to test the alarm solution.

96
Q

Performance Efficiency Practice Areas: How do you use tradeoffs to improve performance?

A

Trade consistency, durability, and space for time or latency to deliver higher performance. For example, you can go global in minutes, or add read-only replicas to information stores to reduce the load on the primary database.

Use a systematic approach, such as load testing, to explore whether a tradeoff actually improves performance.

97
Q

Cost Optimization: What are the five design principles for cost optimization?

A
  1. Implement Cloud Financial Management
  2. Adopt a consumption model and only pay for what you use
  3. Measure overall efficiency with business output and associated tech costs
  4. Stop spending money on undifferentiated heavy lifting like racking/stacking/powering servers.
  5. Analyze and attribute expenditure
98
Q

Cost Optimization Practice Areas: What are the five best practice areas for cost optimization?

A
  1. Practice cloud financial management
  2. Expenditure and usage awareness
  3. Cost-effective resources
  4. Manage demand and supply resources
  5. Optimize over time
99
Q

Cost Optimization Practice Areas: Explain the tradeoff between optimizing for speed-to-market and cost.

A

Sometimes it is best to optimize for speed (going to market quickly, shipping new features, or meeting a deadline) rather than investing in up-front cost optimization. The temptation exists to overcompensate “just in case”, which can lead to over-provisioned and under-optimized deployments. This can be a reasonable choice when lifting and shifting resources from an on-premises environment to the cloud and optimizing afterwards.

100
Q

Cost Optimization Practice Areas: How do you implement cloud financial management?

A

Use Cost Explorer, optionally with Athena and the Cost and Usage Report, to build cost and usage awareness. Supplement the team with experts in cost optimization, and include people with supplementary skill sets in analytics and project management.

Finally, try to improve existing programs and processes instead of building new ones from scratch; this achieves outcomes much faster.

101
Q

Cost Optimization Practice Areas: How should expenditures be approached?

A

After creating an account structure with AWS Organizations or AWS Control Tower, use resource tagging to apply business and organization information to usage and cost.

Use Cost Explorer for visibility into cost and usage, and control them with notifications from AWS Budgets and with controls using AWS Identity and Access Management (IAM) and Service Quotas.

102
Q

Cost Optimization Practice Areas: How do you govern usage?

A

Establish policies and mechanisms to ensure that appropriate costs are incurred while objectives are achieved. Employ a checks-and-balances approach to innovate without overspending.

103
Q

Cost Optimization Practice Areas: How do you decommission resources?

A

Implement change control and resource management from project inception to end of life. This makes sure that we shut down or terminate unused resources to reduce waste.

104
Q

Cost Optimization Practice Areas: How are cost allocation tags to be used?

A

These tags can be used to categorize and track AWS costs. When applying tags to resources such as EC2 instances or S3 buckets, AWS generates a cost and usage report with the usage and tags. The tags can be applied to represent organization categories (cost centers, workload names, owners) to organize costs across multiple services.

For high level insights and trends, use daily granularity. For deeper analysis, move to hourly granularity with the cost and usage report.

Combine tagged resources with entity lifecycle tracking (employees, projects) to identify orphaned resources or projects that no longer generate value for the organization and should be decommissioned.
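
A rough sketch of both uses of tags, assuming cost line items have already been exported into plain dictionaries. The field names (`resource_id`, `cost`, `tags`), the tag keys, and the helper names are hypothetical and do not reflect the actual Cost and Usage Report schema.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key):
    """Sum cost per value of one tag key; items missing the tag go to 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        value = item.get("tags", {}).get(tag_key, "untagged")
        totals[value] += item["cost"]
    return dict(totals)


def orphaned_resources(line_items, active_owners, tag_key="owner"):
    """Return sorted resource IDs whose owner tag is missing or not an active owner."""
    return sorted(
        {
            item["resource_id"]
            for item in line_items
            if item.get("tags", {}).get(tag_key) not in active_owners
        }
    )


# Hypothetical line items for illustration only.
items = [
    {"resource_id": "i-001", "cost": 10.0, "tags": {"cost-center": "web", "owner": "alice"}},
    {"resource_id": "bucket-a", "cost": 2.5, "tags": {"cost-center": "web", "owner": "bob"}},
    {"resource_id": "i-002", "cost": 5.0, "tags": {"owner": "carol"}},
]
per_cost_center = cost_by_tag(items, "cost-center")
candidates = orphaned_resources(items, active_owners={"alice", "carol"})
```

Here `bob` stands for an owner who is no longer active, so his bucket is flagged as a candidate for decommissioning.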

105
Q

Cost Optimization Practice Areas: How do you evaluate cost when selecting services?

A

Amazon EC2, Amazon EBS, and Amazon S3 are building block services. Managed services like RDS and DynamoDB are application level services.

By selecting the right building blocks and managed services, workload cost can be optimized.

106
Q

Cost Optimization Practice Areas: What do On-Demand instances let you do?

A

Can pay for compute capacity by the hour with no minimum commitments required.

107
Q

Cost Optimization Practice Areas: What do Savings Plans and Reserved Instances provide?

A

Savings of up to 75% off On-Demand pricing are available.

108
Q

Cost Optimization Practice Areas: When are spot instances useful?

A

Spot instances let you use unused EC2 capacity and get savings of up to 90% off On-Demand pricing. Spot Instances are appropriate when a system can tolerate using a fleet of servers where individual servers can come and go dynamically such as:

  1. Stateless web servers
  2. Batch processing
  3. HPC and big data workloads
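
To compare the purchase options from the last three cards, a small arithmetic sketch may help. The discounts are the maximum "up to" figures quoted on the cards, so the results are best cases only; the $0.10 hourly rate and the 730-hour month are assumptions made for illustration.

```python
# Maximum ("up to") discounts quoted on the cards; real discounts vary.
MAX_DISCOUNT = {
    "on_demand": 0.0,
    "savings_plan_or_reserved": 0.75,
    "spot": 0.90,
}


def best_case_monthly_cost(on_demand_hourly_rate, hours=730):
    """Return the best-case monthly cost for each purchase option."""
    if on_demand_hourly_rate < 0 or hours < 0:
        raise ValueError("rate and hours must be non-negative")
    return {
        option: on_demand_hourly_rate * hours * (1 - discount)
        for option, discount in MAX_DISCOUNT.items()
    }


# Hypothetical $0.10/hour instance running for a full month.
costs = best_case_monthly_cost(0.10)
```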
109
Q

Cost Optimization Practice Areas: How do you plan for data transfer charges?

A

Plan and monitor data transfer charges so that architectural decisions to minimize costs can be made. Small but effective architectural changes can reduce operational costs drastically over time.

110
Q

Cost Optimization Practice Areas: How do you manage demand and supply resources?

A

For a workload with balanced spend and performance, make sure everything paid for is used and avoid underutilizing resources. A utilization metric skewed in either direction has an adverse impact on your organization, through either:

  1. Operational costs from degraded performance due to over-utilization
  2. Wasted AWS expenditure due to over-provisioning
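
The two failure modes above can be expressed as a simple check on average utilization. This is a sketch only: the 40% and 80% cut-offs are hypothetical and should be tuned per workload.

```python
def classify_utilization(avg_utilization, low=0.40, high=0.80):
    """Classify a resource by average utilization (0.0 to 1.0).

    The low/high cut-offs are illustrative assumptions, not AWS guidance.
    """
    if not 0.0 <= avg_utilization <= 1.0:
        raise ValueError("avg_utilization must be between 0 and 1")
    if avg_utilization < low:
        return "over-provisioned"  # paying for capacity that is not used
    if avg_utilization > high:
        return "over-utilized"  # performance is at risk of degrading
    return "balanced"
```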
111
Q

Cost Optimization Practice Areas: How can you automatically provision resources?

A

Use auto scaling with demand-based or time-based approaches to add/remove resources as needed. If you can anticipate demand, you save money while ensuring resources match needs.

Can also use Amazon API Gateway to implement throttling or Amazon SQS to implement a queue in your workload. These will allow you to modify demand on workload components.

Lastly, think about the pattern of usage, the time to provision new resources, and the predictability of the demand pattern. A correctly sized queue/buffer, tuned to how quickly the workload must respond to requests, gives an optimal balance between supply and demand.
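
A toy simulation shows how a queue lets fixed capacity absorb a burst instead of provisioning for the peak. This is a minimal sketch in plain Python and does not use Amazon SQS; the function name, the tick-based model, and the request counts are assumptions.

```python
def simulate_buffer(arrivals, capacity_per_tick):
    """Return the queue backlog after each tick.

    arrivals: number of requests arriving in each tick.
    capacity_per_tick: requests the workload can process per tick.
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        backlog += arriving
        # Process as much as capacity allows; the rest waits in the queue.
        backlog -= min(backlog, capacity_per_tick)
        history.append(backlog)
    return history


# Hypothetical burst of 20 requests against a capacity of 10 per tick.
backlog_per_tick = simulate_buffer([5, 20, 5, 0, 0], capacity_per_tick=10)
```

The backlog peaks at 10 requests and drains within two ticks, so whether capacity of 10 is enough depends on how long those requests are allowed to wait.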

112
Q

Cost Optimization Practice Areas: How do you evaluate new services?

A

As new services are released, review existing architectural decisions to make sure they continue to be the most cost effective. Note that workloads can be optimized incrementally while minimizing effort to implement the change.

Components in the workload can also be replaced with new services to achieve increases in efficiency. Be aggressive in decommissioning resources, entire services, and systems that you no longer require.

113
Q

The Review Process: What is the purpose of reviewing an architecture?

A

To identify any critical issues that might need addressing or areas that could be improved.

114
Q

The Review Process: What is the outcome of the review?

A

A set of actions to improve the experience of a customer using the workload.

115
Q

The Review Process: What is the continuous approach?

A

Each team member who builds architectures should use this framework to continually review their work instead of holding formal review meetings. This allows team members to update answers as the architecture evolves and as they deliver features.

116
Q

The Review Process: When should reviews be applied?

A

They should be applied at key milestones in the product lifecycle, early on in the design phase to avoid one-way doors that are difficult to change, and then before the go-live date.

After going into production, the workload will evolve and the architecture will change. If anything significant comes out of that, follow a set of hygiene processes, including a Well-Architected Review.

117
Q

The Review Process: What is an effective approach to review another team’s workload?

A

Have a series of informal conversations about their architecture where you can glean answers to most questions. Then follow it up with 1 or 2 meetings to gain clarity or dive deep on ambiguous areas.

Items to use:

  1. Meeting room with whiteboards
  2. Print outs of diagrams or design notes
  3. Action list of questions that require out-of-band research to answer
118
Q

The Review Process: What should be an output of the review?

A

A list of issues prioritized based on business context. Account for the impact of the issues on the day-to-day work of the team: identify opportunities to address issues early, then work on creating business value.