AWS Solutions Architect Certification Flashcards
What is data durability and what is S3’s rating
Chance of data loss; S3 is rated for 11 nines of durability (99.999999999%)
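Eleven nines means roughly a 1-in-10^11 chance of losing any given object in a year. A quick back-of-the-envelope check (the object count below is just an illustration):

```python
# Back-of-the-envelope: expected annual object loss at 11-nines durability.
durability = 0.99999999999            # 99.999999999% (11 nines)
annual_loss_prob = 1 - durability     # ~1e-11 per object per year
objects = 10_000_000                  # hypothetical: ten million objects stored

expected_losses = objects * annual_loss_prob
print(expected_losses)                # ~0.0001: about one object per 10,000 years
```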
What is block storage
Range of bytes/bits on disk where storage files are divided into blocks.
Each block receives a unique identifier and is written to disk efficiently
Can be spread across multiple disks or environments
Object storage vs file storage
Object storage is flat structure where the data (object) is located in a single repository (bucket)
Prefixes and delimiters allow you to group similar items to visually organize and retrieve your data giving the appearance of files.
File storage is how an OS stores data in a hierarchical fs. Need to know the exact path and location of the files.
What are the components of object data
The data, metadata (size, dates, file types), attributes (permissions), unique id
How many buckets are allowed per AWS account
100
What are some of the limitations of S3 buckets
Cannot be transferred to other accounts
Bucket names must be globally unique
Cannot change the name after creation
Can only remove buckets when they’re empty
Can create as many objects in the bucket as you want
Bucket naming convention
my-s3-bucket.s3.amazonaws.com
bucket-name.s3.amazonaws.com
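A rough sketch of the main bucket-naming rules (3-63 characters, lowercase letters, digits, hyphens, and dots, starting and ending with a letter or digit). This does not cover every edge case AWS enforces, such as rejecting IP-address-shaped names:

```python
import re

# Core S3 bucket-naming rules (not exhaustive): 3-63 chars, lowercase
# letters/digits/dots/hyphens, must start and end with a letter or digit.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def is_valid_bucket_name(name: str) -> bool:
    return bool(BUCKET_RE.fullmatch(name))

print(is_valid_bucket_name("my-s3-bucket"))  # True
print(is_valid_bucket_name("My_Bucket"))     # False: uppercase and underscore
```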
What are bucket/object tags used for
Help to track storage costs, can help with finer grained access control, can use CloudWatch to setup metrics for specific tags
Use of S3 for public, static websites
Can host static web content, enable static hosting, set public read permissions, provide index.html file
S3 Path style Urls
Virtual hosted: bucket-name.s3.Region.amazonaws.com/key-name
Path Style:
s3.Region.amazonaws.com/bucket-name/key-name
S3 Virtual hosted style
https://bucket-name.s3.region.amazonaws.com/key-name
Can make the bucket name the same as your registered domain name and make that name a DNS alias for S3
S3 Consistency
After uploading/overwriting new object, read requests return new object immediately
S3 Object Versioning
Keeping multiple versions of an object in the same bucket.
When enabled, even if an object is overwritten, older versions will remain.
If object is deleted, can still retain prior versions
S3 Transfer Acceleration
Online
Fast file transfer over long distances leveraging CloudFront globally distributed edge locations over an optimized network path
Kinesis Data Firehose (Data transfer)
Online
Captures and automatically loads streaming data into S3 and Redshift, enabling near real-time analytics
Kinesis Data Streams (Data transfer)
Online
Can emit to various AWS services. EMR, Redshift, Lambda, S3, etc.
Snowcone (Data transfer)
Offline
Smallest edge storage transfer device (8 TB). Can transfer offline, or online with DataSync
DataSync (Data transfer)
Online
Can transfer hundreds of TB at speeds up to 10x faster than open source tools, from on-prem to the cloud
Snowball (Data transfer)
Offline
Block and object storage with 40 vCPUs. Used for data collection, ML, and storage in remote or poorly connected environments
Snowmobile (Data transfer)
Offline
Extremely large amounts of data to AWS. 100 PB per snowmobile.
Direct Connect (Data transfer)
Hybrid Offline/Online
Dedicated network connection that bypasses the internet from onprem data centers to S3
Storage Gateway (Data transfer)
Hybrid Online/Offline
Store on-prem data in an S3 bucket
Bucket policies (Securing data)
Permissions for all or a subset of objects using tags and prefixes
Presigned Urls (Securing data)
Grant limited access to others with temporary urls
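The mechanism behind a presigned URL, a time-limited HMAC-signed grant that anyone holding the URL can use, can be sketched with the stdlib. This illustrates the idea only; real S3 presigned URLs use AWS Signature Version 4, and the key below is a stand-in, not a real credential:

```python
import hashlib
import hmac

SECRET = b"demo-signing-key"  # hypothetical stand-in for an AWS credential

def presign(key: str, expires_at: int) -> str:
    """Build an expiring, signed URL path for an object key."""
    sig = hmac.new(SECRET, f"{key}:{expires_at}".encode(), hashlib.sha256).hexdigest()
    return f"/{key}?expires={expires_at}&sig={sig}"

def verify(key: str, expires_at: int, sig: str, now: int) -> bool:
    """Accept the request only if the signature matches and it hasn't expired."""
    expected = hmac.new(SECRET, f"{key}:{expires_at}".encode(), hashlib.sha256).hexdigest()
    return now < expires_at and hmac.compare_digest(sig, expected)
```

Because the signature covers the key and expiry, tampering with either invalidates the URL, which is why a presigned URL can be handed out without sharing credentials.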
Block public access (Securing data)
Default configuration for S3 buckets
Resource-based Policies (Securing data)
Use if granting cross-account access permissions to other AWS accounts.
Also if user IAM polices reach their size limits or you prefer to keep access policies in S3
Will require a principal
User policy (Securing data)
Use if you prefer to keep access control policies in IAM
Or if you have numerous S3 buckets with different permission requirements
Encrypt data at rest (S3) server side encryption
S3 encrypts objects before saving to disk and decrypts the data when downloaded
Can allow S3 to create/encrypt keys, can provide CMK and store in KMS, or can manage the key yourself
Client side encryption (S3)
Encrypt data before storing in S3 and manage master encryption keys yourself
Data Lakes in S3
Centralized repository to store structured and unstructured data, and S3 is a good selection for its near limitless storage capacity
Used to decouple compute from storage as well as ML and analytics workloads
Can use Athena, Redshift, Rekognition, and Glue
Data cataloguing S3
Queryable interface for all assets stored in S3 that provides a single source of truth for contents
Can ingest data into S3, extract metadata with Lambda, store it in DynamoDB, and query with Elasticsearch
AWS Glue
Fully managed ETL pipelines
Athena (Querying S3 data)
Analyze data directly in S3 using SQL.
Serverless, so you only pay for the volume of data your queries scan.
Best used for light data discovery
Redshift Spectrum
Run queries against S3. Should be used for complex queries with many concurrent users.
Intelligent Tiering (S3)
Monitors data access patterns and transfers data into more appropriate storage class to save on costs.
Works best when data has an unknown access pattern
Life cycle config (S3)
Set of rules defining actions to apply to a group of objects. Can tell S3 to transition objects to less expensive storage classes. Can automate the move to archival tiers, but will need to manually move them back
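A lifecycle configuration is just a declarative document. The dict below mirrors the shape of S3's PutBucketLifecycleConfiguration API; the prefix and day counts are illustrative:

```python
# Illustrative lifecycle rule: objects under logs/ move to Standard-IA
# after 30 days, to Glacier Flexible Retrieval after 90, and expire at 365.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}
```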
S3 General purpose tier
Default storage option that provides high performance for frequently accessed data.
High throughput and low latency. SSL encryption at rest, and no minimums or storage duration
Price is the highest, but the more you store the cheaper per GB the fees become
S3 Standard-Infrequent Access
Less frequent access, but rapid access when you need it. Ideal for long-term storage, backups, DR files, etc.
Less expensive than general purpose for storage, but more expensive for access
S3 One-Zone IA
Less available, less resilient, only available in one AZ
Less frequently accessed data and less resilient
Good for backups of onprem data where it can be recovered if necessary.
Cheaper
Glacier Instant vs Flexible retrieval
Instant (rarely accessed but need it instantly, ex. medical images)
Flexible: 1-5 min (expedited), 3-5 hrs (standard), up to 12 hours (bulk); for rarely accessed data needed only 1-2 times per year. Good for offsite storage needs.
Data minimum duration of 90 days
Glacier Deep Archive
Lowest storage cost option.
Minimum of 180 days
12-48 hr retrieval times
Factors of S3 pricing
Storage: Amount/size of objects, intelligent tiering, moving from tier class to the other
Requests/data retrieval: Pay per requests made against objects in bucket
Data transfer: Pay for data transferred out, except to EC2 in the same Region or to CloudFront; transfer in from the internet is free
Storage management features: S3 inventory, analytics, object tagging
S3 Server Access Logs
Free
Provides details about requests made to a bucket, requester, bucket name, status, environment
Can help to learn about customer base with certain access patterns, and understanding of bill
Can use Athena to analyze logs
Not guaranteed log delivery
S3 Cloudtrail logging
Fee-based
Captures subset of API calls for S3 events
Determine which requests were made to S3 from which IP address, time and additional details
Can use Athena to analyze logs
Can use CloudWatch to monitor Cloudtrail events and invoke AWS lambdas and SNS notifications
Guaranteed log delivery and more structured
S3 Events
New object created, object removed, restore object, reduced redundancy storage, replication
Can publish events to SNS, SQS, Lambda
S3 Batch operations
Can perform a single API action on a list of objects:
Put object tagging
Restore requests to Glacier
Copies of objects
Invoke lambdas
Can use with S3 inventory, manage with labels and tags, monitor/troubleshoot the jobs with Cloudtrail
Cloudwatch
Monitoring service that collects logs, events and metrics
Can create custom dashboards and set alarms for certain thresholds
Metrics can have multiple dimensions (bucket name, filter ID, storage type, etc.). Dimensions are used to drill down on certain metrics
AWS Config (S3 auditing)
Check that you have recommended settings enabled in S3 account.
Can receive notifications via SNS or remediate issues with Lambdas
Can check if logging is enabled for buckets, check for public access, and check if buckets require SSL
AWS IAM Access Analyzer (S3 auditing)
Receive alerts for questionable access to S3 buckets and can remediate issues by removing access
Identify resources in your account/org that are shared with external identities
Alerts you if buckets configured to allow access to internet
Need to create an analyzer in each bucket region
AWS Trusted Advisor (S3 auditing)
Ensure account follows best practices for security, performance, fault tolerance, service limits, and cost optimizations
Can assess bucket permissions, bucket logging, and bucket versioning
Four pillars of S3 cost optimization
App requirements: Understand data access patterns or archival needs.
Data organization: Use prefixes or tags to organize all data to help manage it
Understand, analyze, optimize: Setup monitors to help manage costs proactively and defensively
Continuous right sizing: Know the correct storage classes
Predictable workload tooling
Storage class analysis, observes data access patterns over time
AWS resources to manage/monitor costs
QuickSight dashboards that can use ML insights
CloudWatch that can provide actionable insights to monitor application performance changes
AWS budgets to track costs and usage
Cost and usage reports
S3 Lifecycle policy
Rule that moves objects between storage classes based on creation dates
Only supports transitions from more frequently accessed tiers to less frequently accessed ones
Works for versioned and unversioned buckets
CloudTrail doesn’t log lifecycle actions
S3 Multipart uploads
For objects larger than 100 MB, or transfers over a spotty network
Can remove incomplete uploads
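S3 caps a multipart upload at 10,000 parts with a 5 MiB minimum part size (the last part is exempt), so the part size has to grow with the object. A sketch of choosing one (the 100 MiB starting size is an assumption, not an AWS default):

```python
import math

MIN_PART = 5 * 1024 * 1024    # 5 MiB minimum part size (last part exempt)
MAX_PARTS = 10_000            # maximum parts per multipart upload

def choose_part_size(object_size: int, start: int = 100 * 1024 * 1024) -> int:
    """Double the part size until the object fits within 10,000 parts."""
    size = max(start, MIN_PART)
    while math.ceil(object_size / size) > MAX_PARTS:
        size *= 2
    return size
```

A 1 GiB object fits comfortably at the starting size, while a 5 TiB object (S3's maximum object size) forces the part size up to 800 MiB.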
File, Volume and Tape Gateways (S3)
Connect to the cloud to store app data files in S3, cloud-backed storage volumes of on-prem apps, and physical tapes on-prem as virtual tapes in AWS
S3 Storage Lens
Provides org wide visibility into object storage and activity trends
Can drill down into account, region, storage class, bucket, and prefixes. Can create custom dashboards
AWS budgets
Tracks and takes action on AWS costs and usage
Can set custom budget alerts
Set budgets on a recurring basis to continue to be notified
S3 Performance metrics
network throughput, CPU, DRAM, DNS lookup time, latency, data transfer speeds with HTTP analysis tools
S3 performance tools
Prefixes: Scale for high request rates
Parallelization: Maximize bandwidth by scaling connections horizontally
S3 select: Optimize data retrieval operations by querying subset of objects. Works with JSON, CSV, and Parquet
Timeout/retries: Recover from transient failures
Use Cloudfront for frequently accessed content
Transfer Acceleration: transferring data across vast geographic distances
Cloudwatch: monitor performance
Bucket partitioning with prefixes
Naming scheme:
bucket-name/prefix/object
The bucket prefix allows objects to be stored on separate partitions, so can increase the transactions per second
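A common pattern is to derive a short hash prefix from the key so that writes spread across many partitions instead of piling onto one lexicographic range. A sketch (this is a client-side naming convention, not an AWS API):

```python
import hashlib

def partitioned_key(key: str) -> str:
    """Prepend two hex chars of the key's hash, spreading keys over 256 prefixes."""
    prefix = hashlib.md5(key.encode()).hexdigest()[:2]
    return f"{prefix}/{key}"

print(partitioned_key("2024/01/15/log.txt"))  # e.g. "a3/2024/01/15/log.txt"
```

Date-based keys like `2024/01/15/...` all share one prefix and thus one partition's request limits; the hash prefix distributes them.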
Parallelization (S3)
Horizontally scale parallel requests to S3 service endpoints for better performance
Can break up object data into multiple parts
Requires an upload id and part #
Can pause between uploads and don’t need to finish uploading the entire object
S3 Connection delays and failures
Use AWS SDK for connection timeouts, retries, backoffs (exponential or random increase in request wait times)
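The "full jitter" exponential backoff used by the AWS SDKs can be sketched as follows (base and cap values here are illustrative, not SDK defaults):

```python
import random

def backoff_delays(max_attempts: int = 5, base: float = 0.1, cap: float = 5.0):
    """Exponential backoff with full jitter: each retry waits a random
    time between 0 and min(cap, base * 2**attempt) seconds."""
    for attempt in range(max_attempts):
        yield random.uniform(0.0, min(cap, base * 2 ** attempt))
```

The randomness prevents many clients that failed at the same moment from retrying in lockstep and overwhelming the endpoint again.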
CloudFront
CDN to ensure low latency and higher transfer speeds
Caches copies of content, for everything not cached it will maintain open connection with origin server
DNS routes client requests to closest CDN
Can restrict access to buckets to only be accessible by CF
Can use HTTPS
Implement georestrictions
WAF to prevent certain traffic based on IPs
AWS Organizations
Account management service where you can programmatically create accounts and consolidate accounts within the org
Root –> OU –> Account
Policies can impact all child nodes and leaves when assigned
Service Control Policies
Will apply to all AWS users, groups, and roles within the account including the root identity
CloudFront Use cases
Increase points of presence
Static asset caching
Live and on-demand video streaming
Security and DDoS protection
Dynamic and customized content
API acceleration
Software distribution
Can help with app availability since content can be cached even when origin servers go down
S3 ACLs (Access Control Lists)
Can be applied to every bucket and object stored in S3 and grant additional permissions beyond those specified in IAM or bucket policies. Can be used to grant access to another AWS user or predefined groups like the public.
If you have read permissions on the bucket, but not the actual objects, then you can only list contents of the bucket
S3 bucket permissions vs object permission
Can be set separately. If you want someone to be able to view the list of files in a bucket and also view/download them, you must grant them permission on the bucket itself as well as on each object
S3 IAM user policy
Great way to apply very limited permissions to an IAM role. Good if you want the user to have a policy that applies to multiple buckets
For example, a role used for DB backups should only be able to create objects and not view/delete them
S3 data consistency
Creation and updates to individual objects in S3 are atomic, you’ll never upload a new object or change an existing object and see only half of the change
New objects are seen instantly (read-after-write consistency)
Since December 2020, overwrites and deletes are also strongly consistent, so a read after a successful write never returns stale data
S3 gotchas
S3 sits outside the VPC and can be accessed from anywhere if bucket policies do not deny it.
Incomplete multipart uploads accrue storage costs even if they fail to fully upload. Create lifecycle policies to clean up these incomplete uploads
EBS
High performance block storage designed for use with EC2
Can handle relational/nonrelational DBs, containerized apps, file systems, media workflows, and big data analytics workloads
All volumes are replicated within an AZ and can easily scale to PB of data
Can use snapshots with automated lifecycle policies to back up to S3
What is block storage
Raw storage where hardware is presented as disk or volume, and can be attached to compute system for use
Storage is formatted in predefined contiguous segments called blocks, which are the basic fixed storage units for data
Can be on HDD, SSDs, or NVMe
Application shares data management with OS
What is file storage
Built on top of block storage; serves as a file share or file server
Created using an OS that formats and manages the reading and writing of data to the block storage. Data is stored in a directory tree hierarchy
SMB and NFS are the most common storage protocols
OS manages the storage protocol and the operation of the file system and differentiates based on types of data
What is object storage
Built on top of block storage and created using an OS that formats and manages the reading and writing of data to the block storage device.
Object storage does not differentiate between types of data, and type becomes part of object metadata.
One object storage system could use binary objects of size 128MB, so smaller files or data are stored at a binary level within the object, and data larger than that are stored by spreading the data across multiple objects
How does file storage work in the cloud
Instead of storing files locally and managing access on an NFS server, you store files in cloud resources via a managed file service like EFS
Customer manages access control for who has access to FS via network controls (SGs/NACLs), access points, and IAM policies
Communication between clients and storage is handled by the NFS protocol
The cloud service provides a common DNS namespace for clients to connect to the shared file system and those with appropriate permissions access the FS by connecting to their attached mount points and file systems appear as local volumes to the client.
What are the file storage services
EFS, FSx for Lustre, FSx for Windows File Server, FSx for NetApp ONTAP, FSx for OpenZFS
EFS
Scalable, elastic, fully managed and supports NFS. Capacity is dynamic without any intervention
Can be shared with up to 10k+ concurrent clients.
Only pay for the storage that you use
FSx for Lustre
Parallel FS for high-performance workloads. Need to select the specific performance and capacity parameters suited to your app needs
File storage performance
Latency: Amount of time between making requests to storage system and receiving a response.
EFS standard offers 1 to 2.4 ms
IOPS: General purpose FS offers 35k read and 7k write. Max I/O offers over 500k
Throughput: Measuring the performance of reading and writing large sequential data files measured in MBs/second. EFS offers rates up to 10GB/sec
Has higher latency but better durability and availability than EBS
How many AZs are all files and directories redundantly stored across with EFS
Three. A write isn’t acknowledged until data is written to all 3.
EFS security
POSIX permissions: user and group level permissions to control client access permissions to your fs
SGs: Restrict access over the network with VPC SGs. Determines which IPs have network visibility to the EFS endpoint.
IAM: Control both the creation and administration of the EFS FS
KMS: Encrypt data at rest and turn on TLS when you mount the fs
EFS Cost optimization
Standard, Standard-IA, and One Zone-IA
Lifecycle management will move files based on access frequency
Much more expensive than EBS
Can an EFS be mounted on-prem?
Yes, using Direct connect
Lambda and EFS
Can set an EFS fs as the local mount path directly within the Lambda service console.
EFS use cases
Big data and analytics: Shared files access to data scientists using genomics software running on EKS cluster
Web serving content management: Serve files to web apps quickly and scalable way to meet demand
App testing and development: Shared storage repository to share code and other files in a secure and organized way
DB backups: An NFS file system is the preferred backup repository for many common database backup apps, like Oracle and SAP
Container storage: Providing persistent shared access across common file repository
Placement groups
An AZ is the data center capacity (possibly multiple data centers) providing computational resources.
For high performance needs with extremely low latencies, AWS offers the ability to provision compute not just within the same availability zone, but within the same placement group, essentially the same underlying hardware.
FSx for Windows File Server
Simplifies the setup, provisioning and maintenance of Windows workloads
SSD and HDD storage. Provides up to 64 TB per file system. Throughput up to 3 GB/s
Provides backups and replication across multiple AZs
Migrates files from Windows servers to AWS, accelerating adoption through a low-latency hybrid file system
Identity-based auth through Microsoft Active Directory
Factors that affect pricing for FSx for WFS
Deployment type (Multi or single AZ)
Storage type (HDD or SSD)
Storage capacity (Priced per GB-month)
Throughput capacity (priced per MBps-month)
What are the components of the FSx for WFS
Windows file server (with DNS address) and storage volumes
What is an elastic network interface
A resource that allows client compute instances, whether in AWS or on-prem, to connect to the FS
What is AWS Direct Connect and Client VPN
Direct Connect: Service that enables you to access FS over a dedicated network connection from on-prem environment
Client VPN: Access FS from on-prem using secure and private tunnel
AWS Managed AD with FSx for WFS
Setup and run Active Directories in the cloud
Deploy each directory across multiple AZs, and AWS handles the integration of the two services
Can also keep a self-managed AD on-prem and integrate it with FSx in the cloud
Network interface level access control with FSx and WFS
Can control which resources in VPC can access FSx with SGs with inbound and outbound rules.
Will need to allow outbound traffic to connect to AD
AWS RDS
Web services that sets up, operates and scales relational dbs
Handles updates and backups
Supported engines include PostgreSQL, MySQL, MariaDB, Oracle, and SQL Server
AWS Aurora
Managed DB service compatible only with MySQL and PostgreSQL engines
AWS DynamoDb
Managed key-value, non-relational db service that provides fast and predictable performance
You can create db tables that store and retrieve data and serve any level of request traffic
Can scale up or scale down your tables’ throughput capacity without downtime or degradation
Uses a partition key to allocate data to different nodes. Can have an optional sort key to store related attributes in a sorted order to be queried as a collection
Also has a primary key, and can use global and local secondary indexes to speed up performance
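A toy model of that layout, with dicts standing in for DynamoDB's actual partitions: the partition key selects the storage node, and the sort key orders related items within it so they can be fetched as a collection.

```python
from collections import defaultdict

# Toy model: the outer dict plays the role of partitions chosen by the
# partition key; the sort key orders related items within a partition.
table = defaultdict(dict)

def put_item(pk: str, sk: str, attrs: dict) -> None:
    table[pk][sk] = attrs

def query(pk: str) -> list:
    """Fetch all items sharing a partition key, ordered by sort key."""
    return [table[pk][sk] for sk in sorted(table[pk])]
```

This is why query patterns must be designed around the key schema: fetching by partition key is cheap, while anything else requires a scan or an index.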
AWS DocumentDB
MongoDB workloads at scale with separate storage and compute that can be scaled independently
Table -> Collection
Row -> Document
Column -> Field
PK -> Object ID
Nested table -> Embedded document
AWS Elasticache
Improves performance by retrieving data from high throughput and low latency in-memory data stores
Provides access to data across replicated nodes
Popular choice for gaming, financial services, healthcare, and IoT
Memcached and Redis cache engines differ based on backup and replication, automatic failover
Redis supports complex data types, data replication and data availability.
AWS Neptune
Fast, reliable managed graph db for apps with highly connected datasets.
Good for applications that work with highly connected data sets used to discover potential fraudulent behavior before it happens.
Used for recommendation engines, fraud detection, drug discovery, and network security
AWS Redshift
Enterprise-level, petabyte-scale, fully managed data warehousing service. Can achieve efficient storage and optimum query performance through a combination of massively parallel processing, columnar data storage, and efficient data compression encoding schemes
Offers 10x faster performance than other solutions.
Serve different purposes than RDBMS. Warehouses are meant to store aggregate values (analytical data)
Structured Data
Organized to support transactional and analytical operations.
Most commonly stored in relational databases but can also be in non-relational.
Can run powerful data queries and analysis
Semistructured Data
More flexible than structured and without the requirement to change the schema for every single record in the table.
Allows user to capture any data in any structure as data evolves and changes over time.
Examples include XML, email, and JSON
Unstructured data
Not organized in any distinguishable or predefined manner
Full of irrelevant info which means data needs to first be processed to perform any kind of meaningful analysis
Examples include text messages, word processing docs, videos, photos, and other images. Files are not organized
Relational DBs
Built to store structured data in tables using defined schema
Key-Value DBs
non-relational that store unstructured data in the form of key-value pairs
+ Store data in a single table as blob objects without predefined schema
+ Flexible and handles a wide variety of data types
+ No need for complex joins
- Difficult to perform analytical queries due to lack of joins
- Access patterns need to be known in advance for optimum performance
Non-relational Document Dbs
Type of non-relational DB that stores semistructured and unstructured data in the form of documents
+ Flexible
+ No need to plan for a specific type of data
+ Easy to scale
- Sacrifice ACID compliance
- Databases cannot query across documents natively
In-Memory
Both structured and unstructured data sources and for apps that require real-time access to data.
+ Support demanding apps requiring ms response times
+ Great for caching
+ Ultrafast and inexpensive access to copies of data
- Not great for rapidly changing data
Graph Dbs
Store any type of data, structured, semi, unstructured
+ Allow simple, fast retrieval of complex hierarchical structures
+ Great for RT big data mining, such as fraud detection
+ Great for making relevant recommendations and allowing rapid querying of these relationships
- Cannot adequately store transactional data
- Not efficient for analytics
OLTP Databases
Focus on recording update, insertion and deletion transactions. Queries are simple and short which require less time and space to process.
OLAP
Store historical data that has been input by OLTP. Can extract information from a large database and analyze it for decision-making. A good example is a business intelligence tool
Describe real-time data analytics architecture with RDS
A stored procedure in RDS is executed for every new row, triggering a Lambda function that passes the event to Kinesis, which stores it in S3. Athena queries in QuickSight can then visualize the data
Aurora vs RDS
Aurora is more durable and resilient than RDS, with very fast recovery from failures.
Aurora has better auto scaling capabilities and can provision up to 15 replicas
AWS DMS
Migrate data from external database to AWS.
Requires source and target DB connection strings and a replication instance (EC2) to run the replication task
Transfer data from S3 into relational db in Aurora
Data Design Relational vs Non-relational
Relational: Normalized or dimensional data warehouse
Non: Denormalized document, wide column, or key-value
Advantages of Non-relational
Much easier to scale horizontally, but has the issue of eventual consistency, which can be a problem for apps that require ACID compliance (data may not be updated at the same time across all distributed nodes)
Homogeneous vs heterogeneous database migrations
Homogeneous: Migrate between the same DB engines; can use native database tools
Heterogeneous: Migrate between different database engines. Requires the AWS Schema Conversion Tool to translate the DB schema to the new platform.
What instance classes does Aurora support
Burstable performance (short-lived bursts of high activity)
and
memory-optimized (suitable for most DBs)
What are the database engines RDS can run
Oracle, PostgreSQL, Aurora, MySQL, MariaDB, and SQL Server
Which instance types are available for RDS
On-demand and reserved
What security features does Aurora provide
Require both authentication and permissions to access tables
IAM policies can be used to assign permissions
Security groups are used to control access to the DB instance
What are the pricing models for Aurora
Serverless, on-demand, reserved
What security options are available for DynamoDb
IAM, fully managed encryption at rest
What are the components of DynamoDB
Attribute, items, table
What are valid capacity modes for DynamoDB
Provisioned and On-demand
Server vs serverless DB architectures benefits/tradeoffs
Server-based: Traditional architecture where the server hosts, delivers, and manages the resources the application uses. You pay for maintenance and are responsible for it, and must pay for additional servers as scale grows
+ Better for predictable workloads, in-depth analysis, or long running computations
+ Full visibility since you own all the infra
+ Good for legacy apps that can’t be decoupled
Serverless: Apps are hosted by 3rd party service so no need to manage server. Provide automatic scaling and higher availability
+ Good for rapid scaling and applications with short running tasks that have a single purpose
+ Liability is reduced
+ Smaller deployable units result in faster delivery
AWS RDS benefits
Automates config, management, and maintenance
Configures read replicas or setup synchronous replication
Automatic backups and encryption at rest and in transit
Can easily scale compute resources
Databases with EC2 benefits
Full control over database deployment
Supervise number of instances per database
Encrypt EBS volumes to protect your data at rest and in transit as data travels between the volume and the instance
Does RDS automatically scale workloads
No
Serverless Databases Benefits (DynamoDB and Aurora serverless)
Highly available, fault tolerant, and scales as demand grows
DynamoDB supports ACID
No hardware provisioning, patching, or upgrading
Encrypts data by default
Aurora serverless provides relational dbs, with on demand and automatically scaling. Shuts down when not in use
What are some use cases for aurora serverless
Variable workloads (Peaks throughout the day)
Unpredictable workloads (Peaks of unpredictable traffic)
New apps with unknown instance size
Multitenant apps where each customer has their own db
What are ideal applications for Redis
Session caching, full page caching, message queue applications, leaderboards
What are applications for memcached
Small and static data, static HTML page or JS pages
What are the node types for elasticache
On-demand (pay by the hour) and
reserved (1 or 3 year term) but high savings
What is the high level architecture of Redshift
Clients work with database via SQL endpoint at the leader node (leader and compute nodes are grouped into a cluster). This node spins up jobs and distributes to compute nodes that contain the actual data. Leader node aggregates data from compute nodes
Compute nodes have their own CPU, memory and disk storage. Jobs are partitioned into slices and allocated compute node resources
What are the pricing options for Redshift
On-demand: Pay an hourly rate based on type and number of nodes
Concurrency scaling: Per second on demand rate that exceeds free daily credits
Reserved Instance: Lots of savings by committing to 1 or 3 yr term
Spectrum pricing: Applied when you use this feature. Bytes scanned on S3
How many copies of your data and across how many availability zones does Aurora provide
6 copies across 3 zones
What are the 3 parts to Aurora billing
Instance hosting the database:
- On-demand, reserved, serverless (based on capacity)
Storage:
- per GB per month
- I/O
Data transferred out to the internet and other AWS regions. Never between services in the same region.
What is Amazon Aurora Global Database
Originally MySQL-only (now also supports the PostgreSQL-compatible edition); allows a single DB to span multiple regions for shorter latencies in each region
Types of EC2 billing
On demand: Pay for compute capacity by the second with no long-term commitments
Spot Instances: Unused EC2 capacity. Can save a lot, but not always available
Reserved: Discounted but need to pay for 1 or 3yr contract
EC2 Families
General Purpose: M4, M5, T2/T3 Burstable
Compute Optimized: C4, C5, C5d
Memory Optimized: R4, R5
EC2 Placement Groups
Clusters: Logical grouping of instances within single AZ. Good for low network latency, high network throughput
Spread: Placed on distinct racks within data center. Good for small number of critical instances that should be kept separate from each other
Partition: Reduce the likelihood of correlated hardware failures for your application. Can be used to deploy large distributed and replicated workloads like HDFS and Cassandra across distinct racks
What are the five types of EBS Volumes
General purpose SSD gp2
Provisioned IOPS SSD io1
Throughput HDD st1
Cold HDD sc1
EBS Magnetic standard
What is the volume size range of a gp (General purpose) EBS volume and the IOPS it can accommodate
1GB to 16TB
16k IOPS
What is the IOPS range for the provisioned SSD EBS Volumes
Up to 64k IOPS (io1/io2); up to 256k with io2 Block Express
This would be for big database workloads
What is the volume size and max throughput volume of a Throughput optimized HDD EBS volume
125GB to 16TB
Throughput is 500 MiB/s
Used for data lakes and data warehouses
What are cold HDD EBS volumes for
Lowest cost block store for infrequently accessed data workloads. Some use cases include file servers and throughput oriented storage for data that is infrequently accessed
Volume size of EBS magnetic
1GB to 1TB. Workloads for infrequently used data
What are some characteristics of Redis
Advanced data structures
Multi AZ capable
Replication
Backup and restore
What are some characteristics of memcached
Simple data structures
No replication
No backups
Multiple nodes
Multi-threaded
What is the throughput performance for Aurora in regards to MySQL and Postgres
5x for MySQL; 3x for Postgres; Can scale out to 15 replicas
What is DynamoDB Accelerator DAX
Highly available cache for DynamoDB. Microsecond latency. Millions of requests per second. API compatible
What is the range of the C4 class of EC2 instances
This is compute optimized instances
Anywhere from 2 to 36 vCPUs
What is the maximum number of vCPUs available for EC2 in the C family
64
What is the main difference going from C4 to C7 EC2 instances
Basically newer-generation processors that may be more cost-effective/higher performing
Can RDS instances encrypt backups
Yes, if you enable encryption at the time of launching an RDS instance
Can RDS instances encrypt logs
Yes, if you enable encryption at the time of launching an RDS instance
Does RDS automatically encrypt data, logs, and backups
Yes once you enable encryption it is done automatically
When deploying a DB cluster with limited use (ie debugging prod issues) what is the most cost-effective option
Aurora Serverless. Not a long running instance.
Do DynamoDB instances exist outside the VPC boundary? What about Aurora
DynamoDb is a regional service and can exist outside the VPC, whereas Aurora must reside within it.