Storage Flashcards
S3
- Bucket must have a globally unique name
- Buckets are defined at the region level
- Naming convention
- No uppercase
- No underscore
- 3-63 characters long
- Not an IP
- Must start with lowercase letter or number
- Object key is its full path
- Max object size is 5TB
- Objects larger than 5GB must use “multi-part upload”
- Strong Consistency
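A minimal boto3 sketch of the basics above; the bucket name and key are hypothetical placeholders:

```python
import boto3

s3 = boto3.client("s3")

# The object key is its full path within the bucket
s3.put_object(
    Bucket="my-example-bucket",        # must be globally unique, lowercase, 3-63 chars
    Key="folder1/folder2/report.txt",  # key = full path
    Body=b"hello world",
)

# Strong consistency: this read reflects the write above
obj = s3.get_object(Bucket="my-example-bucket", Key="folder1/folder2/report.txt")
print(obj["Body"].read())
```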
S3 Standard General Purpose
- Use for frequently accessed data
- Low latency and high throughput
- Sustain 2 concurrent facility failures
Infrequent Access
- Use for data that is less frequently accessed but requires rapid access when needed
- Standard IA
- Use Cases : Disaster Recovery and Backups
- One Zone IA
- Use Cases : Secondary backup
Glacier
- Glacier Instant Retrieval
- millisecond retrieval
- min 90 days storage
- Use Case : Data access once a quarter
- Glacier Flexible Retrieval
- Expedited 1-5 mins
- Standard 3-5 hours
- Bulk 5-12 hours
- min 90 days storage
- Glacier Deep Archive
- Standard 12 hours
- Bulk 48 hours
- min 180 days storage
S3 Intelligent Tiering
- Small monthly monitoring and auto-tiering fee
- Moves objects automatically between Access Tiers based on usage
S3 Moving Between Storage Classes
- For infrequently accessed objects, move them to STANDARD_IA
- For archive objects, move to Glacier or Deep_Archive
- Moving objects can be automated using a lifecycle configuration
S3 Lifecycle Rules
- Transition Rules
- defines when objects are transitioned to another storage class
- Expiration Rules
- Configure objects to expire after some time
- can be used to delete old versions of files if versioning is enabled
- can be used to delete incomplete multi part uploads
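A sketch of a lifecycle configuration with boto3 covering both rule types above; the rule name, prefix, and day counts are illustrative assumptions:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",  # hypothetical rule name
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                # Transition rules: move to STANDARD_IA after 60 days, Glacier after 180
                "Transitions": [
                    {"Days": 60, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                # Expiration rules
                "Expiration": {"Days": 365},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```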
S3 Versioning
- Enable at bucket level
- Overwriting an object with the same key creates a new version
- Use Cases : (1) Protect against unintended deletes (2) Easy rollback
- Any file that is not versioned prior to enabling versioning will have version “null”
- Suspending versioning does not delete the previous versions
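Enabling (and suspending) versioning from boto3, as a sketch:

```python
import boto3

s3 = boto3.client("s3")

# Versioning is enabled at the bucket level
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},  # "Suspended" keeps old versions
)

# List all versions of a key; pre-versioning objects show VersionId "null"
versions = s3.list_object_versions(Bucket="my-example-bucket", Prefix="report.txt")
for v in versions.get("Versions", []):
    print(v["Key"], v["VersionId"])
```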
S3 Replication
- Must enable versioning in source and destination
- Cross Region Replication
- Same Region Replication
- Buckets can be in different accounts
- Copying is asynchronous
- After activation, only new objects are replicated
- Optionally, you can use S3 Batch Replication to replicate existing objects and objects that failed replication
- For DELETE operation
- can replicate delete markers from source to target
- deletions with a version ID are not replicated
- There is no chaining replication
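A sketch of a replication configuration; the IAM role ARN and bucket names are placeholders you would supply, and versioning must already be enabled on both buckets:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder
        "Rules": [
            {
                "ID": "replicate-all",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter = all objects
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
                # Optionally replicate delete markers from source to target
                "DeleteMarkerReplication": {"Status": "Enabled"},
            }
        ],
    },
)
```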
S3 Performance
- Durability 99.999999999%
- Availability 99.99%
- 100-200ms latency
- 3,500 PUT/COPY/POST/DELETE per second per prefix in a bucket
- 5,500 GET/HEAD per second per prefix in a bucket
- Multi Part Upload
- Recommended for files > 100 MB
- Required for files > 5GB
- S3 Transfer Acceleration
- Increases transfer speed by transferring the file to an AWS edge location, which forwards the data to the S3 bucket in the target region
- Compatible with multi-part upload
- S3 Byte-Range Fetches
- Parallelize GETs by requesting specific byte ranges
- Better resilience in case of failures
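A sketch of multi-part upload (boto3's transfer manager handles the parts automatically above a size threshold) and a byte-range fetch:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Multi-part upload kicks in above the threshold (recommended for files > 100MB)
config = TransferConfig(multipart_threshold=100 * 1024 * 1024)
s3.upload_file("big-file.bin", "my-example-bucket", "big-file.bin", Config=config)

# Byte-range fetch: parallelize GETs by requesting specific byte ranges
part = s3.get_object(
    Bucket="my-example-bucket",
    Key="big-file.bin",
    Range="bytes=0-10485759",  # first 10MB only
)
chunk = part["Body"].read()
```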
S3 KMS
- SSE-KMS is impacted by KMS limits (API call quotas)
- On upload, S3 calls the GenerateDataKey KMS API
- On download, S3 calls the Decrypt KMS API
SSE-S3
- “x-amz-server-side-encryption” : “AES256”
SSE-KMS
- Pros : User Control + Audit Trail
- “x-amz-server-side-encryption” : “aws:kms”
SSE-C
- HTTPS must be used
- Encryption key must be provided in HTTP header
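How the three SSE modes map to boto3 parameters (boto3 sets the x-amz-server-side-encryption headers and uses HTTPS for you); the key alias and customer key are placeholders:

```python
import os
import boto3

s3 = boto3.client("s3")

# SSE-S3: sends "x-amz-server-side-encryption: AES256"
s3.put_object(Bucket="my-example-bucket", Key="a.txt", Body=b"data",
              ServerSideEncryption="AES256")

# SSE-KMS: sends "x-amz-server-side-encryption: aws:kms"
s3.put_object(Bucket="my-example-bucket", Key="b.txt", Body=b"data",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-key")  # placeholder key alias

# SSE-C: customer-provided key passed in HTTP headers, HTTPS required
customer_key = os.urandom(32)  # 256-bit key you manage yourself
s3.put_object(Bucket="my-example-bucket", Key="c.txt", Body=b"data",
              SSECustomerAlgorithm="AES256",
              SSECustomerKey=customer_key)
```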
S3 Bucket Settings for Block Public Access
- Block public access to buckets and objects
- Block public and cross-account access to buckets and objects through any public bucket or access point policies
S3 Event Notification with Amazon EventBridge
- Advanced Filtering options with JSON rules (metadata, object size, name, …)
- Multiple Destinations → Step Functions, Kinesis Streams/Firehose
- EventBridge Capabilities → Archive, Replay Events, Reliable delivery
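A sketch of an EventBridge rule matching S3 object-created events with the JSON filtering mentioned above; the rule name, bucket, and size threshold are assumptions:

```python
import json
import boto3

events = boto3.client("events")

# Match "Object Created" events from one bucket, for objects larger than 1MB
events.put_rule(
    Name="s3-large-object-created",  # hypothetical rule name
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": ["my-example-bucket"]},
            "object": {"size": [{"numeric": [">", 1048576]}]},
        },
    }),
)
```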
NoSQL
- Distributed
- Limited support for query joins
- NoSQL databases scale horizontally
DynamoDB
- Fully managed
- Highly available with replication across multiple AZs
- Scales to massive workloads as a distributed database
- Millions of requests per second
- 100s of TB of storage
- Standard and Infrequent Access Table Class
DynamoDB Basics
- DynamoDB is made of Tables
- Each table has a primary key
- Each table can have an infinite number of items
- Each item has attributes
- max size of an item is 400KB
- 3 Data Types (Number, String, Binary in both scalar and multi-valued sets)
- Can store documents such as JSON, XML, or HTML
DynamoDB Primary Key
- Choose the column with the highest cardinality
- Partition Key (HASH)
- partition key must be unique for each item
- partition key must be diverse so that the data is distributed
- Partition Key+Sort Key (HASH+RANGE)
- Combination must be unique for each item
- Data is grouped by partition key
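Creating a table with a composite primary key (HASH+RANGE), as a minimal sketch; the table and attribute names are illustrative:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="GameScores",  # hypothetical table
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "game_id", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",  # On-Demand Mode
)
```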
DynamoDB Read Write Capacity Mode
- Provisioned Mode
- Throughput can be exceeded temporarily using “Burst Capacity”
- If Burst Capacity has been consumed, you will get a “ProvisionedThroughputExceeded” Exception
- On-Demand Mode
- You can switch between the two modes once every 24 hours
DynamoDB Write Capacity Unit
- 1 WCU represents 1 write per second for an item up to 1KB in size
DynamoDB Strongly Consistent Read vs Eventually Consistent Read
- SCR
- If we read after a write, we will get the correct data
- Set ConsistentRead to True
- Consumes twice the RCU
- ECR
- If we read just after a write, it is possible to get some stale data because of replication
Read Capacity Unit
- 1 RCU represents 1 Strongly Consistent Read per second
- 1 RCU represents 2 Eventually Consistent Read per second
- for an item up to 4KB
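A worked example of the capacity math above (item sizes round up to the next 1KB for writes and 4KB for reads; eventually consistent reads cost half):

```python
import math

def wcu(writes_per_sec, item_kb):
    # 1 WCU = 1 write/sec of an item up to 1KB
    return writes_per_sec * math.ceil(item_kb / 1)

def rcu(reads_per_sec, item_kb, strongly_consistent=True):
    # 1 RCU = 1 strongly consistent read/sec (or 2 eventually consistent) up to 4KB
    units = reads_per_sec * math.ceil(item_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

print(wcu(6, 4.5))                            # 6 * ceil(4.5) = 30 WCU
print(rcu(10, 6))                             # 10 * ceil(6/4) = 20 RCU
print(rcu(10, 6, strongly_consistent=False))  # half of 20 = 10 RCU
```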
DynamoDB Throttling
- If we exceed provisioned WCU or RCU, we get “ProvisionedThroughputExceeded” Exception
- Solution
- Distribute the partition key as much as possible
- Use DynamoDB Accelerator (DAX)
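boto3 already retries throttled calls with exponential backoff; a sketch of handling the exception explicitly (the table name is a placeholder):

```python
import time
import boto3

table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table
client = table.meta.client

for attempt in range(5):
    try:
        table.put_item(Item={"user_id": "u1", "game_id": "g1", "score": 42})
        break
    except client.exceptions.ProvisionedThroughputExceededException:
        time.sleep(0.1 * 2 ** attempt)  # exponential backoff before retrying
```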
DynamoDB Writing Data
- PutItem
- Creates a new item or fully replace an old item
- UpdateItem
- Edits an existing item's attributes or adds a new item if it does not exist
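PutItem vs UpdateItem with boto3's resource API, as a sketch:

```python
import boto3

table = boto3.resource("dynamodb").Table("GameScores")

# PutItem: creates a new item or fully replaces an existing one
table.put_item(Item={"user_id": "u1", "game_id": "g1", "score": 42})

# UpdateItem: edits individual attributes without replacing the whole item
table.update_item(
    Key={"user_id": "u1", "game_id": "g1"},
    UpdateExpression="SET score = :s",
    ExpressionAttributeValues={":s": 50},
)
```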
DynamoDB Reading Data
- GetItem
- Read based on Primary Key
- Primary Key can be HASH or HASH+RANGE
- Eventually Consistent Read by default (option for Strongly Consistent Read)
- Scan
- Scans the entire table, then filters out data
- Returns up to 1MB of data per call (use pagination to keep reading)
- Can use Parallel Scan for faster performance
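GetItem and Scan as a sketch; note that a scan filter is applied after the table is read, so RCUs are consumed for everything scanned:

```python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("GameScores")

# GetItem: read a single item by primary key
resp = table.get_item(
    Key={"user_id": "u1", "game_id": "g1"},
    ConsistentRead=True,  # opt in to a Strongly Consistent Read
)
item = resp.get("Item")

# Scan: reads the entire table, then filters
resp = table.scan(FilterExpression=Attr("score").gt(40))
print(resp["Items"])
```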
DynamoDB Batch Operation
- Save latency by reducing the number of API calls
- Operations are done in parallel for better efficiency
- BatchWriteItem
- Up to 25 PutItem or DeleteItem in one call
- Up to 16MB of data written and up to 400KB per item
- Cannot update items
- BatchGetItem
- Return items from one or more tables
- Up to 100 items and up to 16MB of data total
- Items are retrieved in parallel to minimize latency
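boto3's batch_writer wraps BatchWriteItem (25 items per call) and retries unprocessed items for you; a sketch:

```python
import boto3

table = boto3.resource("dynamodb").Table("GameScores")

# Writes are buffered into BatchWriteItem calls of up to 25 items each
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={"user_id": f"u{i}", "game_id": "g1", "score": i})
```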
DynamoDB Local Secondary Index
- Alternative Sort Key
- Up to 5 Local Secondary Indexes per table
- Must be defined at table creation time
DynamoDB Global Secondary Index
- Alternative Primary Key (HASH or HASH+RANGE) from base table
- Speed up queries on non-key attributes
- Must provision RCUs and WCUs for the index
- Can be added / modified after table creation
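Querying through a GSI just means passing IndexName; the index name here is a hypothetical one you would have created on the table:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("GameScores")

# Query a non-key attribute of the base table via a GSI keyed on game_id
resp = table.query(
    IndexName="GameIdIndex",  # hypothetical GSI
    KeyConditionExpression=Key("game_id").eq("g1"),
)
print(resp["Items"])
```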
DynamoDB Indexes and Throttling
- GSI
- If writes are throttled on GSI, then the main table will be throttled
- LSI
- Use the WCUs and RCUs of the main table
- No special throttling considerations
DynamoDB PartiQL
- Use a SQL-like syntax to manipulate DynamoDB tables
- Supports some INSERT, UPDATE, SELECT, and DELETE statements
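PartiQL via the ExecuteStatement API, as a sketch:

```python
import boto3

client = boto3.client("dynamodb")

resp = client.execute_statement(
    Statement='SELECT * FROM "GameScores" WHERE user_id = ?',
    Parameters=[{"S": "u1"}],
)
print(resp["Items"])
```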
DynamoDB Accelerator
- Fully managed in-memory cache for DynamoDB
- Microseconds latency for cached reads and queries
- Does not require application logic modification (compatible with existing DynamoDB APIs)
- 5 minutes TTL for cache by default
- Up to 10 nodes in the cluster
- Multi-AZ
DynamoDB Streams
- Ordered stream of item-level modifications (create/update/delete) in a table
- Stream records can be sent to Kinesis Data Streams
- Retention up to 24 hours
- Ability to choose the information that will be written to the stream
- KEYS_ONLY : Only the key attributes of the modified item
- NEW_IMAGE : the entire new item
- OLD_IMAGE : the entire old item
- NEW_AND_OLD_IMAGES : both the new and the old images of the item
- DynamoDB Streams are made of shards just like Kinesis Data Streams
- Records are not retroactively populated in a stream after enabling it
DynamoDB Streams and AWS Lambda
- We need to define an Event Source Mapping to read from a DynamoDB Stream
- We need to ensure the Lambda function has the appropriate permissions
- Lambda function is invoked synchronously
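A minimal Lambda handler for a DynamoDB Streams event source mapping; which images are present depends on the stream view type chosen above:

```python
def lambda_handler(event, context):
    # Each invocation receives a batch of stream records
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            # Present when the stream view type includes NEW_IMAGE
            new_image = record["dynamodb"].get("NewImage", {})
            print("new item:", new_image)
        elif record["eventName"] == "REMOVE":
            keys = record["dynamodb"]["Keys"]
            print("deleted item keys:", keys)
```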
DynamoDB Time To Live
- Automatically delete items after an expiry timestamp
- Does not consume WCUs
- TTL attribute must be a “Number” data type with “Unix Epoch timestamp” value
- Expired items deleted within 48 hours of expiration
- Expired items that have not been deleted will appear in reads/queries/scans
- Expired items are deleted from both LSIs and GSIs
- A delete operation for each expired item enters the DynamoDB Streams
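Enabling TTL and writing an expiring item, as a sketch; the attribute name is an assumption:

```python
import time
import boto3

client = boto3.client("dynamodb")
table = boto3.resource("dynamodb").Table("GameScores")

# Point TTL at a Number attribute holding a Unix epoch timestamp
client.update_time_to_live(
    TableName="GameScores",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# This item becomes eligible for deletion one hour from now
table.put_item(Item={
    "user_id": "u1",
    "game_id": "g1",
    "expires_at": int(time.time()) + 3600,
})
```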
DynamoDB Security
- Security
- VPC Endpoints to access DynamoDB without the Internet
- Encryption at rest using KMS
- Encryption in flight using SSL/TLS
- Backup
- Point-in-time recovery like RDS
AWS ElastiCache
- Managed Redis or Memcached
- Caches are in-memory databases with high performance and low latency
- Helps reduce load on databases for read-intensive workloads
- Multi AZ with Failover Capability
- AWS takes care of OS maintenance / patching, optimizations, setup etc
Redis
- In-memory key-value store
- Super low latency (sub ms)
- Cache can survive reboots (data durability using AOF persistence)
- Multi AZ with Automatic Failover for disaster recovery
- Support for Read Replicas
- Use Cases : Gaming, Relieve pressure on databases, etc
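A cache-aside sketch with the redis-py client; the endpoint and the database call are stand-ins (in practice you would use your ElastiCache Redis endpoint):

```python
import json
import redis

# Placeholder endpoint; replace with the ElastiCache Redis primary endpoint
r = redis.Redis(host="localhost", port=6379)

def fetch_from_database(user_id):
    # Stand-in for a real (slow) database query
    return {"id": user_id, "name": "example"}

def get_user(user_id):
    # Cache-aside: check the cache first, fall back to the database
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    user = fetch_from_database(user_id)
    r.setex(f"user:{user_id}", 300, json.dumps(user))  # cache for 5 minutes
    return user
```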
Memcached
- Memcached is an in-memory object store
- Cache does not survive reboots
- Overall, Redis is better