Storage Flashcards
S3
- Bucket must have a globally unique name
- Buckets are defined at the region level
- Naming convention
- No uppercase
- No underscore
- 3-63 characters long
- Not an IP
- Must start with lowercase letter or number
- Object key is its full path
- Max object size is 5TB
- Objects larger than 5GB must use “multi-part upload”
- Strong Consistency
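A minimal boto3 sketch of the basics above; the bucket name and key are hypothetical placeholders:

```python
import boto3

s3 = boto3.client("s3")

# The object key is its full path within the bucket
s3.put_object(
    Bucket="my-example-bucket",        # must be globally unique, lowercase, 3-63 chars
    Key="folder1/folder2/report.txt",  # key = full path
    Body=b"hello world",
)

# Strong consistency: this read reflects the write above
obj = s3.get_object(Bucket="my-example-bucket", Key="folder1/folder2/report.txt")
print(obj["Body"].read())
```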
S3 Standard General Purpose
- Use for frequently accessed data
- Low latency and high throughput
- Sustain 2 concurrent facility failures
Infrequent Access
- Use for data that is less frequently accessed but requires rapid access when needed
- Standard IA
- Use Cases : Disaster Recovery and Backups
- One Zone IA
- Use Cases : Secondary backup
Glacier
- Glacier Instant Retrieval
- millisecond retrieval
- min 90 days storage
- Use Case : Data access once a quarter
- Glacier Flexible Retrieval
- Expedited 1-5 mins
- Standard 3-5 hours
- Bulk 5-12 hours
- min 90 days storage
- Glacier Deep Archive
- Standard 12 hours
- Bulk 48 hours
- min 180 days storage
S3 Intelligent Tiering
- Small monthly monitoring and auto-tiering fee
- Moves objects automatically between Access Tiers based on usage
S3 Moving Between Storage Classes
- For infrequently accessed objects, move them to STANDARD_IA
- For archive objects, move to Glacier or Deep_Archive
- Moving objects can be automated using a lifecycle configuration
S3 Lifecycle Rules
- Transition Rules
- defines when objects are transitioned to another storage class
- Expiration Rules
- Configure objects to expire after some time
- can be used to delete old versions of files if versioning is enabled
- can be used to delete incomplete multi part uploads
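A sketch of a lifecycle configuration with boto3 covering both rule types above; the rule name, prefix, and day counts are illustrative assumptions:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",  # hypothetical rule name
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                # Transition rules: move to STANDARD_IA after 60 days, Glacier after 180
                "Transitions": [
                    {"Days": 60, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                # Expiration rules
                "Expiration": {"Days": 365},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```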
S3 Versioning
- Enable at bucket level
- Overwriting an object with the same key creates a new version
- Use Cases : (1) Protect against unintended deletes (2) Easy rollback
- Any file that is not versioned prior to enabling versioning will have version “null”
- Suspending versioning does not delete the previous versions
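Enabling (and suspending) versioning from boto3, as a sketch:

```python
import boto3

s3 = boto3.client("s3")

# Versioning is enabled at the bucket level
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},  # "Suspended" keeps old versions
)

# List all versions of a key; pre-versioning objects show VersionId "null"
versions = s3.list_object_versions(Bucket="my-example-bucket", Prefix="report.txt")
for v in versions.get("Versions", []):
    print(v["Key"], v["VersionId"])
```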
S3 Replication
- Must enable versioning in source and destination
- Cross Region Replication
- Same Region Replication
- Buckets can be in different accounts
- Copying is asynchronous
- After activation, only new objects are replicated
- Optionally, you can use S3 Batch Replication to replicate existing objects and objects that failed replication
- For DELETE operation
- can replicate delete markers from source to target
- deletions with a version ID are not replicated
- There is no chaining replication
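A sketch of a replication configuration; the IAM role ARN and bucket names are placeholders you would supply, and versioning must already be enabled on both buckets:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder
        "Rules": [
            {
                "ID": "replicate-all",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter = all objects
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
                # Optionally replicate delete markers from source to target
                "DeleteMarkerReplication": {"Status": "Enabled"},
            }
        ],
    },
)
```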
S3 Performance
- Durability 99.999999999%
- Availability 99.99%
- 100-200ms latency
- 3,500 PUT/COPY/POST/DELETE per second per prefix in a bucket
- 5,500 GET/HEAD per second per prefix in a bucket
- Multi Part Upload
- Recommended for files > 100 MB
- Required for files > 5GB
- S3 Transfer Acceleration
- Increases transfer speed by transferring the file to an AWS edge location, which forwards the data to the S3 bucket in the target region
- Compatible with multi-part upload
- S3 Byte-Range Fetches
- Parallelize GETs by requesting specific byte ranges
- Better resilience in case of failures
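A sketch of multi-part upload (boto3's transfer manager handles the parts automatically above a size threshold) and a byte-range fetch:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Multi-part upload kicks in above the threshold (recommended for files > 100MB)
config = TransferConfig(multipart_threshold=100 * 1024 * 1024)
s3.upload_file("big-file.bin", "my-example-bucket", "big-file.bin", Config=config)

# Byte-range fetch: parallelize GETs by requesting specific byte ranges
part = s3.get_object(
    Bucket="my-example-bucket",
    Key="big-file.bin",
    Range="bytes=0-10485759",  # first 10MB only
)
chunk = part["Body"].read()
```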
S3 KMS
- SSE-KMS is impacted by KMS limits (API call quotas)
- On upload, S3 calls the GenerateDataKey KMS API
- On download, S3 calls the Decrypt KMS API
SSE-S3
- “x-amz-server-side-encryption” : “AES256”
SSE-KMS
- Pros : User Control + Audit Trail
- “x-amz-server-side-encryption” : “aws:kms”
SSE-C
- HTTPS must be used
- Encryption key must be provided in HTTP header
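How the three SSE modes map to boto3 parameters (boto3 sets the x-amz-server-side-encryption headers and uses HTTPS for you); the key alias and customer key are placeholders:

```python
import os
import boto3

s3 = boto3.client("s3")

# SSE-S3: sends "x-amz-server-side-encryption: AES256"
s3.put_object(Bucket="my-example-bucket", Key="a.txt", Body=b"data",
              ServerSideEncryption="AES256")

# SSE-KMS: sends "x-amz-server-side-encryption: aws:kms"
s3.put_object(Bucket="my-example-bucket", Key="b.txt", Body=b"data",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-key")  # placeholder key alias

# SSE-C: customer-provided key passed in HTTP headers, HTTPS required
customer_key = os.urandom(32)  # 256-bit key you manage yourself
s3.put_object(Bucket="my-example-bucket", Key="c.txt", Body=b"data",
              SSECustomerAlgorithm="AES256",
              SSECustomerKey=customer_key)
```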
S3 Bucket Settings for Block Public Access
- Block public access to buckets and objects
- Block public and cross-account access to buckets and objects through any public bucket or access point policies
S3 Event Notification with Amazon EventBridge
- Advanced Filtering options with JSON rules (metadata, object size, name, …)
- Multiple Destinations → Step Functions, Kinesis Streams/Firehose
- EventBridge Capabilities → Archive, Replay Events, Reliable delivery
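A sketch of an EventBridge rule matching S3 object-created events with the JSON filtering mentioned above; the rule name, bucket, and size threshold are assumptions:

```python
import json
import boto3

events = boto3.client("events")

# Match "Object Created" events from one bucket, for objects larger than 1MB
events.put_rule(
    Name="s3-large-object-created",  # hypothetical rule name
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": ["my-example-bucket"]},
            "object": {"size": [{"numeric": [">", 1048576]}]},
        },
    }),
)
```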
NoSQL
- Distributed
- Limited support for query joins
- NoSQL databases scale horizontally
DynamoDB
- Fully managed
- Highly available with replication across multiple AZs
- Scales to massive workloads as a distributed database
- Millions of requests per second
- 100s of TB of storage
- Standard and Infrequent Access Table Class
DynamoDB Basics
- DynamoDB is made of Tables
- Each table has a primary key
- Each table can have an infinite number of items
- Each item has attributes
- max size of an item is 400KB
- 3 Data Types (Number, String, Binary in both scalar and multi-valued sets)
- Can store documents such as JSON, XML, or HTML
DynamoDB Primary Key
- Choose the column with the highest cardinality
- Partition Key (HASH)
- partition key must be unique for each item
- partition key must be diverse so that the data is distributed
- Partition Key+Sort Key (HASH+RANGE)
- Combination must be unique for each item
- Data is grouped by partition key
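Creating a table with a composite primary key (HASH+RANGE), as a minimal sketch; the table and attribute names are illustrative:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="GameScores",  # hypothetical table
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "game_id", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",  # On-Demand Mode
)
```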
DynamoDB Read Write Capacity Mode
- Provisioned Mode
- Throughput can be exceeded temporarily using “Burst Capacity”
- If Burst Capacity has been consumed, you will get a “ProvisionedThroughputExceeded” Exception
- On-Demand Mode
- You can switch between the two modes once every 24 hours
DynamoDB Write Capacity Unit
- 1 WCU represents 1 write per second for an item up to 1KB in size
DynamoDB Strongly Consistent Read vs Eventually Consistent Read
- SCR
- If we read after a write, we will get the correct data
- Set ConsistentRead to True
- Consumes twice the RCU
- ECR
- If we read just after a write, it is possible to get some stale data because of replication
Read Capacity Unit
- 1 RCU represents 1 Strongly Consistent Read per second
- 1 RCU represents 2 Eventually Consistent Read per second
- for an item up to 4KB
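A worked example of the capacity math above (item sizes round up to the next 1KB for writes and 4KB for reads; eventually consistent reads cost half):

```python
import math

def wcu(writes_per_sec, item_kb):
    # 1 WCU = 1 write/sec of an item up to 1KB
    return writes_per_sec * math.ceil(item_kb / 1)

def rcu(reads_per_sec, item_kb, strongly_consistent=True):
    # 1 RCU = 1 strongly consistent read/sec (or 2 eventually consistent) up to 4KB
    units = reads_per_sec * math.ceil(item_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

print(wcu(6, 4.5))                            # 6 * ceil(4.5) = 30 WCU
print(rcu(10, 6))                             # 10 * ceil(6/4) = 20 RCU
print(rcu(10, 6, strongly_consistent=False))  # half of 20 = 10 RCU
```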
DynamoDB Throttling
- If we exceed provisioned WCU or RCU, we get “ProvisionedThroughputExceeded” Exception
- Solution
- Distribute the partition key as much as possible
- Use DynamoDB Accelerator (DAX)
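boto3 already retries throttled calls with exponential backoff; a sketch of handling the exception explicitly (the table name is a placeholder):

```python
import time
import boto3

table = boto3.resource("dynamodb").Table("GameScores")  # hypothetical table
client = table.meta.client

for attempt in range(5):
    try:
        table.put_item(Item={"user_id": "u1", "game_id": "g1", "score": 42})
        break
    except client.exceptions.ProvisionedThroughputExceededException:
        time.sleep(0.1 * 2 ** attempt)  # exponential backoff before retrying
```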
DynamoDB Writing Data
- PutItem
- Creates a new item or fully replace an old item
- UpdateItem
- Edits an existing item's attributes or adds a new item if it does not exist
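PutItem vs UpdateItem with boto3's resource API, as a sketch:

```python
import boto3

table = boto3.resource("dynamodb").Table("GameScores")

# PutItem: creates a new item or fully replaces an existing one
table.put_item(Item={"user_id": "u1", "game_id": "g1", "score": 42})

# UpdateItem: edits individual attributes without replacing the whole item
table.update_item(
    Key={"user_id": "u1", "game_id": "g1"},
    UpdateExpression="SET score = :s",
    ExpressionAttributeValues={":s": 50},
)
```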
DynamoDB Reading Data
- GetItem
- Read based on Primary Key
- Primary Key can be HASH or HASH+RANGE
- Eventually Consistent Read by default (option for Strongly Consistent Read)
- Scan
- Scans the entire table, then filters out data
- Returns up to 1MB of data per call (use pagination to keep reading)
- Can use Parallel Scan for faster performance
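GetItem and Scan as a sketch; note that a scan filter is applied after the table is read, so RCUs are consumed for everything scanned:

```python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("GameScores")

# GetItem: read a single item by primary key
resp = table.get_item(
    Key={"user_id": "u1", "game_id": "g1"},
    ConsistentRead=True,  # opt in to a Strongly Consistent Read
)
item = resp.get("Item")

# Scan: reads the entire table, then filters
resp = table.scan(FilterExpression=Attr("score").gt(40))
print(resp["Items"])
```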
DynamoDB Batch Operation
- Save latency by reducing the number of API calls
- Operations are done in parallel for better efficiency
- BatchWriteItem
- Up to 25 PutItem or DeleteItem in one call
- Up to 16MB of data written and up to 400KB per item
- Cannot update items
- BatchGetItem
- Return items from one or more tables
- Up to 100 items and up to 16MB of data total
- Items are retrieved in parallel to minimize latency
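boto3's batch_writer wraps BatchWriteItem (25 items per call) and retries unprocessed items for you; a sketch:

```python
import boto3

table = boto3.resource("dynamodb").Table("GameScores")

# Writes are buffered into BatchWriteItem calls of up to 25 items each
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={"user_id": f"u{i}", "game_id": "g1", "score": i})
```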
DynamoDB Local Secondary Index
- Alternative Sort Key
- Up to 5 Local Secondary Indexes per table
- Must be defined at table creation time
DynamoDB Global Secondary Index
- Alternative Primary Key (HASH or HASH+RANGE) from base table
- Speed up queries on non-key attributes
- Must provision RCUs and WCUs for the index
- Can be added / modified after table creation
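Querying through a GSI just means passing IndexName; the index name here is a hypothetical one you would have created on the table:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("GameScores")

# Query a non-key attribute of the base table via a GSI keyed on game_id
resp = table.query(
    IndexName="GameIdIndex",  # hypothetical GSI
    KeyConditionExpression=Key("game_id").eq("g1"),
)
print(resp["Items"])
```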
DynamoDB Indexes and Throttling
- GSI
- If writes are throttled on GSI, then the main table will be throttled
- LSI
- Use the WCUs and RCUs of the main table
- No special throttling considerations
DynamoDB PartiQL
- Use a SQL-like syntax to manipulate DynamoDB tables
- Supports some INSERT, UPDATE, SELECT, and DELETE statements
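PartiQL via the ExecuteStatement API, as a sketch:

```python
import boto3

client = boto3.client("dynamodb")

resp = client.execute_statement(
    Statement='SELECT * FROM "GameScores" WHERE user_id = ?',
    Parameters=[{"S": "u1"}],
)
print(resp["Items"])
```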
DynamoDB Accelerator
- Fully managed in-memory cache for DynamoDB
- Microseconds latency for cached reads and queries
- Does not require application logic modification (compatible with existing DynamoDB APIs)
- 5 minutes TTL for cache by default
- Up to 10 nodes in the cluster
- Multi-AZ
DynamoDB Streams
- Ordered stream of item-level modifications (create/update/delete) in a table
- Stream records can be sent to Kinesis Data Streams
- Retention up to 24 hours
- Ability to choose the information that will be written to the stream
- KEYS_ONLY : Only the key attributes of the modified item
- NEW_IMAGE : the entire new item
- OLD_IMAGE : the entire old item
- NEW_AND_OLD_IMAGES : both the new and the old images of the item
- DynamoDB Streams are made of shards just like Kinesis Data Streams
- Records are not retroactively populated in a stream after enabling it
DynamoDB Streams and AWS Lambda
- We need to define an Event Source Mapping to read from a DynamoDB Stream
- We need to ensure the Lambda function has the appropriate permissions
- Lambda function is invoked synchronously
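A minimal Lambda handler for a DynamoDB Streams event source mapping; which images are present depends on the stream view type chosen above:

```python
def lambda_handler(event, context):
    # Each invocation receives a batch of stream records
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            # Present when the stream view type includes NEW_IMAGE
            new_image = record["dynamodb"].get("NewImage", {})
            print("new item:", new_image)
        elif record["eventName"] == "REMOVE":
            keys = record["dynamodb"]["Keys"]
            print("deleted item keys:", keys)
```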
DynamoDB Time To Live
- Automatically delete items after an expiry timestamp
- Does not consume WCUs
- TTL attribute must be a “Number” data type with “Unix Epoch timestamp” value
- Expired items deleted within 48 hours of expiration
- Expired items that have not been deleted will appear in reads/queries/scans
- Expired items are deleted from both LSIs and GSIs
- A delete operation for each expired item enters the DynamoDB Streams
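Enabling TTL and writing an expiring item, as a sketch; the attribute name is an assumption:

```python
import time
import boto3

client = boto3.client("dynamodb")
table = boto3.resource("dynamodb").Table("GameScores")

# Point TTL at a Number attribute holding a Unix epoch timestamp
client.update_time_to_live(
    TableName="GameScores",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# This item becomes eligible for deletion one hour from now
table.put_item(Item={
    "user_id": "u1",
    "game_id": "g1",
    "expires_at": int(time.time()) + 3600,
})
```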
DynamoDB Security
- Security
- VPC Endpoints to access DynamoDB without the Internet
- Encryption at rest using KMS
- Encryption in flight using SSL/TLS
- Backup
- Point-in-time recovery like RDS
AWS ElastiCache
- Managed Redis or Memcached
- Caches are in-memory databases with high performance and low latency
- Helps reduce load on databases for read-intensive workloads
- Multi AZ with Failover Capability
- AWS takes care of OS maintenance / patching, optimizations, setup etc
Redis
- In-memory key-value store
- Super low latency (sub ms)
- Cache can survive reboots (data durability using AOF persistence)
- Multi AZ with Automatic Failover for disaster recovery
- Support for Read Replicas
- Use Cases : Gaming, Relieve pressure on databases, etc
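A cache-aside sketch with the redis-py client; the endpoint and the database call are stand-ins (in practice you would use your ElastiCache Redis endpoint):

```python
import json
import redis

# Placeholder endpoint; replace with the ElastiCache Redis primary endpoint
r = redis.Redis(host="localhost", port=6379)

def fetch_from_database(user_id):
    # Stand-in for a real (slow) database query
    return {"id": user_id, "name": "example"}

def get_user(user_id):
    # Cache-aside: check the cache first, fall back to the database
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    user = fetch_from_database(user_id)
    r.setex(f"user:{user_id}", 300, json.dumps(user))  # cache for 5 minutes
    return user
```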
Memcached
- Memcached is an in-memory object store
- Cache does not survive reboots
- Overall, Redis is better