Storage Flashcards
EBS
- Network drive you attach to ONE instance only.
- Tied to a specific AZ; the only way to move a volume is to snapshot it and restore the snapshot in the target AZ
- Volumes can be resized
- Best performance when EBS and instance type are well matched
EBS Volume Types
- gp2: General Purpose SSD (cheap)
  - 3 IOPS/GB, min 100 IOPS, burst to 3000 IOPS, max 16000 IOPS
  - 1GB - 16TB. +1 TB = +3000 IOPS
- io1: Provisioned IOPS SSD (expensive)
  - Min 100 IOPS, max 64000 IOPS (Nitro) or 32000 (other)
  - 4GB - 16TB, size of volume and IOPS are independent
  - For databases
- st1: Throughput Optimized HDD
  - 500GB - 16TB, 500 MB/s throughput
  - For data analytics
- sc1: Cold HDD
  - 250GB - 16TB, 250 MB/s throughput
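The gp2 arithmetic above can be sketched as a small calculator. This is only an illustration of the numbers in these notes (3 IOPS/GB, floor of 100, cap of 16000, burst to 3000); the function names are made up for this example, and sizes are treated as whole GB with 16TB taken as 16 * 1024 GB, which is an assumption.

```python
MAX_GP2_SIZE_GB = 16 * 1024  # assumption: 16TB expressed in GB


def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS for a gp2 volume, using the figures from the notes."""
    if not 1 <= size_gb <= MAX_GP2_SIZE_GB:
        raise ValueError("gp2 volumes range from 1GB to 16TB")
    # 3 IOPS per GB, never below 100 and never above 16000.
    return min(max(100, 3 * size_gb), 16_000)


def gp2_can_burst(size_gb: int) -> bool:
    """Bursting to 3000 IOPS only helps while the baseline is below 3000."""
    return gp2_baseline_iops(size_gb) < 3_000


for size in (20, 500, 1_000, 6_000):
    print(size, gp2_baseline_iops(size), gp2_can_burst(size))
```

A 1000GB volume already has a 3000 IOPS baseline, which is why the notes say "+1 TB = +3000 IOPS" and why bursting stops mattering from that size up.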
EBS RAID Configurations
- RAID 0, striped - faster, but if one volume fails all data is lost
- RAID 1, mirrored - same speed, no data loss if one volume fails
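The difference between the two layouts can be sketched with plain Python lists standing in for volumes. This is a toy model, not how an OS RAID driver works; all function names are invented for the illustration, and a failed volume is represented by None.

```python
def raid0_write(blocks, n_volumes=2):
    """RAID 0: stripe blocks round-robin across the volumes."""
    volumes = [[] for _ in range(n_volumes)]
    for i, block in enumerate(blocks):
        volumes[i % n_volumes].append(block)
    return volumes


def raid1_write(blocks, n_volumes=2):
    """RAID 1: every volume holds a full copy of the data."""
    return [list(blocks) for _ in range(n_volumes)]


def raid0_read(volumes):
    """Rebuild striped data; one failed volume (None) makes it unreadable."""
    if any(v is None for v in volumes):
        raise IOError("RAID 0 array lost: a stripe member failed")
    out = []
    for i in range(max(len(v) for v in volumes)):
        for v in volumes:
            if i < len(v):
                out.append(v[i])
    return out


def raid1_read(volumes):
    """Read from the first surviving mirror."""
    for v in volumes:
        if v is not None:
            return list(v)
    raise IOError("RAID 1 array lost: every mirror failed")


data = list("abcde")
mirrored = raid1_write(data)
mirrored[0] = None            # simulate one failed volume
print(raid1_read(mirrored))   # the surviving mirror still has everything
```

With RAID 0 each volume holds only part of every file, so the surviving volume alone is not usable.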
EBS Snapshots
- Incremental
- Snapshots consume I/O, so don't take them while the application is handling heavy traffic
- Stored in S3 (not directly visible)
- Not necessary to detach volume to snapshot, but recommended
- Can copy across regions for DR
- Can create AMI from snapshot
- EBS volumes restored from snapshots need to be pre-warmed (use fio or dd to read the entire volume)
- Can be automated using Amazon Data Lifecycle Manager
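Pre-warming just means reading every block of the restored volume once. The notes name fio and dd for this; the sketch below shows the same idea in Python purely as an illustration. The function name is made up, and the demo reads a temporary file rather than a real device (a path like /dev/xvdf would be an assumption about your instance).

```python
import os
import tempfile


def read_entire_volume(path: str, block_size: int = 1024 * 1024) -> int:
    """Read `path` from start to end once and return the number of bytes read."""
    total = 0
    # buffering=0 gives raw reads, so each read() goes to the OS.
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Demo on a temporary file standing in for a block device.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (3 * 1024 * 1024 + 5))
demo_path = tmp.name
bytes_read = read_entire_volume(demo_path)
os.remove(demo_path)
print(bytes_read)
```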
Local EC2 Instance Store
- Physical disk, very high IOPS
- Up to 7.5TB, striped can reach 30TB
- Block storage
- Cannot be increased in size
- Risk of data loss if hardware fails
- Ephemeral: stopping or terminating the EC2 instance loses the storage
- Survives reboots
- Good for buffer, cache, scratch data
- Manual backups
EFS
- Linux based only, POSIX, NFSv4
- Good for data sharing, CMS
- Control access using SGs
- Encryption at rest with KMS
- Only one VPC, but can create one mount target per AZ for redundancy
EFS Scale
- 1000s of concurrent NFS clients, 10GB+/s throughput
- Grow to petabyte scale
EFS Performance
- General Purpose
- Max I/O
Set at EFS creation time
EFS Throughput
Bursting, linked to FS size
Provisioned, expensive, for a high throughput-to-size ratio
EFS VPC peering
EC2 can be in another VPC and connected using VPC peering
EFS on-prem
- Can be connected using Direct Connect and/or VPN
- Mount using the mount target's IPv4 address; the hostname is not supported
S3 vs DynamoDB
No indexing facility on S3
- Use S3 event to notify lambda
- Lambda fetches the object from S3 and inserts metadata and indexed data into DynamoDB
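The indexing flow above can be sketched as a Lambda-style handler. The field names read from the event follow the S3 event notification format; everything else is an assumption for illustration: the item layout is invented, and `put_item` is a hypothetical stand-in for the real DynamoDB write so the sketch runs without AWS access.

```python
from urllib.parse import unquote_plus


def s3_event_to_items(event: dict) -> list:
    """Turn an S3 event notification into one index item per object."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        items.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in the event.
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size", 0),
            "etag": s3["object"].get("eTag", ""),
            "event_name": record.get("eventName", ""),
            "event_time": record.get("eventTime", ""),
        })
    return items


def handler(event, context, put_item=print):
    """Lambda-style entry point; put_item stands in for the DynamoDB write."""
    items = s3_event_to_items(event)
    for item in items:
        put_item(item)
    return {"indexed": len(items)}


sample_event = {
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "eventTime": "2024-01-01T00:00:00.000Z",
        "s3": {
            "bucket": {"name": "example-bucket"},
            "object": {"key": "reports/my+file.csv", "size": 1024, "eTag": "abc"},
        },
    }]
}
print(handler(sample_event, None))
```

In a real function, `put_item` would be replaced by a write to a DynamoDB table keyed on whatever attributes you want to query by.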
S3 vs EFS
S3 is not good for POSIX or file locking; use EFS instead
S3 Replication
- For latency, for DR, for security
- Cross Region
- Same Region
- Can combine with lifecycle policies
- Must enable S3 bucket versioning
S3 Event Notifications
- Delivery in seconds, but can take up to minutes
- If two writes hit the same non-versioned object at the same time, it is possible that only one event will be fired
- To ensure event for every successful write enable versioning
S3 CloudTrail
- When CloudTrail is enabled, it records all bucket-level API calls by default
- Object-level API calls can be logged by enabling them for the bucket in CloudTrail
S3 Baseline Performance
- 3500 PUT/COPY/POST/DELETE per second per prefix
- 5500 GET/HEAD per second per prefix
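Because the limits are per prefix, spreading keys over more prefixes raises the total request rate. A rough sizing sketch using only the two figures above; the function name is made up, and the even spread of traffic across prefixes is an assumption.

```python
import math

WRITES_PER_PREFIX = 3500  # PUT/COPY/POST/DELETE per second per prefix
READS_PER_PREFIX = 5500   # GET/HEAD per second per prefix


def prefixes_needed(writes_per_sec: int, reads_per_sec: int) -> int:
    """Smallest number of prefixes that covers both request rates."""
    for_writes = math.ceil(writes_per_sec / WRITES_PER_PREFIX)
    for_reads = math.ceil(reads_per_sec / READS_PER_PREFIX)
    # At least one prefix, and enough for whichever rate is more demanding.
    return max(1, for_writes, for_reads)


print(prefixes_needed(10_000, 22_000))
```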
S3 Performance Optimizations Upload
- Multi-part upload, parallel uploads
  - Recommended for > 100MB
  - Must use for > 5GB
  - Retries only the failed chunk, not the whole file
- Transfer Acceleration
  - Compatible with multi-part upload
  - Upload goes to an edge location, then over the fast AWS network to the regional bucket
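The size thresholds above, and the way a file gets cut into independently retryable parts, can be sketched as follows. The function names and the default 100MB part size are assumptions for illustration. The 10000-part ceiling is an S3 limit that is not stated in these notes.

```python
MB = 1024 * 1024
GB = 1024 * MB
MAX_PARTS = 10_000  # S3 limit on parts per multi-part upload


def upload_strategy(size_bytes: int) -> str:
    """Apply the > 100MB and > 5GB rules from the notes."""
    if size_bytes > 5 * GB:
        return "multipart (required)"
    if size_bytes > 100 * MB:
        return "multipart (recommended)"
    return "single PUT"


def plan_parts(size_bytes: int, part_size: int = 100 * MB) -> list:
    """Return (part_number, offset, length) tuples covering the whole file."""
    parts = []
    offset = 0
    while offset < size_bytes:
        length = min(part_size, size_bytes - offset)
        parts.append((len(parts) + 1, offset, length))
        offset += length
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    return parts


print(upload_strategy(250 * MB))
for part in plan_parts(250 * MB):
    print(part)
```

Each tuple can be uploaded in parallel and retried on its own, which is the "retries chunk not whole" point.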
S3 Performance Optimizations Download
- S3 Byte-Range Fetches
  - Parallelise GETs by requesting specific byte ranges
  - Re-request only the failed byte range, not the whole file; better resilience
  - Can be used to retrieve only a part of the file, e.g. the head
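A byte-range fetch is an ordinary GET with an HTTP Range header. The sketch below only builds the header values (ranges are inclusive at both ends); the function names are invented for this example, and sending the requests is left out.

```python
def byte_ranges(total_size: int, chunk_size: int) -> list:
    """Range header values that together cover an object of total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    return [
        f"bytes={start}-{min(start + chunk_size, total_size) - 1}"
        for start in range(0, total_size, chunk_size)
    ]


def head_range(n_bytes: int) -> str:
    """Range header for just the first n_bytes of an object."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be > 0")
    return f"bytes=0-{n_bytes - 1}"


print(byte_ranges(10, 4))
print(head_range(512))
```

Each range can be requested in parallel, and a failed range can be re-requested without touching the others.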
S3 Select & Glacier Select
- Optimise data transfer size by doing server side filtering using SQL
- Less network transfer, less CPU cost client-side
- Cheaper
S3 CloudFront
- Require that your users access your private content by using special CloudFront signed URLs or signed cookies.
- Require that your users access your content by using CloudFront URLs, not URLs that access content directly on the origin server (for example, Amazon S3 or a private HTTP server). Requiring CloudFront URLs isn’t necessary, but we recommend it to prevent users from bypassing the restrictions that you specify in signed URLs or signed cookies.