10 - Storage Flashcards
EBS Volume Types
1) EBS Volumes come in 6 types
* gp2 / gp3 (SSD): General purpose SSD volume that balances price and performance for a wide variety of workloads
* io1 / io2 (SSD): Highest-performance SSD volume for mission-critical low-latency or high-throughput workloads
* st1 (HDD): Low-cost HDD volume designed for frequently accessed, throughput-intensive workloads
* sc1 (HDD): Lowest-cost HDD volume designed for less frequently accessed workloads
2) Only gp2/gp3 and io1/io2 can be used as boot volumes
EBS Volume Types Summary
| Volume Type | Names | Size | Max IOPS | Max Throughput |
|---|---|---|---|---|
| General Purpose SSD | gp3 / gp2 | 1 GiB - 16 TiB | 16,000 | 250 MiB/s (gp2) - 1,000 MiB/s (gp3) |
| Provisioned IOPS SSD | io2 / io1 | 4 GiB - 16 TiB | 32,000 - 64,000 (Nitro) | 500 - 1,000 (Nitro) MiB/s |
| Throughput Optimized HDD | st1 | 125 GiB - 16 TiB | 500 | 500 MiB/s |
| Cold HDD | sc1 | 125 GiB - 16 TiB | 250 | 250 MiB/s |
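For context only, a minimal boto3 (Python) sketch of provisioning a gp3 volume with extra IOPS and throughput; the region, AZ, and sizes are illustrative placeholders, not values from the card:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

resp = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # placeholder AZ
    VolumeType="gp3",
    Size=100,        # GiB; gp3 allows 1 GiB - 16 TiB
    Iops=4000,       # above the 3,000 gp3 baseline, up to 16,000
    Throughput=500,  # MiB/s; gp3 baseline is 125, max 1,000
)
print(resp["VolumeId"])
```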
Amazon S3 Overview
1) Objects (files) have a Key
2) The key is the FULL path:
• s3://my-bucket/my_file.txt
• s3://my-bucket/my_folder1/another_folder/my_file.txt
3) The key is composed of prefix + object name
• s3://my-bucket/my_folder1/another_folder/my_file.txt
4) Object values are the content of the body:
• Max Object Size is 5TB (5000GB)
• If uploading more than 5GB, must use “multi-part upload”
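A minimal boto3 sketch of a multi-part upload; boto3's TransferConfig handles the part splitting, and the file, bucket, and key names are placeholders:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Split anything over 100 MB into parts and upload them in parallel
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=100 * 1024 * 1024,
    max_concurrency=8,
)

s3.upload_file("big_file.bin", "my-bucket", "my_folder1/big_file.bin", Config=config)
```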
S3 Encryption for Objects
• SSE-S3, SSE-KMS, SSE-C, Client Side Encryption
SSE-S3
• SSE-S3: encryption using keys handled & managed by Amazon S3
• Object is encrypted server side
• AES-256 encryption type
• Must set header: “x-amz-server-side-encryption”: “AES256”
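A short boto3 sketch (bucket and key are placeholders); the ServerSideEncryption parameter is how boto3 sets the header above:

```python
import boto3

s3 = boto3.client("s3")

# boto3 translates ServerSideEncryption="AES256" into the
# "x-amz-server-side-encryption: AES256" request header
s3.put_object(
    Bucket="my-bucket",
    Key="my_file.txt",
    Body=b"hello",
    ServerSideEncryption="AES256",
)
```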
SSE-KMS
• SSE-KMS: encryption using keys handled & managed by KMS
• KMS Advantages: user control + audit trail
• Object is encrypted server side
• Must set header: “x-amz-server-side-encryption”: “aws:kms”
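The same sketch with KMS; the key alias is a placeholder:

```python
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="my-bucket",
    Key="my_file.txt",
    Body=b"hello",
    ServerSideEncryption="aws:kms",  # sets the aws:kms header shown above
    SSEKMSKeyId="alias/my-key",      # placeholder; omit to use the default aws/s3 key
)
```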
SSE-C
• SSE-C: server-side encryption using data keys fully managed by the customer outside of AWS
• Amazon S3 does not store the encryption key you provide
• HTTPS must be used
• Encryption key must be provided in HTTP headers, on every HTTP request made
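A boto3 sketch of SSE-C (bucket/key are placeholders); note the same customer key must accompany both the write and every later read:

```python
import os
import boto3

s3 = boto3.client("s3")  # SSE-C requires HTTPS, which boto3 uses by default

key = os.urandom(32)  # 256-bit key managed entirely by you; S3 never stores it

# boto3 sends the key via the x-amz-server-side-encryption-customer-* headers
s3.put_object(
    Bucket="my-bucket",
    Key="my_file.txt",
    Body=b"hello",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)

# The same key must be supplied again on every read
obj = s3.get_object(
    Bucket="my-bucket",
    Key="my_file.txt",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)
print(obj["Body"].read())
```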
Client Side Encryption
• Client library such as the Amazon S3 Encryption Client
• Clients must encrypt data themselves before sending to S3
• Clients must decrypt data themselves when retrieving from S3
• Customer fully manages the keys and encryption cycle
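As a sketch of the pattern only (not the official Amazon S3 Encryption Client), here is client-side encryption using the third-party cryptography package; bucket/key names are placeholders:

```python
import boto3
from cryptography.fernet import Fernet  # pip install cryptography

s3 = boto3.client("s3")

key = Fernet.generate_key()  # you own and store this key; S3 never sees it
fernet = Fernet(key)

# Encrypt locally, upload only ciphertext
s3.put_object(Bucket="my-bucket", Key="secret.bin",
              Body=fernet.encrypt(b"sensitive data"))

# Download ciphertext, decrypt locally
obj = s3.get_object(Bucket="my-bucket", Key="secret.bin")
plaintext = fernet.decrypt(obj["Body"].read())
```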
S3 Security
1) User based
• IAM policies - define which API calls are allowed for a specific IAM user
2) Resource Based
• Bucket Policies - bucket wide rules from the S3 console - allows cross account
• Object Access Control List (ACL) – finer grain
• Bucket Access Control List (ACL) – less common
3) Note: an IAM principal can access an S3 object if
• the user IAM permissions allow it OR the resource policy ALLOWS it
• AND there’s no explicit DENY
S3 Bucket Policies
1) JSON based policies:
• Resources: buckets and objects
• Actions: set of API calls to Allow or Deny
• Effect: Allow / Deny
• Principal: the account or user the policy applies to
2) Use an S3 bucket policy to:
• Grant public access to the bucket
• Force objects to be encrypted at upload (see the policy sketch below)
• Grant access to another account (Cross Account)
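A boto3 sketch of one common pattern, denying uploads that do not request SSE-S3 encryption; the bucket name is a placeholder:

```python
import json
import boto3

s3 = boto3.client("s3")

# Deny any PutObject that does not request SSE-S3 encryption
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnencryptedUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/*",
        "Condition": {
            "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
        },
    }],
}

s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))
```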
S3 Bucket settings for Block Public Access
1) Block public access to buckets and objects granted through
• new access control lists (ACLs)
• any access control lists (ACLs)
• new public bucket or access point policies
2) Block public and cross-account access to buckets and objects through any public bucket or access point policies
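These four settings map directly onto the API; a minimal boto3 sketch (bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# Turn on all four Block Public Access settings for one bucket
s3.put_public_access_block(
    Bucket="my-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,        # block new public ACLs
        "IgnorePublicAcls": True,       # ignore any existing public ACLs
        "BlockPublicPolicy": True,      # block new public bucket policies
        "RestrictPublicBuckets": True,  # restrict cross-account access via public policies
    },
)
```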
S3 CORS
• CORS means Cross-Origin Resource Sharing
• The requests won’t be fulfilled unless the other origin allows them, using CORS headers (ex: Access-Control-Allow-Origin)
- If a client makes a cross-origin request to our S3 bucket, we need to enable the correct CORS headers
- It’s a popular exam question
- You can allow for a specific origin or for * (all origins)
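A boto3 sketch of a CORS rule for one specific origin; the bucket and origin are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Allow cross-origin GETs from one specific origin ("*" would allow all origins)
s3.put_bucket_cors(
    Bucket="my-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }]
    },
)
```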
S3 Replication (CRR & SRR)
• Must enable versioning in source and destination
• Cross Region Replication (CRR)
• Same Region Replication (SRR)
- Buckets can be in different accounts
- Copying is asynchronous
- Must give proper IAM permissions to S3
- CRR - Use cases: compliance, lower latency access, replication across accounts
- SRR – Use cases: log aggregation, live replication between production and test accounts
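A hedged boto3 sketch of a replication rule; the bucket names and role ARN are placeholders, and both buckets must already have versioning enabled:

```python
import boto3

s3 = boto3.client("s3")

# Versioning must already be enabled on both buckets
s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [{
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter = replicate every object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
        }],
    },
)
```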
S3 Pre-Signed URLs
• Can generate pre-signed URLs using SDK or CLI
• For downloads (easy, can use the CLI)
• For uploads (harder, must use the SDK)
1) Valid for a default of 3600 seconds; can change the timeout with the --expires-in [TIME_BY_SECONDS] argument
2) Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT
3) Examples :
• Allow only logged-in users to download a premium video on your S3 bucket
• Allow an ever changing list of users to download files by generating URLs dynamically
• Allow temporarily a user to upload a file to a precise location in our bucket
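A minimal boto3 sketch of the download case; bucket and key are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Download URL valid for one hour (the 3600-second default made explicit);
# for uploads, swap "get_object" for "put_object"
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "premium_video.mp4"},
    ExpiresIn=3600,
)
print(url)  # anyone holding this URL acts with the generator's permissions
```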
Amazon Glacier & Glacier Deep Archive
1) Amazon Glacier – 3 retrieval options:
• Expedited (1 to 5 minutes)
• Standard (3 to 5 hours)
• Bulk (5 to 12 hours)
• Minimum storage duration of 90 days
2) Amazon Glacier Deep Archive – for long term storage – cheaper:
• Standard (12 hours)
• Bulk (48 hours)
• Minimum storage duration of 180 days
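A boto3 sketch of requesting a retrieval at a chosen tier; bucket, key, and duration are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Request a temporary copy of an archived object using the Expedited tier
s3.restore_object(
    Bucket="my-bucket",
    Key="archive/old_report.pdf",
    RestoreRequest={
        "Days": 7,  # how long the restored copy stays available
        "GlacierJobParameters": {"Tier": "Expedited"},  # or "Standard" / "Bulk"
    },
)
```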
S3 Performance
Multi-Part upload:
• recommended for files > 100MB, must be used for files > 5GB
• Can help parallelise uploads (speed up transfers)
S3 Transfer Acceleration
• Increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region
• Compatible with multi-part upload
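A hedged boto3 sketch of enabling acceleration and routing a transfer through it; the bucket and file names are placeholders:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# One-time: enable acceleration on the bucket
s3.put_bucket_accelerate_configuration(
    Bucket="my-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Then route transfers through the edge-location endpoint
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big_file.bin", "my-bucket", "big_file.bin")
```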
S3 Byte-Range Fetches
• Parallelise GETs by requesting specific byte ranges
• Better resilience in case of failures
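A minimal sketch of parallel byte-range GETs with boto3; the bucket, key, and part size are placeholders:

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client("s3")
BUCKET, KEY = "my-bucket", "big_file.bin"  # placeholders
PART = 8 * 1024 * 1024                     # 8 MiB per range

def fetch(start: int, end: int) -> bytes:
    # Standard HTTP Range header, e.g. "bytes=0-8388607" (end is inclusive)
    resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={start}-{end}")
    return resp["Body"].read()

size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]
ranges = [(i, min(i + PART, size) - 1) for i in range(0, size, PART)]

with ThreadPoolExecutor(max_workers=8) as pool:
    data = b"".join(pool.map(lambda r: fetch(*r), ranges))
```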
S3 Select & Glacier Select
- Retrieve less data using SQL by performing server side filtering
- Can filter by rows & columns (simple SQL statements)
- Less network transfer, less CPU cost client-side
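A boto3 sketch of S3 Select on a CSV object; the bucket, key, and SQL are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Filter rows and columns server-side instead of downloading the whole CSV
resp = s3.select_object_content(
    Bucket="my-bucket",
    Key="data.csv",
    ExpressionType="SQL",
    Expression="SELECT s.name, s.city FROM S3Object s WHERE s.country = 'US'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; records arrive in chunks
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode(), end="")
```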
Amazon FSx for Windows (File Server)
• FSx for Windows is a fully managed Windows file system that you mount as a shared network drive
- Supports SMB protocol & Windows NTFS
- Microsoft Active Directory integration, ACLs, user quotas
- Built on SSD, scale up to 10s of GB/s, millions of IOPS, 100s PB of data
- Can be accessed from your on-premises infrastructure
- Can be configured to be Multi-AZ (high availability)
- Data is backed-up daily to S3
Amazon FSx for Lustre
• Lustre is a type of parallel distributed file system, for large-scale computing
• The name Lustre is derived from “Linux” and “cluster”
• Machine Learning, High Performance Computing (HPC)
• Video Processing, Financial Modeling, Electronic Design Automation
• Scales up to 100s GB/s, millions of IOPS, sub-ms latencies
• Seamless integration with S3 (see the sketch after this card)
- Can “read S3” as a file system (through FSx)
- Can write the output of the computations back to S3 (through FSx)
• Can be used from on-premises servers
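A hedged boto3 sketch of the S3 integration: creating a Lustre file system linked to a bucket. The subnet ID, bucket, capacity, and deployment type are illustrative placeholders:

```python
import boto3

fsx = boto3.client("fsx")

# Link the file system to an S3 bucket: objects appear as files, and
# computation output can be exported back to the bucket
fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=1200,  # GiB; minimum for SCRATCH_2
    SubnetIds=["subnet-0123456789abcdef0"],  # placeholder
    LustreConfiguration={
        "DeploymentType": "SCRATCH_2",
        "ImportPath": "s3://my-bucket",          # "read S3" as a file system
        "ExportPath": "s3://my-bucket/results",  # write results back to S3
    },
)
```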