10 - Storage Flashcards

1
Q

EBS Volume Types

A

1) EBS Volumes come in 6 types
* gp2 / gp3 (SSD): General purpose SSD volume that balances price and performance for a wide variety of workloads
* io1 / io2 (SSD): Highest-performance SSD volume for mission-critical low-latency or high-throughput workloads
* st1 (HDD): Low cost HDD volume designed for frequently accessed, throughput-intensive workloads
* sc1 (HDD): Lowest cost HDD volume designed for less frequently accessed workloads
2) Only gp2/gp3 and io1/io2 can be used as boot volumes

2
Q

EBS Volume Types Summary

A

1) General Purpose SSD (gp3/gp2): 1 GiB - 16 TiB, max 16,000 IOPS, max throughput 250 MiB/s (gp2) to 1,000 MiB/s (gp3)
2) Provisioned IOPS SSD (io2/io1): 4 GiB - 16 TiB, max 32,000 IOPS (64,000 on Nitro), max throughput 500 MiB/s (1,000 on Nitro)
3) Throughput Optimized HDD (st1): 125 GiB - 16 TiB, max 500 IOPS, max throughput 500 MiB/s
4) Cold HDD (sc1): 125 GiB - 16 TiB, max 250 IOPS, max throughput 250 MiB/s
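
As a study aid, the summary above can be turned into a small lookup. This is only a sketch: the `VolumeType` class and the `candidates` helper are hypothetical names, the table lists one representative per family, and the figures are the upper bounds quoted on these cards (Nitro values for io2), not live AWS limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeType:
    name: str
    media: str                 # "SSD" or "HDD"
    max_iops: int              # upper bound quoted on the card
    max_throughput_mibs: int   # upper bound quoted on the card
    bootable: bool             # only the SSD families can be boot volumes


# One representative per family, using the numbers from the cards above.
VOLUME_TYPES = [
    VolumeType("gp3", "SSD", 16_000, 1_000, True),
    VolumeType("io2", "SSD", 64_000, 1_000, True),
    VolumeType("st1", "HDD", 500, 500, False),
    VolumeType("sc1", "HDD", 250, 250, False),
]


def candidates(min_iops, min_throughput_mibs, boot=False):
    """Return the names of volume types that can meet the stated needs."""
    return [
        v.name
        for v in VOLUME_TYPES
        if v.max_iops >= min_iops
        and v.max_throughput_mibs >= min_throughput_mibs
        and (v.bootable or not boot)
    ]


print(candidates(20_000, 100))           # only the provisioned IOPS family qualifies
print(candidates(100, 100, boot=True))   # HDD types are filtered out for boot volumes
```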

3
Q

Amazon S3 Overview

A

1) Objects (files) have a Key
2) The key is the FULL path:
• s3://my-bucket/my_file.txt
• s3://my-bucket/my_folder1/another_folder/my_file.txt
3) The key is composed of prefix + object name
• s3://my-bucket/my_folder1/another_folder/my_file.txt
4) Object values are the content of the body:
• Max Object Size is 5TB (5000GB)
• If uploading more than 5GB, must use “multi-part upload”
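
To make the "key = prefix + object name" idea concrete, here is a minimal sketch that splits an S3 URI into its parts. The `split_s3_uri` function is a hypothetical helper written for this card, not part of any AWS SDK.

```python
def split_s3_uri(s3_uri):
    """Split 's3://bucket/prefix/name' into (bucket, prefix, object_name).

    The key is everything after the bucket; the prefix is the key up to and
    including the last '/', and the object name is what remains.
    """
    scheme = "s3://"
    if not s3_uri.startswith(scheme):
        raise ValueError("expected an s3:// URI")
    bucket, _, key = s3_uri[len(scheme):].partition("/")
    prefix, sep, object_name = key.rpartition("/")
    return bucket, prefix + sep, object_name


print(split_s3_uri("s3://my-bucket/my_folder1/another_folder/my_file.txt"))
print(split_s3_uri("s3://my-bucket/my_file.txt"))
```

Note that the "folders" are only a naming convention inside the key; the second call returns an empty prefix because the key contains no `/`.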

4
Q

S3 Encryption for Objects

• SSE-S3, SSE-KMS, SSE-C, Client Side Encryption

A

SSE-S3
• SSE-S3: encryption using keys handled & managed by Amazon S3
• Object is encrypted server side
• AES-256 encryption type
• Must set header: "x-amz-server-side-encryption": "AES256"

SSE-KMS
• SSE-KMS: encryption using keys handled & managed by KMS
• KMS Advantages: user control + audit trail
• Object is encrypted server side
• Must set header: "x-amz-server-side-encryption": "aws:kms"

SSE-C
• SSE-C: server-side encryption using data keys fully managed by the customer outside of AWS
• Amazon S3 does not store the encryption key you provide
• HTTPS must be used
• Encryption key must be provided in HTTP headers, for every HTTP request made

Client Side Encryption
• Client library such as the Amazon S3 Encryption Client
• Clients must encrypt data themselves before sending to S3
• Clients must decrypt data themselves when retrieving from S3
• Customer fully manages the keys and encryption cycle
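
The header requirements above can be sketched as plain dictionaries. The function names are hypothetical. The SSE-S3 and SSE-KMS values come from the card; the three SSE-C header names are not on the card and are given here as recalled from the S3 REST API, so treat them as an assumption to check against the official documentation.

```python
import base64
import hashlib


def sse_s3_headers():
    # SSE-S3: S3 manages the keys, AES-256.
    return {"x-amz-server-side-encryption": "AES256"}


def sse_kms_headers():
    # SSE-KMS: keys are managed in KMS.
    return {"x-amz-server-side-encryption": "aws:kms"}


def sse_c_headers(key: bytes):
    # SSE-C: the customer supplies a 256-bit key with every request (over HTTPS).
    if len(key) != 32:
        raise ValueError("SSE-C expects a 32-byte (256-bit) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }


print(sse_s3_headers())
print(sorted(sse_c_headers(b"\x00" * 32)))
```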

5
Q

S3 Security

A

1) User based
• IAM policies - which API calls should be allowed for a specific user from IAM console

2) Resource Based
• Bucket Policies - bucket wide rules from the S3 console - allows cross account
• Object Access Control List (ACL) – finer grain
• Bucket Access Control List (ACL) – less common

3) Note: an IAM principal can access an S3 object if
• the user IAM permissions allow it OR the resource policy ALLOWS it
• AND there’s no explicit DENY
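
The access rule in the note above reduces to one boolean expression. This sketch uses a hypothetical `can_access` function and models only the simplified rule on this card, not the full IAM policy evaluation logic.

```python
def can_access(iam_allows: bool, resource_policy_allows: bool, explicit_deny: bool) -> bool:
    """(IAM allows OR resource policy allows) AND no explicit deny."""
    return (iam_allows or resource_policy_allows) and not explicit_deny


print(can_access(True, False, False))   # allowed through IAM permissions
print(can_access(False, True, False))   # allowed through the bucket policy
print(can_access(True, True, True))     # an explicit deny always wins
```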

6
Q

S3 Bucket Policies

A
1) JSON based policies
• Resources: buckets and objects
• Actions: Set of API actions to allow or deny
• Effect: Allow / Deny
• Principal: The account or user to apply the policy to

2) Use S3 bucket policies to:
• Grant public access to the bucket
• Force objects to be encrypted at upload
• Grant access to another account (Cross Account)
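
As an illustration of "force objects to be encrypted at upload", the sketch below builds a commonly used form of such a policy as a Python dict and prints it as JSON. The function name, the `Sid`, and the bucket name are hypothetical; the policy denies `s3:PutObject` when the `x-amz-server-side-encryption` header is absent.

```python
import json


def deny_unencrypted_uploads_policy(bucket: str) -> dict:
    """Bucket policy that rejects uploads sent without an SSE header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",          # hypothetical label
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",   # objects in the bucket
                # "Null": "true" matches requests where the header is missing.
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }


print(json.dumps(deny_unencrypted_uploads_policy("my-bucket"), indent=2))
```

The four elements listed on the card (Resource, Action, Effect, Principal) all appear in the single statement.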

7
Q

S3 Bucket settings for Block Public Access

A

1) Block public access to buckets and objects granted through
• new access control lists (ACLs)
• any access control lists (ACLs)
• new public bucket or access point policies

2) Block public and cross-account access to buckets and objects through any public bucket or access point policies

8
Q

S3 CORS
• CORS means Cross-Origin Resource Sharing
• The requests won’t be fulfilled unless the other origin allows them, using CORS headers (e.g. Access-Control-Allow-Origin)

A
  • If a client does a cross-origin request on our S3 bucket, we need to enable the correct CORS headers
  • It’s a popular exam question
  • You can allow for a specific origin or for * (all origins)
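
A bucket CORS configuration is a JSON list of rules. The sketch below builds one rule in that shape; the `cors_rules` helper, the example origin, and the `MaxAgeSeconds` value are illustrative assumptions.

```python
import json


def cors_rules(allowed_origin: str = "*") -> list:
    """One CORS rule allowing GET requests from a single origin (or '*' for all)."""
    return [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET"],
            "AllowedOrigins": [allowed_origin],
            "ExposeHeaders": [],
            "MaxAgeSeconds": 3000,   # how long browsers may cache the preflight result
        }
    ]


print(json.dumps(cors_rules("https://www.example.com"), indent=2))
```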
9
Q

S3 Replication (CRR & SRR)
• Must enable versioning in source and destination
• Cross Region Replication (CRR)
• Same Region Replication (SRR)

A
  • Buckets can be in different accounts
  • Copying is asynchronous
  • Must give proper IAM permissions to S3
  • CRR - Use cases: compliance, lower latency access, replication across accounts
  • SRR – Use cases: log aggregation, live replication between production and test accounts
10
Q

S3 Pre-Signed URLs
• Can generate pre-signed URLs using SDK or CLI
• For downloads (easy, can use the CLI)
• For uploads (harder, must use the SDK)

A

1) Valid for a default of 3600 seconds; the timeout can be changed with the --expires-in [TIME_IN_SECONDS] argument
2) Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT

3) Examples :
• Allow only logged-in users to download a premium video on your S3 bucket
• Allow an ever changing list of users to download files by generating URLs dynamically
• Temporarily allow a user to upload a file to a precise location in our bucket
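
To see the timeout in practice, the sketch below reads the expiry out of a pre-signed URL. It assumes a Signature Version 4 URL, which carries `X-Amz-Date` and `X-Amz-Expires` query parameters; the `presigned_expiry` helper and the sample URL (with a dummy signature) are made up for illustration, and nothing here generates or validates a signature.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url: str) -> datetime:
    """Return the UTC instant at which a SigV4 pre-signed URL stops working."""
    query = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))


# Hypothetical URL using the default lifetime of 3600 seconds.
url = (
    "https://my-bucket.s3.amazonaws.com/my_file.txt"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20240101T120000Z"
    "&X-Amz-Expires=3600"
    "&X-Amz-Signature=dummy"
)
print(presigned_expiry(url).isoformat())
```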

11
Q

Amazon Glacier & Glacier Deep Archive

A
1) Amazon Glacier – 3 retrieval options:
• Expedited (1 to 5 minutes)
• Standard (3 to 5 hours)
• Bulk (5 to 12 hours)
• Minimum storage duration of 90 days

2) Amazon Glacier Deep Archive – for long term storage – cheaper:
• Standard (12 hours)
• Bulk (48 hours)
• Minimum storage duration of 180 days

12
Q

S3 Performance

A

Multi-Part upload:
• recommended for files > 100MB, must use for files > 5GB
• Can help parallelise uploads (speed up transfers)

S3 Transfer Acceleration
• Increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region
• Compatible with multi-part upload

S3 Byte-Range Fetches
• Parallelise GETs by requesting specific byte ranges
• Better resilience in case of failures
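
The arithmetic behind byte-range fetches and the multi-part thresholds can be sketched as follows. Both helpers are hypothetical, and the thresholds treat the card's "100MB" and "5GB" as binary units (MiB and GiB), which is an assumption. Each string from `byte_ranges` is a valid HTTP `Range` header value that could be sent on a separate, parallel GET.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def byte_ranges(total_size: int, chunk_size: int) -> list:
    """Split an object of total_size bytes into inclusive HTTP Range values."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    ranges = []
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1   # Range ends are inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges


def upload_advice(size_bytes: int) -> str:
    """Apply the card's rule of thumb for multi-part upload."""
    if size_bytes > 5 * GIB:
        return "multi-part required"
    if size_bytes > 100 * MIB:
        return "multi-part recommended"
    return "single PUT is fine"


print(byte_ranges(10, 4))
print(upload_advice(200 * MIB))
```

If one ranged GET fails, only that range needs to be retried, which is the resilience benefit mentioned above.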

13
Q

S3 Select & Glacier Select

A
  • Retrieve less data using SQL by performing server side filtering
  • Can filter by rows & columns (simple SQL statements)
  • Less network transfer, less CPU cost client-side
14
Q

Amazon FSx for Windows (File Server)

• FSx for Windows is a fully managed Windows file system share drive

A
  • Supports SMB protocol & Windows NTFS
  • Microsoft Active Directory integration, ACLs, user quotas
  • Built on SSD, scale up to 10s of GB/s, millions of IOPS, 100s PB of data
  • Can be accessed from your on-premises infrastructure
  • Can be configured to be Multi-AZ (high availability)
  • Data is backed-up daily to S3
15
Q

Amazon FSx for Lustre
• Lustre is a type of parallel distributed file system, for large-scale computing
• The name Lustre is derived from “Linux” and “cluster”

A

• Machine Learning, High Performance Computing (HPC)
• Video Processing, Financial Modeling, Electronic Design Automation
• Scales up to 100s GB/s, millions of IOPS, sub-ms latencies
• Seamless integration with S3
- Can “read S3” as a file system (through FSx)
- Can write the output of the computations back to S3 (through FSx)
• Can be used from on-premises servers

16
Q

AWS Storage Gateway

• Bridge between on-premises data and cloud data in S3

A

• Use cases: disaster recovery, backup & restore, tiered storage

  • 3 types of Storage Gateway:
  • File Gateway
  • Volume Gateway
  • Tape Gateway

• Exam Tip: You need to know the differences between all 3!

  • File access / NFS – user auth with Active Directory => File Gateway (backed by S3)
  • Volumes / Block Storage / iSCSI => Volume gateway (backed by S3 with EBS snapshots)
  • VTL Tape solution / Backup with iSCSI => Tape Gateway (backed by S3 and Glacier)
  • No on-premises virtualisation => Hardware Appliance
17
Q

File Gateway

• Configured S3 buckets are accessible using the NFS and SMB protocol

A
  • Supports S3 standard, S3 IA, S3 One Zone IA
  • Bucket access using IAM roles for each File Gateway
  • Most recently used data is cached in the file gateway
  • Can be mounted on many servers
  • Integrated with Active Directory (AD) for user authentication
18
Q

Volume Gateway

• Block storage using iSCSI protocol backed by S3

A
  • Backed by EBS snapshots which can help restore on-premises volumes!
  • Cached volumes: low latency access to most recent data
  • Stored volumes: entire dataset is on premises, scheduled backups to S3
19
Q

Tape Gateway

A
  • Some companies have backup processes using physical tapes (!)
  • With Tape Gateway, companies use the same processes, but in the cloud
  • Virtual Tape Library (VTL) backed by Amazon S3 and Glacier
  • Back up data using existing tape-based processes (and iSCSI interface)
  • Works with leading backup software vendors