Module 5 - Storage & Transfer Flashcards

1
Q

What are the 3 kinds of storage? What AWS services handle each?

A

Block (EBS, instance store)
File (AWS EFS, FSx)
Object (S3, Glacier)

2
Q

What are some ways to migrate data online?

A
AWS Storage Gateway
Kinesis (Firehose and Streams)
DataSync
S3 Transfer Acceleration
AWS Direct Connect
3
Q

What are some ways to migrate data offline?

A

AWS Snow family

4
Q

What is AWS S3? How do you change data?

A

Simple Storage Service. It is object-level storage: to change part of a file, you must make the change and then re-upload the entire modified object. Max 5 TB per object.

5
Q

How can you access S3?

A

Through the web-based AWS Management Console,

or programmatically through the AWS CLI, the REST API, and the SDKs.

6
Q

What is an S3 object made up of?

A
  • key (full path to file including folders),
  • the file itself (value),
  • version ID (if enabled)
  • any metadata that describes the file (key/value pairs),
  • tags (Unicode key/value pair, for security/lifecycle)
7
Q

Identify the parts of this S3 URL

http://doc.s3-us-west-1.amazonaws.com/2006-03-01/AmazonS3.html

A

“doc” is the NAME of the bucket
“us-west-1” is the REGION
“2006-03-01/AmazonS3.html” is the KEY
(“2006-03-01/” in the object key is called the PREFIX)
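The same breakdown can be sketched in code. This is a minimal illustration that only handles virtual-hosted-style URLs shaped like the one on this card; the function name and the regular expression are assumptions made for the example, not part of any AWS library.

```python
import re
from urllib.parse import urlparse

# Matches hosts such as "doc.s3-us-west-1.amazonaws.com" (dash form, as on
# the card) and "doc.s3.us-west-1.amazonaws.com" (dot form).
HOST_PATTERN = re.compile(
    r"^(?P<bucket>.+)\.s3[.-](?P<region>[a-z0-9-]+)\.amazonaws\.com$"
)

def split_s3_url(url):
    """Return the bucket, region, key, and prefix encoded in an S3 URL."""
    parsed = urlparse(url)
    match = HOST_PATTERN.match(parsed.hostname or "")
    if match is None:
        raise ValueError(f"not a virtual-hosted-style S3 URL: {url}")
    key = parsed.path.lstrip("/")
    # The prefix is everything in the key up to and including the last "/".
    prefix = key[: key.rfind("/") + 1]
    return {
        "bucket": match.group("bucket"),
        "region": match.group("region"),
        "key": key,
        "prefix": prefix,
    }

parts = split_s3_url(
    "http://doc.s3-us-west-1.amazonaws.com/2006-03-01/AmazonS3.html"
)
print(parts)
```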

8
Q

How do buckets work?

How many buckets can an account have?

A

• Objects are stored in buckets.
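• An AWS account has a default quota on the number of buckets (historically a soft limit of 100 that can be raised on request).
• Each bucket lives in one REGION that you choose, and you control access to it.
• You can enable access logging for a bucket.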

9
Q

How do you control access to a bucket? What is the default access to a bucket?

A

Everything is PRIVATE by default. The account that created the resource can grant permissions by writing access policies (CONTROLLED ACCESS).
You can also make it PUBLIC if necessary (uncommon).

10
Q

How do you make sure your buckets are never exposed to public access?

A

Turn on “block all public access” at the account level. These settings apply account-wide for all current and future buckets.

11
Q

What are the ways that access is granted to a bucket?

A

ACLs and bucket policies.

12
Q

How can you recover objects from accidental deletion or overwrite?

A

1) Enable versioning on your bucket.

2) S3 Object Lock - uses the write once, read many (WORM) model.
• Use retention periods for locking an object for a fixed period of time
• use Legal Hold for a lock until explicitly removed.

13
Q

What are the different S3 storage classes? What are they used for?

A
  • S3 Standard - general-purpose storage of frequently accessed data
  • S3 Intelligent-Tiering - for data with unknown or changing access patterns (monitors access and moves objects between access tiers automatically)
  • S3 Standard-Infrequent Access (S3 Standard-IA) and S3 One Zone-Infrequent Access (S3 One Zone-IA, which is cheaper still) - for long-lived but less frequently accessed data. Cost-effective if you access the data less than about once a month; storage costs roughly half as much as Standard, but there is a retrieval fee.
  • S3 Glacier and S3 Glacier Deep Archive for long-term archive and digital preservation
14
Q

What is an archive?

A

Any object stored in a vault in S3 Glacier. It has a unique ID and optional description. When you store it, Glacier returns a regionally unique archive ID.

15
Q

What do you use to manage S3 Glacier vaults?

A

Use the management console to create and delete vaults.

For everything else (such as uploading and retrieving archives), use the CLI, REST API, or SDKs.

16
Q

What is a vault?

A

A container for storing archives.
You specify its name and region.
You can lock it with Vault Lock. (for compliance; data and lock can’t be removed)

17
Q

What are the retrieval times for S3 Glacier?

A

Instant Retrieval - milliseconds

Flexible Retrieval:
• Expedited - 1-5 minutes
• Standard - 3-5 hours
• Bulk - 5-12 hours

Deep Archive:
• Standard - within 12 hours
• Bulk - within 48 hours

18
Q

What is an S3 Lifecycle Policy in Lifecycle Management?

A

An automated system to move (transition) or delete (expire) your data based on age. (Saves you money on storage).

Rules are configured on the bucket. A rule can apply to every object in the bucket or be filtered to a subset by prefix or tag.

Works with versioning.
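As a rough sketch, a lifecycle rule can be written as a small JSON-style document. The structure below follows the shape the S3 lifecycle API takes in the SDKs; the rule ID, the "logs/" prefix, and all day counts are made-up values for illustration, and the helper function is only a sanity check added for this example.

```python
import json

# Hypothetical rule: the ID, prefix, and day counts are illustrative only.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            # Transition: move objects to cheaper classes as they age.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Expiration: delete current versions after a year.
            "Expiration": {"Days": 365},
            # With versioning on, old versions can be cleaned up separately.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }
    ]
}

def transition_days_are_increasing(rule):
    """Check that each transition happens later than the one before it."""
    days = [t["Days"] for t in rule.get("Transitions", [])]
    return all(earlier < later for earlier, later in zip(days, days[1:]))

rule = lifecycle_configuration["Rules"][0]
print(json.dumps(lifecycle_configuration, indent=2))
print(transition_days_are_increasing(rule))
```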

19
Q

What are 3 ways to encrypt data at rest in S3?

A

1) SSE-S3: Server-side encryption (SSE) with Amazon S3-managed keys. AWS handles the key and uses the AES-256 algorithm. How? Put in the header: "x-amz-server-side-encryption": "AES256"
2) SSE-KMS: Server-side encryption with KMS keys stored in AWS KMS. Envelope encryption; you and AWS manage the keys. Why? You can control who has access and you get an audit trail. Put in the header: "x-amz-server-side-encryption": "aws:kms"
3) SSE-C: Server-side encryption with customer-provided keys. You manage the keys. Must use HTTPS. Not available in the console; use the CLI, SDKs, or REST API. You must include the key in the headers of every request because S3 discards it after each use.
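The request headers for the three options can be sketched as follows. The header names are the documented S3 ones (note that the SSE-KMS value is "aws:kms"); the helper function names are invented for this example, and the random key is a stand-in for a key you would manage yourself.

```python
import base64
import hashlib
import os

def sse_s3_headers():
    """SSE-S3: ask S3 to encrypt with its own managed keys."""
    return {"x-amz-server-side-encryption": "AES256"}

def sse_kms_headers(kms_key_id=None):
    """SSE-KMS: encrypt with a KMS key (optionally a specific key ID)."""
    headers = {"x-amz-server-side-encryption": "aws:kms"}
    if kms_key_id is not None:
        headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
    return headers

def sse_c_headers(key):
    """SSE-C: send your own 256-bit key (and its MD5) with every request."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode("ascii"),
    }

customer_key = os.urandom(32)  # stand-in for a key you store securely
print(sse_s3_headers())
print(sse_kms_headers())
print(sorted(sse_c_headers(customer_key)))
```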

20
Q

How does S3 handle replication?

A

All data is replicated in at least 3 AZs (except for S3 One Zone-IA).

21
Q

What is S3 Transfer Acceleration? When would you choose this?

A

It’s a way to move data faster over long distances. It uses CloudFront global edge locations with a distinct URL (…s3-accelerate…). Once it’s uploaded, it is automatically routed to S3 using an optimized network path (AWS backbone network).

You only get charged if there is a performance improvement. Enabled at the bucket level.

Good for when :
• you have customers worldwide using the same bucket
• you transfer giga- or terabytes of data worldwide

22
Q

⭐️ What is S3 multipart upload? When would you want to do this?

A

This is a way to break large objects up into manageable parts. Once they are all uploaded, S3 reassembles the object. You drive it through the CLI, SDKs, or REST API; the console does not expose the multipart API directly. Recommended for files > 100 MB. Required if > 5 GB.

Use for:
• Improved throughput: You can upload parts in parallel to improve throughput.
• Quick recovery from any network issues: Smaller part sizes minimize the impact of restarting a failed upload due to a network error.
• Pausing and resuming object uploads: You can upload object parts over time. When you have initiated a multipart upload, there is no expiration. You must explicitly complete or cancel the multipart upload.
• Beginning an upload before you know the final object size: You can upload an object as you are creating it.
• Uploading large objects: Using the multipart upload API, you can upload large objects, up to 5 TB.
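To make the part-splitting concrete, here is a small planning sketch. It only computes byte ranges and does not talk to S3. The limits used (parts between 5 MiB and 5 GiB except the last, at most 10,000 parts, objects up to 5 TiB) are the documented S3 multipart limits; the function name is an assumption for this example.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART_SIZE = 5 * MIB      # every part except the last must be at least this
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000
MAX_OBJECT_SIZE = 5 * TIB

def plan_parts(object_size, part_size):
    """Return (part_number, first_byte, last_byte) for each part."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    part_count = -(-object_size // part_size)  # ceiling division
    if part_count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    parts = []
    for number in range(1, part_count + 1):
        start = (number - 1) * part_size
        end = min(start + part_size, object_size) - 1
        parts.append((number, start, end))
    return parts

plan = plan_parts(100 * MIB, 8 * MIB)
print(len(plan), plan[0], plan[-1])
```

Each tuple could be handed to a separate worker, which is where the parallel-throughput benefit comes from; only the final part is allowed to be smaller than 5 MiB.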

23
Q

What are S3 Access Points?

A

Named network endpoints that you can use to perform S3 object operations, such as GetObject and PutObject.

They each have their own policy for permissions and network controls.

These only work for objects, not S3 operations like modifying buckets.

24
Q

What is Object Lambda?

A

You attach your own code (a Lambda function) to an S3 Object Lambda Access Point so that it processes the data returned by a GET request.

E.g. convert data format (XML to JSON), resize images, augment data with another service, etc.

25
Q

What are the costs for S3? What are the factors that change pricing?

A

STARRL
• Storage pricing
• Request and data retrieval pricing (PUT, COPY, POST, LIST, and GET requests are charged; DELETE requests are free)
• Data transfer and Amazon S3 Transfer Acceleration pricing (transfer IN from the internet is free; you pay for transfer OUT to other regions or the internet)
• Data management and analytics pricing
• S3 Replication pricing
• Processing with S3 Object Lambda

26
Q

When is S3 a good choice?

A

Use when:
• Need to write once, and read many times.
• Have a large number of users.
• Have growing data sets.
• Have spiky access to data (in this case, use S3 Standard and S3 Intelligent-Tiering.)

27
Q

What storage should I use when multiple instances need to use the same file storage?

A

You could use EBS Multi-Attach, but only with io1/io2 volumes, up to 16 instances, all in the same AZ.
You could use S3, but because it is an object store you don’t get the performance and read/write semantics of a file system, so it’s not ideal.

If you need high throughput changes to files of different sizes, EFS is best.

28
Q

What is AWS EFS?

A

Elastic File System. It’s a managed Network File System, meaning that lots of instances in different AZs can connect to it.

No need to provision for capacity, it scales automatically. Pay per use.

Highly durable, scalable, and expensive. When a client makes a request, EFS routes to the “mount target” in the AZ closest to the client.

Only for Linux-based AMIs (POSIX). Great for CMS, WordPress, content sharing, web serving

29
Q

What is FSx?

A

FSx for Windows and FSx for Lustre (Linux) are managed services that handle 3rd party file systems for you. Integrates with S3. Links long-term storage with high-performance file systems.

30
Q

What are the options for data migration in the Snow family?

A

For all: Transfer data and ship back to AWS, stores data in your bucket. (For uploading OR downloading)

If it takes longer than a week over the network, use Snow!
100 TB over 1 Gbps ≈ 12 days

  • Snowcone - small, portable data storage device, 8 TB
  • Snowball Edge - 80 TB device (storage or compute optimized). ❗️Has computing options, can pre-process data.
  • Snowmobile - shipping container on a semi. 100 PB.
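The "12 days for 100 TB over 1 Gbps" rule of thumb above can be checked with a quick calculation. The 80% link utilization is an assumption chosen for this sketch (a fully saturated link would take a little over 9 days), and the function name is made up for the example.

```python
def transfer_days(data_bytes, link_bits_per_second, utilization=0.8):
    """Estimate how many days a network transfer takes."""
    seconds = (data_bytes * 8) / (link_bits_per_second * utilization)
    return seconds / 86_400  # seconds per day

# 100 TB (decimal) over a 1 Gbps link at an assumed 80% utilization.
days = transfer_days(100 * 10**12, 1 * 10**9)
print(f"{days:.1f} days")  # prints "11.6 days"
```

Since that is longer than a week, the rule on this card says to ship a Snow device instead.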
31
Q

How does DataSync work?

A

Deploy a software agent on-prem as a virtual machine. It transfers data over the WAN to AWS using TLS. It’s a managed service, so there is nothing to maintain.

32
Q

What is the Storage Gateway service?

A

A set of hybrid cloud storage services that provide on-premises access to virtually unlimited cloud storage.

An on-prem File Gateway connects to the Storage Gateway service in the cloud. The File Gateway caches data locally for lower latency.

Or use an Appliance Gateway on-prem. Supports:
files, volumes, tapes.

NFS or SMB for files,
iSCSI for volumes,
iSCSI VTL for tapes.

33
Q

What should I use if I have on-prem applications that need to access an S3 bucket in the cloud?

A

Storage Gateway service.

34
Q

You have 2 Linux applications in different AZs that must share the same file system. What do you use?

A

Amazon EFS.

35
Q

What protocols are supported in an AWS Storage Gateway Appliance?

A

iSCSI
SMB
NFS

36
Q

What size can your S3 object be?

A

0 bytes to 5 TB

37
Q

What must you consider when naming a bucket?

A

Buckets share a universal (global) namespace, so every bucket name must be unique, like a domain name.

⛔️ No caps, no underscores, not formatted like an IP address
3-63 chars
must start with a lowercase letter or number
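These naming rules can be turned into a quick checker. This is a partial sketch: it covers the rules on this card plus a few closely related ones (the name must also end with a letter or number, and must not contain two adjacent dots), but not every rule in the S3 documentation, such as reserved prefixes. The function name is an assumption for this example.

```python
import re

# 3-63 characters: lowercase letters, digits, dots, and hyphens,
# starting and ending with a letter or digit.
NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
IP_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")

def is_valid_bucket_name(name):
    """Return True if the name passes the basic S3 bucket naming rules."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    if ".." in name:
        return False
    if IP_PATTERN.fullmatch(name) is not None:
        return False
    return True

for candidate in ["my-bucket-01", "My_Bucket", "192.168.5.4", "ab"]:
    print(candidate, is_valid_bucket_name(candidate))
```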

38
Q

How does S3 logging work?

A

You can optionally turn on “per request” logging; files are saved in a different bucket.

39
Q

What is the difference between ACL and bucket policies?

A

ACL: Legacy feature but still in use, simple way of granting access.

Bucket policy: defines more complex access rules. Written in JSON.

40
Q

How is encryption handled in transit? At rest?

A

In transit: SSL/TLS
Client-side encryption: you encrypt before uploading, so the data is protected both in transit and at rest. You can use a library like the S3 Encryption Client.

At rest: SSE. There are 3 options here.

41
Q

What are CRR and SRR?

A

Cross-Region Replication: You enable this and choose another region. Any object uploaded will automatically be replicated.

❗️You MUST have versioning turned on for both source and destination buckets.

SRR = Same Region Replication. Same thing, just in the same region.

Both can also replicate to another AWS account.
Asynchronous.
When you enable, only NEW objects will be replicated. (To do all, use S3 BATCH replication)
You can’t “chain” replication from bucket #1 to #2 then #2 to #3.

42
Q

How do you turn off versioning for an S3 object?

A

You can’t: versioning is a bucket-level setting, not a per-object one. Once enabled, it cannot be disabled; you can only suspend versioning on the bucket.

43
Q

I have a web app and I want the user to be able to download an object that is in a private S3 bucket from a password-protected part of the app. How do I do this?

A

Generate a presigned URL. It grants temporary access to an object (upload or download) and expires after a set number of seconds. (default 1 hour = 3600 secs)

44
Q

How do you keep people from accidentally deleting objects?

A

Enable MFA Delete, where a user has to provide an MFA code before permanently deleting an object version or changing the versioning state of a bucket.

❗️Versioning must be turned on. Must use CLI to enable it.

Only the bucket owner logged in as root can delete.

Header in request: x-amz-mfa

45
Q

When you upload a new version of an object, does the new version inherit the properties of the previous versions? E.g. public/private settings.

A

No.

46
Q

What is the CLI command to copy a file from an S3 bucket to another location?

A

aws s3 cp s3://bucketName/folder/file.fileType destination

47
Q

❗️What features can you configure on EFS?

A

Performance:
• EFS Scale Mode (auto scales to Petabyte scale)
• Performance Mode: Max I/O or General Purpose
• Throughput Mode: Bursting or provisioned

Storage:
• Standard (frequently accessed)
• EFS-IA (infrequently accessed) cost per retrieval but cheaper to store

Multi-AZ or 1-Zone

48
Q

When can a user access an S3 bucket?

A
When the IAM user-based policy allows it
OR
the resource-based policy (bucket policy, ACL) allows it
AND
there is no explicit deny.
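The rule above reads naturally as a boolean expression. This is a deliberately simplified sketch of the card's rule for a user in the same account as the bucket; it is not the full IAM policy evaluation logic, and the function and parameter names are invented for the example.

```python
def can_access(identity_policy_allows, resource_policy_allows, explicit_deny):
    """Apply the card's rule: (IAM allow OR resource allow) AND no explicit deny."""
    if explicit_deny:
        # An explicit deny always wins, no matter what else allows the request.
        return False
    return identity_policy_allows or resource_policy_allows

print(can_access(True, False, False))   # allowed by the IAM policy
print(can_access(False, True, False))   # allowed by the bucket policy
print(can_access(True, True, True))     # explicit deny overrides both
print(can_access(False, False, False))  # nothing allows it
```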
49
Q

What is an S3 website?

A

S3 can host static websites, served from an index document such as index.html.

You must make the content publicly readable.

50
Q

What is CORS?

A

Cross-Origin Resource Sharing. Trying to get resources from a different origin.

Web browser-based security that only allows you to get resources from a different origin if the second origin allows it.

Second origin sends headers telling the first origin what they are allowed to do.

51
Q

How does S3 CORS work?

A

If website A needs to get a resource from website B, then B needs to enable CORS in the response headers, otherwise, the request will be blocked by the browser.
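As an illustration, the CORS rules on bucket B can be pictured as a small list of allowed origins and methods. The field names below follow the S3 CORS configuration format; the origin is a made-up example, and the matching function is a simplified stand-in for what S3 and the browser do together, not a faithful reimplementation.

```python
from fnmatch import fnmatchcase

# Hypothetical CORS configuration on bucket B allowing website A to read.
cors_rules = [
    {
        "AllowedOrigins": ["https://www.site-a.example"],
        "AllowedMethods": ["GET", "HEAD"],
        "AllowedHeaders": ["*"],
        "MaxAgeSeconds": 3000,
    }
]

def cors_allows(rules, origin, method):
    """Return True if any rule permits this origin and HTTP method."""
    for rule in rules:
        origin_ok = any(
            fnmatchcase(origin, pattern) for pattern in rule["AllowedOrigins"]
        )
        if origin_ok and method in rule["AllowedMethods"]:
            return True
    return False

print(cors_allows(cors_rules, "https://www.site-a.example", "GET"))
print(cors_allows(cors_rules, "https://www.other.example", "GET"))
print(cors_allows(cors_rules, "https://www.site-a.example", "PUT"))
```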

52
Q

What is the S3 consistency model?

A

All operations are strongly consistent. All changes are immediately available.

53
Q

Can you put a bucket inside another bucket?

A

No.

54
Q

Where does S3 live?

A

NOT in a VPC, it lives in the public space.

55
Q

What are the ways you can connect to S3?

A

Browser - to the public endpoint
Programmatically via REST API

EC2 can connect from within a VPC through the Internet Gateway and public internet.

❗️EC2 can connect from within a VPC through a PRIVATE connection with the S3 GATEWAY ENDPOINT.

56
Q

What are the minimum storage durations for S3 storage classes? How long do you have to keep data before moving it?

A

Standard - N/A
Intelligent-Tiering - none
Standard-IA, One Zone-IA - 30 days
Glacier - 90 days
Deep Archive - 180 days
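One practical consequence is that an object deleted or moved before its minimum duration is still billed for the full minimum. A minimal sketch of that rule, using the durations from this card (the dictionary keys and function name are chosen for this example):

```python
# Minimum storage durations in days, taken from the card above.
MINIMUM_STORAGE_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER": 90,
    "DEEP_ARCHIVE": 180,
}

def billable_days(storage_class, days_stored):
    """Days of storage you are charged for, given how long the object lived."""
    return max(days_stored, MINIMUM_STORAGE_DAYS[storage_class])

print(billable_days("GLACIER", 10))        # 90
print(billable_days("STANDARD", 10))       # 10
print(billable_days("DEEP_ARCHIVE", 200))  # 200
```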

57
Q

What is MFA-Protected API Access?

A

When using the API or CLI, it enforces MFA when accessing AWS resources (not just S3).

In a bucket policy, the condition will look like this:
"Condition": {"Null": {"aws:MultiFactorAuthAge": true}}

In a Deny statement, this blocks any API operation that was not authenticated with MFA.
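Put into a complete bucket policy, the condition might look like the sketch below. The condition key is aws:MultiFactorAuthAge; the bucket name and statement ID are placeholders invented for this example, and the policy is built as a Python dictionary only so it can be printed as JSON.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyRequestsWithoutMFA",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # "Null": true means the MFA key is absent from the request,
            # in other words the caller did not authenticate with MFA.
            "Condition": {"Null": {"aws:MultiFactorAuthAge": "true"}},
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```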

58
Q

What happens to new and existing objects when I enable default encryption on a bucket?

A

New objects: If unencrypted then it will encrypt. If encrypted, then nothing happens. (It won’t re-encrypt.)

Existing objects: Nothing. Only NEW objects will be encrypted.

59
Q

What is an S3 Event Notification?

A

Sends notifications when events happen in buckets to:

SNS, SQS, Lambda & EventBridge.

60
Q

What is S3 Select (or Glacier Select)?

A

A way to use simple SQL to retrieve just part of an object's contents on S3 (CSV, JSON, or Parquet, including GZIP- or BZIP2-compressed CSV and JSON). The filtering happens on the server, so you transfer less data and spend less client-side CPU.

❗️Each query targets one object, identified by the bucket’s name and the object’s KEY

Need Lambda?

61
Q

We need our architects to have programmatic and console access ACROSS AWS ACCOUNTS.

A

Configure cross-account access using IAM roles.

62
Q

What gets evaluated first, bucket policies or default encryption?

A

Bucket policies. Default encryption works like your backup in case you forgot to encrypt.

63
Q

What are S3 access logs?

A

Logs for ALL requests to S3, then you can use these for analysis. The logs are saved in a different bucket.

64
Q

⭐️ I don’t know when I should move objects in S3 to a different tier. What tool can help me?

A

S3 Analytics - Storage Class Analysis (not for 1Zone or Glacier).

• Daily report

65
Q

What is a way to speed up the performance of a download from S3?

A

Byte-Range Fetch. Divides up the file and fetches specific ranges in parallel. Also good if you only need one part of the file.
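A byte-range fetch is an ordinary GET with an HTTP Range header, so splitting a download comes down to computing the ranges. The sketch below only builds the header values and makes no network calls; the function name is an assumption for this example.

```python
def byte_ranges(object_size, chunk_size):
    """Return HTTP Range header values that together cover the whole object."""
    if object_size <= 0 or chunk_size <= 0:
        raise ValueError("sizes must be positive")
    return [
        # Range ends are inclusive, hence the "- 1".
        f"bytes={start}-{min(start + chunk_size, object_size) - 1}"
        for start in range(0, object_size, chunk_size)
    ]

ranges = byte_ranges(10, 4)
print(ranges)  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

Each value can be sent as the Range header of its own GET request, in parallel, and the responses concatenated in order. Requesting a single range is how you fetch only one part of a file.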

66
Q

What is S3 Requester Pays?

A

A setting where the requester who downloads from S3 pays the networking cost for the download (not storage). The requester must be authenticated in AWS.

67
Q

What service lets you analyze data in S3 using serverless SQL?

A

Athena.

68
Q

How do you get data from Snowball to Glacier?

A

You can’t do it directly.

Snowball to S3, then lifecycle policy to Glacier.

69
Q

When would you use FSx for Lustre?

A

Lustre = “Linux cluster”
❗️ML, HPC (video, financial modeling, etc)

Seamless integration with S3
Can be used on-prem with VPN or DirectConnect

70
Q

What are the 2 deployment options for FSx for Lustre?

A
Scratch file system
• temp storage
• not replicated
• super speedy
• FOR - short-term processing, save $

Persistent file system
• long-term storage
• replicated in same AZ (failures replaced in minutes)
• FOR - long-term processing, sensitive data

71
Q

What types of storage gateway are there?

A

File - NFS (Network File System), SMB
• S3, IA
• recently used data is cached in the gateway
• ❗️integrated with Active Directory for user authentication

Volume - iSCSI
• cached volume gateway
• stored volume gateway - all data is on-prem with scheduled backups to S3.
• backups are stored in S3 as EBS snapshots

Tape - iSCSI VTL
• Virtual Tape Library (VTL) backed by S3/Glacier

Connects to the cloud: EBS, S3, Glacier

72
Q

What if I don’t have the virtual servers to run the different gateways on-prem?

A

You can use Storage Gateway Hardware Appliance.

Will have all you need to run the gateway for you.
Good for daily NFS backups where you don’t have virtualization available.

73
Q

What is the FSx File Gateway?

A

Native access to FSx for Windows file server.
Lives on-prem, connects to FSx in the cloud.

  • ❗️Has a cache for frequently accessed data
  • windows native (Active Directory, SMB, NTFS, etc.)
74
Q

I want to transfer data in and out of S3 or EFS, but I want to use FTP instead of APIs. What are my options?

A

AWS Transfer Family
You can store user credentials or integrate with 3rd party (LDAP, Cognito, AD)

3 Flavors:
FTP (only within VPC)
SFTP
FTPS

Service assumes IAM role to access S3/EFS

75
Q

Data in an S3 bucket has a lifecycle policy that moves older data to Glacier every month. You should be able to retrieve required data in under 15 minutes and should handle up to 150 MB/s of retrieval throughput.

Which of the following should you do to meet the above requirement? (Select TWO.)

A

Expedited retrievals allow you to quickly access your data when occasional urgent requests for a subset of archives are required.

Provisioned capacity ensures that your retrieval capacity for expedited retrievals is available when you need it.

76
Q

A company plans to use a durable storage service to store on-premises database backups to the AWS cloud. To move their backup data, they need to use a service that can store and retrieve objects through standard file storage protocols for quick recovery.

Which of the following options will meet this requirement?

A

File Gateway presents a file-based interface to Amazon S3, which appears as a network file share. It enables you to store and retrieve Amazon S3 objects through standard file storage protocols. File Gateway allows your existing file-based applications or devices to use secure and durable cloud storage without needing to be modified. With File Gateway, your configured S3 buckets will be available as Network File System (NFS) mount points or Server Message Block (SMB) file shares.