S3 Flashcards

1
Q

S3

A

infinitely scaling storage

a service that allows us to store objects (= files) into buckets (= top-level directories)

buckets are not global, they are regional!

2
Q

each bucket must have

A

a globally unique name.

3
Q

buckets and regions

A

Buckets are defined at the region level, so even though the S3 console looks global, buckets are a regional resource.

4
Q

naming convention

A
  1. no uppercase letters
  2. no underscores
  3. 3 to 63 characters long
  4. it must not be formatted as an IP address
  5. it must start with a lowercase letter or a number
5
Q

objects and keys

A

in these S3 buckets, we need to create objects.

And objects are files and they must have a key.

6
Q

key

A

is the full path to that file.

s3://my-bucket/my_file.txt

So if we have a bucket named my-bucket and an object named my_file.txt, then the key is my_file.txt

7
Q

key: if we have folder structures within our S3 buckets

A

s3://my-bucket/my_folder/another_folder/my_file.txt

then the key is the full path

my_folder/another_folder/my_file.txt

8
Q

the key can be decomposed in two things

A

key prefix and the object name.

s3://my-bucket/my_folder/another_folder/my_file.txt
prefix: my_folder/another_folder/

object name: my_file.txt

9
Q

even though there’s no concept of directories within buckets,

A

just very, very long key names

the exam will try to trick you into thinking otherwise, because we can create “directories” within S3.

But in fact they are just keys with very long names that contain slashes.

10
Q

object values

A

are the content of the body.

11
Q

maximum object size on Amazon S3

A

five terabytes = 5,000 gigabytes

but you cannot upload more than five gigabytes at a time. So if you want to upload a big object of five terabytes, you must divide that object into parts of less than five gigabytes and upload these parts independently in what’s called a multi-part upload.

12
Q

object metadata

A

list of text key / value pairs that could be system or user metadata.

To add info to your objects

13
Q

object tags

A

Unicode key/value pair - up to 10

useful for security on your objects or lifecycle policies

14
Q

Version

A

Objects have a version id if versioning is enabled

it has to be enabled at the bucket level.
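
A minimal CLI sketch of enabling versioning (the bucket name is a placeholder):

# turn on versioning for one bucket
aws s3api put-bucket-versioning --bucket my-bucket \
  --versioning-configuration Status=Enabled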

15
Q

if you re upload a file version with the same key,

A

it won’t overwrite it, actually it will create a new version of that file.

16
Q

it is best practice to version your buckets

A

so that you keep all the file versions for a while: you are protected against unintended deletes, because you can restore a previous version.

And also, you can easily roll back to any previous version you want.

17
Q

Any file that is not versioned prior to enabling versioning

A

will have the version null.

18
Q

if you suspend versioning in your bucket,

A

it does not delete the previous versions, it just makes sure that future files do not have a version assigned to them.

19
Q

S3 encryption for objects - 4 methods

A
  1. SSE-S3
  2. SSE-KMS
  3. SSE-C
  4. Client Side Encryption
20
Q

SSE-S3

A

server side encryption AES-256

encrypts S3 objects using keys handled and managed by AWS

21
Q

SSE-KMS

A

server side encryption

encryption using keys handled and managed by AWS KMS

when you want more control over the encryption keys (KMS customer master keys)

22
Q

SSE-C

A

when you want to manage your own encryption keys

server side encryption using data keys fully managed by the customer outside of AWS

S3 does not store the key you provide. It will be discarded after usage

23
Q

SSE-S3 how

A

when uploading a file you can use HTTP or HTTPS, and in the header you set

x-amz-server-side-encryption: AES256

And AWS will know that it has to apply its own managed data key
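
A minimal CLI sketch of the same idea (bucket, key, and file names are placeholders):

# SSE-S3: ask S3 to encrypt the object with its own managed keys
aws s3api put-object --bucket my-bucket --key my_file.txt \
  --body my_file.txt --server-side-encryption AES256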

24
Q

SSE-KMS how

A

when uploading a file you can use HTTP or HTTPS, and in the header you set

x-amz-server-side-encryption: aws:kms

And AWS will know that it has to apply the KMS customer master key you have defined
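
A hedged CLI sketch (bucket, key, file, and the KMS key ID are placeholders):

# SSE-KMS: encrypt with a customer master key managed in KMS
aws s3api put-object --bucket my-bucket --key my_file.txt --body my_file.txt \
  --server-side-encryption aws:kms --ssekms-key-id <your-kms-key-id>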

25
Q

why would you choose SSE-KMS over SSE-S3

A

control over who has access to which keys, plus an audit trail of key usage

26
Q

SSE-C how

A

you have to use HTTPS because you are sending a secret to AWS so you must have encryption in transit

encryption key must be provided in HTTP headers, for every HTTP request made
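
A rough sketch with the higher-level CLI commands, assuming you have already generated your own 256-bit key (all names are placeholders):

# SSE-C: you supply the key yourself; the CLI uses the HTTPS endpoint by default
aws s3 cp my_file.txt s3://my-bucket/my_file.txt \
  --sse-c AES256 --sse-c-key fileb://my-key.bin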

27
Q

Client Side Encryption

A

you encrypt the file before uploading it to S3

some client libraries can help you do this, for example the Amazon S3 Encryption Client

you are solely responsible for decrypting the data

28
Q

Amazon S3 as HTTP service

A

it exposes an HTTP endpoint that is not encrypted, and an HTTPS endpoint that provides encryption in flight (SSL/TLS certificates)

29
Q

S3 Security

A
  1. User based
  2. Resource Based

Other:

  1. Networking
  2. Logging and Audit
  3. User Security
30
Q

S3 Security - User based

A

IAM users have IAM policies, and they authorize which API calls should be allowed

If our user is authorized through an IAM policy to access our Amazon S3 bucket, then it will be able to do so.

31
Q

S3 Security - Resource based

A
  1. S3 bucket policies
  2. Object Access Control List - we set at the object level the access rule.
  3. Bucket Access Control List - less common
  4. Bucket settings for Block Public Access
32
Q

S3 bucket policies

A

= bucket-wide rules that we can set in the S3 console

They will say what principals can and cannot do on our S3 bucket.

And this enables us to do cross account access to our S3 buckets.

JSON-based policies

can be applied to your buckets and objects
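
For illustration, a hedged sketch of attaching a bucket policy that grants public read access (bucket name and policy content are placeholders, not a recommendation):

aws s3api put-bucket-policy --bucket my-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicRead",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-bucket/*"
  }]
}'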

33
Q

An IAM principal, so it can be a user, a role,

can access an S3 object if

A

the IAM permissions allow it, so that means that you have an IAM policy attached to that principal that allows access to your S3 bucket,

OR

if the resource policy, so usually your S3 bucket policy, allows it.

AND

you need to make sure there is no explicit deny.

So if your user is allowed through IAM to access your S3 bucket, but your bucket policy explicitly denies that user access, then the user will not be able to access it.

34
Q

common use cases for S3 bucket policies

A
  1. to grant public access to a bucket
  2. to force objects to be encrypted at the upload time,
  3. to grant access to another account using cross account S3 bucket policies.
35
Q

Bucket settings for Block Public Access

A

block public access to your S3 bucket

if you know that your buckets should never, ever be public, leave these on.

these settings were created historically to prevent company data leaks: there were a lot of Amazon S3 bucket leaks in the news, so AWS came up with a way for anyone to say, “hey, none of my buckets are public”, and that was very popular.

there’s a way to set these at the account level
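
A minimal CLI sketch of turning all four settings on for one bucket (bucket name is a placeholder); the account-level equivalent lives under the aws s3control command:

aws s3api put-public-access-block --bucket my-bucket \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true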

36
Q

S3 Security through Networking

A

you can access S3 privately through VPC endpoints.

if you have EC2 instances in your VPC without internet access, then they can access S3 privately through what’s called a VPC endpoint.

37
Q

S3 Security: Logging and Audit

A

For logging and audit, you can use S3 access logs, and they can be stored in another S3 bucket.

API calls can also be logged into CloudTrail, which is a service to log API calls in your accounts.

38
Q

S3 Security: User Security

A
  1. MFA Delete
  2. Pre-Signed URLs

39
Q

MFA Delete

A

multifactor authentication

if you want to delete a specific object version in your bucket, then you can enable MFA Delete, and you will need to be authenticated with MFA to be able to delete the object.

40
Q

Pre-Signed URLs

A

a URL that’s signed with some credentials from AWS
and it’s valid only for a limited time.

use case for it, for example, is to download a premium video from a service if the user is logged in and has purchased that video.

If at the exam you see giving access to certain files to certain users for a limited amount of time, think pre-signed URLs.

41
Q

S3 Websites

A

S3 can host static websites and have them accessible on WWW

42
Q

S3 Websites URL

A

<bucket-name>.s3-website-<AWS-region>.amazonaws.com
or
<bucket-name>.s3-website.<AWS-region>.amazonaws.com

if you get 403 Forbidden error - make sure the bucket policy allows public reads
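
A hedged sketch of enabling static website hosting from the CLI (bucket and document names are placeholders):

aws s3 website s3://my-bucket/ --index-document index.html --error-document error.html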

43
Q

CORS

A

Cross-Origin Resource Sharing

we want to get resources from a different origin.

Web browsers have this security (CORS) in place: as soon as you visit a website, you can make requests to other origins only if those other origins allow you to make these requests.

browser-based security

44
Q

origin

A

is a scheme (a protocol), a host (domain), and a port.

if you do https://www.example.com, this is an origin where the scheme is HTTPS, the host is www.example.com, and the port is port 443.

45
Q

same origin

A

you go on example.com/app1

or example.com/app2.

46
Q

cross-origin request

A

if you visit, for example, www.example.com and then ask your web browser to make a request to other.example.com, this is what’s called a cross-origin request, and your web browser will block it unless you have the correct CORS headers (Access-Control-Allow-Origin).

47
Q

CORS example

A

our web browser visits our first web server. it’s called the origin. So for example, our web server is at https://www.example.com

And there is a second web server called a cross-origin
because it has a different url, which is https://www.other.com.

So the web browser visits our first origin, and the files downloaded from that origin ask it to make a request to the cross-origin.

So the web browser will do what is called a preflight request. And this preflight request is going to ask
the cross-origin if it is allowed to do a request on it.

So it’s going to say, “Hey cross-origin, the website https://www.example.com is sending me to you, can I make a request onto your website?”

and the cross-origin responds, “yes, here is what you can do.”

So the CORS response headers (Access-Control-Allow-Origin and Access-Control-Allow-Methods) say, for example, that the authorized methods are GET, PUT, and DELETE.

So we can get a file, update a file, or delete a file.

48
Q

S3 CORS

A

if a client does a cross-origin request on our S3 bucket enabled as a website, we can allow a specific origin by specifying the entire origin name, or use * (star) for all origins.

the CORS headers have to be defined on the cross-origin bucket, not the first origin bucket
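
As a sketch, a CORS rule like the following would be set on the cross-origin bucket (bucket name and origin are placeholders):

aws s3api put-bucket-cors --bucket my-cross-origin-bucket --cors-configuration '{
  "CORSRules": [{
    "AllowedOrigins": ["https://www.example.com"],
    "AllowedMethods": ["GET"],
    "AllowedHeaders": ["*"]
  }]
}'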

49
Q

S3 CORS example

A

The web browser, for example, is getting HTML files from our bucket, and our bucket is enabled as a website. But there is a second bucket, our cross-origin bucket, also enabled as a website, that contains some files we want.

We do a GET index.html and the website says, okay, here is your index.html, and that file says you need to perform a GET for another file on the other origin. If the other bucket is configured with the right CORS headers, the web browser will be able to make the request; if not, it will not.

50
Q

Amazon S3 Consistency Model

A

Amazon S3 is made up of multiple servers and so when you write to Amazon S3, the other servers are going to replicate data between each other.

And this is what leads to different consistency issues.

  1. Read after Write consistency for PUTs of new objects
  2. Eventual Consistency for DELETEs and PUTs of existing objects
51
Q

Read after Write consistency for PUTs of new objects

A

as soon as you upload a new object, once you get a correct response from Amazon S3, then you can do a GET of that object and get it.

This is true except if you do a GET before doing the PUT, to check if the object exists. So if you do a GET and get a 404 (not existing), then do a PUT, there is a chance you will still get a 404 on the next GET even though the object was already uploaded,
and this is what’s called eventually consistent.

52
Q

Eventual Consistency for DELETEs and PUTs of existing objects

A

if you read an object right after updating it, you may get the older version of that object. So if you do a PUT on an existing object so PUT 200, then you do another PUT 200, and then you do a GET, then the GET might return the older version if you are very, very quick.

And if you delete an object, you might still be able
to retrieve it for a very short time.

And the way that you get the newer version is just to wait a little bit.

53
Q

strong consistency

A

There is no way to request strong consistency in Amazon S3, you only get eventual consistency
and there’s no API to get strong consistency. So that means that if you overwrite an object, you need to wait a little bit before you are certain that the GET returns the newest version of your object.

54
Q

You’re trying to upload a 25 GB file on S3 and it’s not working

A

Use multi-part upload: it is required for files over five gigabytes, and it is also recommended as soon as the file is over 100 MB.

55
Q

Your client wants to make sure the encryption is happening in S3, but wants to fully manage the encryption keys and never store them in AWS. You recommend

A

SSE-C

Here you have full control over the encryption keys, and let AWS do the encryption

56
Q

Your company wants data to be encrypted in S3, and maintain control of the rotation policy for the encryption keys, but not know the encryption keys values. You recommend

A

With SSE-KMS you let AWS manage the encryption keys but you have full control of the key rotation policy

57
Q

Your company does not trust S3 for encryption and wants it to happen on the application. You recommend

A

With Client Side Encryption you perform the encryption yourself and send the encrypted data to AWS directly. AWS does not know your encryption keys and cannot decrypt your data.

58
Q

The bucket policy allows our users to read/write files in the bucket, yet we were not able to perform a PutObject API call.

A

the IAM user must have an explicit DENY in the attached IAM policy

Explicit DENY in an IAM policy will take precedence over a bucket policy permission

59
Q

You have a website that loads files from another S3 bucket. When you try the URL of the files directly in your Chrome browser it works, but when the website you’re visiting tries to load these files it doesn’t. What’s the problem?

A

CORS is wrong
Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. To learn more about CORS, go here: https://docs.aws.amazon.com/AmazonS3/latest/dev/cors.html

60
Q

to use MFA-Delete

A

we have to first enable versioning on the S3 bucket

61
Q

when do we need MFA

A

  1. to permanently delete an object version
  2. to suspend versioning on the bucket

62
Q

you don’t need MFA to

A

  1. enable versioning
  2. list deleted versions

63
Q

who can enable and disable MFA-Delete

A

only the bucket owner, which is the root account

So even if you have an administrator account, you cannot enable MFA-Delete.

For now, MFA-Delete can only be enabled or disabled using the CLI.
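
A hedged CLI sketch, run with the root account’s credentials (bucket name, MFA device ARN, and MFA code are placeholders):

aws s3api put-bucket-versioning --bucket my-bucket \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"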

64
Q

S3 Access Logs

A

for audit purposes, you may want to log all the access into your S3 buckets. Any request that is done to Amazon S3 from any account, authorized or denied,
you want it to be logged into another S3 bucket so you can analyze it later.

Once we’ve enabled S3 Access Logs, S3 will log all the requests into the logging bucket.
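
A minimal sketch of pointing access logs at a separate logging bucket (both bucket names and the prefix are placeholders):

aws s3api put-bucket-logging --bucket my-bucket --bucket-logging-status '{
  "LoggingEnabled": {"TargetBucket": "my-logging-bucket", "TargetPrefix": "logs/"}
}'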

65
Q

S3 Access Logs - When setting up logging bucket

A

never, ever set your logging bucket to be the bucket you are monitoring.

Otherwise, if you set the logging bucket and the monitoring bucket to be exactly the same, then it will create a logging loop and your bucket will grow in size exponentially.

66
Q

Amazon S3 replication

A

we have an S3 bucket and we want to replicate it into another bucket.

To do so we first must enable versioning in the source and destination buckets. And then we can set up two different things:

  1. CRR = cross-region replication, if the two buckets are in different regions
  2. SRR = same-region replication, if the two buckets are in the same region

buckets can be in different accounts!!!

The copying happens asynchronously, but it is very, very quick, and for the copying to happen, you need to create an IAM role that gives S3 the permissions to read from the source bucket and write to the destination bucket.
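
A rough sketch of a replication rule (bucket names and role ARN are placeholders; versioning must already be enabled on both buckets):

aws s3api put-bucket-replication --bucket my-source-bucket --replication-configuration '{
  "Role": "arn:aws:iam::123456789012:role/my-replication-role",
  "Rules": [{
    "ID": "replicate-everything",
    "Status": "Enabled",
    "Prefix": "",
    "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"}
  }]
}'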

67
Q

use cases for cross-region replication

A
  1. for compliance
  2. lower latency access of your data into other regions,
  3. to do cross-account replication
68
Q

use cases for same-region replication

A
  1. log aggregation: you have different logging buckets and you wanna centralize them into one bucket
  2. live replication between a production and your test accounts
69
Q

After you activate S3 replication,

A

only the new objects are replicated. So it’s not retroactive, it will not copy the existing state of your S3 bucket.

70
Q

Replication: if you delete without a version ID,

A

it will add a delete marker, and that delete marker is not replicated. If you delete with a version ID, the delete happens only in the source bucket and is not replicated either.

delete operations are not replicated.

71
Q

chaining of replication

A

there is no chaining of replication. That means that if bucket one has replication to bucket two, which has replication to bucket three, then any object created in bucket one will be replicated to bucket two, but will not be replicated to bucket three.

72
Q

we can generate pre-signed URL using

A

either the SDK or the CLI

but for uploads, it’s a bit harder and you must use the SDK

aws s3 presign s3://mybucket/myobject --region my-region --expires-in 300

73
Q

when you generate a pre-signed URL, by default it will have an expiration

A

one hour

you can change that timeout using the --expires-in parameter

you specify the time in seconds.

74
Q

why would you generate a pre-signed URL for another user

A
  1. you want to allow only logged-in users to download a premium video from your S3 bucket, and you only want to give the download link for maybe 15 minutes to a premium user that is logged in.
  2. you have an ever-changing list of users that need to download files, and you don’t want to give them access directly to your bucket because it could be very dangerous, or it’s not maintainable with so many new users all the time. So you generate URLs dynamically and give them out over time, pre-signing all of them.
  3. you want to temporarily allow a user to upload a file to a precise location in your bucket, for example, to upload a profile picture directly onto your S3 bucket.
75
Q

if you are getting issues creating pre-signed URL,

A

set the proper signature version in order not to get issues when generating URLs for encrypted files

aws configure set default.s3.signature_version s3v4

76
Q

what is the durability of S3 storage classes

A

99.999999999% (11 nines) for all of them

very high durability. So if you store 10 million objects,
you can on average expect to incur a loss of a single object, once every 10,000 years.

77
Q

what is the availability of S3 storage classes

A

99.9% for all except S3 One Zone-IA (99.5%)

78
Q

what is the availability SLA of S3 storage classes

A

99.9% for S3 Standard, Glacier and Glacier Deep Archive
99% for others

If availability drops below the SLA, Amazon guarantees to reimburse you.

79
Q

what is the number of Availability Zones for all the storage classes

A

3 at least

except for S3 One Zone-IA (1)

80
Q

Amazon S3 Standard storage class

A

for general purpose

it can sustain two concurrent facility failures. So, it’s really resistant to AZ disasters.

81
Q

S3 Infrequent Access or S3 IA

A

suitable for data that is less frequently accessed
but requires a rapid access when needed.

it has a little less availability, and it is lower cost compared to Amazon S3 Standard.

The idea is that if you access your object less you won’t need to pay as much.

It can sustain two concurrent facility failures.

82
Q

S3 One Zone IA

A

this is the same as S3 IA,

but now the data is stored in a single availability zone

we have the same durability within the single AZ, but if that AZ is destroyed, then you would lose your data.

You have less availability 99.5% availability, and you have still the low latency and high throughput performance you would expect from S3.

It’s lower cost by 20% compared to Infrequent Access, and it supports SSL for data in transit and encryption at rest.

83
Q

S3 Intelligent Tiering

A

is going to move data between your storage classes, intelligently

it has the same low latency and high throughput as S3 Standard, but there’s a small monthly monitoring fee
and auto-tiering fee.

it will automatically move objects between the access tiers based on the access patterns. So it will move objects between S3 General Purpose and S3 IA, and it will choose for you if your object is less frequently accessed or not. And you’re going to pay a small fee for S3 to do that monitoring. The durability is the same.

it can resist an entire event that impacts an availability zone.

84
Q

Amazon Glacier

A

for archives
low cost object storage meant really for archiving and backups.

the data needs to be retained for a very long time, tens of years

It’s a big alternative to on-premises magnetic tape storage, where you would store data on magnetic tapes and put those tapes away. If you wanted to retrieve the data from one of those tapes, you would have to find the tape manually, load it somewhere, and then restore the data from it.

Amazon Glacier is really to retrieve files and not have some kind of urgency around it.

the minimum storage duration for Glacier is going to be 90 days.

85
Q

Amazon Glacier Deep Archive

A

for the archives you don’t need right away

super long-term storage and even cheaper. But this time the retrieval options are Standard (12 hours), so you cannot retrieve a file in less than 12 hours, and Bulk (up to 48 hours) if you have multiple files and can wait, which is even cheaper.

the minimum storage duration is 180 days

86
Q

use case for Amazon S3 Standard storage class

A

big data analytics, mobile and gaming applications, content distribution.

87
Q

use case for S3 Infrequent Access or S3 IA storage class

A

a data store for disaster recovery, backups, or any files that you expect to access way less frequently.

88
Q

use case for One Zone IA storage class

A

to store secondary backup copies of on-premise data
or storing any type of data we can recreate.

for example, we can recreate thumbnails from an image so we can store the image on S3 General Purpose and we can store the thumbnail
on S3 One Zone Infrequent Access. And if we need to recreate that thumbnail over time we can easily do that from the main image.

89
Q

each item in Glacier is not

A

called an object. It’s called an Archive, and each Archive can be a file of up to 40 terabytes. And Archives are stored not in buckets but in vaults.

90
Q

Amazon Glacier retrieval options

A
  1. Expedited: one to five minutes. You request your file and within one to five minutes you get it back. More expensive.
  2. Standard: three to five hours.
  3. Bulk: when you retrieve multiple files at the same time, which takes between five and 12 hours to give you back your files.
91
Q

Minimum capacity charge per object for storage classes

A

Standard and Intelligent Tiering - none

Standard IA and One Zone IA - 128 KB

Glacier Archives - 40KB

92
Q

storage classes fees

A

there is a retrieval fee for all of them except Standard and Intelligent Tiering

there is a minimum storage duration charge for all of them except Standard

Storage cost from the most expensive to the least:

Standard 
Intelligent Tiering 
Standard IA
One Zone IA
Glacier
Deep Archive
93
Q

you can transition objects between storage classes

A

there is a giant graph on the AWS website that describes which transitions are allowed

moving all these objects around all these classes can be done manually, but it can also be done automatically using something called a lifecycle configuration.

94
Q

Lifecycle Rules

A

You can define Transition actions and Expiration actions

rules can be applied to a specific prefix. So if you have all your MP3 files within the “mp3/” folder (prefix), then you can set a lifecycle rule just for that specific prefix.

you can also create rules for certain object tags. If you want a rule that applies just to the objects that are tagged “Department: Finance”, then you can do so.
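
A hedged sketch combining a transition action and an expiration action, scoped to a prefix (bucket name, prefix, and day counts are placeholders):

aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration '{
  "Rules": [{
    "ID": "archive-then-expire",
    "Status": "Enabled",
    "Filter": {"Prefix": "mp3/"},
    "Transitions": [{"Days": 60, "StorageClass": "STANDARD_IA"}],
    "Expiration": {"Days": 365}
  }]
}'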

95
Q

Expiration actions

A

to delete an object after some time. For example, your access log files: maybe you don’t need them after a year. So after a year you say, “Hey, all my files that are over a year old, please delete them. Please expire them.”

It can also be used to delete old versions of a file. So if you have versioning enabled and you keep on overwriting a file, and you know you won’t need the previous versions after maybe 60 days, then you can configure an expiration action to expire the old versions of objects after 60 days.

It can also be used to clean up incomplete multi-part uploads, in case some parts have been hanging around for a long time and you know they will never be completed. Then you would set up an expiration action to remove these parts.

96
Q

transition actions

A

helpful when you want to transition your objects from one storage class to another.

For example, you’re saying, “Move objects to Standard IA class “60 days after creation “and then move to Glacier for archiving, six months later.”

97
Q

So your application EC2 creates images thumbnails
after profile photos are uploaded to Amazon S3.
And these thumbnails can be easily recreated
and will only need to be kept for 45 days.
The source images should be able to be immediately retrieved for these 45 days. And afterwards the user can wait up to six hours.

How would you design this solution?

A
  1. The S3 source images can be on the Standard class, and you can set up a lifecycle configuration to transition them to GLACIER after 45 days.

Why? Because they need to be archived afterwards, and we can wait up to six hours to retrieve them.

  2. The thumbnails can be on ONEZONE_IA, and we can set up a lifecycle configuration to expire them (delete them) after 45 days.

Why? Because we can recreate them, and ONEZONE_IA is cheaper. In case we lose an entire AZ in AWS, we can easily recreate all the thumbnails from the source images.
98
Q

There’s a rule in your company that states that you should be able to recover your deleted S3 objects immediately for 15 days. Although this may happen rarely, after this time and up to one year, deleted objects should be recoverable within 48 hours.

So how would you design this to make it cost effective?

A
  1. You need to enable S3 versioning, because we want to delete files but still be able to recover them. With S3 versioning, we have object versions, and deleted objects are hidden behind a delete marker, so they can be easily recovered.
  2. We’re also going to have non-current versions (the object versions from before). We want to transition them into S3_IA, because it’s very unlikely that these old object versions will be accessed, but if they are, we need to recover them immediately.
  3. And then, after this 15-day grace period, you can transition the non-current versions into DEEP_ARCHIVE (kept for up to 365 days), where they are archived and recoverable within 48 hours.

Why don’t we just use Glacier? Because Glacier would cost us a little bit more money, and since we have a 48-hour timeline, DEEP_ARCHIVE is enough.

So we can use the tiers all the way down to DEEP_ARCHIVE and get even more savings.

99
Q

S3 Baseline Performance

A

by default Amazon S3 automatically scales to a very high number of requests and has very low latency, between 100 and 200 milliseconds, to get the first byte out of S3.

in terms of how many requests per second you can get:

  • 3,500 PUT/COPY/POST/DELETE requests per second per prefix, and
  • 5,500 GET/HEAD requests per second per prefix in a bucket.

the prefix is anything between the bucket and the file. There is no limit to the number of prefixes in your bucket.

100
Q

KMS limitation to S3 performance

A

if you have KMS encryption on your objects using SSE-KMS - then you may be impacted by the KMS limits.

When you upload a file using SSE-KMS, S3 will call the GenerateDataKey KMS API on your behalf, and when you download a file from S3 using SSE-KMS, S3 will call the Decrypt KMS API.

By default, KMS has a quota on the number of requests per second, and based on the region you’re in it could be 5,500, 10,000, or 30,000 requests per second, and you cannot change that quota.
So if you have more than 10,000 requests per second in a specific region that only supports 5,500 requests per second for KMS, then you will be throttled.

These quotas are pretty big for normal usage but still good to know if you have many many files and a high usage of your S3 bucket.

101
Q

how to optimize performance

A
  1. multi-part upload

2. S3 Transfer Acceleration

102
Q

multi-part upload

A

it is recommended to use multi-part upload for files that are over 100 MB, and it must be used for files that are over five gigabytes.

It parallelizes uploads, which helps us speed up transfers and maximize bandwidth.

We divide the file into parts (smaller chunks), and each of these parts is uploaded in parallel to Amazon S3. Once all the parts have been uploaded, Amazon S3 is smart enough to put them back together into the big file.
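
For illustration, the high-level aws s3 commands already use multi-part upload under the hood; the thresholds are configurable (the values and file names here are placeholders):

# lower the threshold and chunk size, then upload as usual
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 50MB
aws s3 cp ./big-file.bin s3://my-bucket/big-file.bin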

103
Q

S3 Transfer Acceleration

A

only for upload, not for download.

To increase transfer speed by transferring a file to an AWS edge location, which will then forward the data to the S3 bucket in the target region.

transfer acceleration is compatible with multi-part upload.
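
A minimal sketch: enable acceleration on the bucket, then tell the CLI to use the accelerate endpoint (bucket name is a placeholder):

aws s3api put-bucket-accelerate-configuration --bucket my-bucket \
  --accelerate-configuration Status=Enabled
aws configure set default.s3.use_accelerate_endpoint true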

104
Q

S3 Byte-Range Fetches

A

reading a file in the most efficient way, speed up downloads

to parallelize GETs by getting specific byte ranges for your files.

also, in case you have a failure getting a specific byte range, you can retry just that smaller byte range, so you have better resilience in case of failures.
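
A hedged sketch of fetching only the first 50 bytes of an object (bucket, key, and output file are placeholders):

aws s3api get-object --bucket my-bucket --key my_file.txt --range bytes=0-49 header-part.bin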

105
Q

S3 Byte-Range Fetches example

A
  1. A very big file to download:

you request the first part, which is the first few bytes of the file, then the second part, and so on up to the Nth part.

So, we request all these parts as specific bytes range fetches. And all these requests can be made in parallel.

  2. The second use case is to retrieve only a partial amount of the file. For example, if you know that the first fifty bytes of the file in S3 are a header that gives you some information about the file, then you can just issue a byte-range request for those first fifty bytes.
106
Q

S3 Transfer Acceleration example

A

We have a file in the US and we want to upload it to an S3 bucket in Australia. We will upload that file through an edge location in the US, which will be very, very quick, and then the edge location will transfer it to the Amazon S3 bucket in Australia over the fast private AWS network.

We minimized the amount of public internet that we go through. And, we maximize the amount of private AWS network that we go through.

107
Q

S3 Select and Glacier Select

A

we want to retrieve less data by performing server side filtering using SQL statements.

you will use less network and less CPU cost client-side
because you don’t retrieve the full file. S3 will perform the filtering for you and only return to you what you need.

you are up to 400% faster and up to 80% cheaper
because you have less network traffic going through
and the filtering happens server-side

108
Q

S3 Select and Glacier Select example

A

We have the client asking, with S3 Select, to get only a few columns and a few rows of a CSV file.
Amazon S3 will perform server-side filtering on that CSV file to find the right columns and rows we want, and send the filtered data back to our client: less network, less CPU, and faster.

109
Q

S3 Select and Glacier Select how it works

A

In the request, along with the SQL expression, you must also specify a data serialization format (JSON, CSV, or Apache Parquet) of the object.

S3 uses this format to parse object data into records, and returns only records that match the specified SQL expression.

You must also specify the data serialization format for the response.
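
A rough CLI sketch against a CSV object (bucket, key, query, and output file are placeholders):

aws s3api select-object-content --bucket my-bucket --key data.csv \
  --expression "SELECT s._1, s._2 FROM S3Object s" --expression-type SQL \
  --input-serialization '{"CSV": {"FileHeaderInfo": "NONE"}}' \
  --output-serialization '{"CSV": {}}' filtered.csv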

110
Q

S3 event notifications

A

some events happen in your S3 bucket, for example, this could be a new object created, an object removed, an object that has been restored, or there is some S3 replication happening.

And you want to be able to react to all these events. You can create rules, and for these rules you can also filter by object name. For example, you want to react only to JPEG files, so *.jpg.

You can create as many S3 events as desired, and most of the time they will be delivered in seconds,
but sometimes it can take a minute or longer.
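
A hedged sketch of invoking a Lambda function for newly created .jpg objects (bucket name and function ARN are placeholders; the function would also need permission for S3 to invoke it):

aws s3api put-bucket-notification-configuration --bucket my-bucket --notification-configuration '{
  "LambdaFunctionConfigurations": [{
    "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:generate-thumbnail",
    "Events": ["s3:ObjectCreated:*"],
    "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}}
  }]
}'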

111
Q

S3 event notifications use case

A

to generate thumbnails of images uploaded to Amazon S3.

112
Q

what are the possible targets for S3 event notifications?

A
  1. SNS = simple notification service to send notifications and emails
  2. SQS for a simple queue service to add messages into a queue
  3. Lambda Functions to run some custom code.
113
Q

if you want to make sure every single event notification is delivered,

A

you need to enable versioning on your bucket.