S3 Flashcards

1
Q

S3

A

infinitely scaling storage

a service that allows us to store objects (= files) into buckets (= top-level directories)

buckets are not global, they are regional!

2
Q

each bucket must have

A

a globally unique name.

3
Q

buckets and regions

A

Buckets are defined at the region level, so even though the S3 console looks global, buckets are a regional resource.

4
Q

naming convention

A
  1. no uppercase letters
  2. no underscores
  3. 3 to 63 characters long
  4. it must not be formatted as an IP address
  5. it must start with a lowercase letter or a number
5
Q

objects and keys

A

in these S3 buckets, we need to create objects.

And objects are files and they must have a key.

6
Q

key

A

is the full path to that file.

s3://my-bucket/my_file.txt

So if we have a bucket named my-bucket and an object named my_file.txt, then the key is my_file.txt

7
Q

key: if we have folder structures within our S3 buckets

A

s3://my-bucket/my_folder/another_folder/my_file.txt

then the key is the full path

my_folder/another_folder/my_file.txt

8
Q

the key can be decomposed in two things

A

key prefix and the object name.

s3://my-bucket/my_folder/another_folder/my_file.txt
prefix: my_folder/another_folder/

object name: my_file.txt

9
Q

even though there’s no concept of directories within buckets,

A

just very, very long key names

the exam will try to trick you into thinking otherwise, because we can create “directories” within S3.

But in fact they are just keys with very long names that contain slashes.

10
Q

object values

A

are the content of the body.

11
Q

maximum object size on Amazon S3

A

five terabytes = 5,000 gigabytes

but you cannot upload more than five gigabytes at a time. So if you want to upload a big object of five terabytes, you must divide that object into parts of less than five gigabytes and upload these parts independently in what’s called a multi-part upload.

12
Q

object metadata

A

list of text key / value pairs that could be system or user metadata.

To add info to your objects

13
Q

object tags

A

Unicode key/value pair - up to 10

useful for security on your objects or lifecycle policies

14
Q

Version

A

Objects have a version id if versioning is enabled

it has to be enabled at the bucket level.
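
A minimal CLI sketch of enabling versioning (the bucket name is a placeholder):

# turn on versioning for one bucket
aws s3api put-bucket-versioning --bucket my-bucket \
  --versioning-configuration Status=Enabled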

15
Q

if you re upload a file version with the same key,

A

it won’t overwrite it, actually it will create a new version of that file.

16
Q

it is best practice to version your buckets

A

so that you keep all the file versions for a while: you are protected against unintended deletes, because you can restore a previous version.

And also, you can easily roll back to any previous version you want.

17
Q

Any file that is not versioned prior to enabling versioning

A

will have the version null.

18
Q

if you suspend versioning in your bucket,

A

it does not delete the previous versions, it just makes sure that future files do not have a version assigned to them.

19
Q

S3 encryption for objects - 4 methods

A
  1. SSE-S3
  2. SSE-KMS
  3. SSE-C
  4. Client Side Encryption
20
Q

SSE-S3

A

server side encryption AES-256

encrypts S3 objects using keys handled and managed by AWS

21
Q

SSE-KMS

A

server side encryption

encryption using keys handled and managed by AWS KMS

when you want more control over the encryption keys (KMS customer master keys)

22
Q

SSE-C

A

when you want to manage your own encryption keys

server side encryption using data keys fully managed by the customer outside of AWS

S3 does not store the key you provide. It will be discarded after usage

23
Q

SSE-S3 how

A

when uploading a file you can use HTTP or HTTPS, and in the header you set

x-amz-server-side-encryption: AES256

And AWS will know that it has to apply its own managed data key
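
A minimal CLI sketch of the same idea (bucket, key, and file names are placeholders):

# SSE-S3: ask S3 to encrypt the object with its own managed keys
aws s3api put-object --bucket my-bucket --key my_file.txt \
  --body my_file.txt --server-side-encryption AES256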

24
Q

SSE-KMS how

A

when uploading a file you can use HTTP or HTTPS, and in the header you set

x-amz-server-side-encryption: aws:kms

And AWS will know that it has to apply the KMS customer master key you have defined
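
A hedged CLI sketch (bucket, key, file, and the KMS key ID are placeholders):

# SSE-KMS: encrypt with a customer master key managed in KMS
aws s3api put-object --bucket my-bucket --key my_file.txt --body my_file.txt \
  --server-side-encryption aws:kms --ssekms-key-id <your-kms-key-id>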

25
Q

why would you choose SSE-KMS over SSE-S3

A

control over who has access to which keys, plus an audit trail of key usage

26
Q

SSE-C how

A

you have to use HTTPS because you are sending a secret to AWS so you must have encryption in transit

encryption key must be provided in HTTP headers, for every HTTP request made
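
A rough sketch with the higher-level CLI commands, assuming you have already generated your own 256-bit key (all names are placeholders):

# SSE-C: you supply the key yourself; the CLI uses the HTTPS endpoint by default
aws s3 cp my_file.txt s3://my-bucket/my_file.txt \
  --sse-c AES256 --sse-c-key fileb://my-key.bin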

27
Q

Client Side Encryption

A

you encrypt the file before uploading it to S3

some client libraries can help you do this, for example the Amazon S3 Encryption Client

you are solely responsible for decrypting the data

28
Q

Amazon S3 as HTTP service

A

it exposes an HTTP endpoint that is not encrypted, and an HTTPS endpoint that provides encryption in flight (SSL/TLS certificates)

29
Q

S3 Security

A
  1. User based
  2. Resource Based

Other:

  1. Networking
  2. Logging and Audit
  3. User Security
30
Q

S3 Security - User based

A

IAM users have IAM policies, and they authorize which API calls should be allowed

If our user is authorized through an IAM policy to access our Amazon S3 bucket, then it will be able to do so.

31
Q

S3 Security - Resource based

A
  1. S3 bucket policies
  2. Object Access Control List - we set at the object level the access rule.
  3. Bucket Access Control List - less common
  4. Bucket settings for Block Public Access
32
Q

S3 bucket policies

A

= bucket-wide rules that we can set in the S3 console

They will say what principals can and cannot do on our S3 bucket.

And this enables us to do cross account access to our S3 buckets.

JSON-based policies

can be applied to your buckets and objects
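
For illustration, a hedged sketch of attaching a bucket policy that grants public read access (bucket name and policy content are placeholders, not a recommendation):

aws s3api put-bucket-policy --bucket my-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicRead",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-bucket/*"
  }]
}'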

33
Q

An IAM principal, so it can be a user, a role,

can access an S3 object if

A

the IAM permissions allow it, so that means that you have an IAM policy attached to that principal that allows access to your S3 bucket,

OR

if the resource policy, so usually your S3 bucket policy, allows it.

AND

you need to make sure there is no explicit deny.

So if your user is allowed through IAM to access your S3 bucket, but your bucket policy explicitly denies that user access, then the user will not be able to access it.

34
Q

common use cases for S3 bucket policies

A
  1. to grant public access to a bucket
  2. to force objects to be encrypted at the upload time,
  3. to grant access to another account using cross account S3 bucket policies.
35
Q

Bucket settings for Block Public Access

A

block public access to your S3 bucket

if you know that your buckets should never, ever be public, leave these on.

these settings were created historically to prevent company data leaks: there were a lot of Amazon S3 bucket leaks in the news, so AWS came up with a way for anyone to say, “hey, none of my buckets are public”, and that was very popular.

there’s a way to set these at the account level
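
A minimal CLI sketch of turning all four settings on for one bucket (bucket name is a placeholder); the account-level equivalent lives under the aws s3control command:

aws s3api put-public-access-block --bucket my-bucket \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true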

36
Q

S3 Security through Networking

A

you can access S3 privately through VPC endpoints.

if you have EC2 instances in your VPC without internet access, then they can access S3 privately through what’s called a VPC endpoint.

37
Q

S3 Security: Logging and Audit

A

For logging and audit, you can use S3 access logs, and they can be stored in another S3 bucket.

API calls can also be logged into CloudTrail, which is a service to log API calls in your accounts.

38
Q

S3 Security: User Security

A
  1. MFA Delete
  2. Pre-Signed URLs

39
Q

MFA Delete

A

multifactor authentication

if you want to delete a specific object version in your bucket, then you can enable MFA Delete, and you will need to be authenticated with MFA to be able to delete the object.

40
Q

Pre-Signed URLs

A

a URL that’s signed with some credentials from AWS
and it’s valid only for a limited time.

use case for it, for example, is to download a premium video from a service if the user is logged in and has purchased that video.

If at the exam you see giving access to certain files to certain users for a limited amount of time, think pre-signed URLs.

41
Q

S3 Websites

A

S3 can host static websites and have them accessible on WWW

42
Q

S3 Websites URL

A

<bucket-name>.s3-website-<AWS-region>.amazonaws.com
or
<bucket-name>.s3-website.<AWS-region>.amazonaws.com

if you get 403 Forbidden error - make sure the bucket policy allows public reads
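
A hedged sketch of enabling static website hosting from the CLI (bucket and document names are placeholders):

aws s3 website s3://my-bucket/ --index-document index.html --error-document error.html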

43
Q

CORS

A

Cross-Origin Resource Sharing

we want to get resources from a different origin.

Web browsers have this security (CORS) in place: as soon as you visit a website, you can make requests to other origins only if those other origins allow you to make these requests.

browser-based security

44
Q

origin

A

is a scheme (a protocol), a host (domain), and a port.

if you do https://www.example.com, this is an origin where the scheme is HTTPS, the host is www.example.com, and the port is port 443.

45
Q

same origin

A

you go on example.com/app1

or example.com/app2.

46
Q

cross-origin request

A

if you visit, for example, www.example.com and then ask your web browser to make a request to other.example.com, this is what’s called a cross-origin request, and your web browser will block it unless you have the correct CORS headers (Access-Control-Allow-Origin).

47
Q

CORS example

A

our web browser visits our first web server. it’s called the origin. So for example, our web server is at https://www.example.com

And there is a second web server called a cross-origin
because it has a different url, which is https://www.other.com.

So the web browser visits our first origin, and the files downloaded from that origin ask it to make a request to the cross-origin.

So the web browser will do what is called a preflight request. And this preflight request is going to ask
the cross-origin if it is allowed to do a request on it.

So it’s going to say, “Hey cross-origin, the website https://www.example.com is sending me to you, can I make a request onto your website?”

and the cross-origin responds, “yes, here is what you can do.”

So the CORS response headers (Access-Control-Allow-Origin and Access-Control-Allow-Methods) say, for example, that the authorized methods are GET, PUT, and DELETE.

So we can get a file, update a file, or delete a file.

48
Q

S3 CORS

A

if a client does a cross-origin request on our S3 bucket enabled as a website, we can allow a specific origin by specifying the entire origin name, or use * (star) for all origins.

the CORS headers have to be defined on the cross-origin bucket, not the first origin bucket
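
As a sketch, a CORS rule like the following would be set on the cross-origin bucket (bucket name and origin are placeholders):

aws s3api put-bucket-cors --bucket my-cross-origin-bucket --cors-configuration '{
  "CORSRules": [{
    "AllowedOrigins": ["https://www.example.com"],
    "AllowedMethods": ["GET"],
    "AllowedHeaders": ["*"]
  }]
}'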

49
Q

S3 CORS example

A

The web browser, for example, is getting HTML files from our bucket, and our bucket is enabled as a website. But there is a second bucket, our cross-origin bucket, also enabled as a website, that contains some files we want.

We do a GET index.html and the website says, okay, here is your index.html, and that file says you need to perform a GET for another file on the other origin. If the other bucket is configured with the right CORS headers, the web browser will be able to make the request; if not, it will not.

50
Q

Amazon S3 Consistency Model

A

Amazon S3 is made up of multiple servers and so when you write to Amazon S3, the other servers are going to replicate data between each other.

And this is what leads to different consistency issues.

  1. Read after Write consistency for PUTs of new objects
  2. Eventual Consistency for DELETEs and PUTs of existing objects
51
Q

Read after Write consistency for PUTs of new objects

A

as soon as you upload a new object, once you get a correct response from Amazon S3, then you can do a GET of that object and get it.

This is true except if you do a GET before doing the PUT, to check if the object exists. So if you do a GET and get a 404 (not existing), then do a PUT, there is a chance you will still get a 404 on the next GET even though the object was already uploaded,
and this is what’s called eventually consistent.

52
Q

Eventual Consistency for DELETEs and PUTs of existing objects

A

if you read an object right after updating it, you may get the older version of that object. So if you do a PUT on an existing object so PUT 200, then you do another PUT 200, and then you do a GET, then the GET might return the older version if you are very, very quick.

And if you delete an object, you might still be able
to retrieve it for a very short time.

And the way that you get the newer version is just to wait a little bit.

53
Q

strong consistency

A

There is no way to request strong consistency in Amazon S3, you only get eventual consistency
and there’s no API to get strong consistency. So that means that if you overwrite an object, you need to wait a little bit before you are certain that the GET returns the newest version of your object.

54
Q

You’re trying to upload a 25 GB file on S3 and it’s not working

A

Use multi-part upload: it is required for files over five gigabytes, and it is also recommended as soon as the file is over 100 MB.

55
Q

Your client wants to make sure the encryption is happening in S3, but wants to fully manage the encryption keys and never store them in AWS. You recommend

A

SSE-C

Here you have full control over the encryption keys, and let AWS do the encryption

56
Q

Your company wants data to be encrypted in S3, and maintain control of the rotation policy for the encryption keys, but not know the encryption keys values. You recommend

A

With SSE-KMS you let AWS manage the encryption keys but you have full control of the key rotation policy

57
Q

Your company does not trust S3 for encryption and wants it to happen on the application. You recommend

A

With Client Side Encryption you perform the encryption yourself and send the encrypted data to AWS directly. AWS does not know your encryption keys and cannot decrypt your data.

58
Q

The bucket policy allows our users to read/write files in the bucket, yet we were not able to perform a PutObject API call.

A

the IAM user must have an explicit DENY in the attached IAM policy

Explicit DENY in an IAM policy will take precedence over a bucket policy permission

59
Q

You have a website that loads files from another S3 bucket. When you try the URL of the files directly in your Chrome browser it works, but when the website you’re visiting tries to load these files it doesn’t. What’s the problem?

A

CORS is wrong
Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. To learn more about CORS, go here: https://docs.aws.amazon.com/AmazonS3/latest/dev/cors.html

60
Q

to use MFA-Delete

A

we have to first enable versioning on the S3 bucket

61
Q

when do we need MFA

A

  1. to permanently delete an object version
  2. to suspend versioning on the bucket

62
Q

you don’t need MFA to

A

  1. enable versioning
  2. list deleted versions

63
Q

who can enable and disable MFA-Delete

A

only the bucket owner, which is the root account

So even if you have an administrator account, you cannot enable MFA-Delete.

For now, MFA-Delete can only be enabled or disabled using the CLI.
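
A hedged CLI sketch, run with the root account’s credentials (bucket name, MFA device ARN, and MFA code are placeholders):

aws s3api put-bucket-versioning --bucket my-bucket \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"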

64
Q

S3 Access Logs

A

for audit purposes, you may want to log all the access into your S3 buckets. Any request that is done to Amazon S3 from any account, authorized or denied,
you want it to be logged into another S3 bucket so you can analyze it later.

Once we’ve enabled S3 Access Logs, S3 will log all the requests into the logging bucket.
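
A minimal sketch of pointing access logs at a separate logging bucket (both bucket names and the prefix are placeholders):

aws s3api put-bucket-logging --bucket my-bucket --bucket-logging-status '{
  "LoggingEnabled": {"TargetBucket": "my-logging-bucket", "TargetPrefix": "logs/"}
}'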

65
Q

S3 Access Logs - When setting up logging bucket

A

never, ever set your logging bucket to be the bucket you are monitoring.

Otherwise, if you set the logging bucket and the monitoring bucket to be exactly the same, then it will create a logging loop and your bucket will grow in size exponentially.

66
Q

Amazon S3 replication

A

we have an S3 bucket and we want to replicate it into another bucket.

To do so we first must enable versioning in the source and destination buckets. And then we can set up two different things:

  1. CRR = cross-region replication, if the two buckets are in different regions
  2. SRR = same-region replication, if the two buckets are in the same region

buckets can be in different accounts!!!

The copying happens asynchronously, but it is very, very quick, and for the copying to happen, you need to create an IAM role that gives S3 the permissions to read from the source bucket and write to the destination bucket.
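
A rough sketch of a replication rule (bucket names and role ARN are placeholders; versioning must already be enabled on both buckets):

aws s3api put-bucket-replication --bucket my-source-bucket --replication-configuration '{
  "Role": "arn:aws:iam::123456789012:role/my-replication-role",
  "Rules": [{
    "ID": "replicate-everything",
    "Status": "Enabled",
    "Prefix": "",
    "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"}
  }]
}'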

67
Q

use cases for cross-region replication

A
  1. for compliance
  2. lower latency access of your data into other regions,
  3. to do cross-account replication
68
Q

use cases for same-region replication

A
  1. log aggregation: you have different logging buckets and you wanna centralize them into one bucket
  2. live replication between a production and your test accounts
69
Q

After you activate S3 replication,

A

only the new objects are replicated. So it’s not retroactive, it will not copy the existing state of your S3 bucket.

70
Q

Replication: if you delete without a version ID,

A

it will add a delete marker, and that delete marker is not replicated. If you delete with a version ID, the delete happens only in the source bucket and is not replicated either.

delete operations are not replicated.

71
Q

chaining of replication

A

there is no chaining of replication. That means that if bucket one has replication to bucket two, which has replication to bucket three, then any object created in bucket one will be replicated to bucket two, but will not be replicated to bucket three.

72
Q

we can generate pre-signed URL using

A

either the SDK or the CLI

but for uploads, it’s a bit harder and you must use the SDK

aws s3 presign s3://mybucket/myobject --region my-region --expires-in 300

73
Q

when you generate a pre-signed URL, by default it will have an expiration

A

one hour

you can change that timeout using the --expires-in parameter

you specify the time in seconds.

74
Q

why would you generate a pre-signed URL for another user

A
  1. you want to allow only logged-in users to download a premium video from your S3 bucket, and you only want to give the download link for maybe 15 minutes to a premium user that is logged in.
  2. you have an ever-changing list of users that need to download files, and you don’t want to give them access directly to your bucket because it could be very dangerous, or it’s not maintainable with so many new users all the time. So you generate URLs dynamically and give them out over time, pre-signing all of them.
  3. you want to temporarily allow a user to upload a file to a precise location in your bucket, for example, to upload a profile picture directly onto your S3 bucket.
75
Q

if you are getting issues creating pre-signed URL,

A

set the proper signature version in order not to get issues when generating URLs for encrypted files

aws configure set default.s3.signature_version s3v4

76
Q

what is the durability of S3 storage classes

A

99.999999999% (11 nines) for all of them

very high durability. So if you store 10 million objects,
you can on average expect to incur a loss of a single object, once every 10,000 years.

77
Q

what is the availability of S3 storage classes

A

99.9% for all except S3 One Zone-IA (99.5%)

78
Q

what is the availability SLA of S3 storage classes

A

99.9% for S3 Standard, Glacier and Glacier Deep Archive
99% for others

If availability drops below the SLA, Amazon guarantees to reimburse you.

79
Q

what is the number of Availability Zones for all the storage classes

A

3 at least

except for S3 One Zone-IA (1)

80
Q

Amazon S3 Standard storage class

A

for general purpose

it can sustain two concurrent facility failures. So, it’s really resistant to AZ disasters.

81
Q

S3 Infrequent Access or S3 IA

A

suitable for data that is less frequently accessed
but requires a rapid access when needed.

it has a little less availability, and it is lower cost compared to Amazon S3 Standard.

The idea is that if you access your object less you won’t need to pay as much.

It can sustain two concurrent facility failures.

82
Q

S3 One Zone IA

A

this is the same as S3 IA,

but now the data is stored in a single availability zone

we have the same durability within the single AZ, but if that AZ is destroyed, then you would lose your data.

You have less availability 99.5% availability, and you have still the low latency and high throughput performance you would expect from S3.

It’s lower cost by 20% compared to Infrequent Access, and it supports SSL for data in transit and encryption at rest.

83
Q

S3 Intelligent Tiering

A

is going to move data between your storage classes, intelligently

it has the same low latency and high throughput as S3 Standard, but there’s a small monthly monitoring fee
and auto-tiering fee.

it will automatically move objects between the access tiers based on the access patterns. So it will move objects between S3 General Purpose and S3 IA, and it will choose for you if your object is less frequently accessed or not. And you’re going to pay a small fee for S3 to do that monitoring. The durability is the same.

it can resist an entire event that impacts an availability zone.

84
Q

Amazon Glacier

A

for archives
low cost object storage meant really for archiving and backups.

the data needs to be retained for a very long time, tens of years

It’s a big alternative to on-premises magnetic tape storage, where you would store data on magnetic tapes and put those tapes away. If you wanted to retrieve the data from one of those tapes, you would have to find the tape manually, load it somewhere, and then restore the data from it.

Amazon Glacier is really to retrieve files and not have some kind of urgency around it.

the minimum storage duration for Glacier is going to be 90 days.

85
Q

Amazon Glacier Deep Archive

A

for the archives you don’t need right away

super long-term storage and even cheaper. But this time the retrieval options are Standard (12 hours), so you cannot retrieve a file in less than 12 hours, and Bulk (up to 48 hours) if you have multiple files and can wait, which is even cheaper.

the minimum storage duration is 180 days

86
Q

use case for Amazon S3 Standard storage class

A

big data analytics, mobile and gaming applications, content distribution.

87
Q

use case for S3 Infrequent Access or S3 IA storage class

A

a data store for disaster recovery, backups, or any files that you expect to access way less frequently.

88
Q

use case for One Zone IA storage class

A

to store secondary backup copies of on-premise data
or storing any type of data we can recreate.

for example, we can recreate thumbnails from an image so we can store the image on S3 General Purpose and we can store the thumbnail
on S3 One Zone Infrequent Access. And if we need to recreate that thumbnail over time we can easily do that from the main image.

89
Q

each item in Glacier is not

A

called an object. It’s called an Archive, and each Archive can be a file of up to 40 terabytes. And Archives are stored not in buckets but in vaults.

90
Q

Amazon Glacier retrieval options

A
  1. Expedited: one to five minutes. You request your file and within one to five minutes you get it back. More expensive.
  2. Standard: three to five hours.
  3. Bulk: when you retrieve multiple files at the same time, which takes between five and 12 hours to give you back your files.
91
Q

Minimum capacity charge per object for storage classes

A

Standard and Intelligent Tiering - none

Standard IA and One Zone IA - 128 KB

Glacier Archives - 40KB

92
Q

storage classes fees

A

there is a retrieval fee for all of them except Standard and Intelligent Tiering

there is a minimum storage duration charge for all of them except Standard

Storage cost from the most expensive to the least:

Standard 
Intelligent Tiering 
Standard IA
One Zone IA
Glacier
Deep Archive
93
Q

you can transition objects between storage classes

A

there is a giant graph on the AWS website that describes which transitions are allowed

moving all these objects around all these classes can be done manually, but it can also be done automatically using something called a lifecycle configuration.

94
Q

Lifecycle Rules

A

You can define Transition actions and Expiration actions

rules can be applied to a specific prefix. So if you have all your MP3 files within the “mp3/” folder (prefix), then you can set a lifecycle rule just for that specific prefix.

you can also create rules for certain object tags. If you want a rule that applies just to the objects that are tagged “Department: Finance”, then you can do so.
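
A hedged sketch combining a transition action and an expiration action, scoped to a prefix (bucket name, prefix, and day counts are placeholders):

aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration '{
  "Rules": [{
    "ID": "archive-then-expire",
    "Status": "Enabled",
    "Filter": {"Prefix": "mp3/"},
    "Transitions": [{"Days": 60, "StorageClass": "STANDARD_IA"}],
    "Expiration": {"Days": 365}
  }]
}'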

95
Q

Expiration actions

A

to delete an object after some time. For example, your access log files: maybe you don’t need them after a year. So after a year you say, “Hey, all my files that are over a year old, please delete them. Please expire them.”

It can also be used to delete old versions of a file. So if you have versioning enabled and you keep on overwriting a file, and you know you won’t need the previous versions after maybe 60 days, then you can configure an expiration action to expire the old versions of objects after 60 days.

It can also be used to clean up incomplete multi-part uploads, in case some parts have been hanging around for a long time and you know they will never be completed. Then you would set up an expiration action to remove these parts.

96
Q

transition actions

A

helpful when you want to transition your objects from one storage class to another.

For example, you’re saying, “Move objects to Standard IA class “60 days after creation “and then move to Glacier for archiving, six months later.”

97
Q

So your application EC2 creates images thumbnails
after profile photos are uploaded to Amazon S3.
And these thumbnails can be easily recreated
and will only need to be kept for 45 days.
The source images should be able to be immediately retrieved for these 45 days. And afterwards the user can wait up to six hours.

How would you design this solution?

A
  1. The S3 source images can be on the Standard class, and you can set up a lifecycle configuration to transition them to GLACIER after 45 days.

Why? Because they need to be archived afterwards, and we can wait up to six hours to retrieve them.

  2. The thumbnails can be on ONEZONE_IA, and we can set up a lifecycle configuration to expire them (delete them) after 45 days.

Why? Because we can recreate them, and ONEZONE_IA is cheaper. In case we lose an entire AZ in AWS, we can easily recreate all the thumbnails from the source images.
98
Q

There’s a rule in your company that states that you should be able to recover your deleted S3 objects immediately for 15 days. Although this may happen rarely, after this time and up to one year, deleted objects should be recoverable within 48 hours.

So how would you design this to make it cost effective?

A
  1. You need to enable S3 versioning, because we want to delete files but still be able to recover them. With S3 versioning, we have object versions, and deleted objects are hidden behind a delete marker, so they can be easily recovered.
  2. We’re also going to have non-current versions (the object versions from before). We want to transition them into S3_IA, because it’s very unlikely that these old object versions will be accessed, but if they are, we need to recover them immediately.
  3. And then, after this 15-day grace period, you can transition the non-current versions into DEEP_ARCHIVE (kept for up to 365 days), where they are archived and recoverable within 48 hours.

Why don’t we just use Glacier? Because Glacier would cost us a little bit more money, and since we have a 48-hour timeline, DEEP_ARCHIVE is enough.

So we can use the tiers all the way down to DEEP_ARCHIVE and get even more savings.

99
Q

S3 Baseline Performance

A

by default Amazon S3 automatically scales to a very high number of requests and has very low latency, between 100 and 200 milliseconds, to get the first byte out of S3.

in terms of how many requests per second you can get:

  • 3,500 PUT/COPY/POST/DELETE requests per second per prefix, and
  • 5,500 GET/HEAD requests per second per prefix in a bucket.

the prefix is anything between the bucket and the file. There is no limit to the number of prefixes in your bucket.

100
Q

KMS limitation to S3 performance

A

if you have KMS encryption on your objects using SSE-KMS - then you may be impacted by the KMS limits.

When you upload a file using SSE-KMS, S3 will call the GenerateDataKey KMS API on your behalf, and when you download a file from S3 using SSE-KMS, S3 will call the Decrypt KMS API.

By default, KMS has a quota on the number of requests per second, and based on the region you’re in it could be 5,500, 10,000, or 30,000 requests per second, and you cannot change that quota.
So if you have more than 10,000 requests per second in a specific region that only supports 5,500 requests per second for KMS, then you will be throttled.

These quotas are pretty big for normal usage but still good to know if you have many many files and a high usage of your S3 bucket.

101
Q

how to optimize performance

A
  1. multi-part upload

2. S3 Transfer Acceleration

102
Q

multi-part upload

A

it is recommended to use multi-part upload for files that are over 100 MB, and it must be used for files that are over five gigabytes.

It parallelizes uploads, which helps us speed up transfers and maximize bandwidth.

We divide the file into parts (smaller chunks), and each of these parts is uploaded in parallel to Amazon S3. Once all the parts have been uploaded, Amazon S3 is smart enough to put them back together into the big file.
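
For illustration, the high-level aws s3 commands already use multi-part upload under the hood; the thresholds are configurable (the values and file names here are placeholders):

# lower the threshold and chunk size, then upload as usual
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 50MB
aws s3 cp ./big-file.bin s3://my-bucket/big-file.bin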

103
Q

S3 Transfer Acceleration

A

only for upload, not for download.

To increase transfer speed by transferring a file to an AWS edge location, which will then forward the data to the S3 bucket in the target region.

transfer acceleration is compatible with multi-part upload.
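
A minimal sketch: enable acceleration on the bucket, then tell the CLI to use the accelerate endpoint (bucket name is a placeholder):

aws s3api put-bucket-accelerate-configuration --bucket my-bucket \
  --accelerate-configuration Status=Enabled
aws configure set default.s3.use_accelerate_endpoint true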

104
Q

S3 Byte-Range Fetches

A

reading a file in the most efficient way, speed up downloads

to parallelize GETs by getting specific byte ranges for your files.

also, in case you have a failure getting a specific byte range, you can retry just that smaller byte range, so you have better resilience in case of failures.
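
A hedged sketch of fetching only the first 50 bytes of an object (bucket, key, and output file are placeholders):

aws s3api get-object --bucket my-bucket --key my_file.txt --range bytes=0-49 header-part.bin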

105
Q

S3 Byte-Range Fetches example

A
  1. A very big file to download:

you request the first part, which is the first few bytes of the file, then the second part, and so on up to the Nth part.

So, we request all these parts as specific bytes range fetches. And all these requests can be made in parallel.

  2. The second use case is to retrieve only a partial amount of the file. For example, if you know that the first fifty bytes of the file in S3 are a header that gives you some information about the file, then you can just issue a byte-range request for those first fifty bytes.
106
Q

S3 Transfer Acceleration example

A

We have a file in the US and we want to upload it to an S3 bucket in Australia. We will upload that file through an edge location in the US, which will be very, very quick, and then the edge location will transfer it to the Amazon S3 bucket in Australia over the fast private AWS network.

We minimized the amount of public internet that we go through. And, we maximize the amount of private AWS network that we go through.

107
Q

S3 Select and Glacier Select

A

we want to retrieve less data by performing server side filtering using SQL statements.

you will use less network and less CPU cost client-side
because you don’t retrieve the full file. S3 will perform the filtering for you and only return to you what you need.

you are up to 400% faster and up to 80% cheaper
because you have less network traffic going through
and the filtering happens server-side

108
Q

S3 Select and Glacier Select example

A

We have the client asking, with S3 Select, to get only a few columns and a few rows of a CSV file.
Amazon S3 will perform server-side filtering on that CSV file to find the right columns and rows we want, and send the filtered data back to our client: less network, less CPU, and faster.

109
Q

S3 Select and Glacier Select how it works

A

In the request, along with the SQL expression, you must also specify a data serialization format (JSON, CSV, or Apache Parquet) of the object.

S3 uses this format to parse object data into records, and returns only records that match the specified SQL expression.

You must also specify the data serialization format for the response.
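
A rough CLI sketch against a CSV object (bucket, key, query, and output file are placeholders):

aws s3api select-object-content --bucket my-bucket --key data.csv \
  --expression "SELECT s._1, s._2 FROM S3Object s" --expression-type SQL \
  --input-serialization '{"CSV": {"FileHeaderInfo": "NONE"}}' \
  --output-serialization '{"CSV": {}}' filtered.csv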

110
Q

S3 event notifications

A

some events happen in your S3 bucket, for example, this could be a new object created, an object removed, an object that has been restored, or there is some S3 replication happening.

And you want to be able to react to all these events. You can create rules, and for these rules you can also filter by object name. For example, you want to react only to JPEG files, so *.jpg.

You can create as many S3 events as desired, and most of the time they will be delivered in seconds,
but sometimes it can take a minute or longer.
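
A hedged sketch of invoking a Lambda function for newly created .jpg objects (bucket name and function ARN are placeholders; the function would also need permission for S3 to invoke it):

aws s3api put-bucket-notification-configuration --bucket my-bucket --notification-configuration '{
  "LambdaFunctionConfigurations": [{
    "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:generate-thumbnail",
    "Events": ["s3:ObjectCreated:*"],
    "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}}
  }]
}'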

111
Q

S3 event notifications use case

A

to generate thumbnails of images uploaded to Amazon S3.

112
Q

what are the possible targets for S3 event notifications?

A
  1. SNS = simple notification service to send notifications and emails
  2. SQS for a simple queue service to add messages into a queue
  3. Lambda Functions to run some custom code.
113
Q

if you want to make sure every single event notification is delivered,

A

you need to enable versioning on your bucket.