Simple Storage Service (S3) Flashcards

1
Q

S3 Security

A

-S3 is private by default => the only identity which has any initial access to an S3 bucket is the account root user of the account which owns/created that bucket. Any other permissions have to be explicitly granted.

There are a few ways this can be done:

-S3 Bucket Policies

–A form of resource policy => A resource policy is just like an identity policy, but it's attached to a resource instead of an identity.

–Provide a resource perspective on permissions

The difference between resource policies and identity policies:

Identity policies = You're controlling what that identity can access. These have one limitation: you can only attach identity policies to identities in your own account.

Resource policies = You’re controlling who can access that resource.

–Resource policies ALLOW/DENY access from the SAME account or DIFFERENT accounts.

Since the policy is attached to the resource, it can reference any other identities inside that policy.

–Resource policies ALLOW/DENY access to Anonymous principals.

Resource policies can be used to open a bucket to the world by referencing all principals, even those not authenticated by AWS.

They have one major difference to identity policies and that’s the presence of an explicit “Principal” component. The principal part of a resource policy defines which principals are affected by the policy.

–Bucket policies can be used to control who can access objects, even allowing conditions which block specific IP addresses.

–There can only be ONE bucket policy on a bucket, but it can have multiple statements.

If an identity inside one AWS account is accessing a bucket also in that same account, then the effective access is a combination of ALL of the applicable identity policies, plus the resource policy, so the bucket policy. For any anonymous access, so access by an anonymous principal, then only the bucket policy applies.

If you’re doing cross-account access, the identity in their account needs to be able to access S3 in general and your bucket, and then your bucket policy needs to allow access from that identity, so from that external account.
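
As a rough sketch of what a bucket policy looks like in practice (the bucket name, IP range and statement are made up for illustration), a policy granting anonymous read access with an IP condition could be applied with boto3 like this:

    import json
    import boto3

    s3 = boto3.client("s3")

    # Hypothetical public-read policy with an IP condition - replace the
    # bucket name and CIDR range with real values.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadExceptBlockedRange",
            "Effect": "Allow",
            "Principal": "*",  # anonymous / all principals
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
            "Condition": {
                "NotIpAddress": {"aws:SourceIp": "203.0.113.0/24"}
            },
        }],
    }

    # Only ONE bucket policy per bucket - this call replaces any existing one.
    s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))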

2
Q

Access Control Lists (ACLs)

A

-Are a way to apply security to objects or buckets.

-They’re a sub-resource of that object/bucket

S3 subresources only exist in the context of a specific bucket or object. Subresources provide support for storing and managing bucket configuration information and object-specific information.

Bucket Subresources: S3 Object Lifecycle Management, S3 Bucket Versioning, Static Website Hosting, Bucket Policy, Bucket ACL (access control list), CORS (cross-origin resource sharing), Logging - S3 Access Logs, Tagging, Location, Notification.

Object Subresources: Object ACL

-They're Legacy (AWS doesn't recommend their use and prefers that you use Bucket Policies)

-Inflexible & only allow simple permissions

They can't have conditions like bucket policies can, so you're restricted to very broad permissions.

Which permissions can be controlled using an ACL:

What these five permissions do depends on whether they're applied to a bucket or an object.

  • READ
  • WRITE
  • READ_ACP
  • WRITE_ACP
  • FULL_CONTROL

It’s significantly less flexible than an identity or a resource policy.

You don’t have the flexibility of being able to have a single ACL that affects a group of objects.

3
Q

Block Public Access

A

This feature provides settings for access points, buckets, and accounts to help you manage public access to Amazon S3 resources. By default, new buckets, access points, and objects don’t allow public access. However, users can modify bucket policies, access point policies, or object permissions to allow public access. S3 Block Public Access settings override these policies and permissions so that you can limit public access to these resources.

With S3 Block Public Access, account administrators and bucket owners can easily set up centralized controls to limit public access to their Amazon S3 resources that are enforced regardless of how the resources are created.

When Amazon S3 receives a request to access a bucket or an object, it determines whether the bucket or the bucket owner’s account has a block public access setting applied. If the request was made through an access point, Amazon S3 also checks for block public access settings for the access point. If there is an existing block public access setting that prohibits the requested access, Amazon S3 rejects the request.
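
A minimal sketch of applying these settings with boto3 (the bucket name is a placeholder):

    import boto3

    s3 = boto3.client("s3")

    # Turn on all four Block Public Access settings for a hypothetical bucket.
    s3.put_public_access_block(
        Bucket="example-bucket",
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,        # reject new public ACLs
            "IgnorePublicAcls": True,       # ignore any existing public ACLs
            "BlockPublicPolicy": True,      # reject new public bucket policies
            "RestrictPublicBuckets": True,  # restrict access to public buckets
        },
    )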

4
Q

EXAM POWER UP

A
  • Identity: Controlling different resources
  • Identity: You have a preference for IAM
  • Identity: Same account
  • Bucket: Just controlling S3
  • Bucket: Anonymous or Cross-Account
  • ACLs: NEVER - unless you must
5
Q

S3 Static Website Hosting

A

-We’ve been accessing S3, via the normal method, which is using the AWS APIs

For instance, to access any objects within S3, we’re using the S3 APIs. Assuming we’re authenticated and authorized, we use the get object API call to access those resources. (Secure and Flexible)

-This feature allows access via standard HTTP - e.g. blogs.

-When you enable it, you have to set an Index document and an Error document.

So when enabling static website hosting on an S3 bucket, we have to point the Index document (usually the entry point of a website) at a specific object in the S3 bucket.

The Error document is the same, but it's used when something goes wrong. So if you access a file which isn't there, or there is another type of server-side error, that's when the Error document is shown.

Both of these need to be HTML documents, because the static website hosting feature delivers HTML files.

-When you enable it, AWS creates a Website Endpoint.

This is a specific address that the bucket can be accessed from using HTTP. The name of this endpoint is influenced by the bucket name that you choose and the region that the bucket is in.

You can use your own custom domain name for a bucket, but if you want to do that, then your bucket name matters. You can only use your custom domain name if the name of the bucket matches the domain.
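
A minimal sketch of enabling it with boto3 (the bucket and document names are assumptions; both documents must already exist as objects in the bucket):

    import boto3

    s3 = boto3.client("s3")

    # Enable static website hosting, pointing at the index and error documents.
    s3.put_bucket_website(
        Bucket="example-blog-bucket",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )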

6
Q

There are two specific scenarios which are perfect for S3:

A

-Offloading

If you have a website hosted on a compute service (EC2), it can benefit from offloading its storage to Amazon S3.

What we can do is, we can take all of the media that the compute service hosts, and we can move that media to an S3 bucket that uses static website hosting.

Then when the compute service generates the HTML file and delivers this to the customer’s browser, this HTML file points at the media that’s hosted on the S3 bucket. So the media is retrieved from S3, not the compute service.

S3 is likely much cheaper for the storage and delivery of any media versus a compute service. (S3 is designed for the storage of large data at scale)

-Out-of-band pages

Out-of-band access is a method of accessing something via a channel outside of the main one.

So for example, you might use out-of-band server management, which lets you connect to a management card in a server using the cellular network. That way, if the server is having networking issues with the normal access methods (the normal network), you can still access it.

In S3 terms, we can use a static website as an out-of-band maintenance page, shown during scheduled or unscheduled maintenance periods.

7
Q

S3 Pricing

A

-Storage = You pay for storing objects in your S3 buckets. The rate you’re charged depends on your objects’ size, how long you stored the objects during the month, and the storage class

-Request & data retrievals = You pay for requests made against your S3 buckets and objects. S3 request costs are based on the request type, and are charged on the quantity of requests

-Data Transfer

You pay for all bandwidth into and out of Amazon S3, except for the following:

–Data transferred out to the internet for the first 100GB per month, aggregated across all AWS Services and Regions (except China and GovCloud)
–Data transferred in from the internet.
–Data transferred between S3 buckets in the same AWS Region.
–Data transferred from an Amazon S3 bucket to any AWS service(s) within the same AWS Region as the S3 bucket (including to a different account in the same AWS Region).
–Data transferred out to Amazon CloudFront (CloudFront).

8
Q

Object Versioning

A

-Is something which is controlled at a bucket level

-It starts off in a disabled state; you can optionally enable versioning on a disabled bucket, but once enabled, you cannot disable it again. What you can do is suspend it, and if desired, a suspended bucket can be re-enabled.

Without versioning enabled on a bucket, each object is identified solely by the object key (its name), which is unique inside the bucket.

If you modify an object, the original version of that object is replaced.

Versioning lets you store multiple versions of an object within a bucket. Any operations which would modify objects generate a new version.

-There’s an attribute of an object and it’s the ID of the object.

When versioning on a bucket is disabled, the ID of the objects in that bucket is set to "null".

If you upload or put a new object into a bucket with versioning enabled, then S3 allocates the ID to that object, for example: id = 11111.

If any modifications are made to this object, S3 will allocate a new ID to the newer version, and it retains the old version. The newest version of any object in a version-enabled bucket is known as the "current version".

You can request an object from S3 and provide the ID of a specific version to get that particular version back, rather than the current version. (Versions can be individually accessed by specifying the ID; if you don't specify one, it's assumed that you want the current version)

-Versioning also impacts deletions

If we indicate to S3 that we want to delete the object and we don’t give any specific version ID, then S3 will add a new special version of that object, known as a “delete marker”.

The delete marker is a special version of an object, which hides all previous versions of that object. But you can delete the “delete marker”, which essentially undeletes the object, returning the current version to being active again.

If you want to truly delete the object, you have to specify the particular version ID.

If you’re truly deleting the current version of the object, then the next most current version, becomes the current version.
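
A minimal sketch of these behaviours with boto3 (the bucket and key names are assumptions):

    import boto3

    s3 = boto3.client("s3")

    # Enable versioning (it can later be Suspended, but never disabled again).
    s3.put_bucket_versioning(
        Bucket="example-bucket",
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Each PUT of the same key now creates a new version with its own VersionId.
    v1 = s3.put_object(Bucket="example-bucket", Key="cat.jpg", Body=b"v1")
    v2 = s3.put_object(Bucket="example-bucket", Key="cat.jpg", Body=b"v2")

    # GET without a VersionId returns the current version; with one, that version.
    old = s3.get_object(Bucket="example-bucket", Key="cat.jpg", VersionId=v1["VersionId"])

    # DELETE without a VersionId only adds a delete marker...
    s3.delete_object(Bucket="example-bucket", Key="cat.jpg")

    # ...while DELETE with a VersionId truly deletes that specific version.
    s3.delete_object(Bucket="example-bucket", Key="cat.jpg", VersionId=v2["VersionId"])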

9
Q

IMPORTANT POINTS OF S3 VERSIONING

A

-Cannot be switched off - only suspended

-Space is consumed by ALL versions

-You are billed for ALL versions

-Only way to 0 costs - is to delete the bucket

-Suspending it, doesn’t actually remove any of the old versions, so you’re still billed for them.

10
Q

MFA Delete

A

-Enabled in Versioning configuration on a bucket

-When you enable MFA Delete, it means MFA is required to change bucket versioning states.

-MFA is required to delete versions.

How it works is that you provide:

-Serial number (MFA) + Code passed with API CALLS
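
A hedged sketch of how that looks with boto3 (the MFA device serial, code and bucket name are placeholders; in practice these operations have to be performed by the bucket owner's root user):

    import boto3

    s3 = boto3.client("s3")

    # The MFA parameter is the device serial number and the current code,
    # separated by a space.
    s3.put_bucket_versioning(
        Bucket="example-bucket",
        MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
        VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
    )

    # Permanently deleting a specific version also requires the MFA token.
    s3.delete_object(
        Bucket="example-bucket",
        Key="cat.jpg",
        VersionId="example-version-id",
        MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
    )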

11
Q

S3 Performance Optimization

A

“It’s often about performance and reliability combined and this is especially relevant, when we’re talking about a distributed organization”

Features which help us in this regard:

  • Single PUT Upload
  • Multipart Upload
  • S3 Accelerated Transfer
12
Q

Single PUT Upload

A

We know from the “Animals for Life” scenario, that remote workers need to upload large data sets and do so frequently, and we know that they’re often on unreliable internet connections.

-By default, when you upload an object to S3, it’s uploaded as a single data stream to S3.

A file becomes an object, and it’s uploaded using the PutObject API call and placed in a bucket, and this all happens as a single stream.

-If the stream fails, the upload fails

-Requires a full restart

Any delay can be costly and potentially risky.

-Speed & Reliability of the upload will always be limited, because of this single stream of data

“Single stream transfer can often provide much slower speeds than both ends of that transfer are capable of”

-If you utilize a single PUT upload, then you’re limited to 5GB of data as a maximum. (AWS Limit)

13
Q

Multipart Upload

A

-Improves the speed and reliability of uploads to S3, and it does this by breaking data up into individual parts.

-The minimum size for using multipart upload is 100MB.

-The upload can be split into a maximum of 10,000 parts, and each part can range in size between 5MB and 5GB.

-The last part can be smaller than 5MB.

-Each individual part is treated as its own isolated upload - each part can fail in isolation and be restarted in isolation, rather than restarting the whole thing.

-Improves the transfer rate = the overall rate is the combined speed of all the parts.
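
A minimal sketch using boto3's managed transfer layer, which performs the multipart upload automatically above a size threshold (the file and bucket names are assumptions):

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Use multipart upload for anything over 100MB, with 100MB parts, uploading
    # several parts in parallel; failed parts are retried individually.
    config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,
        multipart_chunksize=100 * 1024 * 1024,
        max_concurrency=10,
    )

    s3.upload_file("large-dataset.zip", "example-bucket", "large-dataset.zip", Config=config)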

14
Q

S3 Accelerated Transfer

A

To understand it, first, it’s required to understand how global transfer works to S3 buckets.

We have no control over the public internet data path, routers and ISPs are picking this path based on what they think is best, and potentially commercially viable. That doesn’t always align with what offers the best performance.

So using the public internet for data transit is never an optimal way to get data from source to destination.

S3 Transfer Acceleration uses the network of AWS Edge Locations, which are located in lots of convenient locations globally. An S3 bucket needs to be enabled for transfer acceleration; the default is that it's switched OFF, and there are some restrictions for enabling it.

-The bucket name cannot contain periods and it needs to be DNS-compatible in its naming

Once enabled, data being uploaded, instead of going to the S3 bucket directly, immediately enters the closest best-performing AWS Edge Location. This part does occur over the public internet, but geographically, it's really close.

At this point, the Edge Location transits the data being uploaded over the AWS global network, a network which is directly under the control of AWS. This tends to be a direct link between these Edge Locations and other areas of the AWS global network, in this case the S3 bucket.

-The internet isn't designed primarily for speed; it's designed for flexibility and resilience.

-The AWS network is purpose-built to link regions to other regions in the AWS network. (It's like an express train, only stopping at the source and destination) (It's much faster, with lower and more consistent latency)
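
A hedged sketch of enabling and using transfer acceleration with boto3 (the bucket and file names are assumptions):

    import boto3
    from botocore.config import Config

    s3 = boto3.client("s3")

    # Enable transfer acceleration on a hypothetical, DNS-compatible bucket name.
    s3.put_bucket_accelerate_configuration(
        Bucket="example-bucket",
        AccelerateConfiguration={"Status": "Enabled"},
    )

    # Uploads through this client use the <bucket>.s3-accelerate.amazonaws.com
    # edge-optimized endpoint instead of the normal regional endpoint.
    s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
    s3_accel.upload_file("large-dataset.zip", "example-bucket", "large-dataset.zip")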

15
Q

Key Management Service (KMS)

A

AWS KMS is a secure and resilient service that uses hardware security modules that have been validated under FIPS 140-2, or are in the process of being validated, to protect your keys.

-Regional & Public Service

-Capable of multi-region features

-It lets you create, store and manage cryptographic keys (keys which can be used to convert plaintext to ciphertext, and vice versa)

-Capable of handling Symmetric and Asymmetric keys

-Capable of performing cryptographic operations (encrypt, decrypt & …)

-Keys never leave KMS - Provides FIPS 140-2 (L2) = US security standard, it’s often a key point of distinction between using KMS versus using something like CloudHSM.

(ASSUME SYMMETRIC KEYS FOR THE POINTS BELOW)

-The main type of key that KMS manages are KMS keys (also referred to as Customer Master Keys (CMK))

These KMS keys are used by KMS within cryptographic operations = You can use them, applications can use them, and other AWS services can use them.

They are logical - think of them as a container for the actual physical key material, which is the data that really makes up the key.

-KMS keys contain an ID, creation date, key policy, description & state

-Every KMS key is backed by physical key material

It's this data which is held by KMS, and it's this material which is actually used to encrypt and decrypt things that you give to KMS.

-The physical key material can be generated by KMS or imported into KMS

The material contained inside a KMS key can be used to directly encrypt or decrypt data up to 4KB in size.
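
A minimal sketch of a direct KMS cryptographic operation on a small (up to 4KB) payload; the key alias is an assumption:

    import boto3

    kms = boto3.client("kms")

    # Encrypt a small secret (<= 4KB) directly with a hypothetical KMS key.
    encrypted = kms.encrypt(
        KeyId="alias/example-app-key",
        Plaintext=b"battle plans for the cat army",
    )
    ciphertext = encrypted["CiphertextBlob"]

    # Decrypt: for symmetric keys, KMS works out which key was used from the
    # ciphertext itself. The key material never leaves KMS.
    decrypted = kms.decrypt(CiphertextBlob=ciphertext)
    print(decrypted["Plaintext"])  # b"battle plans for the cat army"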

16
Q

Data Encryption Keys (DEKs)

A

Are another type of keys which KMS can generate.

-They are generated using a KMS key, via the GenerateDataKey operation.

  • Data encryption keys can be used to encrypt or decrypt data larger than 4KB
  • DEKs are linked to the KMS key which created them. (KMS can tell that a specific data encryption key was created using a specific KMS key)
  • KMS doesn't store the data encryption key in any way.

It provides it to you or the service using KMS, and then it discards it.

The reason it discards it is that KMS doesn't actually perform the encryption or decryption of data using DEKs - you do, or the service using KMS does.

17
Q

When a DEK is generated, KMS provides you with two versions of that DEK:

A

-Plaintext Version = Something which can be used immediately to perform cryptographic operations

-Ciphertext Version = Can be given back to KMS for it to be decrypted (it was encrypted using the KMS key that generated it)

-Encrypt data using plaintext key.

-Once finished with that process, discard the plaintext version of that DEK.

-Store the encrypted key (ciphertext) with the encrypted data.

Decrypting that data is simple: you pass the encrypted data encryption key back to KMS and ask for it to be decrypted, using the same KMS key that was used to generate it.

Then you use the decrypted data encryption key that KMS gives you back to decrypt the data, and then you discard the decrypted data encryption key.

–S3 generates a DEK for every single object

-KMS doesn't track the usage of DEKs
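
A hedged envelope-encryption sketch of that workflow using GenerateDataKey; the key alias is an assumption, and the third-party cryptography package stands in for whatever local encryption the caller actually performs:

    import base64
    import boto3
    from cryptography.fernet import Fernet  # stand-in for local AES encryption

    kms = boto3.client("kms")

    # 1. Ask KMS for a DEK under a hypothetical KMS key: we get a plaintext
    #    version (use immediately) and a ciphertext version (store with the data).
    dek = kms.generate_data_key(KeyId="alias/example-app-key", KeySpec="AES_256")
    plaintext_key = dek["Plaintext"]
    encrypted_key = dek["CiphertextBlob"]

    # 2. Encrypt the data locally with the plaintext key, then discard that key.
    f = Fernet(base64.urlsafe_b64encode(plaintext_key))
    ciphertext = f.encrypt(b"large object data ...")
    del plaintext_key, f

    # 3. Later: hand the encrypted key back to KMS, get the plaintext key again,
    #    decrypt the data, and discard the key once more.
    restored_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
    data = Fernet(base64.urlsafe_b64encode(restored_key)).decrypt(ciphertext)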

18
Q

Key Concepts - KMS

A

-KMS keys are isolated to a region & never leave

-Supports Multi-region keys (if required) (Keys are replicated to other regions)

-Keys are either AWS owned or Customer owned.

AWS owned keys are a collection of KMS keys that an AWS service owns and manages for use in multiple AWS accounts. They operate in the background

-Customer owned keys have two types:

–Customer Managed = Created by the customer to use directly in an application or within an AWS service.

–AWS Managed = Created automatically by AWS, when you use a service such as S3

-Customer Managed keys are more configurable

You can edit the key policy, which means you could allow cross-account access so that other AWS accounts can use your keys.

-KMS keys support rotation

Rotation is where the physical backing material - the data used to actually perform cryptographic operations - is changed.

With AWS managed keys, this can’t be disabled. It’s set to rotate approximately once per year.

With Customer managed keys, rotation is optional; it's disabled by default and, when enabled, happens approximately once every year.

-KMS keys contain the Backing Key (and previous backing keys resulting from rotation)

It means that as a key is rotated, data encrypted with old versions can still be decrypted.

-You can create Aliases, which are shortcuts to keys (aliases are also per region)
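
A small sketch of creating a customer managed key and an alias with boto3 (the description and alias name are assumptions):

    import boto3

    kms = boto3.client("kms")

    # Create a symmetric customer managed KMS key.
    key = kms.create_key(Description="Example application key")
    key_id = key["KeyMetadata"]["KeyId"]

    # Optionally enable automatic (approximately yearly) rotation of the backing material.
    kms.enable_key_rotation(KeyId=key_id)

    # Aliases are per-region shortcuts to a key.
    kms.create_alias(AliasName="alias/example-app-key", TargetKeyId=key_id)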

19
Q

Key Policies and Security

A
  • Key policies (resource)

-Every KMS key has one, and for Customer managed keys, you can change it.

-KMS has to be explicitly told that keys trust the AWS account that they're contained within.

-Key policies (trust the account) & IAM policies (let IAM users interact with the key)

In high security environments, you might want to remove this account trust and insist on any key permissions being added inside the key policy.
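
For illustration, the standard key policy statement which establishes that account trust (the account ID and key ID are placeholders) looks roughly like this:

    import json
    import boto3

    kms = boto3.client("kms")

    # Default-style key policy: trust the account root, which in turn lets IAM
    # policies in that account grant access to the key.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "Enable IAM permissions in this account",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "kms:*",
            "Resource": "*",
        }],
    }

    kms.put_key_policy(KeyId="example-key-id", PolicyName="default", Policy=json.dumps(policy))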

20
Q

S3 Encryption

A

“Buckets themselves aren’t encrypted, Objects are”

“Each Object inside a Bucket could be using different encryption settings”

There are two main methods of encryption that S3 is capable of supporting:

  • Client-side Encryption

The objects being uploaded are encrypted by the client before they ever leave it. (Means that the data is ciphertext the entire time)

From AWS’s perspective, the data is received in a scrambled form and stored in a scrambled form. (AWS can’t see the data)

You own and control the keys, the process, and any tooling. So if your organization needs all of these, then you need
to utilize client-side encryption.

  • Server-side Encryption

Here, even though the data is encrypted in transit using HTTPS, the objects themselves aren’t initially encrypted.

Once this data, which is in plain text form, reaches the S3 endpoint, it's still plain text. Once the data hits S3, it's encrypted by the S3 infrastructure.

Using this method, you allow S3 to handle some or all of those processes.

Both of these methods use encryption in transit, between the user side and S3 (like an encrypted tunnel)

“Both of these, refer to Encryption at Rest”

The two components of Server-side encryption are:

-Encryption and Decryption process = taking plain text, a key and an algorithm and generating cipher text, and the reverse.

-The generation and management of the cryptographic keys.

21
Q

Types of Server-side Encryption

A

-Server-Side Encryption with Customer-Provided Keys (SSE-C)

The customer is responsible for the encryption keys that are used for encryption and decryption, and the S3 service manages the actual encryption and decryption process.

You are essentially offloading the CPU requirements for this process, but you still need to generate and manage the key or keys that this S3 encryption process will use.

When you put an object (plaintext) into S3, you're required to provide the key. When the object and the encryption key arrive at the S3 endpoint, the object is encrypted using the key and, at the same time, a hash of the key is taken and attached to the object. (the key itself is then discarded)

The hash is one-way - it can't be used to generate a new key, but if a key is provided during decryption, the hash can identify whether that specific key is the one that was used to encrypt the object. (Safety feature)

If you want to decrypt the object, you need to provide the same key that was used to encrypt it.

-Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3) (AES-256)

AWS manages both encryption and decryption processes, as well as the key generation and management.

With this method, when you put an Object into S3, you just provide the plain text data. S3 creates a root key to use for the encryption process. (handled end to end by AWS)

When an Object is uploaded to S3, using SSE-S3, it’s actually encrypted by a key that’s unique for every single object. (AWS generates this key)

S3 uses that key to encrypt the object, then the root key is used to encrypt that unique key, and the original unencrypted version of that key is discarded. The encrypted Object and the encrypted unique key are stored side by side in S3.

While this means very little admin overhead, it does present three significant problems:

-If you are in a regulatory environment, where you need to control the keys that are used and control access to those keys, then this isn’t usable.

-If you need to be able to control the rotation of the key.

-If you need role separation = a full S3 admin, so somebody who has full S3 permissions to configure the bucket and manage the objects, can also decrypt and view data.

-Server-Side Encryption with KMS KEYS stored in AWS Key Management Service (SSE-KMS)

AWS handles both the keys and the encryption process, but unlike SSE-S3, the root key is handled by a separate service. (KMS)

When you upload an Object and pick SSE-KMS for the first time, S3 liaises with KMS and creates an AWS managed KMS key. This is the default key, which gets used when using SSE-KMS in the future.

Every time an Object is uploaded, S3 uses a dedicated key to encrypt that object, and that key is a data encryption key which KMS generates using the KMS key. S3 is provided with a plain text version of the data encryption key as well as an encrypted one. The plain text one is used to encrypt the object and then it's discarded. The encrypted data encryption key is stored along with the encrypted Object.

Every Object, which is uploaded and encrypted with SSE-KMS requires a KMS key. This KMS key is used to generate one unique data encryption key for every object that’s encrypted using SSE-KMS.

You don’t have to use the Default KMS key that S3 creates, you can pick to use your own customer managed KMS key. This means you can control the permissions on it and the rotation of the key material.

You can also have logging and auditing on the KMS key itself.

The biggest benefit provided by SSE-KMS is role separation = to decrypt any object in a bucket where those objects have been encrypted with SSE-KMS, you need access to the KMS key that was used to generate the unique key.

The KMS key is used to decrypt the data encryption key for the object and then that decrypted data encryption key, decrypts the object itself. (If you don’t have access to KMS, you can’t access the Object)
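
A hedged sketch of each server-side encryption option on PutObject (the bucket, key names and KMS key alias are assumptions):

    import os
    import boto3

    s3 = boto3.client("s3")

    # SSE-S3: S3 handles keys and encryption end to end (AES-256).
    s3.put_object(
        Bucket="example-bucket", Key="sse-s3.txt", Body=b"data",
        ServerSideEncryption="AES256",
    )

    # SSE-KMS: S3 encrypts using a DEK generated under a hypothetical KMS key.
    s3.put_object(
        Bucket="example-bucket", Key="sse-kms.txt", Body=b"data",
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/example-app-key",
    )

    # SSE-C: we supply the key with every request; S3 encrypts, stores a hash
    # of the key, and discards the key itself.
    customer_key = os.urandom(32)
    s3.put_object(
        Bucket="example-bucket", Key="sse-c.txt", Body=b"data",
        SSECustomerAlgorithm="AES256",
        SSECustomerKey=customer_key,
    )

    # Reading an SSE-C object requires providing the same key again.
    obj = s3.get_object(
        Bucket="example-bucket", Key="sse-c.txt",
        SSECustomerAlgorithm="AES256",
        SSECustomerKey=customer_key,
    )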

22
Q

Default Bucket Encryption

A

When you're uploading Objects to S3, you're actually utilizing the “PutObject” operation. As part of this operation, you can specify a header, “x-amz-server-side-encryption”. This is how you direct AWS to use server-side encryption.

-If you don’t specify this header, then Objects will not use encryption.

-If you do specify this header: a value of AES256 utilizes SSE-S3, and a value of aws:kms utilizes SSE-KMS.

You can set one of these types as the bucket default, so when you don't specify a header, Objects will still use that type of encryption.

For example: DEFAULT = AES256 (SSE-S3)
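
A small sketch of setting that default with boto3 (the bucket name is a placeholder):

    import boto3

    s3 = boto3.client("s3")

    # Default any uploads without an encryption header to SSE-S3 (AES256).
    s3.put_bucket_encryption(
        Bucket="example-bucket",
        ServerSideEncryptionConfiguration={
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"},
            }],
        },
    )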

23
Q

S3 Object Storage Classes - S3 Standard

A

-This is the default storage of S3. So if you don’t specify the storage class, this is what you’re going to use.

-Data is replicated across at least 3 AZs (able to cope with multiple AZ failure)
-Provides 11 9s (99.999999999%) of durability

-Replication uses MD5 checksums together with Cyclic Redundancy Checks (CRCs), to detect and resolve any data issues.

-When objects have been uploaded to S3 and stored durably, S3 responds with an HTTP/1.1 200 OK status.

-You are billed a GB/month fee for data stored.
-A per-GB dollar charge for transfer OUT (IN is free) and a price per 1,000 requests made to the product.
-No specific retrieval fee, no minimum duration, no minimum size.

-Makes data accessible immediately = it has a first-byte latency of milliseconds and objects can be made publicly available. (When data is requested it's available within milliseconds)

-S3 Standard should be used for Frequently Accessed Data which is important and non-replaceable.

24
Q

S3 Object Storage Classes - S3 Standard-IA (Infrequent Access)

A

-Data is replicated across at least 3 AZs. (able to cope with multiple AZ failure)
-Provides 11 9s (99.999999999%) of durability.

-Storage costs roughly half the price of S3 Standard. (Cost-Effective)
-You have a per request charge and a data transfer OUT cost. (Same as Standard)

-It has a per GB data retrieval fee (higher than Standard); overall cost increases with frequent data access.

-It has a minimum duration charge of 30 days - Objects can be stored for less, but the minimum billing always applies.

-It has a minimum capacity charge of 128KB per object.

So this class is cost-effective for data, as long as you don’t access the data very often, or you don’t need to store it short term, or you don’t need to store lots of tiny objects.

-S3 Standard-IA should be used for long-lived data, which is important but where access is infrequent.

25
Q

S3 Object Storage Classes - S3 One Zone-IA

A

-It's cheaper to store data in than S3 Standard or S3 Standard-IA.

-Data is only stored in one AZ - It does not provide the multi-AZ resilience model.
-Provides 11 9s (99.999999999%) of durability.

-It has a per GB data retrieval fee (higher than Standard); overall cost increases with frequent data access.

-It has a minimum duration charge of 30 days - Objects can be stored for less, but the minimum billing always applies.

-It has a minimum capacity charge of 128KB per object.

-S3 One Zone-IA should be used for long-lived data which is NON-CRITICAL & REPLACEABLE and where access is INFREQUENT.

26
Q

S3 Object Storage Classes - S3 Glacier - Instant Retrieval

A

-Data is replicated across at least 3 AZs. (able to cope with multiple AZ failure)
-Provides 11 9s (99.999999999%) of durability.

-It has a per GB data retrieval fee (higher than Standard); overall cost increases with frequent data access.

-It has a minimum duration charge of 90 days - Objects can be stored for less, but the minimum billing always applies.

-It has a minimum capacity charge of 128KB per object.

S3 Glacier Instant should be used for LONG-LIVED DATA, accessed ONCE per QUARTER with MILLISECOND ACCESS

27
Q

S3 Object Storage Classes - S3 Glacier - Flexible Retrieval

A

-Data is replicated across at least 3 AZs. (able to cope with multiple AZ failure)
-Provides 11 9s (99.999999999%) of durability.

-Objects cannot be made publicly accessible… any access of data (beyond object metadata) requires a retrieval process.

-When you retrieve Objects from this class, they’re stored in the S3 Standard-IA class on a temporary basis. You access them and then they’re removed.
-You can retrieve them permanently by changing the class back to one of the S3 ones.

Retrieval jobs come in three different types:

  • Expedited (1-5 minutes)
  • Standard (3-5 hours)
  • Bulk (5-12 hours)
    Faster = More Expensive

-It means it has a first byte latency of minutes or hours.

-It has a minimum duration charge of 90 days - Objects can be stored for less, but the minimum billing always applies.

-It has a minimum capacity charge of 40KB per object.

S3 Glacier Flexible Retrieval is for situations where you need to store archival data, where frequent or real-time access isn’t needed, for example: yearly access, and you’re OK with minutes to hours for retrieval operations.
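
A hedged sketch of starting a retrieval job for an archived object (the bucket, key, duration and tier are assumptions):

    import boto3

    s3 = boto3.client("s3")

    # Restore a temporary copy of an archived object for 7 days using the
    # Standard retrieval tier (Expedited and Bulk are the other options).
    s3.restore_object(
        Bucket="example-archive-bucket",
        Key="2019/records.csv",
        RestoreRequest={
            "Days": 7,
            "GlacierJobParameters": {"Tier": "Standard"},
        },
    )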

28
Q

S3 Object Storage Classes - S3 Glacier - Glacier Deep Archive

A

-It’s the cheapest storage option

-Data is replicated across at least 3 AZs. (able to cope with multiple AZ failure)
-Provides 11 9s (99.999999999%) of durability.

-It has a minimum duration charge of 180 days - Objects can be stored for less, but the minimum billing always applies.

-It has a minimum capacity charge of 40KB per object.

-Objects cannot be made publicly accessible… any access of data (beyond object metadata) requires a retrieval process.

-When you retrieve Objects from this class, they’re stored in the S3 Standard-IA class on a temporary basis. You access them and then they’re removed.
-You can retrieve them permanently by changing the class back to one of the S3 ones.

Retrieval jobs come in three different types:

  • Standard (12 hours)
  • Bulk (48 hours)

-It means it has a first byte latency of hours or days.

S3 Glacier Deep Archive should be used for data which is archival, which rarely, if ever, needs to be accessed, and where hours or days are tolerable for the retrieval process.

It’s more suited for secondary long-term archival backups or data which comes under legal or regulatory requirements in terms of retention length.

29
Q

S3 Object Storage Classes - Intelligent-Tiering

A

It’s a storage class which contains five different storage tiers.

With it, when you move objects into this class, there are a range of ways that an Object can be stored.

  • Frequent Access = costs the same as S3 Standard
  • Infrequent Access = costs the same as S3 Standard-IA
  • Archive Instant Access = comparable costs to S3 Glacier Instant
  • Archive Access = comparable costs to S3 Glacier Flexible
  • Deep Archive = comparable costs to S3 Glacier Deep Archive

-You don't have to worry about moving objects between tiers; the Intelligent-Tiering system does this for you.

-It will monitor the usage of the object.

If the object is in regular use, it would stay within the Frequent Access tier.

-Intelligent-Tiering monitors and automatically moves any objects not accessed for 30 days to a low cost infrequent access tier, and eventually to the archive instant access, archive access or deep archive tiers.

-You can also add a configuration, scoped to a bucket, prefix or object tag, so that objects which are accessed less frequently can be moved into the three Archive tiers.

–Archive Instant Access (90 days)

–To move objects to even “Colder” tiers when objects aren't accessed (90-270 days) - Archive Access (OPTIONAL)

–To move objects to even “Colder” tiers when objects aren't accessed (180-730 days) - Deep Archive (OPTIONAL)

Using these two optional tiers means that your applications must support them, because retrieving objects requires specific API calls.

-Intelligent-Tiering has a monitoring and automation cost per 1,000 objects. (management fee)

Intelligent-Tiering should be used for LONG-LIVED DATA, with CHANGING or UNKNOWN patterns.
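
A hedged sketch of adding the optional archive configuration with boto3 (the bucket name, prefix and day thresholds are assumptions):

    import boto3

    s3 = boto3.client("s3")

    # Opt objects under a hypothetical prefix into the optional archive tiers
    # once they haven't been accessed for the configured number of days.
    s3.put_bucket_intelligent_tiering_configuration(
        Bucket="example-bucket",
        Id="archive-old-media",
        IntelligentTieringConfiguration={
            "Id": "archive-old-media",
            "Status": "Enabled",
            "Filter": {"Prefix": "media/"},
            "Tierings": [
                {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
                {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
            ],
        },
    )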

30
Q

S3 Bucket Keys

A

W/O Bucket Keys

Are a way to help S3 scale and reduce costs when using KMS encryption

-Each Object “Put” using sse-kms uses a unique DEK ***

-Unique DEK is stored with the Object

-Each DEK is an API call to KMS

For every single object that the user uploads, S3 needs a unique call to KMS to generate a DEK, have that DEK returned to S3, use that key to encrypt the object and then store the two side by side.

-Calls to KMS have a cost & there are levels where throttling occurs - GenerateDataKey operations can only be run at a shared quota of either 5,500, 10,000 or 50,000 requests per second (the number depends on which region you use)

-Using a single KMS key results in a “scaling limit” for PUTS per second per key

W/ Bucket Keys

Same architecture, but instead of the KMS key being used to generate each individual DEK, it's used to generate a time-limited bucket key. ***

-This is given to the Bucket

-This is used for a period of time to generate any DEKs within the bucket, for individual object encryption operations

-Bucket keys significantly reduces KMS API calls - reducing cost and increasing scalability

-Not Retroactive, it only affects objects uploaded after it's enabled on the bucket

Things to keep in mind, when using Bucket Keys.

-CloudTrail KMS events now show the Bucket.

After you enable an S3 Bucket Key, if you're using CloudTrail to look at KMS logs, those logs will show the Bucket ARN instead of your Object ARN.

-Works with cross/same region replication

When S3 replicates an encrypted object, it preserves the encryption settings of that encrypted object.

-If replicating plaintext to a bucket using Bucket Keys, the object is encrypted at the destination side (ETAG changes)
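
A small sketch of enabling SSE-KMS with a Bucket Key as the bucket default (the bucket name and key alias are assumptions):

    import boto3

    s3 = boto3.client("s3")

    # Default uploads to SSE-KMS under a hypothetical key, with an S3 Bucket Key
    # enabled to cut down the number of GenerateDataKey calls to KMS.
    s3.put_bucket_encryption(
        Bucket="example-bucket",
        ServerSideEncryptionConfiguration={
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/example-app-key",
                },
                "BucketKeyEnabled": True,
            }],
        },
    )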

31
Q

S3 Lifecycle Configuration

A

You can create Lifecycle rules on S3 Buckets, which can automatically transition or expire Objects in the Bucket, they are a great way to optimize the cost for larger S3 Buckets.

-Is a set of rules
-Rules consist of actions, which apply based on criteria (Do X if Y is true)
-You can apply them to a Bucket or to groups of objects (defined by prefix or tags)

There are two types of actions:

-Transition Actions = Change the storage class of whichever Object or Objects are affected. You could transition Objects from S3 Standard to, for example, S3 Standard-IA after 30 days.

-Expiration Actions = Delete whichever Objects or Object versions are affected. You might want to expire Objects or versions entirely after a certain time period. (This can be useful to keep Buckets tidy)

Both of these could work on versions if you have a version enabled Bucket.

Lifecycle Configurations offer a way to automate the deletion of Objects or Object versions, or change the class of Objects, to optimize costs over time.

-Rules cannot be based on access (use Intelligent-Tiering if you need that).
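
A hedged sketch of a rule combining a transition and an expiration (the bucket name, prefix and day counts are assumptions):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-then-expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                # Transition down the "waterfall" of storage classes...
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # ...then delete the objects entirely after a year.
                "Expiration": {"Days": 365},
            }],
        },
    )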

32
Q

Transition Process

A

This process works like a “waterfall”.

-Transitions can't happen in an upwards direction, only down.

There are some restrictions:

-Be careful when transitioning smaller objects from Standard to IA, Intelligent-Tiering or One Zone-IA (because of the minimum capacity charges in those classes, smaller objects can cost more)

-There’s a 30-Day minimum period, where an Object needs to remain on S3 Standard before then moving to IA. (APPLIES ONLY WHEN USING LIFECYCLE CONFIGURATIONS)

-If you want to create a single rule which transitions Objects from Standard to IA or One Zone-IA and THEN to Glacier classes, you have to wait an additional 30 days before then transitioning those Objects to any of the Glacier classes.

33
Q

S3 Replication

A

A feature which allows you to configure the replication of Objects between a source and destination S3 Bucket.

There are two types of replication supported by S3:

-Cross-Region Replication (CRR) = Allows the replication of Objects from a source Bucket to a destination Bucket, in different AWS Regions.

-Same-Region Replication (SRR) = Allows the replication of Objects from a source Bucket to a destination Bucket, in the same AWS Region.

The architecture for both types of replication is simple; it only differs depending on whether the Buckets are in the same AWS account or not. (Both types of replication support both)

-In both cases, the replication configuration is applied to the source Bucket.

The replication configuration configures S3 to replicate from the source Bucket to a destination Bucket, and it specifies a few important things:

  • Destination Bucket to use.
  • The IAM role to use for the process - the role is configured to allow the S3 service to assume it. (Trust Policy)
    The role's permissions policy gives it permission to read Objects on the source Bucket and permission to replicate those Objects to the destination Bucket.
  • The replication is encrypted (SSL)

Inside ONE account, both S3 Buckets are owned by the same AWS account. (They both trust the account)

In a different AWS account, the destination Bucket, because it's in a different account, doesn't trust the source account or the role that's used to replicate the Bucket contents.

There’s a requirement if you want to configure a replication between different accounts:

-Add a Bucket Policy on the destination Bucket, which allows the role in the source account to replicate Objects into it.
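
A hedged sketch of applying a replication configuration to the source Bucket (the bucket names, role ARN and rule details are assumptions):

    import boto3

    s3 = boto3.client("s3")

    # Applied to the SOURCE bucket: replicate everything to a destination bucket,
    # assuming a hypothetical IAM role that the S3 service can assume.
    s3.put_bucket_replication(
        Bucket="example-source-bucket",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::123456789012:role/example-s3-replication-role",
            "Rules": [{
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-destination-bucket",
                    "StorageClass": "STANDARD_IA",
                },
            }],
        },
    )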

34
Q

S3 Replication Options

A

-The default is to replicate All Objects within a Bucket or a subset (filter objects by prefix or tags or both)

-Select which storage class, the Objects will use in the destination Bucket. - Default is to maintain the same class

-You can define the ownership of the Objects in the destination Bucket. - Default is to maintain the same owner (source account)

  • Replication Time Control (RTC) = This is a feature which adds a guaranteed 15-minute replication SLA onto this process. It even adds extra monitoring to see which Objects are queued for replication. (Only use it if you have a really strict set of requirements)
35
Q

S3 Replication Considerations

A
  • Replication is NOT retroactive & Versioning needs to be ON (in both Buckets). (If you enable replication on a Bucket that already has Objects, those existing Objects won't be replicated)

-It’s a ONE-WAY replication process from source to destination.

  • Replication is capable of handling Objects which are unencrypted or encrypted using SSE-S3 & SSE-KMS (with extra config)
    -It's not capable of replicating Objects that use SSE-C, because S3 is not in possession of the keys needed to access the plaintext version of the Object.

-Replication requires that the source Bucket owner has permissions on the Objects.

-It will not replicate system events (so if any changes are made in the source Bucket by Lifecycle Management, they will not be replicated to the destination Bucket - only user events are), and source Objects can't be in the Glacier or Glacier Deep Archive classes.

-By default DELETES are NOT replicated. (Delete markers) (You can enable it)

36
Q

Why use Replication?

A

-SRR - Log Aggregation = So if you've got multiple different S3 Buckets which store logs for different systems, you could use this to aggregate those logs into a single S3 Bucket.

-SRR - PROD and TEST Sync = To configure some sort of synchronization between Production and Test accounts or maybe replicate data between them periodically.

-SRR - Resilience with strict sovereignty requirements = There are some countries and sectors which can not have data leaving the specific AWS region, because of sovereignty requirements. (Account level isolation)

-CRR - Global Resilience Improvements = So you can have backups copied to another AWS Region.

-CRR - Latency Reduction = For customers in another region

37
Q

S3 Presigned URLs

A

Are a way that you can give another person or an application, access to an Object inside a S3 Bucket, using our credentials in a safe and secure way. (Access to a private Bucket with no public permissions)

There are 3 common solutions to give an unauthenticated user access to a Bucket:

  • Give an AWS identity
  • Provide AWS Credentials
  • Make it Public

If the user only needs short-term access to the bucket, NONE of these are ideal. For this case you should use Presigned URLs.

How does it work?

An authenticated user with enough permissions can ask S3 to generate a presigned URL. To do that they provide security credentials, specify a Bucket name, an Object key, an expiry date/time, and indicate how the Object will be accessed. S3 then creates the presigned URL and returns it to the user.

The presigned URL can then be passed to an unauthenticated principal, and it can be used to access the specific Object in the specific Bucket until it expires. When a person uses the presigned URL, that person is actually interacting with S3 as the authenticated identity which generated it.

Can be used for:

  • Downloads from S3 (GET)
  • Uploads to S3 (PUT)
  • As part of a Serverless architecture, where access to a private S3 Bucket, needs to be controlled and you don’t want to run thick applications servers to broker that access.
  • Can be used for web applications that have offloaded their media files to S3. Since a lot of unauthenticated users have to access it, you can use the presignedURL to give them access. (When you offload media into S3)
  1. User requests.
  2. Application Server with an authenticated IAM identity, requests a presignedURL to S3.
  3. S3 generates the presignedURL to access the “Media Bucket”.
  4. S3 then returns the presignedURL to the Application Server and through it to the end user.
  5. The Web Application that’s running on the user’s computer will use the presignedURL to securely access the particular Object stored in the “Media Bucket”.
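
A minimal sketch of steps 2-3 with boto3 (the bucket and key are assumptions):

    import boto3

    s3 = boto3.client("s3")

    # Generate a URL that lets an unauthenticated user download one object,
    # using the permissions of the identity that generated it, for 1 hour.
    url = s3.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": "example-media-bucket", "Key": "videos/cat.mp4"},
        ExpiresIn=3600,  # seconds
    )
    print(url)  # hand this to the web application / end user
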
38
Q

Presigned URLs EXAM POWERUP

A
  • You can create a URL for an object you have NO ACCESS TO.
  • When using the URL, the permissions match the identity which generated it… (so if he has permissions, you will be able to see it, if not, you wouldn’t)
  • Access denied could mean the generating ID NEVER HAD ACCESS… or DOESN’T NOW.
  • DON’T GENERATE WITH A ROLE.. URL stops working when temporary credentials expire.
39
Q

S3 Select and Glacier Select

A

Are ways where you can retrieve parts of Objects, rather than the entire Object.

-S3 can store HUGE objects (up to 5TB) and infinite number of objects.

-You often want to retrieve the ENTIRE OBJECT.

-Retrieving a 5TB Object.. TAKES TIME and CONSUMES the full 5TB of transfer. **

-Filtering at the client side DOESN’T REDUCE THIS. **

-S3/Glacier select let you use SQL-Like statements. **

So you create this, supply it to that service, and that service uses this SQL-like statement to select parts of that Object and THIS part and ONLY this part is sent to the client in a PRE-FILTERED way by S3. (You only consume the pre-filtered part)

-They allow you to operate on a number of file formats with this level of functionality.

CSV, JSON, Parquet, BZIP2 compression for CSV and JSON

  • w/o S3 Select: filtering occurs in-app; the full amount is retrieved and billed by S3
  • w/ S3 Select: filtering occurs ON THE SERVICE; the data delivered by S3 is PRE-FILTERED

In this way, we can achieve up to ~400% faster performance and up to ~80% lower cost, because the filtering occurs before the data is transferred to the application.
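
A hedged sketch of an S3 Select query over a hypothetical CSV object:

    import boto3

    s3 = boto3.client("s3")

    # Ask S3 to filter the object server-side and only return matching rows.
    resp = s3.select_object_content(
        Bucket="example-bucket",
        Key="data/animals.csv",
        ExpressionType="SQL",
        Expression="SELECT s.name, s.species FROM S3Object s WHERE s.species = 'cat'",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )

    # The response is an event stream; Records events carry the pre-filtered data.
    for event in resp["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode())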

40
Q

S3 Events

A

This is a feature which allows you to create event notifications configurations on a Bucket.

-Notifications are generated when certain events occur in a Bucket.

-Can be delivered to SNS, SQS and Lambda Functions. (We have to add a Resource Policy allowing S3 principal access)

This means you can have event-driven processes which occur, as a result of things happening within S3.

Different types of events are supported:

-Object Created (Put, Post, Copy, CompleteMultiPartUpload)
-Object Deletion (*, Delete, DeleteMarkerCreated)
-Object Restore (Post (Initiated), (Completed))
-Replication (OperationMissedThreshold, OperationReplicatedAfterThreshold, OperationNotTracked, OperationFailedReplication)
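
A hedged sketch of a notification configuration sending object-created events to a hypothetical Lambda function:

    import boto3

    s3 = boto3.client("s3")

    # The Lambda function's resource policy must already allow the S3 service
    # principal to invoke it on behalf of this bucket.
    s3.put_bucket_notification_configuration(
        Bucket="example-bucket",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "Id": "process-new-uploads",
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:example-fn",
                "Events": ["s3:ObjectCreated:*"],
            }],
        },
    )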

41
Q

S3 Access Logs

A

-We have to enable this feature on the Bucket

-To perform this, you’ll need a Source Bucket and a Target bucket.

-It registers Bucket and Object Access

-The logging is managed by S3 Log Delivery Group, which reads the logging configuration you set on the source Bucket

Log delivery is best efforts; accesses to the Source Bucket are usually logged in the Target Bucket within a few hours. To use this feature you'll need to give access to the Target Bucket, and this is done using a Bucket ACL which allows the “S3 Log Delivery Group”.

And this is how, it can deliver the logs generated on the Source Bucket to the Target Bucket.

-Logs are delivered as “Log Files”, which consist of Log Records, Records are newline-delimited, each record consists of Attributes (such as date/time, the requester, the operations, error codes and much more) and these are space-delimited.

A single Target Bucket can be used for many Source Buckets and you can separate these easily, using prefixes in the Target Bucket. (This is configured within the logging configuration which is set on the Source Bucket)

-Access Logging provides detailed information about the requests which are made to a Source Bucket, and the logs are useful for many applications. (For example: Security Functions and Security Audits)

It can also help you understand the access patterns of your customer base and understand any charges on your S3 bill.

-If you use this feature, you’ll need to personally manage the lifecycle or deletion of any of the Log Files.
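
A small sketch of enabling access logging on a source bucket (the bucket names and prefix are assumptions):

    import boto3

    s3 = boto3.client("s3")

    # Deliver access logs for the source bucket into a target bucket under a prefix.
    # (The target bucket must already grant the S3 log delivery group write access.)
    s3.put_bucket_logging(
        Bucket="example-source-bucket",
        BucketLoggingStatus={
            "LoggingEnabled": {
                "TargetBucket": "example-logs-bucket",
                "TargetPrefix": "example-source-bucket/",
            },
        },
    )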

42
Q

S3 Access Points

A

A feature of S3 which improves the manageability of S3 Buckets, especially when you have Buckets which are used by many different teams or users, or when Buckets store objects with a wide range of functions.

-Simplify managing access to S3 Buckets/Objects

-Rather than 1 Bucket w/ 1 Bucket Policy…

-You can create many access points for a Bucket and each of these can have different policies

So different access controls from a permissions perspective

-Each with different network access controls - can be limited, in terms of where they can be accessed from

-Each access point has its own endpoint address

-Create access points via the Console or the CLI, e.g. “aws s3control create-access-point --name secretcats --account-id 123456789012 --bucket catpics” ***

43
Q

S3 Access Points - Architecture

A

-A single resource (Bucket) policy applied to the Bucket can become large and difficult to manage

-Access points w/ Internet Origin / Access points w/ VPC Origin

-Each Access Point has a unique DNS address for network access (This DNS address that we would give to our staff)

-Each Access Point has its own policy

Access point policies control permissions for access via the Access Point & are functionally equivalent to a bucket policy. An Access Point policy can restrict identities to certain prefixes, tags or actions based on need.

-From the VPC side, Access Points can be set to only allow VPC origin (Tied to a specific VPC) - Requires a VPC Endpoint. Access via this route can be enforced by endpoint policies

-Any permissions defined on an Access Point need to also be defined on the Bucket Policy - either matching permissions or delegation (with delegation, the bucket policy grants wide-open access to anything arriving via the bucket's access points)
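
For illustration, a delegation-style bucket policy (the account ID and bucket name are placeholders) can hand access control over to the bucket's access points like this:

    import json
    import boto3

    s3 = boto3.client("s3")

    # Delegate access control to access points: allow any action on the bucket,
    # as long as the request arrives via an access point owned by this account.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "*",
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
            "Condition": {
                "StringEquals": {"s3:DataAccessPointAccount": "123456789012"},
            },
        }],
    }

    s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))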