Simple Storage Service (S3) Flashcards
S3 Security
-S3 is private by default => the only identity which has any initial access to an S3 bucket is the account root user of the account which owns/created that bucket. Any other permissions have to be explicitly granted.
There are a few ways this can be done:
-S3 Bucket Policies
–A form of resource policy => A resource policy is just like an identity policy, except it's attached to a resource instead of an identity.
–Provide a resource perspective on permissions
The difference between resource policies and identity policies:
Identity policies = You're controlling what that identity can access. These have one limitation: you can only attach identity policies to identities in your own account.
Resource policies = You’re controlling who can access that resource.
–Resource policies ALLOW/DENY access from the SAME account or DIFFERENT accounts.
Since the policy is attached to the resource, it can reference any other identities inside that policy.
–Resource policies ALLOW/DENY access to Anonymous principals.
Resource policies can be used to open a bucket to the world by referencing all principals, even those not authenticated by AWS.
They have one major difference to identity policies and that’s the presence of an explicit “Principal” component. The principal part of a resource policy defines which principals are affected by the policy.
–Bucket policies can be used to control who can access objects, even allowing conditions which block specific IP addresses.
–There can only be ONE bucket policy on a bucket, but it can have multiple statements.
If an identity inside one AWS account is accessing a bucket in that same account, then the effective access is a combination of ALL of the applicable identity policies plus the resource policy (the bucket policy). For anonymous access, so access by an anonymous principal, only the bucket policy applies.
If you're doing cross-account access, the identity in their account needs to be able to access S3 in general and your bucket, and then your bucket policy needs to allow access from that identity, so from that external account.
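As a rough sketch (the bucket name is a placeholder), this is approximately what attaching a bucket policy with boto3 looks like; the policy below uses an explicit Principal of "*" to grant anonymous read access to every object, the classic public-bucket pattern. Because a bucket can only have one bucket policy, put_bucket_policy replaces any existing policy.

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "examplebucket"  # hypothetical bucket name

# Resource policy: note the explicit "Principal" element, which identity
# policies don't have. "*" means all principals, including anonymous ones.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```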
Access Control Lists (ACLs)
-Are a way to apply security to objects or buckets.
-They’re a sub-resource of that object/bucket
S3 subresources only exist in the context of a specific bucket or object. Subresources provide support for storing and managing bucket configuration information and object-specific information.
Bucket Subresources: S3 Object Lifecycle Management, S3 Bucket Versioning, Static Website Hosting, Bucket Policy, Bucket ACL (access control list), CORS (cross-origin resource sharing), Logging - S3 Access Logs, Tagging, Location, Notification.
Object Subresources: Object ACL
-They’re Legacy (AWS don’t recommend their use and prefer that you use Bucket Policies)
-Inflexible & only allow simple permissions
They can't have conditions like bucket policies, so you're restricted to some very broad permissions.
Which permissions can be controlled using an ACL:
What these five permissions do depends on whether they're applied to a bucket or an object.
- READ
- WRITE
- READ_ACP
- WRITE_ACP
- FULL_CONTROL
It’s significantly less flexible than an identity or a resource policy.
You don’t have the flexibility of being able to have a single ACL that affects a group of objects.
Block Public Access
This feature provides settings for access points, buckets, and accounts to help you manage public access to Amazon S3 resources. By default, new buckets, access points, and objects don’t allow public access. However, users can modify bucket policies, access point policies, or object permissions to allow public access. S3 Block Public Access settings override these policies and permissions so that you can limit public access to these resources.
With S3 Block Public Access, account administrators and bucket owners can easily set up centralized controls to limit public access to their Amazon S3 resources that are enforced regardless of how the resources are created.
When Amazon S3 receives a request to access a bucket or an object, it determines whether the bucket or the bucket owner’s account has a block public access setting applied. If the request was made through an access point, Amazon S3 also checks for block public access settings for the access point. If there is an existing block public access setting that prohibits the requested access, Amazon S3 rejects the request.
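As a minimal sketch (the bucket name is a placeholder), the same Block Public Access settings can be applied to a bucket with boto3:

```python
import boto3

s3 = boto3.client("s3")

# Enable all four Block Public Access settings on a bucket; these override
# any public bucket policy or ACL that would otherwise grant public access.
s3.put_public_access_block(
    Bucket="examplebucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,        # reject new public ACLs
        "IgnorePublicAcls": True,       # ignore existing public ACLs
        "BlockPublicPolicy": True,      # reject new public bucket policies
        "RestrictPublicBuckets": True,  # restrict access granted by public policies
    },
)
```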
EXAM POWER UP
- Identity: Controlling different resources
- Identity: You have a preference for IAM
- Identity: Same account
- Bucket: Just controlling S3
- Bucket: Anonymous or Cross-Account
- ACLs: NEVER - unless you must
S3 Static Website Hosting
-We’ve been accessing S3, via the normal method, which is using the AWS APIs
For instance, to access any objects within S3, we’re using the S3 APIs. Assuming we’re authenticated and authorized, we use the get object API call to access those resources. (Secure and Flexible)
-This feature allows access via standard HTTP - e.g. blogs.
-When you enable it, you have to set an Index document and an Error document.
So in enabling static website hosting on an S3 bucket, we have to point the Index document (usually the entry point of a website) at a specific object in the S3 bucket.
The Error document is the same, but it's used when something goes wrong. So if you access a file which isn't there, or there is another type of server-side error, that's when the Error document is shown.
Both of these need to be HTML documents, because the static website hosting feature, delivers HTML files.
-When you enable it, AWS creates a Website Endpoint.
This is a specific address that the bucket can be accessed from using HTTP. The name of this endpoint is influenced by the bucket name that you choose and the region that it is in.
You can use your own custom domain name for a bucket, but if you want to do that, then your bucket name matters. You can only use your custom domain name if the name of the bucket matches the domain.
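A small sketch of enabling static website hosting (the bucket name and document keys are assumptions):

```python
import boto3

s3 = boto3.client("s3")

# Point the Index document and Error document at objects in the bucket.
s3.put_bucket_website(
    Bucket="examplebucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# The website endpoint then takes a region-dependent form such as:
# http://examplebucket.s3-website-us-east-1.amazonaws.com
```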
There are two specific scenarios which are perfect for S3:
-Offloading
If you have a website hosted on a compute service (e.g. EC2), it can benefit from offloading its media storage to Amazon S3.
What we can do is, we can take all of the media that the compute service hosts, and we can move that media to an S3 bucket that uses static website hosting.
Then when the compute service generates the HTML file and delivers this to the customer’s browser, this HTML file points at the media that’s hosted on the S3 bucket. So the media is retrieved from S3, not the compute service.
S3 is likely much cheaper for the storage and delivery of any media versus a compute service. (S3 is designed for the storage of large data at scale)
-Out-of-band pages
An out-of-band page is a method of accessing something outside of the main access path.
So for example, you might use out-of-band server management and this lets you connect to a management card that’s in a server using the cellular network. That way, if the server is having networking issues, with the normal access methods (normal network), then you can still access it.
We can use an S3 static website as an out-of-band maintenance page, for example during unscheduled maintenance periods.
S3 Pricing
-Storage = You pay for storing objects in your S3 buckets. The rate you’re charged depends on your objects’ size, how long you stored the objects during the month, and the storage class
-Request & data retrievals = You pay for requests made against your S3 buckets and objects. S3 request costs are based on the request type, and are charged on the quantity of requests
-Data Transfer
You pay for all bandwidth into and out of Amazon S3, except for the following:
–Data transferred out to the internet for the first 100GB per month, aggregated across all AWS Services and Regions (except China and GovCloud)
–Data transferred in from the internet.
–Data transferred between S3 buckets in the same AWS Region.
–Data transferred from an Amazon S3 bucket to any AWS service(s) within the same AWS Region as the S3 bucket (including to a different account in the same AWS Region).
–Data transferred out to Amazon CloudFront (CloudFront).
Object Versioning
-Is something which is controlled at a bucket level
-It starts off in a disabled state; you can optionally enable versioning on a disabled bucket, but once enabled, you cannot disable it again. What you can do is suspend it, and if desired, a suspended bucket can be re-enabled.
Without versioning enabled on a bucket, each object is identified solely by the object key, its name, which is unique inside the bucket.
If you modify an object, the original version of that object is replaced.
Versioning lets you store multiple versions of an object within a bucket. Any operations which would modify objects generate a new version.
-There's an attribute of every object: its version ID.
When versioning on a bucket is disabled, the ID of the objects in that bucket is set to "null".
If you upload or put a new object into a bucket with versioning enabled, then S3 allocates the ID to that object, for example: id = 11111.
If any modifications are made to this object, S3 will allocate a new ID to the newer version, and it retains the old version. The newest version of any object in a version-enabled bucket is known as the "current version".
You can request an object from S3 and provide the ID of a specific version, to get that particular version back, rather than the current version. (Versions can be individually accessed by specifying the ID, and if you don't specify one then it's assumed that you want the current version)
-Versioning also impacts deletions
If we indicate to S3 that we want to delete the object and we don’t give any specific version ID, then S3 will add a new special version of that object, known as a “delete marker”.
The delete marker is a special version of an object, which hides all previous versions of that object. But you can delete the “delete marker”, which essentially undeletes the object, returning the current version to being active again.
If you want to truly delete the object, you have to specify the particular version ID.
If you truly delete the current version of the object, then the next most recent version becomes the current version.
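A minimal boto3 sketch of the versioning behaviour described above (bucket and key names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "examplebucket", "koala.jpg"

# Enable versioning (it can later be suspended, but never disabled).
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Each put now creates a new version; the latest one is the "current version".
v1 = s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"version one")["VersionId"]
v2 = s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"version two")["VersionId"]

# No VersionId => current version; with VersionId => that specific version.
current = s3.get_object(Bucket=BUCKET, Key=KEY)
older = s3.get_object(Bucket=BUCKET, Key=KEY, VersionId=v1)

# Delete without a VersionId => adds a delete marker (object appears gone).
s3.delete_object(Bucket=BUCKET, Key=KEY)

# Delete WITH a VersionId => truly removes that version (or a delete marker).
s3.delete_object(Bucket=BUCKET, Key=KEY, VersionId=v2)
```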
IMPORTANT POINTS OF S3 VERSIONING
-Cannot be switched off - only suspended
-Space is consumed by ALL versions
-You are billed for ALL versions
-Only way to 0 costs - is to delete the bucket
-Suspending it, doesn’t actually remove any of the old versions, so you’re still billed for them.
MFA Delete
-Enabled in Versioning configuration on a bucket
-When you enable MFA Delete, it means MFA is required to change bucket versioning states.
-MFA is required to delete versions.
How it works is that you provide:
-Serial number (MFA) + Code passed with API CALLS
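A sketch of what this looks like via the API; the MFA serial number and code below are placeholders, and changing MFA Delete generally has to be done by the bucket owner (the root user):

```python
import boto3

s3 = boto3.client("s3")

# MFA parameter = "<mfa device serial/ARN> <current code>" (placeholders below)
s3.put_bucket_versioning(
    Bucket="examplebucket",
    MFA="arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
)

# Permanently deleting a specific version then also requires the MFA value.
s3.delete_object(
    Bucket="examplebucket",
    Key="koala.jpg",
    VersionId="EXAMPLE-VERSION-ID",
    MFA="arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
)
```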
S3 Performance Optimization
“It’s often about performance and reliability combined and this is especially relevant, when we’re talking about a distributed organization”
Features which help us in this regard:
- Single PUT Upload
- Multipart Upload
- S3 Accelerated Transfer
Single PUT Upload
We know from the “Animals for Life” scenario, that remote workers need to upload large data sets and do so frequently, and we know that they’re often on unreliable internet connections.
-By default, when you upload an object to S3, it’s uploaded as a single data stream to S3.
A file becomes an object, and it’s uploaded using the PutObject API call and placed in a bucket, and this all happens as a single stream.
-If the stream fails, the upload fails
-Requires a full restart
Any delay can be costly and potentially risky.
-Speed & Reliability of the upload will always be limited, because of this single stream of data
“Single stream transfer can often provide much slower speeds than both ends of that transfer are capable of”
-If you utilize a single PUT upload, then you’re limited to 5GB of data as a maximum. (AWS Limit)
Multipart Upload
-Improves the speed and reliability of uploads to S3, and it does this by breaking data up into individual parts.
-The minimum size for using multipart upload is 100MB.
-The upload can be split into a maximum of 10,000 parts, and each part can range in size from 5MB to 5GB.
-The last part can be smaller than 5MB.
-Each individual part is treated as its own isolated upload - each part can fail in isolation and be restarted in isolation, rather than restarting the whole thing.
-Improves the transfer rate = the overall rate is the combined speed of all the parts.
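A sketch using boto3's high-level transfer manager, which switches to multipart upload automatically once a file crosses the configured threshold (file, bucket and size values are illustrative):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Use multipart upload for anything over 100MB, in 100MB parts; failed parts
# are retried individually rather than restarting the whole upload.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=100 * 1024 * 1024,
    max_concurrency=10,
)

s3.upload_file(
    Filename="animal-footage.mp4",   # hypothetical local file
    Bucket="examplebucket",
    Key="uploads/animal-footage.mp4",
    Config=config,
)
```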
S3 Accelerated Transfer
To understand it, first, it’s required to understand how global transfer works to S3 buckets.
We have no control over the public internet data path, routers and ISPs are picking this path based on what they think is best, and potentially commercially viable. That doesn’t always align with what offers the best performance.
So using the public internet for data transit is never an optimal way to get data from source to destination.
S3 Transfer Acceleration uses the network of AWS Edge Locations, which are located in lots of convenient locations globally. An S3 bucket needs to be enabled for transfer acceleration, the default is that it's switched OFF, and there are some restrictions for enabling it.
-The bucket name cannot contain periods and it needs to be DNS-compatible in its naming
Once enabled, data being uploaded, instead of going to the S3 bucket directly, immediately enters the closest, best-performing AWS Edge Location. This part does occur over the public internet, but geographically, it's really close.
At this point, the Edge Locations transit the data being uploaded over the AWS global network, a network which is directly under the control of AWS, and this tends to be a direct link, between these Edge Locations and other areas of the AWS global network, in this case the S3 bucket.
-The internet is not designed primarily for speed; it's designed for flexibility and resilience.
-The AWS network is purpose-built to link regions to other regions in the AWS network. (It's like an express train, only stopping at the source and destination) (It's much faster, with lower, consistent latency)
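A sketch of enabling transfer acceleration and uploading via the accelerate endpoint (the bucket name is hypothetical and must be DNS-compatible with no periods):

```python
import boto3
from botocore.client import Config

# Enable transfer acceleration on the bucket (it's off by default).
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket="examplebucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Clients then use the accelerate endpoint (bucket.s3-accelerate.amazonaws.com)
# so uploads enter the nearest edge location and ride the AWS global network.
s3_accel = boto3.client(
    "s3",
    config=Config(s3={"use_accelerate_endpoint": True}),
)
s3_accel.upload_file("animal-footage.mp4", "examplebucket", "footage.mp4")
```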
Key Management Service (KMS)
AWS KMS is a secure and resilient service that uses hardware security modules that have been validated under FIPS 140-2, or are in the process of being validated, to protect your keys.
-Regional & Public Service
-Capable of multi-region features
-It lets you create, store and manage cryptographic keys (keys which can be used to convert plaintext to ciphertext, and vice versa)
-Capable of handling Symmetric and Asymmetric keys
-Capable of performing cryptographic operations (encrypt, decrypt & …)
-Keys never leave KMS - Provides FIPS 140-2 (Level 2) compliance = a US security standard; it's often a key point of distinction between using KMS versus using something like CloudHSM.
(ASSUME HE IS TALKING ABOUT SYMMETRIC KEYS)
-The main type of key that KMS manages are KMS keys (also referred to as Customer Master Keys (CMK))
These KMS keys are used by KMS within cryptographic operations = You can use them, applications can use them, and other AWS services can use them.
They are logical, think of them as a container for the actual physical key material, and this is the data that really makes up the key.
-A KMS key contains an ID, creation date, key policy, description & state
-Every KMS key is backed by physical key material
It’s this data which is held by KMS and it’s this material, which is actually used to encrypt and decrypt things that you give to KMS.
-The physical key material can be generated by KMS or imported into KMS
This material contained inside a KMS key can be used to directly encrypt or decrypt data up to 4KB in size
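As a minimal sketch (the key alias is an assumption), encrypting and decrypting small data directly with a KMS key looks roughly like this:

```python
import boto3

kms = boto3.client("kms")

# Encrypt up to 4KB of plaintext directly with the KMS key.
ciphertext = kms.encrypt(
    KeyId="alias/my-app-key",            # hypothetical key alias
    Plaintext=b"battle plans for the cats",
)["CiphertextBlob"]

# KMS works out which key to use from the ciphertext itself.
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
```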
Data Encryption Keys (DEKs)
DEKs are another type of key which KMS can generate.
-They are generated by using a KMS key using the GenerateDataKey operation.
- DEKs can be used to encrypt or decrypt data larger than 4KB
- DEKs are linked to the KMS key which created them. (KMS can tell that a specific data encryption key was created using a specific KMS key)
- KMS doesn’t store the data encryption key in any way.
It provides it to you or the service using KMS and then it discards it.
The reason it discards it is that KMS doesn't actually do the encryption or decryption of data using DEKs; you do, or the service using KMS performs those operations.
When a DEK is generated, KMS provides you with two versions of that DEK:
-Plaintext Version = Something which can be used immediately to perform cryptographic operations
-Ciphertext Version = Can be given back to KMS for it to be decrypted (Encrypted by using the KMS key that generated it)
-Encrypt data using plaintext key.
-Once finished with that process, discard the plaintext version of that DEK.
-Store the encrypted key (ciphertext) alongside the encrypted data.
Decrypting that data is simple, you pass the encrypted data encryption key back to KMS and ask for it to decrypt it, using the same KMS key used to generate it.
Then you use the decrypted data encryption key that KMS gives you back to decrypt the data, and then you discard the decrypted data encryption key.
–S3 generates a DEK for every single object
-KMS doesn't track the usage of DEKs
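A sketch of the DEK workflow above, using the cryptography library for the local encryption step (the key alias and the data are assumptions; this is illustrative envelope encryption, not S3's internal implementation):

```python
import base64
import boto3
from cryptography.fernet import Fernet  # local symmetric encryption

kms = boto3.client("kms")

# 1. Ask KMS for a data encryption key; we get a plaintext and an encrypted copy.
dek = kms.generate_data_key(KeyId="alias/my-app-key", KeySpec="AES_256")

# 2. Encrypt the data locally with the plaintext DEK, then discard the plaintext key.
fernet = Fernet(base64.urlsafe_b64encode(dek["Plaintext"]))
encrypted_data = fernet.encrypt(b"large file contents ...")
del dek["Plaintext"]

# 3. Store the encrypted DEK alongside the encrypted data.
stored = {"key": dek["CiphertextBlob"], "data": encrypted_data}

# Later: ask KMS to decrypt the stored DEK, then decrypt the data locally.
plain_dek = kms.decrypt(CiphertextBlob=stored["key"])["Plaintext"]
original = Fernet(base64.urlsafe_b64encode(plain_dek)).decrypt(stored["data"])
```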
Key Concepts - KMS
-KMS keys are isolated to a region & never leave
-Supports Multi-region keys (if required) (Keys are replicated to other regions)
-Keys are either AWS owned or Customer owned.
AWS owned keys are a collection of KMS keys that an AWS service owns and manages for use in multiple AWS accounts. They operate in the background
-Customer owned keys have two types:
–Customer Managed = Created by the customer to use directly in an application or within an AWS service.
–AWS Managed = Created automatically by AWS, when you use a service such as S3
-Customer Managed keys are more configurable
You can edit the key policy, which means you could allow cross-account access, so that other AWS accounts can use your keys.
-KMS keys support rotation
Rotation is where physical backing material, so the data used to actually do cryptographic operations is changed.
With AWS managed keys, this can’t be disabled. It’s set to rotate approximately once per year.
With Customer managed keys, rotation is optional; it's disabled by default, and if enabled, it happens approximately once every year.
-KMS keys contain the Backing Key (and previous backing keys caused by rotation)
It means that as a key is rotated, data encrypted with old versions can still be decrypted.
-You can create Aliases, which are shortcuts to keys (also per region)
Key Policies and Security
- Key policies (resource)
-Every KMS key has one, and for Customer managed keys, you can change it.
-KMS has to be explicitly told that keys trust the AWS account that they're contained within.
-Key policies (trust the account) & IAM policies (let IAM users interact with the key)
In high security environments, you might want to remove this account trust and insist on any key permissions being added inside the key policy.
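As a sketch, this is roughly what the account-trust statement looks like when creating a customer managed key (the account ID is a placeholder):

```python
import json
import boto3

kms = boto3.client("kms")

# This statement is what "trusting the account" means: the account root is
# allowed everything, which in turn lets IAM policies in that account grant
# key permissions. The account ID below is a placeholder.
key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "kms:*",
            "Resource": "*",
        }
    ],
}

kms.create_key(
    Description="Customer managed key for the animals4life app",
    Policy=json.dumps(key_policy),
)
```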
S3 Encryption
“Buckets themselves aren’t encrypted, Objects are”
“Each Object inside a Bucket could be using different encryption settings”
There are two main methods of encryption that S3 is capable of supporting:
- Client-side Encryption
The objects being uploaded are encrypted by the client before they ever leave. (This means that the data is ciphertext the entire time)
From AWS’s perspective, the data is received in a scrambled form and stored in a scrambled form. (AWS can’t see the data)
You own and control the keys, the process, and any tooling. So if your organization needs all of these, then you need to utilize client-side encryption.
- Server-side Encryption
Here, even though the data is encrypted in transit using HTTPS, the objects themselves aren't initially encrypted. The data, which is in plaintext form, is still plaintext when it reaches the S3 endpoint. Once the data hits S3, it's encrypted by the S3 infrastructure.
Using this method, you allow S3 to handle some or all of those processes.
Both of these methods use encryption in transit, between the user side and S3 (like an encrypted tunnel)
“Both of these, refer to Encryption at Rest”
The two components of Server-side encryption are:
-Encryption and Decryption process = So taking plaintext, a key and an algorithm, and generating ciphertext, and the reverse.
-The generation and management of the cryptographic keys.
Types of Server-side Encryption
-Server-Side Encryption with Customer-Provided Keys (SSE-C)
The customer is responsible for the encryption keys that are used for encryption and decryption, and the S3 service manages the actual encryption and decryption process.
You are essentially offloading the CPU requirements for this process, but you still need to generate and manage the key or keys that this S3 encryption process will use.
When you put an object (plaintext) into S3, you're required to provide the key. So when this object and the encryption key arrive at the S3 endpoint, the object is encrypted using the key and at the same time, a hash of the key is taken and attached to the object. (the key is discarded)
The hash is one-way, it can't be used to generate a new key, but if a key is provided during decryption, the hash can identify whether the specific key is the one that was used to encrypt that object or not. (Safety feature)
If you want to decrypt the object, you need to provide the same key that was used to encrypt it.
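A sketch of SSE-C with boto3; the key here is generated locally purely for illustration, and you are responsible for storing it safely:

```python
import os
import boto3

s3 = boto3.client("s3")

# Customer-managed 256-bit key; you must keep this safe yourself.
customer_key = os.urandom(32)

# The key travels with the PUT; S3 encrypts the object, keeps a hash of the
# key, and discards the key itself.
s3.put_object(
    Bucket="examplebucket",
    Key="secret-plans.txt",
    Body=b"top secret",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=customer_key,
)

# The same key must be supplied again to read the object back.
obj = s3.get_object(
    Bucket="examplebucket",
    Key="secret-plans.txt",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=customer_key,
)
```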
-Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3) (AES-256)
AWS manages both encryption and decryption processes, as well as the key generation and management.
With this method, when you put an Object into S3, you just provide the plain text data. S3 creates a root key to use for the encryption process. (handled end to end by AWS)
When an Object is uploaded to S3, using SSE-S3, it’s actually encrypted by a key that’s unique for every single object. (AWS generates this key)
S3 uses that key to encrypt the object, then the root key is used to encrypt that unique key, and the original unencrypted version of that key is discarded. The encrypted Object and the encrypted unique key are stored side by side in S3.
While it involves very little admin overhead, it does present three significant problems:
-If you are in a regulatory environment, where you need to control the keys that are used and control access to those keys, then this isn’t usable.
-If you need to be able to control the rotation of the key.
-If you need role separation = with SSE-S3, a full S3 admin, so somebody who has full S3 permissions to configure the bucket and manage the objects, can also decrypt and view data.
-Server-Side Encryption with KMS KEYS stored in AWS Key Management Service (SSE-KMS)
AWS handles both the keys and the encryption process, but unlike SSE-S3, the root key is handled by a separate service. (KMS)
When you upload an Object and pick SSE-KMS for the first time, S3 liaises with KMS and creates an AWS managed KMS key. This is the default key, which gets used when using SSE-KMS in the future.
Every time an Object is uploaded, S3 uses a dedicated key to encrypt that object, and that key is a data encryption key, which KMS generates using the KMS key. S3 is provided with a plaintext version of the data encryption key as well as an encrypted one. The plaintext one is used to encrypt the object and is then discarded. The encrypted data encryption key is stored along with the encrypted Object.
Every Object, which is uploaded and encrypted with SSE-KMS requires a KMS key. This KMS key is used to generate one unique data encryption key for every object that’s encrypted using SSE-KMS.
You don't have to use the Default KMS key that S3 creates; you can choose to use your own customer managed KMS key. This means you can control the permissions on it and the rotation of the key material.
You can also have logging and auditing on the KMS key itself.
The best benefit provided by SSE-KMS is the role separation = To decrypt any object in a bucket, where those objects have been encrypted with SSE-KMS, you need access to the KMS key that was used to generate the unique key.
The KMS key is used to decrypt the data encryption key for the object and then that decrypted data encryption key, decrypts the object itself. (If you don’t have access to KMS, you can’t access the Object)
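A sketch of choosing SSE-S3 or SSE-KMS per object at upload time (bucket, keys and the KMS key alias are assumptions):

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: S3 owns and manages the keys end to end.
s3.put_object(
    Bucket="examplebucket",
    Key="picture-sse-s3.jpg",
    Body=b"...",
    ServerSideEncryption="AES256",
)

# SSE-KMS: the object's DEK is generated under a KMS key; using a customer
# managed key gives you control of the key policy, rotation and auditing.
s3.put_object(
    Bucket="examplebucket",
    Key="picture-sse-kms.jpg",
    Body=b"...",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-s3-key",   # hypothetical customer managed key
)
```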
Default Bucket Encryption
When you're uploading Objects to S3, you're actually utilizing the "PutObject" operation. As part of this operation, you can specify a header, "x-amz-server-side-encryption". This is how you direct AWS to use server-side encryption.
-If you don’t specify this header, then Objects will not use encryption.
-If you do specify this header: a value of AES256 utilizes SSE-S3, and a value of aws:kms utilizes SSE-KMS.
You can set one of these types as the default for a bucket, so when you don't specify a header, it will still use that type of encryption.
For example: DEFAULT = AES256 (SSE-S3)
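A sketch of setting a bucket's default encryption so uploads without the header still get encrypted (names are assumptions):

```python
import boto3

s3 = boto3.client("s3")

# Default the bucket to SSE-KMS with a specific key; using "AES256" here
# instead would default the bucket to SSE-S3.
s3.put_bucket_encryption(
    Bucket="examplebucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/my-s3-key",  # hypothetical alias
                }
            }
        ]
    },
)
```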
S3 Object Storage Classes - S3 Standard
-This is the default storage of S3. So if you don’t specify the storage class, this is what you’re going to use.
-Data is replicated across at least 3 AZs (able to cope with multiple AZ failures)
-Provides eleven nines (99.999999999%) of durability
-Replication uses MD5 checksums together with Cyclic Redundancy Checks (CRCs), to detect and resolve any data issues.
-When objects uploaded to S3 have been stored durably, S3 responds with an HTTP/1.1 200 OK status.
-You are billed a per GB per month fee for data stored.
-A per-GB charge for transfer OUT (IN is free) and a price per 1,000 requests made to the product.
-No specific retrieval fee, no minimum duration, no minimum size.
-Makes data accessible immediately = It has a milliseconds first-byte latency and objects can be made publicly available. (When data is requested it's available within milliseconds)
-S3 Standard should be used for Frequently Accessed Data which is important and non-replaceable.
S3 Object Storage Classes - S3 Standard-IA (Infrequent Access)
-Data is replicated across at least 3 AZs. (able to cope with multiple AZ failures)
-Provides eleven nines (99.999999999%) of durability.
-Its storage cost is roughly half the price of S3 Standard. (Cost-effective)
-You have a per request charge and a data transfer OUT cost. (Same as Standard)
-It has a per GB data retrieval fee (which S3 Standard doesn't have); overall cost increases with frequent data access.
-It has a minimum duration charge of 30 days - Objects can be stored for less, but the minimum billing always applies.
-It has a minimum capacity charge of 128KB per object.
So this class is cost-effective for data, as long as you don't access the data very often, don't need to store it only short term, and don't need to store lots of tiny objects.
-S3 Standard-IA should be used for long-lived data, which is important but where access is infrequent.
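A sketch of selecting a storage class per object; omitting StorageClass gives S3 Standard, and an existing object can be moved to another class by copying it over itself (names are assumptions):

```python
import boto3

s3 = boto3.client("s3")

# Upload directly into Standard-IA (omit StorageClass for S3 Standard).
s3.put_object(
    Bucket="examplebucket",
    Key="archive/old-report.pdf",
    Body=b"...",
    StorageClass="STANDARD_IA",
)

# Change the class of an existing object by copying it onto itself.
s3.copy_object(
    Bucket="examplebucket",
    Key="archive/old-report.pdf",
    CopySource={"Bucket": "examplebucket", "Key": "archive/old-report.pdf"},
    StorageClass="STANDARD_IA",
    MetadataDirective="COPY",
)
```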