Simple Storage Service (S3) Flashcards
S3 Security
-S3 is private by default => the only identity with any initial access to an S3 bucket is the account root user of the account which owns/created that bucket. Any other permissions have to be explicitly granted.
There are a few ways this can be done:
-S3 Bucket Policies
–A form of resource policy => A resource policy is just like an identity policy, except it's attached to a resource instead of an identity.
–Provide a resource perspective on permissions
The difference between resource policies and identity policies:
Identity policies = You're controlling what that identity can access. These have one limitation: you can only attach identity policies to identities in your own account.
Resource policies = You’re controlling who can access that resource.
–Resource policies ALLOW/DENY access from the SAME account or DIFFERENT accounts.
Since the policy is attached to the resource, it can reference any other identities inside that policy.
–Resource policies ALLOW/DENY access to Anonymous principals.
Resource policies can be used to open a bucket to the world by referencing all principals, even those not authenticated by AWS.
They have one major difference from identity policies, and that's the presence of an explicit "Principal" component. The principal part of a resource policy defines which principals are affected by the policy.
–Bucket policies can be used to control who can access objects, even allowing conditions which block specific IP addresses.
–There can only be ONE bucket policy on a bucket, but it can have multiple statements.
If an identity inside one AWS account is accessing a bucket in that same account, the effective access is the combination of ALL of the applicable identity policies plus the resource policy (the bucket policy). For anonymous access, i.e. access by an anonymous principal, only the bucket policy applies.
If you're doing cross-account access, the identity's own account needs to allow it to access S3 in general and your bucket, and then your bucket policy needs to allow access from that identity, i.e. from that external account.
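For illustration, a minimal boto3 sketch of a bucket policy that allows anonymous reads but denies requests from one IP range - the bucket name and IP range are made-up placeholders, not anything from these notes:

import json
import boto3

s3 = boto3.client("s3")

# Hypothetical policy: public read for everyone, deny for one IP range.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",  # anonymous / all principals
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {
            "Sid": "DenyOneRange",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
            "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
        },
    ],
}

# There can only be ONE bucket policy, so this replaces any existing policy.
s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))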
Access Control Lists (ACLs)
-Are a way to apply security to objects or buckets.
-They’re a sub-resource of that object/bucket
S3 subresources only exist in the context of a specific bucket or object. Subresources provide support for storing and managing bucket configuration information and object-specific information.
Bucket Subresources: S3 Object Lifecycle Management, S3 Bucket Versioning, Static Website hosting, Bucket Policy and ACL (access control list), Bucket ACL, CORS (cross-origin resource sharing), Logging - S3 Access Logs, Tagging, Location, Notification.
Object Subresources: Object ACL
-They’re Legacy (AWS don’t recommend their use and prefer that you use Bucket Policies)
-Inflexible & only allow simple permissions
They can't have conditions like bucket policies can, so you're restricted to some very broad permissions.
Which permissions can be controlled using an ACL:
What these five permissions do depends on whether they're applied to a bucket or an object.
- READ
- WRITE
- READ_ACP
- WRITE_ACP
- FULL_CONTROL
It’s significantly less flexible than an identity or a resource policy.
You don’t have the flexibility of being able to have a single ACL that affects a group of objects.
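For completeness, a minimal sketch of applying a canned ACL with boto3 (the bucket name is a placeholder, and Block Public Access / object ownership settings would need to permit ACLs) - bucket policies remain the recommended option:

import boto3

s3 = boto3.client("s3")

# Legacy approach: grant READ to everyone via a canned ACL.
s3.put_bucket_acl(Bucket="example-bucket", ACL="public-read")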
Block Public Access
This feature provides settings for access points, buckets, and accounts to help you manage public access to Amazon S3 resources. By default, new buckets, access points, and objects don’t allow public access. However, users can modify bucket policies, access point policies, or object permissions to allow public access. S3 Block Public Access settings override these policies and permissions so that you can limit public access to these resources.
With S3 Block Public Access, account administrators and bucket owners can easily set up centralized controls to limit public access to their Amazon S3 resources that are enforced regardless of how the resources are created.
When Amazon S3 receives a request to access a bucket or an object, it determines whether the bucket or the bucket owner’s account has a block public access setting applied. If the request was made through an access point, Amazon S3 also checks for block public access settings for the access point. If there is an existing block public access setting that prohibits the requested access, Amazon S3 rejects the request.
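A minimal boto3 sketch of turning on all four Block Public Access settings for a bucket (placeholder name), assuming you want to block public ACLs and public policies entirely:

import boto3

s3 = boto3.client("s3")

# Enable every Block Public Access setting on the bucket.
s3.put_public_access_block(
    Bucket="example-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)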
EXAM POWER UP
- Identity: Controlling different resources
- Identity: You have a preference for IAM
- Identity: Same account
- Bucket: Just controlling S3
- Bucket: Anonymous or Cross-Account
- ACLs: NEVER - unless you must
S3 Static Website Hosting
-We’ve been accessing S3, via the normal method, which is using the AWS APIs
For instance, to access any objects within S3, we’re using the S3 APIs. Assuming we’re authenticated and authorized, we use the get object API call to access those resources. (Secure and Flexible)
-This feature allows access via standard HTTP - e.g. blogs.
-You enable it, and in doing so you have to set an Index document and an Error document.
So in enabling static website hosting on an S3 bucket, we have to point the Index document (usually the entry point of a website) at a specific object in the S3 bucket.
The Error document is the same, but it's used when something goes wrong. So if you access a file which isn't there, or there is another type of server-side error, that's when the Error document is shown.
Both of these need to be HTML documents, because the static website hosting feature delivers HTML files.
-When you enable it, AWS creates a Website Endpoint.
This is a specific address that the bucket can be accessed from using HTTP. The name of this endpoint is influenced by the bucket name that you choose and the region that it's in.
You can use your own custom domain name for a bucket, but if you want to do that, then your bucket name matters. You can only use your custom domain name if the name of the bucket matches the domain.
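As a sketch (bucket and document names are placeholders), enabling static website hosting with boto3 looks roughly like this:

import boto3

s3 = boto3.client("s3")

# Point the Index and Error documents at objects in the bucket.
s3.put_bucket_website(
    Bucket="example-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# The website endpoint then looks something like:
# http://example-bucket.s3-website-us-east-1.amazonaws.com (exact format varies by region)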
There are two specific scenarios which are perfect for S3:
-Offloading
If you have a website hosted on a compute service (e.g. EC2), it can benefit from offloading its media storage to Amazon S3.
What we can do is, we can take all of the media that the compute service hosts, and we can move that media to an S3 bucket that uses static website hosting.
Then when the compute service generates the HTML file and delivers this to the customer’s browser, this HTML file points at the media that’s hosted on the S3 bucket. So the media is retrieved from S3, not the compute service.
S3 is likely much cheaper for the storage and delivery of any media versus a compute service. (S3 is designed for the storage of large data at scale.)
-Out-of-band pages
Out-of-band refers to a method of accessing something that is outside of the main way of accessing it.
So for example, you might use out-of-band server management and this lets you connect to a management card that’s in a server using the cellular network. That way, if the server is having networking issues, with the normal access methods (normal network), then you can still access it.
We can use this to host a maintenance page shown during scheduled or unscheduled maintenance periods.
S3 Pricing
-Storage = You pay for storing objects in your S3 buckets. The rate you’re charged depends on your objects’ size, how long you stored the objects during the month, and the storage class
-Request & data retrievals = You pay for requests made against your S3 buckets and objects. S3 request costs are based on the request type, and are charged on the quantity of requests
-Data Transfer
You pay for all bandwidth into and out of Amazon S3, except for the following:
–Data transferred out to the internet for the first 100GB per month, aggregated across all AWS Services and Regions (except China and GovCloud)
–Data transferred in from the internet.
–Data transferred between S3 buckets in the same AWS Region.
–Data transferred from an Amazon S3 bucket to any AWS service(s) within the same AWS Region as the S3 bucket (including to a different account in the same AWS Region).
–Data transferred out to Amazon CloudFront (CloudFront).
Object Versioning
-Is something which is controlled at a bucket level
-It starts off in a disabled state; you can optionally enable versioning on a disabled bucket, but once enabled, you cannot disable it again. What you can do is suspend it, and if desired, a suspended bucket can be re-enabled.
Without versioning enabled on a bucket, each object is identified solely by the object key, its name, which is unique inside the bucket.
If you modify an object, the original version of that object is replaced.
Versioning lets you store multiple versions of an object within a bucket. Any operations which would modify objects generate a new version.
-There’s an attribute of an object and it’s the ID of the object.
When versioning on a bucket is disabled, the ID of the objects in that bucket is set to "null".
If you upload or put a new object into a bucket with versioning enabled, then S3 allocates the ID to that object, for example: id = 11111.
If any modifications are made to this object, S3 will allocate a new ID to the newer version, and it retains the old version. The newest version of any object in a version-enabled bucket is known as the "current version".
You can request an object from S3 and provide the ID of a specific version, to get that particular version back, rather than the current version. (Versions can be individually accessed by specifying the ID, and if you don't specify one then it's assumed that you want the current version.)
-Versioning also impacts deletions
If we indicate to S3 that we want to delete the object and we don’t give any specific version ID, then S3 will add a new special version of that object, known as a “delete marker”.
The delete marker is a special version of an object, which hides all previous versions of that object. But you can delete the “delete marker”, which essentially undeletes the object, returning the current version to being active again.
If you want to truly delete the object, you have to specify the particular version ID.
If you’re truly deleting the current version of the object, then the next most current version, becomes the current version.
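A quick boto3 sketch of the deletion behaviour described above (bucket, key and version ID are placeholders):

import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"

# Enable versioning (it can later be suspended, but never disabled again).
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Delete with no VersionId => S3 just adds a delete marker.
s3.delete_object(Bucket=bucket, Key="photo.jpg")

# Delete naming a specific VersionId => that version is truly removed.
s3.delete_object(Bucket=bucket, Key="photo.jpg", VersionId="111111")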
IMPORTANT POINTS OF S3 VERSIONING
-Cannot be switched off - only suspended
-Space is consumed by ALL versions
-You are billed for ALL versions
-Only way to zero costs is to delete the bucket
-Suspending it, doesn’t actually remove any of the old versions, so you’re still billed for them.
MFA Delete
-Enabled in Versioning configuration on a bucket
-When you enable MFA Delete, it means MFA is required to change bucket versioning states.
-MFA is required to delete versions.
How it works is that you provide:
-Serial number (MFA) + Code passed with API CALLS
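A hedged sketch of what that looks like with boto3 - the MFA device ARN, code, bucket, key and version ID are all placeholders:

import boto3

s3 = boto3.client("s3")

# Permanently deleting a version on an MFA Delete enabled bucket requires the
# MFA serial/ARN plus the current code, passed together as one string.
s3.delete_object(
    Bucket="example-bucket",
    Key="photo.jpg",
    VersionId="111111",
    MFA="arn:aws:iam::111122223333:mfa/my-device 123456",
)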
S3 Performance Optimization
“It’s often about performance and reliability combined and this is especially relevant, when we’re talking about a distributed organization”
Features which help us in this regard:
- Single PUT Upload
- Multipart Upload
- S3 Accelerated Transfer
Single PUT Upload
We know from the “Animals for Life” scenario, that remote workers need to upload large data sets and do so frequently, and we know that they’re often on unreliable internet connections.
-By default, when you upload an object to S3, it’s uploaded as a single data stream to S3.
A file becomes an object, and it’s uploaded using the PutObject API call and placed in a bucket, and this all happens as a single stream.
-If the stream fails, the upload fails
-Requires a full restart
Any delay can be costly and potentially risky.
-Speed & Reliability of the upload will always be limited, because of this single stream of data
“Single stream transfer can often provide much slower speeds than both ends of that transfer are capable of”
-If you utilize a single PUT upload, then you’re limited to 5GB of data as a maximum. (AWS Limit)
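For comparison with multipart upload below, a single PUT upload is just one PutObject call and one data stream (bucket and file names are placeholders):

import boto3

s3 = boto3.client("s3")

# One stream, one call, 5GB maximum - if it fails, the whole upload restarts.
with open("dataset.zip", "rb") as f:
    s3.put_object(Bucket="example-bucket", Key="dataset.zip", Body=f)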
Multipart Upload
-Improves the speed and reliability of uploads to S3, and it does this by breaking data up into individual parts.
-The minimum size for using multipart upload is 100MB.
-The upload can be split into a maximum of 10,000 parts, and each part can range in size between 5MB and 5GB.
-The last part can be smaller than 5MB.
-Each individual part is treated as its own isolated upload - each part can fail in isolation and be restarted in isolation, rather than restarting the whole thing.
-Improves the transfer rate = the overall transfer rate is the combined speed of all the parts.
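A minimal sketch using boto3's high-level transfer manager, which switches to multipart upload automatically above a threshold (file and bucket names are placeholders; the sizes shown are illustrative):

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Above the threshold the file is split into parts which upload (and retry) independently.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # switch to multipart at 100MB
    multipart_chunksize=64 * 1024 * 1024,   # part size (must be at least 5MB)
)
s3.upload_file("dataset.zip", "example-bucket", "dataset.zip", Config=config)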
S3 Accelerated Transfer
To understand it, first, it’s required to understand how global transfer works to S3 buckets.
We have no control over the public internet data path, routers and ISPs are picking this path based on what they think is best, and potentially commercially viable. That doesn’t always align with what offers the best performance.
So using the public internet for data transit is never an optimal way to get data from source to destination.
S3 Transfer Acceleration uses the network of AWS Edge Locations, which are located in lots of convenient locations globally. An S3 bucket needs to be enabled for transfer acceleration; the default is that it's switched OFF, and there are some restrictions for enabling it.
-The bucket name cannot contain periods and it needs to be DNS compatible in its naming.
Once enabled, data being uploaded, instead of going to the S3 bucket directly, immediately enters the closest best-performing AWS Edge Location. This part does occur over the public internet, but geographically, it's really close.
At this point, the Edge Locations transit the data being uploaded over the AWS global network, a network which is directly under the control of AWS, and this tends to be a direct link, between these Edge Locations and other areas of the AWS global network, in this case the S3 bucket.
-The internet, it’s not designed primarily for speeds, it’s designed for flexibility and resilience.
-The AWS network is purpose-built to link regions to other regions in the AWS network. (It's like an express train, only stopping at the source and destination.) (It's much faster, with lower, consistent latency.)
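A sketch of enabling and then using transfer acceleration with boto3 (bucket and file names are placeholders):

import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Enable acceleration on the bucket (name must be DNS compatible, no periods).
s3.put_bucket_accelerate_configuration(
    Bucket="example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Use a client configured for the accelerate endpoint, so uploads enter the nearest Edge Location.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("dataset.zip", "example-bucket", "dataset.zip")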
Key Management Service (KMS)
AWS KMS is a secure and resilient service that uses hardware security modules that have been validated under FIPS 140-2, or are in the process of being validated, to protect your keys.
-Regional & Public Service
-Capable of multi-region features
-It lets you create, store and manage cryptographic keys (keys which can be used to convert plaintext to ciphertext, and vice versa)
-Capable of handling Symmetric and Asymmetric keys
-Capable of performing cryptographic operations (encrypt, decrypt & …)
-Keys never leave KMS - Provides FIPS 140-2 (L2) = US security standard, it’s often a key point of distinction between using KMS versus using something like CloudHSM.
(ASSUME HE IS TALKING ABOUT SYMMETRIC KEYS)
-The main type of key that KMS manages are KMS keys (also referred to as Customer Master Keys (CMK))
These KMS keys are used by KMS within cryptographic operations = You can use them, applications can use them, and other AWS services can use them.
They are logical, think of them as a container for the actual physical key material, and this is the data that really makes up the key.
-KMS keys contain - an ID, creation date, key policy, description & state
-Every KMS key is backed by physical key material
It’s this data which is held by KMS and it’s this material, which is actually used to encrypt and decrypt things that you give to KMS.
-The physical key material can be generated by KMS or imported into KMS
This material contained inside a KMS key can be used to directly encrypt or decrypt data up to 4KB in size
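A small boto3 sketch of that direct (up to 4KB) encrypt/decrypt flow - the key alias is a placeholder:

import boto3

kms = boto3.client("kms")
key_id = "alias/my-app-key"  # placeholder alias for a KMS key

# Direct encryption with a KMS key is limited to 4KB of plaintext.
enc = kms.encrypt(KeyId=key_id, Plaintext=b"battleplans")
ciphertext = enc["CiphertextBlob"]

# KMS identifies the right key from the ciphertext and returns the plaintext.
dec = kms.decrypt(CiphertextBlob=ciphertext)
assert dec["Plaintext"] == b"battleplans"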
Data Encryption Keys (DEKs)
Another type of key which KMS can generate.
-They are generated by using a KMS key using the GenerateDataKey operation.
- DEKs can be used to encrypt or decrypt data > 4KB.
- DEKs are linked to the KMS key which created them. (KMS can tell that a specific data encryption key was created using a specific KMS key.)
- KMS doesn’t store the data encryption key in any way.
It provides it to you or the service using KMS and then it discards it.
The reason it discards it, is that KMS doesn’t actually do the encryption or decryption of data using DEKs, you do or the service using KMS performs those operations.
When a DEK is generated, KMS provides you with two versions of that DEK:
-Plaintext Version = Something which can be used immediately to perform cryptographic operations
-Ciphertext Version = Can be given back to KMS for it to be decrypted (encrypted by using the KMS key that generated it)
-Encrypt data using plaintext key.
-Once finished with that process, discard the plaintext version of that DEK.
-Store the encrypted key (ciphertext version) with that encrypted data.
Decrypting that data is simple, you pass the encrypted data encryption key back to KMS and ask for it to decrypt it, using the same KMS key used to generate it.
Then you use the decrypted data encryption key that KMS gives you back to decrypt the data, and then you discard the decrypted data encryption key.
–S3 generates a DEK for every single object.
-KMS doesn't track the usage of DEKs.
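A sketch of the DEK (envelope encryption) flow described above, using boto3 plus the third-party "cryptography" package for the local AES step - the key alias and data are placeholders, and this is just one way to do the local encryption:

import os

import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")
key_id = "alias/my-app-key"  # placeholder alias for a KMS key

# 1. Ask KMS for a DEK - we get back a plaintext version and an encrypted version.
dek = kms.generate_data_key(KeyId=key_id, KeySpec="AES_256")
plaintext_key = dek["Plaintext"]
encrypted_key = dek["CiphertextBlob"]

# 2. Encrypt the data locally with the plaintext key, then discard that key.
nonce = os.urandom(12)
ciphertext = AESGCM(plaintext_key).encrypt(nonce, b"large file contents", None)
del plaintext_key
# Store ciphertext + nonce + encrypted_key together (the encrypted DEK travels with the data).

# 3. Later: ask KMS to decrypt the stored DEK, then decrypt the data locally.
restored_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
data = AESGCM(restored_key).decrypt(nonce, ciphertext, None)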