AWS Storage Services - S3 Flashcards
S3 Overview
S3 stores data as objects within buckets
an object consists of a file and optionally any metadata that describes that file.
a key is the unique identifier for an object within a bucket
storage capacity is virtually unlimited
S3 Overview
Buckets
For each bucket you can:
control access to it (create, delete, and list objects in the bucket)
View access logs for it and its objects
choose the geographical region where to store the bucket and its contents
Bucket name must be a unique DNS-compliant name:
the name must be unique across all existing bucket names in Amazon S3
after you create the bucket you cannot change the name
the bucket name is visible in the URL that points to the objects that you’re going to put in your bucket.
By default you can create up to 100 buckets in each of your AWS accounts
you can’t change its region after creation
you can host static websites by configuring your bucket for website hosting
you can’t delete an S3 bucket using the S3 console if the bucket contains 100,000 or more objects. You can’t delete an S3 bucket using the CLI if versioning is enabled.
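The DNS-compliant naming rules above can be sketched as a quick client-side validator. This is a simplified check for illustration, not the full S3 rule set (it ignores a few edge cases such as consecutive dots):

```python
import re

def is_dns_compliant_bucket_name(name: str) -> bool:
    """Rough check of S3 bucket naming rules (illustrative, not exhaustive)."""
    # names must be 3-63 characters long
    if not 3 <= len(name) <= 63:
        return False
    # lowercase letters, digits, hyphens, dots; must start and end alphanumeric
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]*[a-z0-9]", name):
        return False
    # must not be formatted like an IP address
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", name):
        return False
    return True
```

Running such a check before calling CreateBucket gives a faster failure than a round trip to the API.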
Buckets
Data consistency model
older model: read-after-write consistency for PUTs of new objects in all regions
older model: eventual consistency for read-after-write HEAD or GET requests made before the object exists
older model: eventual consistency for overwrite PUTs and DELETEs in all regions
current model: S3 now delivers strong read-after-write consistency for any storage request, in all regions
Data consistency model
Storage classes
storage classes for frequently accessed objects:
S3 standard for general-purpose storage of frequently accessed data
Storage classes for infrequently accessed objects:
S3 Standard-IA for long-lived but less frequently accessed data. It stores the object data redundantly across multiple geographically separated AZs.
S3 One Zone-IA stores the object data in only one AZ. Less expensive than Standard-IA, but the data is not resilient to the physical loss of the AZ.
These two storage classes are suitable for objects larger than 128 KB that you plan to store for at least 30 days. If an object is less than 128 KB, S3 charges you for 128 KB. If you delete an object before the 30-day minimum, you are charged for 30 days.
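The minimum-size and minimum-duration charges above can be illustrated with a small billing sketch. The per-KB-day rate is a hypothetical placeholder, not a real AWS price:

```python
def billable_ia_charge(size_kb: float, days_stored: int,
                       price_per_kb_day: float) -> float:
    """Approximate Standard-IA / One Zone-IA charge, applying the
    128 KB minimum billable size and 30-day minimum storage duration.
    price_per_kb_day is a made-up rate used only for illustration."""
    billed_kb = max(size_kb, 128)       # objects under 128 KB bill as 128 KB
    billed_days = max(days_stored, 30)  # early deletes still incur 30 days
    return billed_kb * billed_days * price_per_kb_day
```

A 64 KB object deleted after 10 days is billed as if it were 128 KB stored for 30 days.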
S3 Intelligent-Tiering is a storage class for data with unknown or changing access patterns. It delivers automatic cost savings, without performance impact or operational overhead, by moving data between access tiers as access patterns change.
S3 Intelligent-Tiering monitors access patterns and moves objects that have not been accessed for 30 consecutive days to the infrequent access tier. If an object in the infrequent access tier is accessed later, it is automatically moved back to the frequent access tier.
S3 Intelligent-Tiering also supports archive tiers: objects that haven't been accessed for 90 consecutive days move to the archive access tier, and after 180 consecutive days of no access to the deep archive access tier.
There are no retrieval fees for S3 Intelligent-Tiering.
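The tiering thresholds can be modeled as a simple lookup. This is a simplified sketch; in real S3 the archive tiers are optional and must be enabled in the bucket's Intelligent-Tiering configuration:

```python
def intelligent_tier(days_since_last_access: int,
                     archive_tiers_enabled: bool = True) -> str:
    """Which Intelligent-Tiering access tier an object would occupy,
    based on the day thresholds described above (simplified model)."""
    if archive_tiers_enabled and days_since_last_access >= 180:
        return "deep-archive-access"
    if archive_tiers_enabled and days_since_last_access >= 90:
        return "archive-access"
    if days_since_last_access >= 30:
        return "infrequent-access"
    return "frequent-access"
```

Any access resets the clock, which is why a later GET moves an infrequent-tier object back to the frequent tier.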
Glacier
for long-term archive
archived objects are not available for real-time access. You must first restore the objects before you can access them.
you cannot specify glacier as the storage class at the time you create an object.
Glacier objects are visible through S3 only
retrieval options
expedited - allows you to quickly access your data when occasional urgent requests for a subset of archives are required. For all but the largest archived objects, the data is typically made available within 1-5 minutes. There are two types of expedited retrievals: on-demand requests are similar to EC2 on-demand instances and are available most of the time; provisioned requests are guaranteed to be available when you need them.
standard - allows you to access any of your archived objects within several hours. Standard retrievals typically complete within 3-5 hours. This is the default option for retrieval requests that do not specify the retrieval option.
bulk - Glacier's lowest-cost retrieval option, enabling you to retrieve large amounts, even petabytes, of data inexpensively in a day. Bulk retrievals typically complete within 5-12 hours.
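A restore for an archived object names one of these tiers. The sketch below builds the request payload in the shape the S3 RestoreObject API expects (e.g. the RestoreRequest argument of boto3's restore_object); it does not call AWS:

```python
def build_restore_request(days: int, tier: str) -> dict:
    """Build a RestoreRequest payload for S3's RestoreObject API.
    Sketch only: the field names follow the S3 API shape, and the
    caller is expected to pass it to a real client such as boto3."""
    if tier not in ("Expedited", "Standard", "Bulk"):
        raise ValueError("tier must be Expedited, Standard, or Bulk")
    return {
        "Days": days,  # how long the restored copy stays available
        "GlacierJobParameters": {"Tier": tier},
    }
```

With boto3 this would be passed as `s3.restore_object(Bucket=..., Key=..., RestoreRequest=build_restore_request(7, "Bulk"))`.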
For the S3 Standard, S3 Standard-IA, and Glacier storage classes, your objects are automatically stored across multiple devices spanning a minimum of 3 AZs
S3 Glacier Deep Archive
an S3 storage class providing secure and durable object storage for long-term retention of data that is rarely accessed (perhaps once or twice in a year).
S3 glacier deep archive offers the lowest cost storage in the cloud, at prices lower than storing and maintaining data in on-premises magnetic tape libraries or archiving data offsite.
all objects stored in the S3 Glacier Deep Archive storage class are replicated and stored across at least three geographically-dispersed AZs, protected by 99.999999999% (11 nines) durability, and can be restored within 12 hours or less
S3 glacier deep archive also offers a bulk retrieval option, where you can retrieve petabytes of data within 48 hours
Storage classes
Objects
Objects are private by default. You must explicitly grant permissions for other users to access them.
Each S3 object has data, a key, and metadata
you cannot modify object metadata after the object is uploaded; to change metadata, make a copy of the object and set the metadata on the copy
two kinds of metadata
system metadata: date, content-length, last-modified, content-MD5, x-amz-server-side-encryption, x-amz-version-id, x-amz-delete-marker, x-amz-storage-class, x-amz-website-redirect-location, x-amz-server-side-encryption-aws-kms-key-id, x-amz-server-side-encryption-customer-algorithm
user-defined metadata - key-value pair that you provide
you can upload and copy objects of up to 5 GB in size in a single operation. For objects greater than 5 GB, up to 5 TB, you must use the multipart upload API
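The 5 GB single-operation limit suggests a simple upload planner. The 100 MB part size below is an arbitrary illustrative choice (real S3 parts may be 5 MB to 5 GB, with at most 10,000 parts per upload):

```python
GB = 1024 ** 3

def plan_upload(object_size_bytes: int, part_size_bytes: int = 100 * 1024 * 1024):
    """Decide between a single PUT and a multipart upload, returning
    (kind, part_count). Single-operation uploads top out at 5 GB;
    multipart supports objects up to 5 TB."""
    if object_size_bytes > 5 * 1024 * GB:
        raise ValueError("objects larger than 5 TB are not supported")
    if object_size_bytes <= 5 * GB:
        return ("single-put", 1)
    # ceiling division for the number of parts
    parts = -(-object_size_bytes // part_size_bytes)
    return ("multipart", parts)
```

For very large objects the part size would need to grow so the part count stays under the 10,000-part limit.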
Tagging - you can associate up to 10 tags with an object. Tags associated with an object must have unique tag keys. A tag key can be up to 128 Unicode characters in length and tag values can be up to 256 Unicode characters in length. Keys and values are case sensitive.
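These tag limits are easy to enforce client-side before calling the API. A minimal validator, assuming tags are passed as a dict (which makes key uniqueness automatic):

```python
def validate_tags(tags: dict) -> None:
    """Validate an object tag set against the limits above:
    at most 10 tags, keys up to 128 characters, values up to 256.
    Raises ValueError on the first violation."""
    if len(tags) > 10:
        raise ValueError("an object can have at most 10 tags")
    for key, value in tags.items():
        if len(key) > 128:
            raise ValueError(f"tag key too long: {key!r}")
        if len(value) > 256:
            raise ValueError(f"tag value too long for key {key!r}")
```

Failing fast here avoids an InvalidTag error from the PutObjectTagging call.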
Object delete
deleting objects from a version-enabled bucket: a non-versioned delete request specifies only the object's key and not the version ID; a versioned delete request specifies both the key and a version ID
deleting objects from an MFA-enabled bucket: if you provide an invalid MFA token, the request always fails. If you are not deleting a versioned object and you don't provide an MFA token, the delete succeeds.
Object lock
prevents objects from being deleted or overwritten for a fixed amount of time or indefinitely.
object retention options: retention period - the object remains locked until the retention period expires; legal hold - the object remains locked until you explicitly remove it. Object lock works only in versioned buckets and applies only to individual versions of objects.
Object ownership - with the bucket-owner-full-control ACL you can automatically assume ownership of objects that are uploaded to your buckets
Objects
S3 Select
is an Amazon S3 capability designed to pull out only the data you need from an object, which can dramatically improve the performance and reduce the cost of applications that need to access data in S3.
S3 Select works on objects stored in CSV, JSON (including JSON arrays), and Apache Parquet format, and supports GZIP and BZIP2 compression for CSV and JSON objects.
CloudWatch metrics for S3 Select let you monitor S3 Select usage for your applications. These metrics are available at 1-minute intervals and let you quickly identify and act on operational issues.
S3 Select
Pricing
S3 charges you only for what you actually use, with no hidden fees and no overage charges
No charge for creating a bucket; you pay only for storing objects in the bucket and for transferring objects in and out of the bucket.
Storage - you pay for storing objects in your S3 bucket. The rate you're charged depends on your objects' size, how long you stored the objects during the month, and the storage class.
Requests - you pay for requests, for example GET requests, made against your S3 buckets and objects. This includes lifecycle requests. The rates for requests depend on what kind of request you're making.
Retrievals - you pay for retrieving objects that are stored in Standard-IA, One Zone-IA, and Glacier storage
Early deletes - if you delete an object stored in Standard-IA, One Zone-IA, or Glacier storage before the minimum storage commitment has passed, you pay an early deletion fee for that object
storage management - you pay for the storage management features that are enabled on your account’s buckets
Bandwidth - you pay for all bandwidth into and out of S3, except for the following: data transferred in from the internet; data transferred out to an EC2 instance in the same region as the S3 bucket; data transferred out to CloudFront. You also pay a fee for any data transferred using S3 Transfer Acceleration.
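The cost components above combine into a simple bill estimate. All rates in this sketch are illustrative placeholders, not current AWS prices:

```python
def estimate_monthly_cost(storage_gb_months: float, get_requests: int,
                          egress_gb: float, *, storage_rate=0.023,
                          get_rate_per_1000=0.0004, egress_rate=0.09) -> float:
    """Rough monthly S3 bill from the three main components:
    storage, requests, and outbound bandwidth. The default rates
    are hypothetical examples, not real prices."""
    storage = storage_gb_months * storage_rate
    requests = (get_requests / 1000) * get_rate_per_1000
    bandwidth = egress_gb * egress_rate
    return round(storage + requests + bandwidth, 2)
```

For example, 100 GB-months of storage, one million GETs, and 10 GB of egress at these placeholder rates comes to 2.3 + 0.4 + 0.9 dollars.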
Pricing
Networking
Virtual hosted-style access - S3 routes any virtual hosted-style request to the US East (N. Virginia) region by default if you use the endpoint s3.amazonaws.com instead of the region-specific endpoint. Format: http://bucket.s3.amazonaws.com or http://bucket.s3-aws-region.amazonaws.com
path-style access - in a path-style URL the bucket name is part of the path, and the endpoint you use must match the region in which the bucket resides. Format: US East (N. Virginia) endpoint: http://s3.amazonaws.com/bucket or region-specific endpoint: http://s3-aws-region.amazonaws.com/bucket
customize S3 URLs with CNAMEs - the bucket name must be the same as the CNAME
Amazon S3 Transfer acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. It takes advantage of CloudFront’s globally distributed edge locations.
Transfer acceleration cannot be disabled and can only be suspended. Transfer acceleration URL is bucket.s3-accelerate.amazonaws.com
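The two addressing styles can be sketched as URL builders. This is a simplified model for illustration; real endpoint names vary by region and partition:

```python
def virtual_hosted_url(bucket, key, region=None):
    """Virtual hosted-style: the bucket name is part of the hostname."""
    if region is None:
        host = f"{bucket}.s3.amazonaws.com"
    else:
        host = f"{bucket}.s3-{region}.amazonaws.com"
    return f"https://{host}/{key}"

def path_style_url(bucket, key, region=None):
    """Path-style: the bucket name is part of the path, and the
    endpoint must match the bucket's region."""
    host = "s3.amazonaws.com" if region is None else f"s3-{region}.amazonaws.com"
    return f"https://{host}/{bucket}/{key}"
```

The virtual hosted-style form is why bucket names appear in URLs, and why they must be DNS-compliant.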
Networking
Versioning - use versioning to keep multiple versions of an object in one bucket. Versioning protects you from the consequences of unintended overwrites and deletions, and you can also use it to archive objects so you have access to previous versions. Versioning is disabled by default, so you need to enable it explicitly. When you PUT an object into a versioning-enabled bucket, the non-current version is not overwritten.
When you delete an object, all versions remain in the bucket and S3 inserts a delete marker. Performing a simple GET request when the current version is a delete marker returns a 404 Not Found error; you can, however, GET a non-current version of an object by specifying its version ID. You can permanently delete an object by specifying the version you want to delete. Only the owner of an S3 bucket can permanently delete a version.
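The delete-marker behavior above can be illustrated with a toy in-memory model. This is not an S3 client, just the versioning bookkeeping:

```python
class VersionedBucketModel:
    """Toy model of a versioning-enabled bucket: PUTs stack versions,
    simple DELETEs add a delete marker, and GET by version ID still works."""

    DELETE_MARKER = object()

    def __init__(self):
        self.versions = {}  # key -> list of (version_id, body or DELETE_MARKER)
        self.counter = 0

    def put(self, key, body):
        self.counter += 1
        vid = f"v{self.counter}"
        self.versions.setdefault(key, []).append((vid, body))
        return vid

    def delete(self, key):
        # a simple delete inserts a delete marker; old versions remain
        self.counter += 1
        self.versions.setdefault(key, []).append(
            (f"v{self.counter}", self.DELETE_MARKER))

    def get(self, key, version_id=None):
        stack = self.versions.get(key, [])
        if version_id is None:
            # simple GET: fails if the current version is a delete marker
            if not stack or stack[-1][1] is self.DELETE_MARKER:
                raise KeyError("404 Not Found")
            return stack[-1][1]
        for vid, body in stack:
            if vid == version_id and body is not self.DELETE_MARKER:
                return body
        raise KeyError("404 Not Found")
```

After a delete, a plain GET fails, but fetching an older version by its ID still succeeds, exactly the flashcard's point.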
Versioning
Encryption - server-side encryption using S3-managed keys (SSE-S3), KMS-managed keys (SSE-KMS), or customer-provided keys (SSE-C); client-side encryption using a KMS-managed customer master key or a client-side master key
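For server-side encryption, the choice shows up as request parameters on PutObject. A sketch using the standard S3 parameter names (SSE-C uses separate customer-key headers, omitted here):

```python
def sse_params(mode, kms_key_id=None):
    """Build server-side-encryption parameters for an S3 PutObject call.
    Parameter names follow the S3 API; this sketch only builds the dict
    and does not call AWS."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        params = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            params["SSEKMSKeyId"] = kms_key_id
        return params
    raise ValueError("SSE-C requires customer-provided key headers instead")
```

With boto3, the result would be unpacked into the call: `s3.put_object(Bucket=..., Key=..., Body=..., **sse_params("SSE-KMS", key_id))`.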
Encryption
MFA Delete
MFA delete requires additional authentication for either of the following operations: changing the versioning state of your bucket, or permanently deleting an object version
MFA delete requires two forms of authentication together: your security credentials, and the concatenation of a valid serial number, a space, and the six-digit code displayed on an approved authentication device
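The concatenation rule can be expressed directly; the value below is what goes in the x-amz-mfa request header (the serial number shown in the test is a made-up example ARN):

```python
def mfa_header_value(serial_number, token_code):
    """Concatenate the device serial number, a space, and the six-digit
    code, as required for MFA-protected S3 requests."""
    if not (token_code.isdigit() and len(token_code) == 6):
        raise ValueError("token code must be six digits")
    return f"{serial_number} {token_code}"
```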
MFA delete
Cross-account access
you can provide another AWS account access to an object that is stored in an S3 bucket. These are the methods for granting cross-account access to objects that are stored in your own S3 bucket:
resource-based policies and IAM policies for programmatic-only access to S3 bucket objects.
resource-based Access Control List (ACL) and IAM policies for programmatic-only access to S3 bucket objects
cross-account IAM roles for programmatic and console access to S3 bucket objects
Requester pays buckets - bucket owners pay for all of the S3 storage and data transfer costs associated with their bucket. To save on costs, you can enable the requester pays feature so the requester will pay the cost of the request and the data download from the bucket instead of the bucket owner. Take note that the bucket owner always pays the cost of storing data.
cross-account access
Monitoring
automated monitoring tools to watch S3
Amazon CloudWatch Alarms - watch a single metric over a time period that you specify, and perform one or more actions based on the value of the metric relative to a given threshold over a number of time periods
AWS CloudTrail log monitoring - share log files between accounts, monitor CloudTrail log files in real time by sending them to CloudWatch Logs, write log processing applications in Java, and validate that your log files have not changed after delivery by CloudTrail
monitoring with cloudwatch
daily storage metrics for buckets - you can monitor bucket storage using cloudwatch, which collects and processes storage data from S3 into readable, daily metrics
request metrics - you can choose to monitor S3 requests to quickly identify and act on operational issues. The metrics are available at 1-minute intervals after some latency to process.
you can have a maximum of 1000 metrics configurations per bucket
supported event activities that occur on S3 are recorded in a CloudTrail event along with other AWS service events in event history
Monitoring
Website Hosting
enable website hosting in your bucket properties
your static website is available via the region-specific website endpoint
you must make the objects that you want to serve publicly readable by writing a bucket policy that grants everyone s3:GetObject permission
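A minimal public-read bucket policy for website hosting can be generated like this. The document follows the standard S3 policy shape; substitute your own bucket name:

```python
import json

def public_read_policy(bucket_name):
    """Bucket policy granting everyone s3:GetObject on the bucket's
    objects, as needed for static website hosting."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            # the /* suffix scopes the grant to objects, not the bucket itself
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
        }],
    })
```

Note the policy grants read on objects only; listing the bucket would need a separate s3:ListBucket statement on the bucket ARN.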
website hosting