AWS Storage Services - S3 Flashcards

1
Q

S3 Overview
S3 stores data as objects within buckets
an object consists of a file and optionally any metadata that describes that file.
a key is the unique identifier for an object within a bucket
storage capacity is virtually unlimited

A

S3 Overview

2
Q

Buckets
For each bucket you can:
control access to it (create, delete, and list objects in the bucket)
view access logs for it and its objects
choose the geographical region where S3 stores the bucket and its contents
Bucket names must be unique, DNS-compliant names:
the name must be unique across all existing bucket names in Amazon S3
after you create the bucket, you cannot change its name
the bucket name is visible in the URL that points to the objects you put in the bucket
by default you can create up to 100 buckets in each of your AWS accounts
you can't change a bucket's region after creation
you can host static websites by configuring your bucket for website hosting
you can't delete an S3 bucket using the S3 console if the bucket contains 100,000 or more objects, and you can't delete an S3 bucket using the CLI if versioning is enabled
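The DNS-compliance rules above can be sketched as a small validator. This is a simplified approximation written for illustration (the real naming rules have more cases, e.g. restrictions on adjacent dots), not an official check:

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    """Simplified S3 bucket-name check: 3-63 characters; lowercase
    letters, digits, hyphens, and dots; starts and ends with a letter
    or digit; and not formatted like an IP address."""
    if not 3 <= len(name) <= 63:
        return False
    if not re.fullmatch(r"[a-z0-9][a-z0-9.\-]*[a-z0-9]", name):
        return False
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", name):  # reject IP-address form
        return False
    return True
```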

A

Buckets

3
Q

Data consistency model
since December 2020, S3 delivers strong read-after-write consistency automatically for all GET, PUT, and LIST operations in all regions, including overwrite PUTs and DELETEs.
previously, S3 provided read-after-write consistency only for PUTs of new objects, with eventual consistency for overwrite PUTs, DELETEs, and read-after-write HEAD or GET requests.

A

Data consistency model

4
Q

Storage classes

storage classes for frequently accessed objects:
S3 Standard for general-purpose storage of frequently accessed data

storage classes for infrequently accessed objects:
S3 Standard-IA for long-lived but less frequently accessed data; it stores object data redundantly across multiple geographically separated AZs
S3 One Zone-IA stores object data in only one AZ; it is less expensive than Standard-IA, but the data is not resilient to the physical loss of that AZ
These two storage classes are suitable for objects larger than 128 KB that you plan to store for at least 30 days. If an object is smaller than 128 KB, S3 charges you for 128 KB. If you delete an object before the 30-day minimum, you are charged for 30 days.

S3 Intelligent-Tiering
a storage class designed for customers who want to optimize storage costs automatically when data access patterns change, without performance impact or operational overhead
the first cloud object storage class that delivers automatic cost savings by moving data between two access tiers - frequent access and infrequent access - when access patterns change; ideal for data with unknown or changing access patterns
S3 Intelligent-Tiering monitors access patterns and moves objects that have not been accessed for 30 consecutive days to the infrequent access tier. If an object in the infrequent access tier is accessed later, it is automatically moved back to the frequent access tier.
S3 Intelligent-Tiering also supports optional archive tiers: objects that haven't been accessed for 90 consecutive days can be moved to the archive access tier, and after 180 consecutive days of no access to the deep archive access tier. There are no retrieval fees for S3 Intelligent-Tiering.

Glacier
for long-term archive
archived objects are not available for real-time access; you must first restore the objects before you can access them
you cannot specify Glacier as the storage class at the time you create an object
Glacier objects are visible through S3 only
retrieval options:
Expedited - allows you to quickly access your data when occasional urgent requests for a subset of archives are required. For all but the largest archived objects, data is typically made available within 1-5 minutes. There are two types of expedited retrievals: on-demand requests are similar to EC2 on-demand instances and are available most of the time; provisioned requests are guaranteed to be available when you need them.
Standard - allows you to access any of your archived objects within several hours. Standard retrievals typically complete within 3-5 hours; this is the default for retrieval requests that do not specify a retrieval option.
Bulk - Glacier's lowest-cost retrieval option, enabling you to retrieve large amounts of data, even petabytes, inexpensively in a day. Bulk retrievals typically complete within 5-12 hours.
For the S3 Standard, S3 Standard-IA, and Glacier storage classes, your objects are automatically stored across multiple devices spanning a minimum of 3 AZs.

S3 Glacier Deep Archive
an S3 storage class providing secure and durable object storage for long-term retention of data that is rarely accessed - perhaps once or twice a year
offers the lowest-cost storage in the cloud, at prices lower than storing and maintaining data in on-premises magnetic tape libraries or archiving data offsite
all objects stored in the S3 Glacier Deep Archive storage class are replicated and stored across at least three geographically dispersed AZs, are designed for 99.999999999% (11 nines) durability, and can be restored within 12 hours
also offers a bulk retrieval option, with which you can retrieve petabytes of data within 48 hours
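The Standard-IA / One Zone-IA billing minimums described earlier (128 KB minimum object size, 30-day minimum storage duration) can be expressed as a tiny helper; the function name is an illustration, not an AWS API:

```python
def ia_billable(size_kb: float, days_stored: int):
    """Return (billable size in KB, billable days) for S3 Standard-IA or
    One Zone-IA: objects under 128 KB are billed as 128 KB, and objects
    deleted before 30 days are billed for the full 30-day minimum."""
    return max(size_kb, 128.0), max(days_stored, 30)
```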

A

Storage classes

5
Q

Objects
are private by default; you must explicitly grant permissions to other users
each S3 object has data, a key, and metadata
you cannot modify object metadata after the object is uploaded
two kinds of metadata:
system metadata: Date, Content-Length, Last-Modified, Content-MD5, x-amz-server-side-encryption, x-amz-version-id, x-amz-delete-marker, x-amz-storage-class, x-amz-website-redirect-location, x-amz-server-side-encryption-aws-kms-key-id, x-amz-server-side-encryption-customer-algorithm
user-defined metadata - key-value pairs that you provide
you can upload and copy objects of up to 5 GB in size in a single operation; for objects greater than 5 GB, up to 5 TB, you must use the multipart upload API
Tagging - you can associate up to 10 tags with an object. Tags associated with an object must have unique tag keys. A tag key can be up to 128 Unicode characters in length and tag values can be up to 256 Unicode characters in length. Keys and values are case sensitive.
Object delete
deleting objects from a version-enabled bucket: a non-versioned delete request specifies only the object's key, not a version ID; a versioned delete request specifies both the key and a version ID
deleting objects from an MFA-enabled bucket: if you provide an invalid MFA token, the request always fails; if you are not deleting a versioned object and you don't provide an MFA token, the delete succeeds
Object Lock
prevents objects from being deleted or overwritten for a fixed amount of time or indefinitely
object retention options: retention period - the object remains locked until the retention period expires; legal hold - the object remains locked until you explicitly remove it. Object Lock works only in versioned buckets and applies only to individual versions of objects.
Object ownership - with the bucket-owner-full-control ACL, you can automatically assume ownership of objects that are uploaded to your buckets
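The 5 GB single-operation limit and 5 TB overall object-size limit noted above can be captured in a small check (the constant and function names are illustrative, not AWS APIs):

```python
GB = 1024 ** 3

SINGLE_PUT_LIMIT = 5 * GB        # largest upload/copy in a single operation
MAX_OBJECT_SIZE = 5 * 1024 * GB  # 5 TB overall object size limit

def needs_multipart(size_bytes: int) -> bool:
    """True if the object must be uploaded with the multipart upload API."""
    if size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the 5 TB S3 object size limit")
    return size_bytes > SINGLE_PUT_LIMIT
```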

A

Objects

6
Q

S3 Select
an Amazon S3 capability designed to pull out only the data you need from an object, which can dramatically improve the performance and reduce the cost of applications that need to access data in S3.
S3 Select works on objects stored in CSV, JSON, and Apache Parquet format, including JSON arrays, and supports BZIP2 compression for CSV and JSON objects.
CloudWatch metrics for S3 Select let you monitor S3 Select usage for your applications. These metrics are available at 1-minute intervals and let you quickly identify and act on operational issues.
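A request for this capability might look like the following parameter set; the shape mirrors boto3's select_object_content call, and the bucket, key, and query are hypothetical examples:

```python
select_params = {
    "Bucket": "example-bucket",    # hypothetical bucket
    "Key": "logs/requests.csv",    # hypothetical object key
    "ExpressionType": "SQL",
    "Expression": "SELECT s.status, s.uri FROM S3Object s WHERE s.status = '500'",
    # BZIP2-compressed CSV input with a header row; JSON records out
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"},
                           "CompressionType": "BZIP2"},
    "OutputSerialization": {"JSON": {}},
}
```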

A

S3 Select

7
Q

Pricing
S3 charges you only for what you actually use, with no hidden fees and no overage charges
no charge for creating a bucket - only for storing objects in the bucket and for transferring objects in and out of the bucket
Storage - you pay for storing objects in your S3 buckets. The rate you're charged depends on your objects' size, how long you stored the objects during the month, and the storage class.
Requests - you pay for requests, for example GET requests, made against your S3 buckets and objects. This includes lifecycle requests. The rates for requests depend on what kind of request you're making.
Retrievals - you pay for retrieving objects that are stored in Standard-IA, One Zone-IA, and Glacier storage
Early deletes - if you delete an object stored in Standard-IA, One Zone-IA, or Glacier storage before the minimum storage commitment has passed, you pay an early deletion fee for that object
Storage management - you pay for the storage management features that are enabled on your account's buckets
Bandwidth - you pay for all bandwidth into and out of S3, except for the following: data transferred in from the internet, data transferred out to an EC2 instance in the same region as the S3 bucket, and data transferred out to CloudFront. You also pay a fee for any data transferred using S3 Transfer Acceleration.

A

Pricing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Networking
Virtual hosted-style access - S3 routes any virtual hosted-style request to the US East (N. Virginia) region by default if you use the endpoint s3.amazonaws.com instead of a region-specific endpoint. Format: http://bucket.s3.amazonaws.com or http://bucket.s3-aws-region.amazonaws.com

path-style access - in a path-style URL the endpoint you use must match the region in which the bucket resides, and the bucket name appears in the path. Format: US East (N. Virginia) endpoint: http://s3.amazonaws.com/bucket or region-specific endpoint: http://s3-aws-region.amazonaws.com/bucket

customize S3 URLs with CNAMEs - the bucket name must be the same as the CNAME
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. It takes advantage of CloudFront's globally distributed edge locations.
Transfer Acceleration cannot be disabled, only suspended. The Transfer Acceleration URL is bucket.s3-accelerate.amazonaws.com
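The two addressing styles can be sketched as URL builders, using the legacy s3-region endpoint form the card shows (the function names are illustrative):

```python
from typing import Optional

def virtual_hosted_url(bucket: str, region: Optional[str] = None) -> str:
    """Bucket name in the hostname."""
    host = f"{bucket}.s3-{region}.amazonaws.com" if region else f"{bucket}.s3.amazonaws.com"
    return f"http://{host}"

def path_style_url(bucket: str, region: Optional[str] = None) -> str:
    """Bucket name in the path; the endpoint must match the bucket's region."""
    host = f"s3-{region}.amazonaws.com" if region else "s3.amazonaws.com"
    return f"http://{host}/{bucket}"
```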

A

Networking

10
Q

Versioning
use versioning to keep multiple versions of an object in one bucket. Versioning protects you from the consequences of unintended overwrites and deletions. You can also use versioning to archive objects so you have access to previous versions.
versioning is disabled by default, so you need to explicitly enable it. When you PUT an object in a versioning-enabled bucket, the non-current version is not overwritten.
when you delete an object, all versions remain in the bucket and S3 inserts a delete marker. A simple GET request when the current version is a delete marker returns a 404 Not Found error; you can, however, GET a non-current version of an object by specifying its version ID.
you can permanently delete an object by specifying the version you want to delete. Only the owner of an S3 bucket can permanently delete a version.
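The two delete-request shapes described above can be sketched as follows; the parameter names mirror boto3's delete_object, and the helper itself is illustrative:

```python
def delete_request(bucket: str, key: str, version_id: str = "") -> dict:
    """Without VersionId, S3 only inserts a delete marker in a
    versioning-enabled bucket; with VersionId, that specific version
    is permanently deleted."""
    params = {"Bucket": bucket, "Key": key}
    if version_id:
        params["VersionId"] = version_id
    return params
```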

A

Versioning

11
Q
Encryption
server-side encryption using S3-managed keys (SSE-S3), KMS-managed keys (SSE-KMS), or customer-provided keys (SSE-C)
client-side encryption using:
a KMS-managed customer master key
a client-side master key
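The server-side options can be illustrated with put_object parameter sets; the shapes mirror boto3, and the bucket, key, and KMS key alias are hypothetical:

```python
# SSE-S3: S3-managed keys
sse_s3 = {"Bucket": "example-bucket", "Key": "doc.txt",
          "ServerSideEncryption": "AES256"}

# SSE-KMS: KMS-managed key (hypothetical alias)
sse_kms = {"Bucket": "example-bucket", "Key": "doc.txt",
           "ServerSideEncryption": "aws:kms",
           "SSEKMSKeyId": "alias/my-key"}
```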
A

Encryption

12
Q

MFA Delete
MFA Delete grants additional authentication for either of the following operations: changing the versioning state of your bucket, or permanently deleting an object version
MFA Delete requires two forms of authentication together: your security credentials, and the concatenation of a valid serial number, a space, and the six-digit code displayed on an approved authentication device
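The concatenated MFA value described above can be sketched as a small helper (the function and the example serial number are illustrative):

```python
def mfa_string(serial_number: str, token_code: str) -> str:
    """Serial number, a space, and the six-digit device code."""
    if not (token_code.isdigit() and len(token_code) == 6):
        raise ValueError("token must be a six-digit code")
    return f"{serial_number} {token_code}"
```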

A

MFA delete

13
Q

Cross-account access
you can provide another AWS account access to an object that is stored in an S3 bucket. Methods for granting cross-account access to objects stored in your own S3 bucket:
resource-based policies (bucket policies) and IAM policies, for programmatic-only access to S3 bucket objects

resource-based Access Control Lists (ACLs) and IAM policies, for programmatic-only access to S3 bucket objects

cross-account IAM roles, for programmatic and console access to S3 bucket objects

Requester pays buckets - bucket owners pay for all of the S3 storage and data transfer costs associated with their bucket. To save on costs, you can enable the requester pays feature so the requester will pay the cost of the request and the data download from the bucket instead of the bucket owner. Take note that the bucket owner always pays the cost of storing data.
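A resource-based (bucket) policy for the first method above might look like this; the account ID and bucket name are hypothetical:

```python
cross_account_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        # the other AWS account (hypothetical ID)
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}
```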

A

cross-account access

14
Q

Monitoring
automated monitoring tools to watch S3:
Amazon CloudWatch alarms - watch a single metric over a time period that you specify and perform one or more actions based on the value of the metric relative to a given threshold over a number of time periods
AWS CloudTrail log monitoring - share log files between accounts, monitor CloudTrail log files in real time by sending them to CloudWatch Logs, write log-processing applications in Java, and validate that your log files have not changed after delivery by CloudTrail

monitoring with CloudWatch:
daily storage metrics for buckets - you can monitor bucket storage using CloudWatch, which collects and processes storage data from S3 into readable daily metrics
request metrics - you can choose to monitor S3 requests to quickly identify and act on operational issues. The metrics are available at 1-minute intervals after some latency for processing.

you can have a maximum of 1,000 metrics configurations per bucket
supported event activities that occur on S3 are recorded in a CloudTrail event, along with other AWS service events, in Event history

A

Monitoring

15
Q

Website Hosting
enable website hosting in your bucket properties
your static website is available via the region-specific website endpoint
you must make the objects that you want to serve publicly readable by writing a bucket policy that grants everyone the s3:GetObject permission
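A bucket policy granting everyone s3:GetObject, as described above, looks like this (the bucket name is hypothetical):

```python
public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",                              # everyone
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",   # hypothetical bucket
    }],
}
```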

A

website hosting

16
Q

S3 event notification
to enable notifications, add a notification configuration identifying the events to be published and the destinations where S3 should send the event notifications
S3 can publish the following events:
a new object created event, an object removal event, and a reduced redundancy storage (RRS) object lost event

supports the following destinations for your events:
Amazon Simple Notification Service (SNS) topic
Amazon Simple Queue Service (SQS) queue
AWS Lambda function
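A notification configuration wiring events to destinations might look like this; the shape mirrors boto3's put_bucket_notification_configuration, and the ARNs are hypothetical:

```python
notification_config = {
    "LambdaFunctionConfigurations": [{
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:on-upload",
        "Events": ["s3:ObjectCreated:*"],
    }],
    "QueueConfigurations": [{
        "QueueArn": "arn:aws:sqs:us-east-1:111122223333:uploads",
        "Events": ["s3:ObjectRemoved:*"],
    }],
}
```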

A

S3 event notification

17
Q

Cross Region Replication
enables automatic, asynchronous copying of objects across buckets in different AWS regions
when to use: meet compliance requirements, minimize latency, increase operational efficiency, maintain object copies under different ownership

requirements of CRR: both source and destination buckets must have versioning enabled. The source and destination buckets must be in different AWS regions. S3 must have permission to replicate objects from the source bucket to the destination bucket on your behalf. If the owner of the source bucket doesn't own an object in the bucket, the object owner must grant the bucket owner READ and READ_ACP permissions via the object ACL.

only the following are replicated:
objects created after you add a replication configuration
both unencrypted objects and objects encrypted using S3-managed keys or KMS-managed keys, although you must explicitly enable the option to replicate objects encrypted using KMS keys; the replicated copy is encrypted using the same type of server-side encryption as the source object
object metadata
only objects in the source bucket for which the bucket owner has permission to read objects and access control lists
object ACL updates, unless you direct S3 to change the replica ownership when the source and destination buckets are not owned by the same accounts
object tags

what is not replicated:
objects that existed before you added the replication configuration to the bucket
objects created with server-side encryption using customer-provided encryption keys (SSE-C)
objects created with server-side encryption using KMS-managed keys (SSE-KMS), unless you explicitly enable that option
objects in the source bucket that the bucket owner doesn't have permissions for
updates to bucket-level subresources
actions performed by lifecycle configuration
objects in the source bucket that are replicas created by another cross-region replication

CRR delete operations
if you make a delete request without specifying an object version ID, S3 adds a delete marker. If you specify an object version ID in a delete request, S3 deletes that object version in the source bucket but does not replicate the deletion in the destination bucket. This protects data from malicious deletions.
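A minimal replication configuration for the requirements described in this card might look like this; the shape mirrors boto3's put_bucket_replication, and the role and bucket ARNs are hypothetical:

```python
replication_config = {
    # role S3 assumes to replicate on your behalf (hypothetical ARN)
    "Role": "arn:aws:iam::111122223333:role/replication-role",
    "Rules": [{
        "ID": "replicate-all",
        "Status": "Enabled",
        "Prefix": "",   # empty prefix = all objects
        "Destination": {"Bucket": "arn:aws:s3:::example-destination-bucket"},
    }],
}
```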

A

Cross Region Replication

18
Q

S3 batch operations
a new feature that makes it simple to manage billions of objects stored in S3. Customers can make changes to object properties and metadata, and perform other storage management tasks - such as copying objects between buckets, replacing tag sets, modifying access controls, and restoring archived objects from S3 glacier - for any number of S3 objects in minutes

A

S3 batch operations

19
Q

S3 Glacier
long-term archival solution optimized for infrequently used data or cold data
glacier is a REST based web service
you can store an unlimited number of archives and an unlimited amount of data
you cannot specify glacier as the storage class at the time you create an object
it is designed to provide an average annual durability of 99.999999999% (11 nines) for an archive. Glacier synchronously stores your data across multiple AZs before confirming a successful upload.
to prevent corruption of data packets over the wire, Glacier uploads a checksum of the data during upload; it compares the received checksum with the checksum of the received data, and validates data authenticity with a checksum during data retrieval
Glacier works together with S3 lifecycle rules to help automate archiving of S3 data and reduce your overall storage costs. Requested archival data is copied to S3 One Zone-IA.

A

S3 Glacier Overview

20
Q

Glacier data model
vault - a container for storing archives. Each vault resource has a unique address of the form: https://region-specific endpoint/account-id/vaults/vault-name. You can store an unlimited number of archives in a vault. Vault operations are region specific.
archive - can be any data, such as a photo, video, or document, and is the base unit of storage in Glacier. Each archive has a unique address of the form: https://region-specific endpoint/account-id/vaults/vault-name/archives/archive-id
Job - you can perform a select query on an archive, retrieve an archive, or get an inventory of a vault. glacier select runs the query in place and writes the output results to S3. select, archive retrieval, and vault inventory jobs are associated with a vault. a vault can have multiple jobs in progress at any point in time
Notification configuration - because jobs take time to complete, glacier supports a notification mechanism to notify you when a job is complete

A

Glacier data model

21
Q

Glacier operations
retrieving an archive (asynchronous operation). retrieving a vault inventory (list of archives). create and delete vaults. get the vault description for a specific vault or for all vaults in a region.
set, retrieve, and delete a notification configuration on the vault
upload and delete archives. you cannot update an existing archive
glacier jobs - select, archive-retrieval, inventory-retrieval

A

Glacier operations

22
Q

Vaults
vault operations are region specific
vault names must be unique within an account and the region in which the vault is being created
you can delete a vault only if there are no archives in the vault as of the last inventory that glacier computed and there have been no writes to the vault since the last inventory
you can retrieve vault information such as the vault creation date, the number of archives in the vault, and the total size of all archives in the vault
glacier maintains an inventory of all archives in each of your vaults for disaster recovery or occasional reconciliation. a vault inventory refers to the list of archives in a vault. Glacier updates the vault inventory approximately once a day. downloading a vault inventory is an asynchronous operation.
you can assign your own metadata to glacier vaults in the form of tags. a tag is a key-value pair that you define for a vault.
glacier vault lock allows you to easily deploy and enforce compliance controls for individual glacier vaults with a vault lock policy. You can specify controls such as write once read many in a vault lock policy and lock the policy from future edits. Once locked the policy can no longer be changed.

A

Vaults

23
Q

Archives
glacier supports the following basic archive operations: upload, download, and delete. downloading an archive is an asynchronous operation.
you can upload an archive in a single operation or upload it in parts
using the multipart upload API, you can upload large archives of up to about 40,000 GB (10,000 parts × 4 GB maximum part size)
you cannot upload archives to Glacier using the management console; use the CLI, or write code that makes requests using the API directly or the AWS SDKs
You cannot delete an archive using the Glacier management console. Glacier provides an API call that you can use to delete one archive at a time.
After you upload an archive you cannot update its content or its description. The only way you can update the archive content or its description is by deleting the archive and uploading another archive.
glacier does not support any additional metadata for the archives
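The part limits above (at most 10,000 parts, each up to 4 GB) imply a maximum archive size; a quick sketch, with illustrative names:

```python
GB = 1024 ** 3
MAX_PART_SIZE = 4 * GB
MAX_PARTS = 10_000

def part_count(archive_bytes: int, part_size: int = MAX_PART_SIZE) -> int:
    """Parts needed to upload an archive via Glacier multipart upload."""
    n = -(-archive_bytes // part_size)  # ceiling division
    if n > MAX_PARTS:
        raise ValueError("archive too large for this part size")
    return n
```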

A

Archives

24
Q

Glacier Select
you can perform filtering operations using simple SQL statements directly on your data in glacier.
you can run queries and custom analytics on your data that is stored in glacier without having to restore your data to a hotter tier like S3
when you perform select queries, glacier provides three data access tiers
expedited - data accessed is typically made available within 1-5 minutes
standard - data accessed is typically made available within 3-5 hours
bulk - data accessed is typically made available within 5-12 hours

A

Glacier Select

25
Q

Glacier data retrieval policies
set data retrieval limits and manage the data retrieval activities across your AWS account in each region
three types of policies:
free tier only - you can keep your retrievals within your daily free tier allowance and not incur any data retrieval cost
max retrieval rate - ensures that the peak retrieval rate from all retrieval jobs across your account in a region does not exceed the bytes per hour limit you set
no retrieval limit

A

Glacier data retrieval policies

26
Q

Security
glacier encrypts your data at rest by default and supports secure data transit via SSL
data stored in glacier is immutable, meaning that after an archive is created it cannot be updated
access to glacier requires credentials that AWS can use to authenticate your requests. those credentials must have permissions to access glacier vaults or s3 buckets
glacier requires all requests to be signed for authentication protection. to sign a request, you calculate a digital signature using a cryptographic hash function that returns a hash value that you include in the request as a signature
glacier supports policies only at the vault level
you can attach identity-based policies to IAM identities
A glacier vault is the primary resource and resource-based policies are referred to as vault policies
when activity occurs in glacier, that activity is recorded in a cloudtrail event along with other aws service events in event history

A

Glacier security