S3 Flashcards
S3
Simple Storage Service
What is S3
provides developers and IT teams with secure, durable, highly scalable object storage.
Object based storage
where you can store files, pictures, pdfs etc
Block Based Storage
EC2 - where you install operating system, databases or applications
File size that can be stored in S3
0 bytes to 5 TB (maximum size of a single object)
S3 storage
unlimited
S3 Files are stored in
Buckets
S3 naming
universal namespace - the bucket name has to be unique globally
Sample of an S3 name
https://s3-eu-west-1.amazonaws.com/acloudguru
When you upload a file to S3, this will be returned when the upload is successful
HTTP 200
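With an SDK such as boto3, that 200 shows up in the response metadata. A minimal sketch of checking it; the response dict below is simulated in the shape boto3's `put_object` returns, not a live call:

```python
def upload_succeeded(response: dict) -> bool:
    """Return True if an S3 PUT response carries HTTP 200."""
    return response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 200

# Simulated response shaped like boto3's put_object return value.
fake_response = {"ResponseMetadata": {"HTTPStatusCode": 200}, "ETag": '"abc123"'}
print(upload_succeeded(fake_response))
```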
Data consistency model for S3
- Read after Write consistency for PUTS of new objects
- Eventual Consistency for overwrite PUTS and DELETES (can take some time to propagate)
Read after Write consistency
uploading a file and being able to read/access it right away (milliseconds after)
Eventual consistency
if you update or delete a file, a read immediately afterwards may return either the old or the new version; eventually (after a minute or so) you will consistently get the new file - the delay comes from S3 replicating your data across multiple AZs
S3 is object based; objects consists of the following
- Key
- Value
- Version ID
- Metadata
- Subresources
Key
name of the object
Value
simply the data, made up of a sequence of bytes; it’s the contents of the file (e.g. “hello cloud gurus”)
Metadata
tags (owned by sales marketing etc)
Access Control List
putting individual permissions on a file
Version ID
important for versioning
S3 durability for all storage classes
99.999999999% (11 9s)
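AWS illustrates 11 9s with a worked example: store 10,000,000 objects and you can expect to lose a single object once every 10,000 years on average. The arithmetic behind that claim:

```python
# 99.999999999% durability => a 1e-11 annual probability of losing any given object.
annual_object_loss_probability = 1e-11   # 100% - 99.999999999%
objects_stored = 10_000_000

# Expected number of objects lost per year across the whole store.
expected_losses_per_year = objects_stored * annual_object_loss_probability

# Average years between losing a single object.
years_per_single_object_loss = 1 / expected_losses_per_year
print(years_per_single_object_loss)
```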
S3 Tiered Storage
storage classes
Lifecycle Management
archiving files; moving files from one storage tier to another based on how old the file is
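A lifecycle rule is just a small JSON document attached to the bucket. A hypothetical sketch in the shape the S3 API expects (the rule name, prefix, and day counts are made up for illustration):

```python
# Hypothetical lifecycle configuration: age objects under logs/ through
# cheaper storage tiers, then expire them after a year.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",           # made-up rule name
            "Filter": {"Prefix": "logs/"},      # made-up key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}
```

This dict is the shape `put_bucket_lifecycle_configuration` takes in boto3.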
Versioning
multiple versions of the file
Securing your data in S3 using:
- Access Control Lists
- Bucket Policies
Access Control Lists
goes down to the individual file level
Bucket Policies
locking down the bucket itself at the bucket level
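A bucket policy is a JSON document applied at the bucket level. A hypothetical public-read policy (the bucket name is made up; the statement grants read-only `GetObject` to everyone):

```python
import json

# Hypothetical bucket policy: allow anyone to read objects in example-bucket.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",  # made-up bucket name
        }
    ],
}

# The API expects the policy serialized as a JSON string.
policy_document = json.dumps(bucket_policy)
```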
S3 has the following features
- Tiered Storage Available
- Lifecycle Management
- Versioning
- Encryption
- MFA Delete
- Securing data using ACL and BP
S3 Storage Classes
- S3 Standard
- S3 Intelligent Tiering
- S3 Standard-IA
- S3 One Zone-IA
- S3 Glacier
- S3 Glacier Deep Archive
S3 Standard - Availability, Availability SLA, Availability Zones, Min capacity charge/object, Min storage duration charge, Retrieval fee, First byte latency
Availability: 99.99%
Availability SLA: 99.9%
Availability Zones: >=3
Min capacity charge/object: NA
Min storage duration charge: NA
Retrieval fee: NA
First byte latency: milliseconds
S3 Standard is stored
redundantly across multiple devices in multiple facilities and is designed to sustain the loss of 2 facilities concurrently
S3 Standard-IA - Availability, Availability SLA, Availability Zones, Min capacity charge/object, Min storage duration charge, Retrieval fee, First byte latency
Availability: 99.9%
Availability SLA: 99%
Availability Zones: >=3
Min capacity charge/object: 128 KB
Min storage duration charge: 30 days
Retrieval fee: per GB retrieved
First byte latency: milliseconds
S3 - IA is for
Use case
- data that is accessed less frequently but requires rapid access when needed
- lower fee than S3 but you are charged a retrieval fee
ideally suited for long-term file storage, older sync and share storage, and other aging data
S3 One Zone-IA - Availability, Availability SLA, Availability Zones, Min capacity charge/object, Min storage duration charge, Retrieval fee, First byte latency
Availability: 99.5%
Availability SLA: 99%
Availability Zones: 1
Min capacity charge/object: 128 KB
Min storage duration charge: 30 days
Retrieval fee: per GB retrieved
First byte latency: milliseconds
S3 One Zone - IA is for
Use case
- where you want a lower-cost option for infrequently accessed data but do not require the multiple-AZ data resilience
backup copies, disaster recovery copies or other easily re-creatable data.
S3 Intelligent Tiering - Availability, Availability SLA, Availability Zones, Min capacity charge/object, Min storage duration charge, Retrieval fee, First byte latency
Availability: 99.9%
Availability SLA: 99%
Availability Zones: >=3
Min capacity charge/object: NA
Min storage duration charge: 30 days
Retrieval fee: NA
First byte latency: milliseconds
S3 - Intelligent Tiering
Use case
Designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead
unknown access patterns; can also be used to store new data sets where access is frequent shortly after upload but decreases as the data set ages. You can then move the data set to S3 One Zone-IA or archive it to S3 Glacier.
S3 Glacier - Availability, Availability SLA, Availability Zones, Min capacity charge/object, Min storage duration charge, Retrieval fee, First byte latency
Availability: 99.99%
Availability SLA: 99.9%
Availability Zones: >=3
Min capacity charge/object: 40 KB
Min storage duration charge: 90 days
Retrieval fee: per GB retrieved
First byte latency: minutes or hours (depending on the retrieval option selected)
S3 Glacier
Retrieval times
Use case
- secure, durable and low-cost storage class for data archiving
- retrieval times configurable from minutes to hours
media asset workflows, healthcare information archiving, regulatory and compliance archiving, scientific data storage, digital preservation, magnetic tape replacement
S3 Glacier Deep Archive - Availability, Availability SLA, Availability Zones, Min capacity charge/object, Min storage duration charge, Retrieval fee, First byte latency
Availability: 99.99%
Availability SLA: 99.9%
Availability Zones: >=3
Min capacity charge/object: 40 KB
Min storage duration charge: 180 days
Retrieval fee: per GB retrieved
First byte latency: hours (depending on the retrieval option selected)
S3 Glacier Deep Archive
Use case
Amazon S3’s lowest cost storage class where a retrieval time of 12 hours is acceptable
can also be used for backup and disaster recovery use cases, and is a cost-effective and easy-to-manage alternative to magnetic tape systems, whether they are on-premises libraries or off-premises services.
S3 charges
- Storage
- Requests
- Storage Management Pricing
- Data Transfer Pricing
- Transfer Acceleration
S3 Transfer Acceleration
enables fast, easy, and secure transfers of files over long distances between end users and an S3 bucket
Transfer acceleration takes advantage of
Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, it is routed to Amazon S3 over an optimized network path.
Transfer acceleration process
○ Users upload their files to the edge location instead of uploading directly to the S3 bucket.
○ Edge location - a small data center that is near the user.
○ Once uploaded, the data is then sent over Amazon’s backbone network to the bucket.
Instead of uploading over their own internet connection directly to the S3 bucket, users upload to the nearest edge location; Amazon’s links between its edge locations and data centers are much faster, so the overall transfer completes sooner.
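Enabling acceleration gives the bucket a distinct endpoint of the form `bucketname.s3-accelerate.amazonaws.com`; a small sketch of building object URLs against it:

```python
def accelerate_url(bucket: str, key: str) -> str:
    """Build the S3 Transfer Acceleration endpoint URL for an object."""
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{key}"

print(accelerate_url("acloudguru", "photo.jpg"))
# https://acloudguru.s3-accelerate.amazonaws.com/photo.jpg
```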
S3 naming bucket rules
- Bucket names must be unique across all existing bucket names in Amazon S3.
- Bucket names must comply with DNS naming conventions.
- Bucket names must be at least 3 and no more than 63 characters long.
- Bucket names must not contain uppercase characters or underscores.
- Bucket names must start with a lowercase letter or number.
- Bucket names must be a series of one or more labels. Adjacent labels are separated by a single period (.). Bucket names can contain lowercase letters, numbers, and hyphens. Each label must start and end with a lowercase letter or a number.
- Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
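The rules above boil down to a DNS-label pattern plus an IP-address exclusion. A sketch of a validator covering the listed rules (not every edge case in the official spec):

```python
import re

# Each label: starts/ends with a lowercase letter or digit, hyphens allowed inside.
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
BUCKET_RE = re.compile(rf"^{LABEL}(?:\.{LABEL})*$")
IP_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")

def is_valid_bucket_name(name: str) -> bool:
    """Check a name against the DNS-style bucket naming rules listed above."""
    if not 3 <= len(name) <= 63:
        return False
    if IP_RE.match(name):   # must not be formatted as an IP address
        return False
    return bool(BUCKET_RE.match(name))
```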
MFA Delete
an extra layer of security you can turn on; once enabled, MFA is required to permanently delete an object version or change the bucket’s versioning state
S3 Encryption
- Client Side Encryption
- Server Side Encryption
3 types of Server Side Encryption (Encryption at Rest)
- Server Side encryption with Amazon S3 Managed Keys (SSE-S3)
- Server Side encryption with KMS (SSE-KMS)
- Server side encryption with Customer Provided Keys (SSE-C)
S3 Managed Keys - SSE - S3
- Amazon manages the keys for you automatically, so you don’t have to worry about the keys at all
- 256-bit Advanced Encryption Standard (AES-256)
SSE - KMS
you and Amazon manage the keys together
SSE-C
where you give Amazon your own keys, which you manage, and S3 encrypts your objects with them
S3 Versioning, once enabled, can it still be disabled?
no - you can only suspend it; to remove versioning entirely you have to delete the bucket itself
How to restore a deleted file in S3?
Remove the delete marker in the versioning
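In boto3 terms, restoring means deleting the delete marker’s specific version. A sketch that picks the marker out of a `list_object_versions`-shaped response (the data below is simulated, not a live call):

```python
def find_delete_marker(versions_response: dict, key: str):
    """Return the VersionId of the latest delete marker for a key, or None."""
    for marker in versions_response.get("DeleteMarkers", []):
        if marker["Key"] == key and marker.get("IsLatest"):
            return marker["VersionId"]
    return None

# Simulated response in the shape list_object_versions returns.
response = {
    "DeleteMarkers": [{"Key": "report.pdf", "VersionId": "v3", "IsLatest": True}],
    "Versions": [{"Key": "report.pdf", "VersionId": "v2", "IsLatest": False}],
}
# Issuing a versioned DELETE for "report.pdf" with this VersionId removes the
# marker, so the previous version becomes visible again.
marker_id = find_delete_marker(response, "report.pdf")
```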
Versioning integrates with
Lifecycle rules
Cross Region replication requires
versioning to be turned on for the bucket
what will happen to the existing files when you just turn on Cross Region replication
it will not copy existing files; they have to be copied over manually
When new files are created after the Cross Region replication has been turned on
they will automatically be replicated in the target bucket
What won’t be replicated in the Target bucket?
- If you delete files from the source bucket, they won’t be replicated in the target bucket
- If you delete individual versions, it won’t get replicated either
CDN
Content delivery network is a system of distributed servers that delivers webpages and other web content to a user based on geographic locations of the user, the origin of the webpage, and a content delivery server
Edge location
location where the contents are cached
Origin
origin of all the files that the CDN will distribute. This can be an S3 bucket, EC2 instance, an Elastic Load Balancer or Route 53.
Distribution
name given the CDN which consists of a collection of Edge Locations
Web Distribution
typically used for websites
RTMP
Used for media streaming
TTL
Time to Live; objects are cached for the life of the TTL
What will happen if you invalidate cached objects?
you will be charged
Snowball
petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS
Snowball sizes
50 TB or 80 TB
AWS Snowball Edge
is a 100 TB data transfer device with on-board storage and compute capabilities
AWS Snowmobile
Exabyte scale data transfer service used to move extremely large amounts of data to AWS
Snowball can ? to S3
Import to S3 and Export from S3
AWS Storage Gateway
> service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization’s on-premises IT environment and AWS’s storage infrastructure
a virtual or physical device that replicates your data into AWS (it used to be a virtual appliance only, but AWS has since released a hardware appliance, so you can now have a physical storage gateway)
How can the AWS Storage Gateway’s software used?
available for download as a virtual machine (VM) image that you install on a host in your data center
Storage Gateway supports :
- VMware ESXi
- Microsoft Hyper-V
3 different types of storage gateway
- File Gateway (NFS)
- Volume Gateway (iSCSI)
- Tape Gateway (VTL)
File Gateway
way of storing files in S3
File Gateway: Files are stored as objects in your S3 buckets, access through:
a network file system (NFS) mount point
Volume Gateway
> presents your applications with disk volumes using the iSCSI block protocol.
Data written to these volumes can be asynchronously backed up as point-in-time snapshots of your volumes, stored in the cloud as Amazon EBS snapshots.
Snapshots
are incremental backups that capture only changed blocks. All snapshot storage is also compressed to minimize your storage charges
2 types of Volume Gateway
- Stored Volumes - data written to your stored volumes is kept on your on-premises storage hardware and asynchronously backed up to S3 in the form of EBS snapshots (1 GB - 16 TB per stored volume)
- Cached Volumes - use S3 as the primary data storage while retaining frequently accessed data locally in your storage gateway (1 GB - 32 TB per cached volume)
Tape Gateway
durable, cost-effective solution to archive your data in the AWS cloud
Tape Gateway is supported by
NetBackup, Backup Exec, Veeam
Are there differences between how Amazon EC2 and Amazon S3 work with Availability Zone-specific resources?
Yes. Amazon EC2 provides you the ability to pick the AZ to place resources, such as compute instances, within a region. When you use S3 One Zone-IA, S3 One Zone-IA assigns an AWS Availability Zone in the region according to available capacity.
Q: Can I have a bucket that has different objects in different storage classes and Availability Zones?
Yes, you can have a bucket that has different objects stored in S3 Standard, S3 Standard-IA and S3 One Zone-IA.
S3 Restore Speed Upgrade
an override of an in-progress restore to a faster restore tier if access to the data becomes urgent
How much data can I retrieve from Amazon S3 Glacier for free?
You can retrieve 10 GB of your Amazon S3 Glacier data per month for free with the AWS Free Tier.
There are three ways to restore data from Amazon S3 Glacier –
Expedited, Standard, and Bulk Retrieval
What is “Query in Place” functionality?
Amazon S3 allows customers to run sophisticated queries against data stored in S3 without the need to move the data into a separate analytics platform.
What is S3 Select?
S3 Select is an Amazon S3 feature that makes it easy to retrieve specific data from the contents of an object using simple SQL expressions without having to retrieve the entire object.
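A hypothetical S3 Select request in the shape the `SelectObjectContent` API expects (the bucket, key, and column names are made up; only matching rows and columns come back, not the whole object):

```python
# Hypothetical SelectObjectContent parameters: pull just the name column
# for rows with age > 30 out of a CSV object, without downloading it all.
select_request = {
    "Bucket": "example-bucket",   # made-up bucket
    "Key": "users.csv",           # made-up key
    "ExpressionType": "SQL",
    "Expression": "SELECT s.name FROM S3Object s WHERE CAST(s.age AS INT) > 30",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "OutputSerialization": {"JSON": {}},
}
```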
What is Amazon Athena?
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL queries. Athena is serverless, so there is no infrastructure to set up or manage, and you can start analyzing data immediately.
What is Amazon Redshift Spectrum?
Amazon Redshift Spectrum is a feature of Amazon Redshift that enables you to run queries against exabytes of unstructured data in Amazon S3 with no loading or ETL required. When you issue a query, it goes to the Amazon Redshift SQL endpoint, which generates and optimizes a query plan. Amazon Redshift determines what data is local and what is in Amazon S3, generates a plan to minimize the amount of Amazon S3 data that needs to be read, and requests Redshift Spectrum workers out of a shared resource pool to read and process the data from Amazon S3.
What are Amazon S3 Event Notifications?
Amazon S3 event notifications can be sent in response to actions in Amazon S3 like PUTs, POSTs, COPYs, or DELETEs. Notification messages can be sent through either Amazon SNS, Amazon SQS, or directly to AWS Lambda.
What does it cost to use Amazon S3 event notifications?
There are no additional charges for using Amazon S3 for event notifications. You pay only for use of Amazon SNS or Amazon SQS to deliver event notifications, or for the cost of running an AWS Lambda function.
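A notification configuration is again a JSON document on the bucket. A hypothetical example wiring JPEG uploads to a Lambda function (the rule name and function ARN are made up):

```python
# Hypothetical notification configuration: invoke a Lambda function
# whenever a .jpg object is PUT into the bucket.
notification_configuration = {
    "LambdaFunctionConfigurations": [
        {
            "Id": "thumbnail-on-upload",  # made-up rule name
            "LambdaFunctionArn": (
                "arn:aws:lambda:us-east-1:123456789012:function:make-thumbnail"
            ),  # made-up ARN
            "Events": ["s3:ObjectCreated:Put"],
            "Filter": {
                "Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}
            },
        }
    ]
}
```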
S3 supports how many requests per second to add data?
3,500 (PUT/COPY/POST/DELETE requests per second, per prefix)
S3 supports how many requests per second to retrieve data?
5,500 (GET/HEAD requests per second, per prefix)
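These limits are per prefix, so aggregate throughput scales by spreading keys across more prefixes; a quick sketch of the arithmetic:

```python
WRITES_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE requests per second
READS_PER_PREFIX = 5_500    # GET/HEAD requests per second

def max_read_rate(prefix_count: int) -> int:
    """Aggregate read throughput scales linearly with the number of prefixes."""
    return prefix_count * READS_PER_PREFIX

def max_write_rate(prefix_count: int) -> int:
    """Aggregate write throughput scales the same way."""
    return prefix_count * WRITES_PER_PREFIX

print(max_read_rate(10))   # reads/second across 10 prefixes
print(max_write_rate(10))  # writes/second across 10 prefixes
```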
How many buckets can I have per account by default?
100