Module 4: Adding a storage layer with S3 Flashcards
As a cloud engineer working with S3:
Consider access patterns and use cases to choose the correct configuration options, while:
=> Optimising cost
=> Supporting performance
=> Meeting compliance requirements
And, as always, apply security best practices to protect the resources.
Type of storage
Block storage
Hierarchical storage (file storage)
Object storage
Block storage
Data is stored in fixed-size blocks. The application splits the data into blocks and stores them wherever is most efficient. Blocks can be stored across servers and on different operating systems.
File storage
File storage creates a shared file system. The data is stored in a hierarchical structure of folders and files, similar to OneDrive for example.
Object storage
Object storage stores files as objects, based on attributes and metadata.
An object is data, metadata and a key.
The object key is the unique identifier of the object. When you update an object, the entire object is updated.
Difference between object storage and block storage
In object storage, the entire object must be updated when any part of the data changes, while in block storage only the affected blocks are changed.
Amazon Simple Storage Service (Amazon S3)
Object storage. Stores massive amounts of unstructured data
Type of storage in S3, and what is the data stored in?
Object storage, stored in buckets
Max size of a single object?
5TB
Identifier in S3
Unique URL for each object (universal namespace)
Component of objects
key, version ID, value, metadata and subresources
What does immutable mean?
It’s a characteristic of an object: you can’t change part of it in place. You have to change the whole object outside of S3 and upload it again.
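A minimal Python sketch of this idea, assuming a hypothetical in-memory `S3Object` type (it is an illustration, not an AWS API). The object is frozen, so a change means building a complete replacement under the same key.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class S3Object:
    """Hypothetical in-memory stand-in for an S3 object (not an AWS API)."""
    key: str                                   # unique identifier within the bucket
    value: bytes                               # the data itself
    metadata: dict = field(default_factory=dict)


original = S3Object(key="photos/2022/cat.jpg", value=b"old bytes")

# Editing in place is rejected, which mirrors immutability.
try:
    original.value = b"new bytes"
    changed_in_place = True
except AttributeError:
    changed_in_place = False

# The supported path: build a whole new object and store it under the same key.
updated = replace(original, value=b"new bytes")
print(changed_in_place, updated.key, updated.value)
```

The original object is left untouched; only the full replacement carries the new data.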
What are buckets for?
Container of objects. They organize the Amazon S3 namespace and identify the account in charge of the objects stored in it.
Bucket Geography
Regional. Objects stored in a bucket never leave the Region unless they are explicitly transferred to another Region.
What is a prefix in a bucket?
Similar to a path name. Querying for a prefix returns the objects whose keys start with it, photos/2022/ for example.
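A prefix query can be pictured as a "starts with" filter on object keys. The sketch below runs locally on a hypothetical list of keys; with boto3 the same filtering is normally done server-side through the `Prefix` parameter of `list_objects_v2`.

```python
def list_by_prefix(keys, prefix):
    """Return the keys that start with the given prefix, sorted."""
    return sorted(k for k in keys if k.startswith(prefix))


# Hypothetical keys in one bucket.
keys = [
    "photos/2022/beach.jpg",
    "photos/2022/city.jpg",
    "photos/2021/snow.jpg",
    "logs/app.log",
]

matches = list_by_prefix(keys, "photos/2022/")
print(matches)
```

There are no real folders involved: the slash is just a character in the key, and the prefix is matched as plain text.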
S3 Benefits
Durability
Availability
High performance
How Durable is S3?
S3 Standard is designed for 11 nines of durability (99.999999999%), meaning that in any given year an object has a 0.000000001% chance of being lost.
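To make the 11 nines concrete, here is the arithmetic as a short Python sketch. The bucket size of 10 million objects is an assumption chosen for illustration.

```python
from fractions import Fraction

# 99.999999999 % written as an exact fraction (11 nines over 10^11).
durability = Fraction(99_999_999_999, 100_000_000_000)
annual_loss_probability = 1 - durability        # per object, per year

objects_stored = 10_000_000                     # assumed number of objects
expected_losses_per_year = objects_stored * annual_loss_probability
years_per_lost_object = 1 / expected_losses_per_year

print(float(annual_loss_probability))
print(years_per_lost_object)
```

With these assumptions, you would expect to lose a single object roughly once every 10,000 years.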
Why is S3 so durable?
S3 redundantly stores objects on multiple devices across multiple facilities in the designated Region. It detects and repairs failures by comparing the copies stored in different places, and it verifies the integrity of the data by using checksums.
How available is S3?
S3 Standard is designed for 4 nines of availability (99.99%). Availability is the ability to access the data quickly when you want it: out of 10,000 requests, about one would not succeed. S3 is also scalable (virtually unlimited storage) and gives the ability to encrypt data.
Why is S3 high performing?
It handles thousands of transactions per second and scales automatically to high request rates.
S3 use cases:
1) Host web content: Use high availability and high performance to address fluctuating and potentially high traffic to the data
2) Static site: Simple storage of HTML files, videos, images…
3) Financial (or other) analysis: Store data that other services can use for analysis
4) Disaster recovery
S3 example of media hosting
S3 serves video content through a CloudFront cache to make it available more quickly to a user streaming it, while another user downloads it directly from S3.
Do you need to provision storage for s3?
No, it scales as needed.
Static vs dynamic website
On a static website the content is static and might include client-side scripts. A dynamic website relies on server-side scripts such as PHP, JSP or ASP. S3 does not support server-side scripting. Other AWS services do.
Static website with S3
You can host everything on S3 for a static website, with no need for a server or a virtual machine.
S3 for analysis
Load the raw data into a bucket. Use an ETL tool to transform it (provision EC2 capacity for that, using a Spot Fleet or an EMR cluster). Return the transformed data to a new bucket. Terminate the instances used for ETL. Perform your analysis on the objects stored in the second bucket (Athena and QuickSight are given as examples for analysis).
S3 for disaster recovery
Store everything in one S3 bucket. Replicate it to another bucket in another Region. Additionally, you can move long-term data to S3 Glacier.
What is Cross-Region Replication?
Duplicating data into another bucket, in another Region
What permission do I need to store something in S3?
Write permissions
Objects encrypted by default in S3 (True/False)
True. Since January 2023, S3 applies server-side encryption with S3 managed keys (SSE-S3) to every new object by default. Objects are encrypted at upload and decrypted at download. Before that date, encryption had to be enabled.
4 ways to upload to S3
AWS Management Console
AWS Command Line Interface (AWS CLI)
AWS Software Development Kit (SDK)
Amazon S3 REST API
Uploading an object through the console
Use a wizard-based approach to move data in or out of S3, including a drag-and-drop option. The maximum file size for upload through the Management Console is 160 GB.
File size limitation to upload through the management console?
160 GB
For sizes larger than 160 GB?
Use the CLI, an SDK, or the REST API
CLI and S3?
Use the command line to run uploads or downloads, interactively or through a script
SDK and S3
Programmatically code the access to S3 in your applications
API and S3
Use PUT requests to upload and GET requests to download. API access can be embedded into application code.
You want to upload a big file to S3 or a file for which you know there is a chance of failure in the upload. What can you use?
Multipart upload
What is multipart upload?
The object is separated into multiple parts, which are uploaded independently. S3 then reassembles them and stores the object in the bucket.
Advantages of multipart upload?
Improved throughput: parts are uploaded in parallel, so the transfer finishes sooner
Quick recovery from network issues: only the failed part is retried
Pause and resume the upload
Begin an upload while the object is still being built
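The splitting step behind multipart upload can be sketched as below. This is only the planning arithmetic, not an upload. The function name `plan_parts` and the 8 MiB default part size are assumptions; the 5 MiB minimum part size and the 10,000-part maximum are the documented S3 limits.

```python
import math

MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB      # S3 minimum for every part except the last
MAX_PARTS = 10_000           # S3 maximum number of parts per upload


def plan_parts(object_size, part_size=8 * MIB):
    """Return (part_number, start_byte, end_byte_exclusive) tuples for one upload."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the object would otherwise need more than 10,000 parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    parts = []
    start = 0
    number = 1
    while start < object_size:
        end = min(start + part_size, object_size)
        parts.append((number, start, end))
        start = end
        number += 1
    return parts


parts = plan_parts(20 * MIB + 1)
print(len(parts), parts[-1])
```

Each tuple describes a byte range that can be uploaded, retried, or resumed on its own, which is where the advantages listed above come from.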
S3 transfer acceleration
Bucket-level feature that optimizes transfer speeds. Uses CloudFront edge locations to optimize the network path.
Why use Transfer Acceleration?
Your customers upload to a centralized bucket from all over the world.
You regularly transfer gigabytes or terabytes across continents.
You can’t use all your bandwidth when uploading over the internet.
About acceleration: The further you are from the S3 bucket,
the better the acceleration
AWS Transfer Family
Fully managed transfer service
For what services is AWS Transfer Family available?
Amazon S3 storage
Amazon Elastic File System (Amazon EFS) Network File System (NFS) file systems
Protocols supported by the Transfer Family
Secure Shell (SSH) File Transfer Protocol (SFTP)
File Transfer Protocol Secure (FTPS)
File Transfer Protocol (FTP)
Applicability Statement 2 (AS2)
Transfer Family Benefits
Managed service that scales
You don’t need to modify your apps or run file transfer protocol infrastructure.
Everything is managed within AWS
Only pay for what you use
Use cases of Transfer Family for S3
Data lakes for uploads from third parties
Subscription-based data distribution to customers
Internal transfers within the organization
Use cases of Transfer Family for Elastic File System (EFS)
Data distribution
Supply Chain
Content Management
Web Serving application
Type of S3 storage
General Purpose
Intelligent Tiering
Infrequent access
Archive
S3 General purpose
Suitable for frequently accessed data thanks to high availability and low latency. Data is stored redundantly across at least 3 Availability Zones (AZs).
S3 Intelligent tiering
Automatically moves objects to the most cost-effective access tier, depending on how frequently they are accessed.
S3 Infrequent access
Standard infrequent access:
Similar to Standard but with a different cost model: storage is cheaper, but there is a minimum storage duration charge of 30 days and a fee to retrieve the data.
One Zone Infrequent access:
Low-cost option for data that does not need high availability and resiliency. Good choice for secondary backups that you can recreate, or for copies of data backed up from another Region.
S3 Archive
Glacier Instant Retrieval: Rarely accessed data that still needs to be retrieved rapidly
Glacier Flexible Retrieval: Large datasets that need to be accessed 1-2 times a year, with some latency in accessing the data
Glacier Deep Archive: Long-term retention for rarely accessed data. Good for customers who need to keep older data for compliance
S3 on Outposts: S3 object storage running on AWS Outposts hardware installed on premises, for data that needs to stay close to the customer’s on-premises applications.
Minimum storage duration charge for Infrequent Access?
30 days
Minimum storage duration charge for Glacier?
90 days, or 180 days for Deep Archive
Number of AZs with S3
>= 3, except S3 One Zone-IA where it’s one.
S3 and retrieval charges
Per-GB retrieval charges apply to every storage class except Standard and Intelligent-Tiering
What is an S3 lifecycle configuration?
It’s a set of rules determining when objects transition from one storage class to another or expire, based on their age. E.g.: object created more than 30 days ago => Infrequent Access.
Object created more than x months ago => deletion
Lifecycle transitions and expirations can have associated costs
Types of lifecycle operations?
Transition
or
Expiration
Applied to a whole bucket or to a subset of its objects (filtered by prefix or tag)
Advantage of lifecycle on S3
Lifecycle rules reduce cost: you pay less for data as it loses relevance for you.
Lifecycle use case
- Automatically delete logs after 30 days
- Documents are stored in Standard for 60 days, in Infrequent Access for 1 year, in Glacier for 7 years, then deleted
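The two use cases above could be written as a lifecycle configuration along these lines. This sketch only builds and prints the rules; the rule IDs and the `logs/` and `documents/` prefixes are hypothetical. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`, and the day counts are measured from object creation.

```python
import json

DAYS_STANDARD = 60
DAYS_INFREQUENT_ACCESS = 365
DAYS_GLACIER = 7 * 365

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "expire-logs",                    # hypothetical rule name
            "Filter": {"Prefix": "logs/"},          # hypothetical prefix
            "Status": "Enabled",
            "Expiration": {"Days": 30},
        },
        {
            "ID": "archive-documents",              # hypothetical rule name
            "Filter": {"Prefix": "documents/"},     # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                # After 60 days in Standard, move to Standard-IA.
                {"Days": DAYS_STANDARD, "StorageClass": "STANDARD_IA"},
                # After one more year, move to Glacier Flexible Retrieval.
                {"Days": DAYS_STANDARD + DAYS_INFREQUENT_ACCESS, "StorageClass": "GLACIER"},
            ],
            # After seven more years, delete the object.
            "Expiration": {"Days": DAYS_STANDARD + DAYS_INFREQUENT_ACCESS + DAYS_GLACIER},
        },
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

Because every threshold counts from creation, the later values are cumulative: 60, then 425, then 2980 days.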
S3 Versioning use case
Protect against accidental overwrites and deletes
Enables recovery
At what level is versioning enabled?
At the bucket level
How does versioning work?
Each version of an object gets a unique version ID generated by S3. Uploading the object again creates a new version; the previous one is not overwritten. When the object is deleted, S3 simply adds a delete marker, and the earlier versions remain.
Is versioning enabled by default ?
No
What mechanism allows for object retrieval in Versioning?
The version ID
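Can I recover the object if versioning is suspended?
No, not for objects written while versioning is suspended. Versions created while versioning was enabled are kept.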
Can I recover a deleted object with versioning ?
Yes
What is the cost of versioning?
No extra fee, but you pay storage for every version kept
What issue may you face trying to get an object if its most recent version is a delete marker?
The request will not succeed and returns a 404 Not Found error.
If you use a GET request specifying the version ID, you can still access the object
How to permanently delete an object when versioning is active?
You must be the owner of the bucket and specify the version ID of the object version you want to delete.
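The behaviour described in the versioning cards can be imitated with a toy in-memory model. `VersionedBucket` is a hypothetical class written for illustration, not an AWS API, and it is simplified: real version IDs are opaque strings generated by S3, and the `v1`, `v2` counter below is only a stand-in.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not an AWS API)."""

    def __init__(self):
        self._versions = {}                  # key -> list of (version_id, data or None)
        self._counter = itertools.count(1)   # stand-in for S3's opaque version IDs

    def put(self, key, data):
        version_id = f"v{next(self._counter)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key, version_id=None):
        if version_id is None:
            # Simple delete: add a delete marker and keep every earlier version.
            return self.put(key, None)
        # Delete with a version ID: permanently remove that one version.
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            found = versions[-1] if versions else None
        else:
            found = next((v for v in versions if v[0] == version_id), None)
        if found is None or found[1] is None:
            raise KeyError(f"{key}: 404 Not Found")
        return found[1]


bucket = VersionedBucket()
first = bucket.put("report.txt", b"draft")
second = bucket.put("report.txt", b"final")
marker = bucket.delete("report.txt")                       # adds a delete marker

try:
    bucket.get("report.txt")
    latest_readable = True
except KeyError:
    latest_readable = False

recovered = bucket.get("report.txt", version_id=second)    # still there by version ID
bucket.delete("report.txt", version_id=marker)             # remove the marker itself
restored = bucket.get("report.txt")
print(latest_readable, recovered, restored)
```

Removing the delete marker by its version ID makes the previous version current again, which is how a "deleted" object is recovered.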
What is the meaning of CORS?
Cross-Origin Resource Sharing
What is Cross-Origin Resource Sharing?
In S3, it’s defined by an XML (or JSON) document attached to the bucket, in which are written:
The origins: the websites allowed to access your bucket
The operations (HTTP methods) supported for each origin
Additional operation-specific information
What is Cross-Origin Resource Sharing used for?
It’s a way for a client web application loaded from one domain to access resources stored in another domain
Example of CORS
You host a web font in a bucket for one website and want a second website to use it too. You create a CORS configuration on the bucket allowing the second website’s origin to access the resources of the first website.
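For the web font example, a CORS configuration might look like the sketch below. The origin `https://www.example.com` is a hypothetical second website. The dictionary follows the shape boto3's `put_bucket_cors` accepts as `CORSConfiguration`; the `is_allowed` helper is an illustrative, simplified check (exact origin match or `*` only), not how S3 itself evaluates rules.

```python
import json

cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.example.com"],   # hypothetical second website
            "AllowedMethods": ["GET"],                        # read-only access to the font
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,                            # browser cache time for the preflight
        }
    ]
}


def is_allowed(config, origin, method):
    """Return True if any rule allows this origin and HTTP method."""
    for rule in config["CORSRules"]:
        origin_ok = origin in rule["AllowedOrigins"] or "*" in rule["AllowedOrigins"]
        if origin_ok and method in rule["AllowedMethods"]:
            return True
    return False


print(json.dumps(cors_configuration, indent=2))
print(is_allowed(cors_configuration, "https://www.example.com", "GET"))
```

Listing only `GET` keeps the second website read-only: it can load the font but cannot upload or delete anything.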
What is strong consistency?
A guarantee that, after a successful write, overwrite or delete of an object, any later read or list request returns the latest state (strong read-after-write consistency). You don’t have to make the checks yourself or provision infrastructure to do it.
It simplifies the migration of on-premises workloads.
It is enabled by default
Before strong consistency became the default in S3, which add-on tool was used to get a consistent view of S3?
S3Guard (a feature of the Apache Hadoop S3A connector, not an AWS service)
S3 default security configuration
Objects are private and protected by default
Encryption is configured by default
Default encryption: S3 managed keys (SSE-S3)
When sharing S3 access
Manage and control the access.
Use the least privilege principle
Are new S3 objects encrypted in transit?
Not by the default encryption, which protects them at rest
Default S3 encryption
S3 managed keys (SSE-S3)
Can I use another encryption than SSE-S3?
Yes, use:
Server-side encryption with AWS KMS (Key Management Service) keys (SSE-KMS)
OR
Dual-layer server-side encryption with AWS KMS keys (DSSE-KMS)
OR
Server-side encryption with customer-provided keys (SSE-C)
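As a rough illustration of how the encryption choice shows up in an upload request, the sketch below maps each mode to extra upload parameters. The helper `encryption_args` is hypothetical; the parameter names and values are, to the best of my knowledge, the ones boto3's `put_object` uses, but treat them as an assumption and check the SDK reference before relying on them.

```python
def encryption_args(mode, kms_key_id=None, customer_key=None):
    """Build the extra upload parameters for one server-side encryption mode."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode in ("SSE-KMS", "DSSE-KMS"):
        value = "aws:kms" if mode == "SSE-KMS" else "aws:kms:dsse"
        args = {"ServerSideEncryption": value}
        if kms_key_id is not None:
            # Optional: a specific KMS key instead of the AWS managed key.
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "SSE-C":
        if customer_key is None:
            raise ValueError("SSE-C needs a customer-provided key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown mode: {mode}")


for mode in ("SSE-S3", "SSE-KMS", "DSSE-KMS"):
    print(mode, encryption_args(mode))
```

With SSE-C the key travels with every request and S3 does not store it, so the caller must supply the same key again to download the object.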
Can I protect data in transit?
Yes, by using HTTPS (TLS) connections or client-side encryption, which encrypts the data before it is transferred to S3
Tools for protecting Buckets and object
Block public access option
IAM policies
ACL (Access Control Lists)
S3 Access Point
Presigned URL (time-limited URL)
AWS Trusted advisor (provides bucket permission Check)
ACL vs IAM
ACLs predate IAM. Prefer IAM policies, or be extra mindful of your ACL setup.
Region choice for storage
Data privacy laws and compliance
Proximity of users
Service availability in the region
Cost effectiveness
What is S3 Inventory for?
Helps manage storage
Use it to audit and report.
You can set up daily or weekly reports and export them in different file formats (CSV, ORC, Parquet)
Can I query S3 Inventory through a database system (DBS)?
Yes, with Athena or Amazon Redshift for example, but also other tools…
Default pricing of S3
Pay for what you use:
Storage:
Per GB of objects stored per month.
Different pricing for region and storage class
Operation:
PUT, COPY, POST, LIST and lifecycle transition requests
S3 has no charge for transfer
1) Out to the internet for up to 100 GB a month
2) In from the internet
3) Between S3 buckets in the same Region
4) From an S3 bucket to any AWS service within the same Region
5) Out to CloudFront
Additional cost for intelligent tiering
Monthly monitoring and automation charge for each object
S3 cost depends on
Object size, storage duration, storage class
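A rough sketch of how these three factors combine. The prices are hypothetical placeholders (real prices vary by Region and change over time), and a 30-day month is a simplification. The minimum storage durations are the ones from the cards above: 30 days for Infrequent Access, 90 for Glacier, 180 for Deep Archive.

```python
# Hypothetical prices in USD per GB-month, for illustration only.
HYPOTHETICAL_PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}
MINIMUM_STORAGE_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "GLACIER": 90,
    "DEEP_ARCHIVE": 180,
}


def storage_cost(size_gb, days_stored, storage_class):
    """Estimate the storage charge, billing at least the minimum duration."""
    billable_days = max(days_stored, MINIMUM_STORAGE_DAYS[storage_class])
    months = billable_days / 30          # simplification: 30-day month
    return size_gb * months * HYPOTHETICAL_PRICE_PER_GB_MONTH[storage_class]


# 100 GB deleted after 10 days is billed like 30 days in Standard-IA.
print(storage_cost(100, 10, "STANDARD_IA"))
print(storage_cost(100, 30, "STANDARD_IA"))
```

This is why short-lived data usually belongs in Standard: deleting it early from a cheaper class still incurs the minimum duration charge.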
What are ingest charges?
Costs associated with PUT, COPY, POST or LIST requests, plus lifecycle operations
Encryption fees in S3
No fees for standard SSE-S3 or SSE-C. You pay for encryption when using AWS KMS.
DSSE-KMS includes further charges for the second encryption layer
Free tier in S3
5 GB of storage; 20,000 GET requests; 2,000 PUT, COPY, POST or LIST requests; 100 GB of data transfer out each month
Well architected best practices,
Security Pillar for S3
Enforce encryption at rest
Enforce access control
Well architected best practices,
Performance Efficiency for S3
Learn about and understand available cloud services and features
Factor cost into architectural decisions
Well architected best practices,
Cost Optimization for S3
Perform cost analysis for different usage over time
Well architected best practices,
Reliability for S3
Select the appropriate locations, and use multi-location deployment if appropriate