Module 4: Adding a storage layer with S3 Flashcards
As a cloud engineer working with S3:
Consider access patterns and use cases to choose the correct configuration options, while:
=> Optimising cost
=> Supporting performance
=> Meeting compliance requirements
And, as always, apply security best practices to protect the resources
Type of storage
Block storage
Hierarchical storage (file storage)
Object storage
Block storage
Data is stored in fixed-size blocks. The application splits the data into blocks and stores them wherever is most efficient. Blocks can be spread across servers and different operating systems.
File storage
File storage creates a shared file system. The data is stored in a hierarchical structure, similar to OneDrive for example.
Object storage
Object storage stores files as objects, based on attributes and metadata.
An object is data, metadata and a key.
The object key is the unique identifier of the object. When you update an object, the entire object is replaced.
Difference object storage vs block storage
In object storage, the entire object must be updated when there is a change to the data, while in block storage only the changed part of the data needs to be rewritten.
Amazon Simple Storage Services (S3)
Object storage. Stores massive amounts of unstructured data
Type of storage and what is it stored in with S3
Object storage, stored in buckets
Max size of a single object?
5 TB
Identifier in S3
Unique URL for each object (universal namespace)
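The universal namespace can be illustrated with a small sketch: each object is reachable at a unique URL built from its bucket, Region and key. This assumes the standard virtual-hosted-style URL format; the bucket, Region and key below are made-up examples.

```python
# Sketch of S3's universal namespace: each object gets a unique URL.
# Bucket, Region and key are hypothetical examples.
def object_url(bucket: str, region: str, key: str) -> str:
    """Build the virtual-hosted-style URL for an S3 object."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

print(object_url("my-bucket", "eu-west-1", "photos/2022/cat.jpg"))
# https://my-bucket.s3.eu-west-1.amazonaws.com/photos/2022/cat.jpg
```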
Component of objects
key, version ID, value, metadata and subresources
What does immutable mean?
It's a characteristic of an object: you can't change part of it, you have to replace the whole object (modify it outside S3 and upload it again).
What are buckets for?
Containers of objects. They organize the Amazon S3 namespace and identify the account responsible for the objects stored in them.
Bucket Geography
Regional. Objects stored in a bucket never leave the Region unless they are explicitly transferred to another Region.
What is a prefix in a bucket?
Similar to a path name: querying by a prefix returns the objects whose keys begin with it, /photos/2022 for example.
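The prefix behaviour can be simulated locally: S3 returns the keys that start with the given prefix, much like listing a directory. The keys below are made-up examples.

```python
# Sketch of a prefix query: keep only the keys that start with the prefix.
keys = [
    "photos/2021/beach.jpg",
    "photos/2022/party.jpg",
    "photos/2022/trip.jpg",
    "docs/report.pdf",
]

def list_by_prefix(keys, prefix):
    """Mimic listing a bucket with a Prefix filter."""
    return [k for k in keys if k.startswith(prefix)]

print(list_by_prefix(keys, "photos/2022/"))
# ['photos/2022/party.jpg', 'photos/2022/trip.jpg']
```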
S3 Benefits
Durability
Availability
High performance
How Durable is S3?
S3 Standard storage has 11 nines of durability (99.999999999%), meaning that every year there is a 0.000000001 percent chance of losing a given object.
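The "11 nines" figure can be turned into simple arithmetic. The object count below is a made-up example; the durability value is the figure quoted for S3 Standard.

```python
# The "11 nines" durability claim as arithmetic.
durability = 0.99999999999                  # 11 nines
annual_loss_prob = 1 - durability           # ~1e-11 per object per year
objects = 10_000_000                        # hypothetical number of stored objects
expected_losses_per_year = objects * annual_loss_prob

print(f"{annual_loss_prob:.0e}")            # ~1e-11
print(f"{expected_losses_per_year:.4f}")    # ~0.0001, i.e. about one lost object per 10,000 years
```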
Why is S3 so durable?
S3 redundantly stores objects on multiple devices across multiple facilities in the designated Region. It detects and repairs failures by comparing copies stored in different places, and verifies the integrity of the data using checksums.
How available is S3?
S3 provides 4 nines of availability (99.99%), meaning the ability to access the data quickly when you want it: out of 10,000 requests, one would not succeed. It is also scalable (unlimited storage) and gives the ability to encrypt data.
Why is S3 high performing?
Thousands of transactions per second; it scales to handle high request rates.
S3 use cases:
1) Host web content: Use high availability and high performance to address fluctuating and potentially high traffic to the data
2) Static site: Simple storage of html files, videos images…
3) Financial (or other) analysis: Store data that other services can use for analysis
4) Disaster recovery
S3 example of media hosting
Video content stored in S3 is cached through CloudFront so it reaches a streaming user more quickly, while another user downloads it directly from S3.
Do you need to provision storage for s3?
No, it scales on demand
Static vs dynamic website
A static website's content is static and might include client-side scripts. A dynamic website relies on server-side scripts such as PHP, JSP or ASP. S3 does not support server-side scripting; other AWS services do.
Static website with S3
You can host everything on S3 for a static website; no need for a server or a virtual machine.
S3 for analysis
Load the raw data into a bucket. Use an ETL tool to transform it (provision an EC2 instance for that, using a Spot Fleet or an EMR cluster). Return the transformed data to a new bucket, then terminate the instance used for ETL. Perform your analysis on the objects stored in the second bucket (Athena or QuickSight given as examples for analysis).
S3 for disaster recovery
Store everything in one S3 bucket. Replicate it to another bucket in another Region. Additionally, you can move long-term data to S3 Glacier.
What is cross region replication
Duplicate data in another bucket, in another Region
What permission do I need to store something in S3
Write permissions
Objects encrypted by default in S3 (true/false)?
True and false. Objects are encrypted with server-side encryption at upload and decrypted at download, but this must be enabled (on newly created buckets, SSE-S3 encryption is now applied by default).
4 ways to upload on S3
AWS management console
AWS Command Line Interface (AWS CLI)
AWS Software Development Kit (SDK)
Amazon S3 REST API
Uploading an object through the console
Use a wizard (UI)-based approach to move data in or out of S3, including a drag-and-drop option. The limit for the Management Console is 160 GB.
File size limitation to upload through the management console?
160 GB
For size larger than 160GB?
Use the CLI, an SDK, or the REST API
CLI and S3?
Use the command line interface to script uploads and downloads.
SDK and S3
Programmatically access S3 from your application code.
API and S3
Use a PUT request to upload and a GET request to download. API access can be embedded into application code.
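The PUT/GET shape of the REST API can be sketched without touching the network by only constructing the requests. The bucket and key are hypothetical, and real S3 calls must additionally be signed (AWS Signature Version 4); this only shows the method-and-URL shape.

```python
import urllib.request

# Sketch of the S3 REST shape: upload with PUT, download with GET,
# both against the object's URL. Requests are constructed but never sent.
url = "https://my-bucket.s3.eu-west-1.amazonaws.com/docs/report.pdf"

upload = urllib.request.Request(url, data=b"file contents", method="PUT")
download = urllib.request.Request(url, method="GET")

print(upload.get_method())    # PUT
print(download.get_method())  # GET
```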
You want to upload a big file to S3 or a file for which you know there is a chance of failure in the upload. What can you use?
Multipart upload
What is multipart upload?
The object is separated into multiple parts, which are uploaded independently and then reassembled and stored in the bucket.
Advantages of multipart upload?
Improved throughput: parts are uploaded in parallel, so storage is quicker
Recover quickly from network issues
Pause and resume upload.
Begin an upload as the object is still being built
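The split-and-reassemble idea behind multipart upload can be sketched locally (no AWS calls). The 8-byte part size is a toy number for illustration; real S3 parts must be at least 5 MB, except the last one.

```python
# Sketch of multipart upload: split an object into fixed-size parts,
# upload them independently (here just kept in a list), then reassemble.
def split_into_parts(data: bytes, part_size: int) -> list[bytes]:
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

obj = b"abcdefghij" * 3          # 30-byte stand-in for a large object
parts = split_into_parts(obj, 8)

print([len(p) for p in parts])   # [8, 8, 8, 6]
print(b"".join(parts) == obj)    # True: reassembly restores the original object
```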