Module 5 - Storage & Transfer Flashcards
What are the 3 kinds of storage? What AWS services handle each?
Block (EBS, instance store)
File (AWS EFS, FSx)
Object (S3, Glacier)
What are some ways to migrate data online?
AWS Storage Gateway, Kinesis (Firehose and Streams), DataSync, S3 Transfer Acceleration, AWS Direct Connect
What are some ways to migrate data offline?
AWS Snow family
What is AWS S3? How do you change data?
Simple Storage Service. It is object-level storage, meaning if you want to change a part of a file, you must make the change and then reupload the entire modified file. Max 5TB per file.
How can you access S3?
Through the web-based AWS Management Console,
or programmatically through the API and SDKs
What is an S3 object made up of?
- key (full path to file including folders),
- the file itself (value),
- version ID (if enabled)
- any metadata that describes the file (key/value pairs),
- tags (Unicode key/value pair, for security/lifecycle)
Identify the parts of this S3 URL
http://doc.s3-us-west-1.amazonaws.com/2006-03-01/AmazonS3.html
“doc” is the NAME of the bucket
“2006-03-01/AmazonS3.html” is the KEY
(“2006-03-01/” in the object key is called the PREFIX)
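A minimal sketch of pulling the same three parts out of the URL with Python's standard library. The function name parse_s3_url is made up for illustration, and it assumes a virtual-hosted-style URL where the bucket name is the part of the host before ".s3".

```python
from urllib.parse import urlparse


def parse_s3_url(url):
    """Split a virtual-hosted-style S3 URL into bucket, key, and prefix."""
    parsed = urlparse(url)
    # Assumption: the bucket name is everything in the host before ".s3".
    bucket, _, _ = parsed.netloc.partition(".s3")
    # The key is the full path without the leading slash.
    key = parsed.path.lstrip("/")
    # The prefix is everything up to and including the last "/" in the key.
    head, sep, _ = key.rpartition("/")
    prefix = head + sep
    return {"bucket": bucket, "key": key, "prefix": prefix}


parts = parse_s3_url("http://doc.s3-us-west-1.amazonaws.com/2006-03-01/AmazonS3.html")
print(parts)
```

A key with no "/" in it simply gets an empty prefix.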
How do buckets work?
How many buckets can an account have?
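• Objects are stored in buckets.
• An AWS account can have up to 100 buckets by default (per this course; the quota is a soft limit that can be raised on request).
• Each bucket lives in one REGION that you choose, and you control access to it.
• You can enable access logs for a bucket.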
How do you control access to a bucket? What is the default access to a bucket?
Everything is PRIVATE by default. The account that created the resource can grant permissions by writing access policies (CONTROLLED ACCESS).
You can also make it PUBLIC if necessary (uncommon).
How do you make sure your buckets are never exposed to public access?
Turn on “block all public access” at the account level. These settings apply account-wide for all current and future buckets.
What are the ways that access is granted to a bucket?
ACLs and bucket policies.
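A bucket policy is a JSON document attached to the bucket. Below is a rough sketch that builds a read-only policy as a Python dict. The bucket name, account ID, and role name are placeholders, and the helper read_only_bucket_policy is hypothetical, not part of any AWS SDK.

```python
import json


def read_only_bucket_policy(bucket, principal_arn):
    """Build a bucket policy that lets one principal read objects in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadOnly",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:GetObject"],
                # "/*" targets the objects inside the bucket, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Placeholder bucket and role, for illustration only.
policy = read_only_bucket_policy(
    "example-bucket", "arn:aws:iam::111122223333:role/ExampleReader"
)
print(json.dumps(policy, indent=2))
```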
How can you recover objects from accidental deletion or overwrite?
1) Enable versioning on your bucket.
2) S3 Object Lock - uses the write once, read many (WORM) model.
• Use retention periods to lock an object for a fixed period of time.
• Use Legal Hold to lock an object until the hold is explicitly removed.
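To see why versioning helps with recovery, here is a toy in-memory model (not the S3 API; the class VersionedBucket is invented for this card). It assumes the usual versioning behavior: every write adds a new version, and a plain delete only adds a delete marker on top of the history.

```python
class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        # key -> list of (version_id, body); a body of None is a delete marker
        self._versions = {}
        self._counter = 0

    def put(self, key, body):
        # Every write adds a new version instead of replacing the old one.
        self._counter += 1
        version_id = f"v{self._counter}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        # A plain delete only stacks a delete marker on top of the history.
        return self.put(key, None)

    def get(self, key, version_id=None):
        history = self._versions.get(key, [])
        if version_id is None:
            # Latest version wins; a delete marker makes the key look missing.
            if not history or history[-1][1] is None:
                raise KeyError(key)
            return history[-1][1]
        for vid, body in history:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))


bucket = VersionedBucket()
first = bucket.put("report.txt", b"original")
bucket.put("report.txt", b"overwritten by mistake")
bucket.delete("report.txt")

# The current view shows nothing, but the first version is still recoverable.
recovered = bucket.get("report.txt", version_id=first)
print(recovered)
```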
What are the different S3 storage classes? What are they used for?
- S3 Standard - general-purpose storage of frequently accessed data
- S3 Intelligent-Tiering - for data with unknown or changing access patterns (monitors access and moves objects between tiers automatically)
- S3 Standard-Infrequent Access (S3 Standard-IA) and S3 One Zone-Infrequent Access (S3 One Zone-IA, which is cheaper still) - for long-lived but less frequently accessed data. Cost-effective if you access the data less than about once a month. Storage costs about 50% less than S3 Standard, but there is a retrieval fee.
- S3 Glacier and S3 Glacier Deep Archive for long-term archive and digital preservation
What is an archive?
Any object stored in a vault in S3 Glacier. It has a unique ID and optional description. When you store it, Glacier returns a regionally unique archive ID.
What do you use to manage S3 Glacier vaults?
Use the management console to create and delete vaults.
For everything else, use the CLI, REST API, or SDKs.
What is a vault?
A container for storing archives.
You specify its name and region.
You can lock it with Vault Lock. (for compliance; data and lock can’t be removed)
What are the retrieval times for S3 Glacier?
Instant retrieval - milliseconds
Flexible Retrieval:
• Expedited - 1-5 minutes
• Standard - 3-5 hours
• Bulk - 5-12 hours
Deep Archive:
• Standard - within 12 hours
• Bulk - within 48 hours
What is an S3 Lifecycle Policy in Lifecycle Management?
An automated system to move (transition) or delete (expire) your data based on age. (Saves you money on storage).
You can apply a rule to the whole bucket or to a subset of objects (filtered by prefix or tag).
Works with versioning.
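A sketch of what a lifecycle rule can look like, written as a Python dict in the shape used for S3 lifecycle configurations. The rule ID, the "logs/" prefix, and the day counts are made-up values. The helper storage_class_at is hypothetical and only illustrates the transition/expire logic; it assumes objects start in STANDARD, and real lifecycle actions run asynchronously, so timing is approximate.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},  # only objects under logs/
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},  # delete after one year
        }
    ]
}


def storage_class_at(rule, age_days):
    """Return the storage class the rule implies at a given age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"  # assumption: objects are uploaded to S3 Standard
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 200, 400):
    print(age, storage_class_at(rule, age))
```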
What are 3 ways to encrypt data at rest in S3?
1) SSE-S3: Server-side encryption (SSE) with Amazon S3-managed keys. AWS handles the keys and uses the AES-256 algorithm. How? Put in the header: "x-amz-server-side-encryption": "AES256"
2) SSE-KMS: Server-side encryption with KMS keys stored in AWS KMS. Uses envelope encryption; you and AWS manage the keys. Why? You can control who has access and you get an audit trail. Put in the header: "x-amz-server-side-encryption": "aws:kms"
3) SSE-C: Server-side encryption with customer-provided keys. You manage the keys. Must use HTTPS. Not available in the console; use the CLI, SDKs, or REST API. You must include the key in the headers of every request because S3 discards it after each operation.
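A small sketch of the request headers each option uses, built as plain Python dicts. The helper function names are invented for illustration; the header names are the standard S3 ones, and the value S3 expects for SSE-KMS is "aws:kms". For SSE-C the key travels base64-encoded together with a base64-encoded MD5 digest of the key.

```python
import base64
import hashlib
import os


def sse_s3_headers():
    """Headers asking S3 to encrypt with S3-managed keys."""
    return {"x-amz-server-side-encryption": "AES256"}


def sse_kms_headers(kms_key_id=None):
    """Headers asking S3 to encrypt with a KMS key (optionally a specific one)."""
    headers = {"x-amz-server-side-encryption": "aws:kms"}
    if kms_key_id:
        headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
    return headers


def sse_c_headers(key):
    """Headers carrying a customer-provided 256-bit key; needed on every request."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }


customer_key = os.urandom(32)  # throwaway key, for illustration only
for name in sse_c_headers(customer_key):
    print(name)
```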
How does S3 handle replication?
All data is replicated in at least 3 AZs (except for S3 One Zone-IA).
What is S3 Transfer Acceleration? When would you choose this?
It’s a way to move data faster over long distances. It uses CloudFront global edge locations with a distinct URL (…s3-accelerate…). Once it’s uploaded, it is automatically routed to S3 using an optimized network path (AWS backbone network).
You only get charged if there is a performance improvement. Enabled at the bucket level.
Good when:
• you have customers worldwide using the same bucket
• you transfer giga- or terabytes of data worldwide
⭐️ What is S3 multipart upload? When would you want to do this?
This is a way to break large objects up into manageable parts. Once they are uploaded, S3 reassembles the object. You can’t do this with the console. Recommended for files > 100 MB. Required if > 5 GB.
Use for:
• Improved throughput: You can upload parts in parallel to improve throughput.
• Quick recovery from any network issues: Smaller part sizes minimize the impact of restarting a failed upload due to a network error.
• Pausing and resuming object uploads: You can upload object parts over time. When you have initiated a multipart upload, there is no expiration. You must explicitly complete or cancel the multipart upload.
• Beginning an upload before you know the final object size: You can upload an object as you are creating it.
• Uploading large objects: Using the multipart upload API, you can upload large objects, up to 5 TB.
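A rough sketch of how an object could be cut into parts before a multipart upload. The function plan_parts and the 100 MiB default part size are illustrative choices. The limits in the constants (at least 5 MiB per part except the last, at most 10,000 parts) are S3's documented multipart limits. Nothing here talks to S3; it only does the arithmetic.

```python
MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB  # every part except the last must be at least 5 MiB
MAX_PARTS = 10_000       # a multipart upload can have at most 10,000 parts


def plan_parts(object_size, part_size=100 * MIB):
    """Return a list of (part_number, offset, length) tuples covering the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    part_count = -(-object_size // part_size)  # ceiling division
    if part_count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    parts = []
    for index in range(part_count):
        offset = index * part_size
        # The last part may be shorter than part_size.
        length = min(part_size, object_size - offset)
        parts.append((index + 1, offset, length))
    return parts


plan = plan_parts(250 * MIB)
for part_number, offset, length in plan:
    print(part_number, offset, length)
```

Each tuple could be uploaded independently and in parallel, which is where the throughput and retry benefits come from.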
What are S3 Access Points?
Named network endpoints that you can use to perform S3 object operations, such as GetObject and PutObject.
They each have their own policy for permissions and network controls.
These only work for objects, not S3 operations like modifying buckets.
What is Object Lambda?
You add your own code (an AWS Lambda function) to process data as it is returned from a GET request made through an S3 Object Lambda Access Point.
E.g. convert data format (XML to JSON), resize images, augment data with another service, etc.
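A minimal sketch of the kind of transformation such code might perform, here the XML-to-JSON case. It covers only the transform step: fetching the original object and handing the result back to S3 are left out. The function xml_to_json is hypothetical and assumes flat XML with one level of child elements.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_json(xml_text):
    """Convert a flat XML record into a JSON string keyed by the root tag."""
    root = ET.fromstring(xml_text)
    # Assumption: each child element holds a single text value.
    fields = {child.tag: child.text for child in root}
    return json.dumps({root.tag: fields})


original = "<user><name>Ana</name><id>7</id></user>"
transformed = xml_to_json(original)
print(transformed)
```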
What are the costs for S3? What are the factors that change pricing?
STARRL
• Storage pricing
• Request and data retrieval pricing (charged for PUT, COPY, POST, LIST, and GET requests)
• Data transfer and Amazon S3 Transfer Acceleration pricing (you pay only for transfer OUT to other Regions or the internet)
• Data management and analytics pricing
• S3 Replication pricing
• Processing with S3 Object Lambda
When is S3 a good choice?
Use when:
• Need to write once, and read many times.
• Have a large number of users.
• Have growing data sets.
• Have spiky access to data (in this case, use S3 Standard or S3 Intelligent-Tiering).
What storage should I use when multiple instances need to use the same file storage?
You could use EBS Multi-Attach, but only for up to 16 instances in the same AZ.
You could use S3, but because it is an object store you don't get the high-performance read/write capacity of a file system, so it's not ideal.
If you need high-throughput changes to files of different sizes, EFS is best.
What is AWS EFS?
Elastic File System. It’s a managed Network File System, meaning that lots of instances in different AZs can connect to it.
No need to provision for capacity, it scales automatically. Pay per use.
Highly durable, scalable, and expensive. When a client makes a request, EFS routes to the “mount target” in the AZ closest to the client.
Only for Linux-based AMIs (POSIX). Great for CMS, WordPress, content sharing, web serving
What is FSx?
FSx for Windows and FSx for Lustre (Linux) are managed services that handle 3rd party file systems for you. Integrates with S3. Links long-term storage with high-performance file systems.
What are the options for data migration in the Snow family?
For all: Transfer data and ship back to AWS, stores data in your bucket. (For uploading OR downloading)
If the transfer would take longer than a week over the network, use Snow!
100 TB over 1 Gbps ≈ 12 days (at about 80% link utilization)
- Snowcone - small, portable data storage device, 8 TB
- Snowball Edge - 80 TB device (storage or compute optimized). ❗️Has computing options, can pre-process data.
- Snowmobile - shipping container on a semi, 100 PB.
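The "longer than a week" rule can be checked with quick arithmetic. This sketch estimates network transfer time; the function transfer_days is made up for this card, and the 80% link utilization is an assumption (it is what makes 100 TB over 1 Gbps come out near the 12 days quoted above; at 100% utilization it would be a little over 9 days).

```python
def transfer_days(data_bytes, link_bits_per_second, utilization=0.8):
    """Estimate days needed to move data over a link at a given utilization."""
    seconds = data_bytes * 8 / (link_bits_per_second * utilization)
    return seconds / 86_400  # seconds per day


# 100 TB (decimal) over a 1 Gbps link, assuming 80% utilization
days = transfer_days(100 * 10**12, 10**9)
print(round(days, 1))  # about 11.6

if days > 7:
    print("Longer than a week: consider the Snow family")
```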