Developing Storage Solutions with Amazon S3 Flashcards
Different storage options in AWS
Amazon S3
Amazon S3 Glacier
Amazon Elastic File System (EFS)
AWS Storage Gateway
Amazon Elastic Block Store (EBS)
What is S3 Glacier
Low-cost storage service that provides highly secure, durable, and flexible storage for data archiving and online backup
Types of retrieval in S3 Glacier
- Expedited retrievals: 1–5 minutes
- Standard retrievals: 3–5 hours
- Bulk retrievals: 5–12 hours
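For illustration, the retrieval tier is chosen per restore request. A minimal boto3 sketch, assuming configured credentials; the bucket and key names are placeholders, not from the source:

```python
import boto3

s3 = boto3.client("s3")

# Initiate a restore of an archived object; "Tier" selects the
# retrieval speed: "Expedited", "Standard", or "Bulk".
# Bucket and key names are hypothetical.
s3.restore_object(
    Bucket="example-bucket",
    Key="archive/backup-2023.tar.gz",
    RestoreRequest={
        "Days": 2,  # how long the restored copy stays available
        "GlacierJobParameters": {"Tier": "Expedited"},
    },
)
```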
What is Elastic File System (EFS) and when to use it
A network file system as a service for EC2 instances.
It is designed to meet the performance needs of big data and analytics, media processing, content management, web serving, and home directories.
What is Storage Gateway and when to use it
Seamless and secure storage integration between an organization’s on-premises IT environment and the AWS storage infrastructure, such as Amazon S3, Amazon S3 Glacier, and Amazon EBS.
AWS Storage Gateway use cases include the following:
• Corporate file sharing
• Enabling existing on-premises backup applications to store primary backups on Amazon S3
• Disaster recovery
• Mirroring data to cloud-based compute resources and then archiving it to Amazon S3 Glacier
What is Elastic Block Store and when to use it
EBS volumes are network-attached storage that persists independently from the running life of a single EC2 instance. With Amazon EBS, you can also create point-in-time snapshots of volumes, which are stored in Amazon S3.
Amazon EBS typical use cases include the following:
• Big data analytics engines (such as the Hadoop/HDFS ecosystem and Amazon EMR clusters)
• Relational and NoSQL databases (such as Microsoft SQL Server and MySQL or Cassandra and MongoDB)
• Stream and log processing applications (such as Kafka and Splunk)
• Data warehousing applications (like Vertica and Teradata)
What is Amazon S3 and when to use it
Amazon S3 (simple storage service) provides highly secure, durable, and scalable object storage.
You can use Amazon S3 as a storage solution for use cases such as:
• Content storage and distribution
• Backup and archiving
• Big data analytics
• Static website hosting
• Disaster recovery
Basic components of Amazon S3
The basic components of Amazon S3 are buckets, objects, keys, and the unique object URL.
Different parts of bucket’s URL and object’s URL
https://[bucket_name].s3.[region endpoint].amazonaws.com
https://[bucket_name].s3.[region endpoint].amazonaws.com/[object key]
Requirements for S3 bucket name
The bucket name must be unique across Amazon S3.
3-63 characters
Lowercase letters, numbers and hyphens (-)
Do not use periods (.), which can cause certificate exceptions when the bucket is accessed over HTTPS
Do not use underscores (_)
A bucket is associated with an AWS Region
Requirements for object key name
Encoded in UTF-8
Max 1024 bytes
Safe characters: 0-9 a-z A-Z ! - _ . * ' ( ) /
Avoid: \ ; : + = @ , ? & $ ` space % < > [ ] # | { } ^ " ~ non-printable characters
Two main types of metadata
System-defined metadata includes information such as object creation date, size, and MD5 digest.
User-defined metadata consists of name-value pairs that you assign when uploading an object.
The prefix "x-amz-meta-" is automatically added to each metadata name.
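As an illustration, user-defined metadata is attached at upload time. A minimal boto3 sketch with placeholder names; boto3 adds (and strips) the x-amz-meta- prefix for you:

```python
import boto3

s3 = boto3.client("s3")

# Upload an object with user-defined metadata; S3 stores each pair
# under the "x-amz-meta-" prefix (e.g. x-amz-meta-department).
s3.put_object(
    Bucket="example-bucket",
    Key="reports/q1.csv",
    Body=b"col1,col2\n1,2\n",
    Metadata={"department": "finance", "reviewed": "true"},
)

# Reading it back: boto3 returns the metadata without the prefix.
head = s3.head_object(Bucket="example-bucket", Key="reports/q1.csv")
print(head["Metadata"])  # {'department': 'finance', 'reviewed': 'true'}
```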
How does versioning work in S3
An object’s version ID is part of the system-defined metadata.
By default, versioning is disabled in S3 buckets.
• In versioning-disabled buckets, an object has a version ID of null.
• In versioning-enabled buckets, each version of an object has a unique version ID.
Old path-style vs. Virtual hosted-style URL
Old path-style:
http://[region specific endpoint]/[bucket name]/[object key]
Virtual hosted-style:
http://[bucket name].s3.amazonaws.com/[object key]
Name of operation to upload object and max size of objects uploaded to S3
Upload an object with PUT
You can upload or copy objects of up to 5 GB in a single PUT operation. For larger objects, use multipart upload.
How does multipart work
Using multipart upload, you can upload a single object as a set of parts.
You can upload each part separately. If one of the parts fails to upload, you can retransmit that particular part without retransmitting the remaining parts. After all the parts of your object are uploaded to the server, you must send a complete multipart upload request that indicates that multipart upload has been completed. Amazon S3 then assembles these parts and creates the complete object.
You can also stop a multipart upload. When you stop an upload, Amazon S3 deletes all the parts that were already uploaded and frees up storage.
Amazon S3 retains all the parts on the server until you complete or stop the upload. To avoid unnecessary storage costs related to incomplete uploads, complete or stop an upload.
Consider using multipart upload for objects larger than 100 MB
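The initiate / upload parts / complete-or-abort flow described above can be sketched with the low-level boto3 calls. Bucket, key, and file names are placeholders:

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "example-bucket", "videos/large-file.bin"

# 1. Initiate the multipart upload and remember the upload ID.
upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
upload_id = upload["UploadId"]

parts = []
try:
    # 2. Upload parts; each part except the last must be >= 5 MB.
    with open("large-file.bin", "rb") as f:
        part_number = 1
        while chunk := f.read(8 * 1024 * 1024):
            resp = s3.upload_part(
                Bucket=bucket, Key=key, PartNumber=part_number,
                UploadId=upload_id, Body=chunk,
            )
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1

    # 3. Tell S3 to assemble the parts into the complete object.
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload_id,
        MultipartUpload={"Parts": parts},
    )
except Exception:
    # Stop the upload so S3 deletes the stored parts and frees storage.
    s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
    raise
```

In practice, boto3's higher-level upload_file method performs multipart upload (with parallel parts) automatically above a configurable size threshold, so the low-level calls above are mainly useful when you need fine-grained control.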
The benefits of multipart uploads
- Upload parts in parallel to improve throughput
- Recover quickly from network issues
- Pause and resume object uploads
- Begin an upload before you know the final size of an object
When to use copy operations
- Create copies of an object
- Rename an object
- Move it to a different Amazon S3 location
- Update its metadata
- Change the storage class of an object from standard to reduced redundancy or vice versa
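Since S3 has no native rename, a "rename" is a copy followed by a delete, and a metadata update is a copy of the object onto itself. A boto3 sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")
src = {"Bucket": "example-bucket", "Key": "old-name.txt"}

# "Rename" = copy to the new key, then delete the original.
s3.copy_object(CopySource=src, Bucket="example-bucket", Key="new-name.txt")
s3.delete_object(**src)

# Update metadata in place: copy the object onto itself and
# REPLACE the metadata instead of copying the existing metadata.
s3.copy_object(
    CopySource={"Bucket": "example-bucket", "Key": "new-name.txt"},
    Bucket="example-bucket",
    Key="new-name.txt",
    Metadata={"reviewed": "true"},
    MetadataDirective="REPLACE",
)
```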
Name of operation to retrieve an object
GET
You can also retrieve an object in parts by specifying the range of bytes needed.
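A ranged GET maps to an HTTP Range header. A minimal boto3 sketch (placeholder names):

```python
import boto3

s3 = boto3.client("s3")

# Fetch only the first 1 KiB of the object instead of the whole body.
resp = s3.get_object(
    Bucket="example-bucket",
    Key="logs/app.log",
    Range="bytes=0-1023",
)
first_kib = resp["Body"].read()
```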
Name of operation to retrieve data within an object
SELECT
Amazon S3 Select retrieves a subset of data from an object using simple SQL expressions. Because only the matching data is scanned server-side and returned, it is faster and cheaper than retrieving the entire object and filtering it yourself.
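A sketch of S3 Select against a CSV object with boto3; the bucket, key, and columns are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Run a SQL expression against a CSV object; only matching rows
# are scanned and returned.
resp = s3.select_object_content(
    Bucket="example-bucket",
    Key="reports/sales.csv",
    ExpressionType="SQL",
    Expression="SELECT s.product, s.amount FROM S3Object s "
               "WHERE CAST(s.amount AS INT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; collect the Records events.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```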
Name of operation to remove an object
DELETE
You can delete a single object or delete multiple objects in a single delete request.
Versioning disabled
In a bucket that is not versioning-enabled, you can permanently delete an object by specifying the key that you want to delete.
Versioning enabled
In a bucket that is versioning-enabled, you can permanently delete an object by invoking a delete request with a key and version ID. To completely remove an object, you must delete each individual version.
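To completely remove a versioned object you must enumerate and delete every version (and any delete markers). A boto3 sketch, placeholder names:

```python
import boto3

s3 = boto3.client("s3")

# In a versioning-enabled bucket, a plain DELETE only adds a delete
# marker; removing the data requires deleting each version by ID.
resp = s3.list_object_versions(Bucket="example-bucket", Prefix="old-report.csv")
for v in resp.get("Versions", []) + resp.get("DeleteMarkers", []):
    s3.delete_object(
        Bucket="example-bucket",
        Key=v["Key"],
        VersionId=v["VersionId"],
    )
```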
How to apply a hierarchy within objects
There is no hierarchy of objects in S3 buckets.
To organize your keys and create a logical hierarchy, you can use delimiters (any string such as / or _) in key names.
If you use prefixes and delimiters to organize keys in a bucket, you can retrieve subsets of keys that match certain criteria. You can list keys by prefix. You can also retrieve a set of common key prefixes by specifying a delimiter.
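A sketch of listing by prefix and delimiter with boto3, assuming hypothetical keys like photos/2023/a.jpg and photos/2024/b.jpg:

```python
import boto3

s3 = boto3.client("s3")

# With "/" as the delimiter, keys sharing a prefix roll up into
# "folders" reported as common prefixes.
resp = s3.list_objects_v2(
    Bucket="example-bucket",
    Prefix="photos/",
    Delimiter="/",
)
for prefix in resp.get("CommonPrefixes", []):
    print(prefix["Prefix"])   # e.g. photos/2023/, photos/2024/
for obj in resp.get("Contents", []):
    print(obj["Key"])         # keys directly under photos/
```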
What are presigned URLs, and how to create them
All objects and buckets are private by default. Presigned URLs are useful if you want your users to be able to upload a specific object to your bucket without being required to have AWS security credentials or permissions.
A presigned URL is created with:
• Your security credentials
• Bucket name
• Object key
• HTTP method (PUT for uploading objects, GET for retrieving objects)
• Expiration date and time
Share the presigned URL with users who need to access your S3 bucket to put or retrieve objects.
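Generating one combines exactly the ingredients listed above. A boto3 sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")

# Generate a presigned URL that lets anyone holding it upload this
# specific key for the next hour, under your credentials' permissions.
url = s3.generate_presigned_url(
    ClientMethod="put_object",  # GET would use "get_object"
    Params={"Bucket": "example-bucket", "Key": "uploads/photo.jpg"},
    ExpiresIn=3600,  # expiration, in seconds
)
print(url)

# The recipient can then upload without AWS credentials, e.g.:
#   curl -X PUT --upload-file photo.jpg "<url>"
```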
Different techniques to secure data
SSL-encrypted endpoints with HTTPS
Client-side encryption
Server-side encryption
What is SSE-S3
Server-side encryption with Amazon S3-managed keys (SSE-S3)
Each object is encrypted with a unique key employing strong multi-factor encryption. It also encrypts the key itself with a primary key that it regularly rotates. Amazon S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data.
What is SSE-KMS
Server-side encryption with AWS KMS-managed keys (Key Management Service)
Similar to SSE-S3, with a key management system scaled for the cloud. SSE-KMS also provides you with an audit trail of when your key was used and by whom. Additionally, you can create and manage encryption keys yourself, or use a default key that is unique to you.
What is SSE-C
Server-side encryption with customer-provided keys (SSE-C)
You manage the encryption keys. Amazon S3 manages the encryption as it writes to disks, and the decryption when you access your objects.
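Requesting SSE-S3 or SSE-KMS is a per-request header. A boto3 sketch; the bucket, keys, and KMS alias are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: Amazon S3 manages the keys; AES-256 under the hood.
s3.put_object(
    Bucket="example-bucket", Key="doc-sse-s3.txt", Body=b"secret",
    ServerSideEncryption="AES256",
)

# SSE-KMS: encrypt with a KMS key (alias below is hypothetical);
# KMS records an audit trail of who used the key and when.
s3.put_object(
    Bucket="example-bucket", Key="doc-sse-kms.txt", Body=b"secret",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-app-key",
)
```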
When to use ACL on S3
An Amazon S3 access control list (ACL) grants coarse-grained user permissions at the bucket or object level. It enables you to apply the principle of least privilege. You can grant users READ, WRITE, or FULL_CONTROL permissions at the object or bucket level.
When to use bucket policies on S3
An S3 bucket policy is a resource-based IAM policy that grants granular permissions to S3 resources. You add a policy to a bucket to grant other AWS accounts or IAM users access permissions for the bucket and the objects in it.
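For example, granting another account read access to all objects in a bucket. The account ID and names in this boto3 sketch are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")

# A resource-based policy letting another account GET every object.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}
s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))
```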
What is CORS for S3
Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one domain to interact with resources in another domain.
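For instance, a browser app served from one domain can be allowed to call a bucket directly. A boto3 sketch with a hypothetical origin:

```python
import boto3

s3 = boto3.client("s3")

# Allow a web app served from https://www.example.com to GET and PUT
# objects in this bucket directly from the browser.
s3.put_bucket_cors(
    Bucket="example-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],
            "AllowedMethods": ["GET", "PUT"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }],
    },
)
```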
Best practices for requests on buckets
Avoid unnecessary requests:
- Handle NoSuchBucket errors instead of checking for existence of fixed buckets.
- Set the object metadata before uploading an object.
- Avoid using the copy operation to update metadata.
- Cache bucket and key names if your application design allows it.
Best practices for network latency
- Choose the bucket region closest to latency-sensitive customers.
- Compress data stored in Amazon S3.
- Use a CDN, such as Amazon CloudFront, to distribute content with low latency and high data transfer speeds.
Best practices for data integrity
- Ensure that data has not been corrupted in transit.
- Check the MD5 checksum of objects after GET and PUT operations.
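A boto3 sketch of both directions: sending a Content-MD5 with the PUT so S3 rejects corrupted uploads, and verifying the ETag after a GET. Names are placeholders; note the ETag equals the hex MD5 only for single-part, non-KMS uploads:

```python
import base64
import hashlib
import boto3

s3 = boto3.client("s3")
data = b"important payload"

# PUT: send a base64-encoded MD5 digest; S3 fails the request
# if the body was corrupted in transit.
md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode()
s3.put_object(
    Bucket="example-bucket", Key="payload.bin", Body=data,
    ContentMD5=md5_b64,
)

# GET: compare a locally computed digest to the returned ETag.
resp = s3.get_object(Bucket="example-bucket", Key="payload.bin")
body = resp["Body"].read()
assert resp["ETag"].strip('"') == hashlib.md5(body).hexdigest()
```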
Differences between 400 and 500 error codes
A 400-series error code indicates a client-side problem: the request is invalid, and retrying it unchanged will not succeed.
- BucketAlreadyExists
- InvalidBucketName
A 500-series error code indicates a transient server-side condition; you may retry the operation.
- InternalError
- SlowDown
With an exponential backoff algorithm, you specify progressively longer waits after each failed attempt before retrying your request.
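A minimal sketch of that retry loop in Python, retrying only the 500-series codes named above and adding random jitter; the helper name is hypothetical:

```python
import random
import time
import boto3
import botocore.exceptions

def put_with_backoff(s3, max_attempts=5, **put_kwargs):
    """Retry put_object on 500-series errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return s3.put_object(**put_kwargs)
        except botocore.exceptions.ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in ("InternalError", "SlowDown"):
                raise  # 400-series: retrying unchanged will not help
            # Wait 1s, 2s, 4s, ... plus random jitter, then retry.
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"upload failed after {max_attempts} attempts")

# Usage sketch (placeholder names):
# put_with_backoff(boto3.client("s3"), Bucket="example-bucket",
#                  Key="payload.bin", Body=b"data")
```

Note that boto3 also ships built-in retry logic with backoff, tunable through botocore.config.Config(retries={...}), which covers most applications without hand-rolled loops.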