Developing Storage Solutions with Amazon S3 Flashcards
Different storage options in AWS
Amazon S3
Amazon S3 Glacier
Amazon Elastic File System (EFS)
AWS Storage Gateway
Amazon Elastic Block Store (EBS)
What is S3 Glacier
Low-cost storage service that provides highly secure, durable, and flexible storage for data archiving and online backup
Types of retrieval in S3 Glacier
- Expedited retrievals: 1–5 minutes
- Standard retrievals: 3–5 hours
- Bulk retrievals: 5–12 hours
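For illustration, the retrieval tier is chosen per restore request. A minimal boto3 sketch, assuming configured credentials; the bucket and key names are placeholders, not from the source:

```python
import boto3

s3 = boto3.client("s3")

# Initiate a restore of an archived object; "Tier" selects the
# retrieval speed: "Expedited", "Standard", or "Bulk".
# Bucket and key names are hypothetical.
s3.restore_object(
    Bucket="example-bucket",
    Key="archive/backup-2023.tar.gz",
    RestoreRequest={
        "Days": 2,  # how long the restored copy stays available
        "GlacierJobParameters": {"Tier": "Expedited"},
    },
)
```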
What is Elastic File System (EFS) and when to use it
A network file system as a service for EC2 instances.
It is designed to meet the performance needs of big data and analytics, media processing, content management, web serving, and home directories.
What is Storage Gateway and when to use it
Seamless and secure storage integration between an organization’s on-premises IT environment and the AWS storage infrastructure, such as Amazon S3, Amazon S3 Glacier, and Amazon EBS.
AWS Storage Gateway use cases include the following:
• Corporate file sharing
• Enabling existing on-premises backup applications to store primary backups on Amazon S3
• Disaster recovery
• Mirroring data to cloud-based compute resources and then archiving it to Amazon S3 Glacier
What is Elastic Block Store and when to use it
EBS volumes are network-attached storage that persists independently from the running life of a single EC2 instance. With Amazon EBS, you can also create point-in-time snapshots of volumes, which are stored in Amazon S3.
Amazon EBS typical use cases include the following:
• Big data analytics engines (such as the Hadoop/HDFS ecosystem and Amazon EMR clusters)
• Relational and NoSQL databases (such as Microsoft SQL Server and MySQL or Cassandra and MongoDB)
• Stream and log processing applications (such as Kafka and Splunk)
• Data warehousing applications (like Vertica and Teradata)
What is Amazon S3 and when to use it
Amazon S3 (simple storage service) provides highly secure, durable, and scalable object storage.
You can use Amazon S3 as a storage solution for use cases such as:
• Content storage and distribution
• Backup and archiving
• Big data analytics
• Static website hosting
• Disaster recovery
Basic components of Amazon S3
The basic components of Amazon S3 are buckets, objects, keys, and the unique object URL.
Different parts of bucket’s URL and object’s URL
https://[bucket_name].s3.[region endpoint].amazonaws.com
https://[bucket_name].s3.[region endpoint].amazonaws.com/[object key]
Requirements for S3 bucket name
The bucket name must be unique across Amazon S3.
3-63 characters
Lowercase letters, numbers and hyphens (-)
Do not use periods (.), which can cause certificate exceptions when the bucket is accessed over HTTPS
Do not use underscores (_)
A bucket is associated with an AWS Region
Requirements for object key name
Encoded in UTF-8
Max 1024 bytes
Safe characters: 0-9 a-z A-Z ! - _ . * ' ( ) /
Avoid: \ ; : + = @ , ? & $ ` space % < > [ ] # | { } ^ " ~ non-printable characters
Two main types of metadata
System-defined metadata includes information such as object creation date, size, and MD5 digest.
User-defined metadata consists of name-value pairs that you assign when uploading an object.
The prefix "x-amz-meta-" is automatically added to each metadata name.
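As an illustration, user-defined metadata is attached at upload time. A minimal boto3 sketch with placeholder names; boto3 adds (and strips) the x-amz-meta- prefix for you:

```python
import boto3

s3 = boto3.client("s3")

# Upload an object with user-defined metadata; S3 stores each pair
# under the "x-amz-meta-" prefix (e.g. x-amz-meta-department).
s3.put_object(
    Bucket="example-bucket",
    Key="reports/q1.csv",
    Body=b"col1,col2\n1,2\n",
    Metadata={"department": "finance", "reviewed": "true"},
)

# Reading it back: boto3 returns the metadata without the prefix.
head = s3.head_object(Bucket="example-bucket", Key="reports/q1.csv")
print(head["Metadata"])  # {'department': 'finance', 'reviewed': 'true'}
```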
How does versioning work in S3
An object’s version ID is part of the system-defined metadata.
By default, versioning is disabled in S3 buckets.
• In versioning-disabled buckets, an object has a version ID of null.
• In versioning-enabled buckets, each version of an object has a unique version ID.
Old path-style vs. Virtual hosted-style URL
Old path-style:
http://[region specific endpoint]/[bucket name]/[object key]
Virtual hosted-style:
http://[bucket name].s3.amazonaws.com/[object key]
Name of operation to upload object and max size of objects uploaded to S3
Upload an object with PUT
You can upload or copy objects of up to 5 GB in a single PUT operation. For larger objects, use multipart upload.
How does multipart work
Using multipart upload, you can upload a single object as a set of parts.
You can upload each part separately. If one of the parts fails to upload, you can retransmit that particular part without retransmitting the remaining parts. After all the parts of your object are uploaded to the server, you must send a complete multipart upload request that indicates that multipart upload has been completed. Amazon S3 then assembles these parts and creates the complete object.
You can also stop a multipart upload. When you stop an upload, Amazon S3 deletes all the parts that were already uploaded and frees up storage.
Amazon S3 retains all the parts on the server until you complete or stop the upload. To avoid unnecessary storage costs related to incomplete uploads, complete or stop an upload.
Consider using multipart upload for objects larger than 100 MB
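The initiate / upload parts / complete-or-abort flow described above can be sketched with the low-level boto3 calls. Bucket, key, and file names are placeholders:

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "example-bucket", "videos/large-file.bin"

# 1. Initiate the multipart upload and remember the upload ID.
upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
upload_id = upload["UploadId"]

parts = []
try:
    # 2. Upload parts; each part except the last must be >= 5 MB.
    with open("large-file.bin", "rb") as f:
        part_number = 1
        while chunk := f.read(8 * 1024 * 1024):
            resp = s3.upload_part(
                Bucket=bucket, Key=key, PartNumber=part_number,
                UploadId=upload_id, Body=chunk,
            )
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1

    # 3. Tell S3 to assemble the parts into the complete object.
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload_id,
        MultipartUpload={"Parts": parts},
    )
except Exception:
    # Stop the upload so S3 deletes the stored parts and frees storage.
    s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
    raise
```

In practice, boto3's higher-level upload_file method performs multipart upload (with parallel parts) automatically above a configurable size threshold, so the low-level calls above are mainly useful when you need fine-grained control.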
The benefits of multipart uploads
- Upload parts in parallel to improve throughput
- Recover quickly from network issues
- Pause and resume object uploads
- Begin an upload before you know the final size of an object
When to use copy operations
- Create copies of an object
- Rename an object
- Move it to a different Amazon S3 location
- Update its metadata
- Change the storage class of an object from standard to reduced redundancy or vice versa
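Since S3 has no native rename, a "rename" is a copy followed by a delete, and a metadata update is a copy of the object onto itself. A boto3 sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")
src = {"Bucket": "example-bucket", "Key": "old-name.txt"}

# "Rename" = copy to the new key, then delete the original.
s3.copy_object(CopySource=src, Bucket="example-bucket", Key="new-name.txt")
s3.delete_object(**src)

# Update metadata in place: copy the object onto itself and
# REPLACE the metadata instead of copying the existing metadata.
s3.copy_object(
    CopySource={"Bucket": "example-bucket", "Key": "new-name.txt"},
    Bucket="example-bucket",
    Key="new-name.txt",
    Metadata={"reviewed": "true"},
    MetadataDirective="REPLACE",
)
```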
Name of operation to retrieve an object
GET
You can also retrieve an object in parts by specifying the range of bytes needed.
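A ranged GET maps to an HTTP Range header. A minimal boto3 sketch (placeholder names):

```python
import boto3

s3 = boto3.client("s3")

# Fetch only the first 1 KiB of the object instead of the whole body.
resp = s3.get_object(
    Bucket="example-bucket",
    Key="logs/app.log",
    Range="bytes=0-1023",
)
first_kib = resp["Body"].read()
```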
Name of operation to retrieve data within an object
SELECT
Amazon S3 Select retrieves a subset of data from an object using simple SQL expressions. Because only the matching data is scanned server-side and returned, it is faster and cheaper than retrieving the entire object and filtering it yourself.
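A sketch of S3 Select against a CSV object with boto3; the bucket, key, and columns are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Run a SQL expression against a CSV object; only matching rows
# are scanned and returned.
resp = s3.select_object_content(
    Bucket="example-bucket",
    Key="reports/sales.csv",
    ExpressionType="SQL",
    Expression="SELECT s.product, s.amount FROM S3Object s "
               "WHERE CAST(s.amount AS INT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; collect the Records events.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```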
Name of operation to remove an object
DELETE
You can delete a single object or delete multiple objects in a single delete request.
Versioning disabled
In a bucket that is not versioning-enabled, you can permanently delete an object by specifying the key that you want to delete.
Versioning enabled
In a bucket that is versioning-enabled, you can permanently delete an object by invoking a delete request with a key and version ID. To completely remove an object, you must delete each individual version.
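To completely remove a versioned object you must enumerate and delete every version (and any delete markers). A boto3 sketch, placeholder names:

```python
import boto3

s3 = boto3.client("s3")

# In a versioning-enabled bucket, a plain DELETE only adds a delete
# marker; removing the data requires deleting each version by ID.
resp = s3.list_object_versions(Bucket="example-bucket", Prefix="old-report.csv")
for v in resp.get("Versions", []) + resp.get("DeleteMarkers", []):
    s3.delete_object(
        Bucket="example-bucket",
        Key=v["Key"],
        VersionId=v["VersionId"],
    )
```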
How to apply a hierarchy within objects
There is no hierarchy of objects in S3 buckets.
To organize your keys and create a logical hierarchy, you can use delimiters (any string such as / or _) in key names.
If you use prefixes and delimiters to organize keys in a bucket, you can retrieve subsets of keys that match certain criteria. You can list keys by prefix. You can also retrieve a set of common key prefixes by specifying a delimiter.
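A sketch of listing by prefix and delimiter with boto3, assuming hypothetical keys like photos/2023/a.jpg and photos/2024/b.jpg:

```python
import boto3

s3 = boto3.client("s3")

# With "/" as the delimiter, keys sharing a prefix roll up into
# "folders" reported as common prefixes.
resp = s3.list_objects_v2(
    Bucket="example-bucket",
    Prefix="photos/",
    Delimiter="/",
)
for prefix in resp.get("CommonPrefixes", []):
    print(prefix["Prefix"])   # e.g. photos/2023/, photos/2024/
for obj in resp.get("Contents", []):
    print(obj["Key"])         # keys directly under photos/
```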
What are presigned URLs, and how to create them
All objects and buckets are private by default. Presigned URLs are useful if you want your users to be able to upload a specific object to your bucket without being required to have AWS security credentials or permissions.
A presigned URL is created with:
• Your security credentials
• Bucket name
• Object key
• HTTP method (PUT for uploading objects, GET for retrieving objects)
• Expiration date and time
Share the presigned URL with users who need to access your S3 bucket to put or retrieve objects.
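Generating one combines exactly the ingredients listed above. A boto3 sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")

# Generate a presigned URL that lets anyone holding it upload this
# specific key for the next hour, under your credentials' permissions.
url = s3.generate_presigned_url(
    ClientMethod="put_object",  # GET would use "get_object"
    Params={"Bucket": "example-bucket", "Key": "uploads/photo.jpg"},
    ExpiresIn=3600,  # expiration, in seconds
)
print(url)

# The recipient can then upload without AWS credentials, e.g.:
#   curl -X PUT --upload-file photo.jpg "<url>"
```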
Different techniques to secure data
SSL-encrypted endpoints with HTTPS
Client-side encryption
Server-side encryption
What is SSE-S3
Server-side encryption with Amazon S3-managed keys (SSE-S3)
Each object is encrypted with a unique key employing strong multi-factor encryption. It also encrypts the key itself with a primary key that it regularly rotates. Amazon S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data.
What is SSE-KMS
Server-side encryption with AWS KMS-managed keys (Key Management Service)
Similar to SSE-S3, with a key management system scaled for the cloud. SSE-KMS also provides you with an audit trail of when your key was used and by whom. Additionally, you can create and manage encryption keys yourself, or use a default key that is unique to you.
What is SSE-C
Server-side encryption with customer-provided keys (SSE-C)
You manage the encryption keys. Amazon S3 manages the encryption as it writes to disks, and the decryption when you access your objects.
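Requesting SSE-S3 or SSE-KMS is a per-request header. A boto3 sketch; the bucket, keys, and KMS alias are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: Amazon S3 manages the keys; AES-256 under the hood.
s3.put_object(
    Bucket="example-bucket", Key="doc-sse-s3.txt", Body=b"secret",
    ServerSideEncryption="AES256",
)

# SSE-KMS: encrypt with a KMS key (alias below is hypothetical);
# KMS records an audit trail of who used the key and when.
s3.put_object(
    Bucket="example-bucket", Key="doc-sse-kms.txt", Body=b"secret",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-app-key",
)
```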
When to use ACL on S3
An Amazon S3 access control list (ACL) grants coarse-grained user permissions at the bucket or object level. It enables you to apply the principle of least privilege. You can grant users READ, WRITE, or FULL_CONTROL permissions at the object or bucket level.
When to use bucket policies on S3
An S3 bucket policy is a resource-based IAM policy that grants granular permissions to S3 resources. You add a policy to a bucket to grant other AWS accounts or IAM users access permissions for the bucket and the objects in it.
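For example, granting another account read access to all objects in a bucket. The account ID and names in this boto3 sketch are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")

# A resource-based policy letting another account GET every object.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}
s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))
```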
What is CORS for S3
Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one domain to interact with resources in another domain.
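For instance, a browser app served from one domain can be allowed to call a bucket directly. A boto3 sketch with a hypothetical origin:

```python
import boto3

s3 = boto3.client("s3")

# Allow a web app served from https://www.example.com to GET and PUT
# objects in this bucket directly from the browser.
s3.put_bucket_cors(
    Bucket="example-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],
            "AllowedMethods": ["GET", "PUT"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }],
    },
)
```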
Best practices for requests on buckets
Avoid unnecessary requests:
- Handle NoSuchBucket errors instead of checking for existence of fixed buckets.
- Set the object metadata before uploading an object.
- Avoid using the copy operation to update metadata.
- Cache bucket and key names if your application design allows it.
Best practices for network latency
- Choose the bucket region closest to latency-sensitive customers.
- Compress data stored in Amazon S3.
- Use a CDN, such as Amazon CloudFront, to distribute content with low latency and high data transfer speeds.
Best practices for data integrity
- Ensure that data has not been corrupted in transit.
- Check the MD5 checksum of objects after GET and PUT operations.
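A boto3 sketch of both directions: sending a Content-MD5 with the PUT so S3 rejects corrupted uploads, and verifying the ETag after a GET. Names are placeholders; note the ETag equals the hex MD5 only for single-part, non-KMS uploads:

```python
import base64
import hashlib
import boto3

s3 = boto3.client("s3")
data = b"important payload"

# PUT: send a base64-encoded MD5 digest; S3 fails the request
# if the body was corrupted in transit.
md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode()
s3.put_object(
    Bucket="example-bucket", Key="payload.bin", Body=data,
    ContentMD5=md5_b64,
)

# GET: compare a locally computed digest to the returned ETag.
resp = s3.get_object(Bucket="example-bucket", Key="payload.bin")
body = resp["Body"].read()
assert resp["ETag"].strip('"') == hashlib.md5(body).hexdigest()
```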
Differences between 400 and 500 error codes
A 400-series error code indicates a client-side problem: the request is invalid, and retrying it unchanged will not succeed.
- BucketAlreadyExists
- InvalidBucketName
A 500-series error code indicates a transient server-side condition; you may retry the operation.
- InternalError
- SlowDown
With an exponential backoff algorithm, you specify progressively longer waits after each failed attempt before retrying your request.
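A minimal sketch of that retry loop in Python, retrying only the 500-series codes named above and adding random jitter; the helper name is hypothetical:

```python
import random
import time
import boto3
import botocore.exceptions

def put_with_backoff(s3, max_attempts=5, **put_kwargs):
    """Retry put_object on 500-series errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return s3.put_object(**put_kwargs)
        except botocore.exceptions.ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in ("InternalError", "SlowDown"):
                raise  # 400-series: retrying unchanged will not help
            # Wait 1s, 2s, 4s, ... plus random jitter, then retry.
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"upload failed after {max_attempts} attempts")

# Usage sketch (placeholder names):
# put_with_backoff(boto3.client("s3"), Bucket="example-bucket",
#                  Key="payload.bin", Body=b"data")
```

Note that boto3 also ships built-in retry logic with backoff, tunable through botocore.config.Config(retries={...}), which covers most applications without hand-rolled loops.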