AWS Hello, Storage Concepts Flashcards
Data Dimension
- The three V's of big data: volume, velocity, and variety.
- Consider the storage mechanism most suitable for a particular workload. NOT a single data store for the entire system.
- Right tool for the right job
Highly structured data
- Has a pre-defined schema.
- Ex: Relational database
- Each entity of the same type has the same number of attributes and the domain of allowed values for an attribute can be further constrained.
- Advantage: its self-describing nature.
Loosely structured data
- Has entities, which have attributes/fields.
- A field uniquely identifies each entity.
- However, attributes are not required to be the same in every entity.
- Result: the data is more difficult to analyze and process in an automated fashion, placing a higher burden of reasoning about the data on the consumer or application.
Unstructured data
- Does not have any predefined sense of structure.
- No entities or attributes
- Can contain useful information.
- Result: any useful information must be extracted by the consumer.
BLOB data
- Useful as a whole
- But little benefit trying to extract value from a piece or attribute.
- Result: systems that store BLOB treat as a “black box” to store/retrieve as a whole.
Data Temperature
- Another useful way to look at data to determine the right storage for an application.
- Helps us to understand how “lively” data is (how much is being written/read and how soon it needs to be available)
- Ex: Hot, Warm, Cold, Frozen
- The same data can start hot and gradually cool.
- When this happens, tolerance of read latency increases as does data set size.
Data value
- Some data must be preserved at all costs, other data can be easily regenerated or even lost without significant impact.
- Value of data will impact the investment in durability.
Data value tip!
- To optimize cost and/or performance further, segment data within each workload by value and temperature, and consider different data storage options for different segments.
Data dimensions tip!
- Think in terms of a data storage mechanism that is most suitable for a particular workload - not a single data store for the entire system. Choose the right tool for the job.
Storage tip - One size does not fit all!
- Know the availability, level of durability, and cost factors for each storage option and how they compare.
AWS Shared Responsibility Model and Storage
- AWS: responsible for securing the storage services
- Developer/customer: responsible for securing access to and using encryption on artifacts you create/store.
- Best practice to always use principle of least privilege.
CIA model
- Confidentiality, Integrity, and Availability form the fundamentals of information security. These should be applied to AWS storage.
- Availability (1) sits on top of Integrity (2) and Confidentiality (3) to form "Information Security".
EBS characteristics
- EBS presents data to EC2 instance as a disk volume.
- Provides lowest-latency access to your data from single EC2 instances.
- EBS provides durable, persistent block storage volume for use with EC2 instances.
- Automatically replicated within its AZ (offering high availability and durability).
- Offers consistent, low-latency performance.
- Can scale up and down within minutes. Pay for what you provision
Typical use cases for EBS
- Boot volumes on EC2 instances
- Relational and NoSQL databases
- Stream and log processing applications
- Data warehousing applications
- Big data analytics engines (Hadoop) and Amazon EMR clusters.
EBS designed to achieve:
- Availability of 99.999%
- Durability via replication within a single AZ.
- Annual failure rate (AFR) between 0.1 and 0.2 percent
EBS Volume attributes
- Persist independently from the running life of an EC2 instance. (After EBS is attached to an instance, use it like any other physical hard drive.)
- Very flexible. (On current-generation volumes attached to current-generation instance types, you can dynamically increase size, modify provisioned input/output operations per second (IOPS) capacity, and change the volume type on live production volumes.)
EBS Volume types
- SSD-backed volumes
- HDD-backed volumes
SSD Use Cases
- GENERAL PURPOSE: recommended for most workloads.
  - System boot volumes
  - Virtual desktops
  - Low-latency interactive apps
  - Development and test environments
- PROVISIONED IOPS:
  - I/O-intensive workloads
  - Relational DBs
  - NoSQL DBs
HDD Use Cases
- THROUGHPUT-OPTIMIZED:
  - Streaming workloads requiring consistent, fast throughput at a low price
  - Big data
  - Data warehouses
  - Log processing
  - Cannot be a boot volume
- COLD:
  - Throughput-oriented storage for large volumes of data that is infrequently accessed
  - Scenarios where the lowest storage cost is important
  - Cannot be a boot volume
Elastic Volume benefits
- Changes can be made with no downtime, no performance impact, and no changes to the application.
- Create the volume with just the capacity and performance needed at deployment, because you can always change it later.
- Saves hours of planning cycles and prevents overprovisioning.
EBS Snapshot
- Point-in-time snapshots of EBS volumes.
- Backed up to S3 for long-term durability.
- The volume does not need to be attached to a running instance to take a snapshot (see the SDK sketch below).
- Snapshots are incremental backups; only the blocks that have changed since the last snapshot are saved, making them a much more cost-effective way to store block data.
- When deleting snapshots, EBS retains the data needed to restore the volume from the most recent snapshot.
- EBS determines which dependent snapshot data can be deleted so that all other snapshots still work.
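A minimal sketch of taking a snapshot with the AWS SDK for Java v1 (the same SDK as the bucket example later in these cards); the volume ID, description, and class name are placeholder assumptions:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.CreateSnapshotRequest;
import com.amazonaws.services.ec2.model.CreateSnapshotResult;

public class SnapshotExample {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        // The volume does not need to be attached to a running instance.
        CreateSnapshotRequest request = new CreateSnapshotRequest(
                "vol-1234567890abcdef0",              // placeholder volume ID
                "Point-in-time backup of my volume"); // snapshot description
        CreateSnapshotResult result = ec2.createSnapshot(request);
        System.out.println("Snapshot ID: " + result.getSnapshot().getSnapshotId());
    }
}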
Elastic Volume
- Allows you to increase capacity dynamically, tune performance, and change the volume type on a live volume.
- A feature of EBS.
- Changes can be made with no downtime, performance impact, or changes to the application (see the sketch below).
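A minimal sketch of modifying a volume live with the AWS SDK for Java v1; the volume ID and the target size, type, and IOPS are placeholder assumptions:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.ModifyVolumeRequest;

public class ElasticVolumeExample {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        // Grow the volume and change its type while it stays attached and in use.
        ModifyVolumeRequest request = new ModifyVolumeRequest()
                .withVolumeId("vol-1234567890abcdef0") // placeholder volume ID
                .withSize(200)                         // new size in GiB
                .withVolumeType("io1")                 // change type live
                .withIops(10000);                      // provisioned IOPS for io1
        ec2.modifyVolume(request);
    }
}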
EBS Optimization
- Remember that EBS volumes are network-attached (not attached directly to the host like instance store volumes).
- On instances without EBS-optimized throughput, general network traffic can contend with traffic between your instance and your EBS volumes.
- On EBS-optimized instances, these two types of traffic are separated.
- Some instance configurations incur an extra cost for EBS optimization, while others are always EBS-optimized at no extra cost.
EBS Encryption
- For simplified data encryption, create encrypted EBS volumes with the EBS encryption feature.
- All EBS volume types support encryption.
- EBS uses the 256-bit Advanced Encryption Standard (AES-256) algorithm and keys managed through AWS Key Management Service (AWS KMS).
EBS encryption options
- Use an AWS KMS-generated key, OR
- Select a Customer Master Key (CMK) that you create separately using AWS KMS.
- You can also encrypt files before placing them on the volume.
- Snapshots of encrypted EBS volumes are automatically encrypted, as are volumes restored from those snapshots (see the sketch below).
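A minimal sketch of creating an encrypted volume with the AWS SDK for Java v1; the Availability Zone and CMK ARN are placeholder assumptions:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.CreateVolumeRequest;

public class EncryptedVolumeExample {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        CreateVolumeRequest request = new CreateVolumeRequest()
                .withAvailabilityZone("us-west-1a") // placeholder AZ
                .withSize(100)                      // size in GiB
                .withVolumeType("gp2")
                .withEncrypted(true)
                // Omit withKmsKeyId to use the AWS-managed default key, or pass
                // the ARN of a CMK you created separately in AWS KMS (placeholder).
                .withKmsKeyId("arn:aws:kms:us-west-1:111122223333:key/EXAMPLE-KEY-ID");
        ec2.createVolume(request);
    }
}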
EBS Performance Best Practices
- Use EBS-optimized instances.
  - Dedicated throughput makes volume performance more predictable and consistent.
  - EBS volume network traffic won't compete with your other instance traffic, because the two are separated on EBS-optimized instances.
- Understand how performance is calculated.
  - You must understand the units of measure involved and how performance is calculated.
- Understand your workload.
  - There is a relationship between the maximum performance of EBS volumes, the size and number of I/O operations, and the time it takes for each action to complete.
  - Each of these factors affects the others, and different applications are more sensitive to one factor or another.
- Be aware of the performance penalty when initializing volumes restored from snapshots (a process known as initialization).
  - New EBS volumes receive their maximum performance the moment they are available and do not require initialization.
EBS Workload implications
- One of the EBS Performance best practices.
- On a given volume configuration, certain I/O characteristics drive the performance behavior for your Amazon EBS volumes.
- SSD-backed volumes (General Purpose SSD and Provisioned IOPS SSD) deliver consistent performance whether an I/O operation is random or sequential.
- HDD-backed volumes (Throughput Optimized HDD and Cold HDD) deliver optimal performance only when I/O operations are large and sequential.
EBS Workload theory
- To understand how SSD and HDD backed volumes will perform, must understand the connection between:
- demand on the volume
- the quantity of IOPS available to it
- the time it takes for an I/O operation to complete
- volume’s throughput limits
Factors that can degrade HDD performance
- When you create a snapshot of a Throughput-optimized HDD or Cold HDD volume, performance may drop as far as the volume’s baseline while the snapshot is in progress
- Specific only to these volume types
- Other factors that can limit performance
- driving more throughput than the instance can support
- performance penalty encountered when initializing volumes restored from a snapshot
- excessive amounts of small, random I/O on the volume
How to increase read-ahead for high-throughput, read-heavy workloads
- If your workload is read-heavy and accesses the block device through the operating system page cache (for example, from a file system), increase read-ahead.
- To achieve maximum throughput, it is recommended that you configure the read-ahead setting to 1 MiB.
- This is a per-block-device setting that should be applied ONLY to your HDD volumes.
How to maximize utilization of instance resources.
- Use RAID 0.
- Some instance types can drive more I/O throughput than you can provision on a single EBS volume. On these instance types, you can join multiple volumes together in a RAID 0 configuration.
- This lets you use the full available bandwidth of these instances.
EBS Troubleshooting - If you are using an EBS volume as a boot volume and your instance is no longer accessible, what do you do?
- You can't use SSH or RDP to access the boot volume directly.
- If the instance is based on an AMI, you can simply terminate it and create a new one.
- If you need access to that EBS boot volume, follow these steps to make it accessible (see the SDK sketch below):
  - Create a new EC2 instance with its own boot volume (a micro instance is great for this).
  - Detach the root EBS volume from the troubled instance.
  - Attach the root EBS volume from the troubled instance to your new EC2 instance as a secondary volume.
  - Connect to the new EC2 instance and access the files on the secondary volume.
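A minimal sketch of the detach/attach steps with the AWS SDK for Java v1; both IDs and the device name are placeholder assumptions, and in practice you would wait for the detach to complete before attaching:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.AttachVolumeRequest;
import com.amazonaws.services.ec2.model.DetachVolumeRequest;

public class RescueBootVolumeExample {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        String rootVolumeId = "vol-1234567890abcdef0";   // root volume of the troubled instance
        String rescueInstanceId = "i-0abcdef1234567890"; // the new (rescue) instance

        // Detach the root volume from the troubled instance...
        ec2.detachVolume(new DetachVolumeRequest().withVolumeId(rootVolumeId));

        // ...then attach it to the rescue instance as a secondary volume.
        ec2.attachVolume(new AttachVolumeRequest(rootVolumeId, rescueInstanceId, "/dev/sdf"));
    }
}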
AMI
- Amazon Machine Image
- Provides the information required to launch an instance.
- Must specify an AMI when you launch an instance.
- Can launch multiple instances from a single AMI (when you need multiple instances with the same configurations).
- Can use different AMIs to launch instances when you need different configurations (see the launch sketch below).
- AMI includes:
- One or more EBS snapshots or, for instance-store-backed AMIs, a template for the root volume of the instance (ex: an OS, an application server, and applications)
- Launch permissions that control which AWS accounts can use the AMI to launch instances.
- A block device mapping that specifies the volumes to attach to the instance when it’s launched.
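A minimal sketch of launching multiple identically configured instances from one AMI with the AWS SDK for Java v1; the AMI ID and instance type are placeholder assumptions:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.RunInstancesRequest;

public class LaunchFromAmiExample {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        // One AMI, three instances with the same configuration.
        RunInstancesRequest request = new RunInstancesRequest()
                .withImageId("ami-0abcdef1234567890") // placeholder AMI ID
                .withInstanceType("t2.micro")
                .withMinCount(3)
                .withMaxCount(3);
        ec2.runInstances(request);
    }
}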
Instance Store
- Another type of block storage available to your EC2 instance for short-lived storage.
- Provides TEMPORARY block-level storage.
- Storage is located on disks that are physically attached to the host computer. (Unlike EBS volumes which are network attached)
- Does not persist if the instance fails or is terminated.
- Because it is on the host computer of the EC2 instance, instance store provides the lowest-latency storage available to your instance (other than RAM).
- Use it when your application incurs large amounts of I/O and needs the lowest possible latency.
- You MUST ensure you have another source of truth for your data and that the only copy is NOT placed in the instance store!
- For durable data, EBS volumes are recommended.
When is your data a candidate for the EC2 instance store?
- If your data does NOT need to be resilient to reboots, restarts, or auto recovery.
- But, exercise caution.
Instance Store Volumes - available instance types.
- Not all instance types come with available instance store volume(s).
- The size and type of volume vary by instance type.
- When you launch an instance, the instance store is available at no additional cost (depending on instance type).
- However, you must enable these volumes when you launch the EC2 instance, because you cannot add instance store volumes to an EC2 instance after launch (see the sketch below).
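A minimal sketch of enabling an instance store volume at launch via a block device mapping, using the AWS SDK for Java v1; the AMI ID, device name, and instance type are placeholder assumptions:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.BlockDeviceMapping;
import com.amazonaws.services.ec2.model.RunInstancesRequest;

public class InstanceStoreAtLaunchExample {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        // Map the first instance store volume (ephemeral0) at launch;
        // instance store volumes cannot be added after the instance is running.
        BlockDeviceMapping ephemeral = new BlockDeviceMapping()
                .withDeviceName("/dev/sdb")
                .withVirtualName("ephemeral0");
        RunInstancesRequest request = new RunInstancesRequest()
                .withImageId("ami-0abcdef1234567890") // placeholder AMI ID
                .withInstanceType("m3.medium")        // an instance type that has instance store
                .withMinCount(1)
                .withMaxCount(1)
                .withBlockDeviceMappings(ephemeral);
        ec2.runInstances(request);
    }
}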
When is an Instance Store volume available to the EC2 instance?
- After you launch an instance, the storage volumes are available.
- However, you cannot access them until they are mounted.
ADDITIONAL INFORMATION:
1. Learn more about how to mount EBS volumes on different operating systems.
TBD
Using both EBS and Instance Store data with instances.
- Many customers use a combination of EBS volumes and Instance Store.
- Ex: you may want to put scratch data, tempdb, or other temporary files on the instance store while your root volume is on EBS.
- NEVER use instance store for any production data.
Instance Store-backed EC2 instances
- You can have your instance boot from the instance store; however, you would want to configure it to use an AMI so that new instances are created if one fails.
- NOT recommended for primary instances (where users would have issues if the instance fails).
- But this configuration can save money on storage costs compared to using EBS as your boot volume, in cases where your system is configured to be resilient to instances relaunching.
- You must understand your application and infrastructure needs before choosing to use instance store-backed EC2 instances. Choose carefully!
- EC2 instance store-backed instances CANNOT be stopped or take advantage of the auto-recovery feature of EC2 instances.
- It is possible to build instances on the fly that are completely resilient to reboot, relaunch, or failure and use the instance store as their root volume. (But this requires due diligence regarding your application and infrastructure to ensure the scenario works for you.)
S3
- Allows you to build web applications, delivering content to users by retrieving data via API calls over the internet.
- Storage for the internet.
- Simple Storage Service offers developers highly scalable, reliable, low-latency data storage infrastructure at low cost.
Bucket Limitations (in S3)
- Do not use buckets as folders, because there is a 100-bucket limit per account.
- Cannot create a bucket within another bucket.
- Bucket is owned by the AWS account that created it.
- Bucket ownership is NOT transferable.
- A bucket must be empty before you can delete it.
- After a bucket is deleted, that name becomes available for reuse.
- However, you might not be able to reuse the name if someone else claims it after you release it by deleting the bucket.
- If you expect to reuse the bucket, do not delete it.
Universal Namespace (buckets)
- A bucket name must be unique across all existing bucket names in S3 across ALL of AWS. (Not just within your account or AWS Region.)
- Must comply with DNS naming conventions when choosing a bucket name.
DNS - compliant bucket name rules
- Must be at least 3 and no more than 63 characters long.
- Must consist of a series of one or more labels, with adjacent labels separated by a single period.
- Must contain only lowercase letters, numbers, and hyphens.
- Each label must start and end with a lowercase letter or number
- Must not be formatted like IP addresses
- AWS recommends that you do not use periods in bucket names. (When using virtual hosted-style buckets with SSL, the SSL wildcard certificate only matches buckets that do not contain periods.)
- To work around this, use HTTP or write your own certificate verification logic.
Create a bucket using Java - code snippet
import java.io.IOException;

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.CreateBucketRequest;
import com.amazonaws.services.s3.model.GetBucketLocationRequest;

public class CreateBucketExample {
    private static String bucketName = "*** bucket name ***";

    public static void main(String[] args) throws IOException {
        AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
        s3client.setRegion(Region.getRegion(Regions.US_WEST_1));

        if (!(s3client.doesBucketExist(bucketName))) {
            // Note that CreateBucketRequest does not specify region, so the bucket is
            // created in the region specified in the client.
            s3client.createBucket(new CreateBucketRequest(bucketName));
        }

        // Get the bucket location.
        String bucketLocation = s3client.getBucketLocation(new GetBucketLocationRequest(bucketName));
        System.out.println("bucket location = " + bucketLocation);
    }
}
When to use versioning
- To preserve, retrieve, and restore every version of every object stored in your S3 bucket, including recovering deleted objects.
- With versioning, you can easily recover from both unintended user actions and application failures.
- Versioning is turned OFF by default (see the sketch below for turning it on).
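A minimal sketch of turning versioning on with the AWS SDK for Java v1; the bucket name is a placeholder assumption:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketVersioningConfiguration;
import com.amazonaws.services.s3.model.SetBucketVersioningConfigurationRequest;

public class EnableVersioningExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Versioning is OFF by default; enable it explicitly on the bucket.
        BucketVersioningConfiguration config = new BucketVersioningConfiguration()
                .withStatus(BucketVersioningConfiguration.ENABLED);
        s3.setBucketVersioningConfiguration(
                new SetBucketVersioningConfigurationRequest("my-bucket", config)); // placeholder bucket
    }
}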
Reasons a developer would turn on versioning of files in S3.
- Protecting from accidental deletion.
- Recovering an earlier version.
- Retrieving deleted objects.
How to retrieve any particular object in a versioned bucket.
- Perform a GET on the object key name and the particular version (see the sketch below).
- S3 versioning tracks changes over time.
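A minimal sketch of a versioned GET with the AWS SDK for Java v1; the bucket, key, and version ID are placeholder assumptions:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

public class GetObjectVersionExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Supplying a version ID along with the key retrieves that exact version.
        S3Object object = s3.getObject(
                new GetObjectRequest("my-bucket", "my-key", "EXAMPLE-VERSION-ID")); // placeholders
        System.out.println("Retrieved version: " + object.getObjectMetadata().getVersionId());
    }
}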
How does S3 versioning protect against unintended deletes?
- If you issue a delete command against an object in a versioned bucket, AWS places a delete marker on top of that object.
- When you then perform a GET on it, you'll get an error, since the object no longer appears to exist.
- However, an administrator, or someone with the necessary permissions, can remove the delete marker and regain access to the data (see the sketch below).
- When a delete request is issued against an object in a versioned bucket, S3 retains the data but removes users' ability to retrieve it.
- Can also be MFA Delete-enabled for an additional layer of security.
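A minimal sketch of removing a delete marker with the AWS SDK for Java v1; the bucket, key, and the delete marker's version ID are placeholder assumptions:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class RemoveDeleteMarkerExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // A simple DELETE only adds a delete marker. Deleting the marker by its
        // own version ID removes it, making the object retrievable again.
        s3.deleteVersion("my-bucket", "my-key", "DELETE-MARKER-VERSION-ID"); // placeholders
    }
}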
T/F - Versioning is turned off by default?
- True.
How many objects can you store within S3?
- Unlimited
- But an individual object's size can only be between 1 byte and 5 TB.
- If you have an object larger than 5 TB, use a file splitter and upload the chunks to S3 (reassemble them later after downloading).
Largest object that can be uploaded in a single PUT
- 5 GB
- For objects larger than 100 MB, you should consider using multipart upload.
- For anything larger than 5 GB, you MUST use multipart upload (see the sketch below).
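A minimal sketch of a multipart upload via the SDK's TransferManager (AWS SDK for Java v1), which splits large files into parts and uploads them in parallel; the bucket, key, and file path are placeholder assumptions:

import java.io.File;

import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;

public class MultipartUploadExample {
    public static void main(String[] args) throws InterruptedException {
        TransferManager tm = TransferManagerBuilder.defaultTransferManager();
        // TransferManager automatically uses multipart upload for large files.
        Upload upload = tm.upload("my-bucket", "big-file.bin", // placeholders
                new File("/path/to/big-file.bin"));
        upload.waitForCompletion(); // blocks until all parts have finished
        tm.shutdownNow();
    }
}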
Object facets
- Key
- VersionID
- Value
- Metadata
- Subresources
- Access Control Information
Key (object facet)
- Name that you assign to an object, may include a simulated folder structure.
- Each key must be unique within a bucket (unless versioning is turned on).
- S3 URLs are a basic data map between "bucket + key + version" and the web service endpoint.
- Ex: in the URL http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl, doc is the name of the bucket and 2006-03-01/AmazonS3.wsdl is the key.
VersionID (object facet)
- Within a bucket, a key and version ID uniquely identify an object.
- If versioning is turned on, you can have multiple versions of a stored object.
Value (object facet)
- Actual content you are storing.
- Can be any sequence of bytes.
- Objects can range in size from 1 byte to 5 TB.
Metadata (object facet)
- Set of name-value pairs with which you can store information regarding the object.
- Can assign metadata (referred to as user-defined metadata) to your objects in S3.
- S3 also assigns system metadata to manage these objects (see the sketch below).
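A minimal sketch of attaching user-defined metadata on upload with the AWS SDK for Java v1; the bucket, key, and metadata pair are placeholder assumptions (user-defined keys are stored with an x-amz-meta- prefix):

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class UserMetadataExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        InputStream stream = new ByteArrayInputStream(content);

        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(content.length);         // system metadata
        metadata.addUserMetadata("department", "finance"); // user-defined metadata (placeholder pair)

        s3.putObject("my-bucket", "my-key", stream, metadata); // placeholder bucket/key
    }
}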
Subresources (object facets)
- S3 uses the subresource mechanism to store additional object-specific information.
- Subresources are subordinate to objects; they are always associated with some other entity, such as a bucket or object (which S3 uses for managing the object).
- Ex: ACL and Torrent
ACL
- Access Control List
- A list of grants identifying the grantees and the permissions granted.
- A type of subresource associated with S3 objects.
- Resource-based
Torrent
- Returns the torrent file associated with the specific object.
- A type of subresource associated with S3 objects.
Resource-based v. user-based access control
- Resource-based:
  - ACLs
  - Bucket policies
- User-based:
  - IAM policies attached to users, groups, or roles
How many tags can you associate with an object?
- Up to 10 tags per object (see the sketch below).
- Each tag associated with an object must have a unique tag key.
- A tag key can be up to 128 Unicode characters long.
- A tag value can be up to 256 Unicode characters long.
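A minimal sketch of tagging an object at upload time with the AWS SDK for Java v1; the bucket, key, file path, and tag pairs are placeholder assumptions:

import java.io.File;
import java.util.Arrays;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectTagging;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.Tag;

public class ObjectTaggingExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Up to 10 tags per object; each tag key must be unique on the object.
        ObjectTagging tags = new ObjectTagging(Arrays.asList(
                new Tag("project", "flashcards"),
                new Tag("classification", "public")));
        s3.putObject(new PutObjectRequest("my-bucket", "my-key", // placeholders
                new File("/path/to/file.txt")).withTagging(tags));
    }
}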