AWS Hello, Storage Concepts Flashcards

1
Q

Data Dimension

A
  1. The 3 V's of big data: volume, velocity, and variety.
  2. Consider the storage mechanism most suitable for a particular workload - NOT a single data store for the entire system.
    • Right tool for the right job
2
Q

Highly structured data

A
  1. Has a pre-defined schema.
  2. Ex: Relational database
  3. Each entity of the same type has the same number of attributes, and the domain of allowed values for an attribute can be further constrained.
  4. Advantage: its self-describing nature.
3
Q

Loosely structured data

A
  1. Has entities, which have attributes/fields.
  2. A field uniquely identifies each entity.
  3. However, the attributes are not required to be the same in every entity.
  4. Result: the data is more difficult to analyze and process in an automated fashion, placing a higher burden of reasoning about the data on the consumer or application.
4
Q

Unstructured data

A
  1. Has no predefined sense or structure.
  2. No entities or attributes.
  3. Can still contain useful information.
  4. Result: any useful information must be extracted by the consumer.
5
Q

BLOB data

A
  1. Useful as a whole.
  2. But there is little benefit in trying to extract value from an individual piece or attribute.
  3. Result: systems that store BLOBs treat them as a "black box," storing and retrieving each one as a whole.
6
Q

Data Temperature

A
  1. Another useful way to look at data to determine the right storage for an application.
  2. Helps us understand how "lively" the data is (how much is being written or read and how soon it needs to be available).
  3. Ex: Hot, Warm, Cold, Frozen.
  4. The same data can start hot and gradually cool.
  5. As this happens, tolerance of read latency increases, as does data set size.
7
Q

Data value

A
  1. Some data must be preserved at all costs; other data can be easily regenerated or even lost without significant impact.
  2. The value of the data will drive the investment in durability.
8
Q

Data value tip!

A
  1. To optimize cost and/or performance further, segment data within each workload by value and temperature, and consider different data storage options for different segments.
9
Q

Data dimensions tip!

A
  1. Think in terms of a data storage mechanism that is most suitable for a particular workload - not a single data store for the entire system. Choose the right tool for the job.
10
Q

Storage tip - One size does not fit all!

A
  1. Know the availability, level of durability, and cost factors for each storage option and how they compare.
11
Q

AWS Shared Responsibility Model and Storage

A
  1. AWS: responsible for securing the storage services.
  2. Developer/customer: responsible for securing access to, and using encryption on, the artifacts you create and store.
  3. Best practice: always apply the principle of least privilege.
12
Q

CIA model

A
  1. Confidentiality, Integrity, and Availability form the fundamentals of information security. These should be applied to AWS storage.
  2. Availability (1) sits on top of Integrity (2) and Confidentiality (3) to form "Information Security."
13
Q

EBS characteristics

A
  1. EBS presents data to an EC2 instance as a disk volume.
  2. Provides the lowest-latency access to your data from a single EC2 instance.
  3. Provides durable, persistent block storage volumes for use with EC2 instances.
  4. Automatically replicated within its AZ (offering high availability and durability).
  5. Offers consistent, low-latency performance.
  6. Can scale up or down within minutes; pay for what you provision.
14
Q

Typical use cases for EBS

A
  1. Boot volumes on EC2 instances
  2. Relational and NoSQL databases
  3. Stream and log processing applications
  4. Data warehousing applications
  5. Big data analytics engines (e.g., Hadoop and Amazon EMR clusters)
15
Q

EBS designed to achieve:

A
  1. Availability of 99.999%
  2. Durability via replication within a single AZ
  3. Annual failure rate (AFR) between 0.1 and 0.2 percent
16
Q

EBS Volume attributes

A
  1. Persist independently from the running life of an EC2 instance. (After an EBS volume is attached to an instance, use it like any other physical hard drive.)
  2. Very flexible. (For current-generation volumes attached to current-generation instance types, you can dynamically increase size, modify provisioned input/output operations per second (IOPS) capacity, and change the volume type on live production volumes.)
17
Q

EBS Volume types

A
  1. SSD-backed volumes
  2. HDD-backed volumes
18
Q

SSD Use Cases

A
  1. GENERAL PURPOSE: recommended for most workloads.
    - System boot volumes
    - Virtual desktops
    - Low-latency interactive apps
    - Development and test environments
  2. PROVISIONED IOPS:
    - I/O-intensive workloads
    - Relational DBs
    - NoSQL DBs
19
Q

HDD Use Cases

A
  1. THROUGHPUT-OPTIMIZED:
    - Streaming workloads requiring consistent, fast throughput at a low price
    - Big data
    - Data warehouse
    - Log processing
    - Cannot be a boot volume
  2. COLD:
    - Throughput-oriented storage for large volumes of data that is infrequently accessed
    - Scenarios where the lowest storage cost is important
    - Cannot be a boot volume
20
Q

Elastic Volume benefits

A
  1. Changes can be made with no downtime, no performance impact, and no changes to the application.
  2. Create the volume with the capacity and performance needed to deploy today, because you can always change it later.
  3. Saves hours of planning cycles and prevents overprovisioning.
21
Q

EBS Snapshot

A
  1. A point-in-time snapshot of an EBS volume.
  2. Backed up to S3 for long-term durability.
  3. The volume does not need to be attached to a running instance to take a snapshot.
  4. Snapshots are incremental backups; only the blocks that have changed since the last snapshot are saved, making them a much more cost-effective way to store block data.
  5. When you delete snapshots, EBS retains the most recent snapshot to restore from.
  6. EBS determines which dependent snapshot data can be deleted while ensuring that all other snapshots still work.
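As a rough sketch of taking a snapshot programmatically, here is the idea with the AWS SDK for Java (v1, the same SDK used in the bucket-creation snippet later in this deck); the volume ID and description are placeholders:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.CreateSnapshotRequest;
import com.amazonaws.services.ec2.model.CreateSnapshotResult;

public class CreateSnapshotExample {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

        // The volume does not need to be attached to a running instance.
        // "vol-1234567890abcdef0" is a placeholder volume ID.
        CreateSnapshotRequest request = new CreateSnapshotRequest(
                "vol-1234567890abcdef0", "Nightly backup of data volume");

        CreateSnapshotResult result = ec2.createSnapshot(request);
        System.out.println("snapshot id = " + result.getSnapshot().getSnapshotId());
    }
}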
22
Q

Elastic Volume

A
  1. Allows you to dynamically increase capacity, tune performance, and change the type of a volume while it is live.
  2. A feature of EBS.
  3. Changes can be made with no downtime, no performance impact, and no changes to the application.
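A minimal sketch of an Elastic Volumes change with the AWS SDK for Java v1; the volume ID, new size, and volume type below are placeholder values, and the modification runs against the live, attached volume:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.ModifyVolumeRequest;

public class ElasticVolumeExample {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

        // Grow a live volume to 200 GiB and change its type to gp2,
        // with no detach and no downtime. The volume ID is a placeholder.
        ec2.modifyVolume(new ModifyVolumeRequest()
                .withVolumeId("vol-1234567890abcdef0")
                .withSize(200)
                .withVolumeType("gp2"));
    }
}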
23
Q

EBS Optimization

A
  1. Remember that EBS volumes are network-attached (not attached directly to the host like instance store volumes).
  2. On instances WITHOUT EBS-optimized throughput, general network traffic can contend with the traffic between your instance and your EBS volumes.
  3. On EBS-optimized instances, these two types of traffic are separated.
  4. Some instance configurations incur an extra cost for EBS optimization, while others are always EBS-optimized at no extra cost.
24
Q

EBS Encryption

A
  1. For simplified DATA encryption, create encrypted EBS volumes with the EBS encryption feature.
  2. All EBS volume types support encryption.
  3. EBS uses the 256-bit Advanced Encryption Standard (AES-256) algorithm and the AWS Key Management Service (AWS KMS).
25
Q

EBS encryption options

A
  1. Use an AWS KMS-generated key, OR
  2. Choose a Customer Master Key (CMK) that you create separately using AWS KMS.
  3. You can also encrypt files prior to placing them on the volume.
  4. Snapshots of encrypted EBS volumes are automatically encrypted (as are any volumes restored from those snapshots).
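A sketch of creating an encrypted volume with the AWS SDK for Java v1, assuming a placeholder AZ and CMK ARN; omitting withKmsKeyId falls back to the default AWS-managed key:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.CreateVolumeRequest;
import com.amazonaws.services.ec2.model.CreateVolumeResult;

public class EncryptedVolumeExample {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

        CreateVolumeRequest request = new CreateVolumeRequest()
                .withAvailabilityZone("us-west-1a")   // placeholder AZ
                .withSize(100)                        // size in GiB
                .withVolumeType("gp2")
                .withEncrypted(true)
                // Omit the next line to use the default AWS-managed key;
                // this CMK ARN is a placeholder.
                .withKmsKeyId("arn:aws:kms:us-west-1:111122223333:key/placeholder");

        CreateVolumeResult result = ec2.createVolume(request);
        System.out.println("volume id = " + result.getVolume().getVolumeId());
    }
}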
26
Q

EBS Performance Best Practices

A
  1. Use EBS-optimized instances.
    - Dedicated throughput makes volume performance more predictable and consistent.
    - EBS volume network traffic won't compete with your other instance traffic, because the two are separated on EBS-optimized instances.
  2. Understand how performance is calculated.
    - You must understand the units of measure involved and how performance is calculated.
  3. Understand your workload.
    - There is a relationship between the maximum performance of EBS volumes, the size and number of I/O operations, and the time it takes for each action to complete.
    - Each of these factors affects the others, and different applications are more sensitive to one factor or another.
  4. Be aware of the performance penalty (aka initialization) when initializing volumes restored from snapshots.
    - (New EBS volumes receive their maximum performance the moment they are available and do not require initialization.)
27
Q

EBS Workload implications

A
  1. One of the EBS Performance best practices.
  2. On a given volume configuration, certain I/O characteristics drive the performance behavior of your EBS volumes.
  3. SSD-backed volumes (General Purpose SSD and Provisioned IOPS SSD) deliver consistent performance whether an I/O operation is random or sequential.
  4. HDD-backed volumes (Throughput-Optimized HDD and Cold HDD) deliver optimal performance only when I/O operations are large and sequential.
28
Q

EBS Workload theory

A
  1. To understand how SSD- and HDD-backed volumes will perform, you must understand the connection between:
    - demand on the volume
    - the quantity of IOPS available to it
    - the time it takes for an I/O operation to complete
    - volume’s throughput limits
29
Q

Factors that can degrade HDD performance

A
  1. When you create a snapshot of a Throughput-optimized HDD or Cold HDD volume, performance may drop as far as the volume’s baseline while the snapshot is in progress
  2. Specific only to these volume types
  3. Other factors that can limit performance
    - driving more throughput than the instance can support
    - performance penalty encountered when initializing volumes restored from a snapshot
    - excessive amounts of small, random I/O on the volume
30
Q

How to increase read-ahead for high-throughput, read-heavy workloads

A
  1. If your workload is read-heavy and accesses the block device through the operating system page cache (ex: from a file system),
  2. then to achieve maximum throughput, it is recommended that you configure the read-ahead setting to 1 MiB.
  3. This is a per-block-device setting that should be applied ONLY to your HDD volumes.
31
Q

How to maximize utilization of instance resources.

A
  1. Use RAID 0.
  2. Some instance types can drive more I/O throughput than you can provision in a single EBS volume. On those instance types, you can join multiple volumes together in a RAID 0 configuration.
  3. This makes use of the full available bandwidth of those instances.
32
Q

EBS Troubleshooting - If using EBS volume as a boot volume and your instance is no longer accessible, what do you do?

A
  1. If you are using the EBS volume as a boot volume, the instance is no longer accessible.
    - You can't use SSH or RDP to reach it.
  2. However, you can use these steps to access the volume:
    - If your EC2 instance is based on an AMI, you can choose to terminate the instance and create a new one.
  3. If you need access to that EBS boot volume itself, follow these steps to make it accessible:
    - Create a new EC2 instance with its own boot volume (a micro instance is great for this).
    - Detach the root EBS volume from the troubled instance.
    - Attach the root EBS volume from the troubled instance to your new EC2 instance as a secondary volume.
    - Connect to the new EC2 instance and access the files on the secondary volume.
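The detach/attach steps can also be scripted; a sketch with the AWS SDK for Java v1, where the volume ID, rescue-instance ID, and device name are placeholders:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.AttachVolumeRequest;
import com.amazonaws.services.ec2.model.DetachVolumeRequest;

public class RecoverBootVolumeExample {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

        // Detach the root volume from the troubled (stopped) instance.
        ec2.detachVolume(new DetachVolumeRequest("vol-1234567890abcdef0"));

        // Attach it to the new rescue instance as a secondary volume.
        ec2.attachVolume(new AttachVolumeRequest(
                "vol-1234567890abcdef0",   // placeholder volume ID
                "i-0abcdef1234567890",     // placeholder rescue instance ID
                "/dev/sdf"));              // secondary device name
    }
}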
33
Q

AMI

A
  1. Amazon Machine Image
  2. Provides the information required to launch an instance.
  3. Must specify an AMI when you launch an instance.
  4. Can launch multiple instances from a single AMI (when you need multiple instances with the same configurations).
  5. Can use different AMIs to launch instances when you need different configurations.
  6. AMI includes:
    - One or more EBS snapshots or (for instance-store-backed AMIs) a template for the root volume of the instance (ex: an OS, application server and applications)
    - Launch permissions that control which AWS accounts can use the AMI to launch instances.
    - A block device mapping that specifies the volumes to attach to the instance when it’s launched.
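A sketch of launching multiple identically configured instances from a single AMI with the AWS SDK for Java v1 (the AMI ID and instance type are placeholders):

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.RunInstancesRequest;
import com.amazonaws.services.ec2.model.RunInstancesResult;

public class LaunchFromAmiExample {
    public static void main(String[] args) {
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

        // Launch two identically configured instances from one AMI.
        RunInstancesRequest request = new RunInstancesRequest()
                .withImageId("ami-0abcdef1234567890")  // placeholder AMI ID
                .withInstanceType("t2.micro")
                .withMinCount(2)
                .withMaxCount(2);

        RunInstancesResult result = ec2.runInstances(request);
        System.out.println("launched "
                + result.getReservation().getInstances().size() + " instances");
    }
}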
34
Q

Instance Store

A
  1. Another type of block storage available to your EC2 instance for short-lived storage.
  2. Provides TEMPORARY block-level storage.
  3. Storage is located on disks that are physically attached to the host computer. (Unlike EBS volumes which are network attached)
  4. Does not persist if the instance fails or is terminated.
  5. Because it is on the EC2 instance's host computer, it provides the lowest-latency storage available to your instance (other than RAM).
  6. Use it when your application incurs large amounts of I/O and needs the lowest possible latency.
  7. You MUST ensure you have another source of truth for your data and that the only copy is NOT placed in the instance store!
  8. For durable data, EBS volumes are recommended.
35
Q

When is your data a candidate for the EC2 instance store?

A
  1. If your data does NOT need to be resilient to reboots, restarts, or auto recovery.
  2. But, exercise caution.
36
Q

Instance Store Volumes - available instance types.

A
  1. Not all instance types come with available instance store volume(s).
  2. The size and type of volume vary by instance type.
  3. When you launch an instance, the instance store is available at no additional cost (depending on the instance type).
  4. However, you must enable these volumes at launch, because you cannot add instance store volumes to an EC2 instance after launch.
37
Q

When is an Instance Store volume available to the EC2 instance?

A
  1. After you launch an instance, the instance store volumes are available to it.
  2. However, you cannot access them until they are mounted.
38
Q

ADDITIONAL INFORMATION:

1. Learn more about how to mount EBS volumes on different operating systems.

A

TBD

39
Q

Using both EBS and Instance Store data with instances.

A
  1. Many customers use a combination of EBS volumes and Instance Store.
  2. Ex: You may want to put scratch data, tempdb, or other temporary files on the instance store while your root volume is on EBS.
  3. NEVER use instance store for any production data.
40
Q

Instance Store-backed EC2 instances

A
  1. You can have your instance boot off the instance store; however, you would want to configure it so that you are using an AMI and new instances will be created if one fails.
  2. NOT recommended for primary instances (where users would have issues if the instance fails).
  3. But this configuration can save money on storage costs compared to using EBS as your boot volume, in cases where your system is configured to be resilient to instances re-launching.
  4. You must understand your application and infrastructure needs before choosing to use instance store-backed EC2 instances. Choose carefully!
    - EC2 instance store-backed instances CANNOT be stopped and cannot take advantage of the auto-recovery feature of EC2 instances.
  5. It is possible to build instances on the fly that are completely resilient to reboot, relaunch, or failure and that use the instance store as their root volume. (But this requires due diligence regarding your application and infrastructure to ensure this scenario will work for you.)
41
Q

S3

A
  1. Allows you to build web applications, delivering content to users by retrieving data via API calls over the internet.
  2. "Storage for the internet."
  3. The Simple Storage Service offers developers highly scalable, reliable, and low-latency data storage infrastructure at low cost.
42
Q

Bucket Limitations (in S3)

A
  1. Do not use buckets as folders, because there is a 100-bucket limit per account.
  2. Cannot create a bucket within another bucket.
  3. Bucket is owned by the AWS account that created it.
  4. Bucket ownership is NOT transferable.
  5. A bucket must be empty before you can delete it.
  6. After a bucket is deleted, that name becomes available for reuse.
  7. However, you might not be able to reuse the name if someone else claims it after you release it by deleting the bucket.
  8. If you expect to reuse the bucket, do not delete it.
43
Q

Universal Namespace (buckets)

A
  1. A bucket name must be unique across all existing bucket names in S3 across ALL of AWS. (Not just within your account or AWS Region.)
  2. Must comply with DNS naming conventions when choosing a bucket name.
44
Q

DNS - compliant bucket name rules

A
  1. Must be at least 3 and no more than 63 characters long.
  2. Must consist of a series of one or more labels, with adjacent labels separated by a single period.
  3. Can contain only lowercase letters, numbers, and hyphens.
  4. Each label must start and end with a lowercase letter or a number.
  5. Must not be formatted like an IP address.
  6. AWS recommends that you do not use periods in bucket names. (When using virtual hosted-style buckets with SSL, the SSL wildcard certificate only matches buckets that do not have periods.)
    - To work around this, use HTTP or write your own certificate verification logic.
45
Q

Create a bucket using Java - code snippet

A
import java.io.IOException;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.CreateBucketRequest;
import com.amazonaws.services.s3.model.GetBucketLocationRequest;

private static String bucketName = "*** bucket name ***";

public static void main(String[] args) throws IOException {
    AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
    s3client.setRegion(Region.getRegion(Regions.US_WEST_1));
    if (!(s3client.doesBucketExist(bucketName))) {
        // CreateBucketRequest does not specify a region, so the bucket
        // is created in the region configured on the client.
        s3client.createBucket(new CreateBucketRequest(bucketName));
    }
    // Get the bucket's location.
    String bucketLocation = s3client.getBucketLocation(
            new GetBucketLocationRequest(bucketName));
    System.out.println("bucket location = " + bucketLocation);
}
46
Q

When to use versioning

A
  1. To preserve, retrieve, and restore every version of every object stored in your S3 bucket.
    - including recovering deleted objects.
  2. With versioning, you can easily recover from both unintended user actions and application failures.
  3. Versioning is turned OFF by default.
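Versioning is turned on per bucket; a minimal sketch with the AWS SDK for Java v1, using a placeholder bucket name in the style of the earlier snippet:

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.BucketVersioningConfiguration;
import com.amazonaws.services.s3.model.SetBucketVersioningConfigurationRequest;

public class EnableVersioningExample {
    public static void main(String[] args) {
        AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());

        // Turn versioning ON for the bucket (it is OFF by default).
        BucketVersioningConfiguration config =
                new BucketVersioningConfiguration(BucketVersioningConfiguration.ENABLED);
        s3client.setBucketVersioningConfiguration(
                new SetBucketVersioningConfigurationRequest("*** bucket name ***", config));
    }
}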
47
Q

Reasons a developer would turn on versioning of files in S3.

A
  1. Protecting from accidental deletion.
  2. Recovering an earlier version.
  3. Retrieving deleted objects.
48
Q

How to retrieve any particular object in a versioned bucket.

A
  1. Perform a GET on the object key name and the particular version ID.
  2. S3 versioning tracks changes over time.
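A sketch of a versioned GET with the AWS SDK for Java v1; the bucket name, key, and version ID are placeholders:

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

public class GetObjectVersionExample {
    public static void main(String[] args) {
        AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());

        // GET the object at a specific version, not just the latest one.
        S3Object object = s3client.getObject(new GetObjectRequest(
                "*** bucket name ***", "photos/cat.jpg", "*** version id ***"));
        System.out.println("retrieved version = "
                + object.getObjectMetadata().getVersionId());
    }
}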
49
Q

How does S3 versioning protect against unintended deletes?

A
  1. If you issue a delete command against an object in a versioned bucket, AWS places a delete marker on top of that object.
  2. Then, when you perform a GET on it, you'll receive an error as though the object no longer exists.
  3. However, an administrator (or someone with the necessary permissions) can remove the delete marker and access the data again.
  4. When a delete request is issued against an object in a versioned bucket, S3 retains the data but removes users' ability to retrieve it.
  5. The bucket can also be MFA Delete-enabled for an additional security layer.
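A sketch contrasting the two delete paths with the AWS SDK for Java v1 (bucket, key, and version ID are placeholders); deleting the delete marker's own version ID is what removes the marker and makes the object retrievable again:

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.DeleteVersionRequest;

public class VersionedDeleteExample {
    public static void main(String[] args) {
        AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());

        // A simple delete in a versioned bucket only adds a delete marker;
        // the underlying data is retained.
        s3client.deleteObject("*** bucket name ***", "photos/cat.jpg");

        // Deleting a specific version ID permanently removes that version.
        // Pointing this at the delete marker's version ID removes the marker
        // and restores access to the object.
        s3client.deleteVersion(new DeleteVersionRequest(
                "*** bucket name ***", "photos/cat.jpg", "*** version id ***"));
    }
}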
50
Q

T/F - Versioning is turned off by default?

A
  1. True.
51
Q

How many objects can you store within S3?

A
  1. Unlimited.
  2. But an individual object can only be between 1 byte and 5 TB in size.
  3. If you have an object larger than 5 TB, use a file splitter and upload it to S3 in chunks (reassembling it later if you download it for later use).
52
Q

Largest object that can be uploaded in a single PUT

A
  1. 5 GB.
  2. For objects larger than 100 MB, you should consider using multipart upload.
  3. (For anything larger than 5 GB, you must use multipart upload.)
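The high-level TransferManager in the AWS SDK for Java v1 switches to multipart upload automatically for large files; a sketch, with the bucket, key, and file path as placeholders:

import java.io.File;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;

public class MultipartUploadExample {
    public static void main(String[] args) throws InterruptedException {
        TransferManager tm = TransferManagerBuilder.standard()
                .withS3Client(new AmazonS3Client(new ProfileCredentialsProvider()))
                .build();

        // TransferManager uses multipart upload for large files
        // and uploads the parts in parallel.
        Upload upload = tm.upload("*** bucket name ***", "backups/big-file.bin",
                new File("/tmp/big-file.bin"));
        upload.waitForCompletion();
        tm.shutdownNow();
    }
}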
53
Q

Object facets

A
  1. Key
  2. VersionID
  3. Value
  4. Metadata
  5. Subresources
  6. Access Control Information
54
Q

Key (object facet)

A
  1. The name that you assign to an object; may include a simulated folder structure.
  2. Each key must be unique within a bucket (unless versioning is turned on).
  3. S3 URLs are a basic data map between "bucket + key + version" and the web service endpoint.
  4. Ex: In the URL http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl, doc is the name of the bucket and 2006-03-01/AmazonS3.wsdl is the key.
55
Q

VersionID (object facet)

A
  1. Within a bucket, a key and version ID uniquely identify an object.
  2. If versioning is turned on, you can have multiple versions of a stored object.
56
Q

Value (object facet)

A
  1. Actual content you are storing.
  2. Can be any sequence of bytes.
  3. Objects can range in size from 1 byte to 5 TB.
57
Q

Metadata (object facet)

A
  1. Set of name-value pairs with which you can store information regarding the object.
  2. Can assign metadata (referred to as user-defined metadata) to your objects in S3.
  3. S3 also assigns system metadata to manage these objects.
58
Q

Subresources (object facets)

A
  1. S3 uses the subresource mechanism to store additional object-specific information.
  2. Subresources are subordinate to objects: they are always associated with some other entity, such as a bucket or an object (which S3 uses for managing the object).
  3. Ex: ACL and Torrent.
59
Q

ACL

A
  1. Access Control List
  2. A list of grants identifying the grantees and the permissions they are granted.
  3. A Type of subresource associated with S3 objects.
  4. Resource-based
60
Q

Torrent

A
  1. Returns the torrent file associated with the specific object.
  2. A type of subresource associated with S3 objects.
61
Q

Resource-based v. user-based access control

A
  1. Resource-based:
    - ACLs
    - bucket policies
  2. User-based:
    - IAM user policies
62
Q

How many tags can you associate with an object?

A
  1. Up to 10 tags can be associated with an object.
    - Each tag on an object must have a unique tag key.
  2. A tag key can be up to 128 Unicode characters long.
  3. A tag value can be up to 256 Unicode characters long.
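A sketch of replacing the tag set on an existing object with the AWS SDK for Java v1; the bucket, key, and tag are placeholders:

import java.util.Arrays;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ObjectTagging;
import com.amazonaws.services.s3.model.SetObjectTaggingRequest;
import com.amazonaws.services.s3.model.Tag;

public class ObjectTaggingExample {
    public static void main(String[] args) {
        AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());

        // Replace the tag set on an existing object (up to 10 tags,
        // each with a unique key).
        s3client.setObjectTagging(new SetObjectTaggingRequest(
                "*** bucket name ***", "photos/cat.jpg",
                new ObjectTagging(Arrays.asList(new Tag("project", "blue")))));
    }
}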