AWS Hello, Storage Definitions Flashcards

1
Q

3 types of AWS storage

A
  1. Block: EBS (persistent), EC2 Instance Store (ephemeral)
  2. File: EFS
  3. Object: S3, S3 Glacier
2
Q

EBS

A
  1. Elastic Block Store
3
Q

EFS

A
  1. Elastic File System
4
Q

S3

A
  1. Amazon Simple Storage Service
5
Q

3 V’s of big data

A
  1. Velocity
  2. Variety
  3. Volume
6
Q

Velocity

A
  1. Speed at which data is being read/written
  2. Measured in RPS (reads per second) or
  3. Measured in WPS (writes per second)
  4. Can be based on batch processing, periodic, near real time, or real time speed
7
Q

Variety

A
  1. Determines how structured the data is AND
  2. How many different structures exist in the data.
  3. Ex: Highly structured -> loosely structured, unstructured, or BLOB
8
Q

BLOB

A
  1. Binary large object data.
9
Q

Volume

A
  1. Total size of dataset.
  2. Typical metrics that measure the ability of a data store to support volume are:
    - maximum storage capacity and cost
  3. Ex: $/GB
10
Q

Hot data

A
  1. Actively worked on (new ingests, updates, transformations)
  2. Reads and writes tend to be single-item.
  3. Items tend to be small (up to hundreds of kilobytes)
  4. Speed of access = essential
  5. Tends to be high velocity + low volume
11
Q

Warm data

A
  1. Still being actively accessed (less frequent than hot)
  2. Items can be small like hot, but are updated and read in sets.
  3. Speed of access, while important, is less critical than for hot data.
  4. More balanced across velocity and volume dimensions.
12
Q

Cold data

A
  1. Still needs occasional access.
  2. Updates to data are rare
  3. Reads can tolerate higher latency
  4. Items tend to be large (tens or hundreds of megabytes or gigabytes)
  5. Often written / read individually.
  6. High durability, low cost = essential
  7. High volume and low velocity.
13
Q

Frozen data

A
  1. Needs to be preserved for business continuity / archival / regulatory reasons.
  2. Not actively worked on.
  3. New data can be regularly added to data store, existing data is NEVER updated.
  4. Reads are very infrequent (“write once, read never”)
  5. Can tolerate high latency.
  6. Very high volume, very low-velocity
14
Q

Transient data

A
  1. Usually short-lived.
  2. Loss of a subset of transient data does not have a big impact on system.
  3. Ex: clickstream or Twitter data.
  4. Usually don’t need high durability of this data (b/c we expect it to be quickly consumed, yielding higher value data)
  5. Note: not all streaming data is transient. (ex: intrusion alert system)
15
Q

Reproducible data

A
  1. Contains a copy of useful information that is often created to improve performance or simplify consumption.
  2. Ex: adding more structure or altering structure to match consumption patterns.
  3. Loss of some or all of this data may affect the system’s performance or availability.
  4. Does not result in data loss (b/c it’s reproducible)
  5. Ex: Data warehouse data, read replicas of OLTP, many types of caches.
  6. Invest a bit in durability (to reduce impact on system’s performance/availability) but only to a point.
16
Q

OLTP

A
  1. Online transaction processing systems.
  2. Category of data processing focused on transaction-oriented tasks.
  3. Usually Inserting, Updating, Deleting small amounts of data in a database.
  4. Mainly deals with large numbers of transactions by a large number of users.
17
Q

Authoritative data

A
  1. Source of truth.
  2. Losing it will significantly impact business b/c difficult/impossible to restore or replace.
  3. Willing to invest additional durability. More important, more durability desired.
18
Q

Critical/Regulated data

A
  1. Business must retain at any cost.
  2. Tends to be stored for longer periods of time.
  3. Needs to be protected from accidental or malicious changes, not just data loss.
  4. In addition to durability, cost and security are equally important.
19
Q

ERP

A
  1. Enterprise resource planning systems.
20
Q

Block storage

A
  1. Offers low latency for high-performance workloads.
  2. Analogous to DAS (direct-attached storage) or SAN (storage area network).
  3. Ex: EC2 instance store and EBS.
  4. ERPs are a good example of an enterprise application that requires dedicated, low-latency storage for each host.
21
Q

DAS

A
  1. Direct-attached storage
  2. Analogous to Block storage.
22
Q

SAN

A
  1. Storage Area Network
  2. Analogous to Block Storage.
  3. Computer network which provides access to consolidated, block-level storage.
23
Q

Object storage

A
  1. Ideal for building modern applications from scratch that require scale and flexibility.
  2. Can be used to import existing data stores for analytics, backup, or archive.
  3. Cloud storage makes it possible to store virtually limitless data in native format.
  4. Ex: S3
24
Q

File storage

A
  1. For applications that need access to shared files and require a file system.
  2. Ideal for large content repositories, development environments, media stores, user home directories.
  3. Often supported with NAS (network-attached storage) server
25
Q

NAS

A
  1. Network-attached storage server usually supports File Storage.
26
Q

Confidentiality

A
  1. Equated to privacy level of your data.
  2. Refers to levels of encryption or access policies for your storage / files.
  3. Prevent accidental information disclosure by restricting access and enabling encryption.
27
Q

Integrity

A
  1. Refers to whether your data is trustworthy and accurate.
  2. Ex: Are you sure the file you generated has not been changed when audited later?
  3. Tip - restrict permission of who can modify data.
  4. Tip - Enable backup and versioning.
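
One way to answer the audit question in this card is to record a cryptographic digest when the file is generated and compare it later. The following is a minimal sketch using Python's standard `hashlib`; the function names and the throwaway demo file are illustrative assumptions, not part of any AWS API.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """True if the file still matches the digest recorded earlier."""
    return sha256_of_file(path) == recorded_digest


# Demo on a throwaway file (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"quarterly report v1")
    demo_path = tmp.name

recorded = sha256_of_file(demo_path)       # store this when the file is generated
intact_before = is_unchanged(demo_path, recorded)

with open(demo_path, "ab") as handle:      # simulate a later modification
    handle.write(b" (edited)")
intact_after = is_unchanged(demo_path, recorded)

os.remove(demo_path)
print(intact_before, intact_after)
```

The digest only detects a change; restricting write permissions and enabling versioning, as the tips say, are what prevent or undo it.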
28
Q

Availability

A
  1. Refers to the availability of a storage service on AWS, where an authorized party can gain reliable access to the resource.
  2. Tip - restrict permission of who can delete data.
  3. Tip - enable MFA for S3 delete operation.
  4. Tip - enable backup and versioning.
29
Q

AFR

A
  1. Annual Failure Rate
  2. EBS volumes are 20x more reliable than typical commodity disk drives, which have an AFR of around 4%
  3. EBS AFR: 0.1% - 0.2%
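
To make the AFR figures concrete, here is a small worked calculation. It is only a sketch: the fleet size of 1,000 volumes is a hypothetical number chosen for illustration, and the function name is not from any library.

```python
def expected_failures(volume_count, afr_percent):
    """Expected number of volumes that fail in one year at a given AFR."""
    return volume_count * afr_percent / 100.0


fleet = 1000  # hypothetical fleet size

commodity = expected_failures(fleet, 4.0)   # commodity disks, AFR around 4%
ebs_low = expected_failures(fleet, 0.1)     # EBS, lower bound of 0.1%
ebs_high = expected_failures(fleet, 0.2)    # EBS, upper bound of 0.2%

# The "20x" figure compares 4% against the upper EBS bound of 0.2%.
reliability_ratio = 4.0 / 0.2

print(commodity, ebs_low, ebs_high, reliability_ratio)
```

For this fleet, that is roughly 40 expected failures per year on commodity disks against 1 to 2 on EBS.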
30
Q

IOPS

A
  1. Input/output operations per second.
  2. Common performance measurement used to benchmark computer storage devices such as hard disk drives (HDDs), solid state drives (SSDs), and storage area networks (SANs)
31
Q

HDD

A
  1. Hard disk drive
  2. HDD backed volumes are optimized for large streaming workloads where throughput (measured in MiB/s) is a better performance measure than IOPS.
32
Q

SSD

A
  1. Solid state drives
  2. SSD backed volumes are optimized for transactional workloads involving frequent read/write operations with small I/O size
  3. Where dominant performance attribute is IOPS.
  4. Newer and faster than HDDs: stores data on instantly accessible memory chips.
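
The HDD and SSD cards contrast throughput with IOPS. The two are linked by I/O size: throughput is IOPS multiplied by the size of each operation. The sketch below illustrates that relationship; the workload numbers are hypothetical, not AWS volume limits.

```python
def throughput_mib_per_s(iops, io_size_kib):
    """Throughput in MiB/s for a given IOPS rate and I/O size in KiB."""
    return iops * io_size_kib / 1024.0


# Hypothetical transactional workload: many small I/Os, so IOPS dominates.
small_io = throughput_mib_per_s(iops=3000, io_size_kib=16)

# Hypothetical streaming workload: few large I/Os, so throughput dominates.
large_io = throughput_mib_per_s(iops=500, io_size_kib=1024)

print(small_io, large_io)
```

The streaming workload moves far more data per second with a sixth of the operations, which is why MiB/s is the better measure for it.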
33
Q

MiB/s

A
  1. Mebibyte per second
  2. Unit of data transfer rate = 1,048,576 bytes per second (8,388,608 bits per second)
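
A mebibyte is 1024 × 1024 = 1,048,576 bytes, which differs slightly from a megabyte (1,000,000 bytes). A short conversion sketch, with helper names chosen only for illustration:

```python
BYTES_PER_MIB = 1024 ** 2   # binary prefix: mebibyte
BYTES_PER_MB = 1000 ** 2    # decimal prefix: megabyte


def mib_per_s_to_bytes_per_s(rate_mib):
    """Convert a rate in MiB/s to bytes per second."""
    return rate_mib * BYTES_PER_MIB


def mib_per_s_to_mb_per_s(rate_mib):
    """Convert a rate in MiB/s to (decimal) MB/s."""
    return rate_mib * BYTES_PER_MIB / BYTES_PER_MB


one_mib_in_bytes = mib_per_s_to_bytes_per_s(1)
one_mib_in_bits = one_mib_in_bytes * 8
one_mib_in_mb = mib_per_s_to_mb_per_s(1)

print(one_mib_in_bytes, one_mib_in_bits, one_mib_in_mb)
```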

34
Q

AES-256

A
  1. 256-bit Advanced Encryption Standard
  2. An algorithm used in EBS encryption
  3. Encryption occurs on the server that hosts the EC2 instance.
  4. This provides encryption of data in transit from EC2 instance to EBS Storage.
  5. This is used in conjunction with AWS KMS.
35
Q

AWS KMS

A
  1. AWS Key Management Service.
  2. Amazon-managed key infrastructure.
  3. Encryption occurs on the server that hosts the EC2 instance.
  4. This provides encryption of data in transit from EC2 instance to EBS Storage.
  5. This is used in conjunction with AES-256.
36
Q

CMK

A
  1. Customer master key.
  2. One of two options for the EBS encryption key.
  3. By default, AWS KMS creates and uses an AWS-managed CMK for EBS. Alternatively, you can specify a CMK that you created yourself.
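
The two key options can be sketched as the parameters passed to the EC2 `create_volume` call in boto3. This is a minimal sketch, not a full provisioning script: the helper name, the Availability Zone, and the key alias are hypothetical placeholders.

```python
def encrypted_volume_params(availability_zone, size_gib, kms_key_id=None):
    """Build keyword arguments for an encrypted EBS volume.

    With kms_key_id omitted, EBS falls back to the account's default
    KMS key for EBS; with it set, the given CMK is used instead.
    """
    params = {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,
        "Encrypted": True,
    }
    if kms_key_id is not None:
        params["KmsKeyId"] = kms_key_id
    return params


# Option 1: default key. Option 2: a CMK you created (hypothetical alias).
default_key = encrypted_volume_params("us-east-1a", 100)
custom_key = encrypted_volume_params("us-east-1a", 100, "alias/my-ebs-key")

# Usage with real credentials (not executed here):
#   import boto3
#   boto3.client("ec2").create_volume(**custom_key)
print(default_key)
print(custom_key)
```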
37
Q

Pre-warming / Initialization

A
  1. Pre-warming is the previous term for “initialization”
  2. Storage blocks on an EBS volume created from a snapshot must be initialized (pulled down from S3 and written to the volume) before you can access them.
  3. This preliminary step can cause a significant increase in the latency of an I/O operation the first time each block is accessed.
  4. Full performance is reached once each block has been accessed once.
38
Q

Initialization process

A
  1. For most applications, it is ok to amortize the cost of initializing a volume from a snapshot over the lifetime of the application.
  2. If this is not acceptable, you can avoid a performance hit by accessing each block (thus absorbing the downtime) prior to putting the volume into production.
  3. This process = initialization.
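
Initialization amounts to reading every block of the volume once. AWS's documentation does this with tools such as `dd` or `fio`; the Python loop below is only an illustration of the same idea, and the function name is an assumption. The demo reads a temporary file; on a real instance the path would be the block device (for example a hypothetical `/dev/xvdf`) and the script would need root privileges.

```python
import os
import tempfile


def initialize_volume(device_path, chunk_size=1024 * 1024):
    """Read every block of the device once and return the bytes read."""
    total = 0
    with open(device_path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(chunk_size)
            if not chunk:        # end of device reached
                break
            total += len(chunk)  # data is discarded; the read itself is the point
    return total


# Demo against a stand-in file instead of a real block device.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (3 * 1024 * 1024 + 123))
    demo_path = tmp.name

bytes_read = initialize_volume(demo_path)
os.remove(demo_path)
print(bytes_read)
```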
39
Q

RAID 0

A
  1. Configuration that lets you join (stripe) multiple volumes together on certain instance types.
  2. Recommended to maximize utilization of instance resources.
40
Q

What is the read-ahead configuration to achieve maximum throughput for a block device?

A
  1. Read-ahead of 1 MiB
  2. This is a per-block-device setting.
  3. Should be applied only to HDD volumes.
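
On Linux, read-ahead is commonly set with `blockdev --setra`, which counts in 512-byte sectors, so 1 MiB corresponds to 2048 sectors. The sketch below only builds the command; the helper names and the device path `/dev/xvdf` are hypothetical.

```python
SECTOR_BYTES = 512  # blockdev --setra takes a count of 512-byte sectors


def read_ahead_sectors(read_ahead_bytes):
    """Convert a read-ahead size in bytes to 512-byte sectors."""
    return read_ahead_bytes // SECTOR_BYTES


def setra_command(device, read_ahead_bytes=1024 * 1024):
    """Build the blockdev command that sets read-ahead for one device."""
    sectors = read_ahead_sectors(read_ahead_bytes)
    return ["sudo", "blockdev", "--setra", str(sectors), device]


command = setra_command("/dev/xvdf")  # hypothetical HDD-backed device
print(" ".join(command))
# To apply it, pass the list to subprocess.run(command, check=True).
```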
41
Q

RDP

A
  1. Remote Desktop Protocol
42
Q

AMI

A
  1. Amazon Machine Image
43
Q

EBS volume vs EC2 instance store

A
  1. EBS = Persistent
    - Location: NETWORK-attached
    - recommended for durable data
  2. EC2 instance store = Temporary
    - Location: Disks which are PHYSICALLY attached to host computer
    - cannot be the only source of truth for your data
    - good for incurring large amounts of I/O at lowest possible latency
44
Q

T/F - You can add an instance store after an EC2 instance has been launched.

A
  1. False, it must be enabled when the EC2 instance is launched.
45
Q

T/F - Instance store provides the lowest-latency storage to your instance (other than RAM)

A
  1. True.
46
Q

Object

A
  1. Piece of data like a document, image, or video that is stored with some metadata in a flat structure.
  2. Object storage provides that data to applications via APIs over the internet.
47
Q

Metadata

A
  1. A set of data that describes and gives information about other data.
  2. “Data about data”
  3. Ex: descriptive, structural, administrative, reference, statistical.
48
Q

S3

A
  1. Simple Storage Service
49
Q

Bucket

A
  1. A container for objects stored in S3.
  2. Every object is contained in a bucket.
  3. Bucket is like a drive or volume in traditional terminology.
50
Q

T/F - It is a good idea to use buckets like folders in S3.

A
  1. False. This is not best practice as there is a 100-bucket limit. (You could reach the limit as your application or data grows)
51
Q

DNS

A
  1. Domain Name System (S3 bucket names must comply with DNS naming conventions)
52
Q

SSL / TLS

A
  1. Secure Sockets Layer
  2. Its successor is TLS (Transport Layer Security)
  3. Protocols for establishing authenticated and encrypted links between computer networks
53
Q

T/F - Amazon bucket names must be universally unique.

A
  1. True
54
Q

Versioning

A
  1. Keeping multiple variants of an object in same bucket.
  2. When versioning is turned on, S3 will create new versions of your object every time you overwrite a particular object key.
  3. Every time you update an object with the same key, S3 will maintain a new version of it.
55
Q

Versioning-enabled buckets

A
  1. Let you recover objects from accidental deletion or overwrite.
  2. Bucket’s versioning configuration can also be MFA Delete-enabled for additional layer of security.
  3. If you overwrite an object, it results in a new object version in the bucket.
  4. You can always restore from a previous version.
56
Q

Versioning and Lifecycle policies

A
  1. Can use versioning in combination with lifecycle policies, applying rules based on whether an object is the current or a previous version.
  2. If concerned with building up of many versions and using space for a particular object, configure lifecycle policy that will delete the old version of the object after a certain period of time.
  3. Tip - Easy to set up lifecycle policy to control the amount of data that’s being retained when you use versioning on a bucket.
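
A lifecycle rule that deletes old versions after a set period can be sketched as the configuration passed to boto3's `put_bucket_lifecycle_configuration`. The helper names, the 30-day period, and the rule ID format are assumptions chosen for illustration.

```python
def noncurrent_expiration_rule(days, prefix=""):
    """Lifecycle rule that expires noncurrent (previous) object versions."""
    return {
        "ID": f"expire-noncurrent-after-{days}-days",
        "Filter": {"Prefix": prefix},   # empty prefix applies to the whole bucket
        "Status": "Enabled",
        "NoncurrentVersionExpiration": {"NoncurrentDays": days},
    }


def lifecycle_configuration(days, prefix=""):
    """Wrap the rule in the structure the S3 API expects."""
    return {"Rules": [noncurrent_expiration_rule(days, prefix)]}


config = lifecycle_configuration(30)

# Usage with real credentials (not executed here):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=config)
print(config)
```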
57
Q

How to discontinue versioning on a bucket

A
  1. Copy all of your objects to a new bucket that has versioning disabled and use that bucket moving forward.
  2. Tip - Can never return to an un-versioned state.
  3. But you can suspend versioning on the bucket.
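
Enabling and suspending versioning both go through the same S3 API call with a different status. The sketch below assumes `s3_client` is a boto3 S3 client passed in by the caller; the function names and bucket name are illustrative.

```python
def enable_versioning(s3_client, bucket_name):
    """Turn versioning on for a bucket."""
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Enabled"},
    )


def suspend_versioning(s3_client, bucket_name):
    """Stop creating new versions; existing versions are kept."""
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Suspended"},
    )


# Usage with real credentials (not executed here):
#   import boto3
#   s3 = boto3.client("s3")
#   enable_versioning(s3, "my-bucket")
#   suspend_versioning(s3, "my-bucket")
```

There is no status that returns a bucket to the un-versioned state, which is why the card suggests copying objects to a new bucket instead.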
58
Q

Cost implications of a versioning-enabled bucket

A
  1. Must calculate as though every version is a completely separate object that takes up the same space as the object itself.
  2. This may make this option cost prohibitive.
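
A quick worked example of that rule: every stored version is billed as a full object. The object size, the version count, and the price per GB-month below are hypothetical numbers, not AWS pricing.

```python
def billable_gb(version_sizes_gb):
    """Total billable storage: every version counts as a full object."""
    return sum(version_sizes_gb)


def monthly_cost(version_sizes_gb, price_per_gb_month):
    """Monthly storage cost for all versions of one object."""
    return billable_gb(version_sizes_gb) * price_per_gb_month


# Hypothetical: a 2 GB object overwritten four times leaves five versions.
versions = [2.0, 2.0, 2.0, 2.0, 2.0]
hypothetical_price = 0.02  # dollars per GB-month, for illustration only

total_gb = billable_gb(versions)
cost = monthly_cost(versions, hypothetical_price)
print(total_gb, round(cost, 2))
```

The bucket appears to hold one 2 GB object, but it is billed for 10 GB.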
59
Q

Buckets in regions

A
  1. S3 creates buckets in region you specify.
  2. Can choose a region that is geographically close to optimize latency, minimize costs, or address regulatory requirements.
  3. Tip - Objects belonging to a bucket that you create in a specific AWS Region never leave that region unless you explicitly transfer them to another region.
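
Choosing the region at creation time can be sketched as the parameters for boto3's `create_bucket`. In `us-east-1` the location constraint is omitted; in other regions it is passed explicitly. The helper name and bucket name are assumptions.

```python
def create_bucket_params(bucket_name, region):
    """Build keyword arguments for create_bucket in a given region."""
    params = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Regions other than us-east-1 need an explicit location constraint.
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


default_region = create_bucket_params("my-bucket", "us-east-1")
other_region = create_bucket_params("my-bucket", "eu-west-1")

# Usage with real credentials (not executed here):
#   import boto3
#   boto3.client("s3", region_name="eu-west-1").create_bucket(**other_region)
print(default_region)
print(other_region)
```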
60
Q

Python code- Create a bucket

A

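import boto3

# Create an S3 client, then create the bucket.
# Outside us-east-1, also pass CreateBucketConfiguration={'LocationConstraint': region}.
s3 = boto3.client('s3')
s3.create_bucket(Bucket='my-bucket')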

61
Q

Python code - Get list of all bucket names

A

import boto3

# Create an S3 client
s3 = boto3.client('s3')
# Call S3 to list current buckets
response = s3.list_buckets()
# Get a list of all bucket names from the response
buckets = [bucket['Name'] for bucket in response['Buckets']]
# Print out the bucket list
print("Bucket List: %s" % buckets)
62
Q

Java code - Delete a bucket

A
  1. Note: the bucket must be empty before you delete it (the CLI offers a --force option that empties it first).

import com.amazonaws.AmazonServiceException;
import com.amazonaws.SdkClientException;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class DeleteBucket {

    public static void main(String[] args) {
        String clientRegion = "*** Client region ***";
        String bucketName = "*** Bucket name ***";

        try {
            AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                    .withCredentials(new ProfileCredentialsProvider())
                    .withRegion(clientRegion)
                    .build();
            // Delete the bucket itself; this fails if the bucket is not empty.
            s3Client.deleteBucket(bucketName);
        }
        catch (AmazonServiceException e) {
            // The call was transmitted successfully, but Amazon S3 couldn't process
            // it, so it returned an error response.
            e.printStackTrace();
        }
        catch (SdkClientException e) {
            // Amazon S3 couldn't be contacted for a response, or the client
            // couldn't parse the response from Amazon S3.
            e.printStackTrace();
        }
    }
}
63
Q

CLI - Delete a bucket (with force parameter)

A
  1. Note: --force deletes all objects first and then deletes the bucket

$ aws s3 rb s3://bucket-name --force
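
The same empty-then-delete sequence can be sketched with the boto3 resource API. This is an approximation of what the CLI flag does, not its implementation; `force_delete_bucket` is a hypothetical helper and `bucket` is assumed to be a boto3 `s3.Bucket` resource.

```python
def force_delete_bucket(bucket):
    """Empty a bucket, then delete it.

    `bucket` is assumed to be a boto3 s3.Bucket resource.
    """
    # Step 1: delete every object currently in the bucket.
    bucket.objects.all().delete()
    # Step 2: delete the now-empty bucket.
    bucket.delete()


# Usage with real credentials (not executed here):
#   import boto3
#   force_delete_bucket(boto3.resource("s3").Bucket("bucket-name"))
#
# On a versioning-enabled bucket, older object versions also have to be
# removed first, for example with bucket.object_versions.delete().
```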

64
Q

Object tagging

A
  1. Enables you to categorize storage.
  2. Each tag is a key-value pair.
  3. Ex for personal health information: PHI = true OR Classification = PHI
  4. WARNING - It is acceptable to use tags to label objects that contain confidential or PII data, but the tags themselves should not contain confidential information.
  5. Can use multiple tags on one object
  6. Can tag new or existing objects
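
Tagging an existing object can be sketched with boto3's `put_object_tagging`, which takes the key-value pairs as a `TagSet` list. The helper names, bucket, and object key are hypothetical; `s3_client` is assumed to be a boto3 S3 client.

```python
def to_tag_set(tags):
    """Convert a plain dict into the Tagging structure the S3 API expects."""
    return {"TagSet": [{"Key": key, "Value": value} for key, value in tags.items()]}


def tag_object(s3_client, bucket, key, tags):
    """Apply tags to an existing object (this replaces its current tag set)."""
    s3_client.put_object_tagging(Bucket=bucket, Key=key, Tagging=to_tag_set(tags))


tagging = to_tag_set({"Classification": "PHI", "Project": "projectx"})
print(tagging)

# Usage with real credentials (not executed here):
#   import boto3
#   tag_object(boto3.client("s3"), "my-bucket", "project/projectx/document.pdf",
#              {"Classification": "PHI"})
```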
65
Q

Object keys and values

A
  1. Key and Values are case sensitive.
66
Q

How developers typically name their folders (tagging).

A
  1. Categorize their files in a folder-like structure in the key name.
  2. S3 has a flat file structure
    - Ex:
    - photos/photo1.jpg
    - project/projectx/document.pdf
    - project/projecty/document2.pdf
  3. Allows only one-dimensional categorization: everything under a prefix is one category.
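
Because keys are just flat strings, "folders" are only a way of grouping keys by prefix. The sketch below shows that grouping in plain Python on the example keys from this card; `group_by_prefix` is an illustrative helper, not an S3 API. Against a real bucket, `list_objects_v2` with its `Prefix` and `Delimiter` parameters does the equivalent server-side.

```python
from collections import defaultdict


def group_by_prefix(keys, depth=1, delimiter="/"):
    """Group flat object keys by the first `depth` segments of their prefix."""
    groups = defaultdict(list)
    for key in keys:
        parts = key.split(delimiter)
        # The last part is the "file name"; the parts before it form the prefix.
        prefix = delimiter.join(parts[:-1][:depth])
        groups[prefix].append(key)
    return dict(groups)


keys = [
    "photos/photo1.jpg",
    "project/projectx/document.pdf",
    "project/projecty/document2.pdf",
]

by_top_level = group_by_prefix(keys)            # one category per top-level prefix
by_second_level = group_by_prefix(keys, depth=2)
print(by_top_level)
print(by_second_level)
```

Each key lands in exactly one group, which is the one-dimensional limit that tags are meant to relax.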
67
Q

Benefits of tagging

A
  1. Object tags enable fine-grained access control of permissions.
    - Ex: Can grant an IAM user permission to read only objects with specific tags.
  2. Enable fine-grained object lifecycle management in which you can specify a tag-based filter, in addition to key-name prefix, in a lifecycle rule.
  3. When using S3 analytics, can configure filters to group objects together.
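
Tag-based access control can be sketched as an IAM policy that allows `s3:GetObject` only when the object carries a given tag, using the `s3:ExistingObjectTag/<key>` condition key. The helper name, bucket name, and tag values below are assumptions for illustration.

```python
import json


def read_only_by_tag_policy(bucket_name, tag_key, tag_value):
    """IAM policy document allowing reads only of objects with a specific tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    "StringEquals": {f"s3:ExistingObjectTag/{tag_key}": tag_value}
                },
            }
        ],
    }


policy = read_only_by_tag_policy("my-bucket", "Classification", "PHI")
print(json.dumps(policy, indent=2))
```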