AWS Hello, Storage Definitions Flashcards
1
Q
3 types of AWS storage
A
- Block: EBS (persistent), EC2 Instance Store (ephemeral)
- File: EFS
- Object: S3, S3 Glacier
2
Q
EBS
A
- Elastic Block Store
3
Q
EFS
A
- Elastic File System
4
Q
S3
A
- Amazon Simple Storage Service
5
Q
3 V’s of big data
A
- Velocity
- Variety
- Volume
6
Q
Velocity
A
- Speed at which data is being read/written
- Measured in RPS (reads per second) or WPS (writes per second)
- Can be batch, periodic, near-real-time, or real-time
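A minimal sketch of how RPS and WPS could be derived from raw operation counts. The helper name `ops_per_second` and the sample counters are assumptions made for illustration, not part of any AWS API.

```python
def ops_per_second(op_count: int, window_seconds: float) -> float:
    """Average operations per second over a measurement window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return op_count / window_seconds


# Hypothetical counters sampled over a 60-second window.
reads, writes = 12_000, 3_000
rps = ops_per_second(reads, 60)
wps = ops_per_second(writes, 60)
print(f"RPS={rps:.1f} WPS={wps:.1f}")
```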
7
Q
Variety
A
- Describes how structured the data is AND
- How many different structures exist in the data.
- Ex: ranges from highly structured to loosely structured, unstructured, or BLOB data
8
Q
BLOB
A
- Binary large object data.
9
Q
Volume
A
- Total size of dataset.
- Typical metrics that measure the ability of a data store to support volume are maximum storage capacity and cost (Ex: $/GB).
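As a rough illustration of the $/GB metric, the sketch below multiplies dataset size by a per-GB monthly price. The price used is a made-up placeholder, not a real AWS rate, and `monthly_storage_cost` is a hypothetical helper.

```python
def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Estimated monthly cost of storing size_gb at a flat $/GB-month rate."""
    if size_gb < 0 or price_per_gb_month < 0:
        raise ValueError("size and price must be non-negative")
    return size_gb * price_per_gb_month


# Assumed placeholder price in $/GB-month; check current pricing before relying on it.
ASSUMED_PRICE = 0.02
cost = monthly_storage_cost(5_000, ASSUMED_PRICE)
print(f"Estimated monthly cost: ${cost:,.2f}")
```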
10
Q
Hot data
A
- Actively worked on (new ingests, updates, transformations)
- Reads and writes tend to be single-item.
- Items tend to be small (up to hundreds of kilobytes)
- Speed of access = essential
- Tends to be high velocity + low volume
11
Q
Warm data
A
- Still being actively accessed (less frequent than hot)
- Items can be small like hot, but are updated and read in sets.
- Speed of access, while important, is less critical than for hot data.
- More balanced across velocity and volume dimensions.
12
Q
Cold data
A
- Still needs occasional access.
- Updates to data are rare
- Reads can tolerate higher latency
- Items tend to be large (tens or hundreds of megabytes or gigabytes)
- Often written / read individually.
- High durability, low cost = essential
- High volume and low velocity.
13
Q
Frozen data
A
- Needs to be preserved for business continuity / archival / regulatory reasons.
- Not actively worked on.
- New data can be regularly added to the data store, but existing data is NEVER updated.
- Reads are very infrequent (“write once, read never”)
- Can tolerate high latency.
- Very high volume, very low velocity
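The hot / warm / cold / frozen distinction can be sketched as a simple classifier over access frequency. The thresholds below are arbitrary assumptions chosen only to make the example concrete; the cards give no numeric cutoffs, and `AccessProfile` and `classify_temperature` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AccessProfile:
    reads_per_day: float
    updates_per_day: float


def classify_temperature(profile: AccessProfile) -> str:
    """Map an access profile to a data temperature using assumed thresholds."""
    accesses = profile.reads_per_day + profile.updates_per_day
    # Frozen: existing data is never updated and reads are very infrequent.
    if profile.updates_per_day == 0 and profile.reads_per_day < 0.01:
        return "frozen"
    # Cold: occasional access, rare updates (assumed: under one access per day).
    if accesses < 1:
        return "cold"
    # Warm: still actively accessed, but less frequently than hot data.
    if accesses < 1000:
        return "warm"
    # Hot: actively worked on at high velocity.
    return "hot"


print(classify_temperature(AccessProfile(reads_per_day=50_000, updates_per_day=20_000)))
```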
14
Q
Transient data
A
- Usually short-lived.
- Loss of a subset of transient data does not have a big impact on the system.
- Ex: clickstream or Twitter data.
- Usually doesn’t need high durability (b/c we expect it to be quickly consumed, yielding higher-value data)
- Note: not all streaming data is transient. (ex: intrusion alert system)
15
Q
Reproducible data
A
- Contains a copy of useful information that is often created to improve performance or simplify consumption.
- Ex: adding more structure or altering structure to match consumption patterns.
- Loss of some or all of this data may affect the system’s performance or availability.
- Does not result in data loss (b/c it’s reproducible)
- Ex: Data warehouse data, read replicas of OLTP, many types of caches.
- Invest a bit in durability (to reduce the impact on the system’s performance/availability), but only to a point.
16
Q
OLTP
A
- Online transaction processing systems.
- Category of data processing focused on transaction-oriented tasks.
- Usually inserting, updating, or deleting small amounts of data in a database.
- Mainly deals with large numbers of transactions by a large number of users.
17
Q
Authoritative data
A
- Source of truth.
- Losing it will significantly impact business b/c difficult/impossible to restore or replace.
- Willing to invest in additional durability. The more important the data, the more durability desired.
18
Q
Critical/Regulated data
A
- Business must retain at any cost.
- Tends to be stored for longer periods of time.
- Needs to be protected from accidental or malicious changes, not just data loss.
- In addition to durability, cost and security are equally important.
19
Q
ERP
A
- Enterprise resource planning systems.
20
Q
Block storage
A
- Offers low latency for high-performance workloads.
- Analogous to DAS (direct-attached storage) or SAN (storage area network).
- Ex: EC2 Instance Store and EBS.
- ERPs are a good example of an enterprise application that requires dedicated, low-latency storage for each host.
21
Q
DAS
A
- Direct-attached storage
- Analogous to Block storage.
22
Q
SAN
A
- Storage Area Network
- Analogous to Block Storage.
- Computer network which provides access to consolidated, block-level storage.
23
Q
Object storage
A
- Ideal for building modern applications from scratch that require scale and flexibility.
- Can be used to import existing data stores for analytics, backup, or archive.
- Cloud storage makes it possible to store virtually limitless data in native format.
- Ex: S3
24
Q
File storage
A
- For applications that need access to shared files and require a file system.
- Ideal for large content repositories, development environments, media stores, user home directories.
- Often supported with a NAS (network-attached storage) server
25
Q
NAS
A
- Network-attached storage. A NAS server usually supports File Storage.
26
Q
Confidentiality
A
- Equates to the privacy level of your data.
- Refers to levels of encryption or access policies for your storage / files.
- Prevent accidental information disclosure by restricting access and enabling encryption.
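One way an access policy can enforce encryption in transit is an S3 bucket policy that denies any request not made over TLS. The sketch below only builds and prints the policy document; the bucket name `example-bucket` and the helper `deny_insecure_transport_policy` are hypothetical, and nothing is sent to AWS.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies all S3 requests made without TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-bucket" is a placeholder name used only for illustration.
print(json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2))
```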