Data Stores Flashcards
What are the 3 types (concepts) of data store in AWS?
1) Persistent datastore
2) Transient datastore
3) Ephemeral datastore
Define persistent data storage and give 2 examples…
Data that is durable and sticks around after a reboot, restart or power cycle
e.g. Glacier, RDS
Define a transient data store and give 2 examples…
Data that is only temporarily stored before being passed along to another process or a persistent store
e.g. SQS, SNS
Define an ephemeral data store and give 2 examples…
Data that is lost when the instance or node is stopped.
e.g. EC2 instance store, ElastiCache (Memcached)
What does IOPS stand for and what does it measure?
IOPS- Input/Output Operations Per Second
It is a measure of how fast we can read and write to a device
What does throughput measure?
It is the measure of how much data can be moved at a time
What are the two types of data storage consistency models?
1) ACID
2) BASE
What does ACID stand for?
Atomic- Transactions are all or nothing
Consistent- Transactions must be valid
Isolated- Transactions can’t mess with one another
Durable- Completed transactions must stick around
What does BASE stand for?
Basically Available- Values available even if stale
Soft-state- Might not be instantly consistent across stores
Eventually consistent- Will achieve consistency at some point
Why would you want a model (BASE) that was not consistent?
Because, as accurate and precise as ACID is, it doesn’t scale very well.
BASE is not inconsistent, it just doesn’t guarantee consistency everywhere at the same instant
What type of store is S3?
An Object store
What is the maximum object size in S3 and what is the largest object in a single PUT?
Max object size is 5TB
Largest object in a single PUT is 5GB
How can you increase the efficiency of uploads with files larger than 100MB?
You can use multi-part uploads
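For illustration, a minimal boto3 sketch of how multi-part uploads are typically switched on for large files via TransferConfig (the bucket, key and file names are made-up placeholders):

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Switch to multi-part uploads above 100 MB and send 25 MB parts.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=25 * 1024 * 1024,
)

s3.upload_file("backup.tar.gz", "my-example-bucket", "backups/backup.tar.gz", Config=config)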
How are objects referenced in S3?
By a KEY, essentially a URL-path-like key, e.g.
s3://<bucket-name>/finance/April/16/invoice.pdf
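As a quick boto3 sketch (the bucket name is a placeholder), the object is addressed purely by bucket + key; the key only looks like a path, S3 has no real directories:

import boto3

s3 = boto3.client("s3")

# Retrieve the object by bucket + key and read its contents.
obj = s3.get_object(Bucket="my-example-bucket", Key="finance/April/16/invoice.pdf")
invoice_bytes = obj["Body"].read()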
What is S3’s consistency model for read-after-writes? and what does this mean in lay terms?
S3 provides read-after-write consistency for PUTs of new objects
If a brand-new file that S3 has never seen before is written, you can read it back immediately
What is S3’s consistency model for HEAD or GET requests of a KEY before the object exists? and what does this mean in lay terms?
HEAD or GET requests for a KEY before the object exists will result in eventual consistency.
Until the object has been fully written and replicated across AZs, S3 will keep saying “I don’t know what that object is”, so you will only be able to read it eventually.
What is S3’s consistency model for overwrite PUTS and DELETES of objects? and what does this mean in lay terms?
S3 offers eventual consistency for overwrite PUTs (updates) and DELETEs.
S3 will keep serving the original object until the update or delete has been replicated across all other AZs; only once replication is complete will it serve the updated object (or report it deleted).
What is S3’s consistency model for updates to a single KEY? and what does this mean in lay terms?
Updates to a single KEY are atomic
Whoa there, only one person can update this object at a time. If I get two requests I’ll process them in order of their timestamps and you’ll see the updates as soon as I replicate them elsewhere.
What are the 3 methods of securing objects in an S3 bucket?
1) Resource-based (bucket policy)
2) User-based (IAM policies)
3) Object-based (Object ACL)
Plus: optional MFA before delete
In what order does S3 evaluate the security access of an object?
User-based (IAM policy) > Resource-based (bucket policy) > Object-based (Object ACL)
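As an illustration of the resource-based option, a minimal boto3 sketch (bucket name and prefix are placeholders) that attaches a bucket policy allowing public reads of a single prefix:

import json
import boto3

s3 = boto3.client("s3")

# Resource-based control: a bucket policy granting read-only access to reports/.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadForReports",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-example-bucket/reports/*",
    }],
}

s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))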
What does versioning in S3 enable?
Enables “roll-back” and “un-delete” capabilities
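A minimal boto3 sketch (bucket name is a placeholder) of turning versioning on so that roll-back and un-delete become possible:

import boto3

s3 = boto3.client("s3")

# Once enabled, every overwrite or delete keeps the previous version recoverable.
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)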
Do you get charged for old versions of objects?
Yes
Why use MFA in S3?
1) If you require safeguarding against accidental deletion of an object
2) If you would like to change the versioning state of your bucket
Why use cross-region replication in S3?
1) increased durability
2) reduced latency
3) To meet compliance requirements
What are the 7 storage classes of S3? and what types of data are they suited for?
1) Standard- Frequently accessed
2) Standard IA- Long-lived, infrequently accessed
3) One Zone IA- Long-lived, non-critical
4) Reduced redundancy- Frequently accessed, non-critical
5) Intelligent tiering- Long-lived with changing or unknown access patterns
6) Glacier- Long-term data archiving, retrieval in minutes to hours
7) Glacier Deep Archive- Long-term archiving, retrieval within 12-48 hours
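As a sketch (bucket, key and file names are placeholders), an object can be written straight into a non-default storage class at upload time:

import boto3

s3 = boto3.client("s3")

# Store an infrequently accessed report directly in Standard-IA.
with open("2020-report.pdf", "rb") as f:
    s3.put_object(
        Bucket="my-example-bucket",
        Key="archive/2020-report.pdf",
        Body=f,
        StorageClass="STANDARD_IA",  # e.g. ONEZONE_IA, INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE
    )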
Why use S3 lifecycle management in S3?
1) optimise storage costs
2) Adhering to a data retention policy
3) Keep S3 volumes well-maintained
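For illustration, a minimal boto3 lifecycle rule (bucket name and prefix are placeholders) that transitions logs to cheaper classes and then expires them:

import boto3

s3 = boto3.client("s3")

# Move logs to Standard-IA after 30 days, Glacier after 90, and delete after 365.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }]
    },
)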
Name 4 ways S3 can be used in analytics…
1) Data lake concept- S3 data used as a data lake, accessible to Athena, Redshift or QuickSight
2) IoT streaming data repo- Stream IoT data into S3 via Kinesis Firehose
3) Machine learning and AI storage- Rekognition, Lex, MXNet
4) Storage class analysis- Analyses current usage and is used by S3 management analytics to recommend where you can save (e.g. by transitioning storage classes)
Name the 3 server-side encryption at rest options available with S3, plus the client-side option…
1) SSE-S3 - S3-managed keys (AES-256)
2) SSE-C - Upload your own AES-256 encryption key, which S3 uses when it writes the objects
3) SSE-KMS - Use a key generated and managed by AWS Key Management Service (KMS)
4) Client-side - Encrypt objects using your own local encryption process before uploading to S3 (e.g. PGP, GPG)
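A short boto3 sketch (bucket, key and KMS alias are placeholders) showing how the SSE-S3 and SSE-KMS options are requested per object:

import boto3

s3 = boto3.client("s3")
body = b"sensitive payroll data"

# SSE-S3: S3-managed AES-256 keys.
s3.put_object(Bucket="my-example-bucket", Key="payroll.csv",
              Body=body, ServerSideEncryption="AES256")

# SSE-KMS: a key generated and managed by AWS KMS.
s3.put_object(Bucket="my-example-bucket", Key="payroll.csv",
              Body=body, ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-example-key")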
What is transfer acceleration in S3?
A way of speeding up data uploads by routing them through CloudFront edge locations (“CloudFront in reverse”)
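A minimal sketch (bucket and file names are placeholders) of enabling transfer acceleration on a bucket and then uploading through the accelerate endpoint:

import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Enable acceleration for the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Uploads via this client are routed through the nearest edge location.
fast_s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
fast_s3.upload_file("big-video.mp4", "my-example-bucket", "uploads/big-video.mp4")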
What does the requester pays mean in S3?
The requester pays for requests and data transfer, rather than the bucket owner.
What is a tag in the context of S3?
Tags can be assigned to objects for use in cost allocation, billing, security, etc.
What is an event in the context of S3?
Notifications that fire when certain things happen in your S3 bucket (e.g. an object is added, modified or deleted). They can be sent to SNS, SQS or Lambda.
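As an illustration (bucket name and Lambda ARN are made-up placeholders), a boto3 sketch that asks S3 to invoke a Lambda function whenever an object is created under uploads/:

import boto3

s3 = boto3.client("s3")

# The Lambda function must already allow S3 to invoke it.
s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:eu-west-1:123456789012:function:process-upload",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]}},
        }]
    },
)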
What is static web hosting in S3?
Simple and massively scalable static website hosting
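A minimal boto3 sketch (bucket name is a placeholder; the bucket also needs public read access) of turning a bucket into a static website:

import boto3

s3 = boto3.client("s3")

# Serve index.html for the root and error.html for missing pages.
s3.put_bucket_website(
    Bucket="my-example-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)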
How can BitTorrent be used with S3?
You can use the BitTorrent protocol to retrieve any publicly available object by automatically generating a .torrent file
What type of data is AWS Glacier useful for?
Seldom-accessed data (cold storage)
Which hybrid cloud service uses Glacier for storage?
AWS Storage Gateway (Virtual Tape Library)
Is Glacier integrated with lifecycle management?
Yes
What is a glacier vault?
A way to group archives together in S3 Glacier
What is an archive in Glacier?
Any object such as a photo, video or document. It is the base unit of Glacier storage. Each archive has a unique ID and an optional description. The archive ID is unique within the AWS region in which the archive is stored.
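For illustration, a small boto3 sketch (vault and file names are placeholders) of uploading an archive and receiving its unique archive ID back:

import boto3

glacier = boto3.client("glacier")

# Upload a file as an archive; the response contains the archive's unique ID.
with open("family-photos.zip", "rb") as f:
    response = glacier.upload_archive(
        vaultName="my-example-vault",
        archiveDescription="Family photos 2019",
        body=f,
    )

print(response["archiveId"])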
What is the max size of an archive?
40TB
What are the two levels at which access to a vault is controlled?
1) Resource-based- Vault access policy
2) Identity-based- IAM policies
What is a vault access policy? Give an example of its use…
Sets rules that the vault must abide by.
e.g. no one can delete an archive, or MFA must be used before anyone deletes an archive
How are IAM policies used for access to vaults? Also, Vault locks are ___….
Access is managed through IAM policies that give users permission to administer a vault, or to overwrite or delete a vault lock (only while the lock is still unconfirmed).
Immutable… they cannot be changed once confirmed
What are the 4 steps of locking a vault?
1) Create a vault lock policy
2) Initiate the vault lock
3) Within 24 hours, test that the lock is working as intended
4) Confirm (complete) the lock
a) if the lock is confirmed, it is applied forever… no changes
b) if the lock is not confirmed within 24 hours, it expires (dissolves)
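A minimal boto3 sketch of the lock workflow (vault name, account ID and the deny rule are placeholders), assuming the policy is confirmed within the 24-hour window:

import json
import boto3

glacier = boto3.client("glacier")

# Steps 1-2: create a lock policy and initiate the vault lock (returns a lock ID).
lock_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyArchiveDeletion",
        "Principal": "*",
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": "arn:aws:glacier:eu-west-1:123456789012:vaults/my-example-vault",
    }],
}
init = glacier.initiate_vault_lock(
    vaultName="my-example-vault",
    policy={"Policy": json.dumps(lock_policy)},
)

# Steps 3-4: after testing, confirm within 24 hours to make the policy immutable.
glacier.complete_vault_lock(vaultName="my-example-vault", lockId=init["lockId"])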
What is EBS? (2 points)
Elastic Block Store. Essentially virtual hard drives; they can be detached (“unplugged”) and attached to a different instance
Can EBS volumes be used across multiple AZs?
No, a volume is confined to a single AZ. Only one instance can access the volume at a time by default.
What backup strategy can you use with EBS volumes?
EBS snapshots
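A one-call boto3 sketch (volume ID is a placeholder) of taking an EBS snapshot:

import boto3

ec2 = boto3.client("ec2")

# Create a point-in-time snapshot of the volume.
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="Nightly backup of the app data volume",
)
print(snapshot["SnapshotId"])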
When would you use an instance store over an EBS volume?
When you want very fast access, e.g. for cache/buffer/scratch data.
EBS is accessed over the network, so it is not as fast
What are the 3 benefits of using EBS snapshots?
1) Provide a cost-effective and easy backup strategy
2) Easy to share data sets with other users/accounts
3) Easy to migrate a system/volume to a new AZ or region
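As a sketch of the migration use case (regions and snapshot ID are placeholders), a snapshot can be copied into another region and then used to create a volume there:

import boto3

# copy_snapshot is called in the destination region and pulls from the source region.
ec2_us = boto3.client("ec2", region_name="us-east-1")
ec2_us.copy_snapshot(
    SourceRegion="eu-west-1",
    SourceSnapshotId="snap-0123456789abcdef0",
    Description="Copy of app data snapshot for the US region",
)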
What are the 4 steps to convert an unencrypted volume to an encrypted volume?
1) Take a snapshot of the unencrypted volume
2) Use the snapshot to create a new volume
3) Enable encryption when creating the new volume
4) Mount the new volume to an EC2 instance
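A boto3 sketch of those four steps (volume, instance and AZ identifiers are placeholders):

import boto3

ec2 = boto3.client("ec2")

# 1) Snapshot the unencrypted volume and wait for it to complete.
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2-3) Create a new volume from the snapshot with encryption enabled.
vol = ec2.create_volume(
    SnapshotId=snap["SnapshotId"],
    AvailabilityZone="eu-west-1a",
    Encrypted=True,
)

# 4) Attach the encrypted volume to an EC2 instance.
ec2.attach_volume(
    VolumeId=vol["VolumeId"],
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",
)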
What information is stored in a volume snapshot?
Only the blocks that have changed since the last snapshot (snapshots are incremental)
Given that we have snapshots 1, 2 and 3, if we delete snapshot 2, do we lose snapshot 3?
No, we still have 1 and 3, but we can no longer recreate the volume as it was at point-in-time 2
What is a snapshot?
A collection of pointers to data blocks, stored in S3
What are the 2 ways we can use lifecycle manager to manage EBS snapshots?
1) Schedule snapshots to be created for volumes e.g. every hour
2) Set retention rules to remove stale snapshots
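For illustration, a minimal Data Lifecycle Manager policy via boto3 (role ARN and tag are placeholders) that snapshots tagged volumes every 12 hours and keeps the 5 most recent:

import boto3

dlm = boto3.client("dlm")

dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
    Description="12-hourly snapshots of tagged data volumes",
    State="ENABLED",
    PolicyDetails={
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "Backup", "Value": "true"}],
        "Schedules": [{
            "Name": "TwelveHourly",
            "CreateRule": {"Interval": 12, "IntervalUnit": "HOURS", "Times": ["09:00"]},
            "RetainRule": {"Count": 5},
        }],
    },
)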