Data Stores Flashcards
What are the 3 types (concepts) of data store in AWS?
1) Persistent datastore
2) Transient datastore
3) Ephemeral datastore
Define persistent data storage and give 2 examples…
Data that is durable and sticks around after a reboot, restart or power cycle
e.g. Glacier, RDS
Define a transient data store and give 2 examples…
Data that is only temporarily stored before being passed along to another process or a persistent store
e.g. SQS, SNS
Define an ephemeral data store and give 2 examples…
Data that is lost when the instance or node is stopped.
e.g. EC2 instance store, ElastiCache (Memcached)
What does IOPS stand for and what does it measure?
IOPS- Input/Output Operations Per Second
It is a measure of how fast we can read and write to a device
What does throughput measure?
It is the measure of how much data can be moved at a time
What are the two types of data storage consistency models?
1) ACID
2) BASE
What does ACID stand for?
Atomic- Transactions are all or nothing
Consistent- Transactions must be valid
Isolated- Transactions can’t mess with one another
Durable- Completed transactions must stick around
What does BASE stand for?
Basically Available- Values available even if stale
Soft-state- Might not be instantly consistent across stores
Eventually consistent- Will achieve consistency at some point
Why would you want a model (BASE) that was not consistent?
Because, as accurate and precise as ACID is, it doesn’t scale very well.
BASE is not inconsistent, it just doesn’t guarantee consistency everywhere at the same instant
What type of store is S3?
An Object store
What is the maximum object size in S3 and what is the largest object in a single PUT?
Max object size is 5TB
Largest object in a single PUT is 5GB
How can you increase the efficiency of uploads with files larger than 100MB?
You can use multi-part uploads
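For illustration, a minimal boto3 sketch of how multi-part uploads are typically switched on for large files via TransferConfig (the bucket, key and file names are made-up placeholders):

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Switch to multi-part uploads above 100 MB and send 25 MB parts.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=25 * 1024 * 1024,
)

s3.upload_file("backup.tar.gz", "my-example-bucket", "backups/backup.tar.gz", Config=config)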
How are objects referenced in S3?
By a KEY, essentially a URL-path-like key, e.g.
s3://<bucket-name>/finance/April/16/invoice.pdf
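As a quick boto3 sketch (the bucket name is a placeholder), the object is addressed purely by bucket + key; the key only looks like a path, S3 has no real directories:

import boto3

s3 = boto3.client("s3")

# Retrieve the object by bucket + key and read its contents.
obj = s3.get_object(Bucket="my-example-bucket", Key="finance/April/16/invoice.pdf")
invoice_bytes = obj["Body"].read()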
What is S3’s consistency model for read-after-writes? and what does this mean in lay terms?
S3 provides read-after-write consistency for PUTs of new objects
If a brand-new file that S3 has never seen before is written, you can read it back immediately
What is S3’s consistency model for HEAD or GET requests of a KEY before the object exists? and what does this mean in lay terms?
HEAD or GET requests for a KEY before the object exists will result in eventual consistency.
Until the object has been fully written and replicated across AZs, S3 will keep saying “I don’t know what that object is”, so you will only be able to read it eventually.
What is S3’s consistency model for overwrite PUTS and DELETES of objects? and what does this mean in lay terms?
S3 offers eventual consistency for overwrite PUTs (updates) and DELETEs.
S3 will keep serving the original object until the update or delete has been replicated across all other AZs; only once replication is complete will it serve the updated object (or report it deleted).
What is S3’s consistency model for updates to a single KEY? and what does this mean in lay terms?
Updates to a single KEY are atomic
Whoa there, only one person can update this object at a time. If I get two requests I’ll process them in order of their timestamps and you’ll see the updates as soon as I replicate them elsewhere.
What are the 3 methods of securing objects in an S3 bucket?
1) Resource-based (bucket policy)
2) User-based (IAM policies)
3) Object-based (Object ACL)
Plus: optional MFA before delete
In what order does S3 evaluate the security access of an object?
User-based (IAM policy) > Resource-based (bucket policy) > Object-based (Object ACL)
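As an illustration of the resource-based option, a minimal boto3 sketch (bucket name and prefix are placeholders) that attaches a bucket policy allowing public reads of a single prefix:

import json
import boto3

s3 = boto3.client("s3")

# Resource-based control: a bucket policy granting read-only access to reports/.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadForReports",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-example-bucket/reports/*",
    }],
}

s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))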
What does versioning in S3 enable?
Enables “roll-back” and “un-delete” capabilities
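A minimal boto3 sketch (bucket name is a placeholder) of turning versioning on so that roll-back and un-delete become possible:

import boto3

s3 = boto3.client("s3")

# Once enabled, every overwrite or delete keeps the previous version recoverable.
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)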
Do you get charged for old versions of objects?
Yes
Why use MFA in S3?
1) If you require safeguarding against accidental deletion of an object
2) If you would like to change the versioning state of your bucket
Why use cross-region replication in S3?
1) increased durability
2) reduced latency
3) To meet compliance requirements
What are the 7 storage classes of S3? and what types of data are they suited for?
1) Standard- Frequently accessed
2) Standard IA- Long-lived, infrequently accessed
3) One Zone IA- Long-lived, non-critical
4) Reduced redundancy- Frequently accessed, non-critical
5) Intelligent tiering- Long-lived with changing or unknown access patterns
6) Glacier- Long-term data archiving, retrieval in minutes to hours
7) Glacier Deep Archive- Long-term archiving, retrieval within 12-48 hours
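As a sketch (bucket, key and file names are placeholders), an object can be written straight into a non-default storage class at upload time:

import boto3

s3 = boto3.client("s3")

# Store an infrequently accessed report directly in Standard-IA.
with open("2020-report.pdf", "rb") as f:
    s3.put_object(
        Bucket="my-example-bucket",
        Key="archive/2020-report.pdf",
        Body=f,
        StorageClass="STANDARD_IA",  # e.g. ONEZONE_IA, INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE
    )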
Why use S3 lifecycle management in S3?
1) optimise storage costs
2) Adhering to a data retention policy
3) Keep S3 volumes well-maintained
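For illustration, a minimal boto3 lifecycle rule (bucket name and prefix are placeholders) that transitions logs to cheaper classes and then expires them:

import boto3

s3 = boto3.client("s3")

# Move logs to Standard-IA after 30 days, Glacier after 90, and delete after 365.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }]
    },
)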
Name 4 ways S3 can be used in analytics…
1) Data lake concept- S3 data used as a data lake, accessible to Athena, Redshift or QuickSight
2) IoT streaming data repo- Stream IoT data into S3 via Kinesis Firehose
3) Machine learning and AI storage- Rekognition, Lex, MXNet
4) Storage class analysis- Analyses current usage and is used by S3 management analytics to recommend where you can save (e.g. by transitioning storage classes)
Name the 3 server-side encryption at rest options available with S3, plus the client-side option…
1) SSE-S3 - S3-managed keys (AES-256)
2) SSE-C - Upload your own AES-256 encryption key, which S3 uses when it writes the objects
3) SSE-KMS - Use a key generated and managed by AWS Key Management Service (KMS)
4) Client-side - Encrypt objects using your own local encryption process before uploading to S3 (e.g. PGP, GPG)
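A short boto3 sketch (bucket, key and KMS alias are placeholders) showing how the SSE-S3 and SSE-KMS options are requested per object:

import boto3

s3 = boto3.client("s3")
body = b"sensitive payroll data"

# SSE-S3: S3-managed AES-256 keys.
s3.put_object(Bucket="my-example-bucket", Key="payroll.csv",
              Body=body, ServerSideEncryption="AES256")

# SSE-KMS: a key generated and managed by AWS KMS.
s3.put_object(Bucket="my-example-bucket", Key="payroll.csv",
              Body=body, ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-example-key")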
What is transfer acceleration in S3?
A way of speeding up data uploads by routing them through CloudFront edge locations (“CloudFront in reverse”)
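A minimal sketch (bucket and file names are placeholders) of enabling transfer acceleration on a bucket and then uploading through the accelerate endpoint:

import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Enable acceleration for the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Uploads via this client are routed through the nearest edge location.
fast_s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
fast_s3.upload_file("big-video.mp4", "my-example-bucket", "uploads/big-video.mp4")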
What does the requester pays mean in S3?
The requester pays for requests and data transfer, rather than the bucket owner.
What is a tag in the context of S3?
Tags can be assigned to objects for use in cost allocation, billing, security, etc.
What is an event in the context of S3?
Notifications that fire when certain things happen in your S3 bucket (e.g. an object is added, modified or deleted). They can be sent to SNS, SQS or Lambda.
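As an illustration (bucket name and Lambda ARN are made-up placeholders), a boto3 sketch that asks S3 to invoke a Lambda function whenever an object is created under uploads/:

import boto3

s3 = boto3.client("s3")

# The Lambda function must already allow S3 to invoke it.
s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:eu-west-1:123456789012:function:process-upload",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]}},
        }]
    },
)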
What is static web hosting in S3?
Simple and massively scalable static website hosting
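A minimal boto3 sketch (bucket name is a placeholder; the bucket also needs public read access) of turning a bucket into a static website:

import boto3

s3 = boto3.client("s3")

# Serve index.html for the root and error.html for missing pages.
s3.put_bucket_website(
    Bucket="my-example-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)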
How can BitTorrent be used with S3?
You can use the BitTorrent protocol to retrieve any publicly available object by automatically generating a .torrent file
What type of data is AWS Glacier useful for?
Seldom-accessed data (cold storage)
Which hybrid cloud service uses Glacier for storage?
AWS Storage Gateway (Virtual Tape Library)
Is Glacier integrated with lifecycle management?
Yes
What is a glacier vault?
A way to group archives together in S3 Glacier
What is an archive in Glacier?
Any object such as a photo, video or document. It is the base unit of Glacier storage. Each archive has a unique ID and an optional description. The archive ID is unique within the AWS region in which the archive is stored.
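For illustration, a small boto3 sketch (vault and file names are placeholders) of uploading an archive and receiving its unique archive ID back:

import boto3

glacier = boto3.client("glacier")

# Upload a file as an archive; the response contains the archive's unique ID.
with open("family-photos.zip", "rb") as f:
    response = glacier.upload_archive(
        vaultName="my-example-vault",
        archiveDescription="Family photos 2019",
        body=f,
    )

print(response["archiveId"])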
What is the max size of an archive?
40TB
What are the two levels at which access to a vault is controlled?
1) Resource-based- Vault access policy
2) Identity-based- IAM policies
What is a vault access policy? Give an example of its use…
Sets rules that the vault must abide by.
e.g. no one can delete an archive, or MFA must be used before anyone deletes an archive
How are IAM policies used for access to vaults? Also, Vault locks are ___….
Access is managed through IAM policies that give users permission to administer a vault, or to overwrite or delete a vault lock (only while the lock is still unconfirmed).
Immutable… they cannot be changed once confirmed
What are the 4 steps of locking a vault?
1) Create a vault lock policy
2) Initiate the vault lock
3) Within 24 hours, test that the lock is working as intended
4) Confirm (complete) the lock
a) if the lock is confirmed, it is applied forever… no changes
b) if the lock is not confirmed within 24 hours, it expires (dissolves)
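A minimal boto3 sketch of the lock workflow (vault name, account ID and the deny rule are placeholders), assuming the policy is confirmed within the 24-hour window:

import json
import boto3

glacier = boto3.client("glacier")

# Steps 1-2: create a lock policy and initiate the vault lock (returns a lock ID).
lock_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyArchiveDeletion",
        "Principal": "*",
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": "arn:aws:glacier:eu-west-1:123456789012:vaults/my-example-vault",
    }],
}
init = glacier.initiate_vault_lock(
    vaultName="my-example-vault",
    policy={"Policy": json.dumps(lock_policy)},
)

# Steps 3-4: after testing, confirm within 24 hours to make the policy immutable.
glacier.complete_vault_lock(vaultName="my-example-vault", lockId=init["lockId"])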
What is EBS? (2 points)
Elastic Block Store. Essentially virtual hard drives; they can be detached (“unplugged”) and attached to a different instance
Can EBS volumes be used across multiple AZs?
No, a volume is confined to a single AZ. Only one instance can access the volume at a time by default.
What backup strategy can you use with EBS volumes?
EBS snapshots
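A one-call boto3 sketch (volume ID is a placeholder) of taking an EBS snapshot:

import boto3

ec2 = boto3.client("ec2")

# Create a point-in-time snapshot of the volume.
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="Nightly backup of the app data volume",
)
print(snapshot["SnapshotId"])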
When would you use an instance store over an EBS volume?
When you want very fast access, e.g. for cache/buffer/scratch data.
EBS is accessed over the network, so it is not as fast
What are the 3 benefits of using EBS snapshots?
1) Provide a cost-effective and easy backup strategy
2) Easy to share data sets with other users/accounts
3) Easy to migrate a system/volume to a new AZ or region
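As a sketch of the migration use case (regions and snapshot ID are placeholders), a snapshot can be copied into another region and then used to create a volume there:

import boto3

# copy_snapshot is called in the destination region and pulls from the source region.
ec2_us = boto3.client("ec2", region_name="us-east-1")
ec2_us.copy_snapshot(
    SourceRegion="eu-west-1",
    SourceSnapshotId="snap-0123456789abcdef0",
    Description="Copy of app data snapshot for the US region",
)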
What are the 4 steps to convert an unencrypted volume to an encrypted volume?
1) Take a snapshot of the unencrypted volume
2) Use the snapshot to create a new volume
3) Enable encryption when creating the new volume
4) Mount the new volume to an EC2 instance
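A boto3 sketch of those four steps (volume, instance and AZ identifiers are placeholders):

import boto3

ec2 = boto3.client("ec2")

# 1) Snapshot the unencrypted volume and wait for it to complete.
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2-3) Create a new volume from the snapshot with encryption enabled.
vol = ec2.create_volume(
    SnapshotId=snap["SnapshotId"],
    AvailabilityZone="eu-west-1a",
    Encrypted=True,
)

# 4) Attach the encrypted volume to an EC2 instance.
ec2.attach_volume(
    VolumeId=vol["VolumeId"],
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",
)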
What information is stored in a volume snapshot?
Only the blocks that have changed since the last snapshot (snapshots are incremental)
Given that we have snapshots 1, 2 and 3, if we delete snapshot 2, do we lose snapshot 3?
No, we still have 1 and 3, but we can no longer recreate the volume as it was at point-in-time 2
What is a snapshot?
A collection of pointers to data blocks, stored in S3
What are the 2 ways we can use lifecycle manager to manage EBS snapshots?
1) Schedule snapshots to be created for volumes e.g. every hour
2) Set retention rules to remove stale snapshots
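For illustration, a minimal Data Lifecycle Manager policy via boto3 (role ARN and tag are placeholders) that snapshots tagged volumes every 12 hours and keeps the 5 most recent:

import boto3

dlm = boto3.client("dlm")

dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
    Description="12-hourly snapshots of tagged data volumes",
    State="ENABLED",
    PolicyDetails={
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "Backup", "Value": "true"}],
        "Schedules": [{
            "Name": "TwelveHourly",
            "CreateRule": {"Interval": 12, "IntervalUnit": "HOURS", "Times": ["09:00"]},
            "RetainRule": {"Count": 5},
        }],
    },
)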