Databases and Analytics Flashcards

1
Q

Which Type of Database has a rigid schema (SQL) and is scaled vertically?

Relational Database
Non-Relational Database

A

Relational Database vs Non-Relational Database

Key difference between Relational and Non-relational is how data is MANAGED and how data is STORED

Relational Database

  • -Organized by tables, rows and columns
  • -Rigid schema (SQL)
  • -Rules enforced within database
  • -Typically scaled vertically
  • -Supports complex queries and joins
  • -Amazon RDS, Oracle, MySQL, IBM DB2, PostgreSQL
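
The points above (rigid schema, rules enforced inside the database, joins) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customers and orders tables are hypothetical, and SQLite stands in for any relational engine.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL CHECK (total >= 0))"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 25.0)")

# The database itself rejects a row that breaks the rules (unknown customer 99).
try:
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A join combines rows from both tables.
rows = conn.execute(
    "SELECT c.name, o.total FROM customers c JOIN orders o ON o.customer_id = c.id"
).fetchall()
```

The application never checks the foreign key itself; the schema does it.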
2
Q

Which Type of Database allows varied data storage models, has a flexible schema w/ data stored in key value pairs, columns, documents or graphs and scales horizontally?

Relational Database
Non-Relational Database

A

Non-Relational Database

Key difference between Relational and Non-relational is how data is MANAGED and how data is STORED

  • -Varied data storage models
  • -Flexible schema (NoSQL) - data stored in key-value pairs, columns, documents or graphs
  • -Rules can be defined in application code (outside database)
  • -Scales horizontally
  • -Unstructured, simple language that supports any kind of schema
  • -Amazon DynamoDB, MongoDB (documents), Redis, Neo4j
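
A minimal sketch of the flexible-schema idea, assuming a plain Python dict as a stand-in for a document store. The `users` collection and `put_user` helper are hypothetical; the point is that items can have different shapes and the "email is required" rule lives in application code.

```python
# Hypothetical document collection: each item is a free-form dict.
users = {}

def put_user(user_id, doc):
    # Rule enforced in application code, not by the data store.
    if "email" not in doc:
        raise ValueError("email is required")
    users[user_id] = doc

# Two items with different attributes are both accepted.
put_user("u1", {"email": "ada@example.com", "name": "Ada"})
put_user("u2", {"email": "bob@example.com", "tags": ["admin"], "address": {"city": "Oslo"}})

try:
    put_user("u3", {"name": "no email"})
    rejected = False
except ValueError:
    rejected = True
```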
3
Q

Which Type of Database is used for Online Transaction Processing (OLTP) and is best for short transactions and simple queries?

Operational/transactional Database
Analytical Database

A

Operational/transactional Database

Key differences are USE CASES and how the database is OPTIMIZED

  • -Online Transaction Processing (OLTP)
  • -Production DBs that process transactions
  • ––e.g., adding customer records, checking stock availability (INSERT, UPDATE, DELETE)
  • -Short transactions and simple queries

Relational examples:
—->Amazon RDS, Oracle, IBM DB2, MySQL

Non-relational examples:
–>MongoDB, Cassandra, Neo4j, HBase, Amazon DynamoDB
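
A short OLTP-style transaction might look like the sketch below, again using sqlite3 as a stand-in. The stock and sales tables and the `sell` helper are assumptions made for the example: either both statements commit, or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))")
conn.execute("CREATE TABLE sales (sku TEXT NOT NULL, qty INTEGER NOT NULL)")
conn.execute("INSERT INTO stock VALUES ('widget', 5)")
conn.commit()

def sell(sku, qty):
    """Record a sale and decrement stock as one short transaction."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("INSERT INTO sales VALUES (?, ?)", (sku, qty))
            conn.execute("UPDATE stock SET qty = qty - ? WHERE sku = ?", (qty, sku))
        return True
    except sqlite3.IntegrityError:
        return False

ok = sell("widget", 2)       # enough stock: both statements commit
failed = sell("widget", 10)  # CHECK constraint fails: both statements roll back
remaining = conn.execute("SELECT qty FROM stock WHERE sku = 'widget'").fetchone()[0]
sales_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```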

4
Q

Which Type of Database is used for Online Analytical Processing (OLAP) for long transactions and complex queries?

Operational/transactional Database
Analytical Database

A

Analytical Database

Key differences are USE CASES and how the database is OPTIMIZED

  • -Online Analytical Processing (OLAP) - the source data comes from OLTP DBs
  • -Data warehouse
  • ——Typically separated from the customer facing DBs.
  • ——Data is extracted for decision making
  • -Long transactions and complex queries

Relational examples:
–>Amazon Redshift, Teradata, HP Vertica

Non-relational examples:
—>Amazon EMR, MapReduce
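
By contrast with a short transaction, an analytical query scans and aggregates many rows. The sketch below assumes a hypothetical sales table whose rows were already extracted from an OLTP system; sqlite3 is used only so the example runs anywhere.

```python
import sqlite3

# Hypothetical sales rows, as if already extracted from an OLTP system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("EU", "2024-01", 100.0),
        ("EU", "2024-02", 150.0),
        ("US", "2024-01", 200.0),
        ("US", "2024-02", 75.0),
    ],
)

# Aggregate across all rows to answer a business question.
report = conn.execute(
    "SELECT region, SUM(amount) AS revenue, COUNT(*) AS orders"
    " FROM sales GROUP BY region ORDER BY revenue DESC"
).fetchall()
```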

5
Q

Which AWS Database is the best option if you need full control over instances and the database?

Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
A

Database on EC2

Use Case:
–Need full control over instance and database (you manage it)

–3rd party database engine (not available in RDS)

6
Q

Which AWS Database is the best option if you need a traditional Relational Database w/ well-formed and structured data?

Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
A

Amazon RDS

Use Case:

–Need traditional relational database

–Data is well-formed and structured

–Ex. Oracle, PostgreSQL, Microsoft SQL Server, MariaDB, MySQL

7
Q

Which AWS Database is the best option if you need a NoSQL database w/ in-memory performance and dynamic scaling?

Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
A

Amazon DynamoDB

Use Case:

–NoSQL database

–In-memory performance

–High I/O needs

–Dynamic scaling

8
Q

Which AWS Database is the best option if you have a data warehouse w/ large volumes of aggregated data?

Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
A

Amazon Redshift

Use Case:

–Data warehouse for large volumes of aggregated data

9
Q

Which AWS Database is the best option for fast-temporary storage for small amounts of data?

Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
A

Amazon ElastiCache

Use Case:

–Fast temporary storage for small amounts of data

–In-memory database

–High performance

10
Q

Which AWS Database is the best option for analytic workloads using the Hadoop framework?

Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
A

Amazon Elastic MapReduce (EMR)

Use Case:

–Analytics workloads using the Hadoop framework

11
Q

Amazon Relational Database Service (RDS)

A

Amazon Relational Database Service (RDS)

  • -Managed relational database - Structured Query Language (SQL) Databases
  • -Easy to set up, highly available, fault tolerant, and scalable
  • -Runs on EC2 instances so you must choose an instance family/type
  • -An Online Transaction Processing (OLTP) type of database

Common use cases:
–Online stores and banking systems

Can encrypt your Amazon RDS instances and snapshots at rest
—>Encryption uses AWS Key Management Service (KMS)

RDS supports the following database engines:

  • -SQL Server, Oracle, MySQL, MariaDB, PostgreSQL, and Aurora
  • -Scales up by increasing INSTANCE size (compute and storage) or changing the INSTANCE type

Provides disaster recovery through the Multi-AZ option, which maintains a passive standby instance.

Example of RDS database:
–Amazon Aurora

12
Q

Describe how Amazon Relational Database Service (RDS) provides disaster recovery using the Multi-AZ option

A

Disaster recovery with Multi-AZ option by providing a passive standby instance:

RDS Master

  • -Runs in one Availability Zone
  • -Primary database (reads and writes)

RDS Master –> points to –> RDS Standby instance

RDS Standby instance
–Master synchronously replicates to the Standby instance in a different Availability Zone

Read Replica

  • -An asynchronous replica of the RDS Master, so there is a slight delay
  • -Can be located in the same Availability Zone, a different Availability Zone, or a different Region
  • -Used to scale horizontally for reads/queries only (kind of like IDAA at SF)
  • -Application servers can ONLY read from the read replica (writes must go to the RDS Master)
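
The difference between the synchronous standby and the asynchronous read replica can be sketched as a toy model. This is not how RDS is implemented; the `Primary` class and its fields are hypothetical and only show why a replica read can briefly be stale while the standby is not.

```python
from collections import deque

class Primary:
    """Toy model: synchronous standby, asynchronous read replica."""

    def __init__(self):
        self.data = {}
        self.standby = {}        # Multi-AZ standby copy
        self.replica = {}        # read replica copy
        self.pending = deque()   # changes not yet applied to the replica

    def write(self, key, value):
        self.data[key] = value
        self.standby[key] = value          # synchronous: applied before write returns
        self.pending.append((key, value))  # asynchronous: applied later

    def replicate(self):
        """Apply queued changes to the read replica."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value

db = Primary()
db.write("balance", 100)
stale_read = db.replica.get("balance")  # replica has not caught up yet
standby_copy = db.standby["balance"]    # standby already has the write
db.replicate()
fresh_read = db.replica.get("balance")
```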
13
Q

What is the name of the database that is part of the Amazon RDS family, is MySQL and PostgreSQL compatible, and features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128 TB per database instance?

A

Amazon Aurora

–RDS family

–MySQL- and PostgreSQL-compatible relational database

–Features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128TB per database instance

–VERY fast

14
Q

Amazon DynamoDB

A
  • Non-relational, NoSQL database (fully managed)
  • Key-value store and document store
  • Fully serverless service
  • –>No need to launch or pay for instances
  • –>You are just allocating characteristics of the performance

Horizontal scaling

  • –>Seamless scalability to any scale with PUSH BUTTON scaling or AUTO SCALING, which means you can increase or decrease performance w/out any interruption
  • ––>By contrast, with Amazon RDS (Relational Database Service) you can scale your 'instance' up or down, but you will have downtime b/c the instance has to restart

Highly available, and capacity can be reserved
–>On demand backup and restore

DynamoDB is made up of:

  • -Tables
  • –>Items exist in the tables
  • —–>Attributes exist in the items
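
The table, item, attribute hierarchy can be pictured with plain Python dicts. This is a sketch of the data model only, not the DynamoDB API: the "Music" table, its key attributes, and the `put_item`/`get_item` helpers are all hypothetical.

```python
# Hypothetical "Music" table keyed by (Artist, SongTitle).
table = {}

def put_item(item):
    # The key attributes identify the item; all other attributes are free-form.
    key = (item["Artist"], item["SongTitle"])
    table[key] = item

def get_item(artist, title):
    return table.get((artist, title))

# Items in the same table can carry different attributes.
put_item({"Artist": "No One You Know", "SongTitle": "Call Me Today", "Year": 2015})
put_item({"Artist": "No One You Know", "SongTitle": "Howdy", "Genre": "Country", "Awards": 2})

item = get_item("No One You Know", "Howdy")
```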

Global Tables

  • -Fully managed multi-Region, multi-master solution
  • -So, data can exist across multiple regions and be fully synchronized
15
Q

Fully managed in-memory Cache for DynamoDB that increases performance up to 10x:

Dynamo Gateway
Dynamic Duo
DynamoDB Accelerator (DAX)
Amazon Auto Scaling

A

DynamoDB Accelerator (DAX)

Fully managed in-memory Cache for DynamoDB that increases performance up to 10x

16
Q

Amazon Redshift

A

A Structured Query Language (SQL) based data warehouse used for analytics and applications

Relational database used for Online Analytical Processing (OLAP) use cases

Uses EC2 instances, so you must choose an instance family/type

Keeps three copies of your data

Provides continuous/incremental backups

17
Q

Amazon Elastic MapReduce (EMR)

A

Managed cluster platform for running BIG DATA frameworks such as Apache Hadoop and Apache Spark

Hadoop is a framework for big data

Used for processing data for analytics and business intelligence
–Can also be used for transforming and moving large amounts of data

Performs Extract, Transform, and Load (ETL) functions
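
The MapReduce model that Hadoop popularized can be sketched in a few lines of single-process Python. This is a toy word count under that model, not EMR or Hadoop code; the function names are made up for the example, and a real cluster would run the map and reduce phases in parallel across nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big cluster", "data pipeline"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
```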

18
Q

Amazon ElastiCache

A

Fully managed implementations of Redis and Memcached
A key/value store
In-memory database used to cache data
High performance and low latency

Web session store (Redis)
–In cases w/ load-balanced web servers, store web session information in Redis so if a server is lost, the session info is not lost, and another web server can pick it up

Database caching (Memcached)
–Use Memcached in front of Amazon RDS or DynamoDB to cache popular queries, offloading work from the database and returning results faster to users

Leaderboards
–Use Redis to provide a live leaderboard for millions of users of your mobile app

Streaming data dashboards
–Provide a landing spot for streaming sensor data on the factory floor, providing live real-time dashboard displays

ElastiCache Node runs on EC2 instance
–Data is loaded into ElastiCache and is often used as web session store
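
The database-caching use case above is often implemented with a cache-aside pattern. The sketch below assumes a plain dict in place of Memcached or Redis and a hypothetical `query_database` function in place of the real database; it only shows that a repeated read is served from the cache.

```python
import time

cache = {}    # stands in for Memcached or Redis
db_calls = 0  # counts how often the backing database is hit

def query_database(product_id):
    """Hypothetical slow query against the backing database."""
    global db_calls
    db_calls += 1
    return {"id": product_id, "name": f"product-{product_id}"}

def get_product(product_id, ttl_seconds=60):
    entry = cache.get(product_id)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]                 # cache hit
    value = query_database(product_id)  # cache miss: go to the database
    cache[product_id] = (value, time.monotonic() + ttl_seconds)
    return value

first = get_product(42)
second = get_product(42)
```

The time-to-live keeps cached entries from living forever; the 60-second default is an arbitrary choice for the example.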

19
Q

Amazon Athena

A

Amazon Athena

Interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
–Serverless database

Can be connected to other data sources with Lambda

Uses a managed Data Catalog (AWS Glue) to store information and schemas for the databases and tables

AWS Glue
–Metadata Catalog that can be used with Amazon Athena
–Fully managed Extract, Transform, and Load (ETL) service
–You transform and move the data to various destinations with AWS Glue
–It is used to prepare and load data for analytics

20
Q

A managed Data Catalog that can store information and schemas in the databases and tables:

A

AWS Glue

Metadata Catalog that can be used with Amazon Athena

Fully managed Extract, Transform, and Load (ETL) service

You transform and move the data to various destinations with AWS Glue

It is used to prepare and load data for analytics
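
An ETL job has three stages, which can be sketched with the Python standard library. This is a generic illustration, not Glue code: the sensor data, the Fahrenheit-to-Celsius transform, and the JSON Lines output are assumptions chosen for the example.

```python
import csv
import io
import json

raw_csv = "sensor,temp_f\nA,68\nB,212\n"

def extract(text):
    """Read raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Convert Fahrenheit strings to Celsius numbers."""
    return [
        {"sensor": r["sensor"], "temp_c": round((float(r["temp_f"]) - 32) * 5 / 9, 1)}
        for r in rows
    ]

def load(rows):
    """Serialize as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r) for r in rows)

clean_rows = transform(extract(raw_csv))
output = load(clean_rows)
```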

21
Q

Amazon Kinesis Data Streams

A

Amazon Kinesis Data Streams

Service for processing streaming data
–Makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information

Producers send data, which is stored in shards for up to 7 days (a shard is a logical chunk of the stream that helps maintain order)
–>Consumers process the data and save to another service

Typically used for very large, consistent volumes of data
–>Ex. Data recording info about equipment temperature, movement of a car, etc.
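
Kinesis Data Streams maps each record's partition key to a shard by taking an MD5 hash of the key. The sketch below illustrates that idea under a simplifying assumption: the 128-bit hash space is split evenly across shards. The `shard_for` helper and the truck readings are hypothetical.

```python
import hashlib

def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index via its MD5 hash.

    Assumes the 128-bit hash space is split evenly across shards.
    """
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2**128 // shard_count
    return min(hash_value // range_size, shard_count - 1)

# Records with the same key land in the same shard, in arrival order.
shards = {i: [] for i in range(4)}
readings = [("truck-1", 70), ("truck-2", 68), ("truck-1", 71)]
for seq, (key, reading) in enumerate(readings):
    shards[shard_for(key, 4)].append((seq, key, reading))
```

Because a key always hashes to the same shard, the readings for one truck stay in order relative to each other.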

22
Q

Which Amazon Kinesis service has no shards, is completely automated and elastically scalable, and saves data directly to another service (such as S3, Splunk, Redshift, or Elasticsearch)?

Amazon Kinesis Data Firehose
Amazon Kinesis Data Analytics

A

Amazon Kinesis Data Firehose

No shards, completely automated and elastically scalable

Saves data directly to another service such as S3, Splunk, Redshift, or Elasticsearch

23
Q

Which Amazon Kinesis service provides real-time SQL processing for streaming data?

Amazon Kinesis Data Firehose
Amazon Kinesis Data Analytics

A

Amazon Kinesis Data Analytics

Provides real-time SQL processing for streaming data

24
Q

Processes and moves data between different AWS compute and storage services:

Amazon Kinesis Data Firehose
Amazon Neptune
Amazon Aurora
AWS Data Pipeline

A

AWS Data Pipeline

Processes and moves data between different AWS compute and storage services

Save results to services such as:
—>S3, RDS, DynamoDB, and EMR

25
Q

A scalable, serverless, embeddable, machine learning-powered Business Intelligence (BI) service that provides a fast, cloud-powered business analytics service, which includes easy-to-build visualizations and rich dashboards:

Amazon Kinesis Data Firehose
Amazon QuickSight
Amazon Aurora
AWS Data Pipeline

A

Amazon QuickSight

A scalable, serverless, embeddable, machine learning-powered Business Intelligence (BI) service

Provides a fast, cloud-powered business analytics service

Easy-to-build visualizations and rich dashboards

Can be accessed from any browser or mobile device

26
Q

Fully managed graph database:

Amazon Kinesis Data Firehose
Amazon Neptune
Amazon Aurora
AWS Data Pipeline

A

Amazon Neptune

Fully managed graph database

Ex. A social graph of connections between users, as on Facebook
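
A graph database is built for queries that follow relationships, such as "friends of friends". The sketch below is plain Python, not Neptune or any graph query language; the `friends` adjacency list and `within_hops` helper are hypothetical.

```python
from collections import deque

# Hypothetical social graph as an adjacency list.
friends = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana", "dee"],
    "dee": ["ben", "cy", "eve"],
    "eve": ["dee"],
}

def within_hops(start, max_hops):
    """Breadth-first search returning each reachable person and the hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in friends[person]:
            if friend not in seen:
                seen[friend] = seen[person] + 1
                queue.append(friend)
    del seen[start]
    return seen

friends_of_friends = [p for p, hops in within_hops("ana", 2).items() if hops == 2]
```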

27
Q

Fully managed document non-relational database service:

Amazon Kinesis Data Firehose
Amazon Neptune
Amazon DocumentDB
AWS Data Pipeline

A

Amazon DocumentDB

Fully managed document non-relational database service

Queries and indexes JSON data

Supports MongoDB workloads

28
Q

Fully managed ledger database that provides cryptographically verifiable transaction logging:

Amazon Kinesis
Amazon Neptune
Amazon Quantum Ledger Database (QLDB)
AWS Data Pipeline

A

Amazon Quantum Ledger Database (QLDB)

Fully managed ledger database with an immutable change history

Provides cryptographically verifiable transaction logging

A recording (ledger) of what transactions have taken place
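
One common way to make a transaction log cryptographically verifiable is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below illustrates that general idea with SHA-256; it is an assumption for teaching purposes, not a description of how QLDB is implemented, and all function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash, transaction):
    payload = json.dumps({"prev": prev_hash, "tx": transaction}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(ledger, transaction):
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"tx": transaction, "prev": prev_hash, "hash": entry_hash(prev_hash, transaction)})

def verify(ledger):
    """Recompute every hash; any edit to past entries breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash or entry["hash"] != entry_hash(prev_hash, entry["tx"]):
            return False
        prev_hash = entry["hash"]
    return True

ledger = []
append(ledger, {"from": "alice", "to": "bob", "amount": 10})
append(ledger, {"from": "bob", "to": "carol", "amount": 4})
valid_before = verify(ledger)
ledger[0]["tx"]["amount"] = 1000  # tamper with history
valid_after = verify(ledger)
```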

29
Q

Fully managed service for joining public and private networks using Hyperledger Fabric and Ethereum:

A