Databases and Analytics Flashcards
Which Type of Database has a rigid schema (SQL) and is scaled vertically?
Relational Database
Non-Relational Database
Relational Database vs Non-Relational Database
Key difference between Relational and Non-relational is how data is MANAGED and how data is STORED
Relational Database
- -Organized by tables, rows and columns
- -Rigid schema (SQL)
- -Rules enforced within database
- -Typically scaled vertically
- -Support complex queries and joins
- -Amazon RDS, Oracle, MySQL, IBM DB2, PostgreSQL
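A minimal sketch of the relational model, using Python's built-in sqlite3 module as a stand-in for any relational engine. The tables (customers, orders) and rows are hypothetical. It shows three points from the card: the schema is declared up front, the database itself rejects rows that break its rules, and a join combines tables in one query.

```python
import sqlite3

# In-memory SQLite stands in for any relational engine (RDS MySQL, PostgreSQL, ...)
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

# Rigid schema: tables, columns, types, and rules are declared before any data arrives
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL NOT NULL)""")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(1, 20.0), (1, 5.5), (2, 12.0)])

# Rules enforced within the database: an order for a non-existent customer is rejected
try:
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (99, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Complex query with a join across two tables plus aggregation
totals = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name""").fetchall()
print(rejected, totals)  # True [('Ana', 25.5), ('Ben', 12.0)]
```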
Which Type of Database allows varied data storage models, has a flexible schema w/ data stored in key value pairs, columns, documents or graphs and scales horizontally?
Relational Database
Non-Relational Database
Non-Relational Database
Key difference between Relational and Non-relational is how data is MANAGED and how data is STORED
- -Varied data storage models
- -Flexible schema (NoSQL): data stored in key-value pairs, columns, documents, or graphs
- -Rules can be defined in application code (outside database)
- -Scales horizontally
- -Handles unstructured data; simple query languages that support any kind of schema
- -Amazon DynamoDB, MongoDB (documents), Redis, Neo4j
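For contrast, a sketch of the non-relational idea using a plain Python dict as a hypothetical key-value/document store (not the API of any real database). Items under different keys can carry different attributes, and any rules are checked in application code rather than by the database.

```python
# Hypothetical in-memory key-value/document store: no schema is declared up front
store = {}


def put_item(key, item):
    # Rules live in application code rather than in the database
    if not isinstance(item, dict) or "id" not in item:
        raise ValueError("every item must be a dict with an 'id'")
    store[key] = item


put_item("user#1", {"id": 1, "name": "Ana", "roles": ["admin"]})
# A second item with different (and nested) attributes: no ALTER TABLE needed
put_item("user#2", {"id": 2, "email": "ben@example.com", "address": {"city": "Lisbon"}})

print(store["user#2"]["address"]["city"])  # Lisbon
```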
Which Type of Database is used for Online Transaction Processing (OLTP) and is best for short transactions and simple queries?
Operational/transactional Database
Analytical Database
Operational/transactional Database
Key differences are USE CASES and how the database is OPTIMIZED
- -Online Transaction Processing (OLTP)
- -Production DBs that process transactions
- ——i.e., adding customer records, checking stock availability (INSERT, UPDATE, DELETE)
- -Short transactions and simple queries
Relational examples:
—->Amazon RDS, Oracle, IBM DB2, MySQL
Non-relational examples:
—->MongoDB, Cassandra, Neo4j, HBase, Amazon DynamoDB
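A sketch of a typical short OLTP transaction, with sqlite3 standing in for a production database and hypothetical stock/orders tables: check stock, update it, and record the order as one all-or-nothing unit.

```python
import sqlite3

# In-memory SQLite stands in for a production OLTP database (hypothetical tables)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT NOT NULL, qty INTEGER NOT NULL)")
conn.execute("INSERT INTO stock (sku, qty) VALUES ('widget', 3)")
conn.commit()


def place_order(sku, qty):
    """One short transaction: decrement stock and record the order, all or nothing."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE stock SET qty = qty - ? WHERE sku = ?", (qty, sku))
            conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
        return True
    except sqlite3.IntegrityError:  # CHECK (qty >= 0) failed: not enough stock
        return False


first = place_order("widget", 2)   # succeeds, 1 widget left
second = place_order("widget", 2)  # rejected and rolled back
print(first, second)  # True False
```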
Which Type of Database is used for Online Analytical Processing (OLAP) for long transactions and complex queries?
Operational/transactional Database
Analytical Database
Analytical Database
Key differences are USE CASES and how the database is OPTIMIZED
- -Online Analytical Processing (OLAP): the source data comes from OLTP DBs
- -Data warehouse
- ——Typically separated from the customer-facing DBs
- ——Data is extracted for decision making
- -Long transactions and complex queries
Relational examples:
—>Amazon Redshift, Teradata, HP Vertica
Non-relational examples:
—>Amazon EMR, MapReduce
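A sketch of the OLAP pattern under simple assumptions. Rows are extracted from a customer-facing (OLTP) database into a separate analytical store, and the aggregate query runs there. Two in-memory sqlite3 databases stand in for, for example, RDS and Redshift; the table names and data are hypothetical.

```python
import sqlite3

# Customer-facing OLTP database (hypothetical sample rows)
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
oltp.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)",
                 [("EU", 10.0), ("US", 7.5), ("EU", 2.5), ("US", 30.0)])

# Separate analytical store: extract a copy so heavy queries never touch production
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (region TEXT, amount REAL)")
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?)",
                      oltp.execute("SELECT region, amount FROM orders"))

# Long-running, complex queries run here (a simple aggregate in this toy case)
report = warehouse.execute("""
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM fact_orders
    GROUP BY region
    ORDER BY revenue DESC""").fetchall()
print(report)  # [('US', 2, 37.5), ('EU', 2, 12.5)]
```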
Which AWS Database is the best option if you need full control over instances and the database?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Database on EC2
Use Case:
–Need full control over instance and database (you manage it)
–3rd party database engine (not available in RDS)
Which AWS Database is the best option if you need a traditional Relational Database w/ well-formed and structured data?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Amazon RDS
Use Case:
–Need traditional relational database
–Data is well-formed and structured
–Ex. Oracle, PostgreSQL, Microsoft SQL Server, MariaDB, MySQL
Which AWS Database is the best option if you need a NoSQL database w/ in-memory performance and dynamic scaling?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Amazon DynamoDB
Use Case:
–NoSQL database
–In-memory performance
–High I/O needs
–Dynamic scaling
Which AWS Database is the best option if you have a data warehouse w/ large volumes of aggregated data?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Amazon Redshift
Use Case:
–Data warehouse for large volumes of aggregated data
Which AWS Database is the best option for fast, temporary storage for small amounts of data?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Amazon ElastiCache
Use Case:
–Fast temporary storage for small amounts of data
–In-memory database
–High performance
Which AWS Database is the best option for analytic workloads using the Hadoop framework?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Amazon Elastic MapReduce (EMR)
Use Case:
–Analytics workloads using the Hadoop framework
Amazon Relational Database Service (RDS)
Amazon Relational Database Service (RDS)
- -Managed relational database service for Structured Query Language (SQL) databases
- -Easy to setup, highly available, fault tolerant, and scalable
- -Runs on EC2 instances so you must choose an instance family/type
- -An Online Transaction Processing (OLTP) type of database
Common use cases:
–Online stores and banking systems
Can encrypt your Amazon RDS instances and snapshots at rest
—>Encryption uses AWS Key Management Service (KMS)
RDS supports the following database engines:
- -Microsoft SQL Server, Oracle, MySQL, MariaDB, PostgreSQL, and Aurora
- -Scales up by increasing INSTANCE size (compute and storage) or changing the INSTANCE type
Disaster recovery with the Multi-AZ option, which provides a passive standby instance
Example of RDS database:
–Amazon Aurora
Describe how Amazon Relational Database Service (RDS) provides disaster recovery using Multi-AZ option
Disaster recovery with the Multi-AZ option, which provides a passive standby instance:
RDS Master
- -Runs in one Availability Zone
- -Primary database (reads and writes)
RDS Master –> replicates to –> RDS Standby instance
RDS Standby instance
–Master synchronously replicates to the Standby instance in a different Availability Zone
Read Replica
- -An asynchronous replica of the RDS Master, so there is a small replication delay
- -Can be located in the same Availability Zone (read replicas can also be created in another AZ or Region)
- -Used to scale horizontally for reads/queries only (kind of like IDAA at SF)
- -Application servers can ONLY read from the read replica (writes go to the RDS Master only)
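A toy model of the Multi-AZ and read replica behavior on this card. The class and method names are hypothetical and this is not an AWS API; it only contrasts synchronous replication (standby) with asynchronous replication (read replica) and shows that writes go only to the master.

```python
from collections import deque


class ToyRdsDeployment:
    """Hypothetical model of RDS Multi-AZ plus one read replica (not an AWS API)."""

    def __init__(self):
        self.master = {}              # primary: handles reads and writes
        self.standby = {}             # Multi-AZ standby in another AZ (synchronous)
        self.read_replica = {}        # read replica (asynchronous)
        self._replica_lag = deque()   # changes not yet applied to the replica

    def write(self, key, value):
        # Writes only ever go to the master
        self.master[key] = value
        self.standby[key] = value                 # synchronous: standby updated as part of the write
        self._replica_lag.append((key, value))    # asynchronous: replica catches up later

    def catch_up(self):
        """Apply queued changes to the read replica (the replication delay)."""
        while self._replica_lag:
            key, value = self._replica_lag.popleft()
            self.read_replica[key] = value

    def read_from_replica(self, key):
        return self.read_replica.get(key)

    def failover(self):
        """Promote the standby after a master/AZ failure; it already holds every write."""
        self.master = dict(self.standby)


db = ToyRdsDeployment()
db.write("balance", 100)
print(db.standby["balance"])            # 100
print(db.read_from_replica("balance"))  # None (replica has not caught up yet)
db.catch_up()
print(db.read_from_replica("balance"))  # 100
```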
What is the name of the database that is part of the Amazon RDS family, is MySQL- and PostgreSQL-compatible, and features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128 TB per database instance?
Amazon Aurora
–RDS family
–MySQL- and PostgreSQL-compatible relational database
–Features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128 TB per database instance
–VERY fast (AWS cites up to 5x the throughput of standard MySQL and up to 3x that of standard PostgreSQL)
Amazon DynamoDB
- Non-relational, NoSQL database (fully managed)
- Key-value store and document store
- Fully serverless service
- –>No need to launch or pay for instances
- –>You only specify the performance characteristics you need, not servers
Horizontal scaling
- –>Seamless scaling with PUSH BUTTON scaling or AUTO SCALING, so you can increase or decrease performance without any interruption
- —–>Unlike Amazon RDS (Relational Database Service), where you can scale your instance up or down but incur downtime because the instance must be restarted
Highly available, and capacity can be reserved
–>On demand backup and restore
DynamoDB is made up of:
- -Tables
- –>Items exist in the tables
- —–>Attributes exist in the items
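A toy sketch of the table/item/attribute structure and of horizontal scaling by partition key. The ToyTable class is hypothetical (the real service is used through the AWS SDK), and the SHA-256 partitioning is only an illustration; it does not reproduce DynamoDB's internal hashing.

```python
import hashlib


class ToyTable:
    """Hypothetical DynamoDB-style table (not the AWS SDK): table -> items -> attributes,
    with items spread across partitions by hashing the partition key."""

    def __init__(self, partition_key, num_partitions=4):
        self.partition_key = partition_key
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition(self, key_value):
        # Illustrative hash only; adding partitions is how capacity scales out
        digest = hashlib.sha256(str(key_value).encode("utf-8")).digest()
        return self.partitions[int.from_bytes(digest[:8], "big") % len(self.partitions)]

    def put_item(self, item):
        # An item is a dict of attributes; only the partition key is required
        key_value = item[self.partition_key]
        self._partition(key_value)[key_value] = item

    def get_item(self, key_value):
        return self._partition(key_value).get(key_value)


users = ToyTable("user_id")
users.put_item({"user_id": "u1", "name": "Ana"})
users.put_item({"user_id": "u2", "name": "Ben", "plan": "pro"})  # extra attribute is fine
print(users.get_item("u2"))  # {'user_id': 'u2', 'name': 'Ben', 'plan': 'pro'}
```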
Global Tables
- -fully managed multi-region, multi-master solution
- -So, data can exist across multiple regions and be fully synchronized
Fully managed in-memory cache for DynamoDB that increases performance up to 10x:
Dynamo Gateway
Dynamic Duo
DynamoDB Accelerator (DAX)
Amazon Auto Scaling
DynamoDB Accelerator (DAX)
Fully managed in-memory cache for DynamoDB that increases performance up to 10x
Amazon Redshift
A Structured Query Language (SQL) based data warehouse used for analytics and applications
Relational database used for Online Analytical Processing (OLAP) use cases
Uses EC2 instances, so you must choose an instance family/type
Keeps three copies of your data
Provides continuous/incremental backups
Amazon Elastic Map Reduce (EMR)
Managed cluster platform for big data frameworks such as Apache Hadoop and Apache Spark
Hadoop is a framework for big data
Used for processing data for analytics and business intelligence
–Can also be used for transforming and moving large amounts of data
Performs Extract, Transform, and Load (ETL) functions
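EMR runs frameworks such as Hadoop MapReduce and Spark across a cluster. The single-machine sketch below only illustrates the MapReduce programming model (map each chunk independently, then reduce the partial results); the input text is hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical input, split into chunks the way a cluster splits files across nodes
chunks = [
    "spark and hadoop process big data",
    "hadoop stores data in hdfs",
    "spark processes data in memory",
]


def map_chunk(text):
    """Map step: each worker emits word counts for its own chunk."""
    return Counter(text.split())


def merge_counts(left, right):
    """Reduce step: combine partial counts from different workers."""
    return left + right


word_counts = reduce(merge_counts, map(map_chunk, chunks), Counter())
print(word_counts["data"])  # 3
```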
Amazon ElastiCache
Fully managed implementations of Redis and Memcached
A key/value store
In-memory database used to cache data
High performance and low latency
Web session store (Redis)
–In cases w/ load-balanced web servers, store web session information in Redis so if a server is lost, the session info is not lost, and another web server can pick it up
Database caching (Memcached)
–Use Memcached in front of Amazon RDS or DynamoDB to cache popular queries, offloading work from the database and returning results to users faster
Leaderboards
–Use Redis to provide a live leaderboard for millions of users of your mobile app
Streaming data dashboards
–Provide a landing spot for streaming sensor data on the factory floor, providing live real-time dashboard displays
An ElastiCache node runs on an EC2 instance
–Data is loaded into ElastiCache and is often used as web session store
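A sketch of the database-caching use case with the common cache-aside (lazy loading) strategy: check the cache first and query the database only on a miss. A Python dict stands in for Memcached or Redis, the query function is hypothetical, and the injectable clock exists only so expiry can be tested.

```python
import time


class CacheAside:
    """Cache-aside (lazy loading) sketch. The dict stands in for Memcached/Redis;
    `load_from_db` is a hypothetical callable that queries RDS or DynamoDB."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]          # served from memory, database untouched
        self.misses += 1
        value = self._load(key)      # slow path: query the database
        self._entries[key] = (value, now + self._ttl)
        return value


db_calls = []


def popular_query(key):
    db_calls.append(key)             # record each trip to the "database"
    return f"results for {key}"


cache = CacheAside(popular_query, ttl_seconds=60.0)
cache.get("top-products")   # miss: loads from the database
cache.get("top-products")   # hit: returned from the cache
print(cache.hits, cache.misses, len(db_calls))  # 1 1 1
```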
Amazon Athena
Amazon Athena
Interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
–Serverless query service (no infrastructure to manage)
Can be connected to other data sources with Lambda
Uses a managed Data Catalog (AWS Glue) to store metadata and schemas for the databases and tables
–AWS Glue
—>Metadata Catalog that can be used with Amazon Athena
—>Fully managed Extract, Transform, and Load (ETL) service
—>You transform and move the data to various destinations with AWS Glue
—>It is used to prepare and load data for analytics
A managed Data Catalog that can store metadata and schemas for the databases and tables:
AWS Glue
Metadata Catalog that can be used with Amazon Athena
Fully managed Extract, Transform, and Load (ETL) service
You transform and move the data to various destinations with AWS Glue
It is used to prepare and load data for analytics
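A minimal sketch of the Extract, Transform, Load steps using only Python's standard library. Glue jobs themselves typically run on Apache Spark; here a CSV string stands in for a source object (for example in S3) and an in-memory sqlite3 database stands in for the destination. The data and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: hypothetical raw CSV (in practice this might be an object in S3)
raw = io.StringIO(
    "order_id,amount,currency\n"
    "1,10.50,usd\n"
    "2,,usd\n"
    "3,7.25,eur\n"
)
rows = list(csv.DictReader(raw))

# Transform: drop incomplete records and normalise types and values
clean = [
    (int(r["order_id"]), float(r["amount"]), r["currency"].upper())
    for r in rows
    if r["amount"]
]

# Load: write the prepared data to an analytics destination
dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, currency TEXT)")
dest.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)
dest.commit()
print(dest.execute("SELECT * FROM orders").fetchall())  # [(1, 10.5, 'USD'), (3, 7.25, 'EUR')]
```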
Amazon Kinesis Data Streams
Amazon Kinesis Data Streams
Service for processing streaming data
–Makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information
Producers send data, which is stored in shards (logical chunks that help maintain order) for 24 hours by default; retention can be extended up to 365 days
—>Consumers process the data and save to another service
Typically used for very large, continuous volumes of data
—>Ex. Telemetry such as equipment temperature or the movement of a car
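A sketch of how shards keep related records in order. Kinesis Data Streams maps each record's partition key through an MD5 hash to a 128-bit value and routes it to the shard that owns that hash range; this code assumes equal-sized ranges and a hypothetical 4-shard stream, and uses Python lists in place of the real service.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4          # hypothetical stream size
HASH_SPACE = 2 ** 128   # MD5 produces a 128-bit value


def shard_for(partition_key):
    """Map a partition key to a shard, assuming each shard owns an equal slice of the hash range."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * NUM_SHARDS // HASH_SPACE


shards = defaultdict(list)  # shard id -> ordered list of records


def put_record(partition_key, data):
    """Producer side: append to the key's shard; list position acts as a sequence number."""
    shard_id = shard_for(partition_key)
    shards[shard_id].append((partition_key, data))
    return shard_id, len(shards[shard_id]) - 1


# Hypothetical temperature readings from two trucks
for key, temp in [("truck-1", 71.2), ("truck-2", 68.0), ("truck-1", 71.9)]:
    put_record(key, temp)

# Consumer side: read one shard in order
truck1_shard = shard_for("truck-1")
print([data for key, data in shards[truck1_shard] if key == "truck-1"])  # [71.2, 71.9]
```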
Which Amazon Kinesis service has no shards, is completely automated and elastically scalable, and saves data directly to another service (such as S3, Splunk, Redshift, or Elasticsearch)?
Amazon Kinesis Data Firehose
Amazon Kinesis Data Analytics
Amazon Kinesis Data Firehose
No shards, completely automated and elastically scalable
Saves data directly to another service such as S3, Splunk, Redshift, or Elasticsearch
Which Amazon Kinesis service provides real-time SQL processing for streaming data?
Amazon Kinesis Data Firehose
Amazon Kinesis Data Analytics
Amazon Kinesis Data Analytics
Provides real-time SQL processing for streaming data
Processes and moves data between different AWS compute and storage services:
Amazon Kinesis Data Firehose
Amazon Neptune
Amazon Aurora
AWS Data Pipeline
AWS Data Pipeline
Processes and moves data between different AWS compute and storage services
Save results to services such as:
—>S3, RDS, DynamoDB, and EMR
A scalable, serverless, embeddable, machine learning-powered Business Intelligence (BI) service that provides fast, cloud-powered business analytics, including easy-to-build visualizations and rich dashboards:
Amazon Kinesis Data Firehose
Amazon QuickSight
Amazon Aurora
AWS Data Pipeline
Amazon QuickSight
A scalable, serverless, embeddable, machine learning-powered Business Intelligence (BI) service
Provides fast, cloud-powered business analytics
Easy-to-build, stunning visualizations and rich dashboards
Can be accessed from any browser or mobile device
Fully managed graph database:
Amazon Kinesis Data Firehose
Amazon Neptune
Amazon Aurora
AWS Data Pipeline
Amazon Neptune
Fully managed graph database
Ex. Social networks (e.g., a Facebook-style friend graph)
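A plain-Python sketch of the kind of traversal a graph database is built for: finding everyone within two "friend" hops. Neptune itself is queried with Gremlin, openCypher, or SPARQL; the graph and function here are hypothetical.

```python
from collections import deque

# Hypothetical social graph: vertices are people, edges mean "friends with"
friends = {
    "ana": {"ben", "cara"},
    "ben": {"ana", "dev"},
    "cara": {"ana", "dev", "eli"},
    "dev": {"ben", "cara"},
    "eli": {"cara"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: everyone reachable from start in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in graph[person]:
            if other not in seen:
                seen[other] = seen[person] + 1
                queue.append(other)
    return {p for p, hops in seen.items() if 0 < hops <= max_hops}


print(sorted(within_hops(friends, "ana", 2)))  # ['ben', 'cara', 'dev', 'eli']
```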
Fully managed document non-relational database service:
Amazon Kinesis Data Firehose
Amazon Neptune
Amazon DocumentDB
AWS Data Pipeline
Amazon DocumentDB
Fully managed document non-relational database service
Queries and indexes JSON data
Supports MongoDB workloads
Fully managed ledger database that provides cryptographically verifiable transaction logging:
Amazon Kinesis
Amazon Neptune
Amazon Quantum Ledger Database (QLDB)
AWS Data Pipeline
Amazon Quantum Ledger Database (QLDB)
Fully managed ledger database with an immutable change history
Provides cryptographically verifiable transaction logging
A record (ledger) of which transactions have taken place
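A simplified sketch of why a ledger is cryptographically verifiable: each entry's SHA-256 hash covers the previous entry's hash, so editing any past transaction breaks the chain. This is a generic hash chain with hypothetical helper functions, not QLDB's journal format or API.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, txn):
    # The hash covers the previous entry's hash, chaining every entry to its history
    payload = json.dumps({"prev": prev_hash, "txn": txn}, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(ledger, txn):
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"txn": txn, "hash": entry_hash(prev, txn)})


def verify(ledger):
    """Recompute every hash; any edited entry makes the recomputed chain disagree."""
    prev = GENESIS
    for entry in ledger:
        if entry["hash"] != entry_hash(prev, entry["txn"]):
            return False
        prev = entry["hash"]
    return True


ledger = []
append(ledger, {"from": "A", "to": "B", "amount": 50})
append(ledger, {"from": "B", "to": "C", "amount": 20})
print(verify(ledger))               # True
ledger[0]["txn"]["amount"] = 500    # tamper with history
print(verify(ledger))               # False
```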
Fully managed service for joining public and private networks using Hyperledger Fabric and Ethereum: