Databases and Analytics Flashcards
Which Type of Database has a rigid schema (SQL) and is scaled vertically?
Relational Database
Non-Relational Database
Relational Database vs Non-Relational Database
Key difference between Relational and Non-relational is how data is MANAGED and how data is STORED
Relational Database
- -Organized by tables, rows and columns
- -Rigid schema (SQL)
- -Rules enforced within database
- -Typically scaled vertically
- -Support complex queries and joins
- -Amazon RDS, Oracle, MySQL, IBM DB2, PostgreSQL
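The relational traits listed on this card can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for any relational engine, and the table names, column names, and sample rows are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for any relational engine.
conn = sqlite3.connect(":memory:")

# The schema is fixed up front: every row must fit these columns and rules.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL,"
    " total REAL NOT NULL CHECK (total >= 0))"
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 25.0)")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 17.5)")

# Rules are enforced inside the database: a negative total violates the CHECK.
try:
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, -5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A join combines rows from both tables in a single query.
rows = conn.execute(
    "SELECT c.name, COUNT(o.id), SUM(o.total)"
    " FROM customers c JOIN orders o ON o.customer_id = c.id"
    " GROUP BY c.name"
).fetchall()
print(rejected, rows)
```

The invalid row is refused by the database itself, without any checking code in the application.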
Which Type of Database allows varied data storage models, has a flexible schema w/ data stored in key-value pairs, columns, documents or graphs, and scales horizontally?
Relational Database
Non-Relational Database
Non-Relational Database
Key difference between Relational and Non-relational is how data is MANAGED and how data is STORED
- -Varied data storage models
- -Flexible schema (NoSQL) - data stored in key-value pairs, columns, documents or graphs
- -Rules can be defined in application code (outside database)
- -Scales horizontally
- -Unstructured, simple language that supports any kind of schema
- -Amazon DynamoDB, MongoDB (documents), Redis, Neo4j
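A minimal sketch of the flexible-schema idea, assuming a plain Python dict as a stand-in for a key-value/document store. The function names put and validate and the sample documents are hypothetical, chosen only for this example.

```python
# A plain dict keyed by item ID stands in for a key-value/document store.
store = {}

def validate(doc):
    """Rule defined in application code, outside the database."""
    if "name" not in doc:
        raise ValueError("every document needs a 'name'")

def put(key, doc):
    validate(doc)
    store[key] = doc

# Documents in the same store can have different shapes.
put("user#1", {"name": "Ada", "email": "ada@example.com"})
put("user#2", {"name": "Lin", "tags": ["admin"], "address": {"city": "Oslo"}})

# The store itself would accept this document; only the application rule rejects it.
try:
    put("user#3", {"email": "no-name@example.com"})
    rejected = False
except ValueError:
    rejected = True

print(len(store), rejected)
```

Here the store never checks the shape of a document. The only rule lives in validate, which is the "rules in application code" point from the card.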
Which Type of Database is used for Online Transaction Processing (OLTP) and is best for short transactions and simple queries?
Operational/transactional Database
Analytical Database
Operational/transactional Database
Key differences are USE CASES and how the database is OPTIMIZED
- -Online Transaction Processing (OLTP)
- -Production DBs that process transactions
- ——e.g., adding customer records, checking stock availability (INSERT, UPDATE, DELETE)
- -Short transactions and simple queries
Relational examples:
—->Amazon RDS, Oracle, IBM DB2, MySQL
Non-relational examples:
—->MongoDB, Cassandra, Neo4j, HBase, Amazon DynamoDB
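An OLTP-style operation can be sketched as one short transaction that checks stock and updates a single row. This assumes SQLite as a stand-in for a production database; the stock table, the buy function, and the sample data are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
conn.execute("INSERT INTO stock VALUES ('widget', 3)")
conn.commit()

def buy(sku, amount):
    """One short transaction: check availability, then update a single row."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute("SELECT qty FROM stock WHERE sku = ?", (sku,)).fetchone()
        if row is None or row[0] < amount:
            return False
        conn.execute("UPDATE stock SET qty = qty - ? WHERE sku = ?", (amount, sku))
        return True

first = buy("widget", 2)    # enough stock, so the purchase goes through
second = buy("widget", 2)   # only 1 left, so the purchase is refused
remaining = conn.execute("SELECT qty FROM stock WHERE sku = 'widget'").fetchone()[0]
print(first, second, remaining)
```

Each call touches one row and finishes quickly, which is the kind of workload the card describes.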
Which Type of Database is used for Online Analytical Processing (OLAP) for long transactions and complex queries?
Operational/transactional Database
Analytical Database
Analytical Database
Key differences are USE CASES and how the database is OPTIMIZED
- -Online Analytical Processing (OLAP) - the source data comes from OLTP DBs
- -Data warehouse
- ——Typically separated from the customer facing DBs.
- ——Data is extracted for decision making
- -Long transactions and complex queries
Relational examples:
—>Amazon Redshift, Teradata, HP Vertica
Non-relational examples:
—>Amazon EMR, MapReduce
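By contrast, an analytical query reads many rows and aggregates them. The sketch below assumes a handful of made-up sales rows that were already extracted from an OLTP system, loaded into a separate SQLite database that plays the role of the data warehouse.

```python
import sqlite3

# Rows as they might be extracted from an OLTP system (made-up sample data).
extracted_sales = [
    ("2024-01", "north", 120.0),
    ("2024-01", "south", 80.0),
    ("2024-02", "north", 150.0),
    ("2024-02", "south", 50.0),
]

# A separate database plays the role of the data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (month TEXT, region TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", extracted_sales)

# An analytical query scans all rows and aggregates them for decision making.
report = warehouse.execute(
    "SELECT region, SUM(amount), AVG(amount) FROM sales"
    " GROUP BY region ORDER BY region"
).fetchall()
print(report)
```

Real warehouses run the same kind of query over far larger volumes, which is why they are kept apart from the customer-facing databases.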
Which AWS Database is the best option if you need full control over instances and the database?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Database on EC2
Use Case:
–Need full control over instance and database (you manage it)
–3rd party database engine (not available in RDS)
Which AWS Database is the best option if you need a traditional Relational Database w/ well-formed and structured data?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Amazon RDS
Use Case:
–Need traditional relational database
–Data is well-formed and structured
–Ex. Oracle, PostgreSQL, Microsoft SQL Server, MariaDB, MySQL
Which AWS Database is the best option if you need a NoSQL database w/ in-memory performance and dynamic scaling?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Amazon DynamoDB
Use Case:
–NoSQL database
–In-memory performance
–High I/O needs
–Dynamic scaling
Which AWS Database is the best option if you have a data warehouse w/ large volumes of aggregated data?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Amazon Redshift
Use Case:
–Data warehouse for large volumes of aggregated data
Which AWS Database is the best option for fast-temporary storage for small amounts of data?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Amazon ElastiCache
Use Case:
–Fast temporary storage for small amounts of data
–In-memory database
–High performance
Which AWS Database is the best option for analytic workloads using the Hadoop framework?
Amazon RDS
Amazon DynamoDB
Amazon Redshift
Amazon ElastiCache
Amazon Elastic MapReduce (EMR)
Amazon EC2
Amazon Elastic MapReduce (EMR)
Use Case:
–Analytics workloads using the Hadoop framework
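As a study aid, the six use-case cards above can be condensed into a lookup table. The short labels used as keys and the pick_database helper are invented here for the sketch; the pairings themselves come straight from the cards.

```python
# Use cases taken from the cards above, keyed by a short label chosen here.
AWS_DATABASE_FOR = {
    "full control": "Database on Amazon EC2",
    "relational": "Amazon RDS",
    "nosql": "Amazon DynamoDB",
    "data warehouse": "Amazon Redshift",
    "cache": "Amazon ElastiCache",
    "hadoop": "Amazon EMR",
}

def pick_database(need):
    """Return the service the cards associate with a need, or None if unknown."""
    return AWS_DATABASE_FOR.get(need.strip().lower())

print(pick_database("Data Warehouse"))
print(pick_database("hadoop"))
```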
Amazon Relational Database Service (RDS)
- -Managed relational database - Structured Query Language (SQL) Databases
- -Easy to set up, highly available, fault tolerant, and scalable
- -Runs on EC2 instances so you must choose an instance family/type
- -An Online Transaction Processing (OLTP) type of database
Common use cases:
–Online stores and banking systems
Can encrypt your Amazon RDS instances and snapshots at rest
—>Encryption uses AWS Key Management Service (KMS)
RDS supports the following database engines:
- -SQL Server, Oracle, MySQL, MariaDB, PostgreSQL, and Aurora
- -Scales up by increasing INSTANCE size (compute and storage) or changing the INSTANCE type
Disaster recovery: the Multi-AZ option provides a passive standby instance
Example of RDS database:
–Amazon Aurora
Describe how Amazon Relational Database Service (RDS) provides disaster recovery using the Multi-AZ option
Disaster recovery: the Multi-AZ option provides a passive standby instance
RDS Master
- -Runs in one Availability Zone
- -Primary database (reads and writes)
RDS Master –> POINTS TO –> RDS Standby Instance
RDS Standby instance
–Master synchronously replicates to the Standby instance in a different Availability Zone
Read Replica
- -An asynchronous replica of the RDS Master, so reads can lag slightly behind writes
- -Can be located in the same Availability Zone as the Master, in a different Availability Zone, or in another Region
- -Used to scale horizontally for reads/queries only (kind of like IDAA at SF)
- -Application servers can ONLY read from the read replica (can only write to the RDS Master)
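The difference between the synchronous standby and the asynchronous read replica can be sketched with a toy in-memory model. This is not how RDS is implemented; the Cluster class and its method names are hypothetical and only mimic the behaviour described on this card.

```python
class Database:
    def __init__(self):
        self.rows = {}

class Cluster:
    """Toy model: one master, one synchronous standby, one asynchronous read replica."""

    def __init__(self):
        self.master = Database()
        self.standby = Database()
        self.replica = Database()
        self.pending = []  # changes not yet applied to the read replica

    def write(self, key, value):
        self.master.rows[key] = value
        self.standby.rows[key] = value     # synchronous: applied before write returns
        self.pending.append((key, value))  # asynchronous: applied some time later

    def read_from_replica(self, key):
        return self.replica.rows.get(key)

    def catch_up(self):
        """Apply the queued changes to the read replica."""
        for key, value in self.pending:
            self.replica.rows[key] = value
        self.pending.clear()

    def fail_over(self):
        """Promote the standby; it already holds every write the master accepted."""
        self.master, self.standby = self.standby, Database()

cluster = Cluster()
cluster.write("order-1", "paid")
stale = cluster.read_from_replica("order-1")   # replica has not caught up yet
cluster.catch_up()
fresh = cluster.read_from_replica("order-1")
cluster.fail_over()
after_failover = cluster.master.rows.get("order-1")
print(stale, fresh, after_failover)
```

The first replica read misses the new row because of replication lag, while the promoted standby has it immediately after failover.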
What is the name of the database that is part of the Amazon RDS family, is MySQL and PostgreSQL compatible, and features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128TB per database instance?
Amazon Aurora
–RDS family
–MySQL- and PostgreSQL-compatible relational database
–Features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128TB per database instance
–VERY fast
Amazon DynamoDB
- Non-relational, NoSQL-type database (fully managed)
- Key-value store and document store
- Fully serverless service
- –>No need to launch or pay for instances
- –>You only specify the performance characteristics you need (read and write capacity)
Horizontal scaling
- –>Seamless scalability to any scale with PUSH BUTTON scaling or AUTO SCALING which means you can increase or decrease the performance w/out any interruption
- —–>In contrast, with Amazon RDS (Relational Database Service) you can scale your instance up or down, but you will have downtime b/c the instance has to restart
Highly available; capacity can be reserved
–>On demand backup and restore
DynamoDB is made up of:
- -Tables
- –>Items exist in the tables
- —–>Attributes exist in the items
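The table / item / attribute hierarchy can be pictured with plain Python containers. This is a sketch of the data model only, not of the DynamoDB API; the table name, the attribute names, the sample items, and the get_item helper are all made up for illustration.

```python
# Table -> items -> attributes, modeled with plain Python containers.
# "Music", "Artist" and "SongTitle" are made-up names for illustration.
music_table = {
    "name": "Music",
    "key": ("Artist", "SongTitle"),
    "items": [
        {"Artist": "No One You Know", "SongTitle": "Call Me Today", "Year": 2015},
        {"Artist": "Acme Band", "SongTitle": "Happy Day", "Genre": "Rock", "Awards": 10},
    ],
}

def get_item(table, **key):
    """Return the first item whose key attributes match, or None."""
    for item in table["items"]:
        if all(item.get(k) == v for k, v in key.items()):
            return item
    return None

found = get_item(music_table, Artist="Acme Band", SongTitle="Happy Day")
attribute_names = sorted(found)
print(attribute_names)
```

Note that the two items share the key attributes but otherwise carry different attributes, which matches the flexible-schema cards earlier in the set.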
Global Tables
- -fully managed multi-region, multi-master solution
- -So, data can exist across multiple regions and be fully synchronized
Fully managed in-memory Cache for DynamoDB that increases performance up to 10x:
Dynamo Gateway
Dynamic Duo
DynamoDB Accelerator (DAX)
Amazon Auto Scaling
DynamoDB Accelerator (DAX)
Fully managed in-memory Cache for DynamoDB that increases performance up to 10x
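The general idea behind an in-memory cache in front of a table can be sketched as a read-through cache. This is a generic pattern, not the DAX client or its API; SlowTable and ReadThroughCache are hypothetical classes written only for this example.

```python
class SlowTable:
    """Stand-in for a table; counts how many reads actually reach it."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

class ReadThroughCache:
    """Serve repeated reads from memory; go to the table only on a miss."""

    def __init__(self, table):
        self.table = table
        self.memory = {}

    def get(self, key):
        if key not in self.memory:
            self.memory[key] = self.table.get(key)
        return self.memory[key]

table = SlowTable({"user#1": "Ada"})
cache = ReadThroughCache(table)
values = [cache.get("user#1") for _ in range(5)]
print(values, table.reads)
```

Five reads are answered, but only the first one reaches the table; the rest come from memory, which is where the speed-up comes from.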