Udemy lecture 7: Databases & analytics Flashcards
What is a relational database?
A relational database stores data in multiple tables that are linked to each other (e.g., a first table holds Student ID, Dept ID, Name, & Email, & a second table, keyed by Dept ID, holds further information about each department). Think of it like linked Excel sheets.
Relational databases use the __________ language to perform queries or lookups
SQL (Whenever you hear SQL think of relational databases)
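To make the student/department example concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions made up for illustration; the point is only that the shared Dept ID column lets a SQL query link the two tables.

```python
import sqlite3

# In-memory database; all names and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES departments(dept_id),
        name TEXT,
        email TEXT
    );
    INSERT INTO departments VALUES (10, 'Physics'), (20, 'History');
    INSERT INTO students VALUES
        (1, 10, 'Ana', 'ana@example.com'),
        (2, 20, 'Ben', 'ben@example.com');
""")


def students_with_departments(connection):
    """Join the two tables on the shared dept_id column."""
    query = """
        SELECT s.name, d.dept_name
        FROM students AS s
        JOIN departments AS d ON d.dept_id = s.dept_id
        ORDER BY s.student_id
    """
    return connection.execute(query).fetchall()


rows = students_with_departments(conn)
print(rows)
```

The same kind of `JOIN` query would run against a managed engine such as MySQL or PostgreSQL; SQLite is used here only because it needs no server.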
______________ databases are nonrelational databases
NoSQL
____________ databases are purpose built for specific data models & have flexible schemas for building modern applications
NoSQL
What are some benefits of NoSQL databases?
- Flexible: easy to evolve the data model
- Scalable: designed to scale out by using distributed clusters
- High-performance: optimized for a specific data model
- Highly functional: data types optimized for the data model
What does JSON stand for?
JavaScript Object Notation
NoSQL can have its data in _________ format
JSON
Data can be _______ in the JSON format
Nested (data is stored in a structured way, but the fields can change over time, & new types such as arrays are supported)
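As a small illustration of nesting, the sketch below parses a made-up JSON document with Python's standard `json` module. The record and its field names are hypothetical; it shows an object nested inside an object, an array field, and a new field being added later without any schema change.

```python
import json

# Hypothetical user record: an object nested in an object, plus an array.
document = """
{
  "user_id": 1,
  "name": "Ana",
  "address": {"city": "Lisbon", "country": "PT"},
  "hobbies": ["chess", "cycling"]
}
"""

record = json.loads(document)

# Reach into the nested structure.
city = record["address"]["city"]

# Fields can change over time: add a new array field to this record only.
record["languages"] = ["pt", "en"]
field_names = sorted(record)

print(city, field_names)
```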
What is AWS's responsibility for managed databases?
- Responsible for patching the entire database
- Automated backup & restore, operations, upgrades
- Monitoring, alerting
- AWS offers managed services for many different databases
______ is a relational database
RDS
What does RDS stand for?
Relational Database Service
What is the Relational Database Service?
A managed database service for databases that use SQL as a query language; it lets you create databases in the cloud that are managed by AWS
_________ is a proprietary database from AWS
Aurora
What are the advantages of using RDS rather than deploying a database on EC2?
- Automated provisioning, OS patching
- Continuous backups & restore to a specific timestamp (point-in-time restore)
- Monitoring dashboards
- Read replicas for improved read performance
- Multi-AZ setup for DR (disaster recovery)
- Maintenance windows for upgrades
- Scaling capability (vertical & horizontal)
With RDS databases, you can’t use ________ to connect to the underlying instances
SSH
What are the two database technologies that Aurora supports?
- PostgreSQL
- MySQL
Aurora is _________-optimized to yield better performance
Cloud
Aurora storage grows automatically from __________________
10 GB up to 128 TB
__________ & ___________ are the two ways to create relational databases on AWS
RDS & Aurora (both are managed; Aurora is more cloud-native, whereas RDS runs the database technologies you already know as a managed service)
The __________ option for Amazon Aurora automates database instantiation
Serverless (also has auto scaling based on your usage)
Both ________ & ___________ are supported as engines for Aurora Serverless
PostgreSQL & MySQL
Aurora serverless is great for _____________ workloads
Infrequent/unpredictable workloads
If you see Aurora with no management overhead, think of ______________
Aurora Serverless
___________ can scale the read workload of your database
RDS read replicas (can create up to 15 replicas & data is only written to the main database)
A __________ is useful in case of an AZ outage or problems with the main database (high availability)
Failover database (in other words, a Multi-AZ deployment)
In a ___________ setup, data is only read from/written to the main database, & you can have only one other AZ as the ________
Multi-AZ; failover
You can create read replicas in multiple regions & use them for ___________ in case of a regional issue; they also improve local performance (lower latency) but incur a replication cost
Disaster recovery
____________ is used to get managed Redis or Memcached databases
ElastiCache
_________ are in-memory databases used as caches, with high performance & low latency
Redis or Memcached
Whenever you see an “in-memory” database, think of ___________
ElastiCache
________________ helps take load off databases for read-intensive workloads
ElastiCache
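One common way a cache takes read load off a database is the cache-aside pattern: check the cache first and go to the database only on a miss. The sketch below is an assumption-laden illustration in plain Python: a dictionary stands in for the in-memory cache, `fake_db` stands in for the database, and the `CacheAside` class is hypothetical; it is not the ElastiCache, Redis, or Memcached API.

```python
class CacheAside:
    """Hypothetical cache-aside wrapper; a dict stands in for the cache."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # slow path: the real database
        self._cache = {}                   # fast path: in-memory store
        self.db_reads = 0                  # counts how often the database is hit

    def get(self, key):
        # Cache hit: answer from memory without touching the database.
        if key in self._cache:
            return self._cache[key]
        # Cache miss: read from the database, then remember the value.
        self.db_reads += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value


# Stand-in for a database table (illustrative data only).
fake_db = {"user:1": "Ana", "user:2": "Ben"}

store = CacheAside(fake_db.__getitem__)
first = store.get("user:1")   # miss: goes to the database
second = store.get("user:1")  # hit: served from the cache
print(first, second, store.db_reads)
```

A real deployment would also need an expiry or invalidation strategy so that cached values do not go stale; that is left out of this sketch.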
____________ is fully managed and highly available with replication across 3 AZs
DynamoDB
DynamoDB is a __________ database
NoSQL (not a relational database)
___________ is a distributed “serverless” database with single-digit millisecond latency (low-latency retrieval) that scales to massive workloads
DynamoDB
DynamoDB is a __________ database
Key/value
_______________ is a fully managed in-memory cache for DynamoDB (up to a 10x performance improvement)
DynamoDB Accelerator (DAX)
What is the difference between ElastiCache & DynamoDB DAX?
DAX is only used for, & is integrated with, DynamoDB, while ElastiCache can be used for other databases
_______________ make DynamoDB tables accessible with low latency in multiple regions
DynamoDB global tables
DynamoDB global tables use ___________ replication (read/write in any AWS region)
Active-active
______________ is based on PostgreSQL, but it's not used for OLTP
Redshift
Redshift uses ___________
OLAP (online analytical processing), which is used for analytics & data warehousing
Redshift stores data in __________ storage
Columnar (instead of row-based)
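To see why columnar storage suits analytics, the sketch below converts a few made-up order records from a row layout to a column layout in plain Python. This is only a conceptual illustration with hypothetical data and a hypothetical `to_columns` helper; it says nothing about how Redshift stores data internally. The idea is that an aggregate over one field only has to scan that field's values.

```python
# Row-based layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "US", "amount": 35.5},
    {"order_id": 3, "region": "EU", "amount": 12.5},
]


def to_columns(records):
    """Regroup row records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An analytic query such as SUM(amount) reads a single column only.
total_amount = sum(columns["amount"])
print(columns["region"], total_amount)
```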
Redshift __________ is a Redshift feature that automatically provisions & scales the underlying data warehouse capacity
Serverless
With ________________ you run analytics workloads without managing data warehouse infrastructure
Redshift Serverless
What does EMR stand for?
Elastic MapReduce
________ helps create Hadoop clusters (big data) to analyze & process vast amounts of data
EMR
An EMR Hadoop cluster can be made up of hundreds of ______________
EC2 instances
What are the different use cases for EMR?
- Data processing
- Machine learning
- Web indexing
- Big data
__________ is a serverless query service to perform analytics against S3 objects
Amazon Athena
Amazon Athena uses the __________ language to query files
SQL
___________ analyzes data in S3 using serverless SQL
Amazon Athena
_____________ is a serverless, machine learning-powered business intelligence service for creating interactive dashboards
Amazon QuickSight
___________ is AWS's MongoDB-compatible managed database (MongoDB is a NoSQL database)
Amazon DocumentDB
________________ is a fully managed graph database
Amazon Neptune
A popular example of a graph dataset would be a ____________
Social network
What does QLDB stand for?
Quantum Ledger Database
A _________ is a book recording financial transactions
ledger
_____________ is used to record financial transactions in AWS
Amazon QLDB
Amazon QLDB is used to review the history of all changes made to your application data over time; it is an _________ system, which means no entry can be removed or modified, & it is cryptographically verifiable
Immutable
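One way to picture "immutable and cryptographically verifiable" is an append-only journal in which every entry's hash also covers the previous entry's hash, so changing any past entry breaks the chain. The sketch below uses Python's standard `hashlib` for this. It is a generic hash-chain illustration under that assumption, with a hypothetical `Journal` class; it is not QLDB's actual implementation or API.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


class Journal:
    """Hypothetical append-only journal with hash-chained entries."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(previous_hash, data):
        # Hash the previous entry's hash together with this entry's data.
        payload = json.dumps(data, sort_keys=True)
        return hashlib.sha256((previous_hash + payload).encode()).hexdigest()

    def append(self, data):
        previous_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        entry_hash = self._digest(previous_hash, data)
        self._entries.append({"data": data, "prev": previous_hash, "hash": entry_hash})

    def verify(self):
        # Recompute every hash in order; any edited entry breaks the chain.
        previous_hash = GENESIS_HASH
        for entry in self._entries:
            expected = self._digest(previous_hash, entry["data"])
            if entry["prev"] != previous_hash or entry["hash"] != expected:
                return False
            previous_hash = entry["hash"]
        return True


journal = Journal()
journal.append({"account": "A", "amount": 100})
journal.append({"account": "B", "amount": -40})
valid_before = journal.verify()

# Simulate tampering with a past entry (reaching into internals on purpose).
journal._entries[0]["data"]["amount"] = 999
valid_after = journal.verify()
print(valid_before, valid_after)
```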
What is the difference between Amazon Managed Blockchain & Amazon QLDB?
With Amazon QLDB there is no concept of decentralization: there is just a central database owned by Amazon. Amazon Managed Blockchain, by contrast, has a decentralized component
_______________ makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority
Amazon Managed Blockchain
Amazon Managed Blockchain is compatible with the frameworks ___________ & __________
Hyperledger Fabric & Ethereum
___________ is a managed extract, transform & load (ETL) service
AWS Glue
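For readers new to the term, the three ETL stages can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: the CSV text, the `sales` table, and the `extract`, `transform`, and `load` helpers are hypothetical, and none of this is the AWS Glue API. It only shows what "extract, transform & load" means as a pipeline.

```python
import csv
import io
import sqlite3

# Made-up raw input with inconsistent spacing and capitalization.
RAW_CSV = "name,amount\n ana ,10\nBEN,5\n"


def extract(text):
    """Extract: read raw records from the source (CSV text here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: clean names and convert amounts to integers."""
    return [(r["name"].strip().title(), int(r["amount"])) for r in records]


def load(connection, rows):
    """Load: write the cleaned rows into the target table."""
    connection.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
print(loaded)
```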
__________ migrates databases to AWS quickly & securely; it is resilient & self-healing
DMS (Database Migration Service)
___________ is a fully managed, petabyte-scale data warehouse service in the cloud.
Amazon Redshift
__________ is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
Amazon Athena
________ is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.
AWS Glue
_____________ helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.
AWS Database Migration Service
The ____________ is a central repository to store structural and operational metadata for all your data assets. For a given data set, you can store its table definition, physical location, add business relevant attributes, as well as track how this data has changed over time.
AWS Glue Data Catalog
____________ is a managed SQL service that makes it easy to set up, operate, and scale a relational database in the cloud. It is suited for OLTP workloads
Amazon Relational Database Service (Amazon RDS)