Database, Analytics, ML Flashcards
What is RDS backed by?
EBS
What happens when you are running out of provisioned space on an RDS database?
If storage autoscaling is enabled, RDS will automatically scale the storage for you
If you want to limit how much your DB in RDS can hold, what can you do?
Set the Maximum Storage Threshold
How many read replicas can you have in RDS?
Up to 15 for MySQL, MariaDB, and PostgreSQL (the older limit was 5)
True or False: RDS Read Replicas in the same region do not pay the network fee
True. Replication to read replicas in the same region is free, even across AZs. Replication to read replicas in another region incurs a network fee
Why would we use RDS Multi AZ?
It is primarily used for disaster recovery. It performs synchronous replication to a standby instance in another AZ; the standby does not serve reads
True or False: We cannot set up our read replicas as Multi-AZ for disaster recovery
False. Read replicas can themselves be set up as Multi-AZ for DR
What is RDS Custom?
RDS Custom allows us to use Oracle and SQL Server with OS and database customization. We can access the underlying EC2 instance, which a standard managed RDS DB doesn’t allow us to do
What is an Aurora Writer Endpoint?
A pointer that points to the Master. If the Master fails, and a Replica DB is promoted, the Writer Endpoint automatically shifts to the new Master.
Therefore we do not have to change our app’s endpoint
What is an Aurora Reader Endpoint?
It is a pointer that points to a Load Balancer that sits in front of all the replica DBs to perform consistent and fault tolerant reads
We want to run intensive queries on certain Aurora DB instances that have stronger underlying infra. What can we do?
Create a Custom Endpoint. Custom Endpoints allow us to target specific Read Replicas for different types of operations or needs
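A toy model of how the writer, reader, and custom endpoints described above resolve to instances. This is only an illustrative sketch: the `AuroraCluster` class, its methods, and the instance names are hypothetical and are not an AWS API. Real endpoints are DNS names managed by Aurora, and round-robin is used here only for simplicity.

```python
import itertools


class AuroraCluster:
    """Hypothetical in-memory model of Aurora endpoint routing (not an AWS API)."""

    def __init__(self, writer, replicas):
        self.writer = writer
        self.replicas = list(replicas)
        self.custom_endpoints = {}
        self._rr = itertools.cycle(self.replicas)

    def writer_endpoint(self):
        # Always resolves to whichever instance is currently the writer.
        return self.writer

    def reader_endpoint(self):
        # Spreads connections across the replicas (round-robin in this sketch).
        return next(self._rr)

    def add_custom_endpoint(self, name, members):
        # A custom endpoint targets a chosen subset of instances.
        self.custom_endpoints[name] = itertools.cycle(list(members))

    def custom_endpoint(self, name):
        return next(self.custom_endpoints[name])

    def fail_over(self):
        # Promote the first replica; the writer endpoint follows automatically.
        self.writer = self.replicas.pop(0)
        self._rr = itertools.cycle(self.replicas)


cluster = AuroraCluster("db-1", ["db-2", "db-3", "db-4"])
cluster.add_custom_endpoint("analytics", ["db-4"])  # e.g. the largest instance
```

The application keeps calling `cluster.writer_endpoint()` before and after `cluster.fail_over()`; only the instance it resolves to changes.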
What is Aurora Serverless?
It allows us to hand off the instantiation and scaling to AWS. It uses a fleet of DBs provisioned by AWS to hold our data. We do not have to pay for upfront capacity
What is Aurora Multi-Master?
Every DB Instance is a Read/Write node. If one fails, you can still write to other Master instances. You may have to configure a conflict avoidance strategy, like implementing health checks to see if a Writer instance is still available. Well suited to workloads that need high write capacity
What is Global Aurora?
Your DB spans multiple regions, with up to 16 DB Read Instances in each. In one region you have your read/write DB, and up to five read-only secondary regions.
If one entire region fails, it will quickly shift read/write capabilities to another region
Cross-region replication takes less than one second
What two machine learning services can Aurora integrate with?
SageMaker - Deploy machine learning models
Comprehend - Uses machine learning to learn insights and connections in text
True or False: RDS Automated backups don’t expire but manual DB Snapshots do
False. Automated backups last for 1 to 35 days, manual DB snapshots don’t expire unless deleted
If you stop an RDS DB for a long while, what is the recommended protocol?
Create a snapshot, delete the DB and then restore from snapshot when needing the DB. Stopped RDS instances still charge for storage
What is Aurora Database Cloning?
Aurora DB cloning allows us to create a new Aurora DB Cluster from an existing one. It is faster than a snapshot & restore
What is an RDS Proxy?
RDS Proxy sits in front of the RDS instance and pools together all the connections.
This can be beneficial as it puts less strain on the RDS instance resources (CPU, RAM) and minimizes connection timeouts
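The pooling idea can be sketched in a few lines. This is a simplified stand-in, not how RDS Proxy is implemented: the `ConnectionPool` class and the `db-conn-N` strings are hypothetical placeholders for real database connections.

```python
import queue


class ConnectionPool:
    """Hypothetical stand-in for the connection pooling that RDS Proxy provides."""

    def __init__(self, max_db_connections):
        self.opened = 0  # how many connections the database had to open
        self._idle = queue.Queue()
        self._max = max_db_connections

    def acquire(self):
        # Reuse an idle connection when one exists.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        if self.opened < self._max:
            self.opened += 1
            return f"db-conn-{self.opened}"
        # Pool exhausted: block until another client releases a connection.
        return self._idle.get()

    def release(self, conn):
        self._idle.put(conn)


pool = ConnectionPool(max_db_connections=5)
for client_id in range(100):  # 100 application clients, one short request each
    conn = pool.acquire()
    pool.release(conn)

print(pool.opened)  # connections the database actually had to open
```

Because each client returns its connection after use, the database sees far fewer connections than there are clients.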
True or False: The RDS proxy is available to the public
False. It is only accessible from within the VPC
What two DB engines can we choose from with Aurora?
PostgreSQL and MySQL
What AWS resource can we sit in front of RDS to take the load off of our DB resources?
We can use ElastiCache, which will handle caching (note: our applications will still have to implement our Caching strategy)
We want to have Multi-AZ with Auto-Failover and read replicas for our caching, would we use Redis or Memcached?
Redis
https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/WhatIs.html#WhatIs.Overview
True or False: ElastiCache supports IAM authentication for Redis and Memcached
False, it only supports it for Redis
Which service supports SASL-based authentication?
Memcached
What is Redis AUTH?
It allows you to set a password/token when creating the cluster; clients must then supply that password or token before running commands against Redis. Redis also supports SSL in-flight encryption
What are the three cache patterns for ElastiCache?
Lazy Loading: Application checks for data in cache, if the data is not in cache it retrieves it from the database and then writes it to cache
Write-Through: Every write to the database is also written to the cache, so cached data is never stale
Session Store: store temporary data in cache using TTL feature
These three are not mutually exclusive
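The three patterns above can be sketched with plain dictionaries standing in for the database and the cache. This is a minimal illustration under stated assumptions: `database`, `cache`, and every function name here are hypothetical, and a real application would call an ElastiCache client instead.

```python
import time

database = {"user:1": "Alice"}  # stand-in for RDS
cache = {}                      # stand-in for ElastiCache: key -> (value, expires_at)


def cache_get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if expires_at is not None and time.monotonic() >= expires_at:
        del cache[key]  # TTL elapsed, treat as a miss
        return None
    return value


def cache_set(key, value, ttl_seconds=None):
    expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
    cache[key] = (value, expires_at)


def read_lazy(key):
    """Lazy loading: check the cache first, fall back to the database on a miss."""
    value = cache_get(key)
    if value is None:
        value = database.get(key)
        if value is not None:
            cache_set(key, value)
    return value


def write_through(key, value):
    """Write-through: every database write also updates the cache."""
    database[key] = value
    cache_set(key, value)


def save_session(session_id, data, ttl_seconds=1800):
    """Session store: temporary data kept only in the cache, expiring via TTL."""
    cache_set(f"session:{session_id}", data, ttl_seconds)
```

Since the patterns are not mutually exclusive, the same application can use `read_lazy` for reads, `write_through` for writes, and `save_session` for login state.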
You’re planning for a new solution that requires a MySQL database that must be available even in case of a disaster in one of the Availability Zones. What should you use?
Enable Multi-AZ
You have set up read replicas on your RDS database, but users are complaining that upon updating their social media posts, they do not see their updated posts right away. What is a possible cause for this?
Read Replicas have async replication, therefore it’s likely your users will miss the most up-to-date info because of eventual consistency
Which RDS (NOT Aurora) feature when used does not require you to change the SQL connection string: Multi-AZ or Read Replicas?
Multi-AZ, keeps the same connection string regardless of which database is up
An analytics application is currently performing its queries against your main production RDS database. These queries run at any time of the day and slow down the RDS database which impacts your users’ experience. What should you do to improve the users’ experience?
Setup read replicas. This will take the load off the main production DB
You would like to ensure you have a replica of your database available in another AWS Region if a disaster happens to your main AWS Region. Which database do you recommend to implement this easily?
Aurora Global Database. Multi-AZ won’t work because it only protects against the loss of an AZ, not of an entire region
How can you enhance the security of your ElastiCache Redis Cluster by forcing users to enter a password when they connect?
Use Redis AUTH
You have migrated the MySQL database from on-premises to RDS. You have a lot of applications and developers interacting with your database. Each developer has an IAM user in the company’s AWS account. What is a suitable approach to give access to developers to the MySQL RDS DB instance instead of creating a DB user for each one?
IAM Database Authentication. This allows IAM users to use an authentication token to access the DB
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.html
Read replicas use ______ replication and Multi-AZ uses ______ replication
Async, Sync
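A toy model of the difference in timing. It is not how RDS is implemented: the `Primary` class and its attributes are hypothetical, and replication lag is simulated by an explicit method call.

```python
class Primary:
    """Hypothetical toy model of sync vs async replication timing."""

    def __init__(self):
        self.data = {}
        self.standby = {}       # Multi-AZ standby: updated synchronously
        self.read_replica = {}  # read replica: updated asynchronously
        self._pending = []      # changes not yet applied to the read replica

    def write(self, key, value):
        self.data[key] = value
        self.standby[key] = value           # sync: applied before the write returns
        self._pending.append((key, value))  # async: applied some time later

    def apply_replication_lag(self):
        # Simulates the read replica catching up with the primary.
        for key, value in self._pending:
            self.read_replica[key] = value
        self._pending.clear()


db = Primary()
db.write("post:42", "edited text")
stale = db.read_replica.get("post:42")  # replica has not caught up yet
db.apply_replication_lag()
fresh = db.read_replica.get("post:42")
```

The window between `write` and `apply_replication_lag` is the eventual-consistency gap in which a user reading from a replica misses their own update.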
How do you encrypt an unencrypted RDS DB instance?
Create a snapshot of the unencrypted RDS DB instance, copy the snapshot with encryption enabled, then restore the RDS DB instance from the encrypted snapshot
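The three steps above could be scripted roughly as follows. This sketch assumes `rds` behaves like a boto3 RDS client (`boto3.client("rds")`); the snapshot naming scheme and the function name `encrypt_rds_instance` are hypothetical, and error handling is omitted.

```python
def encrypt_rds_instance(rds, source_id, encrypted_id, kms_key_id):
    """Snapshot -> encrypted copy -> restore, using boto3 RDS client methods.

    `rds` is expected to behave like boto3.client("rds"); all identifiers
    are placeholders supplied by the caller.
    """
    plain_snap = f"{source_id}-plain-snap"          # hypothetical naming scheme
    encrypted_snap = f"{source_id}-encrypted-snap"  # hypothetical naming scheme

    # 1. Snapshot the unencrypted instance.
    rds.create_db_snapshot(
        DBInstanceIdentifier=source_id,
        DBSnapshotIdentifier=plain_snap,
    )
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=plain_snap)

    # 2. Copy the snapshot; supplying a KMS key makes the copy encrypted.
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=plain_snap,
        TargetDBSnapshotIdentifier=encrypted_snap,
        KmsKeyId=kms_key_id,
    )
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=encrypted_snap)

    # 3. Restore a new, encrypted instance from the encrypted snapshot.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=encrypted_id,
        DBSnapshotIdentifier=encrypted_snap,
    )
    return encrypted_snap
```

The result is a new DB instance with a new endpoint, so applications would still need to be pointed at it before the old instance is retired.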
What three DB engines are supported with IAM DB Authentication?
MariaDB, MySQL and PostgreSQL
You have an un-encrypted RDS DB instance and you want to create Read Replicas. Can you configure the RDS Read Replicas to be encrypted?
No, you cannot create encrypted read replicas if the RDS DB instance is unencrypted
An application running in production is using an Aurora Cluster as its database. Your development team would like to run a version of the application in a scaled-down environment with the ability to perform some heavy workload on a need-basis. Most of the time, the application will be unused. Your CIO has tasked you with helping the team to achieve this while minimizing costs. What do you suggest?
Aurora Serverless. AWS will take care of the scaling for spiked workloads. Capacity is adjusted based on application demands
Great for variable workloads especially if they need intensive use
How many Aurora Read Replicas can you have in a single Aurora DB Cluster?
15
You work as a Solutions Architect for a gaming company. One of the games mandates that players are ranked in real-time based on their score. Your boss asked you to design then implement an effective and highly available solution to create a gaming leaderboard. What should you use?
Use ElastiCache for Redis - Sorted Sets
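A leaderboard on sorted sets might look like the sketch below. It assumes `r` is a redis-py client (for example `redis.Redis(..., decode_responses=True)`) connected to the ElastiCache endpoint; the key name and the helper function names are hypothetical.

```python
LEADERBOARD_KEY = "game:leaderboard"  # hypothetical key name


def record_score(r, player, score):
    # ZADD inserts the member or updates its score; Redis keeps the set ordered.
    r.zadd(LEADERBOARD_KEY, {player: score})


def top_players(r, count=10):
    # ZREVRANGE returns members from the highest score to the lowest.
    return r.zrevrange(LEADERBOARD_KEY, 0, count - 1, withscores=True)


def player_rank(r, player):
    # ZREVRANK is 0-based, so add 1 for a human-friendly rank.
    rank = r.zrevrank(LEADERBOARD_KEY, player)
    return None if rank is None else rank + 1
```

Because the ordering is maintained by Redis on every `ZADD`, the application never has to sort scores itself, which is what makes the ranking effectively real-time.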
You need to store long-term backups for your Aurora database for disaster recovery and audit purposes. What do you recommend?
Perform On Demand Backups (manual snapshots). Automated backups are retained for at most 35 days, whereas manual snapshots persist for as long as needed
You have 100 EC2 instances connected to your RDS database and you see that upon a maintenance of the database, all your applications take a lot of time to reconnect to RDS, due to poor application logic. How do you improve this?
Use an RDS proxy
What is Amazon Athena?
A serverless query service that uses SQL to analyze data stored in S3
What is Athena Federated Query?
It allows you to run queries on other AWS services and on-premises databases; the results can be stored back into S3
What is Amazon Redshift?
Redshift is a fully managed data warehouse, used for storing and analyzing large amounts of data from several locations
What is a data warehouse?
A data warehouse is a DB that is designed to analyze large amounts of data
True or False: Redshift is used for OLTP
False. It is used for OLAP. OLTP is for transactions, like for an application.
OLAP is for analytics and data warehousing
What is a Redshift cluster?
A Redshift cluster consists of a leader node and compute nodes. The leader node accepts the query and develops an execution plan. The compute node(s) then run the query and send their results back to the leader node, which aggregates them
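The leader/compute split can be pictured as a scatter-gather. The sketch below is a loose analogy only, not Redshift's actual execution engine: the function names are hypothetical, threads stand in for compute nodes, and the "query" is a simple sum over an `amount` column.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node(rows):
    """Each compute node aggregates only its own slice of the data."""
    return sum(row["amount"] for row in rows)


def leader_node(table, node_count):
    """Plan the work, fan it out to compute nodes, then combine partial results."""
    slices = [table[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node, slices))
    return sum(partials)


sales = [{"amount": n} for n in range(1, 101)]
total = leader_node(sales, node_count=4)
print(total)
```

Adding compute nodes shrinks each slice, which is the intuition behind scaling a cluster out for larger analytic queries.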
How can we move a Redshift cluster to another region?
We can manually or automatically create snapshots, copy those to a new region and create a new cluster from that snapshot
What are three ways we can get data into Redshift?
Kinesis
S3 COPY command (through internet or through VPC)
EC2 Instance through JDBC driver (need to write in batches)
What is Redshift Spectrum?
Query data that is already in S3 without loading it into Redshift; we must already have a Redshift Cluster available to start the query. The query is then submitted to thousands of Redshift Spectrum nodes
What is Amazon OpenSearch?
OpenSearch allows you to search massive amounts of data and retrieve relevant items
Why is OpenSearch not considered serverless?
It requires the creation of a cluster of instances
What is Amazon EMR?
Stands for Elastic MapReduce. Helps create Hadoop clusters to analyze and process vast amounts of data (Big Data)
What is Amazon QuickSight?
Serverless machine learning-powered business intelligence service to create interactive dashboards
What is AWS Glue?
It is a managed extract, transform and load (ETL) service
What service would we use to convert data into Parquet format?
AWS Glue
What is AWS Lake Formation?
A fully managed service that makes it easy to set up a data lake. Data lake = central place to have all your data for analytics purposes
What is Kinesis Data Analytics?
Kinesis Data Analytics reads from Kinesis Data Streams or Firehose, applies SQL or Apache Flink to analyze the data, and sends the results to sinks
What is Amazon Rekognition?
Finds objects, people, text, scenes in images and videos using ML
What is Amazon Transcribe?
Automatically converts speech into text
What is Amazon Polly?
Turns text into speech using ML
What is Amazon Translate?
Translate localizes content using ML
What is Amazon Lex?
It is the technology that powers Alexa. Uses automatic speech recognition to convert speech to text and natural language understanding to recognize intent
Helps build chat bots or call center bots
What is Amazon Connect?
It is a virtual call center
What is Amazon Comprehend?
Uses Natural Language Processing to find insights and relationships in text
What is the difference between Lex and Comprehend?
Comprehend is for analytics and insights, Lex is for conversations and interactions
What is Amazon SageMaker?
It is a service that allows developers and data scientists to build ML models
What is Amazon Forecast?
Helps you to do predictive modeling and forecasting
How many storage nodes are there for Aurora?
6 nodes across 3 AZs. Do not confuse this with read replicas, of which you can have up to 15
How much can Aurora scale to?
128 terabytes
Give an overview of how Aurora is structured
Aurora helps manage MySQL and PostgreSQL DB engines. It provides a DB “cluster” that includes a Master DB instance, Read Replicas (instances) and a storage volume. The storage volume is six storage nodes across 3 AZs
The Master DB will control reading AND writing to the storage volume which spans multiple AZs. The Read Replicas will read from the storage volume
What is the Aurora-like NoSQL service?
DocumentDB
What is the fully managed AWS graph DB?
Amazon Neptune
We want managed serverless, scalable Apache Cassandra DB service. What should we use?
Amazon Keyspaces
What is Amazon Keyspaces generally used for?
IoT device info and time-series data
What is Amazon QLDB?
Stands for Quantum Ledger Database. Used to view all the changes made to your application data over time. Data is immutable
What is Amazon Timestream?
A serverless time series database. Faster and less costly than a relational database
You are looking to perform Online Transaction Processing (OLTP). You would like to use a database that has built-in auto-scaling capabilities and provides you with the maximum number of replicas for its underlying storage. What AWS service do you recommend?
Aurora
As a Solutions Architect, a startup company asked you for help as they are working on an architecture for a social media website where users can be friends with each other, and like each other’s posts. The company plan on performing some complicated queries such as “What are the number of likes on the posts that have been posted by the friends of Mike?”. Which database do you recommend?
Neptune. A graph database service that makes it easy to build and run applications with highly connected datasets
A startup is working on developing a new project to reduce forest fires due to climate change. The startup is developing sensors that will be spread across the entire forest to take readings such as temperature, humidity, and pressure, which will help detect forest fires before they happen. They are going to have thousands of sensors that will store a lot of readings each second. There is a requirement to store those readings and do fast analytics so they can predict if there is a fire. Which AWS service can they use to store those readings?
Timestream