Section 12: Databases and Analytics Flashcards by Adam Holloway

Relational vs Non-Relational databases

Relational
* SQL
* organised into tables, rows and columns
* rigid schema
* rules enforced in database
* usually vertically scaled
* supports complex queries and joins
* Amazon RDS, Oracle, MySQL, PostgreSQL

Non-relational
* NoSQL
* varied data storage models
* flexible schema stored in key-value pairs, columns, documents or graphs
* rules can be defined in application code (outside of database)
* scales horizontally
* unstructured, supports any kind of schema
* AWS DynamoDB, MongoDB, Redis, Neo4j
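
The contrast between the two lists can be sketched in a few lines. This is a minimal illustration only: SQLite (from the Python standard library) stands in for a relational engine, a plain dict stands in for a key-value/document store, and the table, key and attribute names are hypothetical.

```python
import sqlite3

# Relational: a rigid schema declared up front, with rules enforced by the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")
conn.execute("INSERT INTO orders VALUES (11, 1, 17.5)")


def total_spent(name):
    """Join the two tables to total one customer's orders."""
    row = conn.execute(
        "SELECT SUM(o.total) FROM orders o "
        "JOIN customers c ON c.id = o.customer_id WHERE c.name = ?",
        (name,),
    ).fetchone()
    return row[0]


def schema_rejects_missing_name():
    """The NOT NULL rule is enforced by the database itself."""
    try:
        conn.execute("INSERT INTO customers (id, name) VALUES (2, NULL)")
    except sqlite3.IntegrityError:
        return True
    return False


# Non-relational: items are looked up by key and may have different shapes.
documents = {
    "customer#1": {
        "name": "Ada",
        "orders": [{"id": 10, "total": 25.0}, {"id": 11, "total": 17.5}],
    },
    "customer#2": {"name": "Grace", "loyalty_tier": "gold"},  # no "orders" attribute
}


def total_spent_document(key):
    """Rules such as 'orders may be missing' live in application code."""
    return sum(order["total"] for order in documents[key].get("orders", []))
```

The relational version answers the question with a join and lets the database reject bad rows; the document version reads one item by key and handles the missing attribute in code.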

AWS Relational Database Service (RDS)

Scales vertically, which means upgrading the EC2 instance (more CPU and RAM)
Is an OLTP type of database (Online Transaction Processing)
Horizontal scaling for queries (reads) can be done by creating a read replica. This means there is an RDS primary (master) database and an RDS read replica database; the primary replicates its changes to the read replica asynchronously.
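
One way an application might use a read replica is to route statements by type. The sketch below is an assumption about application-side routing, not an RDS feature: the `EndpointRouter` class and the endpoint names in the usage note are hypothetical, and treating only `SELECT` as a read is a deliberate simplification.

```python
import itertools


class EndpointRouter:
    """Send writes to the primary and spread reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica exists.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._readers)  # round-robin over the replicas
        return self.primary  # INSERT, UPDATE, DELETE, DDL and anything else
```

For example, `EndpointRouter("primary.example.internal", ["replica-1.example.internal"])` would send every `SELECT` to the replica and everything else to the primary. Because the replica is kept in sync after the write happens, a read that must see the latest write is safer on the primary.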

Relational Database Service (RDS) backups

Automated backups
* automated backups are retained for 0 to 35 days
* restore can be to any point in time during the retention period

Manual backups (snapshots)
* backs up entire DB instance, not just individual database
* snapshots do not expire
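
The retention rule for automated backups can be expressed as a small check. This is a simplified sketch: the function name is hypothetical, and it assumes the restorable window runs right up to the current time, which real services may not guarantee.

```python
from datetime import datetime, timedelta, timezone


def can_restore_to(target, now, retention_days):
    """Return True if `target` falls inside the automated-backup retention window."""
    if not 0 <= retention_days <= 35:
        raise ValueError("retention must be between 0 and 35 days")
    if retention_days == 0:
        return False  # a retention of 0 days leaves no automated backup to restore from
    earliest = now - timedelta(days=retention_days)
    return earliest <= target <= now
```

A manual snapshot sits outside this check entirely, since it does not expire.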

What is Amazon Aurora

Amazon Aurora:
* database in the RDS family
* high durability and scalability
* MySQL and PostgreSQL compatible
* built-in fault tolerance

Aurora key features

Aurora key features:
* high performance and scalability
* supports MySQL and PostgreSQL
* aurora replicas: in-region read scaling and failover target (up to 15 replicas)
* global database: cross-region cluster with read scaling
* multi-master: scales out writes within a region
* serverless: on-demand, autoscaling configuration; does not support read replicas or public IPs (Aurora Serverless v1). Aurora Serverless is a separate deployment option of Aurora, not a separate service

When to use Aurora Serverless

Use cases:
* infrequently used apps
* new apps
* variable workload
* unpredictable workloads
* dev and test databases
* multi-tenant apps

What is RDS Proxy?

RDS Proxy is a fully managed database proxy for RDS
highly available across multiple AZs
increases scalability, fault tolerance and security
reduces stress on database CPU/memory
controls the authentication method
controls a pool of connections to the database
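
The connection-pooling idea can be sketched as follows. This illustrates pooling in general and is not how RDS Proxy is implemented; the `ConnectionPool` class and the `connect` factory are hypothetical.

```python
import queue


class ConnectionPool:
    """Hand out a fixed set of connections and reuse them across callers."""

    def __init__(self, connect, size):
        # Open `size` connections once, up front, instead of one per request.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Return the connection so another caller can reuse it.
        self._idle.put(conn)
```

Because callers share a small, fixed number of connections, the database does not pay the CPU and memory cost of opening a new connection for every request.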

What is Amazon ElastiCache

Fully managed implementation of Redis and Memcached
It is a key/value store
Can be put in front of databases such as RDS and DynamoDB
ElastiCache runs on Amazon EC2 instances, so you must choose an instance family/type

ElastiCache - Memcached vs Redis

Redis:
* Data persistence
* Complex data types
* Partitioning (only in Cluster Mode)
* high availability
* NOT multi threaded

Memcached
* No data persistence
* Simple data types
* Partitioning
* Not high availability
* Multithreaded

ElastiCache use cases

data that is relatively static and frequently accessed
apps that are tolerant of stale data
often used for storing session state (DynamoDB can also be used)
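
A common way to use a cache in front of a database is the cache-aside (lazy loading) pattern. The sketch below uses an in-memory dict as a stand-in for Redis or Memcached; the `CacheAside` class, the `load_from_db` callback and the injectable `clock` are all hypothetical names chosen for illustration.

```python
import time


class CacheAside:
    """Lazy-loading cache: check the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]  # cache hit (may be stale relative to the database)
        value = self._load(key)  # cache miss or expired entry: read from the database
        self._entries[key] = (value, self._clock() + self._ttl)
        return value
```

The expiry time bounds how stale a cached value can get, which is why this pattern suits relatively static, frequently read data.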

What is Amazon DynamoDB?

NoSQL database service
key/value store and document store
non-relational, key-value type of database
fully serverless
autoscaling based on read/write capacity defined

DynamoDB - TTL

TTL (Time to Live) lets you define when an item can be deleted. Great for using DynamoDB like you would Redis for caching purposes
allows you to add a timestamp to an item in the table so that it is deleted after the TTL has expired
No extra cost and does not use WCU/RCU (write capacity units / read capacity units)
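
A minimal sketch of how an application might stamp items for TTL. It assumes the TTL attribute holds a Unix epoch timestamp in seconds; the attribute name `expires_at` and both helper functions are hypothetical (you pick the attribute name when enabling TTL on a table).

```python
import time

TTL_ATTRIBUTE = "expires_at"  # hypothetical attribute name


def with_ttl(item, ttl_seconds, now=None):
    """Return a copy of `item` with an expiry timestamp in Unix epoch seconds."""
    now = int(time.time()) if now is None else int(now)
    return {**item, TTL_ATTRIBUTE: now + ttl_seconds}


def is_expired(item, now=None):
    """True once the item's TTL timestamp is in the past."""
    now = int(time.time()) if now is None else int(now)
    return TTL_ATTRIBUTE in item and item[TTL_ATTRIBUTE] <= now
```

Expired items are removed in the background rather than at the exact expiry second, so a reader that needs strict expiry can apply a check like `is_expired` itself.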

What is DynamoDB Streams?

DynamoDB Streams:

Captures a time-ordered sequence of item-level modifications to any DynamoDB table and stores this information in a log for up to 24 hours

What is DynamoDB Accelerator (DAX)?

DAX is a fully managed, highly available, in-memory cache for DynamoDB
improves response times from milliseconds to microseconds (helps with latency)
used to improve read and write performance due to read-through and write-through cache
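
Read-through and write-through caching can be sketched with two dicts. This shows the general pattern only, not DAX's internals; the class name and the `table_reads` counter are hypothetical.

```python
class ReadWriteThroughCache:
    """In-memory cache that sits in front of a backing table."""

    def __init__(self, table):
        self._table = table  # stand-in for the backing table (a plain dict here)
        self._cache = {}
        self.table_reads = 0  # counts how often the table itself was read

    def put(self, key, item):
        # Write-through: update the backing table and the cache together.
        self._table[key] = item
        self._cache[key] = item

    def get(self, key):
        # Read-through: serve from memory, loading from the table on a miss.
        if key not in self._cache:
            self.table_reads += 1
            self._cache[key] = self._table.get(key)
        return self._cache[key]
```

After a write-through `put`, the next `get` for that key is served from memory without touching the table, which is where the latency gain comes from.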

What is DynamoDB Global Tables

DynamoDB Global Tables:
* multi-region, multi-active database
* asynchronous replication of DynamoDB tables across regions (same data set)

What is Amazon Redshift?

Amazon Redshift:
* data warehouse
* use to analyse data using SQL and other Business Intelligence (BI) tools such as Amazon QuickSight, Tableau, Microsoft Power BI
* relational database
* used for OLAP (online analytical processing)
* uses EC2 instances
* keeps 3 copies of your data
* continuous and incremental backup

Use cases for Amazon Redshift

Amazon Redshift (data warehouse) use cases:
* perform **complex queries** on massive collections of structured and semi-structured data with fast performance
* use Redshift Spectrum for direct access of S3 objects in a data lake

What is Amazon Elastic MapReduce (EMR)?

Amazon Elastic MapReduce (EMR) is Amazon’s managed version of Hadoop
It is used for running big data frameworks such as Apache Hadoop and Apache Spark
used for processing data for analytics and business intelligence
can also be used for transforming and moving large amounts of data
performs extract, transform and load (ETL) functions

What is Amazon Kinesis?

Amazon Kinesis:

Amazon Kinesis cost-effectively processes and analyzes streaming data at any scale as a fully managed service. With Kinesis, you can ingest real-time data, such as video, audio, application logs, website clickstreams, and IoT telemetry data, for machine learning (ML), analytics, and other applications.

What is Amazon Athena?

Amazon Athena is an interactive query service that makes it simple to analyze data directly in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you can choose to pay based on the queries you run or the compute needed by your queries.

What is AWS Glue?

AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.

AWS Glue provides both visual and code-based interfaces to make data integration easier. Users can more easily find and access data using the AWS Glue Data Catalog. Data engineers and ETL (extract, transform, and load) developers can visually create, run, and monitor ETL workflows in a few steps in AWS Glue Studio.

What is Amazon OpenSearch Service (ElasticSearch)

Search, visualise, and analyse text and unstructured data. It is based on Elasticsearch, meaning you can use it with Logstash and Kibana dashboards (ELK stack)

Supports queries using SQL.

Amazon OpenSearch Service is a managed service that makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open source, distributed search and analytics suite derived from Elasticsearch.

Amazon OpenSearch (ElasticSearch) best practices

deploy OpenSearch data instances across 3 Availability Zones
provision instances in multiples of 3
if 3 is not available, use 2 AZs with an equal number of instances
configure at least 1 replica for each index
apply restrictive resource-based access policies to the domain (or use fine-grained access control)
create the domain within an Amazon VPC
for sensitive data, enable node-to-node encryption and encryption at rest

What is AWS Batch?

AWS Batch is a set of batch management capabilities that enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. A shell script, executable or Docker container image is run as the "batch job".

Other AWS databases

* DocumentDB = MongoDB compatible. Document database for JSON data management
* Amazon Keyspaces (for Apache Cassandra). Uses Cassandra Query Language (CQL) code
* Amazon Neptune = graph database
* Amazon Quantum Ledger Database (QLDB) = ledger database. Provides a transparent, immutable (append-only, meaning it can NOT be overwritten or deleted) and cryptographically verifiable transaction log.
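
To make "append-only and cryptographically verifiable" concrete, here is a simplified hash-chain sketch. It is an illustration of the general idea only and is not how Amazon QLDB stores its journal; the `AppendOnlyLedger` class is hypothetical.

```python
import hashlib
import json


class AppendOnlyLedger:
    """Each entry's digest covers the previous digest, so history cannot change silently."""

    def __init__(self):
        self._entries = []  # list of (payload_json, digest)

    def append(self, payload):
        previous = self._entries[-1][1] if self._entries else ""
        body = json.dumps(payload, sort_keys=True)
        digest = hashlib.sha256((previous + body).encode("utf-8")).hexdigest()
        self._entries.append((body, digest))
        return digest

    def verify(self):
        """Recompute the chain; any edited or removed entry breaks it."""
        previous = ""
        for body, digest in self._entries:
            expected = hashlib.sha256((previous + body).encode("utf-8")).hexdigest()
            if expected != digest:
                return False
            previous = digest
        return True
```

The class offers no update or delete method, and tampering with a stored entry makes `verify()` return `False`.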

Other AWS analytics services

* Amazon Timestream = a fast, scalable, and serverless time-series database service that makes it easier to store and analyze trillions of events per day up to 1,000 times faster than relational databases
* AWS Data Exchange = a service that helps AWS customers easily share and manage data entitlements from other organizations at scale
* AWS Data Pipeline = managed ETL (extract, transform, load) service. Data sources can be on-premises, and the data can be processed and transformed
* AWS Lake Formation = data lake (structured, semi-structured and unstructured data), whereas Redshift is a data warehouse (structured data)
* Amazon Managed Streaming for Apache Kafka (MSK) = used for ingesting and processing data in real time

Which DynamoDB feature integrates with AWS Lambda to automatically execute functions in response to table updates?

**DynamoDB Streams** DynamoDB Streams maintains a list of item-level changes and can integrate with Lambda to create triggers.
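
A Lambda function attached to a stream receives batches of change records. The handler below is a minimal sketch that only summarises each record; the function name and the returned structure are hypothetical, and whether `NewImage` is present depends on how the stream is configured.

```python
def handler(event, context):
    """Summarise item-level changes delivered by a DynamoDB Streams trigger."""
    changes = []
    for record in event.get("Records", []):
        change = {
            "action": record["eventName"],  # INSERT, MODIFY or REMOVE
            "keys": record["dynamodb"]["Keys"],
        }
        # NewImage is only included for some stream view types and never for REMOVE.
        new_image = record["dynamodb"].get("NewImage")
        if new_image is not None:
            change["new_image"] = new_image
        changes.append(change)
    return changes
```

In a real trigger the loop body would do the actual work, for example updating a search index or sending a notification for each change.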

An organization is migrating databases into the AWS Cloud. They require a managed service for their MySQL database and need automatic failover to a secondary database. Which solution should they use?

**Amazon RDS with Multi-AZ** RDS Multi-AZ does provide automatic failover to a secondary database.

How many PUT records per second does Amazon Kinesis Data Streams support?

**1000** Each shard can support up to 1000 PUT records per second.

Which Amazon Kinesis service stores data for later processing by applications?

**Amazon Kinesis Data Streams** Kinesis Data Streams stores data for later processing by applications.

You need to implement an in-memory caching layer in front of an Amazon RDS database. The caching layer should allow encryption and replication. Which solution meets these requirements?

**Amazon ElastiCache Redis** Redis provides encryption and replication.

A new application requires a database that can allow writes to DB instances in multiple availability zones with read after write consistency. Which solution meets these requirements?

**Amazon Aurora Multi-Master** Amazon Aurora Multi-Master adds the ability to scale out write performance across multiple Availability Zones and provides configurable read after write consistency.

An organization is migrating their relational databases to the AWS Cloud. They require full operating system access to install custom operational toolsets. Which AWS service should they use to host their databases?

**Amazon EC2** If you need to access the underlying operating system you must use Amazon EC2 for a relational database.

An existing Amazon RDS database needs to be encrypted. How can you enable encryption for an unencrypted Amazon RDS database?

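**Take a snapshot of the DB instance, create an encrypted copy of the snapshot, and restore a new database instance from the encrypted copy** A snapshot of an unencrypted instance is itself unencrypted, so you copy the snapshot with encryption enabled and then create a new database instance from the encrypted copy.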

Which Amazon Kinesis service uses AWS Lambda to transform data?

**Amazon Kinesis Firehose** Kinesis Firehose can deliver data to Lambda for transformation.

How can you scale an Amazon Kinesis Data Stream that is reaching capacity?

**Add shards** You scale Kinesis by adding shards to a stream.
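
A rough shard estimate follows from the per-shard figure of 1000 PUT records per second quoted above. This is a back-of-the-envelope sketch: the function name is hypothetical, and it ignores per-shard byte throughput limits, which can require more shards than the record count alone suggests.

```python
import math

RECORDS_PER_SHARD_PER_SECOND = 1000  # per-shard write limit from the card above


def shards_needed(records_per_second):
    """Estimate the shard count for a given write rate (record count only)."""
    if records_per_second < 0:
        raise ValueError("records_per_second must not be negative")
    # A stream always has at least one shard.
    return max(1, math.ceil(records_per_second / RECORDS_PER_SHARD_PER_SECOND))
```

If the estimate rises above the stream's current shard count, that is the signal to add shards.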

Cheat sheets

* DynamoDB - https://digitalcloud.training/amazon-dynamodb/
* ElastiCache - https://digitalcloud.training/amazon-elasticache/
* Redshift - https://digitalcloud.training/amazon-redshift/
* EMR - https://digitalcloud.training/amazon-emr/
* Kinesis - https://digitalcloud.training/amazon-kinesis/
* Athena - https://digitalcloud.training/amazon-athena/
* Glue - https://digitalcloud.training/aws-glue/
* RDS - https://digitalcloud.training/amazon-rds/
* Aurora - https://digitalcloud.training/amazon-aurora/

Section 12: Databases and Analytics Flashcards

(38 cards)