Section 12: Databases and Analytics Flashcards

1
Q

Relational vs Non-Relational databases

A

Relational
* SQL
* organised into tables, rows and columns
* rigid schema
* rules enforced in the database
* usually vertically scaled
* supports complex queries and joins
* Amazon RDS, Oracle, MySQL, PostgreSQL

Non-relational
* NoSQL
* varied data storage models
* flexible schema stored in key-value pairs, columns, documents or graphs
* rules can be defined in application code (outside of database)
* scales horizontally
* handles unstructured data, so items can follow any kind of schema
* Amazon DynamoDB, MongoDB, Redis, Neo4j
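
The contrast above can be sketched with Python's standard library. This is only an illustration: SQLite stands in for a relational engine and a plain dict stands in for a document store, and the table, key and attribute names are made up for the example.

```python
import sqlite3

# Relational: fixed schema, rules enforced by the database, joins in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 25.0)")

joined = conn.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()

# The database itself rejects a row that breaks the rules (unknown customer).
try:
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (11, 999, 5.0)")
    schema_enforced = False
except sqlite3.IntegrityError:
    schema_enforced = True

# Non-relational (document style): each item carries its own shape,
# and any validation has to live in application code.
documents = {
    "customer#1": {"name": "Ada", "orders": [{"id": 10, "total": 25.0}]},
    "customer#2": {"name": "Lin", "loyalty_tier": "gold"},  # different attributes
}
```

The join and the rejected insert show the database doing the work. In the document version, nothing stops two items from having different attributes.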

2
Q

Amazon Relational Database Service (RDS)

A
  • Scales vertically, which means moving the underlying instance to a larger instance class (more CPU and RAM)
  • Is an OLTP type of database (Online Transaction Processing)
  • Horizontal scaling for queries (reads) can be done by creating a read replica. There is then an RDS primary (master) database and one or more RDS read replicas, and changes on the primary are replicated asynchronously to the replicas.
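
A rough sketch of how an application might use read replicas: writes go to the primary, reads are spread across the replicas. The class and the endpoint names are hypothetical, and the SELECT check is deliberately simplistic. Real drivers and proxies handle this in more robust ways.

```python
from itertools import cycle

class ReplicaAwareRouter:
    """Route writes to the primary endpoint and spread reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica exists.
        self._readers = cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification: only plain SELECT statements count as reads.
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word == "SELECT":
            return next(self._readers)
        return self.primary

router = ReplicaAwareRouter(
    primary="primary.example.internal",  # hypothetical endpoint names
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
```

Because replication is asynchronous, a read sent to a replica can briefly return older data than the primary holds.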
3
Q

Relational Database Service (RDS) backups

A

Relational Database Service backups

Automated backups
* automated backups are retained for 0 to 35 days
* restore can be to any point in time during the retention period

Manual backups (snapshots)
* backs up entire DB instance, not just individual database
* snapshots do not expire

4
Q

What is Amazon Aurora

A

Amazon Aurora:
* database in the RDS family
* strong durability and scalability
* MySQL and PostgreSQL compatible
* built-in fault tolerance

5
Q

Aurora key features

A

Aurora key features:
* high performance and scalability
* supports MySQL and PostgreSQL
* Aurora Replicas: in-region read scaling and failover target (up to 15 replicas)
* global database: cross-region cluster with read scaling
* multi-master: scales out writes within a region
* serverless: on-demand, autoscaling configuration of Aurora; Aurora Serverless v1 does not support read replicas or public IPs

6
Q

When to use Aurora Serverless

A

Use cases:
* infrequently used apps
* new apps
* variable workloads
* unpredictable workloads
* dev and test databases
* multi-tenant apps

7
Q

What is RDS Proxy?

A
  • RDS Proxy is a fully managed database proxy for RDS
  • highly available across multiple AZs
  • increases scalability, fault tolerance and security
  • reduces stress on database CPU/memory
  • control authentication method
  • controls pool of connections to database
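
The connection-pooling idea behind the last bullet can be sketched in a few lines. This is not how RDS Proxy is implemented. It is a minimal illustration with hypothetical names, using a fake connection factory in place of a real database driver.

```python
import queue

class ConnectionPool:
    """Share a fixed number of database connections among many callers."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self, timeout=None):
        # Blocks (up to `timeout` seconds) until a connection is free.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)

opened = []

def fake_connect():
    # Stand-in for opening a real database connection.
    opened.append(object())
    return opened[-1]

pool = ConnectionPool(fake_connect, size=2)
for _ in range(100):  # 100 requests reuse the same 2 connections
    conn = pool.acquire()
    pool.release(conn)
```

The database only ever sees two connections, however many requests arrive. That is where the reduced CPU and memory pressure comes from.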
8
Q

What is Amazon ElastiCache

A
  • Fully managed implementation of Redis and Memcached
  • It is a key/value store
  • Can be put in front of databases such as RDS and DynamoDB
  • ElastiCache runs on Amazon EC2 instances, so you must choose an instance family/type
9
Q

ElastiCache - Memcached vs Redis

A

Redis:
* Data persistence
* Complex data types
* Partitioning (only in Cluster Mode)
* High availability
* NOT multithreaded

Memcached:
* No data persistence
* Simple data types
* Partitioning
* No high availability
* Multithreaded

10
Q

ElastiCache use cases

A
  • data that is relatively static and frequently accessed
  • apps that are tolerant of stale data
  • often used for storing session state (DynamoDB can also be used)
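
These use cases fit a cache-aside (lazy loading) pattern, sketched below with a plain dict in place of Redis or Memcached. The class, the loader function and the 60-second TTL are assumptions made for the example.

```python
import time

class CacheAside:
    """Lazy-loading cache: check the cache first, fall back to the database."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]  # cache hit (may be slightly stale)
        value = self._load(key)  # cache miss: read the database
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

db_reads = []

def load_product(key):
    # Stand-in for a query against the backing database.
    db_reads.append(key)
    return {"id": key, "name": "widget"}

cache = CacheAside(load_product, ttl_seconds=60)
cache.get("p1")
cache.get("p1")  # served from the cache, no second database read
```

The TTL bounds how stale a cached value can get, which is why the pattern suits apps that tolerate stale data.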
11
Q

What is Amazon DynamoDB?

A
  • NoSQL database service
  • key/value store and document store
  • non-relational, key-value type of database
  • fully serverless
  • auto scaling based on the read/write capacity you define
12
Q

DynamoDB - TTL

A
  • TTL (time to live) lets you define when an item can be deleted. Great for using DynamoDB like you would Redis for caching purposes
  • you add a timestamp attribute to each item in the table, and the item is deleted once that time has passed
  • No extra cost and does not use WCU/RCU (write capacity units / read capacity units)
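
A small sketch of preparing an item for TTL. DynamoDB expects the TTL attribute to be a number holding a Unix epoch time in seconds. The attribute name `expires_at` and the helper functions are hypothetical. Expired items are removed by a background process rather than at the exact expiry moment, so readers may want to filter them out as well.

```python
import time

def with_ttl(item, ttl_seconds, now=None):
    """Return a copy of `item` with an epoch-seconds expiry attribute."""
    now = int(time.time()) if now is None else int(now)
    expiring = dict(item)
    expiring["expires_at"] = now + ttl_seconds  # hypothetical attribute name
    return expiring

def is_expired(item, now=None):
    """Lets readers skip items that have expired but are not yet deleted."""
    now = int(time.time()) if now is None else int(now)
    return item["expires_at"] <= now

session = with_ttl(
    {"session_id": "abc123", "user": "ada"},
    ttl_seconds=3600,
    now=1_700_000_000,  # fixed timestamp so the example is reproducible
)
```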
13
Q

What is DynamoDB Streams?

A

DynamoDB Streams:

Captures a time-ordered sequence of item-level modifications to any DynamoDB table and stores this information in a log for up to 24 hours

14
Q

What is DynamoDB Accelerator (DAX)?

A
  • DAX is a fully managed, highly available, in-memory cache for DynamoDB
  • improves response times from milliseconds to microseconds (helps with read latency)
  • acts as a read-through and write-through cache; the main performance gain is on reads, since writes still go to DynamoDB
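
The read-through and write-through behaviour can be illustrated with a toy cache in front of a dict. This is a conceptual sketch only. It does not use the DAX client, and the class and attribute names are invented for the example.

```python
class WriteThroughCache:
    """Read-through / write-through cache in front of a slower table."""

    def __init__(self, table):
        self._table = table  # any dict-like backing store
        self._cache = {}
        self.table_reads = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # hit: served from memory
        self.table_reads += 1
        value = self._table.get(key)  # miss: read through to the table
        if value is not None:
            self._cache[key] = value
        return value

    def put(self, key, value):
        self._table[key] = value  # the write goes to the table first...
        self._cache[key] = value  # ...then the cache is updated

table = {"user#1": {"name": "Ada"}}
dax_like = WriteThroughCache(table)
dax_like.get("user#1")  # miss, reads the table
dax_like.get("user#1")  # hit
dax_like.put("user#2", {"name": "Lin"})
dax_like.get("user#2")  # hit, because the write populated the cache
```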
15
Q

What is DynamoDB Global Tables

A

DynamoDB Global Tables:
* multi-region, multi-active database
* DynamoDB tables are replicated asynchronously across regions (same data set)

16
Q

What is Amazon Redshift?

A

Amazon Redshift:
* data warehouse
* used to analyse data using SQL and Business Intelligence (BI) tools such as Amazon QuickSight, Tableau and Microsoft Power BI
* relational database
* used for OLAP (online analytical processing)
* uses EC2 instances
* keeps 3 copies of your data
* continuous and incremental backup

17
Q

Use cases for Amazon Redshift

A

Amazon Redshift (data warehouse) use cases:
* perform complex queries on massive collections of structured and semi-structured data with fast performance
* use Redshift Spectrum for direct access of S3 objects in a data lake

18
Q

What is Amazon Elastic MapReduce (EMR)?

A
  • Amazon Elastic MapReduce is Amazon's managed version of Hadoop
  • It is used for running big data frameworks such as Apache Hadoop and Apache Spark
  • used for processing data for analytics and business intelligence
  • can also be used for transforming and moving large amounts of data
  • performs extract, transform and load functions (ETL)
19
Q

What is Amazon Kinesis?

A

Amazon Kinesis:

Amazon Kinesis cost-effectively processes and analyzes streaming data at any scale as a fully managed service. With Kinesis, you can ingest real-time data, such as video, audio, application logs, website clickstreams, and IoT telemetry data, for machine learning (ML), analytics, and other applications.

20
Q

What is Amazon Athena?

A

Amazon Athena is an interactive query service that makes it simple to analyze data directly in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to setup or manage, and you can choose to pay based on the queries you run or compute needed by your queries.

21
Q

What is AWS Glue?

A

AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.

AWS Glue provides both visual and code-based interfaces to make data integration easier. Users can more easily find and access data using the AWS Glue Data Catalog. Data engineers and ETL (extract, transform, and load) developers can visually create, run, and monitor ETL workflows in a few steps in AWS Glue Studio.

22
Q

What is Amazon OpenSearch Service (Elasticsearch)?

A

Search, visualise, and analyse text and unstructured data. It is derived from Elasticsearch, meaning you can use it with Logstash and Kibana dashboards (the ELK stack).

Supports queries using SQL.

Amazon OpenSearch Service is a managed service that makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open source, distributed search and analytics suite derived from Elasticsearch.

24
Q

Amazon OpenSearch (Elasticsearch) best practices

A
  • deploy OpenSearch data instances across 3 Availability Zones
  • provision instances in multiples of 3
  • if 3 AZs are not available, use 2 AZs with an equal number of instances in each
  • configure at least 1 replica for each index
  • apply restrictive resource-based access policies to the domain (or use fine-grained access control)
  • create the domain within an Amazon VPC
  • for sensitive data, enable node-to-node encryption and encryption at rest
25
Q

What is AWS Batch?

A

AWS Batch is a set of batch management capabilities that enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS.

A script (such as a shell script), an executable or a Docker container image is run as the “batch job”.

26
Q

Other AWS databases

A
  • Amazon DocumentDB = MongoDB-compatible document database for JSON data management
  • Amazon Keyspaces (for Apache Cassandra). Uses Cassandra Query Language (CQL) code
  • Amazon Neptune = graph database
  • Amazon Quantum Ledger Database (QLDB) = ledger database. Provides a transparent, immutable (append-only, meaning entries can NOT be overwritten or deleted) and cryptographically verifiable transaction log.
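
To make "append-only and cryptographically verifiable" concrete, here is a toy hash-chained log. It is not how QLDB is implemented and does not use any AWS API. The class and method names are invented, and it only shows why tampering with an earlier entry becomes detectable.

```python
import hashlib
import json

class AppendOnlyLedger:
    """Toy ledger: each entry's digest covers the previous entry's digest."""

    def __init__(self):
        self._entries = []  # list of (payload_json, digest)

    def append(self, payload):
        previous = self._entries[-1][1] if self._entries else ""
        body = json.dumps(payload, sort_keys=True)
        digest = hashlib.sha256((previous + body).encode()).hexdigest()
        self._entries.append((body, digest))
        return digest

    def verify(self):
        # Recompute the chain; any edited entry breaks every later digest.
        previous = ""
        for body, digest in self._entries:
            expected = hashlib.sha256((previous + body).encode()).hexdigest()
            if expected != digest:
                return False
            previous = digest
        return True

ledger = AppendOnlyLedger()
ledger.append({"account": "A", "amount": 100})
ledger.append({"account": "A", "amount": -30})
intact = ledger.verify()
```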
27
Q

Other AWS analytics services

A
  • Amazon Timestream = fast, scalable, serverless time-series database service for storing and analyzing trillions of events per day; AWS describes it as up to 1,000 times faster than relational databases
  • AWS Data Exchange = service that helps AWS customers easily share and manage data entitlements from other organizations at scale
  • AWS Data Pipeline = managed ETL (extract, transform, load) service. Data sources can be on-prem and can be processed and transformed.
  • AWS Lake Formation = service for building a data lake (structured, semi-structured and unstructured data). By contrast, Redshift is a data warehouse (structured data)
  • Amazon Managed Streaming for Apache Kafka (MSK) = used for ingesting and processing data in real-time
28
Q

Which DynamoDB feature integrates with AWS Lambda to automatically execute functions in response to table updates?

A

DynamoDB Streams

DynamoDB Streams maintains a list of item level changes and can integrate with Lambda to create triggers.
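
A minimal sketch of a Lambda-style handler that reads a DynamoDB Streams event. The event layout (`Records`, `eventName`, `dynamodb.Keys`, `dynamodb.NewImage`) follows the documented stream record format, while the handler body, the table keys and the sample data are made up for illustration. Whether `NewImage` is present depends on the stream view type configured on the table.

```python
def handler(event, context=None):
    """Summarise item-level changes from a DynamoDB Streams event."""
    changes = []
    for record in event.get("Records", []):
        data = record["dynamodb"]
        changes.append({
            "action": record["eventName"],  # INSERT, MODIFY or REMOVE
            "keys": data["Keys"],
            "new": data.get("NewImage"),  # absent for REMOVE records
        })
    return changes

sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"pk": {"S": "user#1"}},
                "NewImage": {"pk": {"S": "user#1"}, "name": {"S": "Ada"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"pk": {"S": "user#2"}}},
        },
    ]
}
result = handler(sample_event)
```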

29
Q

An organization is migrating databases into the AWS Cloud. They require a managed service for their MySQL database and need automatic failover to a secondary database. Which solution should they use?

A

Amazon RDS with Multi-AZ

RDS Multi-AZ provides automatic failover to a standby (secondary) database in another Availability Zone.

30
Q

How many PUT records per second does each Amazon Kinesis Data Streams shard support?

A

1,000 per shard

Each shard can support up to 1,000 PUT records per second.
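
The per-shard limit turns into a simple sizing calculation. The sketch below uses the 1,000 records per second figure from this card, plus a write throughput limit of about 1 MB per second per shard. That second figure is an assumption taken from the AWS documentation, not from this deck. The function name is hypothetical.

```python
import math

RECORDS_PER_SHARD_PER_SEC = 1000  # write limit quoted on the card
BYTES_PER_SHARD_PER_SEC = 1_000_000  # assumed ~1 MB/s write limit per shard

def shards_needed(records_per_sec, avg_record_bytes):
    """Estimate the shard count needed to absorb a given write load."""
    by_count = math.ceil(records_per_sec / RECORDS_PER_SHARD_PER_SEC)
    by_bytes = math.ceil(
        records_per_sec * avg_record_bytes / BYTES_PER_SHARD_PER_SEC
    )
    return max(1, by_count, by_bytes)

small_records = shards_needed(2500, 100)  # limited by record count
large_records = shards_needed(500, 5000)  # limited by bytes per second
```

Whichever limit is hit first decides the shard count, so a stream of few but large records can need more shards than its record rate suggests.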

31
Q

Which Amazon Kinesis service stores data for later processing by applications?

A

Amazon Kinesis Data Streams

Kinesis Data Streams stores data for later processing by applications.

32
Q

You need to implement an in-memory caching layer in front of an Amazon RDS database. The caching layer should allow encryption and replication. Which solution meets these requirements?

A

Amazon ElastiCache Redis

Redis provides encryption and replication.

33
Q

A new application requires a database that can allow writes to DB instances in multiple availability zones with read after write consistency. Which solution meets these requirements?

A

Amazon Aurora Multi-Master

Amazon Aurora Multi-Master adds the ability to scale out write performance across multiple Availability Zones and provides configurable read after write consistency.

34
Q

An organization is migrating their relational databases to the AWS Cloud. They require full operating system access to install custom operational toolsets. Which AWS service should they use to host their databases?

A

Amazon EC2

If you need to access the underlying operating system you must use Amazon EC2 for a relational database.

35
Q

An existing Amazon RDS database needs to be encrypted. How can you enable encryption for an unencrypted Amazon RDS database?

A

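Take a snapshot of the DB instance, create an encrypted copy of the snapshot, and restore a new DB instance from the encrypted copy

You cannot enable encryption on an existing unencrypted DB instance in place. Instead, copy a snapshot with encryption enabled and then create a new database instance from the encrypted copy.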

36
Q

Which Amazon Kinesis service uses AWS Lambda to transform data?

A

Amazon Kinesis Data Firehose

Kinesis Data Firehose can invoke a Lambda function to transform records before delivering them to the destination.

37
Q

How can you scale an Amazon Kinesis Data Stream that is reaching capacity?

A

Add shards

You scale Kinesis by adding shards to a stream.

38
Q

Cheat sheets

A
  • DynamoDB - https://digitalcloud.training/amazon-dynamodb/
  • ElastiCache - https://digitalcloud.training/amazon-elasticache/
  • Redshift - https://digitalcloud.training/amazon-redshift/
  • EMR - https://digitalcloud.training/amazon-emr/
  • Kinesis - https://digitalcloud.training/amazon-kinesis/
  • Athena - https://digitalcloud.training/amazon-athena/
  • Glue - https://digitalcloud.training/aws-glue/
  • RDS - https://digitalcloud.training/amazon-rds/
  • Aurora - https://digitalcloud.training/amazon-aurora/