Database & Analytics Flashcards

1
Q

Databases

A

A database is an organized collection of structured information, or data, typically stored electronically in a computer system.

• You build indexes to efficiently query / search through the data
• You define relationships between your datasets
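As a minimal sketch of the indexing point above, the following uses Python's built-in sqlite3 module as a stand-in for any database. The table `users`, the column `email`, and the index name `idx_users_email` are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory database; the schema below is purely illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

# Build an index so lookups by email do not have to scan every row.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# Ask SQLite how it plans to run the lookup; the last column of each
# plan row is a human-readable description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM users WHERE email = ?",
    ("user42@example.com",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

name = conn.execute(
    "SELECT name FROM users WHERE email = ?", ("user42@example.com",)
).fetchone()[0]
```

Printing `plan_text` should show that the lookup goes through the index rather than a full table scan.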

2
Q

Relational Databases

A

A relational database is a collection of information that organizes data in predefined relationships, where data is stored in one or more tables of columns and rows.

• Can use the SQL language to perform queries / lookups
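To make the tables-plus-SQL idea concrete, here is a small sketch using Python's built-in sqlite3 module. The `students` and `enrollments` tables and their contents are made up for illustration; any relational engine would accept a similar query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(id),  -- predefined relationship
        course TEXT
    );
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO enrollments VALUES
        (1, 'Databases'), (1, 'Networking'), (2, 'Databases');
""")

# A SQL query that follows the relationship between the two tables.
rows = conn.execute("""
    SELECT s.name, e.course
    FROM students AS s
    JOIN enrollments AS e ON e.student_id = s.id
    ORDER BY s.name, e.course
""").fetchall()
```

Each element of `rows` is a `(name, course)` pair assembled from both tables by the join.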

3
Q

NoSQL Databases

A

• NoSQL databases are purpose built for specific data models and have flexible schemas for building modern applications
• Benefits: Flexibility, Scalability, High-performance, Highly functional
• Examples: Key-value, document, graph, in-memory, search databases

4
Q

NoSQL data example: JSON

A

• JSON = JavaScript Object Notation
• JSON is a common form of data that fits into a NoSQL model
• Data can be nested
• Fields can change over time
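A short sketch of the two properties above (nesting and fields that vary between records), using Python's standard json module. The documents and the helper `city_of` are hypothetical.

```python
import json

# Two hypothetical user documents: the second carries a nested "address"
# object and a "tags" field that the first one does not have.
raw = """
[
  {"id": 1, "name": "Ana"},
  {"id": 2, "name": "Ben",
   "address": {"city": "Lisbon", "zip": "1000"},
   "tags": ["admin"]}
]
"""
docs = json.loads(raw)

def city_of(doc):
    """Return the nested city, or None when the document has no address."""
    return doc.get("address", {}).get("city")

cities = [city_of(d) for d in docs]
```

Because there is no fixed schema, code that reads such documents has to tolerate missing fields, which is what the `.get(...)` calls do here.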

5
Q

Databases & Shared Responsibility on AWS

A

• AWS offers managed services for different databases
• Benefits include:
• Quick Provisioning, High Availability, Vertical and Horizontal Scaling
• Automated Backup & Restore, Operations, Upgrades
• Operating System Patching is handled by AWS
• Monitoring, alerting

6
Q

AWS RDS

A

• RDS stands for Relational Database Service
• It’s a managed DB service for databases that use SQL as a query language.
• It allows you to create databases in the cloud that are managed by AWS:
• Postgres
• MySQL
• MariaDB
• Oracle
• Microsoft SQL Server
• Aurora (AWS Proprietary database)

7
Q

Advantages of using RDS versus deploying a DB on EC2

A

• Automated provisioning, OS patching
• Continuous backups and restore to a specific timestamp (Point in Time Restore)
• Monitoring dashboards
• Read replicas for improved read performance
• Multi AZ setup for DR (Disaster Recovery)
• Maintenance windows for upgrades
• Scaling capability (vertical and horizontal)
• Storage backed by EBS (gp2 or io1)

8
Q

Amazon Aurora

A

• Aurora is a proprietary technology from AWS (not open sourced)
• PostgreSQL and MySQL are both supported as Aurora DB
• Aurora is “AWS cloud optimized”, better performance than RDS
• Aurora storage automatically grows in increments of 10GB, up to 64 TB.
• Aurora costs more than RDS (20% more) – but is more efficient

9
Q

RDS Deployments: Read Replicas, Multi-AZ

A

• Read Replicas:
• Scale the read workload of your DB
• Can create up to 5 Read Replicas
• Data is only written to the main DB

• Multi-AZ:
• Failover in case of AZ outage (high availability)
• Data is only read/written to the main database
• Can only have 1 other AZ as failover
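The read-replica bullets above can be pictured with a toy model: writes go only to the main database, and reads are spread over replicas that catch up afterwards. This is a sketch in plain Python, not RDS code; the `Cluster` class and its methods are hypothetical, and `replicate()` is a stand-in for replication catching up.

```python
from itertools import cycle

class Cluster:
    """Toy model: one writable primary plus read-only replicas."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = cycle(range(replica_count))

    def write(self, key, value):
        # All writes go to the primary only.
        self.primary[key] = value

    def replicate(self):
        # Stand-in for replication bringing every replica up to date.
        for replica in self.replicas:
            replica.clear()
            replica.update(self.primary)

    def read(self, key):
        # Reads are spread across replicas in round-robin order.
        index = next(self._next_replica)
        return index, self.replicas[index].get(key)

cluster = Cluster(replica_count=2)
cluster.write("order:1", "shipped")
stale = cluster.read("order:1")   # replica 0 has not caught up yet
cluster.replicate()
fresh = cluster.read("order:1")   # replica 1, now up to date
```

The first read returns nothing because the replica lags behind the primary, which is the trade-off to keep in mind when offloading reads to replicas.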

10
Q

RDS Deployments: Multi-Region

A

• Multi-Region (Read Replicas)
• Disaster recovery in case of region issue
• Local performance for global reads
• Replication cost

11
Q

Amazon ElastiCache

A

• ElastiCache provides managed Redis or Memcached
• Caches are in-memory databases with high performance, low latency
• Helps reduce load on databases for read-intensive workloads
• The idea is to store query results somewhere else (the cache), so that they’re very readily available.
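The caching idea above is often implemented as a "check the cache first, fall back to the database" pattern. The sketch below assumes a plain dict in place of Redis or Memcached and a hypothetical `slow_query` function in place of a real database call.

```python
import time

db_calls = 0   # counts how often the "database" is actually hit
cache = {}     # stand-in for Redis or Memcached

def slow_query(user_id):
    """Hypothetical expensive database query."""
    global db_calls
    db_calls += 1
    time.sleep(0.01)  # simulate query latency
    return {"id": user_id, "name": f"User {user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:               # cache hit: no database work
        return cache[key]
    result = slow_query(user_id)   # cache miss: query, then remember
    cache[key] = result
    return result

first = get_user(7)    # goes to the database
second = get_user(7)   # served from the cache
```

A real deployment would also need an expiry or invalidation strategy so cached results do not go stale; that part is left out of this sketch.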

12
Q

DynamoDB

A

• Fully managed, highly available with replication across 3 AZ
• NoSQL database, serverless
• Automatically scales up and down to adjust for capacity and maintain performance
• Millions of requests per second, 100s of TB of storage
• Single-digit millisecond latency – low latency retrieval

13
Q

DynamoDB – type of data

A

• DynamoDB is a key/value database

14
Q

DynamoDB Accelerator - DAX

A

• Fully managed in-memory cache for DynamoDB
• 10x performance improvement – single-digit millisecond latency to microsecond latency
• Secure, highly scalable & highly available

15
Q

DynamoDB – Global Tables

A

• Make a DynamoDB table accessible with low latency in multiple regions
• Active-Active replication (read/write to any AWS Region)

16
Q

Redshift

A

• Relational database
• Redshift is based on PostgreSQL, but it’s not used for OLTP
• It’s OLAP – online analytical processing (analytics and data warehousing)
• Columnar storage of data (instead of row based)
• Load data once every hour, not every second
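To illustrate why columnar storage suits analytics, the sketch below lays out the same made-up order data row by row and column by column in plain Python. It does not reflect Redshift's internal format; it only shows that an aggregate over one column needs to touch just that column in the columnar layout.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row layout: the scan walks through every record to pick out one field.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: the scan reads only the "amount" column.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; the difference is how much data has to be read to compute them, which matters at data-warehouse scale.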

17
Q

Amazon EMR “Elastic MapReduce”

A

• EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
• The clusters can be made of hundreds of EC2 instances
• Use cases: data processing, machine learning, web indexing, big data
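The "MapReduce" in the service name refers to a programming model in which work is split into a map step, a grouping step, and a reduce step. The following is a single-process word-count sketch of that model in plain Python; it is not EMR or Hadoop code, and the function names are illustrative.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big clusters", "data processing"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
```

On a cluster, the map and reduce steps would run in parallel on many machines, each handling a slice of the input.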

18
Q

Amazon Athena

A

• Serverless query service to analyze data stored in Amazon S3
• Uses standard SQL language
• Use cases: Business intelligence / analytics / reporting, analyze

• Exam Tip: to analyze data in S3 using serverless SQL, use Athena

19
Q

Amazon QuickSight

A

• Serverless business intelligence service that allows you to create dashboards on your databases, so you can visually represent your data and show your business users the insights they’re looking for
• Fast, automatically scalable, embeddable, with per-session pricing
• Use cases: business analytics, building visualizations

20
Q

DocumentDB

A

• DocumentDB is the AWS implementation compatible with MongoDB (which is a NoSQL database), much as Aurora is for PostgreSQL / MySQL
• MongoDB is used to store, query, and index JSON data
• Fully Managed, highly available with replication across 3 AZ
• DocumentDB storage automatically grows in increments of 10GB, up to 64 TB.
• Automatically scales to workloads with millions of requests per second

21
Q

Amazon Neptune

A

• Fully managed graph database
• A popular graph dataset would be a social network
• Highly available across 3 AZ, with up to 15 read replicas
• Build and run applications working with highly connected datasets
• Can store up to billions of relations
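A social network makes a handy example of a highly connected dataset. The sketch below models a tiny, made-up "follows" graph as a Python dict and answers a typical graph question (who is reachable within two hops) with a breadth-first search. It is plain Python for illustration, not a Neptune query.

```python
from collections import deque

# Hypothetical "follows" relationships between users.
follows = {
    "ana": {"ben", "cleo"},
    "ben": {"dan"},
    "cleo": {"dan", "eve"},
    "dan": set(),
    "eve": {"ana"},
}

def within_hops(graph, start, max_hops):
    """Breadth-first search: everyone reachable in at most max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen

reach = within_hops(follows, "ana", 2)
```

Questions like this require repeated self-joins in a relational model, which is why a dedicated graph database is a natural fit for them.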

22
Q

Amazon QLDB

A

• QLDB stands for “Quantum Ledger Database”
• Centralized: the ledger is owned by a single trusted authority (no decentralization)
• A ledger is a book recording financial transactions
• Fully Managed, Serverless, Highly available, Replication across 3 AZ
• Used to review history of all the changes made to your application data over time
• NoSQL
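One way to picture a ledger that keeps a reviewable history of changes is an append-only journal in which each entry's hash covers the previous entry, so edits to past entries become detectable. The sketch below uses SHA-256 from Python's standard library; the entry layout and function names are assumptions made for illustration and are not QLDB's actual journal format or API.

```python
import hashlib
import json

def entry_hash(previous_hash, change):
    """Hash the change together with the previous entry's hash."""
    payload = json.dumps({"prev": previous_hash, "change": change}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(journal, change):
    """Add a change to the end of the journal; earlier entries are never edited."""
    previous_hash = journal[-1]["hash"] if journal else ""
    journal.append({"change": change, "hash": entry_hash(previous_hash, change)})

def verify(journal):
    """Recompute the hash chain and report whether it is intact."""
    previous_hash = ""
    for entry in journal:
        if entry["hash"] != entry_hash(previous_hash, entry["change"]):
            return False
        previous_hash = entry["hash"]
    return True

journal = []
append(journal, {"account": "A", "balance": 100})
append(journal, {"account": "A", "balance": 70})
ok_before = verify(journal)

journal[0]["change"]["balance"] = 1000   # tamper with history
ok_after = verify(journal)
```

After the tampering step, verification fails because the stored hash no longer matches the altered entry.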

23
Q

Amazon Managed Blockchain

A

• Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority
• Amazon Managed Blockchain is a managed service to:
• Join public blockchain networks
• Or create your own scalable private network

24
Q

AWS Glue

A

• Fully serverless service
• Managed extract, transform, and load (ETL) service
• Useful to prepare and transform data for analytics

• Glue Data Catalog: catalog of datasets
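The extract, transform, load steps can be sketched in a few lines of plain Python. This is not Glue code; the CSV input, the cleaning rules, and the JSON-lines output are all assumptions chosen to show the shape of an ETL job.

```python
import csv
import io
import json

# Hypothetical raw input with stray whitespace and a missing amount.
raw_csv = "order_id,amount,currency\n1, 19.90 ,usd\n2,,usd\n3,5.00,eur\n"

def extract(text):
    """Read raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Clean the records: drop incomplete rows, fix types and casing."""
    cleaned = []
    for record in records:
        amount = record["amount"].strip()
        if not amount:
            continue  # skip rows with a missing amount
        cleaned.append({
            "order_id": int(record["order_id"]),
            "amount": float(amount),
            "currency": record["currency"].strip().upper(),
        })
    return cleaned

def load(records):
    """Serialize the cleaned records as JSON lines, ready for analytics."""
    return "\n".join(json.dumps(record) for record in records)

cleaned = transform(extract(raw_csv))
output = load(cleaned)
```

In a managed ETL service the same three stages exist, but the infrastructure that runs them is provisioned for you.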

25
Q

DMS – Database Migration Service

A

• Quickly and securely migrate databases to AWS, resilient, self healing
• The source database remains available during the migration

26
Q

Difference between relational and non-relational databases

A

The difference between DynamoDB and, say, RDS is that in DynamoDB all the data lives within one single table, and there’s no way to join it with another table.
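A rough way to picture the single-table, no-join model is a key/value mapping in which each item already contains everything needed to answer a request. The sketch below uses a plain Python dict with made-up keys and items; it is not the DynamoDB API.

```python
# Toy key/value table: each key maps to one self-contained item.
# Related data (the orders) is nested inside the item instead of
# living in a second table that would have to be joined.
table = {
    "user#1": {
        "name": "Ana",
        "orders": [{"id": "o-10", "total": 25}, {"id": "o-11", "total": 40}],
    },
    "user#2": {"name": "Ben", "orders": []},
}

def get_item(key):
    """Single lookup by key; there is no join step."""
    return table.get(key)

def order_total(key):
    """Sum the order totals stored inside one item."""
    item = get_item(key)
    return sum(order["total"] for order in item["orders"])

ana_total = order_total("user#1")
```

In a relational design the orders would usually sit in their own table and be combined with the user row through a join at query time.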