AWS Databases Flashcards
Relational Data
Organized by tables, rows, and columns
Rigid schema (SQL) - Structured Query Language
Rules enforced within database
Typically scaled vertically
Supports complex queries and joins
Amazon RDS, Oracle, MySQL, IBM DB2, PostgreSQL
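Example (sketch): the relational traits above, shown with Python's built-in sqlite3 module standing in for an engine such as MySQL or PostgreSQL. The table and column names (customers, orders) are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a relational engine.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys

# Rigid schema: every row must fit the declared columns and constraints.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL CHECK (total >= 0))"
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 25.0)")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 10.0)")

# Rules enforced within the database: an order for a missing customer is rejected.
try:
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (99, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Join across tables to total each customer's orders.
rows = conn.execute(
    "SELECT c.name, SUM(o.total) FROM customers c"
    " JOIN orders o ON o.customer_id = c.id GROUP BY c.name"
).fetchall()
print(rejected, rows)
```

The database itself refuses the bad row; no application-side check is involved.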
Non-Relational Data
Varied data storage models
flexible schema (NoSQL) - data stored in key-value pairs, columns, documents, or graphs
Rules can be defined in application code (outside database)
Scales horizontally - scales seamlessly
Unstructured, simple language that supports any kind of schema
Amazon DynamoDB, MongoDB, Redis, Neo4j
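Example (sketch): flexible schema with rules in application code. A plain Python dict stands in for a key-value or document store; put_item and the item attributes are hypothetical names, not the API of any of the products listed above.

```python
# A dict keyed by item ID stands in for a key-value / document store.
store = {}

def put_item(item):
    # Rule defined in application code, not in the database:
    # every item needs an "id"; all other attributes are optional.
    if "id" not in item:
        raise ValueError("item must have an 'id'")
    store[item["id"]] = item

# Items in the same "table" can carry different attributes.
put_item({"id": "u1", "name": "Ana", "email": "ana@example.com"})
put_item({"id": "u2", "name": "Ben", "tags": ["admin"], "address": {"city": "Lima"}})

# The store would accept anything; only the application check rejects this.
try:
    put_item({"name": "no id"})
    rejected = False
except ValueError:
    rejected = True

print(sorted(store["u2"].keys()), rejected)
```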
Operational/Transactional Data
Online Transaction Processing (OLTP)
Production DBs that process transactions, e.g. adding customer records or checking stock availability (INSERT, UPDATE, DELETE)
Short transactions and simple queries
Relational examples - Amazon RDS, Oracle, IBM DB2, MySQL
Non-relational examples - MongoDB, Cassandra, Neo4j, HBase
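Example (sketch): one short OLTP-style transaction, using Python's built-in sqlite3 module as a stand-in for a production engine. The stock table and the sell helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
)
conn.execute("INSERT INTO stock VALUES ('widget', 3)")
conn.commit()

def sell(sku, amount):
    """One short transaction: decrement stock, or roll back if it would go negative."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("UPDATE stock SET qty = qty - ? WHERE sku = ?", (amount, sku))
        return True
    except sqlite3.IntegrityError:
        return False

first = sell("widget", 2)   # succeeds, 1 left
second = sell("widget", 5)  # would go negative, so it is rolled back
remaining = conn.execute("SELECT qty FROM stock WHERE sku = 'widget'").fetchone()[0]
print(first, second, remaining)
```

Each call touches a single row and finishes quickly, which is the typical OLTP shape.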
Analytical
Online Analytical Processing (OLAP) - the source data comes from OLTP DBs
Data warehouse - typically separated from the customer facing DBs. Data is extracted for decision making
Long transactions and complex queries
Relational examples - Amazon Redshift, Teradata, HP Vertica
Non-relational examples - Amazon EMR, MapReduce
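Example (sketch): an analytical query that scans and aggregates many rows, in contrast to the single-record statements of OLTP. sqlite3 is used only because it ships with Python; the sales table and its figures are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "2024-01", 100.0),
        ("north", "2024-02", 150.0),
        ("south", "2024-01", 80.0),
        ("south", "2024-02", 120.0),
    ],
)

# An analytical query reads every matching row and summarizes it,
# instead of touching one record the way an OLTP statement does.
report = conn.execute(
    "SELECT region, COUNT(*), SUM(amount), AVG(amount)"
    " FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(report)
```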
Databases on AWS:
Database on EC2
need full control over instance and database
third-party database engine that is not available in Amazon RDS (Relational Database Service)
Databases on AWS:
Amazon Relational Database Service (RDS)
Need traditional relational database
e.g. Oracle, PostgreSQL, Microsoft SQL Server, MariaDB
Data is well-formed and structured
managed service that makes it easy to set up, operate, and scale a relational database in the cloud
RDS uses EC2 instances, so you must choose an instance family/type
Relational databases = Structured Query Language (SQL) databases
RDS = Online Transaction Processing (OLTP) database
Easy to set up, highly available, fault tolerant, and scalable
Use cases - online stores and banking systems
Can encrypt your Amazon RDS instances and snapshots at rest by enabling the encryption option for your Amazon RDS DB instance
Encryption uses AWS Key Management Service (KMS)
Supports these database engines: SQL Server, Oracle, MySQL, PostgreSQL, Aurora, MariaDB
Scalability - write capacity scales up (vertically) by increasing instance size (compute and storage)
Fault tolerance / disaster recovery with Multi-AZ option
Automatic failover for Multi-AZ option
Read replicas option for read heavy workloads
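Example (sketch): how an application might use read replicas. Each replica has its own endpoint, so the application (or a proxy) decides where each statement goes. The Router class and the endpoint names are hypothetical, and treating every SELECT as a read is a simplifying assumption.

```python
from itertools import cycle

class Router:
    """Send writes to the primary and spread reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._readers = cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Treat SELECT statements as reads; everything else is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary

# Hypothetical endpoint names.
router = Router("primary.example", ["replica-1.example", "replica-2.example"])
targets = [
    router.endpoint_for("SELECT * FROM orders"),
    router.endpoint_for("INSERT INTO orders VALUES (1)"),
    router.endpoint_for("select count(*) from orders"),
]
print(targets)
```

Replicas are updated asynchronously, so a read sent to a replica may briefly lag behind the primary.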
Amazon DynamoDB
NoSQL database
In-memory performance
High I/O needs
Dynamic scaling - horizontal scaling
fully managed NoSQL database service that provides fast and predictable performance with seamless scalability
push button scaling means that you can scale the DB at any time w/o incurring downtime
data is synchronously replicated across 3 facilities (AZs) in a region
Amazon DynamoDB global tables provide a fully managed solution for deploying a multi-region, multi-master database
DAX is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to 10x performance improvement
Benefits:
- Serverless - fully managed, fault-tolerant service
- Highly available - 99.99% availability SLA
- NoSQL type of database w/ Name/Value structure - flexible schema, good for when data is not well structured or unpredictable
- Horizontal scaling - seamless scalability to any scale with push button scaling or Auto Scaling
- DynamoDB Accelerator (DAX) - fully managed in-memory cache for DynamoDB that increases performance (microsecond latency)
- Backup - point-in-time recovery down to the second in the last 35 days; On-demand backup and restore
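Example (sketch): the idea behind horizontal scaling. Items are spread across partitions by hashing the partition key, so adding partitions spreads the same keys more thinly. The function names are hypothetical, and MD5 is an assumption made for this sketch, not the hash function DynamoDB uses internally.

```python
import hashlib
from collections import defaultdict

def partition_for(key, num_partitions):
    # Stable hash of the partition key, reduced to a partition number.
    # MD5 is an assumption for illustration only.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

def distribute(keys, num_partitions):
    """Group keys by the partition each one hashes to."""
    placement = defaultdict(list)
    for key in keys:
        placement[partition_for(key, num_partitions)].append(key)
    return dict(placement)

keys = [f"user-{i}" for i in range(1000)]
small = distribute(keys, 2)   # few partitions, many keys on each
large = distribute(keys, 8)   # more partitions, fewer keys on each
print({p: len(v) for p, v in sorted(large.items())})
```

The same key always maps to the same partition, which is what lets a lookup go straight to the right place.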
Amazon Redshift
Data warehouse for large volumes of aggregated data
fast, fully managed warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and existing Business Intelligence (BI) tools
a SQL based data warehouse used for analytics applications
a relational database that is used for Online Analytical Processing (OLAP) use cases
uses Amazon EC2 instances, so you must choose an instance family/type
uses columnar data storage
always keeps 3 copies of your data
provides continuous/incremental backups
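Example (sketch): why columnar storage suits analytics. The same records are held in a row layout and a column layout using plain Python structures; this illustrates the idea only and is not Redshift's storage format. The field names are hypothetical.

```python
# The same three records in a row layout and in a column layout.
rows = [
    {"order_id": 1, "region": "north", "amount": 100.0},
    {"order_id": 2, "region": "south", "amount": 80.0},
    {"order_id": 3, "region": "north", "amount": 150.0},
]

# Column layout: one list per column, values aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row layout: summing one field still walks every full record.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: the aggregate reads only the "amount" column.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both layouts give the same answer; the column layout simply reads less data for a query that needs one field out of many.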
Amazon ElastiCache
fast temporary storage for small data amounts
a web service that makes it easy to deploy and run Memcached or Redis protocol-compliant server nodes in the cloud
the in-memory caching provided by ElastiCache can be used to significantly improve latency and throughput for many read-heavy application workloads or compute-intensive workloads
ElastiCache nodes run on Amazon EC2 instances, so you must choose an instance family/type
Use Cases
- Web Session store - in cases with load-balanced web servers, store web session information in Redis so if a server is lost, the session info is not lost, and another web server can pick it up
- Database caching - use Memcached in front of Amazon RDS to cache popular queries to offload work from RDS and return results faster to users
- Leaderboards - Use Redis to provide a live leaderboard for millions of users of your mobile app
- Streaming data dashboards - provide a landing spot for streaming sensor data on the factory floor, providing live real-time dashboard displays
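Example (sketch): the database caching use case, written as a cache-aside lookup. TTLCache is a tiny in-process stand-in for a Memcached or Redis node, and query_database is a hypothetical slow query; neither is a real ElastiCache client API.

```python
import time

class TTLCache:
    """Tiny in-process stand-in for a Memcached or Redis node."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # an expired entry counts as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)

db_calls = {"count": 0}

def query_database(product_id):
    # Hypothetical slow query; counts calls so the cache effect is visible.
    db_calls["count"] += 1
    return {"id": product_id, "name": f"product-{product_id}"}

cache = TTLCache(ttl_seconds=60)

def get_product(product_id):
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached               # cache hit: database not touched
    result = query_database(product_id)
    cache.set(key, result)          # populate the cache for later readers
    return result

first = get_product(42)
second = get_product(42)
print(db_calls["count"])
```

The second lookup is served from the cache, so the database is queried only once.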
Amazon Aurora
AWS database in the RDS family
MySQL- and PostgreSQL-compatible relational database built for the cloud
up to 5x faster than standard MySQL databases and up to 3x faster than standard PostgreSQL databases
features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64 TB per database instance
Memcached
type of ElastiCache engine (2 types)
simplest model, can run large nodes with multiple cores/threads, can be scaled in and out, can cache objects such as DBs
simple, no-frills
you need elasticity (scale out and in)
you need to run multiple CPU cores and threads
you need to cache objects (e.g. database queries)
Redis
type of ElastiCache engine (2 types)
complex model, supports encryption, master/slave replication, cross AZ (HA), automatic failover and backup/restore
you need encryption
you need HIPAA compliance
you need support for clustering
you need complex data types
you need high availability (replication)
you need Pub/Sub capability
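Example (sketch): what "complex data types" buys you, using a leaderboard. SortedSet is an in-process stand-in whose method names mirror the Redis sorted-set commands ZADD, ZINCRBY, and ZREVRANGE; it is not a Redis client, and it handles only non-negative indexes.

```python
class SortedSet:
    """In-process stand-in for a Redis sorted set (member -> score)."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        # Add a member or overwrite its score.
        self._scores[member] = score

    def zincrby(self, member, amount):
        # Increase a member's score, creating it at 0 if missing.
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def zrevrange(self, start, stop):
        # Highest score first; ties broken by member name, descending.
        ranked = sorted(
            self._scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True
        )
        return ranked[start:stop + 1]  # stop is inclusive, as in the Redis command

board = SortedSet()
board.zadd("ana", 120)
board.zadd("ben", 95)
board.zadd("cy", 150)
board.zincrby("ben", 40)          # ben now has 135
top_two = board.zrevrange(0, 1)
print(top_two)
```

With a plain key-value cache the application would have to fetch and sort all scores itself; a sorted set keeps the ranking on the server side.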