Databases and Analytics Flashcards
Amazon Relational Database Service (RDS)
- RDS uses EC2 instances, so you must choose an instance family/type
- Relational databases are known as Structured Query Language (SQL) databases
- RDS is an Online Transaction Processing (OLTP) type of database
- Easy to set up, highly available, fault tolerant, and scalable
RDS Encryption
- Can encrypt your Amazon RDS instances and snapshots at rest
- Encryption uses AWS Key Management Service (KMS)
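As a rough illustration of the two points above, encryption at rest is requested when the instance is created. The sketch below only builds the keyword arguments that boto3's RDS `create_db_instance` call accepts; the identifier, engine, instance class, storage size, and KMS key alias are placeholder assumptions, and nothing is sent to AWS.

```python
def encrypted_instance_params(identifier, kms_key_id):
    """Build create_db_instance keyword arguments for an encrypted instance.

    All concrete values are illustrative assumptions, not recommendations.
    Master credentials are omitted here; a real call would need them.
    """
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",                 # assumed engine
        "DBInstanceClass": "db.t3.micro",  # assumed instance family/type
        "AllocatedStorage": 20,            # GiB, assumed
        "StorageEncrypted": True,          # encrypt the instance at rest
        "KmsKeyId": kms_key_id,            # KMS key used for the encryption
    }


params = encrypted_instance_params("flashcards-db", "alias/example-key")
# With credentials added, these could be passed along as:
#   boto3.client("rds").create_db_instance(**params)
```

Snapshots taken from an encrypted instance are encrypted as well, which is why the flag is set on the instance rather than per snapshot.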
RDS supported DB engines?
SQL Server, Oracle, MySQL, PostgreSQL, Aurora, MariaDB
RDS scaling measures and DR?
- Scales up by increasing instance size (compute and storage)
- Read replicas option for read heavy workloads (scales out for
reads/queries only) - Disaster recovery with Multi-AZ option
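One way to picture "scales out for reads only" is a small router in the application: writes always go to the primary, while reads are spread across the replicas. This is a minimal sketch, not an AWS API; the class name, the endpoint strings, and the "starts with SELECT" test are all simplifying assumptions.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes to the primary and rotate reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._readers = cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Simplification: treat any statement starting with SELECT as a read.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._readers) if is_read else self.primary


# Hypothetical endpoint names, used only for illustration.
router = ReadWriteRouter(
    "primary.example.internal",
    ["replica-1.example.internal", "replica-2.example.internal"],
)
write_target = router.endpoint_for("INSERT INTO orders VALUES (1)")
read_target = router.endpoint_for("SELECT * FROM orders")
```

Adding replicas raises read capacity only; every write still lands on the single primary, which is why write scaling relies on a larger instance size.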
Amazon Aurora
- Amazon Aurora is an AWS database offering in the RDS family
- Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud
- Amazon Aurora features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128 TB per database instance
Amazon DynamoDB
- Fully managed NoSQL database service
- Key/value store and document store
- It is a non-relational, key-value type of database
- Fully serverless service
- Push button scaling
Amazon DynamoDB features and benefits
Serverless - Fully managed, fault-tolerant service
Highly available - 99.99% availability SLA – 99.999% for Global Tables
NoSQL type of database with Name / Value structure - Flexible schema, good for when data is not well structured or unpredictable
Horizontal scaling - Seamless scalability to any scale with push button scaling or Auto Scaling
DynamoDB Accelerator (DAX) - Fully managed in-memory cache for DynamoDB that increases performance (microsecond latency)
Backup - Point-in-time recovery down to the second in last 35 days; On-demand backup and restore
Global Tables - Fully managed multi-region, multi-master solution
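To make the "name/value structure" and "flexible schema" points concrete, the sketch below converts plain Python dicts into the typed attribute-value form that DynamoDB's low-level API uses (`S`, `N`, `BOOL`, `NULL`, `L`, `M`). The helper names and the sample items are assumptions for illustration; boto3 ships its own serializer, and this simplified version handles only integers as numbers.

```python
def to_attribute_value(value):
    """Convert a Python value into a DynamoDB-style typed attribute value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, int):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def to_item(record):
    """Convert a whole record (dict) into an item of typed attributes."""
    return {name: to_attribute_value(value) for name, value in record.items()}


# Two items for the same table with different attributes: only the key
# attribute ("pk", an assumed name) has to be present on every item.
book = to_item({"pk": "item#1", "title": "Dune", "pages": 412})
song = to_item({"pk": "item#2", "artist": "Example", "explicit": False, "tags": ["a", "b"]})
```

Here `book` and `song` share nothing but the key attribute, which is what a flexible schema means in practice.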
Amazon Redshift
- Redshift is a SQL based data warehouse used for analytics applications
- Redshift is a relational database that is used for Online Analytical Processing (OLAP) use cases
- Redshift uses Amazon EC2 instances, so you must choose an instance family/type
- Redshift always keeps three copies of your data
- Redshift provides continuous/incremental backups
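The OLTP/OLAP distinction is mostly about query shape. In the sketch below, an in-memory SQLite database stands in for both kinds of system purely for illustration (it is neither RDS nor a data warehouse); the table and its columns are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("eu", 10.0), ("eu", 15.0), ("us", 7.5)],
)

# OLTP style: small reads and writes that touch individual rows by key.
conn.execute("UPDATE orders SET amount = 12.0 WHERE id = 1")
one_order = conn.execute("SELECT region, amount FROM orders WHERE id = 1").fetchone()

# OLAP style: scan many rows and aggregate them for reporting.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

conn.close()
```

The first pair of statements is the kind of work an OLTP database handles all day; the aggregate at the end is the kind of query a warehouse is built to run over very large tables.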
Amazon EMR
- Managed cluster platform that simplifies running big data frameworks including Apache Hadoop and Apache Spark
- Used for processing data for analytics and business intelligence
- Can also be used for transforming and moving large amounts of data
- Performs extract, transform, and load (ETL) functions
Amazon ElastiCache
- Fully managed implementations of Redis and Memcached
- ElastiCache is a key/value store
- In-memory database offering high performance and low latency
- Can be put in front of databases such as RDS and DynamoDB
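Putting a cache "in front of" a database usually means the cache-aside pattern: look in the cache first, and only on a miss query the database and store the result. The sketch below uses a plain dict in place of Redis or Memcached; the class name, the TTL value, and the loader function are assumptions for illustration.

```python
import time


class CacheAside:
    """Cache-aside lookup with a time-to-live, backed by an in-process dict."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db  # slow lookup, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: the database is not touched
        value = self._load(key)    # cache miss or expired entry
        self._store[key] = (now + self._ttl, value)
        return value


db_calls = []


def slow_lookup(key):
    """Stand-in for a database query; records each call it receives."""
    db_calls.append(key)
    return key.upper()


cache = CacheAside(slow_lookup)
first = cache.get("user:1")   # miss, loads from the "database"
second = cache.get("user:1")  # hit, served from memory
```

The TTL bounds how stale a cached value can get, at the cost of an occasional extra database read.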
Amazon Athena
- Athena queries data in S3 using SQL
- Can be connected to other data sources with Lambda
- Data can be in CSV, TSV, JSON, Parquet and ORC formats
- Uses a managed Data Catalog (AWS Glue) to store
information and schemas about the databases and tables
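A query against S3 data might be submitted roughly as follows. The sketch only assembles the arguments that boto3's Athena `start_query_execution` call takes; the database name, table name, columns, and result bucket are hypothetical, and no request is made.

```python
def athena_query_request(database, output_location):
    """Build start_query_execution arguments for a simple aggregate query."""
    # "access_logs" and "status" are assumed names for a table and column
    # registered in the Data Catalog.
    sql = (
        "SELECT status, COUNT(*) AS hits "
        "FROM access_logs "
        "GROUP BY status"
    )
    return {
        "QueryString": sql,
        # The database is resolved through the Data Catalog.
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results back to S3.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = athena_query_request("logs_db", "s3://example-bucket/athena-results/")
# These could then be passed along as:
#   boto3.client("athena").start_query_execution(**request)
```

Note that both the input data and the query results live in S3; Athena itself stores only the catalog metadata.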
AWS Glue
- Fully managed extract, transform and load (ETL) service
- Used for preparing data for analytics
- AWS Glue runs the ETL jobs on a fully managed, scale-out Apache Spark environment
- Works with data lakes (e.g. data on S3), data warehouses (including Redshift), and data stores (including RDS or EC2 databases)
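The three ETL stages can be sketched in a few lines of plain Python. This is not the Glue API and does not use Spark; it only shows what "extract, transform, load" means on a tiny, made-up CSV sample, with JSON Lines assumed as the output format.

```python
import csv
import io
import json

# Made-up raw input, standing in for a file in a data lake.
RAW_CSV = "user_id,country,amount\n1,de,10.50\n2,US,3.00\n3,,7.25\n"


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without a country, fix types, normalize case."""
    cleaned = []
    for row in rows:
        if not row["country"]:
            continue
        cleaned.append({
            "user_id": int(row["user_id"]),
            "country": row["country"].upper(),
            "amount": float(row["amount"]),
        })
    return cleaned


def load(rows):
    """Load: serialize to JSON Lines, ready to write to a target store."""
    return "\n".join(json.dumps(row) for row in rows)


output = load(transform(extract(RAW_CSV)))
```

A managed ETL service runs the same kind of pipeline, but distributed across a cluster and driven by the schemas held in the catalog.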
Amazon Kinesis Data Streams
- Producers send data which is stored in shards (retained for 24 hours by default, extendable up to 365 days)
- Consumers process the data and save to another service
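How a record ends up in a particular shard can be sketched as follows. Kinesis hashes each record's partition key with MD5 into a 128-bit number, and each shard owns a range of that hash space. The function below assumes the shards split the space evenly, which is a simplification; the partition keys shown are made up.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_for(partition_key, shard_count):
    """Return the index of the shard whose hash range contains the key.

    Assumes shard_count shards that divide the hash space evenly.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


# Records with the same partition key always map to the same shard,
# which preserves their relative order for consumers.
assignments = {key: shard_for(key, 4) for key in ["device-1", "device-2", "device-3"]}
```

A skewed choice of partition key (for example, one very busy device) concentrates traffic on a single shard, so keys are usually chosen to spread load.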
Amazon Kinesis Data Firehose
- No shards, completely automated and elastically scalable
- Saves data directly to another service such as S3, Splunk, Redshift, or Elasticsearch
Amazon Kinesis Data Analytics
- Provides real-time SQL processing for streaming data
AWS Data Pipeline
- Processes and moves data between different AWS compute and storage services
- Saves results to services including S3, RDS, DynamoDB, and EMR