Database & Analytics Flashcards
Databases
A database is an organized collection of structured information, or data, typically stored electronically in a computer system.
• You build indexes to efficiently query / search through the data
• You define relationships between your datasets
Relational Databases
A relational database is a collection of information organized in predefined relationships, with data stored in one or more tables of columns and rows
• Can use the SQL language to perform queries / lookups
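The ideas above (tables of rows and columns, a relationship between datasets, an index, and a SQL lookup) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customers/orders tables, their columns, and the sample rows are made up for the example and are not tied to any AWS service.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory relational database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the relationship
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " total REAL NOT NULL)"
    )
    # Index so lookups of orders by customer do not scan the whole table.
    conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(1, 20.0), (1, 5.5), (2, 12.0)],
    )
    conn.commit()
    return conn


def totals_per_customer(conn):
    """SQL query that joins the two tables through their relationship."""
    return conn.execute(
        "SELECT c.name, SUM(o.total) "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.id, c.name ORDER BY c.name"
    ).fetchall()


demo_conn = build_demo_db()
print(totals_per_customer(demo_conn))
```

The same SQL would run largely unchanged on a managed relational engine; only the connection setup would differ.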
NoSQL Databases
• NoSQL databases are purpose-built for specific data models and have flexible schemas for building modern applications
• Benefits: Flexibility, Scalability, High-performance, Highly functional
• Examples: Key-value, document, graph, in-memory, search databases
NoSQL data example: JSON
• JSON = JavaScript Object Notation
• JSON is a common form of data that fits into a NoSQL model
• Data can be nested
• Fields can change over time
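A short sketch of the two JSON properties above, using Python's standard json module. The two records and their field names are invented for illustration: the first is an older document, the second is a newer one that drops a nested field and adds a new "tags" field.

```python
import json

# Hypothetical documents: same kind of record, different shapes.
old_record = '{"id": 1, "name": "Ada", "address": {"city": "London", "zip": "N1"}}'
new_record = (
    '{"id": 2, "name": "Grace", "address": {"city": "New York"},'
    ' "tags": ["navy", "compiler"]}'
)


def city_of(raw):
    """Read a nested field; tolerate documents that lack it."""
    doc = json.loads(raw)
    return doc.get("address", {}).get("city")


def tags_of(raw):
    """Read a field that only newer documents have."""
    return json.loads(raw).get("tags", [])


for raw in (old_record, new_record):
    print(city_of(raw), tags_of(raw))
```

Because there is no fixed schema, the reading code has to handle missing fields itself, which is the trade-off for the flexibility.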
Databases & Shared Responsibility on AWS
• AWS offers managed services for different types of databases
• Benefits include:
• Quick Provisioning, High Availability, Vertical and Horizontal Scaling
• Automated Backup & Restore, Operations, Upgrades
• Operating System Patching is handled by AWS
• Monitoring, alerting
AWS RDS
• RDS stands for Relational Database Service
• It’s a managed database service for databases that use SQL as a query language.
• It allows you to create databases in the cloud that are managed by AWS
• Postgres
• MySQL
• MariaDB
• Oracle
• Microsoft SQL Server
• Aurora (AWS Proprietary database)
Advantages of using RDS versus deploying a DB on EC2
• Automated provisioning, OS patching
• Continuous backups and restore to a specific timestamp (Point in Time Restore)
• Monitoring dashboards
• Read replicas for improved read performance
• Multi AZ setup for DR (Disaster Recovery)
• Maintenance windows for upgrades
• Scaling capability (vertical and horizontal)
• Storage backed by EBS (gp2 or io1)
Amazon Aurora
• Aurora is a proprietary technology from AWS (not open sourced)
• Aurora is compatible with both PostgreSQL and MySQL
• Aurora is “AWS cloud optimized”, better performance than RDS
• Aurora storage automatically grows in increments of 10 GB, up to 128 TB (64 TB in older engine versions).
• Aurora costs more than RDS (20% more) – but is more efficient
RDS Deployments: Read Replicas, Multi-AZ
• Read Replicas:
• Scale the read workload of your DB
• Can create up to 5 Read Replicas
• Data is only written to the main DB
• Multi-AZ:
• Failover in case of AZ outage (high availability)
• Data is only read from and written to the main database
• Can only have 1 other AZ as failover
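The read replica behaviour above (writes go only to the main DB, reads are spread across replicas) can be sketched as a toy model. This is not how RDS is implemented and it uses no AWS API; the class, its method names, and the explicit replicate() step are assumptions made for the example. The separate replication step stands in for asynchronous replication, so a replica can briefly return stale data. Multi-AZ failover is not modelled here.

```python
import itertools


class ReplicatedDatabase:
    """Toy model: one primary takes writes, read replicas serve reads."""

    def __init__(self, replica_count):
        if replica_count < 1:
            raise ValueError("need at least one read replica")
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Round-robin over replica indexes to spread the read workload.
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        """Writes only ever touch the primary."""
        self.primary[key] = value

    def replicate(self):
        """Copy the primary's state to every replica (stand-in for async replication)."""
        for replica in self.replicas:
            replica.update(self.primary)

    def read(self, key):
        """Serve the read from the next replica; returns (replica index, value)."""
        index = next(self._next_replica)
        return index, self.replicas[index].get(key)


db = ReplicatedDatabase(replica_count=2)
db.write("user:1", "Ada")
print(db.read("user:1"))  # replica has not caught up yet
db.replicate()
print(db.read("user:1"))
```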
RDS Deployments: Multi-Region
• Multi-Region (Read Replicas)
• Disaster recovery in case of region issue
• Local performance for global reads
• Replication cost
Amazon ElastiCache
• ElastiCache provides managed Redis or Memcached
• Caches are in-memory databases with high performance, low latency
• Helps take load off databases for read-intensive workloads
• You save query results somewhere else, so that they’re very readily available.
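The caching idea above is often implemented as a "cache-aside" lookup: check the cache first and fall back to the database only on a miss. The sketch below uses a plain Python dict in place of Redis or Memcached so it can run anywhere; the CacheAside class, its parameters, and the slow_query function are hypothetical names for illustration, not part of ElastiCache.

```python
import time


class CacheAside:
    """Cache-aside lookup with a time-to-live, backed by an in-process dict."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # function that runs the real query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1             # served from memory, database untouched
            return entry[1]
        self.misses += 1               # absent or expired: query the database
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def slow_query(key):
    """Stand-in for an expensive database read."""
    return key.upper()


cache = CacheAside(slow_query)
cache.get("report")
cache.get("report")
print(cache.hits, cache.misses)
```

The time-to-live bounds how stale a cached result can get; choosing it is a trade-off between freshness and database load.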
DynamoDB
• Fully managed, highly available, with replication across 3 AZs
• Serverless NoSQL database
• Automatically scales up and down to adjust for capacity and maintain performance
• Millions of requests per second, 100s of TB of storage
• Single-digit millisecond latency – low latency retrieval
DynamoDB – type of data
• DynamoDB is a key/value database
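A key/value model means each item is stored and fetched by its key rather than found through joins. The toy table below illustrates that access pattern with a dict; it is not the DynamoDB API. The "pk" and "sk" attribute names (standing for a partition key and an optional sort key) and the sample items are assumptions made for the example.

```python
class KeyValueTable:
    """Toy key/value table: items are addressed by (pk, sk)."""

    def __init__(self):
        self._items = {}

    def put_item(self, item):
        """Store an item under its key; an existing item with that key is replaced."""
        key = (item["pk"], item.get("sk"))
        self._items[key] = dict(item)

    def get_item(self, pk, sk=None):
        """Fetch one item by its full key, or None if it does not exist."""
        item = self._items.get((pk, sk))
        return dict(item) if item is not None else None


table = KeyValueTable()
# Items in the same table do not need to share the same attributes.
table.put_item({"pk": "user#1", "sk": "profile", "name": "Ada"})
table.put_item({"pk": "user#1", "sk": "order#7", "total": 20})
print(table.get_item("user#1", "profile"))
```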
DynamoDB Accelerator - DAX
• Fully managed in-memory cache for DynamoDB
• 10x performance improvement – from single-digit millisecond latency to microsecond latency
• Secure, highly scalable & highly available
DynamoDB – Global Tables
• Make a DynamoDB table accessible with low latency in multiple regions
• Active-Active replication (read/write to any AWS Region)