Databases and Analytics Flashcards
Amazon Relational Database Service (RDS)
- RDS uses EC2 instances, so you must choose an instance family/type
- Relational databases are known as Structured Query Language (SQL) databases
- RDS is an Online Transaction Processing (OLTP) type of database
- Easy to set up, highly available, fault tolerant, and scalable
RDS Encryption
- Can encrypt your Amazon RDS instances and snapshots at rest
- Encryption uses AWS Key Management Service (KMS)
RDS supported DB engines?
SQL Server, Oracle, MySQL, PostgreSQL, Aurora, MariaDB
RDS scaling measures and DR?
- Scales up by increasing instance size (compute and storage)
- Read replicas option for read-heavy workloads (scales out for reads/queries only)
- Disaster recovery with Multi-AZ option
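The read-replica point can be pictured as read/write splitting in the application. This is a minimal sketch, not an RDS API: the class name `EndpointRouter` and the endpoint hostnames are hypothetical, and treating any statement that starts with SELECT as a read is a simplifying assumption.

```python
import itertools
from typing import List


class EndpointRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql: str) -> str:
        # Simplifying assumption: only statements starting with SELECT are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


router = EndpointRouter(
    "primary.example.internal",  # hypothetical hostnames
    ["replica-1.example.internal", "replica-2.example.internal"],
)
write_target = router.endpoint_for("INSERT INTO orders VALUES (1)")
read_target = router.endpoint_for("SELECT * FROM orders")
```

Replicas only add read capacity: every write still lands on the primary, and because replication to read replicas is asynchronous, a read may briefly return stale data.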
Amazon Aurora
- Amazon Aurora is an AWS database offering in the RDS family
- Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud
- Amazon Aurora features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128TB per database instance
Amazon DynamoDB
- Fully managed NoSQL database service
- Key/value store and document store
- It is a non-relational, key-value type of database
- Fully serverless service
- Push button scaling
Amazon DynamoDB features and benefits
Serverless - Fully managed, fault-tolerant service
Highly available - 99.99% availability SLA; 99.999% for Global Tables
NoSQL type of database with Name / Value structure - Flexible schema, good for when data is not well structured or is unpredictable
Horizontal scaling - Seamless scalability to any scale with push button scaling or Auto Scaling
DynamoDB Accelerator (DAX) - Fully managed in-memory cache for DynamoDB that increases performance (microsecond latency)
Backup - Point-in-time recovery down to the second within the last 35 days; on-demand backup and restore
Global Tables - Fully managed multi-region, multi-master solution
Amazon Redshift
- Redshift is a SQL-based data warehouse used for analytics applications
- Redshift is a relational database that is used for Online Analytical Processing (OLAP) use cases
- Redshift uses Amazon EC2 instances, so you must choose an instance family/type
- Redshift always keeps three copies of your data
- Redshift provides continuous/incremental backups
Amazon EMR
- Managed cluster platform that simplifies running big data frameworks including Apache Hadoop and Apache Spark
- Used for processing data for analytics and business intelligence
- Can also be used for transforming and moving large amounts of data
- Performs extract, transform, and load (ETL) functions
Amazon ElastiCache
- Fully managed implementations of Redis and Memcached
- ElastiCache is a key/value store
- In-memory database offering high performance and low latency
- Can be put in front of databases such as RDS and DynamoDB
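One common way to put a cache in front of a database is the cache-aside (lazy loading) pattern. This is a minimal sketch under stated assumptions: a plain dict stands in for the Redis or Memcached client, and `CacheAside`, `load_from_db`, and the 60-second TTL are hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Cache-aside (lazy loading) in front of a slower backing store.

    A dict stands in for the in-memory cache; in a real deployment a
    Redis or Memcached client would play that role.
    """

    def __init__(self, load_from_db: Callable[[str], Optional[str]],
                 ttl_seconds: float = 60.0):
        self._load_from_db = load_from_db  # hypothetical slow database lookup
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, Optional[str]]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        now = time.monotonic()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: serve from memory without touching the database.
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the database, then populate the cache.
        self.misses += 1
        value = self._load_from_db(key)
        self._cache[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after a write so the next read reloads the current value.
        self._cache.pop(key, None)
```

Only the first read of a key pays the database latency; later reads within the TTL come from memory, which is where the low-latency benefit comes from.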
Amazon Athena
- Athena queries data in S3 using SQL
- Can be connected to other data sources with Lambda
- Data can be in CSV, TSV, JSON, Parquet and ORC formats
- Uses a managed Data Catalog (AWS Glue) to store
information and schemas about the databases and tables
AWS Glue
- Fully managed extract, transform and load (ETL) service
- Used for preparing data for analytics
- AWS Glue runs the ETL jobs on a fully managed, scale-out Apache Spark environment
- Works with data lakes (e.g. data on S3), data warehouses (including Redshift), and data stores (including RDS or EC2 databases)
Amazon Kinesis Data Streams
- Producers send data which is stored in shards for up to 7 days
- Consumers process the data and save to another service
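How a record ends up in a particular shard can be sketched as follows. Kinesis Data Streams hashes each record's partition key with MD5 and routes the record to the shard whose hash key range contains the result. The sketch only models that routing: it does not call the service, and splitting the hash space into equal ranges is an assumption made for illustration. The function names are hypothetical.

```python
import hashlib
from typing import List, Tuple

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split the 128-bit hash space into contiguous, roughly equal ranges."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the range that contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash value outside all ranges")


ranges = shard_ranges(4)
shard_index = shard_for("device-42", ranges)  # same key, same shard every time
```

Records that share a partition key always land in the same shard, which keeps them in order for consumers, so a poorly distributed key can overload a single shard.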
Amazon Kinesis Data Firehose
- No shards, completely automated and elastically scalable
- Saves data directly to another service such as S3, Splunk, Redshift, or Elasticsearch
Amazon Kinesis Data Analytics
- Provides real-time SQL processing for streaming data
AWS Data Pipeline
* Processes and moves data between different AWS compute and
storage services
* Saves results to services including S3, RDS, DynamoDB, and EMR
Amazon QuickSight
* Business intelligence (BI) service
* Create and publish interactive BI dashboards for Machine
Learning-powered insights
Amazon Neptune
* Fully managed graph database service
Amazon DocumentDB
* Fully managed document database service (non-relational)
* Supports MongoDB workloads
* Queries and indexes JSON data
Amazon QLDB
- Fully managed ledger database for immutable change history
- Provides cryptographically verifiable transaction logging
Amazon Managed Blockchain
- Fully managed service for joining public and private networks
using Hyperledger Fabric and Ethereum
AWS Migration Hub
- Provides a single location to track the progress of application
migrations across multiple AWS and partner solutions
AWS Database Migration Service (DMS)
- AWS Database Migration Service helps you migrate databases to AWS quickly and securely
- The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database
AWS Server Migration Service (SMS)
- Migrates servers and virtual machines to Amazon EC2
- Agentless service which makes it easier and faster for you to migrate thousands of on-premises workloads to AWS
- Automate, schedule, and track incremental replications of live server volumes
AWS DataSync
- Online data transfer service
- Transfer data between on-premises and AWS storage
services
Snowball Family
- AWS Snowball and Snowmobile are used for migrating large volumes of data to AWS
- Uses a secure storage device for physical transportation
- Snowball (50TB or 80TB): “petabyte scale”
- Snowball Edge (100TB): “petabyte scale”
- Snowmobile: “exabyte scale” with up to 100PB per Snowmobile
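The capacities above translate into a quick sizing estimate. This is a rough sketch that takes the headline figures from this card at face value. It assumes 1PB = 1,000TB and ignores the fact that usable capacity is lower than the headline number. `devices_needed` is a hypothetical helper.

```python
import math

# Headline capacities in TB, taken from the card above (1PB treated as 1,000TB).
CAPACITY_TB = {
    "Snowball": 80,
    "Snowball Edge": 100,
    "Snowmobile": 100_000,
}


def devices_needed(dataset_tb: float, device: str) -> int:
    """Round up: a partially filled device still has to be shipped."""
    if dataset_tb <= 0:
        raise ValueError("dataset_tb must be positive")
    return math.ceil(dataset_tb / CAPACITY_TB[device])


snowball_edge_count = devices_needed(250, "Snowball Edge")  # 250 / 100 rounds up to 3
```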
Types
Snowball Edge Compute Optimized
Snowball Edge Storage Optimized
Snowcone
Snowball Edge Compute Optimized
- Provides block and object storage and optional GPU
- Edge computing use cases
Snowball Edge Storage Optimized
- Provides block storage and Amazon S3-compatible object storage
- Use for local storage and large-scale data transfer
Snowcone
- Small device used for edge computing, storage and data transfer
- Can transfer data offline or online with AWS DataSync agent
Amazon Rekognition
- Add image and video analysis to your applications
- Identify objects, people, text, scenes, and activities in images
and videos
Amazon Transcribe
- Add speech to text capabilities to applications
- Recorded speech can be converted to text before it can be
used in applications
Amazon Translate
- Neural machine translation service that delivers fast, high-quality, and affordable language translation
- Localize content such as websites and applications for your
diverse users
Amazon SageMaker
- Helps data scientists and developers to prepare, build, train,
and deploy high-quality machine learning (ML) models
Amazon Comprehend
- Natural-language processing (NLP) service
- Uses machine learning to uncover information in unstructured
data
Amazon Lex
- Conversational AI for Chatbots
- Build conversational interfaces into any application using voice
and text
Amazon Polly
- Turns text into lifelike speech
- Create applications that talk, and build entirely new categories of
speech-enabled products
Amazon WorkSpaces
- Managed Desktop-as-a-Service (DaaS) solution
- Provision either Windows or Linux desktops
Amazon AppStream 2.0
- Fully managed non-persistent application streaming service
- Alternative to popular products such as Citrix XenApp
Amazon WorkLink
- Provides secure, one-click access to your internal websites and web apps using mobile phone browsers
- Does not require a VPN client or app
Amazon WorkDocs
- Fully managed, secure content creation, storage, and collaboration service
- Create, edit, and share content that’s centrally stored on AWS
AWS IoT Core
- Lets you connect IoT devices to the AWS cloud without the
need to provision or manage servers