AWS Services Flashcards
Amazon Athena
Analytics
Use SQL to query S3, save output to S3
Can use for preprocessing, feature engineering
Serverless; less performant than a data warehouse, but more convenient (no infrastructure to manage)
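As a rough illustration of "query S3, save output to S3", the sketch below builds the arguments for Athena's StartQueryExecution call as exposed by boto3. The table, database, and bucket names are hypothetical, and the actual call is left in a comment because it needs AWS credentials.

```python
def athena_query_request(sql: str, database: str, output_s3_uri: str) -> dict:
    """Build keyword arguments for boto3's athena.start_query_execution."""
    if not output_s3_uri.startswith("s3://"):
        raise ValueError("Athena writes results to S3, so the output must be an s3:// URI")
    return {
        "QueryString": sql,                                        # SQL run against data in S3
        "QueryExecutionContext": {"Database": database},           # catalog database holding the table
        "ResultConfiguration": {"OutputLocation": output_s3_uri},  # where result files are saved
    }


# Hypothetical names, for illustration only.
request = athena_query_request(
    sql="SELECT label, AVG(amount) AS avg_amount FROM transactions GROUP BY label",
    database="ml_features",
    output_s3_uri="s3://example-bucket/athena-results/",
)

# With credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```

An aggregate query like this one is the kind of feature engineering step the card refers to; the result lands in S3, where a training job can read it.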
Amazon Elastic MapReduce
EMR
Analytics
Distributed data processing
Massive parallel compute tasks
Single master node coordinates the cluster; core nodes (scalable) run tasks and store HDFS data; optional task nodes (scalable) run tasks only
Apache Spark - fast analytics engine, can run on EMR or SageMaker
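The master, core, and task roles on this card correspond to instance groups in an EMR cluster definition. A minimal sketch, assuming the boto3 `run_job_flow` request shape; the helper name, instance type, and counts are arbitrary choices for illustration.

```python
def emr_instance_groups(core_count: int, task_count: int = 0,
                        instance_type: str = "m5.xlarge") -> list:
    """Describe an EMR cluster as one master group, one core group, and an optional task group."""
    groups = [
        # Exactly one master node coordinates the cluster.
        {"Name": "Master", "InstanceRole": "MASTER",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core nodes run tasks and hold HDFS data.
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task nodes only add compute; Spot pricing is an assumption here, not a requirement.
        groups.append({"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
                       "InstanceType": instance_type, "InstanceCount": task_count})
    return groups


groups = emr_instance_groups(core_count=2, task_count=4)

# With credentials configured, the list would go into the cluster request:
#   boto3.client("emr").run_job_flow(..., Instances={"InstanceGroups": groups}, ...)
```

Scaling the cluster then amounts to changing `InstanceCount` on the core or task group; the master group stays at one.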
Amazon Kinesis (basic functionality and four services)
Analytics
Ingesting large scale data, highly scalable
Amazon Kinesis Data Analytics
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
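For the ingestion side, a record written to Kinesis Data Streams is a blob of bytes plus a partition key. A minimal sketch, assuming the boto3 `put_record` request shape; the stream name and event fields are hypothetical.

```python
import json


def kinesis_record(stream_name: str, event: dict, partition_key: str) -> dict:
    """Build keyword arguments for boto3's kinesis.put_record."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),  # payload must be bytes
        "PartitionKey": partition_key,              # records sharing a key go to the same shard
    }


# Hypothetical stream and event, for illustration only.
record = kinesis_record(
    stream_name="clickstream",
    event={"user_id": "u-42", "action": "view"},
    partition_key="u-42",
)

# With credentials configured, the record could be sent like this:
#   import boto3
#   boto3.client("kinesis").put_record(**record)
```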
Amazon QuickSight
Analytics
BI tool
Reporting and data visualization
AWS Batch
Compute
Dynamically provisions compute resources for your batch jobs (EC2, Fargate, Spot Instances, etc.)
Amazon Elastic Compute Cloud
EC2
Compute
Scalable compute instances
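Amazon Machine Image (AMI) - instance template; Deep Learning AMIs ship conda environments with ML libraries and GPU drivers
Instance types for ML: compute optimized or accelerated computing (GPU)
GPU: P2 family (ml.p2 in SageMaker)
CPU recommended: M4 or C4 families (ml.m4 or ml.c4 in SageMaker)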
Amazon Elastic Container Registry
ECR
Containers
Managed container image registry
Amazon Elastic Container Service
ECS
Containers
Run and orchestrate containers on EC2 or Fargate
Amazon Elastic Kubernetes Service
EKS
Containers
Managed Kubernetes for deploying and managing containers at scale
AWS Glue
Database
Data integration and ETL; crawlers scan S3 data to infer its schema, which is stored in the Data Catalog
Easy to set up and run with minimal effort
ETL scripts in Python or Scala
Job system - managed infrastructure for ETL workflows
Crawlers and Classifiers - scan data, classify, extract schema info, store metadata
Data Catalog - store, annotate, and share metadata
ETL operations - auto generate ETL scripts based on metadata
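To make the crawler idea concrete, here is a toy schema inference over CSV text in plain Python. It is not Glue's actual classifier logic; the function names are hypothetical, and the type names (`bigint`, `double`, `string`) are simply borrowed from the style of types seen in a data catalog.

```python
import csv
import io


def infer_type(values: list) -> str:
    """Pick the narrowest type that fits every non-empty value."""
    non_empty = [v for v in values if v]  # skip empty and missing cells
    if not non_empty:
        return "string"
    for type_name, cast in (("bigint", int), ("double", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue  # at least one value did not parse; try the next wider type
    return "string"


def infer_schema(csv_text: str) -> dict:
    """Scan the rows of a CSV document and return a column -> type mapping."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return {col: infer_type([row[col] for row in rows]) for col in columns}


sample = "id,price,city\n1,9.99,Oslo\n2,12.50,Lima\n"
schema = infer_schema(sample)
print(schema)  # {'id': 'bigint', 'price': 'double', 'city': 'string'}
```

The resulting mapping is the kind of metadata a catalog stores, and it is what an auto-generated ETL script would rely on to know each column's type.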
Amazon Redshift
Database
Data warehouse
AWS IoT Greengrass
Internet of Things
Build, deploy, and manage software on IoT edge devices
Control an IoT device fleet
AWS CloudTrail
Management and Governance
Logs API calls and account activity (console, CLI, SDKs) for auditing
Amazon CloudWatch
Management and Governance
Monitor metrics and logs; set alarms on thresholds
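Besides the metrics AWS services emit on their own, custom metrics can be published. A minimal sketch, assuming the boto3 `put_metric_data` request shape; the namespace, metric name, and dimension are hypothetical.

```python
def metric_request(namespace: str, name: str, value: float,
                   unit: str = "None", **dimensions: str) -> dict:
    """Build keyword arguments for boto3's cloudwatch.put_metric_data."""
    return {
        "Namespace": namespace,  # custom namespaces must not start with "AWS/"
        "MetricData": [{
            "MetricName": name,
            "Value": float(value),
            "Unit": unit,
            # Dimensions are name/value pairs that identify the metric's source.
            "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        }],
    }


# Hypothetical metric, for illustration only.
request = metric_request("Example/Training", "ValidationLoss", 0.231, ModelName="demo-model")

# With credentials configured, the data point could be published like this:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**request)
```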
Amazon Virtual Private Cloud
VPC
Networking and Content Delivery
Manage an isolated virtual network (subnets, route tables, security groups)