AWS Services Flashcards
Amazon Athena
Analytics
Use SQL to query S3, save output to S3
Can use for preprocessing, feature engineering
Less performant than data warehouse, but more convenient
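As a rough illustration of the query-S3-and-save-to-S3 flow on this card, here is a minimal sketch built around boto3's Athena client. The database, table, and bucket names are hypothetical placeholders, and the call to AWS is kept in a separate function so the request can be inspected without credentials.

```python
def build_athena_request(sql, database, output_s3_uri):
    """Assemble keyword arguments for Athena's StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes the query results to this S3 prefix.
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


def run_query(request):
    """Submit the query; needs boto3 and AWS credentials (not called here)."""
    import boto3  # imported lazily so the sketch loads without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
request = build_athena_request(
    sql="SELECT label, COUNT(*) AS n FROM training_data GROUP BY label",
    database="ml_features",
    output_s3_uri="s3://example-bucket/athena-results/",
)
```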
Amazon Elastic MapReduce
EMR
Analytics
Distributed data processing
Massive parallel compute tasks
Single master (primary) node manages the cluster; core nodes (scalable) run tasks and store data in HDFS; task nodes (scalable, optional) only run tasks
Apache Spark - fast analytics engine, can run on EMR or SageMaker
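The node roles on this card can be sketched as the instance-group list that boto3's EMR `run_job_flow` call accepts under `Instances["InstanceGroups"]`. This is only a sketch: the helper name, group names, instance type, and the choice to put task nodes on Spot are assumptions for illustration.

```python
def build_instance_groups(core_count, task_count, instance_type="m5.xlarge"):
    """Describe an EMR cluster as one master group, one core group,
    and an optional task group (hypothetical helper)."""
    groups = [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",  # exactly one node coordinates the cluster
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",  # scalable
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        groups.append(
            {
                "Name": "Task",
                "InstanceRole": "TASK",  # scalable, optional extra compute
                "Market": "SPOT",  # assumption: run task nodes on Spot
                "InstanceType": instance_type,
                "InstanceCount": task_count,
            }
        )
    return groups


groups = build_instance_groups(core_count=2, task_count=4)
```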
Amazon Kinesis (basic functionality and four services)
Analytics
Ingesting large scale data, highly scalable
Amazon Kinesis Data Analytics
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
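For ingestion into Kinesis Data Streams, a producer sends records with a partition key. The sketch below builds the arguments for boto3's `put_record`; the stream name and event fields are hypothetical, and the network call is isolated in a function that is not run here.

```python
import json


def build_put_record(stream_name, event, partition_key):
    """Assemble keyword arguments for the Kinesis PutRecord API."""
    return {
        "StreamName": stream_name,
        # Kinesis stores opaque bytes, so serialize the event first.
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key land on the same shard.
        "PartitionKey": str(partition_key),
    }


def send(record):
    """Send the record; needs boto3 and AWS credentials (not called here)."""
    import boto3  # imported lazily so the sketch loads without boto3

    return boto3.client("kinesis").put_record(**record)


# Hypothetical stream and payload, for illustration only.
event = {"sensor_id": 17, "temperature": 21.5}
record = build_put_record("example-stream", event, event["sensor_id"])
```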
Amazon QuickSight
Analytics
BI tool
reporting, visualize data
AWS Batch
Compute
Dynamically provisions compute resources for your batch jobs (EC2, Fargate, Spot Instances, etc.)
Amazon Elastic Compute Cloud
EC2
Compute
Scalable compute instances
Amazon Machine Image (AMI) - template used to launch instances; Deep Learning AMIs come with conda environments, ML libraries, and GPU drivers preinstalled
Instance types for ML: Compute optimized or accelerated computing (GPU)
GPUs: ml.p2 (the ml. prefix is the SageMaker name; the plain EC2 family is p2)
CPU recommended: ml.m4 or ml.c4 (EC2 families m4 and c4)
Amazon Elastic Container Registry
ECR
Containers
Managed container image registry
Amazon Elastic Container Service
ECS
Containers
Run and orchestrate containers on a cluster (container images are stored in ECR)
Amazon Elastic Kubernetes Service
EKS
Containers
Deploying and managing containers at scale
AWS Glue
Database
Data integration and ETL; crawlers scan S3 data to infer its schema and record it in the Data Catalog
Easy to setup/run with minimal effort
Python and Scala
Job Systems - managed infrastructure for ETL workflows
Crawlers and Classifiers - scan data, classify, extract schema info, store metadata
Data Catalog - store, annotate, and share metadata
ETL operations - auto generate ETL scripts based on metadata
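The crawler step on this card can be sketched with boto3's Glue client, whose `create_crawler` call takes a name, an IAM role, a target catalog database, and S3 targets. All names, the role ARN, and the S3 path below are hypothetical, and nothing is sent to AWS unless `create_and_start` is called.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Assemble keyword arguments for the Glue CreateCrawler API."""
    return {
        "Name": name,
        "Role": role_arn,  # IAM role Glue assumes to read the data
        "DatabaseName": database,  # Data Catalog database for the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start(request):
    """Create and start the crawler; needs boto3 and credentials (not called here)."""
    import boto3  # imported lazily so the sketch loads without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values, for illustration only.
crawler_request = build_crawler_request(
    name="example-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="ml_features",
    s3_path="s3://example-bucket/raw/",
)
```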
Amazon Redshift
Database
Data warehouse
AWS IoT Greengrass
Internet of Things
Build, deploy, and manage software on IoT edge devices
Control IoT fleet
AWS CloudTrail
Management and Governance
Records API calls and account activity (console, CLI, and SDKs) for auditing
Amazon CloudWatch
Management and Governance
Track usage metrics
Amazon Virtual Private Cloud
VPC
Networking and Content Delivery
Manage virtual network
AWS Identity and Access Management
IAM
Security, Identity, and Compliance
control access to AWS resources
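Access control in IAM is expressed as JSON policy documents. As a small sketch, the function below builds a read-only policy for a single S3 bucket; the function name and bucket name are hypothetical, and a real policy would be scoped to whatever the workload actually needs.

```python
import json


def read_only_bucket_policy(bucket):
    """Build an IAM policy document granting read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = read_only_bucket_policy("example-training-data")
policy_json = json.dumps(policy, indent=2)
```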
AWS Fargate
Serverless
Run containers without having to manually manage underlying resources
AWS Lambda
Serverless
run serverless code on high-availability compute infrastructure
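A Python Lambda function is just a handler that receives an event and a context object. The sketch below assumes the function is triggered by S3 object-created notifications and simply lists the affected objects; the bucket and key in the sample event are made up.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the S3 URIs of the objects named in an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append(f"s3://{bucket}/{key}")
    return {"statusCode": 200, "objects": objects}


# Local invocation with a trimmed-down, made-up event.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "raw/file+1.csv"}}}
    ]
}
result = lambda_handler(sample_event, None)
```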
Amazon Elastic File System
EFS
Storage
grows/shrinks as you add/delete files
mount on EC2 instances, lambda, or containers
Amazon Elastic Block Store
EBS
Storage
scalable, high-performance block storage
breaks data into blocks to store as separate pieces
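best for data that changes frequently and needs low-latency access from an EC2 instance (e.g., databases, boot volumes); static files fit S3 better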
Amazon FSx
Storage
high performance and throughput
fully managed file systems (Windows File Server, Lustre, NetApp ONTAP, OpenZFS)
Amazon S3
Simple Storage Service
Storage
Stores any kind of data, structured or unstructured
data lake
Security: IAM users, bucket policies; encryption - server side, key management service
Amazon Mechanical Turk
workforce for labeling jobs
AWS Database Migration Service
can use to migrate databases from on-premises to AWS
Elastic Inference Accelerator
attach to EC2 / SageMaker / Deep Learning Containers
accelerates deep learning inference workloads