Domain 1 Solutions Flashcards
Helps you set up a secure data lake, then govern, secure, and globally share its data for ML and analytics. Manages fine-grained access control for data in S3 and its metadata in the Glue Data Catalog through its own permissions model, which augments IAM
Lake Formation
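To make the permissions model on this card concrete, here is a minimal sketch of a column-level grant. The dictionary shape follows the parameters of boto3's Lake Formation `grant_permissions` call; the helper `build_column_grant`, the role ARN, and the database, table, and column names are hypothetical placeholders, and nothing is sent to AWS.

```python
def build_column_grant(principal_arn, database, table, columns):
    """Assemble keyword arguments for a column-level SELECT grant.

    Hypothetical helper: it only builds the request dictionary.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Every identifier below is a placeholder, not a real resource.
grant = build_column_grant(
    "arn:aws:iam::123456789012:role/AnalystRole",
    "sales_db",
    "orders",
    ["order_id", "order_date"],
)
# With credentials configured, the request could be sent as:
#   boto3.client("lakeformation").grant_permissions(**grant)
```

The grant names only two columns, which is the kind of fine-grained control that sits on top of the principal's IAM permissions.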
Preferred storage option for data lakes and ML data on AWS
S3
Used to build, train, and deploy ML models
SageMaker
A file system service that speeds up training jobs by serving your S3 data to SageMaker at high speeds
FSx for Lustre
A training data source from which SageMaker can launch training jobs directly, with no data movement, for faster training start times
EFS
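The FSx for Lustre and EFS cards both describe file systems used as SageMaker training data sources. As a hedged sketch, the helper below builds one input channel in the shape that the SageMaker `CreateTrainingJob` API uses for `FileSystemDataSource`. The function name, file system ID, and directory path are hypothetical; a real job would also need a VPC configuration that can reach the file system.

```python
def file_system_channel(name, fs_id, fs_type, directory):
    """Build one InputDataConfig entry backed by a file system.

    Hypothetical helper: it only assembles the dictionary.
    """
    if fs_type not in ("FSxLustre", "EFS"):
        raise ValueError("fs_type must be 'FSxLustre' or 'EFS'")
    return {
        "ChannelName": name,
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": fs_id,
                "FileSystemType": fs_type,
                "FileSystemAccessMode": "ro",  # training only reads the data
                "DirectoryPath": directory,
            }
        },
    }


# Placeholder ID and path, for illustration only.
channel = file_system_channel(
    "train", "fs-0123456789abcdef0", "FSxLustre", "/fsx/train"
)
```

Switching `fs_type` to `"EFS"` is the only change needed to describe the EFS case.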
Block-level storage device that you can attach to your instances and use as you would use a physical hard drive
EBS
An ETL service that categorizes, cleans, enriches, and moves data between various data stores. Used for batch ingestion; automates data discovery
Glue
This batch ingestion service reads historical data from source systems, such as relational database management systems, data warehouses, and NoSQL databases, at any desired interval
DMS
Batch ingestion service that automates various ETL tasks that involve complex workflows
Step Functions
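One way to picture the "complex workflows" on this card is a state machine that chains two Glue jobs. The sketch below writes the definition in Amazon States Language as a Python dictionary, using the `glue:startJobRun.sync` service integration. The state names and Glue job names are hypothetical, and the retry settings are arbitrary illustrative values.

```python
import json

# Hypothetical two-step ETL workflow; job names are placeholders.
definition = {
    "Comment": "Sketch of a batch ETL workflow",
    "StartAt": "CleanRawData",
    "States": {
        "CleanRawData": {
            "Type": "Task",
            # .sync makes the state wait until the Glue job finishes
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "clean-raw-data"},
            "Retry": [
                {
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 30,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }
            ],
            "Next": "EnrichData",
        },
        "EnrichData": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "enrich-data"},
            "End": True,
        },
    },
}

# Serialized form that a state machine definition would take.
asl_json = json.dumps(definition, indent=2)
```

The second job starts only after the first succeeds, and the retry block handles transient failures without extra orchestration code.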
Uses the Kinesis Producer Library to write to a Kinesis data stream
Kinesis Data Streams
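Every record written to a Kinesis data stream carries a partition key, which the service hashes with MD5 into a 128-bit integer to pick a shard. The sketch below imitates that mapping under one loud assumption: the shards split the hash space evenly, as they do in a newly created stream. The function name and the example key are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # size of the 128-bit hash key space


def shard_for_key(partition_key, shard_count):
    """Return the index of the shard a partition key would map to.

    Assumes shard_count shards that divide the hash space evenly;
    a stream that has been resharded may use different ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


# The same key always lands on the same shard, which preserves
# per-key ordering within that shard.
first = shard_for_key("sensor-42", 4)
second = shard_for_key("sensor-42", 4)
```

Choosing a partition key with many distinct values helps spread records across shards.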
Batches and compresses data to generate incremental views, and executes custom transformation logic using Lambda before delivering the incremental view to S3
Kinesis Firehose
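The Lambda transformation step on this card follows a fixed contract: each incoming record has a `recordId` and base64-encoded `data`, and the function returns every record with the same `recordId`, a `result` status, and re-encoded `data`. Below is a minimal handler sketch. The payload fields `amount_cents` and `amount_usd` are hypothetical, chosen only to show a transformation.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and return it in the expected shape."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical transformation: derive a dollar amount.
            payload["amount_usd"] = round(payload["amount_cents"] / 100, 2)
            data = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            })
        except (ValueError, KeyError, TypeError):
            # Malformed input: hand the record back unchanged, marked failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Appending a newline to each transformed record keeps the objects delivered to S3 line-delimited, which makes them easier to query later.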
Easiest way to process and transform data streaming through Kinesis Data Streams or Firehose using SQL; provides insights in near real time from incremental streams before the data is stored in S3
Kinesis Data Analytics
Used to ingest and analyze video and audio data
Kinesis Video Streams
A distributed data store optimized for ingesting and processing streaming data in real time. Used to publish and subscribe to streams of records, durably store streams of records in the order in which they were generated, and process streams of records in real time
Apache Kafka
Supports many instance types that have proportionally high CPU with increased network performance, which is well suited for HPC (high-performance computing) applications
EMR