AWS Certified Machine Learning - Specialty (MLS-C01) Flashcards
NOTE: You can use S3 while training ML models with SageMaker. S3 is integrated with SageMaker to store training data and training output (model artifacts).
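A minimal sketch of where S3 shows up in a training job definition. It assumes the request shape of the low-level SageMaker CreateTrainingJob API (what boto3's create_training_job accepts). The bucket name and prefixes are hypothetical. Only the input and output fields are built here; a real request also needs a job name, algorithm specification, IAM role, and resource settings.

```python
from __future__ import annotations


def s3_training_io(bucket: str, train_prefix: str, output_prefix: str) -> dict:
    """Build the S3 input channel and the S3 output location for a training job.

    Field names follow the CreateTrainingJob API; bucket/prefixes are hypothetical.
    """
    return {
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",  # use every object under this prefix
                        "S3Uri": f"s3://{bucket}/{train_prefix}",
                        "S3DataDistributionType": "FullyReplicated",  # each instance gets a full copy
                    }
                },
                "InputMode": "File",  # copy the data to the instance before training starts
            }
        ],
        "OutputDataConfig": {
            # SageMaker uploads the trained model artifacts under this path
            "S3OutputPath": f"s3://{bucket}/{output_prefix}",
        },
    }


if __name__ == "__main__":
    cfg = s3_training_io("my-ml-bucket", "datasets/train/", "models/")
    print(cfg["InputDataConfig"][0]["DataSource"]["S3DataSource"]["S3Uri"])
```

The same bucket serves both directions: training reads from the input channel and writes model artifacts to the output path.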
Amazon FSx for Lustre
Use when training data is already in S3 and you plan to run training jobs several times with different algorithms and parameters.
Speeds up training jobs by copying the S3 data into a high-performance file system that serves it to SageMaker at high speed.
NOTE: If training data is already in Amazon EFS, AWS recommends using EFS as the training data source.
EFS has the benefit of launching training jobs directly, without any data movement, which results in faster training start times.
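A minimal sketch of a training channel that reads from a file system instead of S3. It assumes the FileSystemDataSource shape of the SageMaker CreateTrainingJob API, where FileSystemType is either "FSxLustre" or "EFS". The file system IDs and directory paths in the usage lines are hypothetical. A training job that reads from EFS or FSx for Lustre must also run in a VPC that can reach the file system (the VpcConfig parameter), which this sketch leaves out.

```python
from __future__ import annotations


def file_system_channel(file_system_type: str, file_system_id: str, directory_path: str) -> dict:
    """Build a training input channel backed by FSx for Lustre or EFS."""
    if file_system_type not in ("FSxLustre", "EFS"):
        raise ValueError("file_system_type must be 'FSxLustre' or 'EFS'")
    return {
        "ChannelName": "train",
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": file_system_id,
                "FileSystemType": file_system_type,
                "FileSystemAccessMode": "ro",  # training only needs to read the data
                "DirectoryPath": directory_path,
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical IDs and paths; for FSx for Lustre the path starts with the mount name.
    fsx = file_system_channel("FSxLustre", "fs-0123456789abcdef0", "/abcdef12/train")
    efs = file_system_channel("EFS", "fs-0fedcba9876543210", "/train")
    print(fsx["DataSource"]["FileSystemDataSource"]["FileSystemType"])
    print(efs["DataSource"]["FileSystemDataSource"]["DirectoryPath"])
```

Swapping the channel is the only change compared with an S3 input: the rest of the training job definition stays the same, which is what makes these options easy to try on repeated runs.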
Amazon Kinesis
Recommended for ingesting fast-moving, real-time data. Allows you to build custom streaming data applications for specialized needs.
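A minimal producer-side sketch for Kinesis Data Streams, assuming events are sent with the PutRecords API, which accepts at most 500 records per request. Each entry carries the payload as bytes plus a partition key; records with the same partition key go to the same shard, which keeps them in order. The event fields and the stream name in the usage note are hypothetical.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator

MAX_RECORDS_PER_CALL = 500  # PutRecords limit on records per request


def put_records_batches(events: Iterable[dict], key_field: str) -> Iterator[list[dict]]:
    """Group events into PutRecords-sized lists of {'Data', 'PartitionKey'} entries."""
    batch: list[dict] = []
    for event in events:
        batch.append({
            "Data": json.dumps(event).encode("utf-8"),  # Kinesis stores the payload as raw bytes
            "PartitionKey": str(event[key_field]),  # same key -> same shard -> per-key ordering
        })
        if len(batch) == MAX_RECORDS_PER_CALL:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch
```

Each batch could then be sent with a boto3 Kinesis client, for example kinesis.put_records(StreamName="clickstream", Records=batch), where "clickstream" is a hypothetical stream. PutRecords can partially fail, so a real producer would check FailedRecordCount in the response and retry the failed entries. This sketch does not cover the per-record and per-request size limits.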
Amazon Elastic MapReduce (EMR)
Provides a managed framework (such as Apache Hadoop and Apache Spark) that can process massive quantities of data.
Amazon Athena
Amazon Athena is a serverless query service that lets you analyze data directly in Amazon S3 using standard SQL, without complex ETL processes or infrastructure to manage.
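A minimal sketch of what an Athena query request contains, assuming the StartQueryExecution API (boto3's start_query_execution). The SQL text, database name, and results bucket are hypothetical. Athena reads the table data in place from S3 and writes query results to the S3 output location you give it.

```python
from __future__ import annotations


def athena_query_request(sql: str, database: str, results_s3: str) -> dict:
    """Keyword arguments for Athena's StartQueryExecution call."""
    if not results_s3.startswith("s3://"):
        raise ValueError("results_s3 must be an s3:// URI")
    return {
        "QueryString": sql,  # standard SQL over tables whose data lives in S3
        "QueryExecutionContext": {"Database": database},  # catalog database holding the table definitions
        "ResultConfiguration": {"OutputLocation": results_s3},  # where Athena writes result files
    }


if __name__ == "__main__":
    request = athena_query_request(
        "SELECT label, COUNT(*) FROM training_events GROUP BY label",  # hypothetical table
        "ml_db",  # hypothetical database
        "s3://my-ml-bucket/athena-results/",
    )
    print(request["QueryExecutionContext"])
```

With boto3 this would be passed as boto3.client("athena").start_query_execution(**request). A query like the one above is a quick way to check class balance in a training set before any ETL.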
AWS Glue
AWS Glue is a fully managed ETL service that makes preparing and loading your data for analytics easy. It provides a serverless environment to create, run, and monitor ETL jobs.
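A minimal sketch of the "create, run, and monitor" cycle, assuming the Glue CreateJob request shape (boto3's create_job). Here "glueetl" selects a Spark ETL job. The job name, IAM role ARN, script location, Glue version, and worker settings are hypothetical choices for illustration.

```python
from __future__ import annotations


def glue_etl_job_definition(name: str, role_arn: str, script_s3: str, workers: int = 2) -> dict:
    """Keyword arguments for Glue's CreateJob call describing a Spark ETL job."""
    return {
        "Name": name,
        "Role": role_arn,  # IAM role Glue assumes to read sources and write targets
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3,  # ETL script stored in S3
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",  # hypothetical choice of Glue runtime
        "WorkerType": "G.1X",
        "NumberOfWorkers": workers,
    }


if __name__ == "__main__":
    job = glue_etl_job_definition(
        "clean-training-data",  # hypothetical job name
        "arn:aws:iam::123456789012:role/GlueETLRole",  # hypothetical role
        "s3://my-ml-bucket/scripts/clean.py",  # hypothetical script
    )
    print(job["Command"]["Name"])
```

With a boto3 Glue client, the job would be created with glue.create_job(**job), run with glue.start_job_run(JobName=job["Name"]), and monitored with glue.get_job_run(JobName=..., RunId=...). There are no servers to provision at any of these steps.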
Amazon SageMaker Ground Truth
Amazon SageMaker Ground Truth helps you build highly accurate training datasets for machine learning quickly.
SageMaker Ground Truth offers easy access to public and private human labelers and provides them with built-in workflows and interfaces for common labeling tasks.
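A minimal sketch of preparing the input for a Ground Truth labeling job. It assumes the standard JSON Lines input manifest, where each line is a JSON object whose "source-ref" key points to one S3 object to be labeled. The image URIs are hypothetical.

```python
from __future__ import annotations

import json


def ground_truth_manifest(object_uris: list[str]) -> str:
    """Return a JSON Lines input manifest with one {"source-ref": ...} object per line."""
    lines = []
    for uri in object_uris:
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri}")
        lines.append(json.dumps({"source-ref": uri}))  # one labeling task per line
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ground_truth_manifest([
        "s3://my-ml-bucket/images/cat-001.jpg",  # hypothetical objects
        "s3://my-ml-bucket/images/cat-002.jpg",
    ]), end="")
```

The manifest is uploaded to S3 and referenced when the labeling job is created. Ground Truth then routes each line to the chosen human labelers and writes the labels to an output manifest.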
Amazon Mechanical Turk
Amazon Mechanical Turk (MTurk) is a crowdsourcing marketplace that makes it easier for individuals and businesses to outsource their processes and jobs to a distributed workforce who can perform these tasks virtually. This could include anything from conducting simple data validation and research to more subjective tasks like survey participation, content moderation, and more.