Amazon SageMaker - Deep Dive Flashcards
What is Amazon SageMaker?
A fully managed machine learning service by AWS that enables developers and data scientists to build, train, tune, and deploy ML models at scale.
What are the three main steps in a SageMaker ML workflow?
1) Collect and prepare data, 2) Build and train models, 3) Deploy and monitor models.
What types of algorithms are built-in with SageMaker?
Supervised (e.g., linear regression, k-NN), unsupervised (e.g., PCA, k-means), anomaly detection, NLP, and image processing.
What is AMT in SageMaker?
Automatic Model Tuning, which automatically optimizes hyperparameters to improve model performance.
What are the four deployment types in SageMaker?
Real-time, Serverless, Asynchronous, and Batch Transform.
What is Real-time inference in SageMaker?
A low-latency, synchronous prediction option for small payloads (up to 6 MB), served by a persistent deployed endpoint.
What is Serverless inference in SageMaker?
A deployment option with no instances to manage: you configure the memory size and SageMaker scales capacity automatically with traffic; requests that arrive after an idle period may incur cold-start latency.
What is Asynchronous inference in SageMaker?
Used for large payloads (up to 1 GB) and longer processing times (up to 1 hour); input/output handled via Amazon S3.
What is Batch Transform in SageMaker?
Used for offline processing of entire datasets (many records at once); latency is high, but records can be processed concurrently for large-scale predictions.
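The four deployment cards above can be turned into a small decision helper. This is a minimal sketch, not an AWS API: `choose_inference_option` is a hypothetical function, the payload and duration limits are the ones quoted in the cards, and the 60-second real-time timeout is an assumption that does not appear in the cards. Check current AWS quotas before relying on any of these numbers.

```python
MB = 1024 * 1024
GB = 1024 * MB

# Limits quoted in the cards above.
REALTIME_MAX_PAYLOAD = 6 * MB
ASYNC_MAX_PAYLOAD = 1 * GB
ASYNC_MAX_SECONDS = 60 * 60
# Assumption (not in the cards): real-time invocations time out after 60 seconds.
REALTIME_MAX_SECONDS = 60


def choose_inference_option(payload_bytes, processing_seconds, whole_dataset=False):
    """Suggest an inference option from payload size and processing time."""
    if whole_dataset:
        # Scoring many records offline: latency does not matter.
        return "batch-transform"
    if payload_bytes > ASYNC_MAX_PAYLOAD or processing_seconds > ASYNC_MAX_SECONDS:
        # Too large or too slow even for asynchronous inference.
        return "batch-transform"
    if payload_bytes > REALTIME_MAX_PAYLOAD or processing_seconds > REALTIME_MAX_SECONDS:
        return "asynchronous"
    # Serverless is the alternative here when traffic is intermittent
    # and occasional cold starts are acceptable.
    return "real-time"


print(choose_inference_option(200 * MB, 30))
```

A 200 MB payload exceeds the real-time limit but fits within the asynchronous one, so the helper suggests asynchronous inference.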
What is SageMaker Studio?
A web-based IDE for ML development that supports model building, training, tuning, deployment, and collaboration.
What is the benefit of early stopping in AMT?
It saves time and cost by automatically halting training jobs within a tuning job that are unlikely to improve on the best result so far.
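The idea behind early stopping can be illustrated with a median-style rule: stop a job whose objective metric is worse than the median of earlier jobs at the same training step. This is a simplified sketch under that assumption; `should_stop_early` is a hypothetical helper, and the criterion AMT applies inside the service may differ in detail.

```python
from statistics import median


def should_stop_early(current_value, earlier_values, maximize=True):
    """Return True if the current job is worse than the median of earlier jobs.

    earlier_values holds the objective metric of previous training jobs,
    measured at the same training step as current_value.
    """
    if not earlier_values:
        return False  # nothing to compare against yet
    mid = median(earlier_values)
    if maximize:
        return current_value < mid
    return current_value > mid


# Validation accuracy (higher is better) of earlier jobs at the same epoch.
print(should_stop_early(0.70, [0.80, 0.85, 0.90]))
```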
What is SageMaker Data Wrangler used for?
Preparing, transforming, and engineering features from tabular and image data for machine learning.
What types of data can you prepare with Data Wrangler?
Tabular and image data.
What are some key features of Data Wrangler?
Data selection, cleansing, exploration, visualization, transformation, and feature engineering.
Does Data Wrangler support SQL?
Yes, it supports SQL for transformations and queries.
What tool in Data Wrangler helps analyze data completeness and formatting?
The data quality tool.
Why is feature engineering important?
Because high-quality features directly impact the performance of machine learning models.
What is a common transformation in feature engineering?
Converting a birth date into age (a numerical value).
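The birth-date-to-age transformation can be sketched in a few lines of plain Python. The function name `age_in_years` and the explicit `as_of` reference date are assumptions chosen for illustration; inside Data Wrangler the same logic would be expressed as a transform step.

```python
from datetime import date


def age_in_years(birth_date, as_of):
    """Whole years elapsed between birth_date and as_of."""
    years = as_of.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


print(age_in_years(date(1990, 6, 15), date(2024, 6, 14)))
```

Passing the reference date explicitly keeps the feature reproducible: the same record always yields the same age for a given training snapshot.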
What is the SageMaker Feature Store used for?
Storing, sharing, and discovering machine learning features across datasets and teams.
Where are features from the Feature Store discoverable?
Within SageMaker Studio.
Why is having a centralized Feature Store beneficial?
It improves collaboration and enables reuse of high-quality features across datasets and projects.
Can a quick model be created in Data Wrangler?
Yes, to analyze how well the model might perform on the transformed data.
Is Data Wrangler part of SageMaker Studio?
Yes, it’s fully integrated within SageMaker Studio.
What is SageMaker Clarify used for?
Evaluating foundation models, detecting bias, and explaining model predictions.
How does SageMaker Clarify compare different models?
By using human evaluations on specific tasks with metrics like brand voice and relevance.
What kind of human evaluations can SageMaker Clarify support?
Evaluations of friendliness, humor, and other subjective attributes of model outputs.
Can you bring your own team for model evaluation in Clarify?
Yes, you can use an AWS-managed team or bring your own employees.
What is model explainability in SageMaker Clarify?
Understanding how and why a model makes predictions using interpretation tools.
Why is model explainability important?
To debug predictions, improve trust, and increase understanding of model behavior.
What is an example use case for explainability in Clarify?
Explaining why a loan application was rejected by identifying the most influential features.
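To make "most influential features" concrete, here is a toy attribution for a linear scoring model, where each feature's contribution is its weight times its distance from a baseline value. Everything in this sketch is made up for illustration (the model, weights, baseline, and applicant); Clarify computes SHAP-based attributions for arbitrary models, which this simple case only approximates in spirit.

```python
def linear_contributions(weights, baseline, applicant):
    """Per-feature contribution w_i * (x_i - baseline_i) to a linear score."""
    return {
        name: weights[name] * (applicant[name] - baseline[name])
        for name in weights
    }


def most_influential(contributions, top_n=2):
    """Feature names ordered by absolute contribution, largest first."""
    ranked = sorted(contributions, key=lambda n: abs(contributions[n]), reverse=True)
    return ranked[:top_n]


# Hypothetical loan model: a lower score means a more likely rejection.
weights = {"income_k": 0.02, "debt_ratio": -3.0, "late_payments": -0.5}
baseline = {"income_k": 60.0, "debt_ratio": 0.30, "late_payments": 1.0}
applicant = {"income_k": 55.0, "debt_ratio": 0.60, "late_payments": 4.0}

contributions = linear_contributions(weights, baseline, applicant)
print(most_influential(contributions))
```

In this made-up case, late payments and debt ratio pull the score down the most, which is the kind of answer an explainability report gives for a rejected application.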
What does SageMaker Clarify do about bias?
It detects and measures biases in datasets and models using statistical metrics.
What is a common type of bias SageMaker Clarify can detect?
Class imbalance or demographic representation issues like gender or age bias.
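Class imbalance can be measured with a simple normalized difference between two groups, which is the form the Clarify documentation gives for its class imbalance (CI) metric. The sketch below assumes two groups with counts `n_a` and `n_d`; the function name and the example counts are hypothetical.

```python
def class_imbalance(n_a, n_d):
    """CI = (n_a - n_d) / (n_a + n_d), ranging from -1 to 1.

    n_a and n_d are the record counts of the two groups being compared.
    A value of 0 means the groups are equally represented.
    """
    total = n_a + n_d
    if total == 0:
        raise ValueError("at least one record is required")
    return (n_a - n_d) / total


# 800 records for one group and 200 for the other.
print(class_imbalance(800, 200))
```

Values near 1 or -1 indicate that one group dominates the dataset, which is a signal to rebalance or collect more data before training.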
What is SageMaker Ground Truth used for?
Data labeling, human review, model customization, and alignment using RLHF.
What is an example of using Ground Truth for labeling?
Labeling images by identifying objects like dogs, cats, or ships with human annotators.
Who can perform labeling tasks in Ground Truth?
Employees, third-party reviewers, or Amazon Mechanical Turk workers.
What is SageMaker Ground Truth Plus?
An enhanced version of Ground Truth that uses an expert workforce for data labeling tasks.
What is the purpose of SageMaker Model Cards?
To gather essential model information like intended use, risk rating, and training details in one place.
What does SageMaker Model Dashboard provide?
A centralized view to monitor, explore, and track all SageMaker models with insights on quality, risk, and bias.
What is SageMaker Role Manager used for?
To define roles and permissions for different personas like data scientists or MLOps engineers.
What is the function of SageMaker Model Monitor?
To track the performance of deployed models and detect data or prediction quality deviations.
How often can Model Monitor evaluate model quality?
Continuously or on a schedule (e.g., daily, weekly).
What action should you take when Model Monitor alerts you?
Investigate and fix the issue by updating data or retraining the model.
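The monitoring loop described above (baseline, compare, alert) can be sketched with a toy drift check on one numeric feature. This is an illustration only: `build_baseline`, `drifted`, and the z-score threshold are assumptions, not the statistics or constraints that Model Monitor itself computes.

```python
from statistics import mean, stdev


def build_baseline(values):
    """Summarize training-time values of one numeric feature."""
    return {"mean": mean(values), "std": stdev(values)}


def drifted(baseline, live_values, max_z=3.0):
    """Flag drift when the live mean is far from the baseline mean.

    Distance is measured in baseline standard deviations.
    """
    live_mean = mean(live_values)
    if baseline["std"] == 0:
        return live_mean != baseline["mean"]
    z = abs(live_mean - baseline["mean"]) / baseline["std"]
    return z > max_z


baseline = build_baseline([10, 12, 11, 9, 13])
print(drifted(baseline, [30, 28, 32]))
```

When a check like this fires, the follow-up is the one the card names: investigate the input data and retrain if the shift is real.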
What does SageMaker Model Registry provide?
A centralized model catalog for versioning, managing metadata, and controlling model approval workflows.
How can you manage model approval in Model Registry?
By setting an approval status and involving human reviewers before deployment.
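Versioning plus an approval gate can be sketched as a small in-memory registry. The classes below are hypothetical stand-ins rather than the SageMaker SDK, and the S3 URIs are placeholders; only the three status strings are meant to mirror the approval status values used by the Model Registry.

```python
from dataclasses import dataclass, field

PENDING = "PendingManualApproval"
APPROVED = "Approved"
REJECTED = "Rejected"


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str
    status: str = PENDING


@dataclass
class ToyModelGroup:
    name: str
    versions: list = field(default_factory=list)

    def register(self, artifact_uri):
        """Add a new version; it starts out pending human review."""
        model_version = ModelVersion(len(self.versions) + 1, artifact_uri)
        self.versions.append(model_version)
        return model_version

    def set_status(self, version, status):
        if status not in (PENDING, APPROVED, REJECTED):
            raise ValueError(f"unknown status: {status}")
        self.versions[version - 1].status = status

    def latest_approved(self):
        """Only approved versions are eligible for deployment."""
        approved = [v for v in self.versions if v.status == APPROVED]
        return approved[-1] if approved else None


group = ToyModelGroup("churn")
group.register("s3://example-bucket/churn/v1/model.tar.gz")
group.register("s3://example-bucket/churn/v2/model.tar.gz")
group.set_status(1, APPROVED)
print(group.latest_approved())
```

A deployment script that only ever asks for `latest_approved()` cannot ship a version that a reviewer has not signed off on.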
What is SageMaker Pipelines?
A CI/CD workflow tool for automating machine learning model building, training, and deployment.
Why are SageMaker Pipelines useful in MLOps?
They automate workflows, reduce manual errors, ensure repeatability, and accelerate iteration.
What are the main step types in SageMaker Pipelines?
Processing, Training, Tuning, AutoML, Model, ClarifyCheck, and QualityCheck.
What does the ‘Processing’ step in a pipeline do?
Performs data preparation like feature engineering.
What does the ‘Training’ step in a pipeline do?
Trains the machine learning model on prepared data.
What is the ‘Tuning’ step used for in a pipeline?
Optimizes model performance through hyperparameter tuning.
What does the ‘AutoML’ step handle?
Automatically trains models with minimal configuration.
What is the ‘Model’ step used for?
Creates a SageMaker model and, optionally, registers it in the Model Registry.
What does the ‘ClarifyCheck’ step do?
Performs bias and explainability analysis using SageMaker Clarify.
What is the purpose of the ‘QualityCheck’ step?
Checks the data or model quality against a defined baseline.
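One way to picture how the step types fit together is as a dependency graph that is resolved into an execution order. The sketch below uses only the Python standard library (`graphlib`, Python 3.9+) and is not the SageMaker Pipelines SDK; the particular ordering of steps is an assumed example, not a required layout.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a step, each value the steps it depends on.
pipeline = {
    "Processing": set(),              # data preparation
    "QualityCheck": {"Processing"},   # compare prepared data to a baseline
    "Training": {"QualityCheck"},     # could also be a Tuning or AutoML step
    "ClarifyCheck": {"Training"},     # bias and explainability analysis
    "Model": {"ClarifyCheck"},        # create and optionally register the model
}


def execution_order(steps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(steps).static_order())


print(execution_order(pipeline))
```

Because each step here depends on exactly one predecessor, the order is fully determined: data preparation first, model creation last.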
What is the benefit of using a centralized Model Dashboard?
It allows quick insights and actions on models that violate performance or fairness thresholds.
What is the significance of versioning in Model Registry?
It enables tracking changes across different versions of models, ensuring reproducibility and control.
What is SageMaker JumpStart?
A machine learning hub in SageMaker to find and launch pre-trained models (NLP, CV, etc.) and ML solutions quickly.
What kind of models does JumpStart support?
Pre-trained models from providers like Hugging Face, Meta, Stability AI, Databricks, and more.
What are the two main components of SageMaker JumpStart?
The Machine Learning Hub (pre-trained models) and Machine Learning Solutions (pre-built use case templates).
What can you do with models in JumpStart?
You can launch them, customize with your own data, fine-tune, and deploy them on SageMaker.
What is SageMaker Canvas?
A no-code, visual interface for building ML models using your data.
What kind of users is SageMaker Canvas designed for?
Non-developers or business users who want to build ML models without coding.
What service powers SageMaker Canvas under the hood?
SageMaker Autopilot, which uses AutoML to build models.
What AWS AI services are integrated with Canvas?
Amazon Comprehend, Rekognition, and Textract.
What are some ready-to-use use cases in Canvas?
Sentiment analysis (Comprehend), object detection (Rekognition), document analysis (Textract).
What is MLflow?
An open-source tool to manage the ML lifecycle: tracking experiments, models, and workflows.
Can you run MLflow on SageMaker?
Yes, SageMaker allows you to launch an MLflow Tracking Server directly in SageMaker Studio.
What is the purpose of MLflow Tracking Server in SageMaker?
To manage experiments and track training runs as part of ML model development.
What is the benefit of MLflow integration with SageMaker?
Seamless use of open-source ML lifecycle tools within the SageMaker ecosystem.
What kind of UI does SageMaker Canvas provide?
A drag-and-drop, no-code visual interface to simplify the ML pipeline.
What’s the difference between JumpStart and Canvas?
JumpStart helps you launch and customize pre-trained models; Canvas lets you build models visually without code.
What is Amazon SageMaker?
An end-to-end machine learning service to build, train, and deploy ML models at scale.
What is SageMaker Automatic Model Tuning used for?
To tune hyperparameters of ML models automatically.
What are the deployment and inference options in SageMaker?
Real-time, serverless, batch, and asynchronous inference.
What is SageMaker Studio?
A unified IDE for building, training, debugging, and deploying ML models end-to-end.
What is SageMaker Data Wrangler?
A tool to import, explore, process, and prepare data for ML.
What is SageMaker Feature Store?
A centralized repository to store and retrieve ML features across teams and pipelines.
What does SageMaker Clarify do?
Provides model comparison, bias detection, and model explainability.
What is SageMaker Ground Truth?
A tool for data labeling and reinforcement learning from human feedback (RLHF).
What is SageMaker Model Cards?
A documentation tool to describe a model’s intended use, risk, and training details.
What is SageMaker Model Dashboard?
A centralized place to track all deployed models and their performance metrics.
What does SageMaker Model Monitor do?
Monitors deployed models for data drift and quality issues, and sends alerts when it detects them.
What is SageMaker Model Registry?
A catalog for managing, versioning, and approving ML models before deployment.
What is SageMaker Pipelines?
A CI/CD service for automating ML workflows including data prep, training, and deployment.
What is SageMaker Role Manager?
A service to manage access and permissions for different user roles in SageMaker.
What is SageMaker JumpStart?
A hub for pre-trained models and pre-built ML solutions for rapid prototyping.
What is SageMaker Canvas?
A no-code, visual interface to build and deploy ML models.
What is MLflow on SageMaker?
An integration that lets you use open-source MLflow to track experiments and manage ML lifecycles.
What is SageMaker Network Isolation mode?
A security feature that prevents outbound network access from training/inference containers to protect data and prevent leaks.
What happens when you enable Network Isolation in SageMaker?
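Containers cannot make outbound network calls to the internet, Amazon S3, or other AWS services; SageMaker itself still copies the input data into the container and the output artifacts out of it, so only that preloaded data is available for training.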
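Network isolation is switched on with the `EnableNetworkIsolation` flag of the `CreateTrainingJob` request. The sketch below only builds the request dictionary and does not call AWS; `training_job_request` is a hypothetical helper, and the job name, image URI, role ARN, bucket, and instance settings are placeholder assumptions.

```python
def training_job_request(job_name, image_uri, role_arn, output_s3_uri,
                         network_isolation=True):
    """Build CreateTrainingJob parameters with network isolation enabled."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",   # placeholder instance type
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # The container gets no outbound network access when this is True.
        "EnableNetworkIsolation": network_isolation,
    }


request = training_job_request(
    job_name="example-isolated-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    output_s3_uri="s3://example-bucket/output/",
)
print(request["EnableNetworkIsolation"])
```

With boto3 installed and credentials configured, a dictionary like this would be passed as keyword arguments to the SageMaker client's `create_training_job` call.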
What is the DeepAR algorithm in SageMaker used for?
Forecasting time series data.
What type of neural network does DeepAR use?
Recurrent Neural Network (RNN).
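DeepAR reads time series as JSON Lines, one series per line, each with a `start` timestamp and a `target` array of values. The helper below is a minimal sketch of producing that layout; `to_deepar_jsonl` and the sample series are assumptions, and optional fields such as categorical or dynamic features are left out.

```python
import json


def to_deepar_jsonl(series):
    """Serialize (start, values) pairs as JSON Lines, one series per line.

    start is a timestamp string such as "2024-01-01 00:00:00";
    values is the ordered list of observations for that series.
    """
    lines = []
    for start, values in series:
        lines.append(json.dumps({"start": start, "target": list(values)}))
    return "\n".join(lines)


# Two hypothetical daily demand series.
sample = [
    ("2024-01-01 00:00:00", [5, 7, 6, 9]),
    ("2024-01-03 00:00:00", [12, 11, 14]),
]
print(to_deepar_jsonl(sample))
```

Each series may start on a different date and have a different length, which is why every line carries its own `start`.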