Amazon SageMaker - Deep Dive Flashcards

Question 1

Q

What is Amazon SageMaker?

Answer

A

A fully managed machine learning service by AWS that enables developers and data scientists to build, train, tune, and deploy ML models at scale.

Question 2

Q

What are the three main steps in a SageMaker ML workflow?

Answer

A

1) Collect and prepare data, 2) Build and train models, 3) Deploy and monitor models.

Question 3

Q

What types of algorithms are built-in with SageMaker?

Answer

A

Supervised (e.g., Linear regression, KNN), Unsupervised (e.g., PCA, K-means), Anomaly detection, NLP, and Image processing.

Question 4

Q

What is AMT in SageMaker?

Answer

A

Automatic Model Tuning, which automatically optimizes hyperparameters to improve model performance.

Question 5

Q

What are the four deployment types in SageMaker?

Answer

A

Real-time, Serverless, Asynchronous, and Batch Transform.

Question 6

Q

What is Real-time inference in SageMaker?

Answer

A

A low-latency prediction option for small payloads (up to 6 MB): a persistent deployed endpoint returns each prediction synchronously.

Question 7

Q

What is Serverless inference in SageMaker?

Answer

A

A deployment option with no infrastructure to manage: you choose a memory size and capacity scales automatically with traffic; requests that arrive after an idle period may incur cold-start latency.

Question 8

Q

What is Asynchronous inference in SageMaker?

Answer

A

Used for large payloads (up to 1 GB) and longer processing times (up to 1 hour); input/output handled via Amazon S3.

Question 9

Q

What is Batch Transform in SageMaker?

Answer

A

Used for offline predictions over an entire dataset (many records) in a single job; latency is high, but records can be processed concurrently at large scale.
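
The limits quoted in the four inference cards can be turned into a rough selection rule. The sketch below is only an illustration: `Workload`, `suggest_inference_option`, and the `tolerates_cold_start` flag are hypothetical names, the 60-second real-time limit is an assumption that the cards do not state, and using cold-start tolerance to choose between Real-time and Serverless is a simplification rather than an AWS rule.

```python
from dataclasses import dataclass

MB = 1024 * 1024
GB = 1024 * MB

# Limits as quoted in the cards above.
REALTIME_MAX_PAYLOAD = 6 * MB   # Real-time: payloads up to 6 MB
ASYNC_MAX_PAYLOAD = 1 * GB      # Asynchronous: payloads up to 1 GB
ASYNC_MAX_SECONDS = 60 * 60     # Asynchronous: processing up to 1 hour
# Assumption (not stated in the cards): real-time calls should finish in ~60 s.
REALTIME_MAX_SECONDS = 60


@dataclass
class Workload:
    payload_bytes: int
    processing_seconds: float
    whole_dataset: bool = False         # score many records in one job?
    tolerates_cold_start: bool = False  # hypothetical flag for Serverless


def suggest_inference_option(w: Workload) -> str:
    """Return a deployment type name for a workload (heuristic only)."""
    exceeds_async = (
        w.payload_bytes > ASYNC_MAX_PAYLOAD
        or w.processing_seconds > ASYNC_MAX_SECONDS
    )
    if w.whole_dataset or exceeds_async:
        return "Batch Transform"
    exceeds_realtime = (
        w.payload_bytes > REALTIME_MAX_PAYLOAD
        or w.processing_seconds > REALTIME_MAX_SECONDS
    )
    if exceeds_realtime:
        return "Asynchronous"
    return "Serverless" if w.tolerates_cold_start else "Real-time"


print(suggest_inference_option(Workload(500 * MB, 600)))
```

A 500 MB request that takes ten minutes is too large for a real-time endpoint but within the asynchronous limits, so the rule falls through to the asynchronous branch.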

Question 10

Q

What is SageMaker Studio?

Answer

A

A web-based IDE for ML development that supports model building, training, tuning, deployment, and collaboration.

Question 11

Q

What is the benefit of AMT’s early stop condition?

Answer

A

It saves time and cost by halting underperforming tuning jobs automatically.
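
To make the idea of halting underperforming jobs concrete, here is a minimal sketch of a median-style stopping rule: a running job is stopped when its latest objective value is worse than the median of the running averages that earlier jobs had reached by the same step. The function name and data layout are hypothetical, and the exact criteria AMT applies internally are an assumption here, not something the cards specify.

```python
from statistics import mean, median


def should_stop_early(current_curve, completed_curves, maximize=True):
    """Decide whether a running training job looks unpromising.

    current_curve:    objective values reported so far by the running job
    completed_curves: one list of objective values per earlier job
    """
    step = len(current_curve)
    if step == 0:
        return False
    # Running average of each earlier job up to the same step.
    peers = [mean(c[:step]) for c in completed_curves if len(c) >= step]
    if not peers:
        return False  # nothing to compare against yet
    threshold = median(peers)
    latest = current_curve[-1]
    return latest < threshold if maximize else latest > threshold


earlier_jobs = [[0.5, 0.6, 0.7], [0.6, 0.7, 0.8], [0.4, 0.5, 0.6]]
print(should_stop_early([0.2, 0.3], earlier_jobs))  # accuracy-like metric
```

Stopping such a job early frees its instance hours for more promising hyperparameter combinations, which is where the time and cost saving comes from.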

Question 12

Q

What is SageMaker Data Wrangler used for?

Answer

A

Preparing, transforming, and engineering features from tabular and image data for machine learning.

Question 13

Q

What types of data can you prepare with Data Wrangler?

Answer

A

Tabular and image data.

Question 14

Q

What are some key features of Data Wrangler?

Answer

A

Data selection, cleansing, exploration, visualization, transformation, and feature engineering.

Question 15

Q

Does Data Wrangler support SQL?

Answer

A

Yes, it supports SQL for transformations and queries.

Question 16

Q

What tool in Data Wrangler helps analyze data completeness and formatting?

Answer

A

The data quality tool.

Question 17

Q

Why is feature engineering important?

Answer

A

Because high-quality features directly impact the performance of machine learning models.

Question 18

Q

What is a common transformation in feature engineering?

Answer

A

Converting a birth date into age (a numerical value).
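
As a small illustration of this transformation, the sketch below derives age in whole years from a birth date. The function name `age_in_years` is made up for the example; in Data Wrangler the same result would be produced by a transform step rather than hand-written code.

```python
from datetime import date


def age_in_years(birth_date: date, as_of: date) -> int:
    """Return completed years between birth_date and as_of."""
    # Has this year's birthday already happened on the as_of date?
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


print(age_in_years(date(1990, 6, 15), date(2024, 6, 14)))
```

Passing the reference date explicitly, instead of calling `date.today()` inside the function, keeps the feature reproducible when the same data is processed again later.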

Question 19

Q

What is the SageMaker Feature Store used for?

Answer

A

Storing, sharing, and discovering machine learning features across datasets and teams.

Question 20

Q

Where are features from the Feature Store discoverable?

Answer

A

Within SageMaker Studio.

Question 21

Q

Why is having a centralized Feature Store beneficial?

Answer

A

It improves collaboration and enables reuse of high-quality features across datasets and projects.

Question 22

Q

Can a quick model be created in Data Wrangler?

Answer

A

Yes, to analyze how well the model might perform on the transformed data.

Question 23

Q

Is Data Wrangler part of SageMaker Studio?

Answer

A

Yes, it’s fully integrated within SageMaker Studio.

Question 24

Q

What is SageMaker Clarify used for?

Answer

A

Evaluating foundation models, detecting bias, and explaining model predictions.

Question 25

Q

How does SageMaker Clarify compare different models?

Answer

A

By using human evaluations on specific tasks with metrics like brand voice and relevance.

Question 26

Q

What kind of human evaluations can SageMaker Clarify support?

Answer

A

Evaluations of friendliness, humor, and other subjective attributes of model outputs.

Question 27

Q

Can you bring your own team for model evaluation in Clarify?

Answer

A

Yes, you can use an AWS-managed team or bring your own employees.

Question 28

Q

What is model explainability in SageMaker Clarify?

Answer

A

Understanding how and why a model makes predictions using interpretation tools.

Question 29

Q

Why is model explainability important?

Answer

A

To debug predictions, improve trust, and increase understanding of model behavior.

Question 30

Q

What is an example use case for explainability in Clarify?

Answer

A

Explaining why a loan application was rejected by identifying the most influential features.

Question 31

Q

What does SageMaker Clarify do about bias?

Answer

A

It detects and measures biases in datasets and models using statistical metrics.

Question 32

Q

What is a common type of bias SageMaker Clarify can detect?

Answer

A

Class imbalance or demographic representation issues like gender or age bias.
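
Class imbalance can be expressed as a single number. The sketch below computes CI = (n_a - n_d) / (n_a + n_d), where n_d counts records in the group being checked and n_a counts all other records; values near 0 mean balanced representation and values near +1 mean the checked group is heavily under-represented. The function name and sample data are hypothetical, and this is a hand-rolled illustration rather than a call into Clarify.

```python
from collections import Counter


def class_imbalance(facet_values, checked_value):
    """Return (n_a - n_d) / (n_a + n_d) for one facet column.

    facet_values:  the facet column, e.g. a list of gender labels
    checked_value: the group whose representation is being checked (n_d)
    """
    if not facet_values:
        raise ValueError("facet_values must not be empty")
    n_d = Counter(facet_values)[checked_value]
    n_a = len(facet_values) - n_d
    return (n_a - n_d) / (n_a + n_d)


genders = ["M"] * 80 + ["F"] * 20
print(class_imbalance(genders, "F"))
```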

Question 33

Q

What is SageMaker Ground Truth used for?

Answer

A

Data labeling, human review, model customization, and alignment using RLHF.

Question 34

Q

What is an example of using Ground Truth for labeling?

Answer

A

Labeling images by identifying objects like dogs, cats, or ships with human annotators.

Question 35

Q

Who can perform labeling tasks in Ground Truth?

Answer

A

Employees, third-party reviewers, or Amazon Mechanical Turk workers.

Question 36

Q

What is SageMaker Ground Truth Plus?

Answer

A

An enhanced version of Ground Truth that uses an expert workforce for data labeling tasks.

Question 37

Q

What is the purpose of SageMaker Model Cards?

Answer

A

To gather essential model information like intended use, risk rating, and training details in one place.

Question 38

Q

What does SageMaker Model Dashboard provide?

Answer

A

A centralized view to monitor, explore, and track all SageMaker models with insights on quality, risk, and bias.

Question 39

Q

What is SageMaker Role Manager used for?

Answer

A

To define roles and permissions for different personas like data scientists or MLOps engineers.

Question 40

Q

What is the function of SageMaker Model Monitor?

Answer

A

To track the performance of deployed models and detect data or prediction quality deviations.
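
The idea of detecting a data deviation can be sketched as comparing live inputs with a baseline captured at training time. The example below is a deliberately simple stand-in: `build_baseline`, `has_drifted`, and the `max_z` threshold are hypothetical, and the statistics and constraints Model Monitor actually computes are not described in these cards.

```python
from statistics import mean, stdev


def build_baseline(training_values):
    """Summarize one numeric feature from the training data."""
    return {"mean": mean(training_values), "std": stdev(training_values)}


def has_drifted(baseline, live_values, max_z=3.0):
    """Flag drift when the live mean is far from the baseline mean.

    Distance is measured in baseline standard deviations (a z-score).
    """
    live_mean = mean(live_values)
    if baseline["std"] == 0:
        return live_mean != baseline["mean"]
    z = abs(live_mean - baseline["mean"]) / baseline["std"]
    return z > max_z


baseline = build_baseline([10, 11, 9, 10, 12, 8, 10])
print(has_drifted(baseline, [20, 21, 19]))
```

A check like this would run on each batch of captured endpoint traffic, with a flagged result triggering the investigation described in the later card on alerts.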

Question 41

Q

How often can Model Monitor evaluate model quality?

Answer

A

Continuously or on a schedule (e.g., daily, weekly).

Question 42

Q

What action should you take when Model Monitor alerts you?

Answer

A

Investigate and fix the issue by updating data or retraining the model.

Question 43

Q

What does SageMaker Model Registry provide?

Answer

A

A centralized model catalog for versioning, managing metadata, and controlling model approval workflows.

Question 44

Q

How can you manage model approval in Model Registry?

Answer

A

By setting an approval status and involving human reviewers before deployment.
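
An approval gate can be pictured as a status field on each model version that deployment code checks before proceeding. The sketch below uses a toy in-memory registry, not the SageMaker API: the class `ToyModelRegistry`, its methods, and the example S3 URIs are all hypothetical, while the three status strings follow the approval statuses SageMaker uses for model packages.

```python
PENDING = "PendingManualApproval"
APPROVED = "Approved"
REJECTED = "Rejected"


class ToyModelRegistry:
    """In-memory stand-in for a model catalog; NOT the SageMaker API."""

    def __init__(self):
        self._versions = []  # version number == list index + 1

    def register(self, artifact_uri, metrics):
        """Add a new version; it starts out waiting for human review."""
        self._versions.append(
            {"artifact_uri": artifact_uri, "metrics": metrics, "status": PENDING}
        )
        return len(self._versions)

    def set_status(self, version, status):
        """Record a reviewer's decision for one version."""
        if status not in (PENDING, APPROVED, REJECTED):
            raise ValueError(f"unknown status: {status}")
        self._versions[version - 1]["status"] = status

    def latest_approved(self):
        """Return the newest approved version number, or None."""
        for version in range(len(self._versions), 0, -1):
            if self._versions[version - 1]["status"] == APPROVED:
                return version
        return None


registry = ToyModelRegistry()
v1 = registry.register("s3://example-bucket/model-v1.tar.gz", {"auc": 0.91})
v2 = registry.register("s3://example-bucket/model-v2.tar.gz", {"auc": 0.88})
registry.set_status(v1, APPROVED)
registry.set_status(v2, REJECTED)
print(registry.latest_approved())
```

Deployment automation would call something like `latest_approved()` and refuse to deploy when it returns `None`, so no version reaches production without a reviewer's decision.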

Question 45

Q

What is SageMaker Pipelines?

Answer

A

A CI/CD workflow tool for automating machine learning model building, training, and deployment.

Question 46

Q

Why are SageMaker Pipelines useful in MLOps?

Answer

A

They automate workflows, reduce manual errors, ensure repeatability, and accelerate iteration.

Question 47

Q

What are the main step types in SageMaker Pipelines?

Answer

A

Processing, Training, Tuning, AutoML, Model, ClarifyCheck, and QualityCheck.

Question 48

Q

What does the ‘Processing’ step in a pipeline do?

Answer

A

Performs data preparation like feature engineering.

Question 49

Q

What does the ‘Training’ step in a pipeline do?

Answer

A

Trains the machine learning model on prepared data.

Question 50

Q

What is the ‘Tuning’ step used for in a pipeline?

Answer

A

Optimizes model performance through hyperparameter tuning.

Question 51

Q

What does the ‘AutoML’ step handle?

Answer

A

Automatically trains models with minimal configuration.

Question 52

Q

What is the ‘Model’ step used for?

Answer

A

Creates and registers a SageMaker model (optionally to the Model Registry).

Question 53

Q

What does the ‘ClarifyCheck’ step do?

Answer

A

Performs bias and explainability analysis using SageMaker Clarify.

Question 54

Q

What is the purpose of the ‘QualityCheck’ step?

Answer

A

Checks the data or model quality against a defined baseline.
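
The step types described in the cards above fit together as a dependency graph: each step runs once the steps it depends on have finished. The sketch below models that ordering with Python's standard `graphlib` module. It does not use the SageMaker SDK, and the step names and the particular dependencies chosen are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each key is a step; its value is the set of steps it depends on.
# The trailing comment gives the step type from the cards.
dependencies = {
    "prepare_features": set(),                        # Processing
    "check_data_quality": {"prepare_features"},       # QualityCheck
    "train_model": {"check_data_quality"},            # Training
    "check_bias": {"train_model"},                    # ClarifyCheck
    "register_model": {"train_model", "check_bias"},  # Model
}


def execution_order(deps):
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(" -> ".join(order))
```

Declaring dependencies rather than a fixed script is what makes a pipeline repeatable: the same graph can be re-run on new data, and a failed check stops every step downstream of it.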

Question 55

Q

What is the benefit of using a centralized Model Dashboard?

Answer

A

It allows quick insights and actions on models that violate performance or fairness thresholds.

Question 56

Q

What is the significance of versioning in Model Registry?

Answer

A

It enables tracking changes across different versions of models, ensuring reproducibility and control.

Question 57

Q

What is SageMaker JumpStart?

Answer

A

A machine learning hub in SageMaker to find and launch pre-trained models (NLP, CV, etc.) and ML solutions quickly.

Question 58

Q

What kind of models does JumpStart support?

Answer

A

Pre-trained models from providers like Hugging Face, Meta, Stability AI, Databricks, and more.

Question 59

Q

What are the two main components of SageMaker JumpStart?

Answer

A

The Machine Learning Hub (pre-trained models) and Machine Learning Solutions (pre-built use case templates).

Question 60

Q

What can you do with models in JumpStart?

Answer

A

You can launch them, customize with your own data, fine-tune, and deploy them on SageMaker.

Question 61

Q

What is SageMaker Canvas?

Answer

A

A no-code, visual interface for building ML models using your data.

Question 62

Q

What kind of users is SageMaker Canvas designed for?

Answer

A

Non-developers or business users who want to build ML models without coding.

Question 63

Q

What service powers SageMaker Canvas under the hood?

Answer

A

SageMaker Autopilot, which uses AutoML to build models.

Question 64

Q

What AWS AI services are integrated with Canvas?

Answer

A

Amazon Comprehend, Rekognition, and Textract.

Answer 65

A

Sentiment analysis (Comprehend), object detection (Rekognition), document analysis (Textract).

Answer 66

A

An open-source tool to manage the ML lifecycle: tracking experiments, models, and workflows.

Answer 67

A

Yes, SageMaker allows you to launch an MLflow Tracking Server directly in SageMaker Studio.

Answer 68

A

To manage experiments and track training runs as part of ML model development.

Answer 69

A

Seamless use of open-source ML lifecycle tools within the SageMaker ecosystem.

Answer 70

A

A drag-and-drop, no-code visual interface to simplify the ML pipeline.

Answer 71

A

JumpStart helps you launch and customize pre-trained models; Canvas lets you build models visually without code.

Answer 72

A

An end-to-end machine learning service to build, train, and deploy ML models at scale.

Answer 73

A

To tune hyperparameters of ML models automatically.

Answer 74

A

Real-time, serverless, batch, and asynchronous inference.

Answer 75

A

A unified IDE for building, training, debugging, and deploying ML models end-to-end.

Answer 76

A

A tool to import, explore, process, and prepare data for ML.

Answer 77

A

A centralized repository to store and retrieve ML features across teams and pipelines.

Answer 78

A

Provides model comparison, bias detection, and model explainability.

Answer 79

A

A tool for data labeling and reinforcement learning from human feedback (RLHF).

Answer 80

A

A documentation tool to describe a model’s intended use, risk, and training details.

Answer 81

A

A centralized place to track all deployed models and their performance metrics.

Answer 82

A

Monitors deployed models for data drift, quality issues, and sends alerts.

Answer 83

A

A catalog for managing, versioning, and approving ML models before deployment.

Answer 84

A

A CI/CD service for automating ML workflows including data prep, training, and deployment.

Answer 85

A

A service to manage access and permissions for different user roles in SageMaker.

Answer 86

A

A hub for pre-trained models and pre-built ML solutions for rapid prototyping.

Answer 87

A

A no-code interface to build and deploy ML models visually without programming.

Answer 88

A

An integration that lets you use open-source MLflow to track experiments and manage ML lifecycles.

Answer 89

A

A security feature that prevents outbound network access from training/inference containers to protect data and prevent leaks.

Answer 90

A

Containers cannot make outbound network calls, so they cannot reach the internet, Amazon S3, or your VPC; only preloaded data is available for training.

Answer 91

A

Forecasting time series data.

Answer 92

A

Recurrent Neural Network (RNN).