Amazon SageMaker - Deep Dive Flashcards

1
Q

What is Amazon SageMaker?

A

A fully managed machine learning service by AWS that enables developers and data scientists to build, train, tune, and deploy ML models at scale.

2
Q

What are the three main steps in a SageMaker ML workflow?

A

1) Collect and prepare data, 2) Build and train models, 3) Deploy and monitor models.

3
Q

What types of algorithms are built-in with SageMaker?

A

Supervised (e.g., Linear Learner for linear regression, XGBoost, k-NN), unsupervised (e.g., PCA, k-means), anomaly detection (e.g., Random Cut Forest), NLP, and image processing.

4
Q

What is AMT in SageMaker?

A

Automatic Model Tuning: it runs many training jobs with different hyperparameter values and selects the combination that performs best on a chosen objective metric.
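
AMT's default search strategy is Bayesian optimization, but it also offers random search. The minimal sketch below shows the random-search idea in plain Python: sample hyperparameter values from ranges, score each trial with an objective, and keep the best. The `random_search` function and the toy objective are hypothetical stand-ins for real training jobs, not the SageMaker API.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Minimize `objective` over a box-shaped search space by random sampling.

    space: dict mapping hyperparameter name -> (low, high) continuous range.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Each trial stands in for one training job with sampled hyperparameters
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_loss(params):
    # Hypothetical objective: loss is smallest when learning_rate == 0.1
    return (params["learning_rate"] - 0.1) ** 2


best, loss = random_search(toy_validation_loss, {"learning_rate": (0.0, 1.0)}, n_trials=200)
print(best, loss)
```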

5
Q

What are the four deployment types in SageMaker?

A

Real-time, Serverless, Asynchronous, and Batch Transform.

6
Q

What is Real-time inference in SageMaker?

A

A low-latency prediction service for small payloads (up to 6 MB): a persistent, deployed endpoint returns a prediction for each request as it arrives.

7
Q

What is Serverless inference in SageMaker?

A

A deployment option with no infrastructure to manage: you choose a memory size, and SageMaker scales capacity automatically (down to zero when idle), so a request after an idle period may see cold-start latency.

8
Q

What is Asynchronous inference in SageMaker?

A

Used for large payloads (up to 1 GB) and longer processing times (up to 1 hour); input/output handled via Amazon S3.

9
Q

What is Batch Transform in SageMaker?

A

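Used for offline predictions on an entire dataset (many records) stored in Amazon S3, with results written back to S3. Latency is higher than the other options, but no persistent endpoint is needed and large datasets can be processed concurrently.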
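
The four deployment cards can be condensed into a rule of thumb. The function below is a hypothetical study aid, not an AWS API: it encodes the limits stated on these cards (6 MB for real-time, 1 GB and 1 hour for asynchronous) plus one assumption of its own, that an interactive request should finish within about a minute. Real decisions also weigh cost and traffic patterns.

```python
REALTIME_MAX_PAYLOAD_MB = 6       # from the Real-time card
ASYNC_MAX_PAYLOAD_MB = 1024       # 1 GB, from the Asynchronous card
ASYNC_MAX_PROCESSING_MIN = 60     # 1 hour, from the Asynchronous card
INTERACTIVE_MAX_MIN = 1           # assumption: interactive calls finish within ~1 minute


def choose_inference_option(payload_mb, processing_minutes, *, whole_dataset=False,
                            intermittent_traffic=False):
    """Hypothetical rule of thumb for picking a SageMaker inference option."""
    if whole_dataset:
        return "Batch Transform"  # offline scoring of many records at once
    if payload_mb > ASYNC_MAX_PAYLOAD_MB or processing_minutes > ASYNC_MAX_PROCESSING_MIN:
        return "Batch Transform"  # too big or too slow for a single async request
    if payload_mb > REALTIME_MAX_PAYLOAD_MB or processing_minutes > INTERACTIVE_MAX_MIN:
        return "Asynchronous"     # queued request, input/output via Amazon S3
    if intermittent_traffic:
        return "Serverless"       # no instances to manage; cold starts possible
    return "Real-time"            # steady traffic, low-latency responses


print(choose_inference_option(500, 30))  # Asynchronous
```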

10
Q

What is SageMaker Studio?

A

A web-based IDE for ML development that supports model building, training, tuning, deployment, and collaboration.

11
Q

What is the benefit of AMT’s early stop condition?

A

It saves time and cost by automatically stopping training jobs within a tuning job that are underperforming compared with earlier jobs.
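
AMT's documented early-stopping rule works per epoch: it computes, for each earlier training job, the running average of the objective metric up to the same epoch, takes the median of those averages, and stops the current job if its metric is worse. A minimal sketch of that comparison, assuming metric histories are plain Python lists indexed by epoch (the `should_stop` helper is hypothetical):

```python
from statistics import mean, median


def should_stop(current_value, epoch, previous_jobs, minimize=True):
    """Median-of-running-averages check for one epoch (0-based index).

    previous_jobs: list of per-epoch objective histories from earlier jobs.
    """
    # Running average of each earlier job's metric up to and including `epoch`
    running_avgs = [mean(history[: epoch + 1]) for history in previous_jobs
                    if len(history) > epoch]
    if not running_avgs:
        return False  # nothing to compare against yet
    threshold = median(running_avgs)
    # "Worse" means higher when minimizing, lower when maximizing
    return current_value > threshold if minimize else current_value < threshold


earlier = [[0.9, 0.6, 0.4], [0.8, 0.5, 0.3], [1.0, 0.7, 0.5]]  # validation loss per epoch
print(should_stop(0.95, epoch=1, previous_jobs=earlier))  # True: 0.95 is worse than the median 0.75
```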

12
Q

What is SageMaker Data Wrangler used for?

A

Preparing, transforming, and engineering features from tabular and image data for machine learning.

13
Q

What types of data can you prepare with Data Wrangler?

A

Tabular and image data.

14
Q

What are some key features of Data Wrangler?

A

Data selection, cleansing, exploration, visualization, transformation, and feature engineering.

15
Q

Does Data Wrangler support SQL?

A

Yes, it supports SQL for transformations and queries.

16
Q

What tool in Data Wrangler helps analyze data completeness and formatting?

A

The Data Quality and Insights Report.

17
Q

Why is feature engineering important?

A

Because high-quality features directly impact the performance of machine learning models.

18
Q

What is a common transformation in feature engineering?

A

Converting a birth date into age (a numerical value).
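
A minimal sketch of the birth-date-to-age transformation in plain Python. The reference date is passed in explicitly (an assumption, so the feature is reproducible), and the function name is hypothetical.

```python
from datetime import date


def age_in_years(birth_date: date, as_of: date) -> int:
    """Whole years between birth_date and as_of."""
    years = as_of.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


print(age_in_years(date(1990, 6, 15), date(2024, 6, 14)))  # 33
```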

19
Q

What is the SageMaker Feature Store used for?

A

Storing, sharing, and discovering machine learning features across datasets and teams.
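
Feature Store organizes features into feature groups, where each record is keyed by a record identifier and stamped with an event time. The class below is a hypothetical in-memory stand-in (not the SageMaker SDK) that roughly mirrors how an online store serves the freshest values: it keeps only the latest record per identifier.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryFeatureGroup:
    """Hypothetical toy feature group: one latest record per identifier."""
    record_id_name: str
    event_time_name: str
    _latest: dict = field(default_factory=dict)

    def put_record(self, record: dict) -> None:
        key = record[self.record_id_name]
        current = self._latest.get(key)
        # Ignore writes older than what is already stored for this identifier
        if current is None or record[self.event_time_name] >= current[self.event_time_name]:
            self._latest[key] = dict(record)

    def get_record(self, record_id):
        return self._latest.get(record_id)


customers = InMemoryFeatureGroup(record_id_name="customer_id", event_time_name="event_time")
customers.put_record({"customer_id": "c1", "event_time": 1, "age": 33})
customers.put_record({"customer_id": "c1", "event_time": 2, "age": 34})
customers.put_record({"customer_id": "c1", "event_time": 0, "age": 32})  # stale, ignored
print(customers.get_record("c1"))
```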

20
Q

Where are features from the Feature Store discoverable?

A

Within SageMaker Studio.

21
Q

Why is having a centralized Feature Store beneficial?

A

It improves collaboration and enables reuse of high-quality features across datasets and projects.

22
Q

Can a quick model be created in Data Wrangler?

A

Yes. The Quick Model analysis trains a model on the transformed data to estimate how well a model might perform on it.

23
Q

Is Data Wrangler part of SageMaker Studio?

A

Yes, it’s fully integrated within SageMaker Studio.

24
Q

What is SageMaker Clarify used for?

A

Evaluating foundation models, detecting bias, and explaining model predictions.

25
Q

How does SageMaker Clarify compare different models?

A

By running evaluations on specific tasks, using automatic metrics or human evaluations with criteria such as brand voice and relevance.

26
Q

What kind of human evaluations can SageMaker Clarify support?

A

Evaluations of friendliness, humor, and other subjective attributes of model outputs.

27
Q

Can you bring your own team for model evaluation in Clarify?

A

Yes, you can use an AWS-managed team or bring your own employees.

28
Q

What is model explainability in SageMaker Clarify?

A

Understanding how and why a model makes predictions using interpretation tools.

29
Q

Why is model explainability important?

A

To debug predictions, improve trust, and increase understanding of model behavior.

30
Q

What is an example use case for explainability in Clarify?

A

Explaining why a loan application was rejected by identifying the most influential features.
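
Clarify's feature attributions are based on Shapley values, which it approximates with Kernel SHAP. With only a handful of features they can be computed exactly by enumerating feature coalitions, as the sketch below does for a hypothetical linear loan-scoring function; the feature names, weights, and baseline are illustrative assumptions. The most negative attribution points to the feature that pushed the score down the most.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {k: instance[k] if k in coalition else baseline[k] for k in names}
        return predict(x)

    phi = {}
    for feature in names:
        others = [k for k in names if k != feature]
        total = 0.0
        for size in range(len(others) + 1):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {feature}) - value(s))
        phi[feature] = total
    return phi


def loan_score(x):
    # Hypothetical approval score: higher is better
    return 0.5 * x["income"] - 0.8 * x["debt_ratio"] + 0.3 * x["credit_years"]


applicant = {"income": 2.0, "debt_ratio": 3.0, "credit_years": 1.0}
baseline = {"income": 1.0, "debt_ratio": 1.0, "credit_years": 1.0}
phi = shapley_values(loan_score, applicant, baseline)
print(phi)
```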

31
Q

What does SageMaker Clarify do about bias?

A

It detects and measures biases in datasets and models using statistical metrics.

32
Q

What is a common type of bias SageMaker Clarify can detect?

A

Class imbalance or demographic representation issues like gender or age bias.
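
Clarify reports class imbalance as a pre-training bias metric, CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d count the members of the advantaged and disadvantaged facets. Values range from -1 to 1, and values near 0 mean balanced representation. A minimal sketch of that formula on a hypothetical facet column:

```python
def class_imbalance(facet_values, disadvantaged):
    """CI = (n_a - n_d) / (n_a + n_d) for one facet column."""
    n_d = sum(1 for v in facet_values if v == disadvantaged)
    n_a = len(facet_values) - n_d  # everyone else counts as the advantaged facet
    if n_a + n_d == 0:
        raise ValueError("facet column is empty")
    return (n_a - n_d) / (n_a + n_d)


gender = ["male"] * 80 + ["female"] * 20
print(class_imbalance(gender, disadvantaged="female"))  # 0.6
```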

33
Q

What is SageMaker Ground Truth used for?

A

Data labeling, human review, model customization, and alignment using RLHF.

34
Q

What is an example of using Ground Truth for labeling?

A

Labeling images by identifying objects like dogs, cats, or ships with human annotators.
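
When each image goes to several annotators, their answers must be consolidated into one label. Ground Truth applies its own annotation-consolidation algorithms; the sketch below substitutes a plain majority vote, and the label names and the "no strict majority means human review" rule are assumptions.

```python
from collections import Counter


def consolidate(labels, min_agreement=0.5):
    """Return the majority label, or None when agreement is too low."""
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    # Require a strict majority; otherwise the item would go back for review
    return label if votes / len(labels) > min_agreement else None


print(consolidate(["dog", "dog", "cat"]))  # dog
```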

35
Q

Who can perform labeling tasks in Ground Truth?

A

Employees, third-party reviewers, or Amazon Mechanical Turk workers.

36
Q

What is SageMaker Ground Truth Plus?

A

An enhanced version of Ground Truth that uses an expert workforce for data labeling tasks.

37
Q

What is the purpose of SageMaker Model Cards?

A

To gather essential model information like intended use, risk rating, and training details in one place.

38
Q

What does SageMaker Model Dashboard provide?

A

A centralized view to monitor, explore, and track all SageMaker models with insights on quality, risk, and bias.

39
Q

What is SageMaker Role Manager used for?

A

To define roles and permissions for different personas like data scientists or MLOps engineers.

40
Q

What is the function of SageMaker Model Monitor?

A

To track the performance of deployed models and detect data or prediction quality deviations.
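
Model Monitor compares live traffic against a baseline computed from training data. The sketch below is a deliberately simplified, hypothetical drift check (not Model Monitor's actual statistics or constraint format): the baseline is each feature's mean and standard deviation, and a feature is flagged when its live mean moves more than `max_z` baseline standard deviations.

```python
from statistics import mean, stdev


def make_baseline(training_data):
    """training_data: dict of feature name -> list of numeric values (at least 2)."""
    return {name: (mean(values), stdev(values)) for name, values in training_data.items()}


def drift_violations(baseline, live_data, max_z=3.0):
    """Return features whose live mean is more than max_z baseline std devs away."""
    violations = []
    for name, (base_mean, base_std) in baseline.items():
        shift = abs(mean(live_data[name]) - base_mean)
        if base_std == 0:
            drifted = shift > 0  # constant feature: any change counts
        else:
            drifted = shift / base_std > max_z
        if drifted:
            violations.append(name)
    return violations


baseline = make_baseline({"age": [30, 32, 31, 29, 33], "income": [50, 55, 52, 48, 51]})
print(drift_violations(baseline, {"age": [50, 52, 51], "income": [51, 50, 53]}))  # ['age']
```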

41
Q

How often can Model Monitor evaluate model quality?

A

Continuously or on a schedule (e.g., daily, weekly).

42
Q

What action should you take when Model Monitor alerts you?

A

Investigate and fix the issue by updating data or retraining the model.

43
Q

What does SageMaker Model Registry provide?

A

A centralized model catalog for versioning, managing metadata, and controlling model approval workflows.

44
Q

How can you manage model approval in Model Registry?

A

By setting an approval status and involving human reviewers before deployment.
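
Model Registry groups the versions of a model into a model package group, and each version carries an approval status; the SageMaker API uses the values PendingManualApproval, Approved, and Rejected. The class below is a hypothetical in-memory sketch of that workflow (not the SageMaker SDK): in this sketch new versions start as pending, a reviewer changes the status, and deployment picks the newest approved version. The S3 paths are placeholders.

```python
from dataclasses import dataclass, field

APPROVAL_STATUSES = ("PendingManualApproval", "Approved", "Rejected")


@dataclass
class ModelPackageGroupSketch:
    """Hypothetical stand-in for a Model Registry model package group."""
    name: str
    versions: list = field(default_factory=list)

    def register(self, artifact_uri: str) -> int:
        version = len(self.versions) + 1  # versions are numbered 1, 2, 3, ...
        self.versions.append({"version": version, "artifact": artifact_uri,
                              "status": "PendingManualApproval"})
        return version

    def set_status(self, version: int, status: str) -> None:
        if status not in APPROVAL_STATUSES:
            raise ValueError(f"unknown approval status: {status}")
        self.versions[version - 1]["status"] = status

    def latest_approved(self):
        # Deployment should only pick up a human-approved version
        for entry in reversed(self.versions):
            if entry["status"] == "Approved":
                return entry
        return None


group = ModelPackageGroupSketch("churn-model")
v1 = group.register("s3://example-bucket/churn/v1/model.tar.gz")
v2 = group.register("s3://example-bucket/churn/v2/model.tar.gz")
group.set_status(v1, "Approved")
print(group.latest_approved()["version"])  # 1 (v2 is still pending review)
```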

45
Q

What is SageMaker Pipelines?

A

A CI/CD workflow tool for automating machine learning model building, training, and deployment.

46
Q

Why are SageMaker Pipelines useful in MLOps?

A

They automate workflows, reduce manual errors, ensure repeatability, and accelerate iteration.

47
Q

What are the main step types in SageMaker Pipelines?

A

Key step types include Processing, Training, Tuning, AutoML, Model, ClarifyCheck, and QualityCheck; there are others, such as Condition and Transform steps.

48
Q

What does the ‘Processing’ step in a pipeline do?

A

Performs data preparation like feature engineering.

49
Q

What does the ‘Training’ step in a pipeline do?

A

Trains the machine learning model on prepared data.

50
Q

What is the ‘Tuning’ step used for in a pipeline?

A

Optimizes model performance through hyperparameter tuning.

51
Q

What does the ‘AutoML’ step handle?

A

Automatically trains models with minimal configuration.

52
Q

What is the ‘Model’ step used for?

A

Creates and registers a SageMaker model (optionally to the Model Registry).

53
Q

What does the ‘ClarifyCheck’ step do?

A

Performs bias and explainability analysis using SageMaker Clarify.

54
Q

What is the purpose of the ‘QualityCheck’ step?

A

Checks the data or model quality against a defined baseline.
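
SageMaker Pipelines runs its steps as a directed acyclic graph (DAG) derived from the dependencies between them. To illustrate how such an ordering is resolved, the sketch below uses Python's standard `graphlib` module (Python 3.9+) on a hypothetical pipeline built from the step types above; it is not the SageMaker Pipelines SDK. Steps that land in the same stage have no dependencies on each other, so they could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the steps it depends on
pipeline = {
    "Processing": set(),
    "QualityCheck": {"Processing"},
    "ClarifyCheck": {"Processing"},
    "Training": {"Processing"},
    "Model": {"Training", "QualityCheck", "ClarifyCheck"},
}


def execution_stages(dependencies):
    """Group steps into stages; every step in a stage has its inputs ready."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        stages.append(ready)
        sorter.done(*ready)
    return stages


print(execution_stages(pipeline))
# [['Processing'], ['ClarifyCheck', 'QualityCheck', 'Training'], ['Model']]
```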

55
Q

What is the benefit of using a centralized Model Dashboard?

A

It allows quick insights and actions on models that violate performance or fairness thresholds.

56
Q

What is the significance of versioning in Model Registry?

A

It enables tracking changes across different versions of models, ensuring reproducibility and control.

57
Q

What is SageMaker JumpStart?

A

A machine learning hub in SageMaker to find and launch pre-trained models (NLP, CV, etc.) and ML solutions quickly.

58
Q

What kind of models does JumpStart support?

A

Pre-trained models from providers like Hugging Face, Meta, Stability AI, Databricks, and more.

59
Q

What are the two main components of SageMaker JumpStart?

A

The Machine Learning Hub (pre-trained models) and Machine Learning Solutions (pre-built use case templates).

60
Q

What can you do with models in JumpStart?

A

You can launch them, customize with your own data, fine-tune, and deploy them on SageMaker.

61
Q

What is SageMaker Canvas?

A

A no-code, visual interface for building ML models using your data.

62
Q

What kind of users is SageMaker Canvas designed for?

A

Non-developers or business users who want to build ML models without coding.

63
Q

What service powers SageMaker Canvas under the hood?

A

SageMaker Autopilot, which uses AutoML to build models.

64
Q

What AWS AI services are integrated with Canvas?

A

Amazon Comprehend, Rekognition, and Textract.

65
Q

What are some ready-to-use use cases in Canvas?

A

Sentiment analysis (Comprehend), object detection (Rekognition), document analysis (Textract).

66
Q

What is MLflow?

A

An open-source tool to manage the ML lifecycle: tracking experiments, models, and workflows.

67
Q

Can you run MLflow on SageMaker?

A

Yes, SageMaker allows you to launch an MLflow Tracking Server directly in SageMaker Studio.

68
Q

What is the purpose of MLflow Tracking Server in SageMaker?

A

To manage experiments and track training runs as part of ML model development.

69
Q

What is the benefit of MLflow integration with SageMaker?

A

Seamless use of open-source ML lifecycle tools within the SageMaker ecosystem.

70
Q

What kind of UI does SageMaker Canvas provide?

A

A drag-and-drop, no-code visual interface to simplify the ML pipeline.

71
Q

What’s the difference between JumpStart and Canvas?

A

JumpStart helps you launch and customize pre-trained models; Canvas lets you build models visually without code.

72
Q

What is Amazon SageMaker?

A

An end-to-end machine learning service to build, train, and deploy ML models at scale.

73
Q

What is SageMaker Automatic Model Tuning used for?

A

To tune hyperparameters of ML models automatically.

74
Q

What are the deployment and inference options in SageMaker?

A

Real-time, serverless, batch, and asynchronous inference.

75
Q

What is SageMaker Studio?

A

A unified IDE for building, training, debugging, and deploying ML models end-to-end.

76
Q

What is SageMaker Data Wrangler?

A

A tool to import, explore, process, and prepare data for ML.

77
Q

What is SageMaker Feature Store?

A

A centralized repository to store and retrieve ML features across teams and pipelines.

78
Q

What does SageMaker Clarify do?

A

Provides model comparison, bias detection, and model explainability.

79
Q

What is SageMaker Ground Truth?

A

A tool for data labeling and reinforcement learning from human feedback (RLHF).

80
Q

What are SageMaker Model Cards?

A

A documentation tool to describe a model’s intended use, risk, and training details.

81
Q

What is SageMaker Model Dashboard?

A

A centralized place to track all deployed models and their performance metrics.

82
Q

What does SageMaker Model Monitor do?

A

Monitors deployed models for data drift and quality issues, and sends alerts when it detects them.

83
Q

What is SageMaker Model Registry?

A

A catalog for managing, versioning, and approving ML models before deployment.

84
Q

What is SageMaker Pipelines?

A

A CI/CD service for automating ML workflows including data prep, training, and deployment.

85
Q

What is SageMaker Role Manager?

A

A service to manage access and permissions for different user roles in SageMaker.

86
Q

What is SageMaker JumpStart?

A

A hub for pre-trained models and pre-built ML solutions for rapid prototyping.

87
Q

What is SageMaker Canvas?

A

A no-code interface to build and deploy ML models visually without programming.

88
Q

What is MLflow on SageMaker?

A

An integration that lets you use open-source MLflow to track experiments and manage ML lifecycles.

89
Q

What is SageMaker Network Isolation mode?

A

A security feature that prevents outbound network access from training/inference containers to protect data and prevent leaks.

90
Q

What happens when you enable Network Isolation in SageMaker?

A

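Containers cannot make any outbound network calls: no internet, no Amazon S3 or other AWS services, and no resources in your VPC. SageMaker itself still copies the input data and model artifacts into the container and uploads the outputs, so the container works only with data provided this way.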

91
Q

What is the DeepAR algorithm in SageMaker used for?

A

Forecasting time series data.

92
Q

What type of neural network does DeepAR use?

A

Recurrent Neural Network (RNN).
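
DeepAR is configured with, among others, the `context_length` (how many past time points the model sees) and `prediction_length` (how many future points it forecasts) hyperparameters. The hypothetical helper below only illustrates how those two numbers slice a series into (context, target) training examples; DeepAR samples its own training windows internally, so this is not its implementation.

```python
def make_windows(series, context_length, prediction_length):
    """Slide a window over `series`, producing (past, future) pairs."""
    span = context_length + prediction_length
    windows = []
    for start in range(len(series) - span + 1):
        past = series[start:start + context_length]
        future = series[start + context_length:start + span]
        windows.append((past, future))
    return windows


print(make_windows([10, 12, 13, 15, 16], context_length=3, prediction_length=1))
# [([10, 12, 13], [15]), ([12, 13, 15], [16])]
```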