Sagemaker Flashcards

1
Q

Complete with the name of each Sagemaker functionality:
-_________ is useful for more complex inference workflows
-_________ can help deployment on edge devices
-_________ can accelerate inference for Deep Learning models
-_________ evaluates new models against currently deployed models to catch errors

A

-Inference Pipeline
-Sagemaker Neo
-Elastic Inference
-Shadow Testing

2
Q

When training models on AWS, what is the difference between File mode and Pipe mode?

A

File mode copies the training files to the EBS volume of the instance, while Pipe mode streams the data from S3, reducing the volume space needed
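
As a rough illustration, the input mode is chosen per training channel. The sketch below only builds the channel fragment of a `CreateTrainingJob` request as a plain dictionary; the helper name, bucket path and channel name are hypothetical, and nothing is sent to AWS.

```python
def input_channel(s3_uri: str, mode: str) -> dict:
    """Build one training channel; mode is "File", "Pipe" or "FastFile"."""
    if mode not in {"File", "Pipe", "FastFile"}:
        raise ValueError(f"unsupported input mode: {mode}")
    return {
        "ChannelName": "train",  # hypothetical channel name
        "InputMode": mode,       # File copies the data first, Pipe streams it from S3
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


# Hypothetical bucket, used only to show the two variants side by side.
file_channel = input_channel("s3://example-bucket/train/", "File")
pipe_channel = input_channel("s3://example-bucket/train/", "Pipe")
```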

3
Q

True or False: RecordIO-Protobuf’s advantages over CSV are that it is faster, more efficient and can be used in Pipe mode

A

False, CSV can also be used in Pipe mode

4
Q

True or False: Sagemaker Linear Learner can perform Regression, Binary classification and multi-class classification

A

True

5
Q

What training data types and training types does Linear Learner support?

A

Data type: CSV and RecordIO-Protobuf (float32)
Training type: Pipe and File

6
Q

True or False: When training Linear Learner, it is recommended to launch multiple training jobs at the same time, since the model is very sensitive to hyperparameters

A

False, multiple models are trained in parallel by default and the best one is selected automatically

7
Q

True or False: Linear Learner supports both L1 and L2 regularization

A

True

8
Q

Identify the following Linear Learner Hyperparameters by their description:
-_______ equalizes the importance given to each class in a multi-class classification model
-_______ defines the speed with which the SGD algorithm converges
-_______ governs L1 regularization
-_______ governs L2 regularization (weight decay)
-_______ keeps the precision at the specified value then maximizes recall
-_______ keeps the recall at the specified value then maximizes precision

A

-balance_multiclass_weights
-learning_rate
-l1
-wd
-target_precision
-target_recall
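
A minimal sketch of how these names could appear in a hyperparameter dictionary for a binary classifier. The helper name and the numeric values are illustrative assumptions, not recommendations; values are strings because training job hyperparameters are passed as string pairs.

```python
def linear_learner_hyperparameters(target_precision: float) -> dict:
    """Hold precision at target_precision, then pick the model with the best recall."""
    if not 0.0 < target_precision < 1.0:
        raise ValueError("target_precision must be between 0 and 1")
    return {
        "predictor_type": "binary_classifier",
        "binary_classifier_model_selection_criteria": "recall_at_target_precision",
        "target_precision": str(target_precision),
        "learning_rate": "0.01",  # illustrative value
        "l1": "0.0",              # L1 regularization strength (illustrative)
        "wd": "0.0001",           # L2 weight decay (illustrative)
    }


hyperparameters = linear_learner_hyperparameters(0.9)
```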

9
Q

True or False: Linear Learner can benefit from single and multi-machine CPUs, but not GPUs

A

False, it can also benefit from GPUs, although multi-GPU instances have not been shown to be faster than single-GPU ones

10
Q

What is XGBoost?

A

It is an ensemble model that trains multiple decision trees based on the errors of previous trees
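
The idea of each tree correcting the errors of the previous ones can be sketched in plain Python. This is a toy gradient-boosting loop with one-split trees (stumps) on a single feature, not XGBoost itself; the function names and the data are made up for illustration, and it assumes at least two distinct x values.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the current residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=20, eta=0.5):
    """Each round fits a stump to the errors left by the previous rounds."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # eta shrinks each correction, as the eta hyperparameter does in XGBoost
        predictions = [p + eta * stump(x) for p, x in zip(predictions, xs)]
    return lambda x: base + eta * sum(s(x) for s in stumps)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = boost(xs, ys)
```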

11
Q

True or False: XGBoost accepts only CSV and RecordIO-Protobuf training data

A

False, it also accepts Parquet and libsvm

12
Q

Identify the following XGBoost Hyperparameters by their description:
-_______ fraction of the training data sampled for each tree, prevents overfitting
-_______ step size shrinkage, helps with overfitting
-_______ minimum loss reduction to create a partition
-_______ governs L1 regularization
-_______ governs L2 regularization
-_______ metric to use on the model evaluation process
-_______ adjusts the balance between positive and negative weights (helps with unbalanced data)
-_______ max depth of the tree, lower values help with overfitting

A

-subsample
-eta
-gamma
-alpha
-lambda
-eval_metric
-scale_pos_weight
-max_depth
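
A hedged sketch of a hyperparameter dictionary using these names. The helper name and every numeric value are illustrative assumptions; the only computed entry is scale_pos_weight, set with the common heuristic of negative count divided by positive count.

```python
def xgboost_hyperparameters(labels):
    """Build an illustrative binary-classification config from 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("need at least one positive example")
    return {
        "objective": "binary:logistic",
        "num_round": "100",
        "eval_metric": "auc",
        "eta": "0.2",        # step size shrinkage
        "gamma": "1",        # minimum loss reduction to split
        "alpha": "0",        # L1 regularization
        "lambda": "1",       # L2 regularization
        "subsample": "0.8",  # fraction of rows sampled per tree
        "max_depth": "5",
        "scale_pos_weight": str(negatives / positives),
    }


# 90 negatives and 10 positives: an imbalanced toy label set.
hyperparameters = xgboost_hyperparameters([0] * 90 + [1] * 10)
```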

13
Q

True or False: XGBoost is compute limited, so the best type of training instance for it is a compute focused one, such as C

A

False, it is memory bound, so a general-purpose instance such as M5 is a better choice

14
Q

True or False: XGBoost nowadays accepts both single and distributed instance GPU training

A

True, as long as you configure the hyperparameters adequately

15
Q

What are Seq2seq’s use cases?

A

Any case where a sequence is received as input and another sequence is produced as output (text to text, audio to text, etc.)

16
Q

What model types are used to implement Seq2seq?

A

RNNs and CNNs with attention

17
Q

What types of input data does Seq2seq accept?

A

Only RecordIO-Protobuf with integer tokens

18
Q

What types of metrics can Seq2seq be optimized on?

A

Accuracy, BLEU score, Perplexity (Cross-entropy)
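
The link between perplexity and cross-entropy can be shown with a short calculation: perplexity is the exponential of the average negative log-probability the model assigns to the correct tokens. The function name and the probabilities below are made up for illustration.

```python
import math


def perplexity(token_probabilities):
    """exp(cross-entropy), where cross-entropy is the mean negative log-probability."""
    cross_entropy = -sum(math.log(p) for p in token_probabilities) / len(token_probabilities)
    return math.exp(cross_entropy)


# A model that gives every correct token probability 0.25 is as uncertain
# as a uniform choice among 4 tokens.
uniform_four = perplexity([0.25, 0.25, 0.25, 0.25])
```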

19
Q

Which of the following are Seq2Seq hyperparameters?
-Batch_size
-Optimizer_type (adam, sgd, rmsprop)
-Learning_rate
-Alpha
-Lambda
-Num_layers_encoder
-Num_layers_decoder
-Top_k
-Top_n

A

-Batch_size
-Optimizer_type (adam, sgd, rmsprop)
-Learning_rate
-Num_layers_encoder
-Num_layers_decoder

20
Q

True or False: Seq2seq can run on both CPU and GPU instances

A

False, only single or multi GPU instances

21
Q

What is Amazon DeepAR useful for?

A

Forecasting one-dimensional (scalar) time series

22
Q

What XGBoost Hyperparameters do you have to configure to enable single-GPU and multi-GPU training?

A

-Single GPU: tree_method = gpu_hist
-Multi GPU: use_dask_gpu_training = true and TrainingInput distribution = fully_replicated

23
Q

True or False: XGBoost distributed GPU training works only for RecordIO-Protobuf

A

False, it works only with CSV and Parquet

24
Q

What Seq2seq metrics are good for measuring machine translation problems?

A

-BLEU score and perplexity

25
What training data format does DeepAR expect?
JSON Lines (optionally gzipped) or Parquet
26
What are the features expected in DeepAR input?
-start: starting timestamp
-target: the time series values
-dynamic_feat: a list of dynamic features
-cat: a list of categorical features
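
A minimal sketch of one training record in JSON Lines form, using the four fields listed above. The helper name and the sample values are hypothetical; the length check reflects the assumption that each dynamic feature series must match the target length in training data.

```python
import json


def deepar_record(start, target, cat=None, dynamic_feat=None):
    """Serialize one time series as a single JSON line."""
    record = {"start": start, "target": target}
    if cat is not None:
        record["cat"] = cat
    if dynamic_feat is not None:
        for feature in dynamic_feat:
            if len(feature) != len(target):
                raise ValueError("each dynamic feature must match the target length")
        record["dynamic_feat"] = dynamic_feat
    return json.dumps(record)


line = deepar_record(
    "2024-01-01 00:00:00",
    [5.0, 7.5, 6.1],
    cat=[0],
    dynamic_feat=[[1, 0, 1]],
)
```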
27
Identify the following DeepAR Hyperparameters by their description:
-_______ is the number of points the model sees before making a prediction
-_______ number of training epochs
-_______ size of the batch
-_______ learning speed
-_______ number of cells

-context_length
-epochs
-mini_batch_size
-learning_rate
-num_cells
28
True or False: DeepAR can be trained on both CPU and GPU instances, but inference can only be performed using CPU
True
29
True or False: Using a GPU always speeds up DeepAR training
False, it helps only with larger models or a large mini_batch_size (>512)
30
What are the 2 most common use cases for BlazingText?
-Text classification
-Word (not sentence) embedding generation
31
What are the expected training inputs for each BlazingText use case?
-Text classification (supervised mode): a text file with one sentence per line, where the first word is __label__ followed by the label. Also accepts the augmented manifest text format. Ex: __label__2 Hello, everyone
-Word embedding (unsupervised mode): just a text file with one sentence per line.
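
The supervised line format can be produced with a few lines of Python. This is only a formatting sketch with a hypothetical helper name; it collapses whitespace so each sentence stays on one line with space-separated tokens.

```python
def to_blazingtext_line(label, sentence):
    """Prefix the label with __label__ and keep the sentence on a single line."""
    tokens = sentence.split()  # also removes newlines and repeated spaces
    return f"__label__{label} " + " ".join(tokens)


example = to_blazingtext_line(2, "Hello,  everyone")
```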
32
What are the modes available for word2vec?
-cbow
-skipgram
-batch_skipgram
33
What are the available hyperparameters for BlazingText for text classification?
-epochs
-learning_rate
-word_ngrams
-vector_dim
34
What are the available hyperparameters for BlazingText for Word2vec?
-mode
-learning_rate
-window_size
-vector_dim
-negative_samples
35
What instance type should you use for BlazingText Word2vec training?
-skipgram or cbow: Any single CPU or GPU instance
-batch_skipgram: Any single or multiple CPU instances
36
What instance type should you use for BlazingText Text Classification training?
-C5 if training dataset < 2GB
-Single GPU instance otherwise
37
What is Object2Vec?
It is a model that generalizes word2vec so it can be used on other high dimensional objects
38
What are some example inputs accepted by Object2vec?
-Sentence-sentence pairs. Ex: "A soccer game with multiple males playing." and "Some men are playing a sport."
-Labels-sequence pairs: The genre tags of the movie "Titanic", such as "Romance" and "Drama", and a short description of the movie.
-Customer-customer pairs: The customer ID of Jane and customer ID of Jackie.
-Product-product pairs: The product ID of football and product ID of basketball.
-Item review user-item pairs: A user's ID and the items she has bought, such as apple, pear, and orange.
39
What are the processed input types accepted by Object2vec?
Object2vec accepts as input pairs of the following input types:
-A discrete token, which is represented as a list of a single integer-id. For example, [10].
-A sequence of discrete tokens, which is represented as a list of integer-ids. For example, [0,12,10,13].
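
A small sketch of one such pair written as a JSON line. The helper name and the integer ids are invented; the field names in0, in1 and label follow the Object2Vec JSON Lines training format as best recalled, so treat them as an assumption to check against the documentation.

```python
import json


def object2vec_record(in0, in1, label):
    """Pair two token-id lists with a label; a single token is a one-element list."""
    if not in0 or not in1:
        raise ValueError("both inputs need at least one token id")
    return json.dumps({"label": label, "in0": in0, "in1": in1})


# A single token paired with a sequence of tokens.
pair_line = object2vec_record([10], [0, 12, 10, 13], 1)
```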
40
How are the Object2vec inputs processed by the network?
Both inputs of the pair are encoded, combined by a comparator, then fed through a feed-forward network
41
What are the encoder choices for Object2vec?
-Average-pooled embeddings
-CNNs
-Bidirectional LSTMs
42
Object2vec has all standard deep learning hyperparams. What are the hyperparams related to the encoders?
enc1_network and enc2_network, where you choose the encoder type for each of them
43
True or False: Object2vec can be trained on any single instance, GPU or not, but not on multiple-instances
True
44
What is the environment variable that can be used to optimize inference for Object2vec encoder embeddings?
INFERENCE_PREFERRED_MODE
45
How do Object Detection models work?
They detect and classify objects using bounding boxes and confidence values
46
What are the 2 variants of Object Detection models?
-MXNet
-Tensorflow
47
True or False: Both MXNet and Tensorflow Object detection allow model incremental training and transfer learning through AWS
False, only MXNet allows training through AWS; Tensorflow needs you to perform a custom training depending on the model selected
48
For model training, what input formats does Object Detection (MXNet) accept?
RecordIO or image format (.png or .jpg)
49
What are the main hyperparameters for Object Detection training?
-learning_rate
-optimizer (sgd, adam, rmsprop, adadelta)
-mini_batch_size
50
True or False: Object detection accepts only GPU instances for both training and inference
False, inference also accepts CPU
51
What is AWS Image Classification?
It is a Sagemaker built-in algorithm that assigns labels to an image as a whole
52
What is the difference between AWS Image Classification and Object Detection?
Object Detection detects the objects inside the image using bounding boxes, while image classification classifies the image as a whole
53
True or False: Just like Object Detection, Image Classification has 2 possible models, MXNet and Tensorflow, with only MXNet accepting full training and transfer learning and Tensorflow not accepting any kind of training
False, Tensorflow in this case also accepts learning for its topmost layer
54
On AWS Image Classification, what are the learning types available for MXNet?
-Full training mode (training from scratch)
-Transfer learning mode (only the topmost layer is fine-tuned)
55
Image Classification has the same general hyperparameters as Object Detection. What are specific hyperparameters that exist only for it?
-Weight decay
-Beta1
-Beta2
-Gamma
-Eps
56
What are acceptable instances for training and inference for Image Classification?
-Training: Single GPU, Multi GPU, multimachine
-Inference: Any CPU or GPU
57
What is Semantic Segmentation?
It is a pixel-level analysis of an image that assigns a class to every pixel, identifying shapes through a segmentation mask
58
What type of Input does Semantic Segmentation accept?
-Training: JPG, PNG and Augmented manifest image format
-Inference: JPG
59
What is the image size on image classification?
3x224x224
60
To use Pipe mode on Semantic Segmentation training, in what format do the input files have to be?
Augmented manifest image
61
Semantic Segmentation accepts multiple algorithm and model backbone choices. What are they?
-Algorithms: Fully Convolutional Network (FCN), Pyramid Scene Parsing (PSP), DeepLabV3
-Backbones: ResNet50, ResNet101 (both trained on ImageNet)
62
What training types are supported by Semantic Segmentation?
Full training and Incremental Training
63
What are the hyperparameters that exist specifically for semantic segmentation?
-backbone
-algorithm
64
What are acceptable instances for training and inference for Semantic Segmentation?
-Training: Single GPU, Multi GPU, multimachine
-Inference: Any CPU or GPU
65
What is Random Cut Forest?
It is an unsupervised machine learning algorithm that creates a forest of trees and uses them to detect outliers in the data, assigning an anomaly score to each point
66
What are the Input Formats and training modes accepted for Random Cut Forest?
-Input: RecordIO-Protobuf and CSV
-Modes: Pipe and File
67
What are important hyperparameters for Random Cut Forest?
-num_trees (increasing it reduces noise)
-num_samples_per_tree
68
True or False: Random Cut Forest does not take advantage of GPUs
True
69
What is Neural Topic Modelling (NTM)?
It is an unsupervised machine learning algorithm that classifies or summarizes documents based on their similarity
70
What are the training modes and inputs accepted by NTM?
-Modes: File and Pipe
-Input: CSV and RecordIO-Protobuf
71
What are the main Hyperparameters for NTM?
-mini_batch_size
-learning_rate
-num_topics
72
What instances should you use for training and inference on NTM?
-Training: GPU
-Inference: CPU
73
What is Latent Dirichlet Allocation?
It is an unsupervised topic modelling algorithm that groups similar documents together
74
What are the training modes and inputs accepted by LDA?
-Modes: File and Pipe (Pipe only with RecordIO-Protobuf)
-Input: CSV and RecordIO-Protobuf
75
True or False: LDA is similar to NTM, but CPU based, so cheaper
True
76
What are important LDA Hyperparams?
-num_topics
-alpha0 (controls topic sparseness: smaller values produce sparse topics, while larger ones produce uniformly sized topics)
77
What are the recommended instance types for LDA?
Single CPU instances
78
What are the training modes and inputs accepted by KNN?
-Modes: File and Pipe
-Input: CSV and RecordIO-Protobuf
79
True or False: When training a KNN model, Sagemaker includes a dimensionality reduction phase to avoid sparse data (sign or fjlt methods)
True
80
What are important KNN Hyperparams?
-k
-sample_size
81
What are the recommended instance types for KNN?
-Training: CPU or GPU
-Inference: CPU or GPU for higher throughput on larger batches
82
What are the training modes and inputs accepted by K-Means?
-Modes: File and Pipe
-Input: CSV and RecordIO-Protobuf
83
What are the main K-Means Hyperparameters?
-k
-mini_batch_size
-init_method
-extra_center_factor
84
What are the recommended instance types for K-Means?
CPU or single GPU
85
What is PCA?
It is an unsupervised algorithm that performs dimensionality reduction by projecting vectors onto their principal components
86
What are the training modes and inputs accepted by PCA?
-Modes: File and Pipe
-Input: CSV and RecordIO-Protobuf
87
What are the 2 PCA modes?
-Regular: For sparse data and a moderate number of observations and features
-Randomized: For a large number of observations and features
88
What are PCA's main hyperparams?
-Mode
-subtract_mean (unbiases data)
89
What are Factorization machines?
A classification and regression algorithm that works well with sparse data. It's also good with recommendations.
90
What training input format is accepted by Factorization Machines?
RecordIO-Protobuf (CSV not good for sparse data)
91
What is the most common use for Factorization Machines?
Recommender Systems
92
What are the recommended instance types for Factorization Machines?
CPU instances, with GPUs being recommended only for dense data
93
What is Amazon IP Insights?
It is a Sagemaker built-in algorithm that learns usage patterns of IP addresses to detect anomalous behaviour
94
What is the expected input format for IP Insights training?
CSV files with entity -> IP pairs only
95
True or False: IP Insights is based on decision trees, and returns an anomaly score that indicates how anomalous the analysed interaction is
False, IP Insights is based on neural networks
96
True or False: IP Insights training is recommended on GPUs and inference on CPUs
True
97
In the context of reinforcement learning, what is Q-Learning?
It is an implementation of reinforcement learning where you have:
-A list of states S
-A list of possible actions on those states A
-A value of each state/action Q
You start off with Q = 0 and explore the space. Each time rewards are given, increase Q. Each time bad things happen, reduce Q.
98
What is the exploration problem in Q-Learning?
When using Q learning, it is necessary to explore the space. Because of that, an important question becomes how to determine the actions taken to explore the space.
99
What are some common ways to solve the exploration problem in Q-Learning?
-Naive approach: Always choose the action with the highest Q (issue: inefficient and might miss a lot of possible paths)
-Better approach: Introduce an epsilon term. If a random number is smaller than epsilon, choose an action at random; otherwise follow the highest Q. Exploring this way can be modelled as a Markov Decision Process (MDP).
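
The epsilon rule, commonly called epsilon-greedy, together with the Q update can be sketched in plain Python. The function names, the learning rate alpha, the discount factor and the tiny two-state table are illustrative assumptions, not part of any Sagemaker API.

```python
import random


def choose_action(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the action with the highest Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def update_q(q, state, action, reward, next_state, alpha=0.5, discount=0.9):
    """Move Q(state, action) toward reward plus the discounted best next value."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + discount * best_next - q[state][action])


rng = random.Random(0)
q = [[0.0, 0.0], [0.0, 0.0]]  # two states, two actions, all Q values start at 0
update_q(q, state=0, action=1, reward=1.0, next_state=1)
greedy_action = choose_action(q[0], epsilon=0.0, rng=rng)
```

With epsilon set to 0 the agent never explores, which is the naive approach; raising epsilon trades some reward for coverage of the state space.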
100
What frameworks in Sagemaker support reinforcement learning?
Deep Learning frameworks such as MXNet and Tensorflow
101
True or False: Sagemaker supports both multi-core and multi-instance reinforcement training
True
102
What is Sagemaker Automatic Model Tuning?
It's a Sagemaker functionality where a Hyperparameter Tuning Job runs multiple training jobs to determine the best hyperparameters for your model
103
What are some ways of integrating Spark with Sagemaker?
-Use the sagemaker-spark library
-Use SparkMagic kernels on Sagemaker notebooks
104
What is the difference between the Random and K-means++ methods of initializing K-means centroids?
Random initializes them randomly, while K-means++ tries to initialize the centroids far apart from each other
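
A minimal pure-Python sketch of K-means++ seeding, to show how it spreads centroids apart: each new centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. The function name and points are made up, and the sketch assumes k is no larger than the number of distinct points.

```python
import random


def kmeans_plus_plus(points, k, rng):
    """Pick k initial centroids, favouring points far from those already chosen."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest existing centroid.
        distances = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
            for p in points
        ]
        # Already-chosen points have weight 0, so they cannot be drawn again.
        centroids.append(rng.choices(points, weights=distances, k=1)[0])
    return centroids


rng = random.Random(42)
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = kmeans_plus_plus(points, 2, rng)
```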
105
What is the expected Factorization Machine input row format?
A pair-wise interaction (Ex: User -> Item)
106
What is the Sagemaker Debugger?
It is a Sagemaker functionality that helps you train your models by providing debugging features. Some of them include:
-Saving your model parameters periodically
-Defining rules for unwanted conditions and running debug jobs for each rule
-Saving the logs to Cloudwatch
-Providing insights on Sagemaker Debugger Dashboard
107
True or False: Sagemaker Debugger can automatically generate training reports based on the jobs run
True
108
What are Sagemaker Debugger's built in sets of rules?
-Monitor system bottlenecks
-Profile model framework operations
-Debug model parameters
109
What are the frameworks and algorithms supported by Sagemaker Debugger?
-MXNet
-Tensorflow
-XGBoost
-Pytorch
-Sagemaker generic estimators
110
True or False: Sagemaker Debugger APIs are available on GitHub
True
111
What built-in actions can be triggered in response to Sagemaker Debugger rules via SNS?
-StopTraining
-Email
-SMS
112
What is Sagemaker Autopilot (AutoML)?
It is a Sagemaker functionality that automates model selection, model training, data processing and all infrastructure provisioning for training ML models
113
What is Sagemaker Autopilot's default workflow?
-Load data from S3 for training
-Select column for prediction
-Automatic model creation
-Model is made available for visibility and control
-Model is added to the model leaderboard, where you can pick the one that suits you the most
-You deploy and monitor the model
114
What file formats are accepted by Sagemaker Autopilot?
CSV and Parquet
115
What problem types are accepted by Sagemaker Autopilot?
Binary classification, multiclass classification, regression
116
What algorithm types are accepted by Sagemaker Autopilot?
-Linear Learner
-XGBoost
-Deep Learning (MLP)
-Ensemble mode
117
What training modes are available on Sagemaker Autopilot?
-HPO (Hyperparameter optimization)
-Ensemble
-Auto
118
Describe the HPO training mode from Sagemaker Autopilot
You select the algorithm and range of hyperparameters most relevant for the use case and multiple trials are run by Sagemaker. If the dataset is smaller than 100MB, Bayesian optimization is used, otherwise multi-fidelity optimization is used
119
Describe the Ensemble training mode from Sagemaker Autopilot
Multiple models are trained together using the AutoGluon library (wider range of models available) and the models are combined using a stacking ensemble method
120
Describe the Auto training mode from Sagemaker Autopilot
Autopilot automatically decides a training mode, HPO if training dataset larger than 100MB and Ensemble otherwise. For that, it needs to be able to read the size of your dataset, otherwise it chooses HPO
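
The decision rule described in this card can be written as a tiny function. It is only a restatement of the card, not Autopilot's actual code; the function name and the mode labels are illustrative.

```python
from typing import Optional


def autopilot_auto_mode(dataset_size_mb: Optional[float]) -> str:
    """Return the training mode Auto would pick; None means the size could not be read."""
    if dataset_size_mb is None:
        return "HPO"  # fallback when the dataset size is unknown
    return "HPO" if dataset_size_mb > 100 else "Ensemble"


small_dataset_mode = autopilot_auto_mode(20)
large_dataset_mode = autopilot_auto_mode(500)
unreadable_mode = autopilot_auto_mode(None)
```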
121
What are some common problems that would make Sagemaker Autopilot Auto training be unable to read the training dataset?
-S3 bucket hidden on VPC
-Over 1000 files on S3 URI passed
-S3DataType is a Manifest File
122
What is Sagemaker Autopilot Explainability?
It is a feature that integrates with Sagemaker Clarify and uses SHAP baselines / SHAP values to show how much each feature influences the value being predicted
123
What is Sagemaker Model Monitor?
It is a codeless feature that allows you to analyze model performance in search of anomalies, outliers or data drift
124
How does Sagemaker Model Monitor integrate with Clarify?
Clarify can help it detect bias in the data (imbalances in features) and explain model behavior
125
What are some pre-training biases available on Clarify?
-Class Imbalance (CI)
-Difference in Proportions of Labels (DPL)
-Kullback-Leibler Divergence (KL), Jensen-Shannon Divergence (JS): How much outcome distributions of facets diverge
-Lp-norm (LP): P-norm difference between distributions of outcomes from facets
-Total Variation Distance (TVD): Half the L1-norm difference between distributions of outcomes from facets
-Conditional Demographic Disparity (CDD): Disparity of outcomes between facets as a whole, and by subgroups
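
A few of these metrics are simple enough to compute by hand. The sketch below follows the definitions as best recalled from the Clarify documentation (facet a is the larger or advantaged group, facet d the other, and TVD is half the L1 distance), so treat the exact conventions as assumptions; function names and counts are made up.

```python
import math


def class_imbalance(n_a, n_d):
    """CI: normalized difference in facet sizes, between -1 and 1."""
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions(pos_a, n_a, pos_d, n_d):
    """DPL: gap between the positive-label rates of the two facets."""
    return pos_a / n_a - pos_d / n_d


def kl_divergence(p, q):
    """KL: how much label distribution p (facet a) diverges from q (facet d)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def total_variation_distance(p, q):
    """TVD: half the L1 distance between the two label distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Toy data: facet a has 80 rows (56 positive), facet d has 20 rows (8 positive).
ci = class_imbalance(80, 20)
dpl = difference_in_proportions(56, 80, 8, 20)
tvd = total_variation_distance([0.7, 0.3], [0.4, 0.6])
kl = kl_divergence([0.7, 0.3], [0.4, 0.6])
```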
126
What data visualization services have integrations with Sagemaker Model Monitor?
-Tableau
-Quicksight
-Tensorboard
127
Where is Model Monitor data stored?
On S3, with the metrics emitted to Cloudwatch
128
True or False: Model Monitor jobs are scheduled automatically by AWS
False, they have to be scheduled via a Monitoring Schedule
129
What are the kinds of drift that can be monitored by Model Monitor?
-Bias Drift
-Data Quality Drift
-Model Quality Drift
-Feature Attribution Drift
130
What endpoints are Sagemaker Deployment Guardrails available for?
-Asynchronous and real-time endpoints
131
True or False: the only Deployment Guardrails available on Sagemaker involve requests rerouting
-False, there are also auto-rollbacks
132
What request rerouting measures are available on Sagemaker Guardrails?
-Canary Deployment
-A/B Deployment
-Linear Deployment
133
What is Sagemaker Shadow Testing?
You deploy a Shadow Variant of the model in production and compare them. If the results are good, the shadow model can be promoted to production from the Sagemaker Console
134
What is Sagemaker Jumpstart?
It is a functionality from Sagemaker that allows you to deploy pre-existing models with a single click
135
What is Sagemaker Data Wrangler?
It is a set of tools that helps importing, exporting, analyzing and transforming data inside Sagemaker Studio
136
What is Sagemaker Feature Store?
It is a functionality that allows you to store and share features and datasets using specially built repositories
137
What is Sagemaker Edge Manager?
It is a feature that helps with deploying models to Edge locations using models optimized with Sagemaker Neo. Also collects data for monitoring, labelling and retraining
138
How does Sagemaker Feature Store organize data?
In feature groups, logical groupings of data
139
True or False: Sagemaker Feature Store has both an online and offline mode, with the offline mode working primarily by writing buffered data to Redshift
False, the data from the offline mode is written to S3
140
True or False: Sagemaker Feature Store data is encrypted at rest, and access to it can be secured through IAM and PrivateLink
True
141
What's the name of the library that allows you to connect Sagemaker Debugger to your code?
SMDebug
142
What is Sagemaker ML Lineage Tracking?
It is a Sagemaker feature that stores your ML Workflows (MLOps) for auditing and compliance
143
Complete the following statement: Sagemaker ML Lineage tracking integrates with ________ to track ML Flows across accounts?
Resource Access Manager
144
What are the Lineage Tracking Components used by Sagemaker ML Lineage?
-Trial component: Training job, processing job, etc
-Trial: Model composed of trial components
-Experiment: A group of trials for a use case
-Context: Logical grouping of entities
-Action: Workflow step, model deployment
-Artifact: Object, data, etc
-Association: Connects entities together
145
How would you query Sagemaker ML Lineage Entities? What is the output of this query?
Use the Python LineageQuery API, which is part of the Sagemaker SDK. The output of this query is a list of all models/endpoints/etc that use the queried artifact and a visualization showing them
146
True or False: Sagemaker Data Wrangler has a visual interface that can be accessed through Sagemaker Notebooks
False, it can be accessed through Sagemaker Studio
147
What is Data Wrangler "Quick Model"?
A functionality that allows Data Wrangler to create and train a model based on the received data
148
What are some troubleshooting checks you should perform if Data Wrangler exhibits errors?
-Make sure Sagemaker Studio has the appropriate permissions to use it
-Make sure the data sources have permissions that allow Data Wrangler to access them (AmazonSageMakerFullAccess)
-If EC2 instance limit, request quota increase
149
What is Sagemaker Canvas?
It is a no code Machine Learning solution for business analysts. It simplifies the training step of the model and facilitates visualization of the data.
150
What is Sagemaker Training Compiler?
It is a feature integrated into AWS Deep Learning Containers. It compiles and optimizes training jobs for GPUs, speeding up training by up to 50%.
151
What are some frameworks where Sagemaker Training Compiler works well?
Hugging Face Transformers and bring your own model
152
True or False: Sagemaker Training Compiler works with distributed learning
False, it does not work
153
What are some Sagemaker Training Compiler best practices?
-Ensure GPU instances are used (ml.p3, ml.p4)
-PyTorch models must use PyTorch/XLA’s model save function
-Enable debug flag in compiler_config parameter to enable debugging
154
What data formats does Sagemaker Canvas accept?
CSV only
155
True or False: When sending data to Sagemaker Canvas, one must take care to clean the data beforehand
False, Sagemaker Canvas cleans the data for you
156
Which kind of ML models can Sagemaker Canvas train?
Classification and Regression
157
Classify each statement below as True or False regarding Sagemaker Canvas:
* Local file uploading is impossible
* Can integrate with Okta SSO
* Canvas lives within a SageMaker Domain that must be updated either manually or automatically
* Import from RDS can be set up
* Time series forecasting must be enabled via Sagemaker Forecasting
* Can run within a VPC
* Pricing is $1.90/hr plus a charge based on number of training cells in a model

-False, it must be configured “by your IT administrator”, using an S3 bucket under the hood
-True
-False, must be updated manually
-False, what can be set up is import from Redshift
-False, it must be enabled via IAM
-True
-True
158
Which Sagemaker ML algorithms do not support exactly both File and Pipe training modes?
-XGBoost: Does not support Pipe for distributed training
-Seq2Seq: Supports only file
-DeepAR: Supports only file
-Word2Vec: Supports only file
-BlazingText: Supports only file
-Object2Vec
-ObjectDetection: File?
-ImageClassification: File?
-SemanticSegmentation: File?
-FactorizationMachine: File?
-IP Insights: File?
-LDA: Supports Pipe only for RecordIO
159
Which Sagemaker ML algorithms do not support exactly both CSV and RecordIO-Protobuf input?
-XGBoost: Also supports Parquet and libsvm.
-Seq2Seq: Supports only RecordIO-protobuf (Int)
-DeepAR: Supports only JSON line files (GZIP, Parquet)
-BlazingText: Txt file or Augmented Manifest Text file
-Object2Vec
-ObjectDetection: Image format or RecordIO
-ImageClassification: Image format or RecordIO
-SemanticSegmentation: JPG, PNG, Augmented Image Format
-FactorizationMachine: RecordIO-protobuf (Float32)
-IP Insights: CSV only
-LDA: Supports Pipe only for RecordIO
160
What are the 3 input modes for Sagemaker training data transmission?
-Pipe
-File
-FastFile
161
What Sagemaker ML algorithms support multi-GPU training?
-XGBoost
-Seq2Seq: Only on the same instance
-ObjectDetection
-ImageClassification
-SemanticSegmentation
-IP Insights
-Reinforcement Learning
162
What Sagemaker ML algorithms support multi-machine training?
-Linear Learner
-XGBoost: Only GPU
-DeepAR
-BlazingText: Multiple CPUs for batch_skipgram
-ObjectDetection
-ImageClassification
-SemanticSegmentation
-Reinforcement Learning