SageMaker - Built-In Algorithms Flashcards
Does SageMaker handle the entire ML workflow?
Yes
What is a SageMaker notebook?
It is a managed Jupyter notebook instance that you launch from the console.
Can you use scikit-learn, Spark, and TensorFlow from a SageMaker notebook?
Yes
Can you launch servers from your SageMaker notebook?
Yes
What is the SageMaker Input Mode S3 File Mode?
It is the default. It copies the full dataset from S3 to the Docker container before training starts. This is fine for smaller datasets, but slow for large ones.
What is the SageMaker Input Mode S3 Fast File Mode?
It streams the data from the S3 source. This was a replacement for Pipe Mode.
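Sketch: the input mode is chosen per data channel. The helper below builds one channel entry in the shape used by the CreateTrainingJob request's InputDataConfig. The function name `make_channel`, the bucket, and the channel name are hypothetical and only for illustration.

```python
# Minimal sketch, assuming a channel is described by a plain dict.
# The bucket "example-bucket" and the helper name are made up.
def make_channel(name, s3_uri, input_mode="File"):
    """Build one InputDataConfig channel; File is the default mode."""
    if input_mode not in {"File", "FastFile", "Pipe"}:
        raise ValueError(f"unsupported input mode: {input_mode}")
    return {
        "ChannelName": name,
        "InputMode": input_mode,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }

# Stream the training data instead of copying it up front.
train_channel = make_channel("train", "s3://example-bucket/train/", "FastFile")
```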
What is the SageMaker Input Mode S3 Express One Zone Mode?
It is a high performance storage class in one AZ. Works with other input modes.
What is the SageMaker Input Mode Amazon FSx for Lustre?
This is for HPC workloads and scales to hundreds of GB/s of throughput. It is meant for very large datasets.
What is the SageMaker Input Mode EFS Mode?
Uses EFS as a file system for the source data.
What is the Linear Learner Model in SageMaker?
It fits linear models. It is used for numeric predictions (regression) and for classification.
If your model training is taking too much time to get started, what can you do?
Use Pipe mode, which streams the data from S3 instead of copying it first.
Does Linear Learner require normalized data?
Yes. This can be done in advance or within the model.
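Sketch: normalizing in advance can be as simple as standardizing each feature column to zero mean and unit variance. This is one common approach, not necessarily what Linear Learner does internally; the function name `standardize` is hypothetical.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale one feature column to zero mean and unit variance."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

scaled = standardize([10.0, 20.0, 30.0])
```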
What kind of regularization does Linear Learner support?
L1 and L2.
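Sketch: both penalties are extra terms added to the training loss. L1 sums absolute weights (pushing some to zero), L2 sums squared weights (shrinking all of them). The function names and the strength value are illustrative assumptions.

```python
def l1_penalty(weights, strength):
    """L1 term: strength times the sum of absolute weights."""
    return strength * sum(abs(w) for w in weights)

def l2_penalty(weights, strength):
    """L2 term: strength times the sum of squared weights."""
    return strength * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0]
l1 = l1_penalty(weights, 0.1)  # 0.1 * (0.5 + 2.0 + 0.0)
l2 = l2_penalty(weights, 0.1)  # 0.1 * (0.25 + 4.0 + 0.0)
```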
What is XGBoost?
It is a boosted group of decision trees. Each new tree is built to correct the errors of the previous trees.
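Sketch: the "correct the errors of the previous trees" idea, shown with one-split trees (stumps) on a single feature. This is plain gradient boosting for squared error, not XGBoost's actual implementation; all names and the toy data are made up. The `eta` argument plays the role of the learning rate that shrinks each tree's contribution.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on a 1-D feature that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]

def predict(stumps, base, eta, x):
    """Base prediction plus the shrunken output of every stump."""
    total = base
    for threshold, left_value, right_value in stumps:
        total += eta * (left_value if x <= threshold else right_value)
    return total

def boost(xs, ys, rounds=20, eta=0.3):
    """Each round fits a new stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - predict(stumps, base, eta, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps

def training_mse(xs, ys, base, stumps, eta):
    return sum((y - predict(stumps, base, eta, x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
base, stumps = boost(xs, ys, rounds=20, eta=0.3)
mse_before = training_mse(xs, ys, base, [], 0.3)   # mean-only model
mse_after = training_mse(xs, ys, base, stumps, 0.3)
```

A smaller `eta` makes each round's correction more cautious, which is why lowering it helps against overfitting.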
How can you prevent overfitting when using XGBoost?
Tune the subsample or eta hyperparameters.
Is XGBoost memory or CPU bound?
Memory.
What is LightGBM?
A gradient-boosted decision tree algorithm, like XGBoost.
What are good use cases for LightGBM?
Classification, Regression, or Ranking
What does the Seq2Seq model do?
It takes an input series of tokens and outputs a series of tokens.
What is Seq2Seq good for?
Machine translation
Speech to text
Text summarization
What is Seq2Seq often implemented with?
RNNs and CNNs
Are there pre-trained Seq2Seq models available in SageMaker?
Yes
What can Seq2Seq optimize on?
Accuracy
BLEU Score
Perplexity
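Sketch: perplexity is the exponential of the average negative log-probability the model assigned to the correct tokens, so lower is better. The function name and the sample probabilities are illustrative assumptions.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability of the correct tokens."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# A model that gives every correct token probability 0.25 is as
# uncertain as a uniform guess among four options.
uniform_four = perplexity([0.25, 0.25, 0.25, 0.25])
```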
What is the DeepAR model used for?
Forecasting one-dimensional time series data.
What is BlazingText?
It can predict labels for a sentence (text classification) or create vector representations of words (Word2vec).
Is BlazingText used for sentences or documents?
Sentences only.
What is Word2vec in BlazingText?
It creates a vector representation of a word. It does not work on sentences. It finds words that are similar to each other.
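Sketch: once words are vectors, "similar" usually means a high cosine similarity between them. The three-dimensional vectors below are made up for illustration; real embeddings are learned and much larger.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word, vectors):
    """Return the other word whose vector is closest in direction."""
    return max(
        (other for other in vectors if other != word),
        key=lambda other: cosine_similarity(vectors[word], vectors[other]),
    )

# Toy embeddings, invented for this example.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "banana": [0.1, 0.05, 0.9],
}
closest = most_similar("king", toy_vectors)
```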
What is Object2Vec used for?
Finding similar objects. Similar to Word2vec, but for arbitrary objects.
What are some good use cases for Object2Vec?
Genre prediction, nearest neighbors of objects, recommendations
What is SageMaker Object Detection?
It identifies all objects in an image with bounding boxes. Detection and classification.
What is SageMaker Image Classification?
It assigns one or more labels to an image. It does not locate objects within the image.
What is the SageMaker Semantic Segmentation algorithm?
It is pixel level object classification.
What is SageMaker Random Cut Forest?
It is anomaly detection.
Does Random Cut Forest support file or pipe mode?
Both
What is the Neural Topic Model in SageMaker?
It organizes documents into topics.
Is Neural Topic Model in SageMaker supervised or unsupervised?
Unsupervised.
What is the LDA model in SageMaker for?
Topic modeling. Similar to Neural Topic Model, but not using deep learning.
What is KNN in SageMaker for?
Finds the closest points to your sample and returns the most frequent label. Nearest neighbor.
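Sketch: a brute-force version of the idea, using Euclidean distance and a majority vote. This is not SageMaker's implementation; the function name and the toy points are assumptions.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the most frequent label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy labeled points: (features, label).
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
]
label = knn_predict(train, (1.1, 1.0), k=3)
```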
Does KNN in SageMaker include a dimensionality reduction stage?
Yes
What is K-Means Clustering in SageMaker?
Unsupervised clustering algorithm. Finds clusters of data.
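Sketch: the classic k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). This is the textbook algorithm, not SageMaker's own variant; the names, starting centroids, and points are made up.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```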
What is Principal Component Analysis (PCA) in SageMaker?
It performs dimensionality reduction.
Is Principal Component Analysis (PCA) in SageMaker supervised or unsupervised?
Unsupervised.
What do Factorization Machines models do in SageMaker?
They deal with sparse data. Click predictions, recommendations, etc.
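Sketch: a second-order factorization machine scores an example as a bias, plus a linear term, plus pairwise feature interactions modeled by dot products of per-feature latent vectors. Storing the input as a dict of non-zero features is what makes it suit sparse data. All names and numbers below are illustrative assumptions, not SageMaker's API.

```python
def fm_predict(x, w0, w, v):
    """Score a sparse example with a second-order factorization machine.

    x: {feature_index: value} for the non-zero features only
    w0: global bias
    w: {feature_index: linear weight}
    v: {feature_index: list of latent factors}
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    num_factors = len(next(iter(v.values())))
    pairwise = 0.0
    for f in range(num_factors):
        # Identity: sum over pairs = 0.5 * ((sum)^2 - sum of squares)
        s = sum(v[i][f] * xi for i, xi in x.items())
        s_sq = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

# Two active features out of a large, mostly-zero feature space.
x = {0: 1.0, 3: 1.0}
w = {0: 0.2, 3: -0.1}
v = {0: [1.0, 0.5], 3: [0.5, 2.0]}
score = fm_predict(x, w0=0.1, w=w, v=v)
```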