SageMaker Built-in Algorithms Flashcards
Linear Learner
linear regression
can handle both regression and classification
for classification, a linear threshold function is used (binary or multiclass)
Linear Learner Input Format
recordIO/protobuf or CSV
file or pipe mode supported
Linear Learner Usage
preprocessing: data must be normalized and shuffled
training: choose an optimization algorithm (SGD, Adam, etc.); multiple models are optimized in parallel; tune L1 and L2 regularization
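A minimal sketch of wiring these up with the SageMaker Python SDK (the role ARN and S3 paths are placeholders):

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder role ARN

# Retrieve the built-in Linear Learner container for the current region
container = image_uris.retrieve("linear-learner", session.boto_region_name)

ll = Estimator(container, role, instance_count=1, instance_type="ml.m5.xlarge",
               sagemaker_session=session)
ll.set_hyperparameters(
    predictor_type="binary_classifier",  # or "regressor" / "multiclass_classifier"
    normalize_data=True,                 # built-in normalization
    l1=0.0001,                           # L1 regularization
    wd=0.0001,                           # weight decay = L2 regularization
)
ll.fit({"train": TrainingInput("s3://my-bucket/train/", content_type="text/csv")})
```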
XGBoost
eXtreme Gradient Boosting
boosted group of decision trees
gradient descent to minimize loss
can be used for classification and regression
XGBoost Input
CSV, libsvm
more recently, recordIO/protobuf and Parquet are also supported
XGBoost Usage
Models are serialized/deserialized with Pickle
can be used within a notebook or as a built-in SageMaker algorithm
HPs: Subsample, eta, gamma, alpha, lambda
memory-bound rather than compute-bound on CPUs; newer releases (1.2+) also support single-instance GPU training (tree_method=gpu_hist)
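A sketch of setting those HPs on the built-in XGBoost container (role/paths are placeholders; the version pin is illustrative):

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder

container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

xgb = Estimator(container, role, instance_count=1, instance_type="ml.m5.xlarge",
                sagemaker_session=session)
xgb.set_hyperparameters(
    objective="binary:logistic",
    num_round=100,
    eta=0.2,        # step-size shrinkage; lower values slow learning, curb overfitting
    gamma=4,        # min loss reduction to make a split; higher = more conservative
    alpha=0.1,      # L1 regularization on weights
    subsample=0.8,  # row subsampling; < 1.0 also helps prevent overfitting
    # "lambda" (L2 regularization) is a Python keyword; pass it as
    # xgb.set_hyperparameters(**{"lambda": 1.0}) if needed.
)
xgb.fit({
    "train": TrainingInput("s3://my-bucket/train/", content_type="text/csv"),
    "validation": TrainingInput("s3://my-bucket/validation/", content_type="text/csv"),
})
```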
Seq2Seq
Input is a sequence of tokens, output is a sequence of tokens
good for machine translation, text summarization, speech to text
Seq2Seq Input
recordIO/protobuf - tokens must be integers
start with tokenized text files
NEED TO PROVIDE TRAINING DATA, VALIDATION DATA, AND VOCAB FILES
Seq2Seq Usage
Training can take days
Pretrained models available
Public training datasets available for specific translation tasks
HPs: batch size, optimizer type, # of layers
can optimize on accuracy, BLEU score, perplexity
GPU only, and only a single machine for training (but can use multiple GPUs on that machine)
DeepAR
forecasting one-dimensional time-series data
uses RNNs
allows you to train the same model on several related time series
finds frequency and seasonality
DeepAR Input
JSON Lines (optionally gzipped) or Parquet
each record must contain: start, target
can contain dynamic/categorical features
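A sketch of that record format as JSON Lines (all values made up):

```python
import json

# DeepAR JSON Lines input: one JSON object per line. "start" is the series
# start timestamp and "target" the observed values; "cat" and "dynamic_feat"
# are the optional categorical/dynamic features.
series = [
    {"start": "2024-01-01 00:00:00", "target": [5.0, 6.1, 7.3, 6.8],
     "cat": [0], "dynamic_feat": [[1.0, 1.1, 2.1, 1.9]]},
    {"start": "2024-01-01 00:00:00", "target": [2.0, 2.4, 2.1, 2.7], "cat": [1]},
]
with open("train.json", "w") as f:
    for s in series:
        f.write(json.dumps(s) + "\n")
```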
DeepAR Usage
- always include the entire time series
- use the entire dataset as the test set; remove the last points from each series for training and evaluate on the withheld values
- don't use very large values for prediction length
- train on many related time series when possible
HPs: epochs, batch size, learning rate, # cells, context length
GPU or CPU for training, CPU only for inference
BlazingText
- Text Classification
  - predict labels for a sentence (NOT entire documents)
  - supervised
  - ex. web search, information retrieval
- Word2Vec
  - vector representation of words
  - semantically similar words are represented by vectors close to each other ("word embedding")
  - useful for NLP, but is not an NLP algorithm itself
  - only works on INDIVIDUAL words, not sentences or documents
BlazingText Input
- Text Classification (supervised mode)
  - one sentence per line
  - first "word" in the sentence is the string __label__ followed by the label
  - augmented manifest text format is also accepted
- Word2Vec
  - just expects a text file with one sentence per line
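A sketch of the supervised-mode file format (labels and sentences made up):

```python
# BlazingText supervised mode: one sentence per line, first token is
# __label__<label>, remaining tokens space-separated.
lines = [
    "__label__4 linux ready for prime time , intel says .",
    "__label__2 bowled over by the atlanta braves season opener .",
]
with open("train.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```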
BlazingText Usage
Word2Vec has multiple modes:
- cbow > continuous bag of words (order doesn't matter)
- skip-gram (order matters)
- batch skip-gram (distributed over many CPU nodes)
HPs:
- Word2Vec: mode, learning rate, window size, vector dim, negative samples
- Text Classification: epochs, learning rate, word n-grams (how many words we look at together), vector dim
cbow and skip-gram use GPU (can also use CPU)
batch skip-gram can use single or multiple CPU instances
text classification: CPU for smaller jobs, GPU for larger ones
Object2Vec
- like Word2Vec but with arbitrary objects
- boils data down to a lower-dimensional representation
- compute nearest neighbors, visualize clusters, genre prediction, recommendations
- UNSUPERVISED
Object2Vec Input
- tokenized into integers
- pairs or sequences of tokens
- sentence-sentence, labels-sequence, customer-customer, product-product, user-item
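A sketch of the JSON Lines pair format (token IDs and labels made up):

```python
import json

# Object2Vec pairs: each line holds two tokenized-integer sequences
# ("in0", "in1") plus a relationship label or score.
pairs = [
    {"label": 1, "in0": [6, 17, 606], "in1": [16, 21, 13]},
    {"label": 0, "in0": [22, 1016], "in1": [32, 10, 2]},
]
with open("train.jsonl", "w") as f:
    for p in pairs:
        f.write(json.dumps(p) + "\n")
```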
Object2Vec Usage
- process data into JSON and shuffle
- train with 2 input channels, 2 encoders, 1 comparator
- encoder choices:
- average pooled embeddings, CNN, bidirectional LSTM
- comparator is followed by a feed-forward neural network
HPs: usual deep learning ones:
- dropout, early stopping, epochs, learning rate, batch size, layers, activation function, optimizer, weight decay
- encoder1 network, encoder2 network
single machine, multi GPU
set the INFERENCE_PREFERRED_MODE environment variable to optimize for encoder embeddings rather than classification or regression
Object Detection
- identify all objects in an image with bounding boxes
- detect and classify with one deep neural network
- provide confidence scores
- can train images from scratch, or use pretrained models on ImageNet
Object Detection Input
- recordIO or image format (need JSON file for annotation data)
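A sketch of the per-image annotation JSON used with image format (file names, sizes, and boxes made up):

```python
import json

# One annotation file per training image: bounding boxes are given as
# left/top offsets plus width/height, with class IDs mapped to names.
annotation = {
    "file": "images/sample_image1.jpg",
    "image_size": [{"width": 500, "height": 400, "depth": 3}],
    "annotations": [
        {"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128},
    ],
    "categories": [{"class_id": 0, "name": "dog"}],
}
with open("sample_image1.json", "w") as f:
    json.dump(annotation, f)
```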
Object Detection Usage
- takes an image as input; outputs all instances of all objects in the image with categories and confidence scores
- uses a CNN with the Single Shot multibox Detector (SSD) algorithm
- base network: VGG-16 or ResNet-50
- transfer learning mode / incremental training
- use a pretrained model for the base network weights instead of random initial weights
- uses flip, rescale, and jitter to avoid overfitting
HPs: batch size, learning rate, optimizer
GPU for training, CPU for inference
Image Classification
- assign one or more labels to an image
- doesn’t tell you where the objects are (no bounding)
Image Classification Input
- MXNet RecordIO (not protobuf!)
- raw images
  - require a .lst file to associate image index, class label, and image path
- augmented manifest image format enables pipe mode!
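A sketch of the .lst format (paths and labels made up):

```python
# MXNet .lst file: tab-separated image index, class label, and relative
# image path, one image per line.
rows = [(0, 2, "images/cat_001.jpg"), (1, 0, "images/dog_042.jpg")]
with open("train.lst", "w") as f:
    for index, label, path in rows:
        f.write(f"{index}\t{label}\t{path}\n")
```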
Image Classification Usage
- ResNet CNN
- full training > initialized with random weights
- transfer learning mode:
- initialized with pretrained weights
- top layer is initialized with random weights
- network is fine-tuned with new training data
- default image size is 3-channel 224x224
HPs: batch, learning rate, optimizers (weight decay, beta1, beta2, eps, gamma)
GPU for training, GPU or CPU for inference
Semantic Segmentation
- pixel-level object classification
- useful for self-driving cars
- produces a segmentation mask
Semantic Segmentation Input
- JPG images and PNG annotations
- label maps for describing annotations
- augmented manifest image format for pipe!
- JPG for inference
Semantic Segmentation Usage
- built on MXNet Gluon and GluonCV
- choice of 3 algorithms:
- fully-convolutional network (FCN)
- pyramid scene parsing (PSP)
- DeepLabV3
- backbone: ResNet-50 or ResNet-101
  - both trained on ImageNet
HPs: epochs, learning rate, batch size, optimizer, algorithm used, backbone used
single machine GPU only, CPU or GPU for inference
Random Cut Forest
- unsupervised anomaly detection
- detects:
  - spikes in time-series data
  - breaks in periodicity
  - unclassifiable data points
- gives anomaly score to each point
- Amazon is very proud of this one!
Random Cut Forest Inputs
- CSV or recordIO/protobuf
- file or pipe
- optional test channel for computing AUC, recall, precision, F1 score
Random Cut Forest Usage
- creates a forest of trees where each tree is a partition of the training data
- looks at the expected change in complexity of the tree as a result of adding a new point
- data is sampled randomly, then trained
- can be used on time series
HPs: number of trees (increasing # reduces noise), samples / tree
no GPU
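A minimal sketch with the SDK's RandomCutForest estimator (role and data are placeholders):

```python
import numpy as np
import sagemaker
from sagemaker import RandomCutForest

session = sagemaker.Session()
rcf = RandomCutForest(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",  # CPU instance: RCF does not use GPUs
    num_trees=100,                 # more trees reduce noise in the anomaly scores
    num_samples_per_tree=256,      # each tree sees a random sample of the data
    sagemaker_session=session,
)
train_data = np.random.rand(1000, 3).astype("float32")  # stand-in training data
rcf.fit(rcf.record_set(train_data))
```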
Neural Topic Model
- organize documents into topics
- classify/summarize documents based on topics
- not just TF/IDF
- NTM groups things into higher levels
- unsupervised
- uses a neural variational inference algorithm
Neural Topic Model Input
- four data channels
- train channel required (validation, test, aux optional)
- recordIO/protobuf or CSV
- words need to be tokenized with a vocab file
- file or pipe mode
Neural Topic Model Usage
- define how many topics to generate
- latent representation based on top-ranking words
- one of two topic modeling algorithms in SageMaker (the other is LDA)
HPs: smaller batch size and learning rate can reduce validation loss but increase training time, # of topics
CPU or GPU
Latent Dirichlet Allocation (LDA)
- topic modeling (not deep learning)
- unsupervised
- grouping of documents with shared subset of words
- can be used for things other than words
- customer clusters, harmonic analysis
LDA Input
- train, optional test channel
- recordIO/protobuf or CSV - need to tokenize
- each document has counts for every word in vocab (CSV)
- pipe mode only supported with recordIO
LDA Usage
- unsupervised > you pick the # of topics
- test channel - score results
- functionally similar to Neural Topic Modeling, but CPU based
HPs: # of topics, alpha0 (initial guess for the concentration parameter)
single instance CPU
KNN
- supervised
- simple classification or regression algorithm
- classification:
- find K closest points to a sample and return most frequent label
- regression:
- find K closest points to a sample and return average value
KNN Input
- train, optional test channel
- recordIO/protobuf or CSV
- file or pipe
KNN Usage
- data is sampled
- dimensionality reduction (optional)
  - avoids sparse data ("curse of dimensionality") at the cost of noise/accuracy
  - "sign" or "fjlt" methods
- build an index for looking up neighbors
- serialize the model
- query the model for a given K
HPs: K, sample size
CPU or GPU
inference - CPU for lower latency, GPU for higher throughput
K-Means
- unsupervised clustering
- divide data into K groups where members are similar
- you define “similar” > Euclidean distance
- web-scale k-means clustering
K-means Inputs
- train channel (use the ShardedByS3Key distribution), optional test channel (FullyReplicated)
- recordIO/protobuf or CSV
- file and pipe
K-Means Usage
- every observation mapped to n-dimensional space
- works to optimize center of K-clusters
- extra cluster centers may be specified to improve accuracy
- K = k*x
- k = clusters we want
- x = extra cluster centers
- algorithm:
  - determine initial cluster centers: random or k-means++
    - k-means++ tries to make initial clusters far apart
  - iterate over the data and calculate cluster centers
  - reduce clusters from K to k (using Lloyd's method with k-means++)
HPs: batch size, extra center factor (x), init method (random or k-means++), K
- K is tricky: use elbow method - basically optimize for tightness of clusters
CPU or GPU (CPU recommended)
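The elbow method itself is easy to sketch locally; scikit-learn is used here purely for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # stand-in data

# Inertia = within-cluster sum of squares ("tightness of clusters"); pick
# the k where the improvement flattens out (the "elbow").
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)
```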
Principal Component Analysis (PCA)
- dimensionality reduction
- projecting higher-dimensional data into a lower-dimensional space (like a 2D plot) while minimizing loss of information
- reduced dimensions are called components
- first component has largest possible variability
- 2nd component has next largest
- unsupervised
PCA Inputs
- recordIO/protobuf
- file or pipe
PCA Usage
- covariance matrix created, then singular value decomposition (SVD)
- 2 modes:
- regular - sparse data, moderate # of features
- randomized - large # of features
- uses approximation algorithm
HPs: algorithm mode, subtract mean (unbias data)
CPU or GPU - depends on specifics of data
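A numpy sketch of the covariance + SVD steps above (data made up):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(100, 5))
Xc = X - X.mean(axis=0)            # "subtract mean" (unbias the data)
cov = np.cov(Xc, rowvar=False)     # covariance matrix
U, S, Vt = np.linalg.svd(cov)      # singular value decomposition
components = Vt[:2]                # first component captures the most variability
reduced = Xc @ components.T        # project into 2 dimensions
```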
Factorization Machines
- classification/regression with SPARSE DATA
- good for recommendations
- click prediction
- item recommendations
- since a user doesn’t interact with most pages/products, the data is sparse
- supervised (classification or regression)
- limited to pair-wise interactions
- user - item
Factorization Machines Inputs
- recordIO/protobuf with Float32
- sparse data means CSV isn’t practical
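A sketch of producing sparse float32 recordIO/protobuf with the SDK's helper (shapes and values made up):

```python
import io
import numpy as np
import scipy.sparse
import sagemaker.amazon.common as smac

# Write sparse float32 features as recordIO/protobuf: the practical format
# when one-hot user/item data is almost entirely zeros.
X = scipy.sparse.lil_matrix((1000, 50000), dtype="float32")
X[0, 42] = 1.0                       # e.g., user 0 interacted with item 42
y = np.zeros(1000, dtype="float32")  # labels: clicked or not

buf = io.BytesIO()
smac.write_spmatrix_to_sparse_tensor(buf, X.tocsr(), y)
buf.seek(0)
# upload buf to S3 as the training channel
```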
Factorization Machines Usage
- essentially makes a big matrix
- find factors we can use to predict a classification (click or not) or value (predicted rating) given a matrix representing some pair of things (users and items)
HPs: initialization methods for bias, factors, and linear terms
- uniform, normal, or constant
CPU or GPU - CPU recommended, GPU only works with dense data
IP Insights
- unsupervised
- learning of IP address usage patterns
- ID suspicious activity
- security tool
IP Insights Inputs
- user names, account IDs can be fed in directly
- training channel, optional validation (computes AUC)
- CSV only
- each row: entity, IP address
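A sketch of that headerless CSV (entities and IPs made up):

```python
# IP Insights training input: headerless CSV, one "entity,IP address"
# pair per line.
rows = [("user_alice", "192.0.2.10"), ("user_bob", "198.51.100.7")]
with open("train.csv", "w") as f:
    for entity, ip in rows:
        f.write(f"{entity},{ip}\n")
```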
IP Insights Usage
- uses a neural network to learn latent vector representations of entities and IP addresses
- entities hashed and embedded
- need big enough hash size
- auto generates negative samples by randomly pairing entities and IPs
HPs: # of entity vectors (hash size), vector dim, epochs, learning rate, batch size
CPU or GPU (GPU recommended)