SageMaker Built-in Algorithms Flashcards

1
Q

Linear Learner

A

fits a linear model to your data
can handle both regression and classification
for classification, a linear threshold function is used

2
Q

Linear Learner Input Format

A

recordIO/protobuf or CSV

file or pipe mode supported

3
Q

Linear Learner Usage

A
preprocessing:
data must be normalized and shuffled
training:
choose an optimization algorithm
multiple models are optimized in parallel
tune L1, L2 regularization
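
A minimal training sketch using the SageMaker Python SDK (v2); the IAM role ARN, S3 paths, and hyperparameter values are placeholders, not something the card specifies:

```python
# Sketch: Linear Learner as a SageMaker built-in algorithm.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
container = image_uris.retrieve("linear-learner", session.boto_region_name)

ll = Estimator(
    image_uri=container,
    role="arn:aws:iam::111122223333:role/SageMakerRole",   # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/linear-learner/output",    # placeholder bucket
    sagemaker_session=session,
)

# Tune L1 (l1) and L2 (wd) regularization; several model variants are trained
# in parallel and the best is selected against the validation set.
ll.set_hyperparameters(
    predictor_type="binary_classifier",  # or "regressor" / "multiclass_classifier"
    l1=0.0001,
    wd=0.01,               # L2 regularization
    learning_rate=0.1,
    mini_batch_size=1000,
)

# For CSV input, the label is expected in the first column.
ll.fit({
    "train": TrainingInput("s3://my-bucket/ll/train.csv", content_type="text/csv"),
    "validation": TrainingInput("s3://my-bucket/ll/val.csv", content_type="text/csv"),
})
```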
4
Q

XGBoost

A

eXtreme Gradient Boosting
boosted group of decision trees
gradient descent to minimize loss
can be used for classification and regression

5
Q

XGBoost Input

A

CSV, libsvm

more recently, also recordIO/protobuf and Parquet

6
Q

XGBoost Usage

A

Models are serialized/deserialized with Pickle
can be used within a notebook (as a framework) or as a built-in SageMaker algorithm

HPs: subsample, eta, gamma, alpha, lambda

Trains on CPUs only; it is memory-bound rather than compute-bound
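
A sketch of XGBoost in built-in-algorithm mode with the card's hyperparameters; the role, bucket paths, and the container version string are assumptions:

```python
# Sketch: XGBoost as a SageMaker built-in algorithm (not notebook/framework mode).
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

xgb = Estimator(
    image_uri=container,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.2xlarge",                        # CPU instance (memory-bound)
    output_path="s3://my-bucket/xgb/output",              # placeholder
    sagemaker_session=session,
)

# The card's key hyperparameters: subsample, eta, gamma, alpha, lambda.
xgb.set_hyperparameters(
    objective="binary:logistic",
    num_round=100,        # number of boosting rounds
    eta=0.2,              # learning-rate shrinkage
    gamma=4,              # minimum loss reduction required to split
    alpha=0.5,            # L1 regularization
    subsample=0.8,        # row subsampling per tree
    **{"lambda": 1.0},    # L2 regularization ("lambda" is a Python keyword)
)

# CSV input: no header row, label in the first column.
xgb.fit({
    "train": TrainingInput("s3://my-bucket/xgb/train.csv", content_type="text/csv"),
    "validation": TrainingInput("s3://my-bucket/xgb/val.csv", content_type="text/csv"),
})
```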

7
Q

Seq2Seq

A

Input is a sequence of tokens, output is a sequence of tokens
good for machine translation, text summarization, speech to text

8
Q

Seq2Seq Input

A

recordIO/protobuf - tokens must be integers
start with tokenized text files
NEED TO PROVIDE TRAINING DATA, VALIDATION DATA, AND VOCAB FILES

9
Q

Seq2Seq Usage

A

Training can take days
Pretrained models available
Public training datasets available for specific translation tasks

HPs: batch, optimizer, # layers
can optimize on accuracy, BLEU score, perplexity

trains only on a single machine with GPU (can use multiple GPUs on that machine)

10
Q

DeepAR

A

forecasting one-dimensional time-series data
uses RNNs
allows you to train the same model on several related time series
finds frequency and seasonality

11
Q

DeepAR Input

A

JSON Lines format (optionally gzip-compressed, or Parquet)
each record must contain: start, target
can contain dynamic/categorical features
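
A sketch of two JSON Lines training records; the field names follow the format described above, while the values and file name are invented:

```python
# Sketch of the DeepAR JSON Lines training format.
import json
import gzip

series = [
    {
        "start": "2024-01-01 00:00:00",        # timestamp of the first observation
        "target": [5.0, 7.2, 6.1, 8.4, 9.0],   # the time-series values
        "cat": [0],                             # optional categorical features
        "dynamic_feat": [[1, 0, 0, 1, 1]],      # optional dynamic features (e.g. promotions)
    },
    {
        "start": "2024-01-01 00:00:00",
        "target": [2.1, 2.3, 2.2, 2.8, 3.0],
        "cat": [1],
        "dynamic_feat": [[0, 0, 1, 0, 0]],
    },
]

# One JSON object per line, optionally gzip-compressed before upload to S3.
with gzip.open("train.json.gz", "wt") as f:
    for record in series:
        f.write(json.dumps(record) + "\n")
```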

12
Q

DeepAR Usage

A
  • always include the entire time series for training, testing, and inference
  • use the entire dataset as the test set; remove the last time points for training
  • don’t use very large values for prediction length
  • train on many related time series when possible

HPs: epochs, batch size, learning rate, # cells, context length

GPU or CPU for training, CPU only for inference

13
Q

BlazingText

A
  1. Text Classification
    predict labels for a sentence (NOT DOCS)
    supervised
    ex. web search, info retrieval
  2. Word2Vec
    - vector representation of words
    - semantically similar words are represented by vectors close to each other > word embedding
    - useful for NLP, but not an NLP algorithm
    - only works on INDIVIDUAL words
14
Q

BlazingText Input

A
  1. Text Class. (supervised mode)
    - 1 sentence / line
    - 1st word in the sentence is the label, prefixed with “__label__”
    - augmented manifest text format
  2. Word2Vec
    - text file with 1 sentence / line
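
A sketch of the two input formats; the file names, labels, and sentences are invented:

```python
# Sketch of BlazingText input files.

# 1. Supervised text classification: one sentence per line, the label comes first
#    and is prefixed with "__label__"; text should be space-tokenized.
classification_lines = [
    "__label__positive this movie was a delight from start to finish",
    "__label__negative the plot made no sense and the pacing dragged",
]
with open("train.supervised.txt", "w") as f:
    f.write("\n".join(classification_lines) + "\n")

# 2. Word2Vec (unsupervised): plain text, one sentence per line, tokens separated by spaces.
word2vec_lines = [
    "the cat sat on the mat",
    "a dog chased the cat across the yard",
]
with open("train.word2vec.txt", "w") as f:
    f.write("\n".join(word2vec_lines) + "\n")
```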
15
Q

BlazingText Usage

A

Word2Vec has multiple modes:

- cbow > continuous bag of words (order doesn't matter)
- skip-gram (order matters)
- batch skip-gram (distributed over CPU nodes)

HPs:

  • Word2Vec: mode, learning rate, window size, vector dim, negative samples
  • Text Classification: epochs, learning rate, word n-grams (how many words we look at together), vector dim

cbow and skip-gram use GPU (can also use CPU)
batch skip-gram can use single or multiple CPU instances
text classification - CPU for smaller datasets, GPU for larger ones

16
Q

Object2Vec

A
  • like Word2Vec but with arbitrary objects
  • boils objects down to a lower-dimensional representation
    • compute nearest neighbors, visualize clusters, genre prediction, recommendations
  • UNSUPERVISED
17
Q

Object2Vec Input

A
  • tokenized into integers
  • pairs or sequences of tokens
    • sentence-sentence, labels-sequence, customer-customer, product-product, user-item
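
A sketch of the JSON Lines layout: each line holds a pair of token sequences already mapped to integer IDs plus a relationship label; the values here are invented:

```python
# Sketch of Object2Vec's pairwise JSON Lines input.
import json

pairs = [
    {"label": 1, "in0": [12, 7, 453, 88], "in1": [31, 9, 102]},    # e.g. a related pair
    {"label": 0, "in0": [5, 66, 201],     "in1": [77, 3, 14, 9]},  # e.g. an unrelated pair
]

with open("train.jsonl", "w") as f:
    for p in pairs:
        f.write(json.dumps(p) + "\n")
```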
18
Q

Object2Vec Usage

A
  • process data into JSON Lines and shuffle it
  • train with 2 input channels, 2 encoders, 1 comparator
  • encoder choices:
    • average pooled embeddings, CNN, bidirectional LSTM
  • comparator is followed by a feed-forward neural network

HPs: usual deep learning ones:

- dropout, early stopping, epochs, learning rate, batch size, layers, activation function, optimizer, weight decay, plus the choice of encoder1 and encoder2 networks

trains on a single machine (multi-GPU OK)
use the inference preferred mode setting to optimize for encoder embeddings rather than classification or regression

19
Q

Object Detection

A
  • identifies all objects in an image with bounding boxes
  • detect and classify with one deep neural network
    • provide confidence scores
  • can train from scratch, or use pre-trained models based on ImageNet
20
Q

Object Detection Input

A
  • recordIO or image format (need JSON file for annotation data)
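
A sketch of one per-image annotation file for the image (non-recordIO) format; the key names follow my reading of the AWS docs and the values are invented, so treat the exact schema as an assumption:

```python
# Sketch of a per-image JSON annotation file for Object Detection's image format.
import json

annotation = {
    "file": "images/sample_image1.jpg",
    "image_size": [{"width": 500, "height": 400, "depth": 3}],
    "annotations": [
        # Bounding boxes in pixels: left/top corner plus width/height, one per object.
        {"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128},
        {"class_id": 1, "left": 300, "top": 20, "width": 80, "height": 150},
    ],
    "categories": [
        {"class_id": 0, "name": "dog"},
        {"class_id": 1, "name": "person"},
    ],
}

with open("sample_image1.json", "w") as f:
    json.dump(annotation, f)
```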
21
Q

Object Detection Usage

A
  • takes an image as input > outputs all instances of all objects in the image with categories and confidence scores
  • CNN with the SSD (Single Shot multibox Detector) algorithm
    • base network: VGG-16 or ResNet-50
  • transfer learning mode / incremental training
    • use a pre-trained model for the base network weights instead of random initial weights
  • uses flip, rescale, and jitter to avoid overfitting

HPs: batch size, learning rate, optimizer

GPU for training, CPU for inference

22
Q

Image Classification

A
  • assign one or more labels to an image
  • doesn’t tell you where the objects are (no bounding boxes)

23
Q

Image Classification Input

A
  • Apache MXNet RecordIO (not protobuf!)
  • raw images
    • requires a .lst file to associate image index, class label, and path to the image
  • augmented manifest image format - pipe mode!
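
A sketch of that .lst file, assuming the MXNet im2rec-style tab-separated layout (image index, class label, relative path); file names and labels are invented:

```python
# Sketch: writing a .lst file that maps raw images to class labels.
rows = [
    (0, 0, "train/cats/cat_001.jpg"),
    (1, 1, "train/dogs/dog_001.jpg"),
    (2, 0, "train/cats/cat_002.jpg"),
]

with open("train.lst", "w") as f:
    for index, label, path in rows:
        f.write(f"{index}\t{label}\t{path}\n")
```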
24
Q

Image Classification Usage

A
  • ResNet CNN
  • full training > initialized with random weights
  • transfer learning mode:
    • initialized with pretrained weights
    • top layer is initialized with random weights
    • network is fine-tuned with new training data
  • default image size is 3-channel 224x224

HPs: batch, learning rate, optimizers (weight decay, beta1, beta2, eps, gamma)

GPU for training, GPU or CPU for inference

25
Q

Semantic Segmentation

A
  • pixel-level object classification
  • useful for self-driving cars
  • produces a segmentation mask
26
Q

Semantic Segmentation Input

A
  • JPG images and PNG annotations
  • label maps for describing annotations
  • augmented manifest image format for pipe!
  • JPG for inference
27
Q

Semantic Segmentation Usage

A
  • built on MxNet Gluon and GluonCV
  • choice of 3 algorithms:
    • fully-convolutional network (FCN)
    • pyramid scene parsing (PSP)
    • DeepLabV3
  • backbone: ResNet-50 or ResNet-101
    • both trained on ImageNet

HPs: epochs, learning rate, batch size, optimizer, algorithm used, backbone used

single machine GPU only, CPU or GPU for inference
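
A sketch of picking the algorithm and backbone through hyperparameters; the role, paths, and the exact hyperparameter names/value strings are assumptions based on my reading of the docs:

```python
# Sketch: built-in Semantic Segmentation training job.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
container = image_uris.retrieve("semantic-segmentation", session.boto_region_name)

ss = Estimator(
    image_uri=container,
    role="arn:aws:iam::111122223333:role/SageMakerRole",   # placeholder
    instance_count=1,
    instance_type="ml.p3.2xlarge",                         # single-machine GPU for training
    output_path="s3://my-bucket/semseg/output",            # placeholder
    sagemaker_session=session,
)

ss.set_hyperparameters(
    algorithm="fcn",            # or "psp" / "deeplab"
    backbone="resnet-50",       # or "resnet-101"
    num_classes=21,
    num_training_samples=1000,
    epochs=10,
    learning_rate=0.001,
)

# Channels are typically train / validation plus train_annotation / validation_annotation
# (and an optional label_map channel) -- check the current docs before relying on these names.
```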

28
Q

Random Cut Forest

A
  • unsupervised anomaly detection
  • detect
    • spikes in time-series data
    • breaks in periodicity
    • unclassifiable data points
  • gives anomaly score to each point
  • amazon very proud of this!
29
Q

Random Cut Forest Inputs

A
  • CSV or recordIO/protobuf
  • file or pipe
  • optional test channel for computing AUC, recall, precision, F1 score
30
Q

Random Cut Forest Usage

A
  • create forest of trees where each tree is a partition of the training data
  • looks at expected change in complexity as a result of adding a new point
  • sampled randomly, then trained
  • can be used on time series

HPs: number of trees (increasing # reduces noise), samples / tree

no GPU
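
A sketch of a Random Cut Forest training job with the card's two hyperparameters; role, paths, and values are placeholders:

```python
# Sketch: built-in Random Cut Forest for anomaly detection.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
container = image_uris.retrieve("randomcutforest", session.boto_region_name)

rcf = Estimator(
    image_uri=container,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",                         # CPU only
    output_path="s3://my-bucket/rcf/output",              # placeholder
    sagemaker_session=session,
)

rcf.set_hyperparameters(
    feature_dim=1,
    num_trees=100,              # more trees -> less noise in the anomaly scores
    num_samples_per_tree=256,   # samples per tree
)

# Train channel holds unlabeled data (CSV or recordIO-protobuf).
rcf.fit({
    "train": TrainingInput("s3://my-bucket/rcf/train.csv", content_type="text/csv"),
})
```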

31
Q

Neural Topic Model

A
  • organize documents into topics
  • classify/summarize documents based on topics
  • not just TF/IDF
    • NTM groups things into higher levels
  • unsupervised
    • uses a neural variational inference algorithm
32
Q

Neural Topic Model Input

A
  • four data channels
    • train channel required (validation, test, aux optional)
  • recordIO/protobuf or CSV
  • words need to be tokenized with a vocab file
  • file or pipe mode
33
Q

Neural Topic Model Usage

A
  • define how many topics to generate
  • topics are a latent representation defined by their top-ranking words
  • one of two topic modeling algorithms in SageMaker (the other is LDA)

HPs: smaller batch size and learning rate can reduce validation loss but increase training time, # of topics

CPU or GPU

34
Q

Latent Dirichlet Allocation (LDA)

A
  • topic modeling (not deep learning)
  • unsupervised
    • grouping of documents with shared subset of words
  • can be used for things other than words
    • customer clusters, harmonic analysis
35
Q

LDA Input

A
  • train, optional test channel
  • recordIO/protobuf or CSV - need to tokenize
  • each document has counts for every word in vocab (CSV)
  • pipe only supported with recordIO
36
Q

LDA Usage

A
  • unsupervised > you pick the # of topics
  • test channel - score results
  • functionally similar to Neural Topic Modeling, but CPU based

HPs: # of topics, Alpha0 (initial guess for concentration values)

single instance CPU

37
Q

KNN

A
  • supervised
  • simple classification or regression algorithm
  • classification:
    • find K closest points to a sample and return most frequent label
  • regression:
    • find K closest points to a sample and return average value
38
Q

KNN Input

A
  • train, optional test channel
  • recordIO/protobuf or CSV
  • file or pipe
39
Q

KNN Usage

A
  • data is sampled
  • dimensionality reduction
    • avoid sparse data at the cost of noise/accuracy
    • “sign” or “fjlt” (fast Johnson-Lindenstrauss transform) methods
  • build index
  • serialize
  • query

HPs: K, sample size

CPU or GPU
inference - CPU for lower latency, GPU for higher throughput
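
A sketch of a k-NN training job with the card's hyperparameters, including the optional dimensionality-reduction step; role, paths, and values are placeholders:

```python
# Sketch: built-in k-NN with sampling and dimensionality reduction.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
container = image_uris.retrieve("knn", session.boto_region_name)

knn = Estimator(
    image_uri=container,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/knn/output",              # placeholder
    sagemaker_session=session,
)

knn.set_hyperparameters(
    feature_dim=50,
    k=10,                              # number of neighbors
    sample_size=200000,                # how many training points are sampled
    predictor_type="classifier",       # or "regressor"
    dimension_reduction_type="sign",   # or "fjlt"
    dimension_reduction_target=25,
)

# CSV input: label in the first column.
knn.fit({
    "train": TrainingInput("s3://my-bucket/knn/train.csv", content_type="text/csv"),
})
```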

40
Q

K-Means

A
  • unsupervised clustering
  • divide data into K groups where members are similar
    • you define “similar” > Euclidean distance
  • web-scale k-means clustering
41
Q

K-means Inputs

A
  • train channel (use the ShardedByS3Key distribution), optional test channel (use FullyReplicated)
  • recordIO/protobuf or CSV
  • file and pipe
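
A sketch of wiring those two channels with the distribution settings the card mentions; the S3 paths are placeholders:

```python
# Sketch: K-Means input channels with their S3 data distribution types.
from sagemaker.inputs import TrainingInput

channels = {
    # Training data may be sharded across instances by S3 key.
    "train": TrainingInput(
        "s3://my-bucket/kmeans/train/",              # placeholder
        content_type="text/csv",
        distribution="ShardedByS3Key",
    ),
    # The optional test channel is fully replicated to every instance.
    "test": TrainingInput(
        "s3://my-bucket/kmeans/test/",               # placeholder
        content_type="text/csv",
        distribution="FullyReplicated",
    ),
}
```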
42
Q

K-Means Usage

A
  • every observation mapped to n-dimensional space
  • works to optimize center of K-clusters
    • extra cluster centers may be specified to improve accuracy
    • K = k*x
      • k = clusters we want
      • x = extra cluster centers
  • algorithm: determine initial cluster centers
    • random or K-means ++
      • K-means ++ tries to make initial clusters far apart
  • iterate over data and calculate cluster centers
  • reduce clusters from K to k (using Lloyd’s method for k-means++)

HPs: mini-batch size, extra center factor (x), init method (random or k-means++), k
- choosing k is tricky: use the elbow method - basically optimize for tightness of clusters

CPU or GPU (CPU recommended)
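
A sketch of the K-Means hyperparameters, showing how k and the extra center factor relate to K = k * x; role, paths, and values are placeholders:

```python
# Sketch: built-in K-Means with extra cluster centers.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
container = image_uris.retrieve("kmeans", session.boto_region_name)

km = Estimator(
    image_uri=container,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",                         # CPU recommended
    output_path="s3://my-bucket/kmeans/output",           # placeholder
    sagemaker_session=session,
)

km.set_hyperparameters(
    feature_dim=20,
    k=10,                        # clusters we actually want (tune with the elbow method)
    extra_center_factor=4,       # x: trains K = k * x centers, later reduced back to k
    init_method="kmeans++",      # or "random"
    mini_batch_size=500,
)

# km.fit(channels) with the train/test channels from the previous card's sketch.
```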

43
Q

Principal Component Analysis (PCA)

A
  • dimensionality reduction
    • projecting higher-dimensional data into a lower-dimensional space (like a 2D plot) while minimizing loss of information
  • reduced dimensions are called components
    • first component has largest possible variability
    • 2nd component has next largest
  • unsupervised
44
Q

PCA Inputs

A
  • recordIO/protobuf
  • file or pipe mode

45
Q

PCA Usage

A
  • covariance matrix created, then singular value decomposition (SVD)
  • 2 modes:
    • regular - sparse data, moderate # of features
    • randomized - large # of features
      • uses approximation algorithm

HPs: algorithm mode, subtract mean (unbias data)

CPU or GPU - depends on specifics of data
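
A sketch of the PCA hyperparameters, using randomized mode for wide data; role, paths, and values are placeholders:

```python
# Sketch: built-in PCA in randomized mode.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
container = image_uris.retrieve("pca", session.boto_region_name)

pca = Estimator(
    image_uri=container,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/pca/output",              # placeholder
    sagemaker_session=session,
)

pca.set_hyperparameters(
    feature_dim=5000,
    num_components=50,             # how many principal components to keep
    algorithm_mode="randomized",   # "regular" for a moderate number of features
    subtract_mean=True,            # unbias the data
    mini_batch_size=500,
)
```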

46
Q

Factorization Machines

A
  • classification/regression with SPARSE DATA
  • good for recommendations
    • click prediction
    • item recommendations
    • since a user doesn’t interact with most pages/products, the data is sparse
  • supervised (classification or regression)
  • limited to pair-wise interactions
    • user - item
47
Q

Factorization Machines Inputs

A
  • recordIO/protobuf with Float32
  • sparse data means CSV isn’t practical
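
A sketch of turning a sparse interaction matrix into Float32 recordIO-protobuf; the bucket, key, and data are placeholders:

```python
# Sketch: writing sparse training data for Factorization Machines.
import io
import numpy as np
import scipy.sparse
import boto3
import sagemaker.amazon.common as smac

# Sparse user x item interaction matrix (mostly zeros) and binary click labels.
X = scipy.sparse.random(1000, 5000, density=0.001, format="csr", dtype=np.float32)
y = np.random.randint(0, 2, size=1000).astype(np.float32)

buf = io.BytesIO()
smac.write_spmatrix_to_sparse_tensor(buf, X, y)   # recordIO-protobuf, sparse Float32
buf.seek(0)

boto3.client("s3").upload_fileobj(buf, "my-bucket", "fm/train/train.protobuf")  # placeholder
```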

48
Q

Factorization Machines Usage

A
  • essentially makes a big matrix
  • find factors we can use to predict a classification (click or not) or value (predicted rating) given a matrix representing some pair of things (users and items)

HPs: initialization methods for bias, factors, and linear terms
- uniform, normal, or constant

CPU or GPU - CPU recommended, GPU only works with dense data

49
Q

IP Insights

A
  • unsupervised
  • learning of IP address usage patterns
    • ID suspicious activity
    • security tool
50
Q

IP Insights Inputs

A
  • user names, account IDs can be fed in directly
  • training channel, optional validation (computes AUC)
  • CSV only
    • entity, IP
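
A sketch of that two-column CSV (entity first, then IP, no header); the entities and addresses are invented:

```python
# Sketch: IP Insights training CSV.
import csv

events = [
    ("user_alice", "192.0.2.10"),
    ("user_bob",   "198.51.100.7"),
    ("user_alice", "203.0.113.42"),
]

with open("train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(events)   # e.g. "user_alice,192.0.2.10"
```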
51
Q

IP Insights Usage

A
  • uses a neural network to learn latent vector representations of entities and IP addresses
  • entities hashed and embedded
    • need big enough hash size
  • auto generates negative samples by randomly pairing entities and IPs

HPs: # of entity vectors (hash size), vector dim, epochs, learning rate, batch size

CPU or GPU (GPU recommended)