Model Training, Tuning and Evaluation Flashcards

1
Q

What is an Activation Function?

A

It is the function within a neuron that defines the neuron's output based on its input signal.

2
Q

What is a Linear Activation Function?

A

It mirrors what came into it as an output. Think of it as a pass-through.

3
Q

Can a Linear Activation Function perform back propagation?

A

No. Its derivative is a constant, so back propagation has nothing useful to learn from, and stacked linear layers collapse into a single linear transformation.

4
Q

What is a Binary Step Function?

A

It is on or off, like a light switch, depending on whether the input crosses a threshold. Being binary, it cannot handle multiple output values or classes.

5
Q

Why are non-linear activation functions better than linear ones?

A

They allow for back propagation and multiple layers.

6
Q

What is a Rectified Linear Unit (ReLU)?

A

It outputs the input directly when it is positive and 0 otherwise, i.e. max(0, x). It is the default choice for deep learning because it is very fast and easy to compute.

7
Q

What is Leaky ReLU?

A

It introduces a small slope below zero (in the negative region) instead of outputting 0, which keeps the gradient alive for negative inputs.

8
Q

What is PReLU?

A

It is like leaky ReLU, but the slope is learned from back propagation.
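
To make the ReLU-family cards concrete, here is a minimal NumPy sketch (not part of the original deck); the 0.01 slope for Leaky ReLU and the alpha value for PReLU are illustrative assumptions, and in a real network PReLU's alpha would be learned by back propagation.

```python
import numpy as np

def relu(x):
    # Outputs the input for positive values, 0 otherwise.
    return np.maximum(0, x)

def leaky_relu(x, slope=0.01):
    # Same as ReLU above zero, but keeps a small fixed slope below zero.
    return np.where(x > 0, x, slope * x)

def prelu(x, alpha):
    # Like Leaky ReLU, except alpha is a parameter learned during training.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))               # [0.  0.  0.  1.5]
print(leaky_relu(x))         # [-0.02  -0.005  0.  1.5]
print(prelu(x, alpha=0.2))   # [-0.4  -0.1  0.  1.5]
```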

9
Q

What is Maxout?

A

It outputs the max of the inputs.

10
Q

What is Softmax?

A

Used as the final output layer of a multi-class classification problem. It converts the outputs into a probability for each class. It only handles a single label per example.

11
Q

What can Sigmoids do that Softmax cannot?

A

Multiple classifications
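
As an illustration of the last two cards, a minimal NumPy sketch (the logit values are made up): softmax squashes the outputs into a single probability distribution that sums to 1, so only one label wins, while independent sigmoids let several labels exceed a threshold at once.

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])  # raw outputs from the final layer

# Softmax: probabilities over mutually exclusive classes; they sum to 1.
softmax = np.exp(logits) / np.sum(np.exp(logits))
print(softmax, softmax.sum())   # [0.659 0.242 0.099] 1.0

# Sigmoid: each class scored independently, so an example can carry
# multiple labels at once -- something softmax cannot do.
sigmoid = 1 / (1 + np.exp(-logits))
print(sigmoid)                  # [0.881 0.731 0.525]
```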

12
Q

What is TanH best for?

A

RNNs

13
Q

What is the activation function selection in steps?

A

Start with ReLU; if you need to do better, try Leaky ReLU, then PReLU, then Maxout.

14
Q

What is a CNN?

A

A Convolutional Neural Network

15
Q

What does a CNN do?

A

It finds features within your data, for example a phrase in text or an object in an image, regardless of where they appear.

16
Q

What is the LeNet-5 CNN?

A

Used for handwriting analysis

17
Q

What is the AlexNet CNN?

A

Used for image classification

18
Q

What is an RNN for?

A

Sequences of data: time series, web logs, captions, machine translation, etc.

19
Q

What is a recurrent neuron?

A

It is a neuron that feeds its output back into itself, so it remembers data from previous time steps.

20
Q

Can you have a layer of recurrent neurons?

A

Yes

21
Q

What is an Epoch?

A

One complete pass over the full training data set during training.

22
Q

What is Learning Rate?

A

A hyperparameter that controls how much of the model’s weights are adjusted with respect to the loss (error) after each iteration during training.

23
Q

What does too high a learning rate cause?

A

Overshooting the optimal solution.

24
Q

What does too low a learning rate cause?

A

Taking too long to find the optimal solution.
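
A toy gradient-descent sketch (not from the deck; the quadratic loss, starting weight, and step count are arbitrary assumptions) showing how the learning rate scales each weight update, and why a rate that is too high overshoots while one that is too low crawls.

```python
# Minimize loss(w) = (w - 3)**2, whose gradient is 2 * (w - 3); optimum is w = 3.
def train(learning_rate, steps=20):
    w = 0.0
    for _ in range(steps):
        gradient = 2 * (w - 3)
        w -= learning_rate * gradient   # the learning rate scales every update
    return w

print(train(0.01))   # too low: after 20 steps w is still far from 3
print(train(0.1))    # reasonable: converges close to 3
print(train(1.1))    # too high: overshoots and diverges away from 3
```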

25
What is the batch size hyperparameter?
How many training samples are used within each batch of each epoch.
26
What is local minima?
A dip in the loss curve that is lower than its surroundings but is not the lowest point overall (the global minimum).
27
Do smaller batch sizes get stuck in "local minima"?
Yes, but they can work their way out. Batch sizes that are too large end up getting stuck at the wrong solution.
28
What does regularization do?
It prevents overfitting.
29
What is overfitting?
When a model is good at making predictions on the training data, but not on the new data it hasn't seen before.
30
What does dropout do?
It randomly removes ("drops out") neurons during training so the network cannot rely too heavily on any one of them. Standard in CNNs.
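
A bare-bones NumPy sketch of what a dropout layer does during training (an illustration, not any framework's API; the 50% rate and activation values are made up): randomly zero some activations and rescale the survivors so the expected output is unchanged.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dropout(activations, rate=0.5):
    # Zero out roughly `rate` of the neurons at random and scale up the rest
    # (inverted dropout) so the expected activation stays the same.
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

layer_output = np.array([0.2, 1.5, 0.7, 0.9])
print(dropout(layer_output))   # some entries become 0, the rest are doubled
```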
31
Can fewer layers or neurons prevent overfitting?
Yes
32
What is early stopping?
Stopping training at the epoch where validation accuracy stops improving or starts degrading, rather than continuing on.
33
What does L1 and L2 Regularization do?
They prevent overfitting by adding a penalty term, based on the weights, to the loss function.
34
What is the L1 formula?
The regularization term is the sum of the absolute values of the weights.
35
What is the L2 formula?
The regularization term is the sum of the squares of the weights.
36
What does L1 regularization really do?
It performs feature selection: some weights can go to 0, effectively removing those features. It can result in sparse output.
37
What does L2 regularization really do?
It ensures all features remain considered, but their weights are shrunk. It can result in dense output.
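
A small NumPy sketch of the two penalty terms from the cards above (the weight values and lambda coefficient are made-up assumptions): L1 sums absolute weights and drives some to exactly zero (sparse, feature selection), while L2 sums squared weights and merely shrinks them (dense).

```python
import numpy as np

weights = np.array([0.0, -0.5, 2.0, 0.1])
lam = 0.01   # regularization strength (lambda), a hyperparameter

l1_penalty = lam * np.sum(np.abs(weights))   # sum of absolute weights
l2_penalty = lam * np.sum(weights ** 2)      # sum of squared weights

# Either penalty is added to the training loss:
#   total_loss = data_loss + l1_penalty   (or + l2_penalty)
print(l1_penalty, l2_penalty)   # 0.026  0.0426
```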
38
What can Long short-term memory (LSTM) solve?
The vanishing gradient problem
39
Will ResNet solve the vanishing gradient problem?
Yes
40
What is a confusion matrix for?
A confusion matrix is a table used to evaluate the performance of a classification model. It compares the actual values (true labels) with the predicted values (predictions made by the model)
41
What is the formula for Recall?
True Positives divided by (True Positives + False Negatives).
42
What does Recall measure?
The percentage of actual positives correctly predicted (the true positive rate). A good choice when you care about false negatives.
43
What is a good use case for Recall?
Fraud Detection
44
What is the formula for Precision?
True Positives divided by (True Positives + False Positives).
45
What does Precision measure?
Percent of relevant results. Good choice when you care about false positives.
46
What is a good use case for Precision?
Medical screening, drug testing, etc.
47
What is the formula for Specificity?
True Negatives divided by (True Negatives + False Positives).
48
What does Specificity measure?
True negative rate
49
When is the F1 score useful?
When you care about both precision and recall; F1 is their harmonic mean.
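
A short sketch tying the recall, precision, specificity, and F1 cards together; the confusion-matrix counts are hypothetical.

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, fn, tn = 80, 10, 20, 90

recall      = tp / (tp + fn)   # true positive rate; sensitive to false negatives
precision   = tp / (tp + fp)   # share of positive predictions that are correct
specificity = tn / (tn + fp)   # true negative rate
f1          = 2 * precision * recall / (precision + recall)   # harmonic mean

print(recall, precision, specificity, f1)   # 0.8  0.889  0.9  0.842
```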
50
What does RMSE measure?
Accuracy of numeric predictions. It only cares about how right or wrong the answers are, not about the type of error.
51
What does the ROC curve showcase?
Plot of true positive rate versus false positive rate.
52
What does AUC (the Area Under the ROC Curve) represent?
Probability that a classifier will rank a randomly chosen positive instance higher than a negative one.
53
What is the P-R Curve?
It plots Precision versus Recall. A higher area under the curve is better. It is similar to ROC and good for information retrieval.
54
What are RMSE and MAE used for?
Measuring numerical predictions instead of classifications
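
A short NumPy sketch of both error metrics for numeric predictions (the values are made up); note how RMSE punishes larger errors more heavily than MAE does.

```python
import numpy as np

actual    = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

errors = predicted - actual
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error
mae  = np.mean(np.abs(errors))         # mean absolute error

print(rmse, mae)   # ~0.935  0.75
```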
55
What is the Ensemble method?
It combines multiple models so the end result is essentially voted on (or averaged) across them.
56
What is bagging in the Ensemble method?
It generates N new training sets by random sampling with replacement and trains a model on each.
57
What is boosting in the Ensemble method?
Observations are weighted: models are trained sequentially, with each new model focusing on the observations the previous ones got wrong.
58
Does bagging avoid overfitting?
Yes
59
Does SageMaker support automatic model tuning?
Yes
60
In SageMaker automatic model tuning, should you optimize many hyperparameters at once?
No
61
What is warm start in SageMaker Automatic Model Tuning?
It uses one or more previous tuning jobs as a starting point.
62
What is the grid search hyperparameter tuning approach?
It tries every possible combination.
63
What is the random search hyperparameter tuning approach?
It randomly chooses a combination of hyperparameters
64
What is the bayesian optimization hyperparameter tuning approach?
It treats tuning as a regression problem and learns from each run.
65
What is the hyperband hyperparameter tuning approach?
A best-of-all-worlds approach: it dynamically allocates resources and stops under-performing training jobs early, so it finds good hyperparameters much faster.
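
A hedged sketch of automatic model tuning with the SageMaker Python SDK, pulling the last few cards together. The built-in XGBoost estimator, the S3 bucket, the IAM role, the objective metric, and the ranges are all illustrative assumptions, and the available strategy names depend on your SDK version; note that only two hyperparameters are tuned at once.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"   # hypothetical role

# Built-in XGBoost container for the current region.
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output/",   # hypothetical bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", eval_metric="auc", num_round=100)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    hyperparameter_ranges={                      # keep the search space small
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    strategy="Bayesian",                         # or "Random", "Hyperband", "Grid"
    objective_type="Maximize",
    max_jobs=20,
    max_parallel_jobs=2,
)

tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/validation/"})
```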
66
What is SageMaker Autopilot?
It automatically selects the algorithm, preprocesses the data, and tunes the model for you based on your data. No ML expertise is required.
67
Does SageMaker Autopilot have a model notebook?
Yes
68
What is the SageMaker Autopilot model leaderboard?
It shows you a ranked list of the recommended models so you can compare candidates.
69
What are the Autopilot training modes?
Hyperparameter optimization (HPO), Ensembling, and Auto, which picks one of the other two based on the size of your data set.
70
Can SageMaker Autopilot models be explained with SageMaker Clarify?
Yes
71
What is SageMaker Experiments?
It is a place to organize, capture, compare, and search on your ML jobs.
72
What is SageMaker Debugger?
Saves the internal model state at periodic intervals.
73
Can SageMaker Debugger generate alarms?
Yes
74
Is there a visual view of SageMaker Debugger?
Yes
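
A hedged sketch of attaching SageMaker Debugger to a training job (the image URI, role, and bucket are placeholders, and the rule choice is illustrative): the hook config tells SageMaker where to save internal tensors at intervals, and a built-in rule watches them and can trigger alerts.

```python
from sagemaker.debugger import DebuggerHookConfig, Rule, rule_configs
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",                         # placeholder
    role="arn:aws:iam::123456789012:role/MySageMakerRole",    # hypothetical role
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    # Save internal model state (tensors) to S3 at periodic intervals.
    debugger_hook_config=DebuggerHookConfig(
        s3_output_path="s3://my-bucket/debugger/",            # hypothetical bucket
    ),
    # Built-in rule that watches the saved tensors for vanishing gradients.
    rules=[Rule.sagemaker(rule_configs.vanishing_gradient())],
)
```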
75
What is the SageMaker Model Registry?
It is a place for you to catalog and manage model versions.
76
Where can you manage the approval status of a model?
The SageMaker Model Registry
77
Can you deploy models to production from the SageMaker Model Registry?
Yes
78
What is the SageMaker Training Compiler?
It compiles and optimizes training jobs on GPU instances.
79
What are SageMaker Warm Pools?
They retain and reuse provisioned infrastructure after a training job so subsequent jobs start faster.
80
What is Checkpointing in SageMaker?
It creates snapshots during your training and you can restart from those points.
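
A hedged sketch of configuring checkpointing and a warm pool on an estimator (the image, role, bucket, and 10-minute keep-alive are assumptions): checkpoints written locally are synced to S3 so a job can restart from a snapshot, and the keep-alive period retains the provisioned infrastructure for the next job.

```python
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",                         # placeholder
    role="arn:aws:iam::123456789012:role/MySageMakerRole",    # hypothetical role
    instance_count=1,
    instance_type="ml.g5.xlarge",
    # Checkpointing: snapshots saved locally are synced to S3, so an
    # interrupted job can restart from the latest checkpoint.
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",          # hypothetical bucket
    checkpoint_local_path="/opt/ml/checkpoints",
    # Warm pool: keep the provisioned instances alive for 10 minutes after the
    # job ends so the next job can reuse them without a cold start.
    keep_alive_period_in_seconds=600,
)
```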
81
What do SageMaker Cluster Health Checks do?
It checks GPU health and replaces faulty instances.
82
What instance types does Cluster Health Check run automatically on?
P and G instance types.
83
What are SageMaker Distributed Training Libraries?
The SageMaker Distributed Training Library is a set of tools and APIs provided by Amazon SageMaker to help you efficiently train machine learning models across multiple devices, like GPUs or machines, in parallel.
84
Using SageMaker Distributed Training Libraries, how can you free up the GPU?
Use the AllGather collective operation, which offloads communication to the CPU.
85
What is SageMaker Model Parallelism Library?
It allows you to distribute a model over multiple instances to overcome GPU memory limits.
86
What is SageMaker Data Parallelism?
It splits each training batch across multiple GPUs or instances, each holding a full copy of the model, so the data is processed in parallel.
87
How can you improve the network speed of your SageMaker instances?
Use an Elastic Fabric Adapter (EFA), which brings HPC-grade networking between your training instances.
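
A hedged sketch of enabling the distributed data parallelism library on a PyTorch estimator (the training script, role, framework/Python versions, and instance choice are assumptions): each GPU receives a shard of every batch, and on supported EFA-capable instance types such as ml.p4d.24xlarge the library uses that high-speed networking between instances.

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                                   # hypothetical training script
    role="arn:aws:iam::123456789012:role/MySageMakerRole",    # hypothetical role
    instance_count=2,
    instance_type="ml.p4d.24xlarge",                          # EFA-capable GPU instances
    framework_version="2.0",
    py_version="py310",
    # Enable the SageMaker distributed data parallelism library: every GPU holds
    # a full copy of the model and processes its own shard of each batch.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
```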