Model Training, Tuning and Evaluation Flashcards
What is an Activation Function?
It is a function within a neuron that defines the output of that node (neuron) based on its input signal.
What is a Linear Activation Function?
Its output is directly proportional to its input (in the simplest case, exactly equal to it). Think of it as a pass-through.
Can a Linear Activation Function perform back propagation?
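Not usefully. Its derivative is a constant that does not depend on the input, so the gradient carries no information about the input, and stacking linear layers collapses them into a single linear layer.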
What is a Binary Step Function?
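It is either on or off (it outputs 1 or 0), like a light switch, so it cannot handle multiple classifications. Its derivative is zero everywhere except at the step, so it provides no useful gradient.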
Why are non-linear activation functions better than linear ones?
Their derivatives depend on the input, which makes back propagation useful, and multiple stacked layers do not collapse into a single linear layer.
What is a Rectified Linear Unit (ReLU)?
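It outputs max(0, x): zero for negative inputs and the input itself for positive inputs. Widely used in deep learning because it is very fast and easy to compute.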
What is Leaky ReLU?
It introduces a small slope below zero, so negative inputs produce a small negative output instead of exactly zero.
What is PReLU?
It is like leaky ReLU, but the slope is learned from back propagation.
What is Maxout?
It outputs the max of the inputs.
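A minimal sketch of these activation functions on single scalar inputs, in plain Python. The function names and the default leaky slope of 0.01 are assumptions for illustration. In PReLU the slope would be learned during back propagation rather than passed in, and a real Maxout unit takes the max over several learned linear functions of its input; here those linear outputs are assumed to be already computed.

```python
import math

def linear(x, c=1.0):
    # Output proportional to the input; c=1.0 is a pure pass-through
    return c * x

def binary_step(x):
    # On (1) or off (0), like a light switch
    return 1 if x >= 0 else 0

def relu(x):
    # 0 for negative inputs, the input itself for positive inputs
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small slope alpha below zero instead of a flat 0
    return x if x > 0 else alpha * x

def prelu(x, alpha):
    # Same shape as leaky ReLU; alpha would be learned via back propagation
    return x if x > 0 else alpha * x

def maxout(linear_outputs):
    # Output is the max of the (already computed) linear pieces
    return max(linear_outputs)

for x in (-2.0, 0.5):
    print(x, linear(x), binary_step(x), relu(x), leaky_relu(x), prelu(x, 0.25), math.tanh(x))
```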
What is Softmax?
A function used in the final output layer of a multi-class classification problem. It converts the outputs into a probability for each class, and those probabilities sum to 1. It only handles a single label per sample.
What can Sigmoids do that Softmax cannot?
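Multi-label classification (more than one label per sample), because each sigmoid output is an independent probability between 0 and 1.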
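A small sketch contrasting the two, assuming raw model outputs (logits) for three hypothetical classes. Softmax makes the classes compete so the probabilities sum to 1 (pick one label), while each sigmoid output is computed independently, so several classes can be "on" at once.

```python
import math

def softmax(logits):
    # Subtracting the max keeps exp() from overflowing; it does not change the result
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    # Independent probability for one class
    return 1.0 / (1.0 + math.exp(-z))

logits = [2.0, 1.0, 0.1]                 # hypothetical raw outputs for 3 classes
probs = softmax(logits)
print(probs, sum(probs))                 # sums to 1: one label per sample
print([sigmoid(z) for z in logits])      # independent: can sum to more than 1
```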
What is TanH best for?
RNNs
In what order should you try activation functions?
Start with ReLU. If that does not perform well, try Leaky ReLU, then PReLU, then Maxout.
What is a CNN?
A Convolutional Neural Network
What does a CNN do?
It finds features within your data regardless of where they appear (it is feature-location invariant), such as a phrase in text or an object in an image.
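A minimal sketch of the core CNN operation in one dimension: a small filter slides across the input and takes a dot product at each position. This is cross-correlation (what convolutional layers compute in practice); the filter values and signals are hypothetical. The same filter finds the same feature wherever it appears, which is why CNNs are feature-location invariant.

```python
def cross_correlate_1d(signal, kernel):
    # Slide the kernel along the signal and take a dot product at each position
    # ("valid" mode: no padding, so the output is len(signal) - len(kernel) + 1 long)
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

rise = [-1, 1]                      # hypothetical filter: responds to a step up from 0 to 1
a = [0, 0, 1, 1, 0, 0, 0]           # the feature appears early
b = [0, 0, 0, 0, 0, 1, 1]           # the same feature appears late
out_a = cross_correlate_1d(a, rise)
out_b = cross_correlate_1d(b, rise)
print(out_a)  # [0, 1, 0, -1, 0, 0]
print(out_b)  # [0, 0, 0, 0, 1, 0]
```

In a real CNN the filter values are learned, there are many filters, and the same idea extends to 2D for images.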
What is the LeNet-5 CNN?
Used for handwriting analysis
What is the AlexNet CNN?
Used for image classification
What is an RNN for?
Sequences of data: time series, web logs, captions, machine translation, etc.
What is a recurrent neuron?
It is a neuron whose output is fed back into itself at the next time step, so it retains information from previous steps.
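A minimal sketch of a single recurrent neuron, assuming hypothetical fixed weights (in a real RNN they are learned) and a tanh activation. Its previous output is fed back in alongside each new input, so an input seen earlier still affects later outputs.

```python
import math

def recurrent_neuron(inputs, w_x=0.5, w_h=0.8, b=0.0):
    # Hypothetical fixed weights; in a real RNN they are learned
    h = 0.0                       # the state carried from step to step ("memory")
    outputs = []
    for x in inputs:
        # The new output depends on the current input AND the previous output
        h = math.tanh(w_x * x + w_h * h + b)
        outputs.append(h)
    return outputs

print(recurrent_neuron([1.0, 0.0, 0.0]))   # later steps are nonzero: the first input is remembered
print(recurrent_neuron([0.0, 0.0, 0.0]))   # all zeros: nothing to remember
```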
Can you have a layer of recurrent neurons?
Yes
What is an Epoch?
One complete pass through the entire training dataset.
What is Learning Rate?
A hyperparameter that controls how much the model’s weights are adjusted with respect to the gradient of the loss (error) after each iteration during training.
What does too high a learning rate cause?
Overshooting the optimal solution.
What does too low a learning rate cause?
Taking too long to find the optimal solution.
What is the batch size hyperparameter?
How many training samples are used in each batch (that is, for each weight update) within an epoch.
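A minimal sketch tying epoch, batch size, and learning rate together: mini-batch gradient descent fitting y = w * x on a tiny hypothetical dataset whose true slope is 2. The data, defaults, and function name are assumptions for illustration; real models have many weights, but the loop structure is the same.

```python
import random

def train(xs, ys, learning_rate=0.1, batch_size=2, epochs=50, seed=0):
    # Fit y = w * x by mini-batch gradient descent on mean squared error
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w = 0.0
    for _ in range(epochs):                        # one epoch = one full pass over the data
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):  # one weight update per batch
            batch = data[i:i + batch_size]
            # Gradient of mean((w*x - y)^2) with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad              # learning rate scales each step
    return w

xs = [0.5, 1.0, 1.5, 2.0]
ys = [2 * x for x in xs]                           # true slope is 2
print(train(xs, ys))                               # converges close to 2
print(train(xs, ys, learning_rate=1.0))            # too high: overshoots and diverges
```

With learning_rate=1.0 every step overshoots the optimum by more than it corrects, so the error grows each epoch; a much smaller rate would still converge, just over many more epochs.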
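What is a local minimum?
A dip in the loss curve that is lower than the points around it but is not the lowest point overall (the global minimum).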
Do smaller batch sizes get stuck in “local minima”?
They can, but their noisier updates tend to let them work their way out. Batch sizes that are too large can end up stuck at the wrong solution.
What does regularization do?
It helps prevent overfitting.
What is overfitting?
When a model makes good predictions on its training data, but not on new data it hasn’t seen before.
What does dropout do?
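During training it drops out (temporarily removes) randomly selected neurons at each step, so the network cannot rely too heavily on any single neuron. Standard in CNNs.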
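A minimal sketch of dropout applied to one layer's activations. It uses the common "inverted dropout" variant (survivors are scaled up during training so nothing changes at inference); the function name and the default rate of 0.5 are assumptions for illustration.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=random):
    # No dropout at inference time: every neuron is used
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    # During training each neuron is zeroed independently with probability `rate`;
    # survivors are scaled by 1/keep so the expected total activation is unchanged
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0)))
```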
Can fewer layers or neurons prevent overfitting?
Yes
What is early stopping?
Stopping training at the epoch where validation accuracy stops improving and starts to degrade, even if training accuracy is still rising.
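A minimal sketch of early stopping, assuming a list of hypothetical per-epoch validation accuracies. The `patience` parameter (how many epochs without improvement to tolerate) is an assumption, though a common one; the function keeps the best epoch and reports where training stopped.

```python
def early_stopping(val_accuracies, patience=2):
    # Returns (epoch_to_keep, epoch_stopped_at), both 0-based
    best_epoch, best_acc = 0, float("-inf")
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            best_epoch, best_acc = epoch, acc      # new best: keep going
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch               # no improvement for `patience` epochs: stop
    return best_epoch, len(val_accuracies) - 1     # never triggered: trained every epoch

history = [0.70, 0.78, 0.82, 0.81, 0.80, 0.79]    # hypothetical validation accuracy per epoch
print(early_stopping(history))                     # (2, 4)
```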
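What do L1 and L2 regularization do?
They help prevent overfitting by adding a penalty term, based on the size of the weights, to the loss function.
What is the L1 formula?
The term is the sum of the absolute values of the weights: λ × Σ|w|.
What is the L2 formula?
The term is the sum of the squares of the weights: λ × Σw².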
What does L1 regularization really do?
It performs feature selection: the weights of some features can go all the way to 0. It can result in sparse output.
What does L2 regularization really do?
It keeps all features in consideration, but shrinks their weights toward 0 without reaching it. It can result in dense output.
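A minimal sketch of the two penalty terms and of why they behave differently, with hypothetical weights and λ = 0.1. The L1 update shown is soft-thresholding (the proximal step commonly used for L1); the L2 update is a plain gradient step on λw². Both are illustrations under these assumptions, not a specific library's optimizer.

```python
def l1_penalty(weights, lam):
    # L1 term: lambda * sum of the absolute values of the weights
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # L2 term: lambda * sum of the squares of the weights
    return lam * sum(w * w for w in weights)

def l1_step(w, lam, lr):
    # Soft-thresholding (proximal) update for the L1 term:
    # weights smaller than lr*lam in magnitude become exactly 0
    t = lr * lam
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def l2_step(w, lam, lr):
    # Gradient step on lam * w**2 (gradient 2*lam*w): shrinks w proportionally,
    # so it gets smaller but never reaches exactly 0
    return w * (1 - 2 * lr * lam)

weights = [3.0, -0.5, 0.05]   # hypothetical weights
print(l1_penalty(weights, 0.1), l2_penalty(weights, 0.1))
print([l1_step(w, 0.1, 1.0) for w in weights])   # small weight is zeroed: sparse
print([l2_step(w, 0.1, 1.0) for w in weights])   # every weight kept, just smaller: dense
```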
What can Long short-term memory (LSTM) solve?
The vanishing gradient problem
Will ResNet solve the vanishing gradient problem?
Yes
What is a confusion matrix for?
A confusion matrix is a table used to evaluate the performance of a classification model. It compares the actual values (true labels) with the predicted values (predictions made by the model)
What is the formula for Recall?
True Positives divided by (True Positives + False Negatives).
What does Recall measure?
The percentage of actual positives that were correctly predicted. A good choice when you care a lot about false negatives.
What is a good use case for Recall?
Fraud Detection
What is the formula for Precision?
True Positives divided by (True Positives + False Positives).
What does Precision measure?
The percentage of positive predictions that were actually positive (the percent of relevant results). A good choice when you care a lot about false positives.
What is a good use case for Precision?
Medical screening, drug testing, etc.
What is the formula for Specificity?
True Negatives divided by (True Negatives + False Positives).
What does Specificity measure?
True negative rate
When is the F1 score useful?
When you care about both precision and recall. It is their harmonic mean: 2 × (Precision × Recall) / (Precision + Recall).
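A minimal sketch that counts the four confusion-matrix cells for a binary problem and applies the recall, precision, specificity, and F1 formulas above. The labels and predictions are hypothetical.

```python
def confusion_counts(actual, predicted):
    # Count TP, FP, TN, FN for a binary problem (1 = positive, 0 = negative)
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn

def recall(tp, fn):
    return tp / (tp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def specificity(tn, fp):
    return tn / (tn + fp)

def f1_score(p, r):
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r)

actual    = [1, 1, 1, 0, 0, 0, 0, 1]   # hypothetical labels
predicted = [1, 0, 1, 0, 1, 1, 0, 1]   # hypothetical model output
tp, fp, tn, fn = confusion_counts(actual, predicted)
p, r = precision(tp, fp), recall(tp, fn)
print(tp, fp, tn, fn)                  # 3 2 2 1
print(p, r, specificity(tn, fp), f1_score(p, r))
```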
What does RMSE measure?
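How far numerical predictions are from the actual values: the square root of the average squared error. Large errors are penalized more heavily than in MAE.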
What does the ROC curve showcase?
Plot of true positive rate versus false positive rate.
What does AUC (the area under the ROC curve) represent?
Probability that a classifier will rank a randomly chosen positive instance higher than a negative one.
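A minimal sketch that computes AUC directly from this probability definition by checking every (positive, negative) pair, with ties counted as half. The scores and labels are hypothetical; libraries compute the same value more efficiently from the ROC curve.

```python
def auc_by_ranking(scores, labels):
    # Probability that a randomly chosen positive gets a higher score than a
    # randomly chosen negative, checked over every positive/negative pair
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5           # ties count as half
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.4, 0.35, 0.1]   # hypothetical classifier scores
labels = [1, 0, 1, 0, 0]              # hypothetical true labels
print(auc_by_ranking(scores, labels)) # 5 of 6 pairs ranked correctly
```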
What is the P-R Curve?
It plots Precision versus Recall. A higher area under the curve is better. Similar to ROC, and good for information retrieval.
What are RMSE and MAE used for?
Measuring numerical predictions instead of classifications
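A minimal sketch of both metrics on hypothetical numeric predictions. The single 2-unit miss pushes RMSE above MAE, showing how RMSE weights large errors more heavily.

```python
import math

def rmse(actual, predicted):
    # Square root of the mean squared error: big misses are penalized heavily
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    # Mean absolute error: every unit of error counts the same
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

actual    = [3.0, 5.0, 2.0, 7.0]   # hypothetical true values
predicted = [2.0, 5.0, 4.0, 7.0]   # hypothetical predictions
print(rmse(actual, predicted))      # sqrt(5 / 4), about 1.118
print(mae(actual, predicted))       # 3 / 4 = 0.75
```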
What is the Ensemble method?
It combines the predictions of multiple models, so the end result is essentially voted on.
What is bagging in the Ensemble method?
It generates N new training sets by random sampling with replacement, trains a model on each set, and combines their results by voting.
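A minimal sketch of the two pieces of bagging that are independent of the model: building N bootstrap training sets and combining predictions by majority vote. Training a model on each set is left out; the data and labels are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sets(data, n_sets, rng):
    # N new training sets, each the same size as the original,
    # drawn at random WITH replacement (so some rows repeat, some are left out)
    return [[rng.choice(data) for _ in data] for _ in range(n_sets)]

def majority_vote(predictions):
    # Ensemble result for one sample: the label predicted most often
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
for s in bootstrap_sets(list(range(8)), 3, rng):
    print(sorted(s))
print(majority_vote(["fraud", "ok", "fraud"]))   # fraud
```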
What is boosting in the Ensemble method?
Observations are weighted: each new model gives more weight to the observations that earlier models got wrong.
Does bagging avoid overfitting?
Yes
Does SageMaker support automatic model tuning?
Yes
In SageMaker automatic model tuning, should you optimize on many hyperparameters at once?
No. Limit the search to a small number of the most important hyperparameters.
What is warm start in SageMaker Automatic Model Tuning?
It uses one or more previous tuning jobs as a starting point.
What is the grid search hyperparameter tuning approach?
It tries every possible combination.
What is the random search hyperparameter tuning approach?
It randomly chooses a combination of hyperparameters
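A minimal sketch of how the candidate lists differ between grid search and random search, using a hypothetical search space. Only candidate generation is shown; in a real tuning job each candidate would be trained and scored on the objective metric.

```python
import itertools
import random

search_space = {                       # hypothetical hyperparameters and candidate values
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
}

def grid_search(space):
    # Every possible combination: 3 x 3 = 9 candidates here
    names = list(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(space[n] for n in names))]

def random_search(space, n_trials, rng):
    # A fixed budget of randomly chosen combinations
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_trials)]

print(len(grid_search(search_space)))    # 9
for trial in random_search(search_space, 4, random.Random(1)):
    print(trial)
```

Grid search cost grows multiplicatively with each added hyperparameter, while random search keeps a fixed trial budget.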
What is the bayesian optimization hyperparameter tuning approach?
It treats tuning as a regression problem and learns from each run.
What is the hyperband hyperparameter tuning approach?
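It is for algorithms that publish results iteratively (such as neural networks trained over several epochs). It dynamically allocates resources to promising configurations and stops under-performing jobs early, which can make it much faster than random search or Bayesian optimization.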
What is SageMaker Autopilot?
It automates algorithm selection, data preprocessing, and model tuning based on your data. No ML expertise is required.
Does SageMaker Autopilot have a model notebook?
Yes
What is the SageMaker Autopilot model leaderboard?
It shows you a ranked list of the recommended models.
What are the Autopilot training modes?
Hyperparameter Optimization (HPO)
Ensembling
Auto
Can SageMaker Autopilot models be explained with SageMaker Clarify?
Yes
What is SageMaker Experiments?
It is a place to organize, capture, compare, and search on your ML jobs.
What is SageMaker Debugger?
Saves the internal model state at periodic intervals.
Can SageMaker Debugger generate alarms?
Yes
Is there a visual view of SageMaker Debugger?
Yes
What is the SageMaker Model Registry?
It is a place for you to catalog and manage model versions.
Where can you manage the approval status of a model?
SageMaker Model Registry
Can you deploy models to production from SageMaker Model Registry?
Yes
What is the SageMaker Training Compiler?
It compiles and optimizes training jobs on GPU instances.
What are SageMaker Warm Pools?
They retain and reuse provisioned infrastructure after a training job finishes, so subsequent jobs start faster.
What is Checkpointing in SageMaker?
It creates snapshots during your training and you can restart from those points.
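A minimal sketch of the checkpoint-and-resume pattern a training script follows. It assumes the script writes snapshots to a local directory that SageMaker syncs with S3 (for SageMaker this local path defaults to /opt/ml/checkpoints). The JSON file format, file names, and functions are hypothetical; real scripts usually save framework-specific model and optimizer state.

```python
import json
import os
import tempfile

def save_checkpoint(ckpt_dir, epoch, state):
    # One snapshot file per epoch, zero-padded so names sort by epoch
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"checkpoint-{epoch:04d}.json")
    with open(path, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)

def load_latest_checkpoint(ckpt_dir):
    # Return the most recent snapshot, or None when starting from scratch
    if not os.path.isdir(ckpt_dir):
        return None
    names = sorted(n for n in os.listdir(ckpt_dir) if n.startswith("checkpoint-"))
    if not names:
        return None
    with open(os.path.join(ckpt_dir, names[-1])) as f:
        return json.load(f)

def run_training(ckpt_dir, total_epochs):
    ckpt = load_latest_checkpoint(ckpt_dir)
    start = ckpt["epoch"] + 1 if ckpt else 0        # resume after the last snapshot
    state = ckpt["state"] if ckpt else {"w": 0.0}
    for epoch in range(start, total_epochs):
        state["w"] += 1.0                           # stand-in for one epoch of real training
        save_checkpoint(ckpt_dir, epoch, state)
    return start, state

with tempfile.TemporaryDirectory() as d:
    print(run_training(d, 3))   # fresh run: starts at epoch 0
    print(run_training(d, 5))   # simulated restart: resumes at epoch 3
```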
What do SageMaker Cluster Health Checks do?
They check GPU health and replace faulty instances.
What instance types does Cluster Health Check run automatically on?
P and G instance types.
What are SageMaker Distributed Training Libraries?
The SageMaker Distributed Training Library is a set of tools and APIs provided by Amazon SageMaker to help you efficiently train machine learning models across multiple devices, like GPUs or machines, in parallel.
Using SageMaker Distributed Training Libraries, how can you free up GPU resources?
Use the AllGather collective, which offloads communication overhead to the CPU.
What is SageMaker Model Parallelism Library?
It allows you to distribute a model over multiple instances to overcome GPU memory limits.
What is SageMaker Data Parallelism?
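It replicates the model on each device (GPU or instance) and splits the training data among them, so each device processes a different batch in parallel and their gradients are combined.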
How can you improve the network speed of your SageMaker instances?
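Use an Elastic Fabric Adapter (EFA), a network device that gives instances HPC-level (high-performance computing) network performance.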