Model Training, Tuning and Evaluation Flashcards

1
Q

What is an Activation Function?

A

It is the function within a neuron that defines the neuron's output based on its input signal.

2
Q

What is a Linear Activation Function?

A

It mirrors what came into it as an output. Think of it as a pass-through.

3
Q

Can a Linear Activation Function perform back propagation?

A

No. Its derivative is a constant, so back propagation has nothing useful to learn from, and stacked linear layers collapse into a single linear transformation.

4
Q

What is a Binary Step Function?

A

It is on or off, like a light switch, depending on whether the input crosses a threshold. Being binary, it cannot handle multiple output values or classes.

5
Q

Why are non-linear activation functions better than linear ones?

A

They allow for back propagation and multiple layers.

6
Q

What is a Rectified Linear Unit (ReLU)?

A

It outputs the input directly when it is positive and 0 otherwise, i.e. max(0, x). It is the default choice for deep learning because it is very fast and easy to compute.

7
Q

What is Leaky ReLU?

A

It introduces a small slope below zero (in the negative region) instead of outputting 0, which keeps the gradient alive for negative inputs.

8
Q

What is PReLU?

A

It is like leaky ReLU, but the slope is learned from back propagation.
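
To make the ReLU-family cards concrete, here is a minimal NumPy sketch (not part of the original deck); the 0.01 slope for Leaky ReLU and the alpha value for PReLU are illustrative assumptions, and in a real network PReLU's alpha would be learned by back propagation.

```python
import numpy as np

def relu(x):
    # Outputs the input for positive values, 0 otherwise.
    return np.maximum(0, x)

def leaky_relu(x, slope=0.01):
    # Same as ReLU above zero, but keeps a small fixed slope below zero.
    return np.where(x > 0, x, slope * x)

def prelu(x, alpha):
    # Like Leaky ReLU, except alpha is a parameter learned during training.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))               # [0.  0.  0.  1.5]
print(leaky_relu(x))         # [-0.02  -0.005  0.  1.5]
print(prelu(x, alpha=0.2))   # [-0.4  -0.1  0.  1.5]
```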

9
Q

What is Maxout?

A

It outputs the max of the inputs.

10
Q

What is Softmax?

A

Used as the final output layer of a multi-class classification problem. It converts the outputs into a probability for each class. It only handles a single label per example.

11
Q

What can Sigmoids do that Softmax cannot?

A

Multiple classifications
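
As an illustration of the last two cards, a minimal NumPy sketch (the logit values are made up): softmax squashes the outputs into a single probability distribution that sums to 1, so only one label wins, while independent sigmoids let several labels exceed a threshold at once.

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])  # raw outputs from the final layer

# Softmax: probabilities over mutually exclusive classes; they sum to 1.
softmax = np.exp(logits) / np.sum(np.exp(logits))
print(softmax, softmax.sum())   # [0.659 0.242 0.099] 1.0

# Sigmoid: each class scored independently, so an example can carry
# multiple labels at once -- something softmax cannot do.
sigmoid = 1 / (1 + np.exp(-logits))
print(sigmoid)                  # [0.881 0.731 0.525]
```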

12
Q

What is TanH best for?

A

RNNs

13
Q

What is the activation function selection in steps?

A

Start with ReLU; if you need to do better, try Leaky ReLU, then PReLU, then Maxout.

14
Q

What is a CNN?

A

A Convolutional Neural Network

15
Q

What does a CNN do?

A

It finds features within your data, for example a phrase in text or an object in an image, regardless of where they appear.

16
Q

What is the LeNet-5 CNN?

A

Used for handwriting analysis

17
Q

What is the AlexNet CNN?

A

Used for image classification

18
Q

What is an RNN for?

A

Sequences of data: time series, web logs, captions, machine translation, etc.

19
Q

What is a recurrent neuron?

A

It is a neuron that feeds its output back into itself, so it remembers data from previous time steps.

20
Q

Can you have a layer of recurrent neurons?

A

Yes

21
Q

What is an Epoch?

A

One complete pass over the full training data set during training.

22
Q

What is Learning Rate?

A

A hyperparameter that controls how much of the model’s weights are adjusted with respect to the loss (error) after each iteration during training.

23
Q

What does too high a learning rate cause?

A

Overshooting the optimal solution.

24
Q

What does too low a learning rate cause?

A

Taking too long to find the optimal solution.
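
A toy gradient-descent sketch (not from the deck; the quadratic loss, starting weight, and step count are arbitrary assumptions) showing how the learning rate scales each weight update, and why a rate that is too high overshoots while one that is too low crawls.

```python
# Minimize loss(w) = (w - 3)**2, whose gradient is 2 * (w - 3); optimum is w = 3.
def train(learning_rate, steps=20):
    w = 0.0
    for _ in range(steps):
        gradient = 2 * (w - 3)
        w -= learning_rate * gradient   # the learning rate scales every update
    return w

print(train(0.01))   # too low: after 20 steps w is still far from 3
print(train(0.1))    # reasonable: converges close to 3
print(train(1.1))    # too high: overshoots and diverges away from 3
```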

25
What is the batch size hyperparameter?
How many training samples are used within each batch of each epoch.
26
What is local minima?
A dip in the loss curve that is lower than its surroundings but is not the lowest point overall (the global minimum).
27
Do smaller batch sizes get stuck in "local minima"?
Yes, but they can work their way out. Batch sizes that are too large end up getting stuck at the wrong solution.
28
What does regularization do?
It prevents overfitting.
29
What is overfitting?
When a model is good at making predictions on the training data, but not on the new data it hasn't seen before.
30
What does dropout do?
It randomly removes ("drops out") neurons during training so the network cannot rely too heavily on any one of them. Standard in CNNs.
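
A bare-bones NumPy sketch of what a dropout layer does during training (an illustration, not any framework's API; the 50% rate and activation values are made up): randomly zero some activations and rescale the survivors so the expected output is unchanged.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dropout(activations, rate=0.5):
    # Zero out roughly `rate` of the neurons at random and scale up the rest
    # (inverted dropout) so the expected activation stays the same.
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

layer_output = np.array([0.2, 1.5, 0.7, 0.9])
print(dropout(layer_output))   # some entries become 0, the rest are doubled
```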
31
Can fewer layers or neurons prevent overfitting?
Yes
32
What is early stopping?
Stopping training at the epoch where validation accuracy stops improving or starts degrading, rather than continuing on.
33
What does L1 and L2 Regularization do?
They prevent overfitting by adding a penalty term, based on the weights, to the loss function.
34
What is the L1 formula?
The regularization term is the sum of the absolute values of the weights.
35
What is the L2 formula?
The regularization term is the sum of the squares of the weights.
36
What does L1 regularization really do?
It performs feature selection: some weights can go to 0, effectively removing those features. It can result in sparse output.
37
What does L2 regularization really do?
It ensures all features remain considered, but their weights are shrunk. It can result in dense output.
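
A small NumPy sketch of the two penalty terms from the cards above (the weight values and lambda coefficient are made-up assumptions): L1 sums absolute weights and drives some to exactly zero (sparse, feature selection), while L2 sums squared weights and merely shrinks them (dense).

```python
import numpy as np

weights = np.array([0.0, -0.5, 2.0, 0.1])
lam = 0.01   # regularization strength (lambda), a hyperparameter

l1_penalty = lam * np.sum(np.abs(weights))   # sum of absolute weights
l2_penalty = lam * np.sum(weights ** 2)      # sum of squared weights

# Either penalty is added to the training loss:
#   total_loss = data_loss + l1_penalty   (or + l2_penalty)
print(l1_penalty, l2_penalty)   # 0.026  0.0426
```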
38
What can Long short-term memory (LSTM) solve?
The vanishing gradient problem
39
Will ResNet solve the vanishing gradient problem?
Yes
40
What is a confusion matrix for?
A confusion matrix is a table used to evaluate the performance of a classification model. It compares the actual values (true labels) with the predicted values (predictions made by the model)
41
What is the formula for Recall?
True Positives divided by (True Positives + False Negatives).
42
What does Recall measure?
The percentage of actual positives correctly predicted (the true positive rate). A good choice when you care about false negatives.
43
What is a good use case for Recall?
Fraud Detection
44
What is the formula for Precision?
True Positives divided by (True Positives + False Positives).
45
What does Precision measure?
Percent of relevant results. Good choice when you care about false positives.
46
What is a good use case for Precision?
Medical screening, drug testing, etc.
47
What is the formula for Specificity?
True Negatives divided by (True Negatives + False Positives).
48
What does Specificity measure?
True negative rate
49
When is the F1 score useful?
When you care about both precision and recall; F1 is their harmonic mean.
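
A short sketch tying the recall, precision, specificity, and F1 cards together; the confusion-matrix counts are hypothetical.

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, fn, tn = 80, 10, 20, 90

recall      = tp / (tp + fn)   # true positive rate; sensitive to false negatives
precision   = tp / (tp + fp)   # share of positive predictions that are correct
specificity = tn / (tn + fp)   # true negative rate
f1          = 2 * precision * recall / (precision + recall)   # harmonic mean

print(recall, precision, specificity, f1)   # 0.8  0.889  0.9  0.842
```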
50
What does RMSE measure?
Accuracy of numeric predictions. It only cares about how right or wrong the answers are, not about the type of error.
51
What does the ROC curve showcase?
Plot of true positive rate versus false positive rate.
52
What does AUC (the Area Under the ROC Curve) represent?
Probability that a classifier will rank a randomly chosen positive instance higher than a negative one.
53
What is the P-R Curve?
It plots Precision versus Recall. A higher area under the curve is better. It is similar to ROC and good for information retrieval.
54
What are RMSE and MAE used for?
Measuring numerical predictions instead of classifications
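
A short NumPy sketch of both error metrics for numeric predictions (the values are made up); note how RMSE punishes larger errors more heavily than MAE does.

```python
import numpy as np

actual    = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

errors = predicted - actual
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error
mae  = np.mean(np.abs(errors))         # mean absolute error

print(rmse, mae)   # ~0.935  0.75
```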
55
What is the Ensemble method?
It combines multiple models so the end result is essentially voted on (or averaged) across them.
56
What is bagging in the Ensemble method?
It generates N new training sets by random sampling with replacement and trains a model on each.
57
What is boosting in the Ensemble method?
Observations are weighted: models are trained sequentially, with each new model focusing on the observations the previous ones got wrong.
58
Does bagging avoid overfitting?
Yes
59
Does SageMaker support automatic model tuning?
Yes
60
In SageMaker automatic model tuning, should you optimize many hyperparameters at once?
No
61
What is warm start in SageMaker Automatic Model Tuning?
It uses one or more previous tuning jobs as a starting point.
62
What is the grid search hyperparameter tuning approach?
It tries every possible combination.
63
What is the random search hyperparameter tuning approach?
It randomly chooses a combination of hyperparameters
64
What is the bayesian optimization hyperparameter tuning approach?
It treats tuning as a regression problem and learns from each run.
65
What is the hyperband hyperparameter tuning approach?
A best-of-all-worlds approach: it dynamically allocates resources and stops under-performing training jobs early, so it finds good hyperparameters much faster.
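
A hedged sketch of automatic model tuning with the SageMaker Python SDK, pulling the last few cards together. The built-in XGBoost estimator, the S3 bucket, the IAM role, the objective metric, and the ranges are all illustrative assumptions, and the available strategy names depend on your SDK version; note that only two hyperparameters are tuned at once.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"   # hypothetical role

# Built-in XGBoost container for the current region.
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output/",   # hypothetical bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", eval_metric="auc", num_round=100)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    hyperparameter_ranges={                      # keep the search space small
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    strategy="Bayesian",                         # or "Random", "Hyperband", "Grid"
    objective_type="Maximize",
    max_jobs=20,
    max_parallel_jobs=2,
)

tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/validation/"})
```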
66
What is SageMaker Autopilot?
It automatically selects the algorithm, preprocesses the data, and tunes the model for you based on your data. No ML expertise is required.
67
Does SageMaker Autopilot have a model notebook?
Yes
68
What is the SageMaker Autopilot model leaderboard?
It shows you a ranked list of the recommended models so you can compare candidates.
69
What are the Autopilot training modes?
Hyperparameter optimization (HPO), Ensembling, and Auto, which picks one of the other two based on the size of your data set.
70
Can SageMaker Autopilot models be explained with SageMaker Clarify?
Yes
71
What is SageMaker Experiments?
It is a place to organize, capture, compare, and search on your ML jobs.
72
What is SageMaker Debugger?
Saves the internal model state at periodic intervals.
73
Can SageMaker Debugger generate alarms?
Yes
74
Is there a visual view of SageMaker Debugger?
Yes
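
A hedged sketch of attaching SageMaker Debugger to a training job (the image URI, role, and bucket are placeholders, and the rule choice is illustrative): the hook config tells SageMaker where to save internal tensors at intervals, and a built-in rule watches them and can trigger alerts.

```python
from sagemaker.debugger import DebuggerHookConfig, Rule, rule_configs
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",                         # placeholder
    role="arn:aws:iam::123456789012:role/MySageMakerRole",    # hypothetical role
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    # Save internal model state (tensors) to S3 at periodic intervals.
    debugger_hook_config=DebuggerHookConfig(
        s3_output_path="s3://my-bucket/debugger/",            # hypothetical bucket
    ),
    # Built-in rule that watches the saved tensors for vanishing gradients.
    rules=[Rule.sagemaker(rule_configs.vanishing_gradient())],
)
```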
75
What is the SageMaker Model Registry?
It is a place for you to catalog and manage model versions.
76
Where can you manage the approval status of a model?
The SageMaker Model Registry
77
Can you deploy models to production from the SageMaker Model Registry?
Yes
78
What is the SageMaker Training Compiler?
It compiles and optimizes training jobs on GPU instances.
79
What are SageMaker Warm Pools?
They retain and reuse provisioned infrastructure after a training job so subsequent jobs start faster.
80
What is Checkpointing in SageMaker?
It creates snapshots during your training and you can restart from those points.
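
A hedged sketch of configuring checkpointing and a warm pool on an estimator (the image, role, bucket, and 10-minute keep-alive are assumptions): checkpoints written locally are synced to S3 so a job can restart from a snapshot, and the keep-alive period retains the provisioned infrastructure for the next job.

```python
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",                         # placeholder
    role="arn:aws:iam::123456789012:role/MySageMakerRole",    # hypothetical role
    instance_count=1,
    instance_type="ml.g5.xlarge",
    # Checkpointing: snapshots saved locally are synced to S3, so an
    # interrupted job can restart from the latest checkpoint.
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",          # hypothetical bucket
    checkpoint_local_path="/opt/ml/checkpoints",
    # Warm pool: keep the provisioned instances alive for 10 minutes after the
    # job ends so the next job can reuse them without a cold start.
    keep_alive_period_in_seconds=600,
)
```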
81
What do SageMaker Cluster Health Checks do?
It checks GPU health and replaces faulty instances.
82
What instance types does Cluster Health Check run automatically on?
P and G instance types.
83
What are SageMaker Distributed Training Libraries?
The SageMaker Distributed Training Library is a set of tools and APIs provided by Amazon SageMaker to help you efficiently train machine learning models across multiple devices, like GPUs or machines, in parallel.
84
Using SageMaker Distributed Training Libraries, how can you free up the GPU?
Use the AllGather collective operation, which offloads communication to the CPU.
85
What is SageMaker Model Parallelism Library?
It allows you to distribute a model over multiple instances to overcome GPU memory limits.
86
What is SageMaker Data Parallelism?
It splits each training batch across multiple GPUs or instances, each holding a full copy of the model, so the data is processed in parallel.
87
How can you improve the network speed of your SageMaker instances?
Use an Elastic Fabric Adapter (EFA), which brings HPC-grade networking between your training instances.
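
A hedged sketch of enabling the distributed data parallelism library on a PyTorch estimator (the training script, role, framework/Python versions, and instance choice are assumptions): each GPU receives a shard of every batch, and on supported EFA-capable instance types such as ml.p4d.24xlarge the library uses that high-speed networking between instances.

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                                   # hypothetical training script
    role="arn:aws:iam::123456789012:role/MySageMakerRole",    # hypothetical role
    instance_count=2,
    instance_type="ml.p4d.24xlarge",                          # EFA-capable GPU instances
    framework_version="2.0",
    py_version="py310",
    # Enable the SageMaker distributed data parallelism library: every GPU holds
    # a full copy of the model and processes its own shard of each batch.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
```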