Model Training, Tuning and Evaluation Flashcards

1
Q

What is an Activation Function?

A

It is a function within a Neuron that defines the output of a node / neuron based on its input signal.

2
Q

What is a Linear Activation Function?

A

It outputs its input unchanged (f(x) = x). Think of it as a pass-through.

3
Q

Can a Linear Activation Function perform back propagation?

A

No. Its derivative is a constant, so back propagation has no useful gradient to learn from, and stacked linear layers collapse into a single linear layer.

4
Q

What is a Binary Step Function?

A

It is on or off, like a light switch: the output jumps to a single fixed value once the input crosses a threshold.

5
Q

Why are non-linear activation functions better than linear ones?

A

They allow for back propagation and multiple layers.

6
Q

What is a Rectified Linear Unit (ReLU)?

A

The go-to activation for deep learning: it outputs the input when positive and zero otherwise (f(x) = max(0, x)). Very fast and easy to compute.

7
Q

What is Leaky ReLU?

A

It introduces a small, fixed slope below zero, so negative inputs still produce a gradient.

8
Q

What is PReLU?

A

It is like leaky ReLU, but the slope is learned from back propagation.
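
The ReLU family above can be sketched in plain Python (an illustrative sketch, not any framework's implementation; the default slope value is an assumption):

```python
def relu(x):
    # ReLU: pass positive inputs through, clamp negatives to zero
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # Leaky ReLU: small fixed slope below zero keeps the gradient alive
    return x if x > 0 else slope * x

def prelu(x, slope):
    # PReLU: same shape as Leaky ReLU, but `slope` is learned
    # via back propagation rather than fixed in advance
    return x if x > 0 else slope * x
```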

9
Q

What is Maxout?

A

It outputs the max of the inputs.

10
Q

What is Softmax?

A

The final output layer of a multi-class classification problem. It converts the outputs to a probability of each classification. Only handles a single label.
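
A minimal softmax sketch in plain Python, showing how raw scores become probabilities that sum to one:

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability,
    # exponentiate, then normalize so the outputs sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The highest-scoring class ends up with the highest probability, which is why softmax suits single-label, multi-class outputs.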

11
Q

What can Sigmoids do that Softmax cannot?

A

Multi-label classification: sigmoids output an independent probability per class, while softmax probabilities must sum to one (a single label).

12
Q

What is TanH best for?

A

RNNs

13
Q

What is the activation function selection in steps?

A

Start with ReLU; if needed, move on to Leaky ReLU, then PReLU, then Maxout.

14
Q

What is a CNN?

A

A Convolutional Neural Network

15
Q

What does a CNN do?

A

It finds a feature within your data. This could be in text or something in an image.

16
Q

What is the LeNet-5 CNN?

A

Used for handwriting analysis

17
Q

What is the AlexNet CNN?

A

Used for image classification

18
Q

What is an RNN for?

A

Sequences of data: time series, web logs, captions, machine translation, etc.

19
Q

What is a recurrent neuron?

A

It is a neuron that feeds its output back into itself, so it remembers information from previous time steps.

20
Q

Can you have a layer of recurrent neurons?

A

Yes

21
Q

What is an Epoch?

A

One full training pass through the entire training data set.

22
Q

What is Learning Rate?

A

A hyperparameter that controls how much of the model’s weights are adjusted with respect to the loss (error) after each iteration during training.
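
A toy gradient-descent sketch (minimizing f(w) = w², a made-up objective for illustration) shows how the learning rate scales each weight update:

```python
def gradient_step(w, grad, lr):
    # One update: move the weight against the gradient,
    # scaled by the learning rate
    return w - lr * grad

# Minimizing f(w) = w**2, whose gradient is 2w
w = 1.0
for _ in range(20):
    w = gradient_step(w, 2 * w, lr=0.1)   # converges toward 0

w_bad = 1.0
for _ in range(20):
    w_bad = gradient_step(w_bad, 2 * w_bad, lr=1.1)  # overshoots and diverges
```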

23
Q

What does too high a learning rate cause?

A

Overshooting the optimal solution.

24
Q

What does too low a learning rate cause?

A

Taking too long to find the optimal solution.

25
Q

What is the batch size hyperparameter?

A

How many training samples are used within each batch of each epoch.
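
The relationship between sample count, batch size, and batches per epoch can be sketched with ceiling division (a hypothetical helper, not a library function):

```python
def batches_per_epoch(n_samples, batch_size):
    # One epoch is a full pass over all samples; a final
    # partial batch still counts, hence ceiling division
    return -(-n_samples // batch_size)
```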

26
Q

What is local minima?

A

A dip in the loss curve that is not the global minimum; gradient descent can get stuck there.

27
Q

Do smaller batch sizes get stuck in “local minima”?

A

Yes, but they can work their way out. Batch sizes that are too large end up getting stuck at the wrong solution.

28
Q

What does regularization do?

A

It prevents overfitting.

29
Q

What is overfitting?

A

When a model is good at making predictions on the training data, but not on the new data it hasn’t seen before.

30
Q

What does dropout do?

A

It drops out specific neurons at random during training, forcing the network not to rely on any one neuron. Standard in CNNs.
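
A minimal sketch of inverted dropout (the common formulation, assumed here): zero each unit with probability `rate` and rescale survivors so the expected activation is unchanged.

```python
import random

def dropout(activations, rate=0.5, training=True, seed=None):
    # Inverted dropout: zero each unit with probability `rate`,
    # scaling survivors by 1/(1 - rate); do nothing at inference time
    if not training or rate == 0.0:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```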

31
Q

Can fewer layers or neurons prevent overfitting?

A

Yes

32
Q

What is early stopping?

A

Stopping training at the epoch where validation accuracy stops improving and starts to degrade.
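
One simplified early-stopping policy (the `patience` parameter is an assumption, not a fixed standard): stop once validation accuracy has gone a set number of epochs without improving.

```python
def early_stop_epoch(val_accuracies, patience=2):
    # Track the best validation accuracy seen so far; stop once
    # `patience` epochs have passed without a new best
    best, best_epoch = float("-inf"), 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_accuracies) - 1
```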

33
Q

What do L1 and L2 Regularization do?

A

They prevent overfitting

34
Q

What is the L1 formula?

A

The regularization term is the sum of the absolute values of the weights.

35
Q

What is the L2 formula?

A

The regularization term is the sum of the squares of the weights.
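
The two penalty terms can be sketched directly (λ is the regularization strength; the default value 0.01 is an arbitrary illustration):

```python
def l1_penalty(weights, lam=0.01):
    # L1 term: lambda times the sum of absolute weights
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=0.01):
    # L2 term: lambda times the sum of squared weights
    return lam * sum(w * w for w in weights)
```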

36
Q

What does L1 regularization really do?

A

It performs feature selection and some features can go to 0. It can result in sparse output.

37
Q

What does L2 regularization really do?

A

It ensures all features remain considered, but weights are applied. It can result in dense outputs.

38
Q

What can Long short-term memory (LSTM) solve?

A

The vanishing gradient problem

39
Q

Will ResNet solve the vanishing gradient problem?

A

Yes

40
Q

What is a confusion matrix for?

A

A confusion matrix is a table used to evaluate the performance of a classification model. It compares the actual values (true labels) with the predicted values (predictions made by the model).

41
Q

What is the formula for Recall?

A

True Positives divided by (True Positives + False Negatives).

42
Q

What does Recall measure?

A

The percentage of positives rightly predicted. Good choice for when you care about false negatives

43
Q

What is a good use case for Recall?

A

Fraud Detection

44
Q

What is the formula for Precision?

A

True Positives divided by (True Positives + False Positives).

45
Q

What does Precision measure?

A

Percent of relevant results. Good choice when you care about false positives.

46
Q

What is a good use case for Precision?

A

Medical screening, drug testing, etc.

47
Q

What is the formula for Specificity?

A

True Negatives divided by (True Negatives + False Positives).

48
Q

What does Specificity measure?

A

True negative rate

49
Q

When is the F1 score useful?

A

When you care about precision and recall
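
The metrics in cards 41–49 can all be computed from confusion-matrix counts; a plain-Python sketch:

```python
def recall(tp, fn):
    # Share of actual positives that were caught
    return tp / (tp + fn)

def precision(tp, fp):
    # Share of predicted positives that were correct
    return tp / (tp + fp)

def specificity(tn, fp):
    # True negative rate
    return tn / (tn + fp)

def f1(p, r):
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r)
```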

50
Q

What does RMSE measure?

A

The accuracy of numerical predictions: the square root of the average squared difference between predicted and actual values.

51
Q

What does the ROC curve showcase?

A

Plot of true positive rate versus false positive rate.

52
Q

What does AUC (the area under the ROC curve) represent?

A

Probability that a classifier will rank a randomly chosen positive instance higher than a negative one.
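
That ranking definition of AUC can be computed directly over scored examples (ties count as half a win); a sketch:

```python
def auc_by_ranking(pos_scores, neg_scores):
    # AUC = probability a random positive outscores a random
    # negative; count pairwise wins, with ties worth 0.5
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```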

53
Q

What is the P-R Curve?

A

It shows Precision versus Recall. A higher area under the curve is better. Similar to ROC and good for information retrieval.

54
Q

What are RMSE and MAE used for?

A

Measuring numerical predictions instead of classifications
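
Both regression metrics written out in plain Python:

```python
import math

def rmse(actual, predicted):
    # Root mean squared error: penalizes large errors more heavily
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    # Mean absolute error: the average size of the miss
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```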

55
Q

What is the Ensemble method?

A

It combines multiple models, with the end result essentially voted on (or averaged).
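
The voting step can be sketched in a few lines (for classification; regression ensembles would average instead):

```python
from collections import Counter

def majority_vote(predictions):
    # The ensemble's answer is the class most models predicted
    return Counter(predictions).most_common(1)[0][0]
```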

56
Q

What is bagging in the Ensemble method?

A

It generates N new training sets by random sampling with replacement.
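
Bagging's bootstrap sampling, sketched in plain Python (each new set is the same size as the original, drawn with replacement):

```python
import random

def bootstrap_samples(data, n_sets, seed=0):
    # Bagging: build N new training sets, each the same size as
    # the original, by sampling with replacement
    rng = random.Random(seed)
    return [
        [rng.choice(data) for _ in range(len(data))]
        for _ in range(n_sets)
    ]
```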

57
Q

What is boosting in the Ensemble method?

A

Observations are weighted: models are trained sequentially, and misclassified observations gain weight so later models focus on them.

58
Q

Does bagging avoid overfitting?

A

Yes

59
Q

Does SageMaker support automatic model tuning?

A

Yes

60
Q

In SageMaker automatic model tuning, should you optimize on many hyperparameters at once?

A

No

61
Q

What is warm start in SageMaker Automatic Model Tuning?

A

It uses one or more previous tuning jobs as a starting point.

62
Q

What is the grid search hyperparameter tuning approach?

A

It tries every possible combination.

63
Q

What is the random search hyperparameter tuning approach?

A

It randomly chooses a combination of hyperparameters
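
The contrast between grid search and random search can be sketched as follows (the search space here is a made-up example):

```python
import itertools
import random

# A hypothetical hyperparameter search space for illustration
space = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64]}

def grid_search(space):
    # Grid search: enumerate every possible combination
    return list(itertools.product(*space.values()))

def random_search(space, n_trials, seed=0):
    # Random search: sample a fixed number of random combinations
    rng = random.Random(seed)
    return [
        tuple(rng.choice(values) for values in space.values())
        for _ in range(n_trials)
    ]
```

Grid search cost grows multiplicatively with each hyperparameter, which is why random search is often preferred for large spaces.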

64
Q

What is the bayesian optimization hyperparameter tuning approach?

A

It treats tuning as a regression problem and learns from each run.

65
Q

What is the hyperband hyperparameter tuning approach?

A

The best of all worlds: it combines ideas from random search with early stopping, halting underperforming training jobs to reallocate their resources.

66
Q

What is SageMaker Autopilot?

A

It selects the model, preprocessing, and hyperparameters for you based on your data. No ML expertise is required.

67
Q

Does SageMaker Autopilot have a model notebook?

A

Yes

68
Q

What is the SageMaker Autopilot model leaderboard?

A

It shows you a ranked list of the recommended models.

69
Q

What are the Autopilot training modes?

A

Hyperparameter optimization (HPO)

Ensembling

Auto

70
Q

Can SageMaker Autopilot models be explained with SageMaker Clarify?

A

Yes

71
Q

What is SageMaker Experiments?

A

It is a place to organize, capture, compare, and search on your ML jobs.

72
Q

What is SageMaker Debugger?

A

Saves the internal model state at periodic intervals.

73
Q

Can SageMaker Debugger generate alarms?

A

Yes

74
Q

Is there a visual view of SageMaker Debugger?

A

Yes

75
Q

What is the SageMaker Model Registry?

A

It is a place for you to catalog and manage model versions.

76
Q

Where can you manage the approval status of a model?

A

SageMaker Model Registry

77
Q

Can you deploy models to production from the SageMaker Model Registry?

A

Yes

78
Q

What is the SageMaker Training Compiler?

A

It compiles and optimizes training jobs on GPU instances.

79
Q

What are SageMaker Warm Pools?

A

They retain and re-use provisioned infrastructure between training jobs to reduce startup time.

80
Q

What is Checkpointing in SageMaker?

A

It creates snapshots during your training and you can restart from those points.

81
Q

What do SageMaker Cluster Health Checks do?

A

It checks GPU health and replaces faulty instances.

82
Q

What instance types does Cluster Health Check run automatically on?

A

P and G instance types.

83
Q

What are SageMaker Distributed Training Libraries?

A

The SageMaker Distributed Training Library is a set of tools and APIs provided by Amazon SageMaker to help you efficiently train machine learning models across multiple devices, like GPUs or machines, in parallel.

84
Q

Using SageMaker Distributed Training Libraries, how can you free up GPU?

A

Use the AllGather collective. This offloads communication to the CPU.

85
Q

What is SageMaker Model Parallelism Library?

A

It allows you to distribute a model over multiple instances to overcome GPU memory limits.

86
Q

What is SageMaker Data Parallelism?

A

It distributes the training data across multiple devices, each holding a copy of the model; it can be combined with model parallelism.

87
Q

How can you improve the network speed of your SageMaker instances?

A

Use an Elastic Fabric Adapter (EFA), which provides HPC-grade networking.