8.3 Model Training and Evaluation Flashcards

1
Q

Supervised or unsupervised?
* Regression
* Ensemble trees
* Support vector machines (SVMs)
* Neural networks

A

Supervised;

2
Q

Supervised or unsupervised?
* Clustering
* Dimension reduction
* Anomaly detection
* Deep learning networks

A

Unsupervised;

3
Q

What are the three steps of model training?

A
1. Method selection
2. Performance evaluation
3. Tuning
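
A minimal sketch of these three steps in code, assuming a generic scikit-learn setup (the dataset, the SVM choice, and the parameter grid are illustrative assumptions, not part of the flashcard):

```python
# Hypothetical illustration of the three model-training steps.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Method selection: pick a learning method suited to the problem (here, an SVM).
model = SVC()

# 2. Performance evaluation: estimate out-of-sample accuracy with cross-validation.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("Cross-validated accuracy:", cv_scores.mean())

# 3. Tuning: adjust hyperparameters to balance underfitting and overfitting.
grid = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_, "Test accuracy:", grid.score(X_test, y_test))
```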
4
Q

If you have _______ data you can use supervised learning; if you have ___________ data, you would use unsupervised learning.

A

labeled; unlabeled;

5
Q

What types of data should you be familiar with in the context of machine learning?

A

Continuous numerical or categorical data; image data; text data; speech data.

6
Q

Is “bias error” due to overfitting or underfitting? Is it from the training data or from the out-of-sample data?

A

Underfitting (not enough variables); it comes from the training data.

7
Q

Is “variance error” due to overfitting or underfitting? Is it from the training data or in the out-of-sample data?

A

Overfitting (too many variables); it shows up in the out-of-sample data.

8
Q

What do the X and Y axes represent in a fitting curve for a learning model? Where on the curve do “bias error” and “variance error” lie?

Which curve represents in-sample (training sample) error?

A

X = Model complexity; Y = Error.

Bias error dominates on the left of the curve (low complexity, underfitting); variance error dominates on the right (high complexity, overfitting).

The in-sample (training sample) error curve is the one that keeps declining as complexity increases; the out-of-sample error curve declines and then turns back up.
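
A minimal sketch of where these errors come from, assuming a toy polynomial-regression setup (the data-generating function and the degrees tried are illustrative assumptions): training error keeps falling as complexity grows, while out-of-sample error eventually turns back up.

```python
# Hypothetical fitting-curve data: error vs. model complexity (polynomial degree).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # noisy nonlinear target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 3, 10, 20):  # increasing model complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))  # in-sample error
    test_err = mean_squared_error(y_test, model.predict(X_test))     # out-of-sample error
    # Low degree: both errors high (bias error / underfitting).
    # High degree: training error falls but test error rises (variance error / overfitting).
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```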

9
Q

In order to validate the fit of a machine learning algorithm, you can create a confusion matrix and then calculate various metrics from it. Using the following column headings “Actual: Default” and “Actual: No Default” and the following row titles “Prediction: Default” and “Prediction: No Default”, create a confusion matrix showing the results of the example classification problem “Classification of Defaulters”.

A

                         Actual: Default        Actual: No Default
Prediction: Default      True positives (TP)    False positives (FP)
Prediction: No Default   False negatives (FN)   True negatives (TN)
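
A small sketch of computing such a matrix in code, using made-up default labels and predictions (note that scikit-learn's `confusion_matrix` puts actual classes on the rows and predicted classes on the columns, the transpose of the layout above):

```python
# Hypothetical "Classification of Defaulters" example with made-up labels.
from sklearn.metrics import confusion_matrix

# 1 = Default, 0 = No Default
actual    = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]

# sklearn convention: rows = actual class, columns = predicted class.
tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")
```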
10
Q

Using a confusion matrix, what is the formula for calculating the metric “precision”? Explain it verbally.

A

Precision (P) = TP / (TP + FP)

The ratio of true positives to all predicted positives. The sum of all predicted positives is the sum of the first row of results in the confusion matrix.

11
Q

Using a confusion matrix, what is the formula for the metric “recall”?

Explain it verbally.

A

Recall (R) = TP / (TP + FN)

The ratio of true positives to all actual positives. The sum of all actual positives is the sum of the values in the first column of the confusion matrix.

12
Q

Using a confusion matrix, what is the formula for the metric “accuracy”?

Explain it verbally.

A

Accuracy = (TP + TN) / (TP + TN + FP + FN)

The proportion of correct predictions out of the total number of predictions.

The numerator is the sum of the upper-left and lower-right cells of the confusion matrix (TP + TN); the denominator is the sum of all four cells.
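
A minimal sketch computing precision, recall, and accuracy directly from hypothetical confusion-matrix counts (the numbers below are made up for illustration):

```python
# Made-up confusion-matrix counts for the defaulter example.
tp, fp, fn, tn = 30, 10, 5, 55

precision = tp / (tp + fp)                    # true positives / all predicted positives
recall    = tp / (tp + fn)                    # true positives / all actual positives
accuracy  = (tp + tn) / (tp + tn + fp + fn)   # correct predictions / all predictions

print(f"precision={precision:.2f}  recall={recall:.2f}  accuracy={accuracy:.2f}")
```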

13
Q

What does TPR stand for, which axis of the ROC curve represents it, and what is the formula for calculating it?

A

TPR = True positive rate;

Y-axis;

TPR = TP / (TP + FN) (same as the formula for “recall”)

14
Q

What does FPR stand for, which axis of the ROC curve represents it, and what is the formula for calculating it?

A

FPR = False positive rate;

x-axis;

FPR = FP / (FP + TN)

15
Q

What is the definition of the “F1 Score”?

What is the formula for the F1 Score?

A

The F1 Score is the harmonic mean of precision and recall.

F1 Score = (2 x P x R) / (P + R)
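
A short sketch of the F1 Score as the harmonic mean of precision and recall, using the hypothetical counts from the earlier sketch:

```python
# F1 Score from hypothetical confusion-matrix counts (tp=30, fp=10, fn=5).
precision = 30 / (30 + 10)
recall = 30 / (30 + 5)

f1 = (2 * precision * recall) / (precision + recall)  # harmonic mean of P and R
print(f"F1 = {f1:.3f}")
```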

16
Q

Calculated from a confusion matrix:

(1) What does “ROC” stand for, and
(2) What does it show?
(3) What does the x-axis represent and what does the Y-axis represent?

A

ROC = Receiver operating characteristic;

The ROC plots a curve showing the tradeoff between false positives and true positives as the classification threshold is varied;

Y-axis = TPR (true positive rate)

X-axis = FPR (false positive rate)
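
A sketch of how TPR, FPR, and the area under the ROC curve could be computed with scikit-learn, using made-up labels and predicted default probabilities (the scores are illustrative only):

```python
# Hypothetical ROC/AUC example with made-up labels and predicted probabilities.
from sklearn.metrics import roc_curve, roc_auc_score

actual = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]                      # 1 = Default
scores = [0.9, 0.8, 0.4, 0.3, 0.1, 0.6, 0.2, 0.7, 0.3, 0.5]  # model's estimated P(default)

fpr, tpr, thresholds = roc_curve(actual, scores)  # X-axis = FPR, Y-axis = TPR
auc = roc_auc_score(actual, scores)               # area under the ROC curve

print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", auc)  # ~0.5 = random guessing; 1.0 = perfect classifier
```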

17
Q

Using the following ROC curves, answer these questions:

(1) What does AUC stand for?
(2) Which line represents the best model?
(3) What would a curve representing 100% accuracy look like?
(4) What would a curve representing 0% accuracy look like?
(5) What is implied by the results of Model #2?

A

(1) Area under the curve;
(2) Model 3;
(3) A curve that rises straight up the Y-axis and then runs flat across the top of the graph (AUC = 1);
(4) A flat line running along the bottom of the graph (AUC = 0);
(5) Roughly 50% accuracy (AUC ≈ 0.5), i.e., what one would expect from random guessing.