8.3 Model Training and Evaluation Flashcards
Supervised or unsupervised? * Regression; * Ensemble trees; * Support vector machines (SVMs); * Neural networks;
Supervised;
Supervised or unsupervised? * Clustering; * Dimension reduction; * Anomaly detection; * Deep learning networks;
Unsupervised;
What are the three steps of model training?
1. Method selection; 2. Performance evaluation; 3. Tuning;
If you have _______ data you can use supervised learning; if you have ___________ data, you would use unsupervised learning.
labeled; unlabeled;
What types of data should you be familiar with in the context of machine learning?
numerical (continuous) or categorical data; image data; text data; speech data;
Is “bias error” due to overfitting or underfitting? Is it from the training data or from the out-of-sample data?
underfitting (not enough variables); from the training data;
Is “variance error” due to overfitting or underfitting? Is it from the training data or in the out-of-sample data?
overfitting (too many variables); out-of-sample data;
What do the X and Y axes stand for in a fitting curve for a learning model? Where on the curve do “bias error” and “variance error” lie?
Which curve represents in-sample (training sample) error?
X = Model complexity; Y = Error. Bias error lies on the left of the curve (low complexity, underfitting); variance error lies on the right (high complexity, overfitting). The in-sample (training sample) error curve is the one that keeps declining as model complexity increases; the out-of-sample error curve falls, bottoms out, and rises again.
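A minimal sketch of how such a fitting curve can be generated, assuming NumPy is available; the noisy sine data, noise level, and degree range are made-up illustration values, not from the source:

```python
import numpy as np

# Synthetic data: a noisy sine wave (illustrative values only).
rng = np.random.default_rng(0)
x_train = np.linspace(-3, 3, 30)
x_test = np.linspace(-3, 3, 100)
y_train = np.sin(x_train) + rng.normal(0, 0.3, x_train.size)
y_test = np.sin(x_test) + rng.normal(0, 0.3, x_test.size)

# X-axis of the fitting curve: model complexity (polynomial degree).
for degree in range(1, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Y-axis: error. Training error keeps falling as degree grows;
    # out-of-sample error is U-shaped (bias error on the left,
    # variance error on the right).
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```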
In order to validate the fit of a machine learning algorithm, you can create a confusion matrix and then calculate various metrics from it. Using the following column headings “Actual: Default” and “Actual: No Default” and the following row titles “Prediction: Default” and “Prediction: No Default”, create a confusion matrix showing the results of the example classification problem “Classification of Defaulters”.
                        Actual: Default        Actual: No Default
Prediction: Default     True positives (TP)    False positives (FP)
Prediction: No Default  False negatives (FN)   True negatives (TN)
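A minimal sketch in Python of how the four cells can be counted from predictions; the label vectors are hypothetical, invented for illustration (1 = Default, 0 = No Default):

```python
# Hypothetical predicted vs. actual labels for "Classification of Defaulters".
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

tp = sum(1 for a, p in zip(actual, predicted) if p == 1 and a == 1)
fp = sum(1 for a, p in zip(actual, predicted) if p == 1 and a == 0)
fn = sum(1 for a, p in zip(actual, predicted) if p == 0 and a == 1)
tn = sum(1 for a, p in zip(actual, predicted) if p == 0 and a == 0)

print("                        Actual: Default   Actual: No Default")
print(f"Prediction: Default     TP = {tp}            FP = {fp}")
print(f"Prediction: No Default  FN = {fn}            TN = {tn}")
```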
Using a confusion matrix, what is the formula for calculating the metric “precision”? Explain it verbally.
Precision (P) = TP / (TP + FP)
The ratio of true positives to all predicted positives. The sum of all predicted positives is the sum of the first row of results in the confusion matrix.
Using a confusion matrix, what is the formula for the metric “recall”?
Explain it verbally.
Recall (R) = TP / (TP + FN)
The ratio of true positives to all actual positives. The sum of all actual positives is the sum of the values in the first column of the confusion matrix.
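A short sketch of both formulas, reusing the illustrative TP/FP/FN/TN counts from the hypothetical confusion-matrix example above:

```python
# Illustrative confusion-matrix counts (assumed values, matching the
# hypothetical example above).
tp, fp, fn, tn = 4, 1, 1, 4

precision = tp / (tp + fp)  # true positives / all predicted positives (first row)
recall = tp / (tp + fn)     # true positives / all actual positives (first column)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```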
Using a confusion matrix, what is the formula for the metric “accuracy”?
Explain it verbally.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
The proportion of correct forecasts out of a total number of forecasts.
The numerator is the sum of the upper-left cell (TP) and the lower-right cell (TN); the denominator is the sum of all four cells.
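The same illustrative counts, run through the accuracy formula:

```python
# Accuracy: main diagonal (upper-left TP + lower-right TN) over all four
# cells. Counts are the same illustrative values as above.
tp, fp, fn, tn = 4, 1, 1, 4

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"accuracy = {accuracy:.2f}")  # 8 / 10 = 0.80
```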
What does TPR stand for, which axis of the ROC curve represents it, and what is the formula for calculating it?
TPR = true positive rate;
Y-axis;
TPR = TP / (TP + FN) (same as the formula for “recall”)
What does FPR stand for, which axis of the ROC curve represents it, and what is the formula for calculating it?
FPR = false positive rate;
X-axis;
FPR = FP / (FP + TN)
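A short sketch computing both coordinates of a single point on the ROC curve, again with the illustrative counts used above:

```python
# TPR (Y-axis of the ROC curve) and FPR (X-axis), from illustrative counts.
tp, fp, fn, tn = 4, 1, 1, 4

tpr = tp / (tp + fn)  # identical to recall
fpr = fp / (fp + tn)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```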
What is the definition of the “F1 Score”?
What is the formula for the F1 Score?
The F1 Score is the harmonic mean of precision and recall.
F1 Score = (2 x P x R) / (P + R)
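A short sketch of the F1 Score with the same illustrative counts; because it is a harmonic mean, the F1 Score is pulled down when precision and recall diverge:

```python
# F1 Score: harmonic mean of precision and recall (illustrative counts).
tp, fp, fn = 4, 1, 1

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = (2 * precision * recall) / (precision + recall)
print(f"F1 = {f1:.2f}")
```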