ML | Model evaluation | Basics | Priority Flashcards
Write a confusion matrix with labels.
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p194
A 2×2 table with the actual class on the rows and the predicted class on the columns: true positives (TP) and false negatives (FN) in the positive-class row, false positives (FP) and true negatives (TN) in the negative-class row. (See source material.)
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p194
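Note: a minimal sketch (not from the book) of building a labeled confusion matrix with scikit-learn; y_true and y_pred are made-up toy arrays. Passing labels=[1, 0] puts the positive class first so the layout matches the TP/FN/FP/TN convention above.
# Hypothetical toy labels: 1 = positive class, 0 = negative class.
from sklearn.metrics import confusion_matrix
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
# labels=[1, 0] gives the layout [[TP, FN], [FP, TN]]
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)  # [[3 1], [1 3]] for these toy arrays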
Equation for the error of a binary classification model. The numerator includes which cells in a confusion matrix?
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p195
err = (fp + fn) / (tp + fp + tn + fn)
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p195
Equation for the accuracy of a binary classification model. The numerator includes which cells in a confusion matrix?
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p195
acc = (tp + tn) / (tp + fp + tn + fn)
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p195
Equation for the true positive rate of a binary classification model. The denominator includes which cells in a confusion matrix?
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p195
tpr = tp / (tp + fn)
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p195
Equation for the false positive rate of a binary classification model. The denominator includes which cells in a confusion matrix?
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p195
fpr = fp / (fp + tn)
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p195
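Note: a small worked sketch (made-up cell counts, not from the book) showing how the four quantities above (err, acc, TPR, FPR) follow from the confusion-matrix cells.
# Hypothetical confusion-matrix cell counts.
tp, fn, fp, tn = 40, 10, 5, 45
err = (fp + fn) / (tp + fp + tn + fn)   # 0.15
acc = (tp + tn) / (tp + fp + tn + fn)   # 0.85 (= 1 - err)
tpr = tp / (tp + fn)                    # 0.8  (recall / sensitivity)
fpr = fp / (fp + tn)                    # 0.1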
Equation for precision. Optimizing precision comes at the cost of which cell in a confusion matrix?
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p196
precision = tp / (tp + fp); fn
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p196
Equation for recall. Optimizing recall comes at the cost of which cell in a confusion matrix?
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p196
recall = tp / (tp + fn); fp
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p196
Equation for F1 in terms of precision and recall. What in a confusion matrix does F-score not take into account?
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p196
f1 = 2 * (p * r) / (p + r)
TN
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p196
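Note: a minimal sketch (not from the book) computing precision, recall, and F1 with scikit-learn, reusing the toy arrays from the confusion-matrix sketch above.
from sklearn.metrics import precision_score, recall_score, f1_score
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
pre = precision_score(y_true, y_pred)  # tp / (tp + fp) = 3 / 4
rec = recall_score(y_true, y_pred)     # tp / (tp + fn) = 3 / 4
f1 = f1_score(y_true, y_pred)          # 2 * pre * rec / (pre + rec) = 0.75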
Explain how a ROC curve works.
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p198
[Machine Learning with PyTorch and Scikit-Learn] Receiver operating characteristic (ROC) graphs are useful tools to select models for classification based on their performance with respect to the FPR and TPR, which are computed by shifting the decision threshold of the classifier. The diagonal of a ROC graph can be interpreted as random guessing, and classification models that fall below the diagonal are considered as worse than random guessing. A perfect classifier would fall into the top-left corner of the graph with a TPR of 1 and an FPR of 0. Based on the ROC curve, we can then compute the so-called ROC area under the curve (ROC AUC) to characterize the performance of a classification model.
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p198
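Note: a minimal sketch (not from the book) of tracing a ROC curve and computing ROC AUC with scikit-learn; the predicted probabilities are made up for illustration.
from sklearn.metrics import roc_curve, roc_auc_score
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
# Hypothetical predicted probabilities for the positive class.
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]
# fpr/tpr are evaluated at every distinct decision threshold in y_score.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)  # area under the (fpr, tpr) curve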
How does accuracy typically relate to ROC AUC? In what way is ROC AUC typically thought to be better?
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p200
They are typically similar; ROC AUC is generally thought to account better for class imbalance.
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p200
Equations for micro-average precision and macro-average precision.
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p200
p_micro = (tp_1 + … + tp_k) / (tp_1 + … + tp_k + fp_1 + … + fp_k)
p_macro = (p_1 + … + p_k) / k
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p200
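Note: a minimal multiclass sketch (not from the book) of the micro- vs. macro-averaging distinction via scikit-learn's average parameter; the 3-class labels are made up.
from sklearn.metrics import precision_score
# Hypothetical 3-class labels (k = 3).
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0, 2, 2]
# Micro: pool tp_i and fp_i over all classes, then take one ratio.
p_micro = precision_score(y_true, y_pred, average='micro')
# Macro: compute per-class precision p_i, then take the unweighted mean.
p_macro = precision_score(y_true, y_pred, average='macro')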
What are some ways to deal with class imbalance?
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p202-203
Assign a larger penalty to wrong predictions on the minority class (e.g., class_weight='balanced'); upsample the minority class (scikit-learn's resample function); downsample the majority class (again with resample, simply swapping the roles of the class 1 and class 0 labels); generate synthetic training examples (e.g., the Synthetic Minority Over-sampling Technique (SMOTE) via imbalanced-learn).
Machine Learning with PyTorch and Scikit-Learn Chapter 6 p202-203
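Note: a minimal sketch (hypothetical arrays X and y with a minority class 1, not from the book) of the first two options above: a class-weighted classifier and upsampling the minority class with resample.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample
# Hypothetical imbalanced data: 90 negatives, 10 positives.
X = np.random.randn(100, 3)
y = np.array([0] * 90 + [1] * 10)
# Option 1: larger penalty on minority-class mistakes.
clf = LogisticRegression(class_weight='balanced').fit(X, y)
# Option 2: upsample the minority class to the majority-class size.
X_up, y_up = resample(X[y == 1], y[y == 1], replace=True,
                      n_samples=int((y == 0).sum()), random_state=123)
X_bal = np.vstack((X[y == 0], X_up))
y_bal = np.hstack((y[y == 0], y_up))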
Can you cite some examples where a false positive is more important than a false negative? (I.e., we want to minimize FPs rather than FNs.)
Nouri 800
Machine Learning with PyTorch and Scikit-Learn Data analysis q8 p23; Chapter 6 p196
Optimizing for high precision decreases FPs at the cost of increasing FNs (i.e., missed detections). Chemotherapy example: we want to avoid treating a patient who does not have a tumor, so we want to decrease FPs (actual tumor = 0, predicted tumor = 1). (See source material.)
Nouri 800
Machine Learning with PyTorch and Scikit-Learn Data analysis q8 p23; Chapter 6 p196
Can you cite some examples where a false negative is more important than a false positive? (I.e., we want to minimize FNs rather than FPs.)
Nouri 800
Machine Learning with PyTorch and Scikit-Learn Data analysis q9; Chapter 6 p196
Optimizing for high recall decreases FNs (i.e., reduces missed detections) at the cost of increasing FPs. Example: we don't want to let a criminal go free (actual crime = 1, predicted crime = 0). Example: we don't want to miss fraud (actual fraud = 1, predicted fraud = 0).
Nouri 800
Machine Learning with PyTorch and Scikit-Learn Data analysis q9; Chapter 6 p196
Can you cite some examples where both false positives and false negatives are equally important?
Example: idea detection. (See source material.)