ML | Model evaluation | Basics | Priority Flashcards

1
Q

Write a confusion matrix with labels.

A

(See source material.)
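
As a hedged sketch of what the source likely shows: the usual 2x2 layout, checked with scikit-learn's confusion_matrix (the labels below are made up):

```python
# The usual layout (rows = actual class, columns = predicted class):
#
#                  predicted 0   predicted 1
#     actual 0         tn            fp
#     actual 1         fn            tp
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]  # made-up labels
y_pred = [0, 1, 0, 1, 1]  # made-up predictions
# For binary labels {0, 1}, ravel() returns tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 1 1 1 2
```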

2
Q

Equation for the error of a binary classification model. The numerator includes which cells in a confusion matrix?

A

err = (fp + fn) / (tp + fp + tn + fn)

3
Q

Equation for the accuracy of a binary classification model. The numerator includes which cells in a confusion matrix?

A

acc = (tp + tn) / (tp + fp + tn + fn)
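
A quick worked check on made-up counts, showing that the error and accuracy formulas from cards 2 and 3 are complements (acc = 1 - err):

```python
# Worked check on made-up confusion-matrix counts:
tp, fp, tn, fn = 40, 10, 45, 5

err = (fp + fn) / (tp + fp + tn + fn)  # (10 + 5) / 100 = 0.15
acc = (tp + tn) / (tp + fp + tn + fn)  # (40 + 45) / 100 = 0.85

assert abs((err + acc) - 1.0) < 1e-12  # acc = 1 - err
```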

4
Q

Equation for the true positive rate of a binary classification model. The denominator includes which cells in a confusion matrix?

A

tpr = tp / (tp + fn)

5
Q

Equation for the false positive rate of a binary classification model. The denominator includes which cells in a confusion matrix?

A

fpr = fp / (fp + tn)

6
Q

Equation for precision. Optimizing precision comes at the cost of which cell in a confusion matrix?

A

precision = tp / (tp + fp); optimizing it comes at the cost of fn (more missed detections)

7
Q

Equation for recall. Optimizing recall comes at the cost of which cell in a confusion matrix?

A

recall = tp / (tp + fn); optimizing it comes at the cost of fp (more false alarms)

8
Q

Equation for F1 in terms of precision and recall. What in a confusion matrix does F-score not take into account?

A

f1 = 2 * (p * r) / (p + r)
tn (the F-score ignores true negatives)
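
A small worked example (made-up counts) making the second point concrete: tn never enters the computation, so f1 is blind to true negatives:

```python
# Made-up counts; note that tn is never used:
tp, fp, fn = 30, 10, 20

p = tp / (tp + fp)          # precision = 0.75
r = tp / (tp + fn)          # recall    = 0.60
f1 = 2 * (p * r) / (p + r)  # harmonic mean ~ 0.667

print(p, r, f1)  # changing tn would not affect any of these
```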

9
Q

Explain how a ROC curve works.

A

[Machine Learning with PyTorch and Scikit-Learn] Receiver operating characteristic (ROC) graphs are useful tools to select models for classification based on their performance with respect to the FPR and TPR, which are computed by shifting the decision threshold of the classifier. The diagonal of a ROC graph can be interpreted as random guessing, and classification models that fall below the diagonal are considered as worse than random guessing. A perfect classifier would fall into the top-left corner of the graph with a TPR of 1 and an FPR of 0. Based on the ROC curve, we can then compute the so-called ROC area under the curve (ROC AUC) to characterize the performance of a classification model.
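
A minimal sketch of this with scikit-learn's roc_curve and roc_auc_score (the labels and scores below are made up):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1]           # made-up labels
y_score = [0.1, 0.4, 0.35, 0.8]  # made-up classifier scores for class 1

# Each decision threshold yields one (fpr, tpr) point; sweeping the
# threshold traces out the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))  # 0.75 on this toy data
```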

10
Q

How does accuracy typically relate to ROC AUC? In what way is AUC typically considered better?

A

They are typically similar. AUC is generally thought to better account for class imbalance.

11
Q

Equations for micro-average precision and macro-average precision.

A

p_micro = (tp_1 + … + tp_k) / (tp_1 + … + tp_k + fp_1 + … + fp_k)
p_macro = (p_1 + … + p_k) / k
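
A hedged sketch contrasting the two averages with scikit-learn's precision_score (the 3-class labels are made up):

```python
from sklearn.metrics import precision_score

y_true = [0, 0, 1, 1, 2, 2]  # made-up 3-class labels
y_pred = [0, 1, 1, 1, 2, 0]

# micro: pool tp_i and fp_i over all k classes, then divide once
# macro: compute each class's precision, then take the unweighted mean
print(precision_score(y_true, y_pred, average='micro'))  # pooled
print(precision_score(y_true, y_pred, average='macro'))  # mean of per-class
```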

12
Q

What are some ways to deal with class imbalance?

A

Assign a larger penalty to wrong predictions on the minority class (e.g., class_weight='balanced'); upsample the minority class (e.g., scikit-learn's resample function); downsample the majority class (with resample, simply swap the roles of class 1 and class 0); or generate synthetic training examples (e.g., Synthetic Minority Over-sampling Technique (SMOTE) via imbalanced-learn).
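
A sketch of these remedies on made-up toy data (the SMOTE import assumes the imbalanced-learn package is installed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample
from imblearn.over_sampling import SMOTE  # requires imbalanced-learn

X = np.random.RandomState(0).randn(100, 2)  # made-up features
y = np.array([0] * 90 + [1] * 10)           # 9:1 class imbalance

# 1. Larger penalty on minority-class mistakes:
clf = LogisticRegression(class_weight='balanced').fit(X, y)

# 2. Upsample the minority class to match the majority:
X_up, y_up = resample(X[y == 1], y[y == 1],
                      replace=True, n_samples=90, random_state=0)

# 3. Generate synthetic minority examples:
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
```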

13
Q

Can you cite some examples where a false positive is more important than a false negative? (I.e., we want to minimize fps rather than fns.)

Nouri 800
Machine Learning with PyTorch and Scikit-Learn Data analysis q8 p23; Chapter 6 p196

A

Optimizing for high precision will decrease fps at the cost of increasing fns (i.e., missed detections). Chemotherapy example: a false positive (actual tumor = 0, predicted tumor = 1) subjects a healthy patient to unnecessary treatment, so we want to decrease fps. (See source material.)

14
Q

Can you cite some examples where a false negative is more important than a false positive? (I.e., we want to minimize fns rather than fps.)

A

Optimizing for high recall will decrease fns (i.e., reduce missed detections) at the cost of increasing fps. Example: don't want to let a criminal go free (actual crime = 1, predicted crime = 0). Example: don't want to miss fraud (actual fraud = 1, predicted fraud = 0).

15
Q

Can you cite some examples where both false positives and false negatives are equally important?

A

Example: idea detection. (See source material.)

16
Q

What are the various steps involved in an analytics project?

Nouri 800 ML q21 p43

A

(See source material.)

17
Q

What’s the difference between Type I and Type II error?

A

Type I error = fp (rejecting a true null hypothesis); Type II error = fn (failing to reject a false null hypothesis). (See source material.)

18
Q

How to do error analysis in a machine learning pipeline?

A

Automated scoring: 1. Agreement – does it meet a threshold? 2. Confusion matrix + bar charts – find the biggest off-diagonals, inspect examples, check whether any are mislabeled. 3. Fairness. 4. Ablation. (See source material.)
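
A minimal sketch of step 2, ranking the largest off-diagonal confusion-matrix cells to surface the most common error types (the labels below are made up):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2, 2, 2, 2]  # made-up labels
y_pred = [0, 2, 1, 0, 2, 2, 0, 0]

cm = confusion_matrix(y_true, y_pred)
off = cm.copy()
np.fill_diagonal(off, 0)  # zero the correct predictions, keep the errors

# Walk the off-diagonal cells from largest to smallest error count:
for flat in np.argsort(off, axis=None)[::-1][:3]:
    actual, predicted = np.unravel_index(flat, off.shape)
    print(f"actual {actual} -> predicted {predicted}: {off[actual, predicted]}")
```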