Model Evaluation Flashcards
Confusion Matrix
Used for assessing model performance
- Can become large for multi-classification
- For binary (e.g. Logistic regression):
1. True Positive: Correctly classified as positive (good)
2. False Positive: Incorrectly classified as positive (bad)
3. True Negative: Correctly classified as negative (good)
4. False Negative: Incorrectly classified as negative (bad)
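A minimal sketch of pulling the four counts out of a binary confusion matrix with scikit-learn (the labels and predictions below are made up for illustration):

```python
# Minimal sketch: binary confusion matrix with scikit-learn (toy labels).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes (1 = positive)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

# For binary labels {0, 1}, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")  # TP=3, FP=1, TN=3, FN=1
```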
Sensitivity
Also known as true positive rate, or recall:
- Number of correct positives out of the actual positive results
- I.e. the % of actual positive cases that the model correctly classified as positive
- TPs / (TPs + FNs)
- Closer to 1 is better
Specificity
- Also known as true negative rate
- The number of correct negatives out of the actual negative results
- TNs / (TNs + FPs)
- Closer to 1 is better
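A minimal sketch of both rates, computed from the toy confusion-matrix counts in the earlier example:

```python
# Sensitivity and specificity from confusion-matrix counts (toy values).
tp, fp, tn, fn = 3, 1, 3, 1

sensitivity = tp / (tp + fn)   # true positive rate / recall
specificity = tn / (tn + fp)   # true negative rate

print(f"Sensitivity = {sensitivity:.2f}")  # 0.75
print(f"Specificity = {specificity:.2f}")  # 0.75
```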
When is sensitivity more important?
When False Positives are acceptable but False Negatives are not. E.g. Detecting fraudulent transactions, medical diagnosis
When is specificity more important?
When False Negatives are acceptable but False Positives are not. E.g. model that ensures images are appropriate for children.
Accuracy
The proportion of all predictions that were correctly identified. I.e. how right is the model?
- (TPs + TNs) / Total
Precision
The proportion of predicted positives that were actually positive
- TPs / (TPs + FPs)
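A minimal sketch of accuracy and precision from the same toy counts:

```python
# Accuracy and precision from confusion-matrix counts (toy values).
tp, fp, tn, fn = 3, 1, 3, 1

accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all predictions that were right
precision = tp / (tp + fp)                   # share of predicted positives that really are positive

print(f"Accuracy  = {accuracy:.2f}")   # 0.75
print(f"Precision = {precision:.2f}")  # 0.75
```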
ROC / AUC
- In binary classification, the decision threshold determines which side a value is classified to. We can adjust that threshold to trade off sensitivity against specificity
- As we adjust the line up and down between 0 and 1, we achieve different confusion matrices with different results
- We plot the TP rate against the FP rate for each of those confusion matrices to obtain the ROC curve.
- The “knee points” in the curve will give us the optimum points for sensitivity and specificity
- When comparing models, we plot a ROC for each, compare the AUC and choose the model with the largest
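A minimal sketch of sweeping the threshold and comparing by AUC with scikit-learn (the scores below are made-up predicted probabilities):

```python
# ROC curve and AUC with scikit-learn (toy scores).
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.5]   # predicted probabilities

# Each threshold gives a different confusion matrix; roc_curve sweeps them all
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
auc = roc_auc_score(y_true, y_scores)
print(f"AUC = {auc:.2f}")   # larger AUC = model separates the classes better
```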
Gini Impurity
- A metric used when building decision trees to measure how mixed the classes are after a split
- Goal is to assess the impact of the question itself, and whether it should be at the root node or not
- We compare each of the features by weighted Gini Impurity and choose the one with the lowest value, which identifies the feature that best separates the samples into their classes
Gini Impurity (calculation)
1 - (probability of class 1)^2 - (probability of class 2)^2
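A minimal sketch of the calculation, generalised to any number of classes (the node counts are made up):

```python
# Gini impurity of a node: 1 minus the sum of squared class probabilities.
def gini_impurity(class_counts):
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

# e.g. a node holding 3 samples of class 1 and 1 sample of class 2
print(gini_impurity([3, 1]))   # 1 - 0.75^2 - 0.25^2 = 0.375
```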
F1 Score
- Combination of Recall and Precision
- 2 / ((1/Recall) + (1/Precision))
- Can be good for separating models that have very similar accuracy scores
- Takes into account both the FPs and FNs
- Generally a better indicator of model quality than accuracy when you have an uneven class distribution
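A minimal sketch of the harmonic-mean formula, reusing the precision and recall values from the toy counts above:

```python
# F1 score as the harmonic mean of precision and recall (toy values).
precision, recall = 0.75, 0.75

f1 = 2 / ((1 / recall) + (1 / precision))
# equivalently: 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2f}")   # 0.75
```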
Linear regression metrics: SSE, Rsquared, Adj RSquared
- SSE: the Sum of Squared Errors between the model's predictions and the actual values
- R squared: 1 - (SSE / SST), where SST is the total sum of squares (the variance of the data around its mean)
- Value between 0 and 1
- Extent to which the variance in the data is explained by the model
- Closer to 1 better
- Adding more variables leads to higher R squared - doesn’t account for overfitting
- Adjusted R squared
- 1 - (1 - Rsquared) * ((no of data points - 1) / (no of data points - no of variables - 1))
- Takes into account the effect of adding more variables
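A minimal sketch of both metrics from residuals (the observed and predicted values are made up, with one predictor variable assumed):

```python
# R squared and adjusted R squared for a toy single-variable regression.
import numpy as np

y      = np.array([3.0, 5.0, 7.0, 9.0, 11.0])   # observed values
y_pred = np.array([2.8, 5.1, 7.2, 8.7, 11.3])   # model predictions
n, k = len(y), 1                                 # data points, predictor variables

sse = np.sum((y - y_pred) ** 2)        # sum of squared errors
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares (variance around the mean)
r_squared = 1 - sse / sst
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
```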
Linear regression metrics: Confidence intervals
- Normal distribution: Majority of density is contained within +/- 3 std devs of the mean
- Central Limit Theorem: no matter what the original distribution of X is, the sample mean of X (i.e. X bar) will follow a normal distribution (helps to give us Confidence Intervals)
- Confidence intervals quantify margin-of-error between sample metric and true metric due to sampling randomness
- 90% CI: if we repeatedly drew random samples from the population and built a confidence interval from each, about 90 out of every 100 of those intervals would contain the true value
- We can have CIs for proportions or means
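A minimal sketch of a 90% confidence interval for a sample mean, leaning on the Central Limit Theorem (the sample values are made up, and the 1.645 z-value assumes a reasonably large sample; for small samples a t-value would be the safer choice):

```python
# 90% CI for a mean: point estimate +/- z * standard error.
import numpy as np

sample = np.array([4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 4.7, 5.3])
n = len(sample)

mean = sample.mean()
std_err = sample.std(ddof=1) / np.sqrt(n)   # standard error of the mean
z = 1.645                                    # two-sided z-value for 90% confidence

lower, upper = mean - z * std_err, mean + z * std_err
print(f"90% CI for the mean: ({lower:.2f}, {upper:.2f})")
```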