Statistics Flashcards
Accuracy
(TP + TN) / (TP + TN + FP + FN)
Number of correct predictions /
Number of all predictions
Good general measure of model performance with BALANCED data sets
Why is accuracy alone not enough to evaluate classification models?
Consider benign versus malignant tumors. A typical sample of random people would be more than 90% benign (0) because benign tumors are much more common than malignant ones. A model that predicts 0 for every example, without making any calculation at all, still scores over 90% accuracy, which is useless. We need other measures: precision and recall
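A minimal sketch (plain Python; the 95/5 class split is made up for illustration) showing how an all-negative "model" earns high accuracy on imbalanced data while catching nothing:

# Illustrative sketch: an all-negative "model" on an imbalanced data set.
# The 95/5 benign/malignant split is assumed for demonstration.
y_true = [0] * 95 + [1] * 5          # 95 benign (0), 5 malignant (1)
y_pred = [0] * 100                   # predicts benign for everything

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(f"Accuracy: {accuracy:.0%}")   # 95% -- yet zero malignant cases caught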
Precision (PPV)
TP / (TP + FP)
Correct positives /
Positive tests
From all positive PREDICTED, how many are ACTUAL positive?
Focus on precision when you want to be confident in the YESes the model gives you, i.e. that whatever your model pings is the real deal. It will miss some YESes, but what it does ping as YES you can be confident in.
Example: applicant screening. Some viable applicants will slip through, but when the model pings a viable applicant, you can be confident about it
Recall (Sensitivity/TPR)
TP / (TP + FN)
Correct positives /
Actual positives
From all ACTUAL positives, how many did we PREDICT correctly?
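A minimal sketch computing precision and recall straight from confusion-matrix counts (the counts are invented for illustration):

# Illustrative confusion-matrix counts (made up for demonstration).
TP, FP, FN = 40, 10, 20

precision = TP / (TP + FP)   # of all PREDICTED positives, how many are real
recall    = TP / (TP + FN)   # of all ACTUAL positives, how many we caught

print(f"Precision: {precision:.2f}")  # 0.80
print(f"Recall:    {recall:.2f}")     # 0.67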
Increasing precision _______ recall
Decreases
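A sketch of why: raising the decision threshold makes the model say YES less often, so precision goes up and recall goes down (the scores and labels below are assumed for illustration):

# Sketch of the precision/recall trade-off at two decision thresholds.
# Model scores and ground-truth labels are invented for demonstration.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1,    1,   1,   0,   1,   0,   0,   0]   # ground truth

for threshold in (0.5, 0.75):
    pred = [int(s >= threshold) for s in scores]
    tp = sum(p and t for p, t in zip(pred, labels))
    fp = sum(p and not t for p, t in zip(pred, labels))
    fn = sum((not p) and t for p, t in zip(pred, labels))
    print(f"threshold={threshold}: "
          f"precision={tp/(tp+fp):.2f}, recall={tp/(tp+fn):.2f}")
# threshold=0.5:  precision=0.80, recall=1.00
# threshold=0.75: precision=1.00, recall=0.75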
F1 score
(2 * precision * recall) /
(precision + recall)
The harmonic mean of precision and recall, combining the two numbers into a single score
Use when working with IMBALANCED data sets
Example: classifying tweets by sentiment (positive, negative, neutral) where the data set was imbalanced, with way more neutral tweets. The F1 score describes overall model performance while caring equally about all three classes
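A minimal sketch of a macro-averaged F1 for the three-class sentiment example; the per-class counts are invented for illustration, and each class's F1 counts equally regardless of class size:

# Sketch: harmonic-mean F1 per class, then a macro average that weighs
# the three sentiment classes equally. Counts are assumed for demonstration.
counts = {                      # class: (TP, FP, FN)
    "positive": (30, 10, 15),
    "negative": (25, 5, 20),
    "neutral":  (200, 40, 10),  # the over-represented class
}

f1_scores = {}
for cls, (tp, fp, fn) in counts.items():
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1_scores[cls] = 2 * precision * recall / (precision + recall)
    print(f"{cls}: F1 = {f1_scores[cls]:.2f}")

macro_f1 = sum(f1_scores.values()) / len(f1_scores)
print(f"Macro F1: {macro_f1:.2f}")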
Sensitivity (Recall / TPR)
TP / (TP + FN)
1 - FNR
Correct positives /
Actual positives
How good is the model at catching YES’s?
A sensitive test helps rule out a disease when the test is negative.
Highly SeNsitive = SNout = rule out
Use sens/spec when every instance of what you're looking for is too precious to let slip by (illnesses, fraud, terrorist attacks). A sensitivity-focused model will catch ALL REAL terrorist attacks, ALL TRUE cases of heart disease, etc.
CAVEAT: there will be some false positives: innocent travelers flagged as terrorists, some healthy people labeled as diseased
Specificity (TNR)
TN / (TN + FP)
1 - FPR
Correct negatives /
Actual negatives
How good is the model at catching NO’s?
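A minimal sketch computing sensitivity and specificity side by side (the counts are assumed for illustration):

# Sketch: sensitivity and specificity from confusion-matrix counts
# (counts invented for demonstration).
TP, FN = 90, 10      # 100 actual positives
TN, FP = 800, 200    # 1000 actual negatives

sensitivity = TP / (TP + FN)   # how good at catching YES's -> 0.90
specificity = TN / (TN + FP)   # how good at catching NO's  -> 0.80
print(f"Sensitivity (TPR): {sensitivity:.2f}")
print(f"Specificity (TNR): {specificity:.2f}")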
Prevalence
The proportion of a defined population that has the condition at a single point in time. Expressed as a decimal or percentage
Positive predictive value (PPV) (Precision)
TP / (TP + FP)
Actual positive /
Tested positive
The probability that, following a positive test result, the individual TRULY has the disease. Also thought of as the clinical relevance of a test.
Related to prevalence, whereas sensitivity and specificity are independent of prevalence.
As prevalence decreases, PPV decreases because there will be more false positives for every true positive
These enable you to rule in/out conditions but not definitively diagnose a condition
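Via Bayes' theorem, PPV can be written in terms of sensitivity, specificity, and prevalence. A sketch with an assumed 90%-sensitive, 95%-specific test, showing PPV collapsing as prevalence falls:

# Sketch: PPV as a function of prevalence (Bayes' theorem). The test's
# 90% sensitivity / 95% specificity is an assumption chosen for illustration.
sens, spec = 0.90, 0.95

for prevalence in (0.10, 0.01, 0.001):
    tp_rate = sens * prevalence                 # true positives per person tested
    fp_rate = (1 - spec) * (1 - prevalence)     # false positives per person tested
    ppv = tp_rate / (tp_rate + fp_rate)
    print(f"prevalence={prevalence:>5}: PPV={ppv:.2f}")
# prevalence  0.1: PPV=0.67
# prevalence 0.01: PPV=0.15
# prevalence 0.001: PPV=0.02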
Negative predictive value (NPV)
TN / (TN + FN)
Actual Negative /
Tested Negative
The probability that, following a NEGATIVE test result, the individual TRULY does NOT have the disease. Also thought of as the clinical relevance of a test.
Related to prevalence, whereas sensitivity and specificity are independent of prevalence.
As prevalence decreases, NPV increases because there will be more true negatives for every false negative
These enable you to rule in/out conditions but not definitively diagnose a condition
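A companion sketch for NPV with the same assumed 90%-sensitive, 95%-specific test, showing NPV climbing toward 1 as prevalence falls:

# Sketch: NPV as a function of prevalence (same assumed test as above).
sens, spec = 0.90, 0.95

for prevalence in (0.10, 0.01, 0.001):
    tn_rate = spec * (1 - prevalence)          # true negatives per person tested
    fn_rate = (1 - sens) * prevalence          # false negatives per person tested
    npv = tn_rate / (tn_rate + fn_rate)
    print(f"prevalence={prevalence:>5}: NPV={npv:.4f}")
# NPV rises from ~0.988 at 10% prevalence to ~0.9999 at 0.1%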
Type I error
False Positive
REJECTING the NULL when it is TRUE
Alpha level
(significance level)
Probability of REJECTING the NULL when it is TRUE (type I error)
Beta level
Probability that you’ll fail to reject the null when it’s false (type II error)
i.e. ACCEPT the NULL when it’s FALSE
Type II error
False Negative
ACCEPTING the NULL when it’s FALSE
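A simulation sketch tying alpha to the Type I error rate: when the null really is true, tests run at alpha = 0.05 should wrongly reject about 5% of the time (numpy and scipy assumed available):

# Sketch: Type I error rate in simulation. The null hypothesis is TRUE here
# (the mean really is 0), so rejections are false positives by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_trials = 0.05, 10_000

false_positives = 0
for _ in range(n_trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=30)   # null is true: mean = 0
    _, p_value = stats.ttest_1samp(sample, popmean=0.0)
    if p_value < alpha:                                # Type I error: reject a true null
        false_positives += 1

print(f"Observed Type I error rate: {false_positives / n_trials:.3f}")  # ~0.05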