Model Evaluation Flashcards
Accuracy
Measures the proportion of predictions the model classified correctly.
Accuracy = (TP+TN)/(TP+TN+FN+FP)
Sensitivity / Recall
True positive rate (TPR)
TPR = TP/(TP+FN)
Looking at all the actual positives, how many of them were correctly classified as positive?
Specificity
True negative rate (TNR)
TNR = TN/(TN+FP)
Looking at all the actual negatives, how many of them were correctly classified as negative?
Precision
Precision = TP/(TP+FP)
Looking at all the predicted positive cases, how many of them were correctly classified as positive?
“If I predict something negative and it’s wrong, that’s fine, but let the ones predicted positive be good!”
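The four confusion-matrix metrics above can be sketched in a few lines of Python; the counts are made-up illustration values:

```python
# Confusion-matrix metrics from raw counts.
# TP, TN, FP, FN are made-up illustration values.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy    = (TP + TN) / (TP + TN + FN + FP)  # 0.85
sensitivity = TP / (TP + FN)                   # recall / TPR = 0.8
specificity = TN / (TN + FP)                   # TNR = 0.9
precision   = TP / (TP + FP)                   # ~0.889

print(accuracy, sensitivity, specificity, precision)
```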
ROC Curve
The confusion matrix is the basis of the Receiver Operating Characteristic (ROC) curve.
The ROC curve allows us to compare different classification models. Each point on the ROC curve corresponds to a different cut-off threshold of the model.
X axis - false positive rate (false positives amongst all actual negatives) = 1 - specificity → indicator of how well the model classifies negative cases.
Y axis - true positive rate (true positives amongst all actual positives) = sensitivity → indicator of how well the model classifies positive cases.
Depending on what your priorities are, you might choose a model that allows more or less of either.
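A minimal sketch of how the ROC points arise, assuming made-up predicted scores and labels: each candidate threshold is tried in turn, and the (FPR, TPR) pair it produces is one point on the curve.

```python
# Build ROC points by sweeping the decision threshold over the scores.
# Scores and labels below are made-up illustration values.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    1,   0,   0  ]  # 1 = actual positive

P = sum(labels)           # number of actual positives
N = len(labels) - P       # number of actual negatives

def roc_point(threshold):
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return fp / N, tp / P  # (FPR, TPR) = (1 - specificity, sensitivity)

# One ROC point per candidate threshold, from strictest to loosest:
points = [roc_point(t) for t in sorted(set(scores), reverse=True)]
print(points)  # the loosest threshold always gives (1.0, 1.0)
```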
AUC
Sometimes the curves on the chart do not clearly show which model performs better. The AUC (Area Under the Curve) metric can help.
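One way to compute AUC is the trapezoidal rule over the ROC points; the (FPR, TPR) points below are made-up illustration values (a real ROC curve always runs from (0, 0) to (1, 1)).

```python
# AUC as the area under a piecewise-linear ROC curve (trapezoidal rule).
# The (FPR, TPR) points are made-up illustration values.
points = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.75), (0.5, 1.0), (1.0, 1.0)]

auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(auc)  # 0.875
```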
Gain Chart
We have a population of customers; on average, 10% of them end up buying the product. We want to select a subgroup of the population to send marketing emails to, 60% of whom will buy the product. Hence we want our model to predict the people who will buy the product, i.e. we want a high true positive rate (sensitivity).
The gain chart shows how the model’s target metric (sensitivity) changes as the sample size grows, i.e. as we send more marketing emails.
The greater the distance between the gain curve and the baseline (random selection), the better the model.
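A minimal sketch of a cumulative gain computation, assuming made-up scores and labels: customers are ranked by model score, and for each fraction of the population contacted we record the share of all buyers reached.

```python
# Cumulative gain: rank customers by model score, then track what share of
# all buyers is reached as more of the population is contacted.
# Scores and labels are made-up illustration values (1 = buyer).
scored = [(0.95, 1), (0.9, 1), (0.8, 0), (0.7, 1), (0.6, 0),
          (0.5, 1), (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0)]
scored.sort(key=lambda t: t[0], reverse=True)

total_buyers = sum(y for _, y in scored)
captured = 0
for i, (_, y) in enumerate(scored, start=1):
    captured += y
    # fraction of population contacted -> fraction of buyers reached
    print(f"{i / len(scored):.0%} contacted -> {captured / total_buyers:.0%} of buyers")
```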
Lift Chart
The lift chart shows how many times the response rate in the sample selected by the model exceeds that of a random sample, as the percentage of the population contacted increases.
Lift is calculated as the ratio between the results in the target obtained with and without the model.
Example: if we contact only 10% of the population and the customers contacted are chosen with the model, the response rate is 35%; with random sampling it is 10%. The lift is therefore 35% / 10% = 3.5.
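The example above translates directly into the lift formula:

```python
# Lift from the example: response rate with the model vs random sampling
# when contacting 10% of the population.
model_response_rate  = 0.35
random_response_rate = 0.10

lift = model_response_rate / random_response_rate  # 35% / 10% = 3.5
print(round(lift, 6))
```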
Kolmogorov-Smirnov graph
A measure of the degree of separation between the positive and negative distributions.
K-S value = 100: the population is divided into two completely separate groups, one containing all the positives, the other all the negatives.
K-S value = 0: the model is unable to differentiate between positives and negatives; it works like random selection.
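A minimal sketch of the K-S statistic, assuming made-up model scores for the two groups: it is the largest gap between the cumulative score distributions of actual positives and actual negatives (computed here on a 0-1 scale rather than 0-100).

```python
# K-S statistic: max gap between the cumulative score distributions of
# actual positives and actual negatives. Scores are made-up values in
# which the two groups happen to be perfectly separated.
pos_scores = [0.9, 0.8, 0.7, 0.6]
neg_scores = [0.5, 0.4, 0.3, 0.2]

def cdf(sample, x):
    """Empirical CDF: fraction of the sample with score <= x."""
    return sum(1 for s in sample if s <= x) / len(sample)

thresholds = sorted(pos_scores + neg_scores)
ks = max(abs(cdf(pos_scores, t) - cdf(neg_scores, t)) for t in thresholds)
print(ks)  # 1.0 -> complete separation (100 on a percentage scale)
```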
Akaike Information Criterion
Combines the goodness of the model with its complexity, measured by the number of independent variables.
AIC = -2\ln(\hat{L}) + 2(k+1)
where \hat{L} is the maximum of the likelihood function and k is the number of independent variables.
The lower the better!
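A worked example of the formula, with an assumed maximized log-likelihood and variable count:

```python
# AIC = -2*ln(L-hat) + 2*(k+1), with made-up illustration values.
log_likelihood = -120.0  # ln(L-hat), the maximized log-likelihood (assumed)
k = 3                    # number of independent variables (assumed)

aic = -2 * log_likelihood + 2 * (k + 1)
print(aic)  # 248.0 -- lower is better when comparing models
```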
McFadden’s R2
R^2_{McF} = 1 - LL_{full model} / LL_{intercept}
Compares the LL of the model with the LL of a model containing only the intercept. The closer to 1, the better. Values tend to be small; values around 0.2 to 0.4 can be considered satisfactory.
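A worked example, with assumed log-likelihood values:

```python
# McFadden's pseudo-R^2, with made-up log-likelihoods.
ll_full      = -80.0   # log-likelihood of the fitted model (assumed)
ll_intercept = -120.0  # log-likelihood of the intercept-only model (assumed)

r2_mcfadden = 1 - ll_full / ll_intercept  # ~0.333, a satisfactory value
print(r2_mcfadden)
```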
R2 of Cox and Snell
R^2_{CS} = 1 - \exp[-(2/n)(LL(B) - LL(0))]
Also takes the size of the data sample, n, into account.
Nagelkerke R2
R^2_{N} = R^2_{CS} / R^2_{MAX}
where
R^2_{MAX} = 1 - \exp[(2/n) \, LL(0)]
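Both pseudo-R2 formulas can be sketched together, with assumed log-likelihoods and sample size:

```python
import math

# Cox & Snell and Nagelkerke pseudo-R^2, with made-up illustration values.
n = 100            # sample size (assumed)
ll_model = -80.0   # LL(B), log-likelihood of the fitted model (assumed)
ll_null  = -120.0  # LL(0), log-likelihood of the null model (assumed)

r2_cs  = 1 - math.exp(-2 / n * (ll_model - ll_null))  # Cox & Snell
r2_max = 1 - math.exp(2 / n * ll_null)                # upper bound of r2_cs
r2_nagelkerke = r2_cs / r2_max                        # rescaled to reach 1

print(r2_cs, r2_nagelkerke)
```

Nagelkerke's correction rescales the Cox & Snell value, whose maximum is below 1, so that a perfect model can score exactly 1.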
Measures for Quantitative target model evaluation
MAE - Mean Absolute Error
MSE - Mean Squared Error
MPE - Mean Percentage Error
MAPE - Mean Absolute Percentage Error
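The four quantitative-target metrics can be sketched on made-up data:

```python
# MAE, MSE, MPE and MAPE for a quantitative target.
# Actual and predicted values are made-up illustration data.
actual    = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 360.0]
n = len(actual)

errors = [a - p for a, p in zip(actual, predicted)]

mae  = sum(abs(e) for e in errors) / n                       # mean absolute error
mse  = sum(e * e for e in errors) / n                        # mean squared error
mpe  = sum(e / a for e, a in zip(errors, actual)) / n * 100  # signed, in %
mape = sum(abs(e) / a for e, a in zip(errors, actual)) / n * 100  # unsigned, in %

print(mae, mse, mpe, mape)  # MPE can hide errors that cancel out; MAPE cannot
```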