Chapter 5 Flashcards
Prediction
Average Error, MAPE (Mean Absolute Percentage Error), RMSE (Root Mean Squared Error), Validation Data
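A minimal sketch of how these three measures are computed on validation data, using made-up actual/predicted values (NumPy assumed available):

```python
import numpy as np

# Hypothetical actual and predicted values from the validation data
actual = np.array([120.0, 95.0, 250.0, 80.0, 310.0])
predicted = np.array([130.0, 90.0, 240.0, 100.0, 290.0])

errors = actual - predicted

average_error = errors.mean()                  # signed; near 0 suggests unbiased predictions
mape = np.mean(np.abs(errors / actual)) * 100  # percentage scale, unit-free
rmse = np.sqrt(np.mean(errors ** 2))           # same units as the target variable

print(f"Average Error: {average_error:.2f}")
print(f"MAPE: {mape:.2f}%")
print(f"RMSE: {rmse:.2f}")
```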
Classification
Classification matrix, specificity, sensitivity
ROC (Receiver Operating Characteristic)
to assess performance at different cutoff values
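A short sketch of how the ROC curve traces sensitivity against 1 − specificity across cutoff values, using hypothetical classes and probabilities with scikit-learn (assumed available):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical actual classes (1 = important class) and predicted probabilities
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.4, 0.7, 0.6, 0.3, 0.55, 0.8, 0.2])

# Each point on the ROC curve corresponds to one cutoff value
fpr, tpr, cutoffs = roc_curve(y_true, y_prob)
print("cutoffs:          ", cutoffs)
print("sensitivity (TPR):", tpr)
print("1 - specificity:  ", fpr)
print("area under curve: ", roc_auc_score(y_true, y_prob))
```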
Detect overfitting
compare validation to training data:
some differences are expected; extreme differences may indicate overfitting
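A minimal sketch of the comparison, assuming scikit-learn; a deliberately unpruned tree on made-up data shows the training/validation gap that signals overfitting:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Made-up data; an unpruned tree is prone to overfitting
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + rng.normal(size=200)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=1)
model = DecisionTreeRegressor().fit(X_train, y_train)

rmse_train = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
rmse_valid = np.sqrt(mean_squared_error(y_valid, model.predict(X_valid)))

# Training RMSE near 0 with much larger validation RMSE indicates overfitting
print(f"training RMSE: {rmse_train:.2f}, validation RMSE: {rmse_valid:.2f}")
```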
Naïve rule
classify all records as belonging to the most prevalent class
benchmark: we hope to do better than that; using external predictor info should outperform the naïve rule
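A sketch of computing the naïve-rule benchmark on hypothetical validation classes:

```python
import numpy as np

# Hypothetical validation classes; the naive rule assigns everyone the most prevalent class
y_valid = np.array([0, 0, 1, 0, 0, 1, 0, 0, 0, 1])

values, counts = np.unique(y_valid, return_counts=True)
most_prevalent = values[counts.argmax()]

naive_accuracy = (y_valid == most_prevalent).mean()
print(f"naive rule predicts {most_prevalent}; accuracy = {naive_accuracy:.0%}")  # 70% here
```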
Exception to Naïve rule
when the goal is to identify high-value but rare outcomes, we may do well by doing worse than the naïve rule on overall error (see “lift” – later)
There are various performance measures that compare a model to the naïve rule.
For example, multiple R-squared measures how much the model's fit improves on the fit of the naïve rule.
For prediction, the equivalent of the naïve rule is using ȳ (the sample mean) as the prediction for every record.
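A sketch of R-squared as improvement over the ȳ benchmark, with made-up numbers:

```python
import numpy as np

# Hypothetical actual values and model predictions
y = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
y_pred = np.array([10.5, 11.0, 9.5, 14.0, 12.0])

sse_model = np.sum((y - y_pred) ** 2)    # model's squared error
sse_naive = np.sum((y - y.mean()) ** 2)  # squared error of predicting y-bar for every record

# Fraction of the naive benchmark's error that the model eliminates
r_squared = 1 - sse_model / sse_naive
print(f"R-squared: {r_squared:.3f}")
```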
Lift Chart for Predictive Error
Y axis is cumulative value of numeric target variable (e.g., revenue), instead of cumulative count of “responses”
X axis is cumulative number of cases, sorted left to right in order of predicted value
Benchmark is the average numeric value per record, i.e., not using a model (the naïve rule)
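A sketch of building such a chart from hypothetical revenue data (matplotlib assumed available):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical revenue per record, with the model's predicted values
actual = np.array([50.0, 200.0, 10.0, 120.0, 80.0, 300.0, 30.0, 150.0])
predicted = np.array([60.0, 180.0, 20.0, 100.0, 90.0, 280.0, 25.0, 140.0])

# Sort records by predicted value (highest first), then accumulate actual values
order = np.argsort(-predicted)
cumulative = np.cumsum(actual[order])
n_cases = np.arange(1, len(actual) + 1)

plt.plot(n_cases, cumulative, label="model")
# Benchmark: average value per record, i.e., not using the model
plt.plot(n_cases, n_cases * actual.mean(), linestyle="--", label="naive (average)")
plt.xlabel("cumulative number of cases")
plt.ylabel("cumulative revenue")
plt.legend()
plt.show()
```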
Misclassification error
Error = classifying a record as belonging to one class when it belongs to another class.
Error rate = percent of misclassified records out of the total records in the validation data
“High separation of records”
means that using predictor variables attains low error
“Low separation of records”
means that using predictor variables does not improve much on naïve rule
Confusion Matrix
rows = actual class, columns = predicted class:
             predicted C1      predicted C0
actual C1    true positive     false negative
actual C0    false positive    true negative
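A sketch of building the matrix (and the error rate / accuracy from the next cards) with scikit-learn, on hypothetical classes:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted classes (1 = C1, 0 = C0)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# labels=[1, 0] puts C1 first, matching the layout above
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)  # rows: actual C1, actual C0; columns: predicted C1, predicted C0

# Off-diagonal cells are the misclassified records
error_rate = (cm[0, 1] + cm[1, 0]) / cm.sum()
print(f"error rate: {error_rate:.1%}, accuracy: {1 - error_rate:.1%}")
```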
Accuracy
1 – err
Cutoff Table
the cutoff is 0.50, so every record with predicted probability above it should be classified as 1 and every record below as 0; any record whose actual class differs is counted as a misclassification
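A sketch of applying different cutoffs to hypothetical predicted probabilities:

```python
import numpy as np

# Hypothetical predicted probabilities of class 1, with the actual classes
prob = np.array([0.90, 0.65, 0.55, 0.45, 0.30, 0.80, 0.20, 0.51])
actual = np.array([1, 1, 0, 1, 0, 1, 0, 0])

for cutoff in (0.25, 0.50, 0.75):
    predicted = (prob >= cutoff).astype(int)  # at or above the cutoff -> class 1
    error_rate = (predicted != actual).mean()
    print(f"cutoff {cutoff:.2f}: error rate {error_rate:.1%}")
```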
When One Class is More Important
we are willing to tolerate greater overall error, in return for better identifying the important class for further attention
Sensitivity
The ability to detect important class (C1) members correctly, i.e., the % of C1 members classified correctly
Specificity
ability to rule out C0 members correctly, i.e., the % of C0 members classified correctly
False positive
% of predicted “C1’s” that were not “C1’s”
a “false alarm”: indicates a given condition exists when it does not
False negative
% of predicted “C0’s” that were not “C0’s”
indicates a given condition does not exist, when it really does
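A sketch computing all four measures from hypothetical classes, using the definitions on these cards (false positive/negative rates taken among predicted C1's and C0's):

```python
import numpy as np

# Hypothetical actual and predicted classes (1 = important class C1)
actual = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
predicted = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

tp = np.sum((actual == 1) & (predicted == 1))
tn = np.sum((actual == 0) & (predicted == 0))
fp = np.sum((actual == 0) & (predicted == 1))
fn = np.sum((actual == 1) & (predicted == 0))

sensitivity = tp / (tp + fn)        # % of actual C1's detected
specificity = tn / (tn + fp)        # % of actual C0's ruled out
false_positive = fp / (tp + fp)     # % of predicted C1's that were not C1's
false_negative = fn / (tn + fn)     # % of predicted C0's that were not C0's

print(f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
print(f"false positive {false_positive:.2f}, false negative {false_negative:.2f}")
```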
Lift and Decile Charts: Goal
Useful for assessing performance in terms of identifying the most important class.
The goal is to obtain a rank ordering of the records according to their estimated probabilities of class membership.
Compare performance of the DM model to “no model, pick randomly”.
Decile Chart
In the “most probable” (top) decile, a bar at 2 means the model is twice as likely to identify the important class as the average prevalence.
Lift vs. Decile Charts
Decile chart does this in decile chunks of data; Y axis shows the ratio of the decile mean to the overall mean.
Lift chart shows continuous cumulative results; Y axis shows the number of important class records identified.
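A sketch of the decile computation on hypothetical records already ranked by predicted probability:

```python
import numpy as np

# Hypothetical records sorted by predicted probability (descending); 1 = important class
actual = np.array([1, 1, 0, 1, 0, 1, 0, 0, 0, 0,
                   1, 0, 0, 0, 0, 0, 0, 0, 0, 0])

overall_mean = actual.mean()          # average prevalence of the important class
deciles = np.array_split(actual, 10)  # decile chunks of the ranked data

for i, chunk in enumerate(deciles, start=1):
    lift = chunk.mean() / overall_mean  # ratio of decile mean to overall mean
    print(f"decile {i}: lift {lift:.1f}")
```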