Predictive Data Mining Flashcards
Accuracy
A measure of classification success, defined as 1 minus the overall error rate.
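A minimal Python sketch of this definition. It assumes the actual and predicted class labels are stored in two equal-length lists. The function name `accuracy` and the sample data are hypothetical and used only for illustration. The overall error rate is taken as the share of observations whose predicted class differs from the actual class.

```python
def accuracy(actual, predicted):
    """Accuracy = 1 - overall error rate (hypothetical helper for illustration)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    # Overall error rate: fraction of observations whose predicted class is wrong
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    overall_error_rate = errors / len(actual)
    return 1 - overall_error_rate


# Illustrative data: 2 of the 8 observations are misclassified
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(actual, predicted))  # 0.75
```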
Average error
The average difference between the actual values and the predicted values of observations in a data set.
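A small sketch of average error. It assumes each observation's error is computed as actual minus predicted. That sign convention is an assumption made here, and the helper name is hypothetical.

```python
def average_error(actual, predicted):
    """Mean of (actual - predicted) over all observations (hypothetical helper)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    # Sign convention assumed here: error = actual - predicted
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


actual = [10.0, 12.0, 9.0, 15.0]
predicted = [11.0, 11.0, 10.0, 13.0]
print(average_error(actual, predicted))  # 0.25
```

Under this convention, a positive average error means the model underestimates on average, and a negative value means it overestimates. This is how average error can be read as a measure of bias. Positive and negative errors cancel, so a value near zero does not by itself show that the individual predictions are accurate.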
Bagging
An ensemble method that generates a committee of models based on random samples drawn with replacement and makes predictions based on the average prediction of the set of models.
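A sketch of bagging for a continuous outcome, using only the Python standard library. Simple least-squares linear regression serves as the base model; that choice is an assumption for illustration, and any estimation model could take its place. The function names, committee size of 25, random seed, and data are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate bootstrap sample (all x identical): fall back to a flat line
        return my, 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Fit one base model per bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Committee prediction: the average of the individual models' predictions."""
    return sum(b0 + b1 * x for b0, b1 in models) / len(models)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
committee = bagging_fit(xs, ys)
print(round(bagging_predict(committee, 4.5), 2))
```

Each model is fit to a different bootstrap sample. Averaging their predictions tends to make the committee's prediction less variable than that of any single model.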
Bias
The tendency of a predictive model to overestimate or underestimate the value of a continuous outcome.
Boosting
An ensemble method that iteratively samples from the original training data to generate individual models that target observations that were mispredicted in previously generated models. Its predictions are based on the weighted average of the predictions of the individual models, where the weights are proportional to the individual models’ accuracy.
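A hedged sketch of the two ideas in this definition. The first helper raises the sampling weight of mispredicted observations so the next model targets them. The second combines the models' predictions with weights proportional to accuracy. The doubling factor `boost=2.0` and the helper names are hypothetical. This follows the card's wording literally. Specific boosting algorithms such as AdaBoost weight each model by a different function of its error rate, so treat this as an illustration rather than any particular algorithm.

```python
import random


def update_sampling_weights(weights, correct, boost=2.0):
    """Raise the sampling weight of observations the latest model mispredicted.

    `boost` is a hypothetical multiplier chosen for illustration.
    """
    raised = [w * (1.0 if ok else boost) for w, ok in zip(weights, correct)]
    total = sum(raised)
    return [w / total for w in raised]  # renormalize so the weights sum to 1


def boosted_prediction(model_predictions, model_accuracies):
    """Weighted average of the models' predictions, weights proportional to accuracy."""
    total = sum(model_accuracies)
    return sum(p * a for p, a in zip(model_predictions, model_accuracies)) / total


# The second observation was mispredicted, so it becomes twice as likely to be drawn
w = update_sampling_weights([0.25, 0.25, 0.25, 0.25], [True, False, True, True])
print(w)  # approximately [0.2, 0.4, 0.2, 0.2]

# Next training sample drawn from the original data using the updated weights
rng = random.Random(0)
next_sample = rng.choices(range(4), weights=w, k=4)

# Three models' predicted probabilities combined using their accuracies as weights
print(boosted_prediction([0.9, 0.6, 0.2], [0.8, 0.7, 0.5]))  # approximately 0.62
```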
Class error rate
The percentage of observations of a given class that a model misclassifies in a data set.
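A minimal sketch that computes the class error rate for each class from two lists of labels. The helper name and data are hypothetical. The function returns a fraction, and the loop multiplies it by 100 to express it as a percentage.

```python
def class_error_rate(actual, predicted, cls):
    """Fraction of observations whose actual class is `cls` that the model misclassified."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a == cls]
    if not pairs:
        raise ValueError(f"no observations with actual class {cls!r}")
    misclassified = sum(1 for a, p in pairs if p != a)
    return misclassified / len(pairs)


actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
for cls in (1, 0):
    # Multiply by 100 to express the rate as a percentage
    print(cls, f"{100 * class_error_rate(actual, predicted, cls):.1f}%")
# prints "1 50.0%" then "0 16.7%"
```

Here the overall error rate is 3 of 10, or 30%. Class 1 is misclassified far more often than class 0, which the overall figure hides.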
Classification confusion matrix
A matrix showing the counts of actual versus predicted class values.
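A short sketch that builds a classification confusion matrix with `collections.Counter`. It assumes rows hold the actual classes and columns the predicted classes. That layout is a choice made here, and some tools transpose it. The helper name and data are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Counts of actual (rows) versus predicted (columns) class values."""
    counts = Counter(zip(actual, predicted))
    # Counter returns 0 for (actual, predicted) pairs that never occur
    return [[counts[(a, p)] for p in classes] for a in classes]


actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
for cls, row in zip([1, 0], confusion_matrix(actual, predicted, [1, 0])):
    print(cls, row)
# prints "1 [2, 2]" then "0 [1, 5]"
```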
Classification tree
A tree that classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules.
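A sketch of the step a classification tree repeats to build its hierarchy of rules: find the threshold on one numeric variable that best separates the outcome classes. Split quality is scored here with Gini impurity. That is one common impurity measure, chosen as an assumption; the card does not name a criterion. The helper names and data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a group: 0 when every label is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(x, y):
    """Return (threshold, weighted Gini) for the rule `x <= threshold` that best separates y."""
    best = (None, float("inf"))
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between adjacent values
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = (t, score)
    return best


x = [1, 2, 3, 4, 5, 6]
y = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(x, y))  # (3.5, 0.0)
```

A full tree applies this search to each resulting group in turn, across all candidate variables, until a stopping rule is met.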
Classification
A predictive data mining task requiring the prediction of an observation’s outcome class or category.
Cumulative lift chart
A chart used to present how well a model performs in identifying observations most likely to be in a given class as compared with random classification.
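A sketch of the data behind a cumulative lift chart. Observations are ranked by predicted probability, and the class members found in the top k are counted and compared with the number random selection would be expected to find. The sketch assumes the class of interest is coded as 1. The helper name and data are hypothetical, and plotting is left out.

```python
def cumulative_lift_points(actual, prob):
    """Points for a cumulative lift chart.

    actual: 1 if the observation belongs to the class of interest, else 0.
    prob:   the model's predicted probability of belonging to that class.
    Returns (k, class members found in the top k, members expected from random selection).
    """
    ranked = sorted(zip(prob, actual), key=lambda t: t[0], reverse=True)
    n, total_pos = len(actual), sum(actual)
    points, hits = [], 0
    for k, (_, a) in enumerate(ranked, start=1):
        hits += a
        points.append((k, hits, total_pos * k / n))
    return points


actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
prob = [0.9, 0.3, 0.8, 0.6, 0.2, 0.7, 0.1, 0.4, 0.05, 0.15]
for k, model, baseline in cumulative_lift_points(actual, prob):
    print(k, model, round(baseline, 1))
```

In this example the model finds all 3 class members among its top 3 observations. Random selection would be expected to find only 0.9 of them. The chart plots both columns against k.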
Cutoff value
The smallest predicted probability an observation can have and still be classified as a given class.
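A minimal sketch of applying a cutoff value to predicted probabilities. An observation is assigned to the class when its probability is at least the cutoff. The default of 0.5 and the helper name are assumptions for illustration.

```python
def classify(prob, cutoff=0.5):
    """Predict class 1 when the predicted probability meets or exceeds the cutoff."""
    return [1 if p >= cutoff else 0 for p in prob]


probs = [0.2, 0.5, 0.7, 0.49]
print(classify(probs))              # [0, 1, 1, 0]
print(classify(probs, cutoff=0.3))  # [0, 1, 1, 1]
```

Lowering the cutoff assigns more observations to the class. This typically catches more true members at the cost of more false positives.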
Decile-wise lift chart
A chart used to present how well a model identifies observations most likely to be in a given class, decile by decile for the top k deciles, compared with random selection.
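A sketch of the values shown on a decile-wise lift chart. Observations are ranked by predicted probability and split into tenths. For each of the top k deciles, the rate of class members is divided by the overall rate. A lift of 1 matches random selection. The sketch assumes the class of interest is coded as 1, that there are at least 10 observations, and that there is at least one class member. The helper name and data are hypothetical.

```python
def decile_lift(actual, prob, k=10):
    """Lift for each of the top k deciles (1 <= k <= 10).

    Lift of a decile = rate of class members in that decile / overall rate.
    """
    if not 1 <= k <= 10:
        raise ValueError("k must be between 1 and 10")
    ranked = [a for _, a in sorted(zip(prob, actual), key=lambda t: t[0], reverse=True)]
    n = len(ranked)
    if n < 10 or sum(ranked) == 0:
        raise ValueError("need at least 10 observations and one class member")
    overall_rate = sum(ranked) / n
    lifts = []
    for d in range(k):
        group = ranked[d * n // 10:(d + 1) * n // 10]  # the d-th tenth of the ranking
        lifts.append((sum(group) / len(group)) / overall_rate)
    return lifts


actual = [1, 1, 1, 0, 0, 1] + [0] * 14  # 4 class members among 20 observations
prob = [1 - i / 20 for i in range(20)]   # already sorted from most to least likely
print([round(x, 2) for x in decile_lift(actual, prob, k=4)])  # [5.0, 2.5, 2.5, 0.0]
```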
Ensemble method
A predictive data mining approach in which a committee of individual classification or estimation models is generated and a prediction is made by combining the individual models’ predictions.
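A minimal sketch of the combining step. It assumes the individual models' predictions are already available as a list. A majority vote is one common way to combine classifications, and a simple mean is one common way to combine estimates. These are choices made for illustration, and the helper names are hypothetical.

```python
from collections import Counter
from statistics import mean


def combine_classification(votes):
    """Majority vote; on a tie, the class encountered first wins (Counter ordering)."""
    return Counter(votes).most_common(1)[0][0]


def combine_estimation(estimates):
    """Average of the individual models' estimates."""
    return mean(estimates)


print(combine_classification(["yes", "no", "yes"]))  # yes
print(combine_estimation([10.0, 12.0, 11.0]))        # 11.0
```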
Estimation
A predictive data mining task requiring the prediction of an observation’s continuous outcome value.
F1 score
A measure combining precision and sensitivity into a single metric, computed as their harmonic mean: F1 = 2 × precision × sensitivity / (precision + sensitivity).
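A sketch of the F1 score computed from confusion-matrix counts for the class of interest. It uses precision = TP / (TP + FP) and sensitivity = TP / (TP + FN). Returning 0 when there are no true positives is a convention assumed here. The helper name and counts are hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity from confusion-matrix counts.

    tp, fp, fn: true positives, false positives, false negatives.
    """
    if tp == 0:
        return 0.0  # convention assumed here when precision and sensitivity are 0 or undefined
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return 2 * precision * sensitivity / (precision + sensitivity)


print(round(f1_score(tp=2, fp=1, fn=2), 4))  # 0.5714
```

With these counts, precision is 2/3 and sensitivity is 1/2. Their harmonic mean, 4/7, sits closer to the smaller of the two, so a model cannot earn a high F1 score by excelling on only one of them.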