Chapter 25: How to Develop and Evaluate Naive Classifier Strategies Flashcards
Given a classification model, how do you know if the model has skill or not?
P 253
This is a common question on every classification predictive modeling project. The answer is to compare the results of a given classifier model to a baseline or naive classifier model.
If a classification model performs better than a naive classifier, then it has some skill. If a classifier model performs worse than the naive classifier, it does not have any skill.
How do naive classifiers make predictions?
P 253
A naive classifier model is one that does not use any sophistication to make a prediction. It typically makes a random or constant prediction.
Given that not all naive classifiers are created equal and some perform better than others, how should we choose a naive classifier?
P 253
We should use the best-performing naive classifier on all of our classification predictive modeling projects. We can use simple probability to evaluate the performance of different naive classifier models and confirm the one strategy that should always be used as the naive classifier.
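The probabilistic version of accuracy for a binary problem is the probability that the prediction matches the true label. Because a naive classifier ignores the input, its prediction is independent of the true label, so this probability factorizes as: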
P(yhat=y) = P(yhat=0) × P(y=0) + P(yhat=1) × P(y=1)
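The formula can be evaluated directly to compare naive strategies. The sketch below assumes a hypothetical binary problem where 75% of examples belong to class 0; the function name expected_accuracy is illustrative, not from any library.

```python
# Expected accuracy of a naive binary classifier, assuming its predictions
# are independent of the true labels (true for random or constant guessing).
def expected_accuracy(p_pred0: float, p_true0: float) -> float:
    """P(yhat=y) = P(yhat=0) * P(y=0) + P(yhat=1) * P(y=1)."""
    p_pred1 = 1.0 - p_pred0
    p_true1 = 1.0 - p_true0
    return p_pred0 * p_true0 + p_pred1 * p_true1


# Hypothetical imbalanced problem: 75% of examples belong to class 0.
p_class0 = 0.75

uniform = expected_accuracy(0.5, p_class0)          # random guess
stratified = expected_accuracy(p_class0, p_class0)  # predict class 0 with its prior probability
majority = expected_accuracy(1.0, p_class0)         # always predict class 0

print(f"uniform:    {uniform:.4f}")     # 0.5000
print(f"stratified: {stratified:.4f}")  # 0.6250
print(f"majority:   {majority:.4f}")    # 0.7500
```

Always predicting the majority class scores highest here, because it puts all of its prediction probability on the most likely label.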
What is the best-performing naive classifier, regardless of the number of classes or class imbalance?
P 258
The majority class naive classifier is the method that should be used to calculate a baseline performance on your classification predictive modeling problems. It works just as well on datasets with an equal number of examples in each class (balanced data) and on problems with more than two class labels, e.g. multiclass classification problems.
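One way to check this claim is to extend the probabilistic accuracy to K classes, P(yhat=y) = sum over k of P(yhat=k) × P(y=k), still assuming predictions are independent of the true label. The sketch below uses hypothetical class distributions. In general the majority class scores max_k P(y=k), random class selection scores the sum of P(y=k)², and random guessing scores 1/K, so majority ≥ stratified ≥ uniform, with equality when the classes are balanced.

```python
# Expected accuracy of the three naive strategies for any number of classes,
# assuming predictions are independent of the true label:
#   P(yhat=y) = sum_k P(yhat=k) * P(y=k)
def expected_accuracy(pred_probs, class_priors):
    return sum(p * q for p, q in zip(pred_probs, class_priors))


def compare_strategies(class_priors):
    k = len(class_priors)
    majority_index = max(range(k), key=lambda i: class_priors[i])
    # Random guess: every class predicted with probability 1/K.
    uniform = expected_accuracy([1.0 / k] * k, class_priors)
    # Random class selection: predict each class with its prior probability.
    stratified = expected_accuracy(class_priors, class_priors)
    # Majority class: always predict the most frequent class.
    majority = expected_accuracy(
        [1.0 if i == majority_index else 0.0 for i in range(k)], class_priors
    )
    return uniform, stratified, majority


# Hypothetical class distributions (priors must sum to 1).
balanced = [1 / 3, 1 / 3, 1 / 3]
imbalanced = [0.6, 0.3, 0.1]

print([round(a, 4) for a in compare_strategies(balanced)])    # [0.3333, 0.3333, 0.3333]
print([round(a, 4) for a in compare_strategies(imbalanced)])  # [0.3333, 0.46, 0.6]
```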
The scikit-learn machine learning library provides an implementation of the majority class naive classification algorithm as part of the DummyClassifier class. To use the naive classifier, the class must be defined and the strategy argument set to ‘most_frequent’ to ensure that the majority class is predicted. The class can then be fit on a training dataset and used to make predictions.
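A minimal sketch of that workflow, assuming a hypothetical imbalanced binary dataset of 75 class-0 and 25 class-1 examples. DummyClassifier ignores the input features, so a single constant column stands in for X, and the model is scored on the same data it was fit on purely for brevity.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced binary dataset: 75 examples of class 0, 25 of class 1.
# The dummy model ignores the input features, so a single constant column suffices.
X = np.zeros((100, 1))
y = np.array([0] * 75 + [1] * 25)

# Define the naive classifier so that it always predicts the majority class.
model = DummyClassifier(strategy="most_frequent")
model.fit(X, y)

# Every prediction is the majority class, so accuracy equals its share of the data.
yhat = model.predict(X)
accuracy = accuracy_score(y, yhat)
print(set(yhat.tolist()))  # {0}
print(accuracy)            # 0.75
```

In a real project the model would be fit on the training split and evaluated on held-out data, exactly like the classifier it serves as a baseline for.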
The naive classifier strategies map to the DummyClassifier strategy argument as follows:
Random Guess: Set the strategy argument to ‘uniform’.
Select Random Class: Set the strategy argument to ‘stratified’.
Majority Class: Set the strategy argument to ‘most_frequent’.
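The three strategies can be compared empirically with DummyClassifier. This sketch assumes a hypothetical dataset of 10,000 examples with a 75/25 class split and fixes random_state so the two random strategies are reproducible. The scores should land close to the probabilistic estimates of about 0.5 (uniform), 0.625 (stratified) and exactly 0.75 (most_frequent), although the random strategies will vary slightly with the seed.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced binary labels: 75% class 0, 25% class 1.
# Features are ignored by DummyClassifier, so a constant column is used.
n = 10_000
X = np.zeros((n, 1))
y = np.array([0] * (3 * n // 4) + [1] * (n // 4))

scores = {}
for strategy in ("uniform", "stratified", "most_frequent"):
    # random_state makes the random strategies reproducible.
    model = DummyClassifier(strategy=strategy, random_state=1)
    model.fit(X, y)
    scores[strategy] = accuracy_score(y, model.predict(X))

for strategy, score in scores.items():
    print(f"{strategy:>13}: {score:.3f}")
```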