CH3 Classification Flashcards
What is a binary classifier?
Let’s simplify the problem for now and only try to identify one digit, for example the number 5. This “5-detector” will be an example of a binary classifier, capable of distinguishing between just two classes: 5 and not-5.
What is the advantage of the SGD classifier?
The Stochastic Gradient Descent (SGD) classifier (Scikit-Learn’s SGDClassifier class) has the advantage of being capable of handling very large datasets efficiently. This is in part because SGD deals with training instances independently, one at a time (which also makes SGD well suited for online learning).
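A minimal sketch of how this might look, assuming X_train (image features) and y_train_5 (boolean “is this digit a 5?” labels) have already been prepared:

from sklearn.linear_model import SGDClassifier

# Train a linear model with stochastic gradient descent; instances are
# processed one at a time, which is what keeps memory use low.
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)

sgd_clf.predict(X_train[:1])   # e.g. array([ True]) if the first image is a 5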
What are the performance measures that are available?
- measuring accuracy with cross-validation
- confusion matrix
- precision
- recall
- ROC curve
Why is accuracy generally not the preferred performance measure?
Accuracy is generally not the preferred performance measure for classifiers, especially when you are dealing with skewed datasets (i.e., when some classes are much more frequent than others). A classifier that always predicts the majority class can score high accuracy while being useless.
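A rough illustration of the point, assuming the same y_train_5 labels as above: a dummy classifier that always predicts “not 5” still reaches roughly 90% accuracy on MNIST, simply because about 90% of the images are not 5s.

import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_val_score

class Never5Classifier(BaseEstimator):
    # Dummy classifier that always predicts "not a 5"
    def fit(self, X, y=None):
        return self
    def predict(self, X):
        return np.zeros((len(X),), dtype=bool)

# Accuracy looks deceptively high on this skewed dataset
cross_val_score(Never5Classifier(), X_train, y_train_5, cv=3, scoring="accuracy")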
What is the general idea of the confusion matrix?
The general idea is to count the number of times instances of class A are classified as class B.
How to compute the confusion matrix?
To compute the confusion matrix, you first need to have a set of predictions, so they can be compared to the actual targets.
Just like the cross_val_score() function, cross_val_predict() performs K-fold cross-validation, but instead of returning the evaluation scores, it returns the predictions made on each test fold. This means that you get a clean prediction for each instance in the training set (“clean” meaning that the prediction is made by a model that never saw the data during training). Now you are ready to get the confusion matrix using the confusion_matrix() function.
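A minimal sketch, assuming the sgd_clf, X_train, and y_train_5 from the earlier cards:

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# Clean out-of-fold predictions for every training instance
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)

# Rows are actual classes (not-5, 5); columns are predicted classes
confusion_matrix(y_train_5, y_train_pred)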
What does the confusion matrix tell?
Each row in a confusion matrix represents an actual class, while each column represents a predicted class.
A perfect classifier would have only true positives and true negatives, so its confusion matrix would have nonzero values only on its main diagonal (top left to bottom right).
How is precision calculated?
precision = TP / (TP + FP), where TP is the number of true positives and FP the number of false positives.
What is the formula for recall / sensitivity / true positive rate (TPR)?
recall = TP / (TP + FN), the ratio of positive instances that are correctly detected by the classifier (FN is the number of false negatives).
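Both ratios are available directly in scikit-learn; a small sketch assuming the y_train_5 labels and y_train_pred predictions from the confusion-matrix card:

from sklearn.metrics import precision_score, recall_score

precision_score(y_train_5, y_train_pred)   # TP / (TP + FP)
recall_score(y_train_5, y_train_pred)      # TP / (TP + FN)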
What is the F1-score?
It is often convenient to combine precision and recall into a single metric called the F1 score, in particular if you need a simple way to compare two classifiers. The F1 score is the harmonic mean of precision and recall. Whereas the regular mean treats all values equally, the harmonic mean gives much more weight to low values. As a result, the classifier will only get a high F1 score if both recall and precision are high.
What is the formula of F1-score?
F1 = TP / (TP + (FN + FP)/2) = 2 × precision × recall / (precision + recall)
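Scikit-Learn computes this directly; a small sketch with the same assumed labels and predictions as above:

from sklearn.metrics import f1_score

f1_score(y_train_5, y_train_pred)   # harmonic mean of precision and recall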
What is the precision/recall tradeoff?
increasing precision reduces recall, and vice versa
What is the decision function and decision threshold?
For each instance, it computes a score based on a decision function, and if that score is greater than a threshold, it assigns the instance to the positive class, or else it assigns it to the negative class.
How to set different thresholds to compute the precision and recall?
You can call its decision_function() method, which returns a score for each instance, and then make predictions based on those scores using any threshold you want.
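A minimal sketch, assuming a trained sgd_clf and a single feature vector some_digit (both placeholder names):

# Score for one instance instead of a hard prediction
y_scores = sgd_clf.decision_function([some_digit])

threshold = 0                  # the default threshold used by predict()
y_scores > threshold           # e.g. array([ True])

threshold = 8000               # raising the threshold lowers recall
y_scores > threshold           # e.g. array([False])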
How do you decide which threshold to use?
For this you will first need to get the scores of all instances in the training set using the cross_val_predict() function again, but this time specifying that you want it to return decision scores instead of predictions. Now with these scores you can compute precision and recall for all possible thresholds using the precision_recall_curve() function. Finally, you can plot precision and recall as functions of the threshold value using Matplotlib.
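Putting those three steps together, a sketch with the same assumed sgd_clf, X_train, and y_train_5:

import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

# Decision scores (not predictions) for every training instance
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3,
                             method="decision_function")

# Precision and recall for every candidate threshold
precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

# precisions and recalls have one extra element, hence the [:-1]
plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
plt.xlabel("Threshold")
plt.legend()
plt.show()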