Topic 2: Classification Flashcards
What does the Naïve Bayes classifier assume about input features?
It assumes that input features are independent given the class label.
What is the purpose of the m-estimate in Naïve Bayes classifiers?
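To smooth the estimated conditional probabilities P(feature value | class) by blending the observed frequency with a prior estimate p, weighted by an equivalent sample size m that expresses how much to trust the prior relative to the data. This avoids zero probabilities for feature values never seen with a class.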
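As a minimal sketch, assuming the common form of the m-estimate, (n_c + m × p) / (n + m), where n_c is the number of examples of the class that have the feature value, n is the number of examples of the class, p is the prior estimate, and m is the equivalent sample size. The function name `m_estimate` and the numbers are illustrative only:

```python
def m_estimate(n_c, n, p, m):
    """Smoothed estimate of P(feature value | class)."""
    return (n_c + m * p) / (n + m)

# A feature value never observed with this class: the raw frequency is 0/10,
# which would zero out the whole Naive Bayes product.
unsmoothed = 0 / 10
smoothed = m_estimate(n_c=0, n=10, p=0.5, m=2)  # (0 + 2 * 0.5) / (10 + 2)
```

With m = 0 the estimate reduces to the raw frequency n_c / n; larger m pulls it toward the prior p.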
How is the ROC curve used in binary classification?
It plots the true positive rate versus the false positive rate as the discrimination threshold is varied.
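The threshold sweep can be sketched in plain Python as below. This is an illustrative version, assuming labels are 1 (positive) and 0 (negative), that both classes are present, and that a higher score means "more likely positive". The name `roc_points` and the sample data are made up for the example:

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
```

Lowering the threshold moves along the curve from (0, 0) toward (1, 1).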
What is the difference between regression and classification problems?
Regression deals with continuous output values, while classification deals with discrete classes.
Name a simple probabilistic classifier studied in classification tasks.
Naïve Bayes classifier.
What is the purpose of splitting a dataset into training, validation, and test sets?
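The training set is used to fit the model, the validation set to select between models or tune hyperparameters, and the test set to evaluate the final model on data it has never seen.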
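A three-way split might look like the sketch below. The 60/20/20 proportions, the fixed seed, and the helper name `split_dataset` are assumptions chosen for illustration, not a recommended standard:

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then cut into train / validation / test partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # whatever remains
    return train, val, test

train, val, test = split_dataset(range(100))
```

The three partitions never overlap, so the test score is not influenced by training or model selection.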
What does the area under the ROC curve (AUC) indicate?
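A single-number summary of the ROC curve: the probability that the classifier scores a randomly chosen positive example higher than a randomly chosen negative one. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, so a higher AUC indicates better classifier performance.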
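One way to compute AUC directly from its ranking interpretation is sketched below: count the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The name `auc_by_ranking` and the data are illustrative, and the quadratic loop is only suitable for small examples:

```python
def auc_by_ranking(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_by_ranking(y_true, scores)  # 8 of the 9 pairs are ranked correctly
```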
What is classification?
A supervised learning task where the goal is to assign labels to data points based on input features.
What is the difference between binary and multiclass classification?
Binary classification involves two possible outcomes (e.g., spam vs. not spam).
Multiclass classification involves three or more possible outcomes (e.g., dog, cat, rabbit).
Name three common algorithms for classification.
Decision trees, logistic regression, and support vector machines (SVM).
What is a classifier?
A model that has been trained on labeled data to predict labels for new data points.
What is accuracy?
The ratio of correctly predicted observations to the total observations: (TP + TN) / (TP + TN + FP + FN)
Why might accuracy not be enough in some cases?
It can be misleading when dealing with imbalanced datasets where one class dominates.
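A small made-up example of the problem: with 95 negatives and 5 positives, a "classifier" that always predicts the majority class reaches high accuracy while finding no positives at all.

```python
# 95 negatives, 5 positives; every prediction is the majority class (0).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)  # fraction of actual positives that were found
```

Here accuracy is 0.95 but recall is 0.0, which is why per-class metrics are reported alongside accuracy.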
What is precision?
The ratio of true positives to all predicted positives:
TP / (TP + FP)
What is recall?
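The ratio of true positives to all actual positives, also called sensitivity or the true positive rate:
TP / (TP + FN)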
What is fallout?
The ratio of false positives to all actual negatives, also called the false positive rate:
FP / (FP + TN)
What is F-measure?
Also called F1 score. The harmonic mean of precision and recall, useful for evaluating models when there’s an imbalance between classes:
2 × Precision × Recall / (Precision + Recall)
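The metrics on these cards can all be computed from the four confusion-matrix counts. The sketch below assumes every denominator is non-zero; the function name `classification_metrics` and the counts are invented for illustration:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the standard binary metrics from raw counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)      # true positive rate
    fallout = fp / (fp + tn)     # false positive rate
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fallout": fallout,
        "f1": f1,
    }

m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```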
What is a confusion matrix?
A table that shows the performance of a classification algorithm by comparing actual vs. predicted classes.
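A minimal way to build such a table, assuming rows are actual classes and columns are predicted classes (some texts use the transpose). The helper name `confusion_matrix` and the labels are illustrative:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]

y_true = ["cat", "cat", "dog", "dog", "rabbit", "rabbit"]
y_pred = ["cat", "dog", "dog", "dog", "rabbit", "cat"]
matrix = confusion_matrix(y_true, y_pred, labels=["cat", "dog", "rabbit"])
```

Correct predictions fall on the diagonal; every off-diagonal cell is a specific kind of mistake.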
What is a support vector machine (SVM)?
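A model that finds the hyperplane separating data points from different classes with the largest margin, that is, the greatest distance to the nearest training points of each class (the support vectors).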
What is Naïve Bayes classification?
A probabilistic classifier based on Bayes’ theorem, assuming features are conditionally independent given the class label.
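A toy categorical version can be sketched as follows. It is an illustration rather than a reference implementation: it assumes discrete features, uses add-one (Laplace) smoothing, and works in log space to avoid underflow. The function names and the tiny spam/ham dataset are invented for the example:

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count classes and (class, feature index, value) co-occurrences."""
    class_counts = Counter(labels)
    feature_counts = Counter()
    values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(label, i, v)] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict(model, row):
    """Pick the class maximizing log P(class) + sum of log P(feature | class)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, v in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((feature_counts[(label, i, v)] + 1) / (n + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [("free", "yes"), ("free", "yes"), ("meeting", "no"), ("meeting", "yes"), ("free", "no")]
labels = ["spam", "spam", "ham", "ham", "ham"]
model = train_naive_bayes(rows, labels)
prediction = predict(model, ("free", "yes"))
```

The independence assumption is what lets the per-feature log probabilities simply be added.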
How can you handle class imbalance?
Techniques include oversampling the minority class, undersampling the majority class, or using weighted loss functions.
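Two of these techniques are sketched below: random oversampling of the smaller classes, and inverse-frequency class weights of the form n_samples / (n_classes × class count), which could be passed to a weighted loss. The helper names and data are illustrative assumptions:

```python
import random
from collections import Counter

def oversample_minority(rows, labels, seed=0):
    """Duplicate random examples of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    return {label: len(labels) / (len(counts) * n) for label, n in counts.items()}

rows = list(range(10))
labels = [0] * 8 + [1] * 2
new_rows, new_labels = oversample_minority(rows, labels)
weights = balanced_class_weights(labels)
```

Oversampling should be applied to the training set only, so that duplicated examples do not leak into validation or test data.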