Topic 2: Classification Flashcards

Question 1

Q

What does the Naïve Bayes classifier assume about input features?

Answer

A

It assumes that input features are independent given the class label.

Question 2

Q

What is the purpose of the m-estimate in Naïve Bayes classifiers?

Answer

A

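To smooth the estimated conditional probabilities P(feature value | class), so that a value never seen with a class in the training data does not get probability zero. It blends the observed frequency with a prior estimate p, weighted by an equivalent sample size m that expresses how much to trust the prior over the data: (n_c + m × p) / (n + m).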
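
A minimal sketch of the m-estimate, assuming the common form (n_c + m × p) / (n + m), where n_c counts the class examples that have the feature value, n counts all examples in the class, p is a prior estimate, and m is the equivalent sample size. The function name `m_estimate` and the numbers are illustrative only.

```python
def m_estimate(n_c: int, n: int, p: float, m: float) -> float:
    """Smoothed estimate of P(feature value | class).

    n_c: class examples that have the feature value
    n:   all examples in the class
    p:   prior estimate of the probability (assumed, e.g. 1 / number of values)
    m:   equivalent sample size, i.e. how strongly to trust the prior
    """
    return (n_c + m * p) / (n + m)


# A value never seen with this class no longer gets probability zero.
unseen = m_estimate(n_c=0, n=10, p=0.5, m=2)  # (0 + 1) / 12
# With m = 0 the estimate falls back to the raw frequency n_c / n.
raw = m_estimate(n_c=3, n=10, p=0.5, m=0)     # 3 / 10
```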

Question 3

Q

How is the ROC curve used in binary classification?

Answer

A

It plots the true positive rate versus the false positive rate as the discrimination threshold is varied.
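
The threshold sweep can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: `roc_points` is a hypothetical helper, labels are 1 for positive and 0 for negative, and an example is predicted positive when its score is at or above the threshold.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per threshold, from highest to lowest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


labels = [1, 1, 0, 1, 0, 0]                # made-up ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]    # made-up classifier scores
curve = roc_points(labels, scores)
```

Lowering the threshold can only add predicted positives, so both rates are non-decreasing along the curve, which ends at (1.0, 1.0).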

Question 4

Q

What is the difference between regression and classification problems?

Answer

A

Regression deals with continuous output values, while classification deals with discrete classes.

Question 5

Q

Name a simple probabilistic classifier studied in classification tasks.

Answer

A

Naïve Bayes classifier.

Question 6

Q

What is the purpose of splitting a dataset into training, validation, and test sets?

Answer

A

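The training set is used to fit the model, the validation set to tune hyperparameters and choose between candidate models, and the test set to estimate how the final model performs on unseen data.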
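
One way to produce the three sets is a single shuffle followed by two cuts. The sketch below assumes a 60/20/20 split and a fixed seed; both are arbitrary choices, and `three_way_split` is a hypothetical helper name.

```python
import random


def three_way_split(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then cut into train / validation / test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


train, val, test = three_way_split(range(100))
```

The three parts do not overlap, so the test set stays untouched until the final evaluation.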

Question 7

Q

What does the area under the ROC curve (AUC) indicate?

Answer

A

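The AUC summarizes the ROC curve as a single number: the probability that the classifier scores a randomly chosen positive example higher than a randomly chosen negative one. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, so a higher AUC indicates better classifier performance.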
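
The AUC equals the fraction of (positive, negative) pairs in which the positive example receives the higher score. A small sketch of that pairwise view follows, with made-up labels and scores; `auc_by_pairs` is a hypothetical name, and counting ties as half a win is a common convention assumed here.

```python
def auc_by_pairs(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_by_pairs(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

The double loop is quadratic in the number of examples, which is fine for a flashcard-sized illustration but not for large datasets.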

Question 8

Q

What is classification?

Answer

A

A supervised learning task where the goal is to assign labels to data points based on input features.

Question 9

Q

What is the difference between binary and multiclass classification?

Answer

A

Binary classification involves two possible outcomes (e.g., spam vs. not spam).
Multiclass classification involves three or more possible outcomes (e.g., dog, cat, rabbit).

Question 10

Q

Name three common algorithms for classification.

Answer

A

Decision trees, logistic regression, and support vector machines (SVM).

Question 11

Q

What is a classifier?

Answer

A

A model that has been trained on labeled data to predict labels for new data points.

Question 17

Q

What is accuracy?

Answer

A

The ratio of correctly predicted observations to the total observations: (TP + TN) / (TP + TN + FP + FN)

Question 18

Q

Why might accuracy not be enough in some cases?

Answer

A

It can be misleading when dealing with imbalanced datasets where one class dominates.
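
A quick illustration with made-up numbers: on a dataset with 5 positives and 95 negatives, a classifier that always predicts the negative class scores high accuracy while finding no positives at all. The helper name `accuracy` is illustrative.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


actual = [1] * 5 + [0] * 95      # 5 positives, 95 negatives
always_negative = [0] * 100      # a "classifier" that ignores its input

acc = accuracy(actual, always_negative)
found_positives = sum(1 for a, p in zip(actual, always_negative) if a == 1 and p == 1)
```

Here `acc` is 0.95 even though `found_positives` is 0.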

Question 19

Q

What is precision?

Answer

A

The ratio of true positives to all predicted positives:
TP / (TP + FP)

Question 20

Q

What is recall?

Answer

A

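The ratio of true positives to all actual positives, also called sensitivity or the true positive rate:
TP / (TP + FN)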

Question 21

Q

What is fallout?

Answer

A

The ratio of false positives to all actual negatives, also called the false positive rate:
FP / (FP + TN)
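
Precision, recall, and fallout can all be computed from the same four confusion counts. The sketch below uses made-up counts and a hypothetical helper `rates`; it does not guard against zero denominators.

```python
def rates(tp, fp, tn, fn):
    """Precision, recall, and fallout from confusion counts."""
    return {
        "precision": tp / (tp + fp),  # of predicted positives, how many are right
        "recall": tp / (tp + fn),     # of actual positives, how many are found
        "fallout": fp / (fp + tn),    # of actual negatives, how many are flagged
    }


r = rates(tp=40, fp=10, tn=45, fn=5)
```

Note that precision uses a column of the confusion matrix (predicted positives), whereas recall and fallout each use a row (actual positives and actual negatives).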

Question 22

Q

What is F-measure?

Answer

A

Also called the F1 score. The harmonic mean of precision and recall, useful for evaluating models when there’s an imbalance between classes:
2 × Precision × Recall / (Precision + Recall)
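
A short sketch of why the harmonic mean is used: it stays low unless both precision and recall are high. The function name `f1_score` is illustrative, and returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the undefined 0 / 0 case
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.8, 0.8)  # equal inputs give the same value back
lopsided = f1_score(1.0, 0.1)  # about 0.18, far below the arithmetic mean of 0.55
```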

Question 23

Q

What is a confusion matrix?

Answer

A

A table that shows the performance of a classification algorithm by comparing actual vs. predicted classes.
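
A confusion matrix is a tally of (actual, predicted) label pairs. The sketch below builds one with the standard library; the labels and predictions are made up, and `confusion_matrix` here is a hypothetical local helper, not a library function.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


actual = ["cat", "cat", "dog", "dog", "dog", "rabbit"]
predicted = ["cat", "dog", "dog", "dog", "cat", "rabbit"]

cm = confusion_matrix(actual, predicted)
labels = sorted(set(actual) | set(predicted))  # ['cat', 'dog', 'rabbit']
# Rows are actual classes, columns are predicted classes.
table = [[cm[(a, p)] for p in labels] for a in labels]
```

Correct predictions sit on the diagonal of `table`; every off-diagonal cell is one kind of mistake.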

Question 24

Q

What is a support vector machine (SVM)?

Answer

A

A model that finds the hyperplane separating the classes with the largest margin, that is, the greatest distance to the nearest training points of each class (the support vectors).
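
Once a linear SVM is trained, prediction only needs the hyperplane's weights w and offset b. The sketch below assumes the usual scaling in which the margin boundaries are w · x + b = +1 and -1; the values of w and b are made up, and training itself is not shown.

```python
import math


def decision(w, b, x):
    """Signed score w . x + b; its sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the margin boundaries w . x + b = +1 and w . x + b = -1."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))


w, b = [1.0, 1.0], -3.0                 # made-up hyperplane x1 + x2 = 3
side_a = decision(w, b, [3.0, 2.0])     # positive score, so the positive class
side_b = decision(w, b, [0.5, 1.0])     # negative score, so the negative class
width = margin_width(w)                 # 2 / sqrt(2)
```

A smaller weight norm means a wider margin, which is why SVM training minimizes the norm of w subject to the classification constraints.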

Question 25

Q

What is Naïve Bayes classification?

Answer

A

A probabilistic classifier based on Bayes’ theorem that assumes the features are conditionally independent given the class label.
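
The independence assumption lets the classifier score each class as P(class) multiplied by one P(feature value | class) term per feature. Below is a minimal sketch for categorical features using raw frequencies without smoothing; the toy data and the names `train_nb` and `predict_nb` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and, per class and feature, value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(y, i)][v] += 1
    return class_counts, value_counts


def predict_nb(model, row):
    """Pick the class maximizing P(class) * product of P(value_i | class)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for y, n_y in class_counts.items():
        score = n_y / total
        for i, v in enumerate(row):
            score *= value_counts[(y, i)][v] / n_y  # raw frequency, no smoothing
        if score > best_score:
            best_class, best_score = y, score
    return best_class


# Made-up toy data: (outlook, wind) -> play?
rows = [("sunny", "calm"), ("sunny", "windy"), ("rainy", "windy"),
        ("rainy", "calm"), ("sunny", "calm")]
labels = ["yes", "no", "no", "yes", "yes"]
model = train_nb(rows, labels)
prediction = predict_nb(model, ("sunny", "calm"))
```

In this toy data "calm" never occurs with the class "no", so that class scores exactly zero; this is the situation that smoothing is meant to soften.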

Question 26

Q

How can you handle class imbalance?

Answer

A

Techniques include oversampling the minority class, undersampling the majority class, or using weighted loss functions.
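
Two of these techniques can be sketched with the standard library. The weighting rule n_samples / (n_classes × class_count) is one common heuristic, assumed here; the helper names and the 90/10 data are made up.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}


def oversample_minority(rows, labels, seed=0):
    """Duplicate random rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for c, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == c]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(c)
    return out_rows, out_labels


labels = ["neg"] * 90 + ["pos"] * 10
rows = list(range(100))  # stand-in feature rows
weights = balanced_class_weights(labels)  # rare class gets the larger weight
new_rows, new_labels = oversample_minority(rows, labels)
```

Resampling is normally applied to the training set only, so that the validation and test sets keep the original class distribution.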