8 - Naïve Bayes Classification Flashcards

1
Q

What is the basis of Naïve Bayes classification methods?

A

Bayes Theorem

Developed by Reverend Thomas Bayes, it updates knowledge about data parameters by combining prior knowledge with new information.

2
Q

What does the prior distribution represent in Bayes Theorem?

A

Previous knowledge about the data parameters

It is denoted as p(Y = y*).

3
Q

What is the posterior distribution in the context of Bayes Theorem?

A

Updated parameter knowledge after observing data

Denoted as p(Y = y* | X*).
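
For reference, the quantities named in this deck (the prior p(Y = y*), the class-conditional probability of the data p(X* | Y = y*), and the marginal probability of the data p(X*)) combine in Bayes' theorem as sketched below in LaTeX:

```latex
p(Y = y^* \mid X^*) = \frac{p(X^* \mid Y = y^*)\, p(Y = y^*)}{p(X^*)}
```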

4
Q

In a dataset with predictors X and response variable Y, how many class values can Y take in the given example?

A

Three possible class values: y1, y2, and y3

5
Q

What is the objective of using Bayes Theorem in classification?

A

Identify the most likely class for a combination of predictor variable values

Specifically, find which of y1, y2, or y3 is most likely for the combination X*.

6
Q

What does p(Y = y* | X*) represent?

A

The posterior probability of class value y* given the observed predictor values X*

7
Q

How can you classify a record using the maximum a posteriori hypothesis?

A

Classify as the value of Y with the highest posterior probability
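
A minimal Python sketch of this rule follows. The function name `map_classify` is hypothetical, and the two posterior values are the ones this deck quotes for a low-alcohol, low-sugar wine.

```python
def map_classify(posteriors):
    """Return the class label whose posterior probability is largest."""
    return max(posteriors, key=posteriors.get)


# Posterior values quoted in this deck for a low-alcohol, low-sugar wine.
posteriors = {"Red": 0.7215, "White": 0.3092}
print(map_classify(posteriors))  # prints "Red"
```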

8
Q

What is the class conditional independence assumption?

A

It allows writing p(X* | Y = y*) as the product of the individual predictor probabilities, each conditioned on the class

For example, with two predictors, p(X* | Y = y*) = p(X1 = x1 | Y = y*) × p(X2 = x2 | Y = y*).
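
With more than two predictors the same assumption extends to a product over all of them. In the LaTeX sketch below, the index i and the predictor count k are notation introduced here for illustration:

```latex
p(X^* \mid Y = y^*) = \prod_{i=1}^{k} p(X_i = x_i \mid Y = y^*)
```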

9
Q

What are the two predictor variables used in the wine classification example?

A

Alcohol content and sugar content

10
Q

What is the prior probability of a wine being Red if there are 500 red wines out of 1000 total?

A

0.5

11
Q

What is the marginal probability of Alcohol_flag being High in the wine dataset?

A

0.486

12
Q

What is the marginal probability of Sugar_flag being Low in the wine dataset?

A

0.584

13
Q

What is p(Alcohol_flag High | Type Red)?

A

0.436

14
Q

What is the conditional probability p(Sugar_flag Low | Type White)?

A

0.4

15
Q

How does the Naïve Bayes algorithm classify a wine with low alcohol and low sugar content?

A

It classifies it as Red based on higher posterior probability

16
Q

What is the posterior probability of a low alcohol, low sugar wine being Red?

A

72.15%

17
Q

What is the prior probability of a wine being White?

A

0.5

18
Q

What is the probability of a low alcohol, low sugar wine being White?

A

30.92%
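
The low-alcohol, low-sugar figures can be reproduced from the probabilities quoted in this deck. The sketch below rests on two assumptions: the two conditional probabilities the deck does not quote are recovered with the law of total probability, and p(X*) is approximated by the product of the two marginal probabilities, which appears to be how the quoted percentages were obtained. The function name `naive_posterior` is hypothetical.

```python
# Figures quoted in this deck.
p_red, p_white = 0.5, 0.5   # prior probabilities
p_alc_high = 0.486          # marginal p(Alcohol_flag = High)
p_sug_low = 0.584           # marginal p(Sugar_flag = Low)
p_alc_high_red = 0.436      # p(Alcohol_flag = High | Type = Red)
p_sug_low_white = 0.4       # p(Sugar_flag = Low | Type = White)

# ASSUMPTION: not quoted in the deck; derived via the law of total probability.
p_alc_high_white = (p_alc_high - p_alc_high_red * p_red) / p_white
p_sug_low_red = (p_sug_low - p_sug_low_white * p_white) / p_red


def naive_posterior(p_x1_given_y, p_x2_given_y, prior, p_x1, p_x2):
    """Naive Bayes posterior; ASSUMES p(X*) is approximated by p(x1) * p(x2)."""
    return p_x1_given_y * p_x2_given_y * prior / (p_x1 * p_x2)


p_alc_low = 1 - p_alc_high
post_red = naive_posterior(
    1 - p_alc_high_red, p_sug_low_red, p_red, p_alc_low, p_sug_low
)
post_white = naive_posterior(
    1 - p_alc_high_white, p_sug_low_white, p_white, p_alc_low, p_sug_low
)
print(f"Red: {post_red:.2%}, White: {post_white:.2%}")
# prints "Red: 72.15%, White: 30.92%"
```

The two values do not sum to 100% because the denominator uses the product of marginals rather than the exact joint probability of the predictor values.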

19
Q

What happens to the classification when comparing prior and posterior probabilities?

A

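Posterior probabilities can differ substantially from the prior probabilities once the observed predictor values are taken into account, and this can change the classification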

20
Q

What is the posterior probability of a wine being Red given high alcohol and high sugar content?

A

25.02%

21
Q

Fill in the blank: The denominator in Bayes Theorem, p(X*), is known as the _______.

A

Marginal probability of the data

22
Q

What is the posterior probability of a wine being red given high alcohol and high sugar?

A

25.02%

This is calculated using the Naïve Bayes algorithm.

23
Q

What is the posterior probability of a wine being white given high alcohol and high sugar?

A

79.53%

This indicates the Naïve Bayes algorithm classifies the wine as white.

24
Q

What is the Naïve Bayes classification for low alcohol and high sugar wine?

A

White

This classification is based on the alcohol and sugar content.

25
Q

What is the Naïve Bayes classification for high alcohol and low sugar wine?

A

Red

This classification is based on the alcohol and sugar content.
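
All four alcohol and sugar combinations can be classified in one pass, as in the sketch below. Of the four conditional probabilities it uses, p(Alcohol_flag = High | Red) = 0.436 and p(Sugar_flag = High | White) = 1 - 0.4 come from this deck. The other two (0.536 and 0.232) are assumptions derived from the quoted marginal probabilities 0.486 and 0.584 with equal priors. The helper names `cond` and `classify` are hypothetical.

```python
from itertools import product

prior = {"Red": 0.5, "White": 0.5}

# p(flag = High | Type). ASSUMPTION: 0.536 and 0.232 are derived from the
# deck's marginal probabilities, not quoted directly.
p_high = {
    "Alcohol_flag": {"Red": 0.436, "White": 0.536},
    "Sugar_flag": {"Red": 0.232, "White": 0.600},
}


def cond(feature, level, wine_type):
    """Conditional probability p(feature = level | Type = wine_type)."""
    p = p_high[feature][wine_type]
    return p if level == "High" else 1 - p


def classify(alcohol, sugar):
    """Pick the type with the largest prior-times-conditionals score."""
    # p(X*) is identical for every class, so comparing numerators suffices.
    scores = {
        t: prior[t] * cond("Alcohol_flag", alcohol, t) * cond("Sugar_flag", sugar, t)
        for t in prior
    }
    return max(scores, key=scores.get)


for alcohol, sugar in product(["Low", "High"], repeat=2):
    print(f"alcohol={alcohol:<4} sugar={sugar:<4} -> {classify(alcohol, sugar)}")
```

Under these assumptions the loop gives Red, White, Red, White for low/low, low/high, high/low, and high/high, matching the classifications stated in this deck.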

26
Q

What is the accuracy of the Naïve Bayes model when predicting wine types?

A

65.93%

This is calculated from the model’s predictions on a test data set.

27
Q

What is the accuracy of the Naïve Bayes model for classifying red wines?

A

79.32%

This is the proportion of correctly classified red wines from the test data.

28
Q

What is the accuracy of the Naïve Bayes model for classifying white wines?

A

61.48%

This is the proportion of correctly classified white wines from the test data.

29
Q

What is the baseline accuracy for the wine types if half are red and half are white?

A

50%

This serves as a comparison to evaluate the performance of the Naïve Bayes model.
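
As a rough sketch of how overall accuracy, per-class accuracy, and the baseline comparison fit together, the snippet below works from a table of actual versus predicted counts. The counts are hypothetical and chosen only for illustration; they are not the test data behind the figures in this deck. The function name `accuracy_report` is also hypothetical.

```python
def accuracy_report(table):
    """Return (overall accuracy, per-class accuracy).

    table[actual][predicted] holds record counts.
    """
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[c][c] for c in table)
    per_class = {c: table[c][c] / sum(table[c].values()) for c in table}
    return correct / total, per_class


# HYPOTHETICAL counts for illustration only.
table = {
    "Red": {"Red": 40, "White": 10},
    "White": {"Red": 20, "White": 30},
}
overall, per_class = accuracy_report(table)
baseline = 0.5  # accuracy of always guessing one type when classes are balanced

print(f"overall={overall:.2%}, beats baseline: {overall > baseline}")
for wine_type, acc in per_class.items():
    print(f"{wine_type}: {acc:.2%}")
```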

30
Q

What are the predictor variables used in the Naïve Bayes classification model?

A
  • Alcohol_flag
  • Sugar_flag

These variables are used to classify the type of wine.

31
Q

What Python library is used to implement the Naïve Bayes algorithm?

A

sklearn

Specifically, the MultinomialNB class is used for the implementation.

32
Q

What is the first step in using the Naïve Bayes algorithm in Python?

A

Import required libraries

Including pandas, numpy, and sklearn.

33
Q

Fill in the blank: The contingency table helps to obtain the _______ needed for Naïve Bayes calculations.

A

marginal and conditional probabilities

These probabilities are essential for the calculations.
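
To illustrate how a contingency table yields both kinds of probability, here is a small standard-library sketch. The records are hypothetical and are not the wine data used elsewhere in this deck.

```python
from collections import Counter

# HYPOTHETICAL (Type, Alcohol_flag) records for illustration only.
records = [
    ("Red", "High"), ("Red", "Low"), ("Red", "Low"), ("Red", "Low"),
    ("White", "High"), ("White", "High"), ("White", "High"), ("White", "Low"),
]

cell_counts = Counter(records)                 # cells of the contingency table
type_totals = Counter(t for t, _ in records)   # row totals
flag_totals = Counter(f for _, f in records)   # column totals
n = len(records)

# Marginal probability: column total divided by the grand total.
p_high = flag_totals["High"] / n
# Conditional probability: cell count divided by its row total.
p_high_given_red = cell_counts[("Red", "High")] / type_totals["Red"]

print(p_high, p_high_given_red)  # prints "0.5 0.25"
```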

34
Q

What R package is used for Naïve Bayes classification?

A

e1071

This package contains the naiveBayes function for classification.

35
Q

What command in R is used to run the Naïve Bayes estimator?

A

naiveBayes()

This function builds the model using the specified formula and data.

36
Q

What do A-priori probabilities represent in the Naïve Bayes model?

A

Values of p(Y)

These probabilities indicate the likelihood of each class before any evidence is considered.

37
Q

What do conditional probabilities represent in the Naïve Bayes model?

A

Values of p(X | Y)

These probabilities indicate how likely each predictor value is within each class.

38
Q

What is the purpose of the predict() command in R for Naïve Bayes?

A

To classify each record in the test data set

This generates predictions based on the trained model.

39
Q

Fill in the blank: The contingency table of actual versus predicted wine types in R is created using the _______ command.

A

table()

This command creates a cross-tabulation of the actual and predicted values.
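
For readers working in Python rather than R, a comparable cross-tabulation can be sketched with the standard library. The label lists below are hypothetical, and this is only an analogue of what R's table() produces, not a port of it.

```python
from collections import Counter

# HYPOTHETICAL actual and predicted labels for illustration only.
actual = ["Red", "Red", "White", "White", "White"]
predicted = ["Red", "White", "White", "White", "Red"]

# Count each (actual, predicted) pair; missing pairs read as 0.
crosstab = Counter(zip(actual, predicted))
labels = sorted(set(actual) | set(predicted))

print("actual \\ predicted", labels)
for a in labels:
    print(a, [crosstab[(a, p)] for p in labels])
```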

40
Q

What information does Bayes Theorem update about the data parameters?

A

Bayes Theorem updates our previous knowledge about the data parameters based on new evidence.

41
Q

What does the prior probability represent?

A

The prior probability represents our initial belief about the probability of a parameter before observing any evidence.

42
Q

What formula represents how the data behave within the target variable’s class values?

A

The formula representing how the data behave within the target variable’s class values is p(X | Y).

43
Q

What formula represents how the data behave without reference to the class values?

A

The formula is p(X).

44
Q

What is the formula from the previous exercise called?

A

The formula p(X) is called the marginal probability of the data; the likelihood is the class-conditional formula p(X | Y).

45
Q

What does the posterior probability represent?

A

The posterior probability represents the updated probability of a hypothesis after considering new evidence.

46
Q

What do we use for a prior probability if we have no prior knowledge about the parameters?

A

We use a uniform distribution for the prior probability.

47
Q

How does the maximum a posteriori hypothesis help us to classify a record?

A

It helps classify a record by selecting the class that maximizes the posterior probability.

48
Q

What is the class conditional independence assumption?

A

The assumption that the features are independent given the class label.

49
Q

How do we write p(X* | Y = y*) if we have two predictor variables, X* = {X1 = x1, X2 = x2}?

A

We write it as p(X1 = x1, X2 = x2 | Y = y*), which under class conditional independence factors into p(X1 = x1 | Y = y*) × p(X2 = x2 | Y = y*).

50
Q

For which variables do we create two contingency tables?

A

One with Type and Alcohol_flag, and another with Type and Sugar_flag.

51
Q

What is the prior probability of Type = Red and Type = White?

A

Calculated from the contingency tables.

52
Q

What can we calculate regarding alcohol content from the contingency tables?

A

The probability of high and low alcohol content.

53
Q

What can we calculate regarding sugar content from the contingency tables?

A

The probability of high and low sugar content.

54
Q

What are the conditional probabilities for Alcohol_flag given Type = Red?

A

p(Alcohol_flag = High | Type = Red) and p(Alcohol_flag = Low | Type = Red).

55
Q

What are the conditional probabilities for Alcohol_flag given Type = White?

A

p(Alcohol_flag = High | Type = White) and p(Alcohol_flag = Low | Type = White).

56
Q

What are the conditional probabilities for Sugar_flag given Type = Red?

A

p(Sugar_flag = High | Type = Red) and p(Sugar_flag = Low | Type = Red).

57
Q

What are the conditional probabilities for Sugar_flag given Type = White?

A

p(Sugar_flag = High | Type = White) and p(Sugar_flag = Low | Type = White).

58
Q

How likely is it that a randomly selected wine is red?

A

Discussed based on prior probabilities.

59
Q

How likely is it that a randomly selected wine has high alcohol content?

A

Discussed based on the marginal probabilities.

60
Q

How likely is it that a randomly selected wine has low sugar content?

A

Discussed based on the marginal probabilities.

61
Q

What might a typical white wine have as its alcohol and sugar content?

A

Discussed based on conditional probabilities.

62
Q

What might a typical red wine have as its alcohol and sugar content?

A

Discussed based on conditional probabilities.

63
Q

What do side-by-side bar graphs for Type compare?

A

They compare how Alcohol_flag and Sugar_flag are distributed across the wine types.

64
Q

What is the posterior probability of Type = Red for a wine that is low in alcohol and high in sugar?

A

Calculated based on the relevant probabilities.

65
Q

What is the posterior probability of Type = White for the same wine?

A

Calculated based on the relevant probabilities.

66
Q

Which type is more probable for a wine with low alcohol and high sugar content?

A

Determined from posterior probabilities.

67
Q

What is the posterior probability of Type = Red for a wine that is high in alcohol and low in sugar?

A

Calculated based on the relevant probabilities.

68
Q

Which type is more probable for a wine with high alcohol and low sugar content?

A

Determined from posterior probabilities.

69
Q

What does the Naïve Bayes classifier classify wines based on?

A

Alcohol and sugar content.

70
Q

How do we evaluate the Naïve Bayes model on the wines_test data set?

A

Display results in a contingency table.

71
Q

What values do we find for the Naïve Bayes model in the contingency table?

A

Accuracy and error rate.

72
Q

How often does the Naïve Bayes model correctly classify red wines?

A

Determined from the contingency table.

73
Q

How often does the Naïve Bayes model correctly classify white wines?

A

Determined from the contingency table.

74
Q

What should be done with the variables Death, Sex, and Educ?

A

Convert all variables to factors.

75
Q

What two contingency tables should be created for the framingham_nb data sets?

A

One with Death and Sex, and another with Death and Educ.

76
Q

What is the probability a randomly selected person is alive or dead?

A

Calculated from the contingency tables.

77
Q

What is the probability a randomly selected person is male?

A

Calculated from the contingency tables.

78
Q

What is the probability a randomly selected person has an Educ value of 3?

A

Calculated from the contingency tables.

79
Q

What are the probabilities that a dead person is male with education level 1?

A

Calculated from the contingency tables.

80
Q

What are the probabilities that a living person is male with education level 1?

A

Calculated from the contingency tables.

81
Q

What are the probabilities that a living person is female with education level 2?

A

Calculated from the contingency tables.

82
Q

What are the probabilities that a dead person is female with education level 2?

A

Calculated from the contingency tables.

83
Q

What do side-by-side bar graphs for Death compare?

A

They compare how Sex and Educ are distributed across the values of Death: one graph has an overlay of Sex and the other an overlay of Educ.

84
Q

If we know a person is dead, are they more likely to be male or female?

A

Determined from the bar graphs.

85
Q

If we know a person is alive, are they more likely to be male or female?

A

Determined from the bar graphs.

86
Q

If we know a person is dead, what education level are they most likely to have?

A

Determined from the bar graphs.

87
Q

If we know a person is alive, what education level are they most likely to have?

A

Determined from the bar graphs.

88
Q

Which education levels are more prevalent for dead persons?

A

Determined from the graphs.

89
Q

Which education levels are more prevalent for living persons?

A

Determined from the graphs.

90
Q

What is the posterior probability of Death = 0 for a male with education level 1?

A

Calculated based on the relevant probabilities.

91
Q

What is the posterior probability of Death = 1 for a male with education level 1?

A

Calculated based on the relevant probabilities.

92
Q

What is the posterior probability of Death = 0 for a female with education level 2?

A

Calculated based on the relevant probabilities.

93
Q

What is the posterior probability of Death = 1 for a female with education level 2?

A

Calculated based on the relevant probabilities.

94
Q

What does the Naïve Bayes classifier classify persons based on?

A

Sex and education.

95
Q

How do we evaluate the Naïve Bayes model on the framingham_nb_test data set?

A

Display results in a contingency table.

96
Q

How often does the Naïve Bayes model correctly classify dead persons?

A

Determined from the contingency table.

97
Q

How often does the Naïve Bayes model correctly classify living persons?

A

Determined from the contingency table.