Naive Bayes Model Flashcards

1
Q

What is a key assumption of the Naive Bayes model?

A
  • Conditional independence among predictors, given the class.

The effect of a predictor's value on a given class is not affected by the values of the other predictors.

2
Q

Explain Bayes’ theorem

A

A rule for calculating the posterior probability: the probability of an event after new information has been taken into account. For example, a weather data set can be used to build a model that decides whether to go outside and play soccer.

3
Q

Naive Bayes

A

A classifier that calculates the posterior probability of each possible outcome and predicts the outcome with the highest probability.
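The "pick the outcome with the highest posterior" idea can be sketched in a few lines of Python. This is only an illustration: the class names, feature names, priors, and likelihoods below are all made-up values for a toy "play soccer?" problem, not numbers from any real data set.

```python
# Hypothetical priors and per-feature likelihoods (illustrative values only).
priors = {"play": 0.6, "stay_in": 0.4}                    # P(A)
likelihoods = {                                           # P(B_i | A)
    "play":    {"outlook=sunny": 0.5, "wind=strong": 0.2},
    "stay_in": {"outlook=sunny": 0.3, "wind=strong": 0.7},
}

def predict(observed):
    """Return (best_class, posteriors) for a list of observed feature values."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in observed:
            # Naive assumption: features are independent given the class,
            # so their likelihoods are simply multiplied together.
            score *= likelihoods[label][feature]
        scores[label] = score
    total = sum(scores.values())                          # plays the role of P(B)
    posteriors = {label: s / total for label, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors

label, posteriors = predict(["outlook=sunny", "wind=strong"])
print(label, posteriors)
```

With these invented numbers the unnormalized scores are 0.06 for "play" and 0.084 for "stay_in", so the prediction is "stay_in" with a posterior of about 0.58.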

4
Q

What type of machine learning model is Naive Bayes?

A

Supervised learning (classification)

5
Q

Bayes’ Theorem equation

A

Gives the probability of an event, A, given that another event, B, is true.

P(A|B) = P(B|A) * P(A) / P(B)
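A short worked example may help make the formula concrete. The numbers here are invented for illustration (A = "the email is spam", B = "the email contains the word free"), and P(B) is obtained with the law of total probability.

```python
# Hypothetical numbers, chosen only to illustrate the formula.
p_a = 0.2               # P(A): prior probability that an email is spam
p_b_given_a = 0.5       # P(B|A): probability "free" appears, given spam
p_b_given_not_a = 0.05  # P(B|not A): probability "free" appears, given not spam

# P(B) via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(f"P(B)   = {p_b:.3f}")
print(f"P(A|B) = {p_a_given_b:.3f}")
```

Under these assumed values P(B) is 0.14, and seeing the word raises the probability of spam from the prior of 0.2 to a posterior of roughly 0.714.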

6
Q

P(A)

A

The prior probability: the overall probability of outcome A, before any evidence (feature) is seen.

7
Q

P(B|A)

A

A conditional probability (the likelihood): the probability of B, given A.

8
Q

P(B)

A

The overall probability of the predictor variable's value (the evidence), regardless of class.

9
Q

In Bayes’ theorem as used by Naive Bayes, what does “A” represent?

A

class label: one of the possible outcomes or categories within a dataset.

10
Q

In Bayes’ theorem as used by Naive Bayes, what does “B” represent?

A

predictor value: the observed value of a feature.

11
Q

What does P(A|B) stand for?

A
  • the posterior probability
  • the probability of the class label (A) after the evidence (B, feature) has been seen.
12
Q

In the context of probability, what is conditional independence?

A

Variables B and C are independent of one another once the value of a third variable, A, is known. In Naive Bayes, this is the assumption that each predictor variable (the different Bs in the formula) is independent of the others, conditional on the class (A).

Conditional independence is about how variables (Bs) interact with each other when you take into account the influence of a third variable (A).

13
Q

What’s the conditional independence equation?

A

P(B|C, A) = P(B|A)

The probability of B, given C and A, is equal to the probability of B, given A.

Put another way: once A is known, introducing C does not change the probability of B.
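The equation can be checked numerically. The sketch below builds a small joint distribution in which B and C are conditionally independent given A by construction, then compares both sides of the equation. All probabilities are hypothetical, and `bern` and `prob` are helper names invented for this example.

```python
from itertools import product

# Hypothetical distributions: A is the class, B and C are binary predictors.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: 0.2, 1: 0.9}   # P(B=1 | A=a)
p_c_given_a = {0: 0.5, 1: 0.1}   # P(C=1 | A=a)

def bern(p_one, value):
    """Probability that a binary variable equals `value` when P(1) = p_one."""
    return p_one if value == 1 else 1 - p_one

# Joint built so that B and C are conditionally independent given A:
# P(A, B, C) = P(A) * P(B|A) * P(C|A)
joint = {
    (a, b, c): p_a[a] * bern(p_b_given_a[a], b) * bern(p_c_given_a[a], c)
    for a, b, c in product([0, 1], repeat=3)
}

def prob(**fixed):
    """Sum the joint over every outcome matching the fixed values of a, b, c."""
    return sum(
        p for (a, b, c), p in joint.items()
        if all({"a": a, "b": b, "c": c}[k] == v for k, v in fixed.items())
    )

for a, c in product([0, 1], repeat=2):
    lhs = prob(a=a, b=1, c=c) / prob(a=a, c=c)   # P(B=1 | C=c, A=a)
    rhs = prob(a=a, b=1) / prob(a=a)             # P(B=1 | A=a)
    print(f"A={a}, C={c}: P(B|C,A)={lhs:.3f}  P(B|A)={rhs:.3f}")

# Without conditioning on A, B and C are generally not independent.
print(f"P(B=1|C=1)={prob(b=1, c=1) / prob(c=1):.3f}  P(B=1)={prob(b=1):.3f}")
```

The two columns agree for every combination of A and C, while the last line shows that B and C still look dependent when A is ignored. That is the difference between conditional and unconditional independence.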

14
Q

Naive Bayes assumption (in reality)

A

The predictor variables (B and C) are assumed to be conditionally independent of each other, given the target variable (A).

In real data, this assumption is very often not true.

However, Naive Bayes models often still perform well even when the data violates the assumption.

15
Q

Naive Bayes assumption on predictor variables

A

Each individual predictor variable (the Bs and Cs) is assumed to contribute equally to the model’s prediction.

16
Q

Name 2 Naive Bayes assumptions

A
  1. predictor variables are independent of each other, given the class
  2. predictor variables contribute equally to the model’s prediction
17
Q

Name 3 Advantages of Naive Bayes

A
  1. one of the simplest classification algorithms
  2. fast training time
  3. highly scalable
18
Q

Naive Bayes Use Cases

A
  1. document analysis/classification
  2. spam filtering
19
Q

Name 2 Disadvantages of Naive Bayes

A
  1. Few datasets have truly conditionally independent predictors.
  2. “Zero frequency” problem: when a predictor value has never been observed together with a class in the training data, its estimated probability for that class is zero.
20
Q

Zero Frequency problem

A

The dataset has no occurrences of a class label and some value of a predictor variable together, so the estimated conditional probability for that combination is zero. Because the final posterior probability is found by multiplying all of the individual probabilities together, that single zero automatically makes the whole result zero.
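The problem, and one common remedy, can be sketched as follows. The remedy shown is additive (Laplace) smoothing, which the card itself does not mention; it is included here as an assumption about how the issue is usually handled. The training rows are invented, and `likelihood` is a helper name made up for this example.

```python
from collections import Counter

# Hypothetical training rows: (class label, value of one categorical predictor).
rows = [("spam", "offer"), ("spam", "offer"), ("spam", "invoice"),
        ("ham", "invoice"), ("ham", "meeting"), ("ham", "meeting")]
values = sorted({value for _, value in rows})   # all observed predictor values

pair_counts = Counter(rows)
class_counts = Counter(label for label, _ in rows)

def likelihood(value, label, alpha=0.0):
    """Estimate P(value | label); alpha > 0 applies additive smoothing."""
    return (pair_counts[(label, value)] + alpha) / (
        class_counts[label] + alpha * len(values)
    )

unsmoothed = likelihood("meeting", "spam")             # never seen together
smoothed = likelihood("meeting", "spam", alpha=1.0)    # pretend one extra sighting

print(unsmoothed)   # zero, which would wipe out the whole product
print(smoothed)     # small but non-zero
```

Here "meeting" never appears with "spam", so the raw estimate is 0 and any posterior for "spam" that includes it collapses to 0. With `alpha=1.0` the estimate becomes 1/6 instead. scikit-learn's discrete Naive Bayes classifiers expose a similar `alpha` smoothing parameter.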

21
Q

What type of variables is BernoulliNB used for?

A

Used for binary/Boolean features

22
Q

What type of dataset is CategoricalNB used for? And name a use case?

A
  • Handles categorical features.
  • Models each feature with its own categorical distribution, giving the probability of each feature value given a class.
  • Specifically designed for categorical data.
23
Q

What is ComplementNB used for?

A
  • Primarily designed to handle imbalanced datasets.
  • Estimates each class's parameters from the data in all the other classes (the complement of the class), which tends to give more stable estimates on imbalanced data.
  • Typically works with multinomial data (like text).
24
Q

What is GaussianNB used for?

A

Used for continuous features that are assumed to be normally distributed within each class.

25
Q

What is MultinomialNB used for and what are some use cases?

A

Used for multinomial (discrete) features, such as word counts.

Suitable for: Text classification, document categorization, spam filtering, and other tasks involving count data.
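A minimal sketch of the text-classification use case with scikit-learn follows. The four-message corpus and its labels are invented for illustration, and pairing `MultinomialNB` with `CountVectorizer` is one common setup rather than the only one.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus; a real spam filter needs far more data.
texts = ["win money now", "free money win",
         "meeting at noon", "lunch at noon tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()            # turns each text into word counts
X = vectorizer.fit_transform(texts)

model = MultinomialNB()                   # default alpha=1.0 smooths zero counts
model.fit(X, labels)

new_text = ["free money"]
prediction = model.predict(vectorizer.transform(new_text))
print(prediction[0])
```

Because "free" and "money" occur only in the spam messages of this toy corpus, the new message is classified as "spam".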

26
Q

What does setting stratify=y mean? When should it be done?

A

If our master data has a class split of 80/20, stratifying ensures that this proportion is maintained in both the training and test data.

stratify=y tells the function to use the class ratio found in the y variable (our target).
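The effect can be seen in a small sketch, assuming the function in question is scikit-learn's `train_test_split` (the card does not name it). The data is synthetic, with the 80/20 class split used in the card, and the `test_size` and `random_state` values are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data with an 80/20 class split.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Share of the minority class (label 1) in each split.
print("train minority share:", y_train.mean())
print("test minority share: ", y_test.mean())
```

With stratification, both the 75-row training set and the 25-row test set keep the original 20% minority share (15 and 5 minority rows respectively).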

27
Q

When should you use stratify=y, and what are the consequences of not doing so?

A
  • The greater the class imbalance, the more important it is to stratify when you split the data.
  • Without stratifying, the function splits the data purely at random, and an unlucky split could put few or no minority-class examples in the test data, which makes the model evaluation unreliable.