L3 - Naïve Bayes Flashcards

1
Q

What are the Naïve Bayes learning outcomes?

A

Describe the basic principles of Bayes Theorem

Apply specialised methods and data structures needed to analyse text data in R

2
Q

What is Bayes' theorem?

A

A statistical principle for combining prior knowledge of the classes with new evidence gathered from data

3
Q

What is the formula for Bayes' theorem?

A

P(A|B) = P(A ∩ B) / P(B) = (P(B|A) * P(A)) / P(B)
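
A quick worked example in R; all the numbers are assumed purely for illustration:

p_spam <- 0.2           # a priori probability P(Spam) (assumed)
p_viagra_spam <- 0.4    # likelihood P("Viagra"|Spam) (assumed)
p_viagra <- 0.1         # marginal likelihood P("Viagra") (assumed)

# Posterior = likelihood * prior / marginal likelihood
p_spam_viagra <- p_viagra_spam * p_spam / p_viagra
p_spam_viagra           # 0.8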

4
Q

How do you test conditional probability for independence?

A

P(A|B) = P(A), i.e. knowing that B occurred does not change the probability of A
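
One way to check this numerically in R, using a hypothetical contingency table:

# Hypothetical 2x2 table of joint counts for binary events A and B
tab <- matrix(c(30, 20, 30, 20), nrow = 2,
              dimnames = list(A = c("yes", "no"), B = c("yes", "no")))

p_a <- sum(tab["yes", ]) / sum(tab)                   # P(A)   = 0.6
p_a_given_b <- tab["yes", "yes"] / sum(tab[, "yes"])  # P(A|B) = 0.6
p_a_given_b == p_a                                    # TRUE: independent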

5
Q

What are the formulas for P(A|B) and P(B|A), rewritten in terms of the intersection?

A

P(A ∩ B) = P(A|B) * P(B)

P(B ∩ A) = P(B|A) * P(A)

6
Q

Give the proof of Bayes' theorem starting from (1)?

A
(1) P(A|B) = P(A ∩ B) / P(B)
Similarly:
(2) P(B|A) = P(B ∩ A) / P(A)
So we have:
(3) P(A ∩ B) = P(A|B) * P(B)
(4) P(B ∩ A) = P(B|A) * P(A)
and we know:
(5) P(A ∩ B) = P(B ∩ A)
So, substituting (4) and (5) into the numerator of (1), we have:
(6) P(A|B) = (P(B|A) * P(A)) / P(B)
7
Q

Explain the components of Bayes' theorem?

A

Posterior probability - P(A|B)
Likelihood - P(B|A)
A priori probability - P(A)
Marginal likelihood - P(B)

8
Q

Why is it called Naïve Bayes?

A

It makes the strong assumption of independence among the features:
the features do not affect one another

9
Q

Explain the steps involved for: when we observe a message that contains 'Viagra' and 'Unsubscribe' but neither 'Money' nor 'Groceries', what is the probability that this message is spam?

A

Assumption - naïve Bayes assumes independence amongst the features
Just focus on the numerator to start with
(1) For independent events we have: P(A ∩ B) = P(A) * P(B)
(2) Rewrite P(B|A) in the numerator as a product over the features: P(W1|Spam) * P(¬W2|Spam) * … * P(Spam); look up each feature inside the spam class of the likelihood table
(3) Calculate ham using the same formula but change the class
(4) Compare the spam and ham scores
(5) Denominator (normalisation/scaling) - divide the spam score by the sum of the spam and ham scores (see the sketch below)
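
A minimal sketch of these steps in R; every prior and likelihood value below is a made-up stand-in for a real likelihood table:

# Hypothetical priors and likelihood-table entries (illustration only)
p_spam <- 0.2
p_ham  <- 0.8
lik_spam <- c(viagra = 0.40, unsubscribe = 0.25, no_money = 0.30, no_groceries = 0.95)
lik_ham  <- c(viagra = 0.01, unsubscribe = 0.17, no_money = 0.78, no_groceries = 0.95)

# Steps (1)-(2): independence lets us multiply per-feature likelihoods
spam_score <- prod(lik_spam) * p_spam
# Step (3): same formula, other class
ham_score  <- prod(lik_ham) * p_ham

# Steps (4)-(5): compare, then normalise by the sum of both scores
p_spam_given_msg <- spam_score / (spam_score + ham_score)
p_spam_given_msg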

10
Q

What is normalisation/scaling for naïve Bayes classifiers?

A

Dividing the score for the class of interest by the total score across both classes, so the class probabilities sum to 1:
P(A|features) / (P(A|features) + P(B|features))
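
The same scaling as a one-liner in R (the scores are hypothetical unnormalised class scores):

scores <- c(spam = 0.0057, ham = 0.0010)  # hypothetical unnormalised scores
scores / sum(scores)                      # each class as a share of the total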

11
Q

What is the naïve Bayes classification algorithm? What are the components?

A
P(CL|F1…Fn) = 1/Z * P(CL) * P(F1|CL) * … * P(Fn|CL)
CL = class label
F1…Fn = n features
1/Z = scaling factor
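
A generic sketch of the formula in R (the function names nb_score and nb_normalise are my own, not from the lecture):

# Unnormalised score for one class: P(CL) * P(F1|CL) * ... * P(Fn|CL)
nb_score <- function(prior, likelihoods) prior * prod(likelihoods)

# 1/Z scaling: divide each class score by the sum over all class labels
nb_normalise <- function(scores) scores / sum(scores)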
12
Q

How can we find P(B|features) when we have calculated P(A|features)?

A
Use the same equation as for P(A|features),
but replace the values with the information in the likelihood table for the alternative class label
13
Q

How does classification work for the naïve Bayes algorithm?

A

Training - calculate the likelihood tables
Testing - given new, unseen data:
(1) Find the probability of it belonging to each class using the likelihood tables
(2) Pick the most probable class (e.g. whichever class is more likely after normalisation)
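
An end-to-end sketch in R, assuming the e1071 package; with numeric features its naiveBayes() models each feature with a normal density rather than a counted table, but the train/test flow is the same, and the iris split is purely illustrative:

library(e1071)

set.seed(1)
idx <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

model <- naiveBayes(Species ~ ., data = train)  # training: per-class tables
head(predict(model, test))                      # testing: most probable class
head(predict(model, test, type = "raw"))        # per-class probabilities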

14
Q

How does Naïve Bayes classify?

A

Picks the most probable class given the observed features (e.g. whichever is more likely after normalisation)

(1) Calculate the posterior probability for each class.
(2) The class with the highest posterior probability is the outcome of the prediction.
(You calculate both classes and pick the one whose probability is higher.)

15
Q

Difference between classifier and normalisation?

A

Classifier - picks the most probable class given the features
Normalisation - takes the raw scores of the classes and converts them into percentages (probabilities that sum to 1)

16
Q

Why is KNN lazy?

A

It does no real training - it just compares the new observation's features to stored examples at classification time

Naïve Bayes, by contrast, learns during training from the likelihood tables it builds from the input data

17
Q

Name one problem with naïve Bayes?

A

If one of the feature likelihoods is 0 (e.g. 0/20), then the whole product becomes 0
A zero-probability feature overrides all the other features

18
Q

How do you overcome this naïve Bayes problem?

A

Use the Laplace estimator (smoothing)

Non-zero probability - it adds a small number to each of the feature counts

19
Q

Explain how to overcome the problem in more detail?

A

Numerator - add 1 to each count
Denominator - rebalance by increasing the denominator by the corresponding amount (doesn't have to be 1)
Do not change the prior knowledge (e.g. the a priori probabilities of A and B); see the sketch below
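
A small numeric sketch in R with hypothetical counts:

word_in_spam <- 0  # times "Groceries" appeared in spam (hypothetical)
spam_total <- 20   # number of spam messages (hypothetical)

word_in_spam / spam_total              # 0: zeroes out the whole product
(word_in_spam + 1) / (spam_total + 2)  # +1 on top; +2 below balances the
                                       # two outcomes (word present/absent)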

20
Q

How do you use naïve Bayes with continuous features?

A

Discretisation (binning)

Categorises the numeric data into different bins

21
Q

How to set bins?

A
Prior knowledge (e.g. spam is more likely in the daytime, so set separate bins for day and night)
Or simply use quantiles, as sketched below
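
A short sketch in R using quantile-based bins (the send-time data is made up):

send_hour <- c(2, 9, 10, 11, 13, 14, 15, 18, 21, 23)    # hypothetical times
breaks <- quantile(send_hour, probs = seq(0, 1, 0.25))  # quartile cut points
bins <- cut(send_hour, breaks = breaks, include.lowest = TRUE)
table(bins)                                             # counts per bin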
22
Q

Name some strengths of Naïve Bayes? Explain.

A

(1) Robust to irrelevant features (some algorithms are affected by unusual features)
For example, you might include eye colour as a feature to classify gender. Eye colour is actually completely irrelevant to gender, and when you use probability as a classifier this bears out in the evidence: males and females will have nearly identical rates of blue eyes, so the feature barely shifts the posterior
(2) Robust to missing data - some classifiers discard the whole observation when data is missing; naïve Bayes just drops the one missing feature (e.g. it reduces the count by 1 for only the missing attribute)

23
Q

Name and explain some weaknesses of naïve Bayes?

A

(1) Strong assumption that all features are independent - a faulty assumption that rarely holds in the real world
(However, it is not important to obtain precise probabilities so long as the predictions are accurate)
(2) Strong predictions but weak probability estimates - because of the independence assumption, some other algorithms are better at providing probability estimates

24
Q

What are Bayesian methods? What is Naïve?

A

Bayes classifiers are classification methods based on Bayes' theorem
The Naïve Bayes classifier is the simplest one among them; it assumes features are independent